linux

Commit Graph

Author	SHA1	Message	Date
Prike Liang	e2e9743578	drm/amdgpu: set the right AMDGPU sg segment limitation The driver needs to set the correct max_segment_size; otherwise debug_dma_map_sg() will complain about the over-mapping of the AMDGPU sg length as following: WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370 [ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd [ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom #492 [ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370 [ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05 [ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286 [ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027 [ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680 [ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930 [ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000 [ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800 [ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000 [ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0 [ 364.049605] Call Trace: [ 364.049607] <TASK> [ 364.049609] ? show_regs+0x6d/0x80 [ 364.049614] ? __warn+0x8c/0x140 [ 364.049618] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049621] ? report_bug+0x193/0x1a0 [ 364.049627] ? handle_bug+0x46/0x80 [ 364.049631] ? exc_invalid_op+0x1d/0x80 [ 364.049635] ? asm_exc_invalid_op+0x1f/0x30 [ 364.049642] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049647] __dma_map_sg_attrs+0x90/0xe0 [ 364.049651] dma_map_sgtable+0x25/0x40 [ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu] [ 364.049935] ? srso_return_thunk+0x5/0x5f [ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu] [ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm] [ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm] [ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu] [ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu] [ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu] [ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] [ 364.051054] ? srso_return_thunk+0x5/0x5f [ 364.051057] ? tomoyo_file_ioctl+0x20/0x30 [ 364.051063] __x64_sys_ioctl+0x9c/0xd0 [ 364.051068] x64_sys_call+0x1219/0x20d0 [ 364.051073] do_syscall_64+0x51/0x120 [ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 364.051081] RIP: 0033:0x7fb2d2f1a94f Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:26 -05:00
Lijo Lazar	990c4f5807	drm/amdgpu: Fix DPX valid mode check on GC 9.4.3 For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: `1589c82a10` ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:34:06 -05:00
Srinivasan Shanmugam	949d817c78	drm/amdgpu/gfx11: Add cleaner shader for GFX11.0.3 This commit adds the cleaner shader microcode for GFX11.0.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_11_0_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:33:56 -05:00
Alex Deucher	e89bd3615b	drm/amdgpu/mes: fetch fw version from firmware header We need this prior to the firmware being loaded so fetch from the header. v2: fetch directly from the firmware v3: store both fw versions Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:33:39 -05:00
Thomas Weißschuh	b626816fdd	sysfs: treewide: constify attribute callback of bin_is_visible() The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-05 14:00:28 +01:00
Antonio Quartulli	a6dd15981c	drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: `c9b7c809b8` ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `91c9e221fe`) Cc: stable@vger.kernel.org	2024-11-04 12:48:21 -05:00
Lijo Lazar	e5ad71779d	drm/amdgpu: Add compatible NPS mode info Populate the compatible NPS modes also for providing partition configuration details through sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Lijo Lazar	81db4eab28	drm/amdgpu: Skip IP coredump for RAS errors For RAS errors, source of error is known. Skip the core dump of IP states. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Lijo Lazar	047767ddc9	drm/amdgpu: Group gfx sysfs functions Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Candice Li	12e5df81bb	drm/amdgpu: Add nps_mode in RAS init_flag Add nps_mode in RAS init_flag. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Jesse Zhang	d2e3961ae3	drm/amdgpu: add amdgpu_sdma_sched_mask debugfs Userspace wants to run jobs on a specific sdma ring for verification purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the sdma ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:15 -05:00
Jesse Zhang	c5c63d9cb5	drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs compute/gfx may have multiple rings on some hardware. In some cases, userspace wants to run jobs on a specific ring for validation purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the gfx/compute ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:53 -05:00
Prike Liang	b78612939d	drm/amdgpu: Fix dummy_read_page overlapping mappings Use the dma_map_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC attribute setting to handle the dummy page overlapping mappings. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:30 -05:00
Victor Zhao	afe260df55	drm/amdgpu: skip amdgpu_device_cache_pci_state under sriov Under sriov, host driver will save and restore vf pci cfg space during reset. And during device init, under sriov, pci_restore_state happens after fullaccess released, and it can have race condition with mmio protection enable from host side leading to missing interrupts. So skip amdgpu_device_cache_pci_state for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:30 -05:00
Antonio Quartulli	91c9e221fe	drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: `c9b7c809b8` ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:16 -05:00
R Sundar	b95264cf75	drm/amdgpu: use string choice helpers Use string choice helpers for better readability. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202410161814.I6p2Nnux-lkp@intel.com/ Signed-off-by: R Sundar <prosunofficial@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:33:24 -05:00
jeffbai@aosc.io	0174c0791c	drm/amdgpu: fix comment about amdgpu.abmlevel defaults Since `040fdcde28` ("drm/amdgpu: respect the abmlevel module parameter value if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto. However, the comment explaining the default value was not updated to reflect the change (-1, or auto; not -1, or disabled). Clarify that the default value (-1) means auto. Fixes: `040fdcde28` ("drm/amdgpu: respect the abmlevel module parameter value if it is set") Reported-by: Ruikai Liu <rickliu2000@outlook.com> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:57 -05:00
Tvrtko Ursulin	aa2ac51c8e	drm/amdgpu: Expose special on chip memory pools in fdinfo In the past these specialized on chip memory pools were reported as system memory (aka 'cpu') which was not correct and misleading. That has since been removed so lets make them visible as their own respective memory regions. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:52 -05:00
Tvrtko Ursulin	cd3037f3fc	drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfo So far these specialized on chip memory pools were reported as system memory (aka 'cpu') which is not correct and misleading. Lets remove that and consider later making them visible as their own thing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:47 -05:00
Yunxiang Li	fdee0872a2	drm/amdgpu: stop tracking visible memory stats Since on modern systems all of vram can be made visible anyways, to simplify the new implementation, drops tracking how much memory is visible for now. If this is really needed we can add it back on top of the new implementation, or just report all the BOs as visible. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:40 -05:00
Yunxiang Li	f286365038	drm/amdgpu: make drm-memory-* report resident memory The old behavior reports the resident memory usage for this key and the documentation say so as well. However this was accidentally changed to include buffers that was evicted. Fixes: `04bdba4654` ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo") Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:30:58 -05:00
Li Huafei	a1144da794	drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info() Fix two issues with memory allocation in amdgpu_discovery_get_nps_info() for mem_ranges: - Add a check for allocation failure to avoid dereferencing a null pointer. - As suggested by Christophe, use kvcalloc() for memory allocation, which checks for multiplication overflow. Additionally, assign the output parameters nps_type and range_cnt after the kvcalloc() call to prevent modifying the output parameters in case of an error return. Fixes: `b194d21b9b` ("drm/amdgpu: Use NPS ranges from discovery table") Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:30:28 -05:00
Alex Deucher	35984fd4a0	drm/amdgpu: add ring reset messages Add messages to make it clear when a per ring reset happens. This is helpful for debugging and aligns with other reset methods. v2: add ring name in success/fail messages (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:29:49 -05:00
Alex Deucher	efe6a87743	drm/amdgpu: fix fairness in enforce isolation handling Make sure KFD gets a turn when serializing access to the GC IP. Currently non-KFD jobs can starve KFD if they submit often enough. This patch prevents that by stalling non-KFD if its time period has elapsed. v2: fix units v3: check enablement properly Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:27:04 -05:00
Alex Deucher	8fe7cf58ff	drm/amdkfd: add an interface to query whether is KFD is active Add an interface to query whether KFD has any active queues. v2: fix build issues Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:25:42 -05:00
Al Viro	6348be02ee	fdget(), trivial conversions fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-11-03 01:28:06 -05:00
Dave Airlie	8a07b2623e	Merge tag 'drm-misc-next-2024-10-31' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: All of the previous pull request, with MORE! Core Changes: - Update documentation for scheduler start/stop and job init. - Add dedede and sm8350-hdk hardware to ci runs. Driver Changes: - Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d, panthor docs, and leadtek-ltk050h3146w. - Crashdump support for qaic. - Support DP compliance in zynqmp. - Add Samsung S6E88A0-AMS427AP24 panel. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com	2024-11-01 13:46:03 +10:00
Dave Airlie	e7103f8785	amd-drm-next-6.13-2024-10-25: amdgpu: - SDMA queue reset support - SMU 13.0.6 updates - Add debugfs interface to help limit jpeg queue scheduling for testing - JPEG 4.0.3 updates - Initial runtime repartitioning support - GFX9 fixes - Misc code cleanups - Rework IP structures to better handle multiple instances of an IP - DML updates - DSC fixes - HDR fixes - Brightness control updates - Runtime pm cleanup - DMCUB fixes - DCN 3.5 updates - Struct drm_edid cleanup - Fetch EDID from _DDC if available - Ring noop optimizations - MES logging fixes - 3DLUT fixes - DCN 4.x fixes - SMU 13.x fixes - Fixes for set_soft_freq_range() - ACPI fixes - SMU 14.x updates - PSR-SU fixes - fdinfo cleanup - DCN documentation updates amdkfd: - Misc code cleanups - Increase event FIFO size - Copy wave state fixes for SDMA radeon: - Fix possible overflow in packet3 check - Late init connector fix - Always set GEM function pointer Documentation: - Update drm-memory documentation -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZxua4QAKCRC93/aFa7yZ 2C/TAQC3PZqI36hkKOPwdcbFq2ydK1r3xiG7Q60K0PxpTnsqKQEAuF1MEuTXfamv mVqZfJuqF3wWXzoqM190qf3947f0eQk= =MSZa -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.13-2024-10-25: amdgpu: - SDMA queue reset support - SMU 13.0.6 updates - Add debugfs interface to help limit jpeg queue scheduling for testing - JPEG 4.0.3 updates - Initial runtime repartitioning support - GFX9 fixes - Misc code cleanups - Rework IP structures to better handle multiple instances of an IP - DML updates - DSC fixes - HDR fixes - Brightness control updates - Runtime pm cleanup - DMCUB fixes - DCN 3.5 updates - Struct drm_edid cleanup - Fetch EDID from _DDC if available - Ring noop optimizations - MES logging fixes - 3DLUT fixes - DCN 4.x fixes - SMU 13.x fixes - Fixes for set_soft_freq_range() - ACPI fixes - SMU 14.x updates - PSR-SU fixes - fdinfo cleanup - DCN documentation updates amdkfd: - Misc code cleanups - Increase event FIFO size - Copy wave state fixes for SDMA radeon: - Fix possible overflow in packet3 check - Late init connector fix - Always set GEM function pointer Documentation: - Update drm-memory documentation From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-10-29 18:25:24 +10:00
Yang Wang	7daa0f6b28	drm/amdgpu: optimize ACA log print - skip to print CE ACA log. - optimize ACA log print for MCA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:41:26 -04:00
Le Ma	ea9d8863da	drm/amdgpu: add generic func to check if ta fw is applicable Separated xgmi ta is required for specific APU, and driver needs parse the ta binary properly with aux xgmi ta packed. v2: make the check function more generic (Lijo) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:41:13 -04:00
Prike Liang	d5e3d8a2a6	drm/amdgpu: clean up the suspend_complete To check the status of S3 suspend completion, use the PM core pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore, clean up the AMDGPU driver's private flag suspend_complete. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:40:58 -04:00
Prike Liang	58a8c756fc	drm/amdgpu: correct the S3 abort check condition In the normal S3 entry, the TOS cycle counter is not reset during BIOS execution the _S3 method, so it doesn't determine whether the _S3 method is executed exactly. Howerver, the PM core performs the S3 suspend will set the PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend successfully. Therefore, drivers can check the pm_suspend_global_flags bit(1) to detect the S3 suspend abort event. Fixes: `6704dbf719` ("drm/amdgpu: update suspend status for aborting from deeper suspend") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:39:23 -04:00
Christian König	57e92d991e	drm/amdgpu: drop volatile from ring buffer Volatile only prevents the compiler from re-ordering reads and writes. Since we always only modify the ring buffer from one CPU thread and have an explicit barrier before signaling the HW this should have no effect at all and just prevents compiler optimisations. While at it drop the local variables as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:32:03 -04:00
Dan Carpenter	dac64cb3e0	drm/amdgpu: Fix amdgpu_ip_block_hw_fini() This NULL check is reversed so the function doesn't work. Fixes: `dad01f93f4` ("drm/amdgpu: validate hw_fini before function call") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/f4fc849e-4e76-4448-8657-caa4c69910b0@stanley.mountain Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:07:10 -04:00
Kent Russell	3c0be69bad	amdgpu: Don't print L2 status if there's nothing to print If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why some VM fault status prints in dmesg are all zeroes. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:51 -04:00
Jonathan Kim	e46738a58f	drm/amdkfd: sever xgmi io link if host driver has disable sharing Host drivers can create partial hives per guest by disabling xgmi sharing between certain peers in the main hive. Typically, these partial hives are fully connected per guest session. In the event that the host makes a mistake by adding a non-shared node to a guest session, have the KFD reflect sharing disabled by severing the IO link. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: James Yao <yiqing.yao@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:34 -04:00
Lang Yu	46186667f9	drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr Free sg table when dma_map_sgtable() failed to avoid memory leak. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:24 -04:00
Lijo Lazar	d37bc6a4ed	drm/amdgpu: Fix the logic for NPS request failure On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent devices on unload don't request it. Also, fix the mutex double lock issue in error condition, should have been mutex_unlock. Fixes: `ee52489d12` ("drm/amdgpu: Place NPS mode request on unload") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:04:59 -04:00
YiPeng Chai	3d0ffc6418	drm/amdgpu: Reduce redundant gpu resets on nbio v7.4 On nbio v7.4, ras controller interrupt and athub interrupt are generated after injecting UE to PCIE, but gpu reset only needs to be triggered once. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:04:34 -04:00
Frank Min	108bc59fe8	drm/amdgpu: fix random data corruption for sdma 7 There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `75400f8d6e`) Cc: stable@vger.kernel.org # 6.11.x	2024-10-22 18:11:43 -04:00
Mario Limonciello	bf58f03931	drm/amd: Guard against bad data for ATIF ACPI method If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c9b7c809b8`) Cc: stable@vger.kernel.org	2024-10-22 18:08:12 -04:00
Prike Liang	32e7ee293f	drm/amdgpu: Dereference the ATCS ACPI buffer Need to dereference the atcs acpi buffer after the method is executed, otherwise it will result in a memory leak. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Lijo Lazar	591aec150a	drm/amdgpu: Save VCN shared memory with init reset VCN shared memory is in framebuffer and there are some flags initialized during sw_init. Ideally, such programming should be during hw_init. Make sure the flags are saved during reset on initialization since that reset will affect frame buffer region. For clarity, separate it out to another function. Fixes: `1e4acf4d93` ("drm/amdgpu: Add reset on init handler for XGMI") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Hao Zhou <hao.zhou@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Sunil Khatri	971d8e1c3f	drm/amdgpu: clean unused functions of uvd/vcn/vce Some of the functions pointers of amdgpu_ip_funcs are not used and are left commented out. Hence this cleans those up which arent used. Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Victor Lu	8b22f04833	drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih Port this change to vega20_ih.c: commit `afbf7955ff` ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts") Original commit message: "Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW." Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:40 -04:00
Sunil Khatri	0016e87054	drm/amdgpu: Clean the functions pointer set as NULL We dont need to set the functions to NULL which arent needed as global structure members are by default set to zero or NULL for pointers. Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	8231e3af96	drm/amdgpu: clean the dummy soft_reset functions Remove the dummy soft_reset functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	f13c7da118	drm/amdgpu: clean the dummy wait_for_idle functions Remove the dummy wait_for_idle functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	aa980de3b5	drm/amdgpu: clean the dummy suspend functions Remove the dummy suspend functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	fbcd0ad5d1	drm/amdgpu: clean the dummy resume functions Remove the dummy resume functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	780002b654	drm/amdgpu: validate wait_for_idle before function call Before making a function call to wait_for_idle, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	502d76308d	drm/amdgpu: validate resume before function call Before making a function call to resume, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_resume where same checks and calls are repeated. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	e095026f00	drm/amdgpu: validate suspend before function call Before making a function call to suspend, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	dad01f93f4	drm/amdgpu: validate hw_fini before function call Before making a function call to hw_fini, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Srinivasan Shanmugam	9343b904e7	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2 This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_2_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. Also, this patch updates the `gfx_v9_0_sw_init` function to initialize the cleaner shader if the MEC firmware version is 88 or higher. It sets the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate values and attempts to initialize the cleaner shader. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This change ensures that the GPU memory is properly cleared between different processes, preventing data leakage and enhancing security. It also aligns with the serialization mechanism between KGD and KFD, ensuring that the GPU state is consistent across different workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Frank Min	c379dcf797	drm/amdgpu: fix typo for sdma6 constant fill packet Fix typo for sdma6 constant fill packet Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:38 -04:00
Frank Min	75400f8d6e	drm/amdgpu: fix random data corruption for sdma 7 There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:38 -04:00
Sunil Khatri	5ebdb6fd60	drm/amdgpu: clean the dummy sw_fini functions Remove the dummy sw_fini functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Lijo Lazar	785504dd7f	drm/amdgpu: Use SPX as default in partition config In certain cases - ex: when a reset is required on initialization - XCP manager won't have a valid partition mode. In such cases, use SPX as the default selected mode for which partition configuration details are populated. Fixes: `4ae86dc878` ("drm/amdgpu: Add sysfs nodes to get xcp details") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Hao Zhou <hao.zhou@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	278b8fbf06	drm/amdgpu: validate sw_fini before function call Before making a function call to sw_fini, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	7fd12379bd	drm/amdgpu: clean the dummy sw_init functions Remove the dummy sw_init functions for all IP blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	df6e463d8f	drm/amdgpu: validate sw_init before function call Before making a function call to sw_init, validate the function pointer like we do in late_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Xiaogang Chen	10112bf828	drm/amdkfd: Not restore userptr buffer if kfd process has been removed When kfd process has been terminated not restore userptr buffer after mmu notifier invalidates a range. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:12 -04:00
Lijo Lazar	8e3a3e847e	drm/amdgpu: Zero-initialize mqd backup memory Zero-initialize mqd backup memory, otherwise the check for 'already-backed-up' could go wrong. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:12 -04:00
Alex Deucher	32f0028969	Revert "drm/amdgpu/gfx9: put queue resets behind a debug option" This reverts commit `7c1a2d8aba`. Extended validation has completed successfully, so enable these features by default. Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>	2024-10-22 17:50:11 -04:00
Zhu Lingshan	9ee8ab245c	drm/amdgpu: init saw registers for mmhub v1.0 This commits init registers in the Stand Along Walker for mmhub v1.0, to support ISP use cases. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reported-and-tested-by: Du Bin <bin.du@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:49:38 -04:00
Alex Deucher	d2f57b6d89	drm/amdgpu/discovery: add ISP discovery entries for old APUs Raven1/2 and Picasso have ISP 2.0.0, however their ISP blocks are not in the IP discovery table yet. This commit fixes this issue by adding new ISP entries for Raven and Picasso in the IP discovery table. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:49:30 -04:00
Mario Limonciello	c9b7c809b8	drm/amd: Guard against bad data for ATIF ACPI method If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:22:44 -04:00
Thomas Zimmermann	1f828b4dd4	drm/client: Make client support optional Only build client code if DRM_CLIENT has been selected. Automatially do so if one of the default clients has been enabled. If client support has been disabled, the helpers for client-related events are empty and the regular client functions are not present. Amdgpu has an internal DRM client, so it has to select DRM_CLIENT by itself unconditionally. v3: - provide empty drm_client_debugfs_init() if DRM_CLIENT=n (kernel test robot) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-12-tzimmermann@suse.de	2024-10-18 09:23:03 +02:00
Thomas Zimmermann	4cf50bae05	drm/amdgpu: Suspend and resume internal clients with client helpers Replace calls to drm_fb_helper_set_suspend_unlocked() with calls to the client functions drm_client_dev_suspend() and drm_client_dev_resume(). Any registered in-kernel client will now receive suspend and resume events. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-9-tzimmermann@suse.de	2024-10-18 09:23:03 +02:00
Srinivasan Shanmugam	e7457532cb	drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data ctx_data, 1146 struct amdgpu_ring *out) 1147 { 1148 struct amdgpu_ring ring; 1149 struct amdgpu_mes_gang gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 / 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 / 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bfaf188360`)	2024-10-15 11:49:08 -04:00
Michael Chen	7760d7f93c	drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `144df260f3`) Cc: stable@vger.kernel.org # 6.11.x	2024-10-15 11:48:36 -04:00
Mohammed Anees	c0ec082f10	drm/amdgpu: prevent BO_HANDLES error from being overwritten Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: `fec5f8e8c6` ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `40f2cd9882`) Cc: stable@vger.kernel.org	2024-10-15 11:48:05 -04:00
Alex Deucher	d2c72d96df	drm/amdgpu: enable enforce_isolation sysfs node on VFs It should be enabled on both bare metal and VFs. Fixes: `e189be9b2e` ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit `dc8847b054`)	2024-10-15 11:47:41 -04:00
Dan Carpenter	9f7e94af35	drm/amdgpu: Fix off by one in current_memory_partition_show() The >= ARRAY_SIZE() should be > ARRAY_SIZE() to prevent an out of bounds read. Fixes: `012be6f22c` ("drm/amdgpu: Add sysfs interfaces for NPS mode") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:26:35 -04:00
Lijo Lazar	d25d26b8a8	drm/amdgpu: Wait for reset on init completion When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization work is done after a successful return of modprobe. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:22:26 -04:00
Srinivasan Shanmugam	bfaf188360	drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data ctx_data, 1146 struct amdgpu_ring *out) 1147 { 1148 struct amdgpu_ring ring; 1149 struct amdgpu_mes_gang gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 / 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 / 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:21:31 -04:00
Michael Chen	144df260f3	drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:21:08 -04:00
Lijo Lazar	f8588f051d	drm/amdgpu: Show current compute partition on VF Enable sysfs node for current compute partition mode on VFs also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:20:32 -04:00
Lijo Lazar	b3c6871692	drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFs Use the memory ranges published in discovery table to deduce NPS mode of GC v9.4.3 VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:19:46 -04:00
Mohammed Anees	40f2cd9882	drm/amdgpu: prevent BO_HANDLES error from being overwritten Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: `fec5f8e8c6` ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:18:49 -04:00
Alex Deucher	dc8847b054	drm/amdgpu: enable enforce_isolation sysfs node on VFs It should be enabled on both bare metal and VFs. Fixes: `e189be9b2e` ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-10-15 11:17:32 -04:00
Lijo Lazar	c29aeadf0b	drm/amdgpu: Add NPS switch support for GC 9.4.3 Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and GC v9.4.4 currently support this. NPS switch is only supported if an SOC supports multiple NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:25 -04:00
Srinivasan Shanmugam	d594ddc686	drm/amdgpu/gfx12: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v12_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:16 -04:00
Sunil Khatri	f83fc3abd5	drm/amdgpu: optimize fn gfx_v12_ring_insert_nop Optimize gfx_v12_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:09 -04:00
Sunil Khatri	950dcb0158	drm/amdgpu: optimize fn gfx_v11_ring_insert_nop Optimize gfx_v11_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:03 -04:00
Sunil Khatri	6aa902938b	drm/amdgpu: optimize fn gfx_v10_ring_insert_nop Optimize gfx_v10_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:57 -04:00
Sunil Khatri	1537638ae3	drm/amdgpu: optimize fn gfx_v9_ring_insert_nop Optimize gfx_v9_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:52 -04:00
Sunil Khatri	a23575bb3c	drm/amdgpu: optimize fn gfx_v9_4_3_ring_insert_nop Optimize gfx_v9_4_3_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:46 -04:00
Sunil Khatri	ea4e4754c9	drm/amdgpu: optimize insert_nop using multi dwords Optimize the ring_insert_nop fn for n dwords in one step rather then call to amdgpu_ring_write for each nop packet. This avoid function call for each nop packet and also wptr is updated once only. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:40 -04:00
Lijo Lazar	ed3dac4bf9	drm/amdgpu: Check gmc requirement for reset on init Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:32 -04:00
Lijo Lazar	ee52489d12	drm/amdgpu: Place NPS mode request on unload If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:20 -04:00
Thomas Zimmermann	ea1d2a38fb	drm/amdgpu: Use video aperture helpers DRM's aperture functions have long been implemented as helpers under drivers/video/ for use with fbdev. Avoid the DRM wrappers by calling the video functions directly. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Acked-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930130921.689876-2-tzimmermann@suse.de	2024-10-14 15:28:47 +02:00
Dave Airlie	fc4d262721	amd-drm-fixes-6.12-2024-10-08: amdgpu: - Fix invalid UBSAN warnings - Fix artifacts in MPO transitions - Hibernation fix amdkfd: - Fix an eviction fence leak radeon: - Add late register for connectors - Always set GEM function pointers -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZwVAzQAKCRC93/aFa7yZ 2JP2AQC/n4RMsATvyJ0iWNL7R9XGNLi6B6NryaZStd/iYh8RlgD9FUZ/S3svF8kQ lwRxw61x7+0vCVBOSCM/jyt270oYqwY= =pGmT -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.12-2024-10-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-08: amdgpu: - Fix invalid UBSAN warnings - Fix artifacts in MPO transitions - Hibernation fix amdkfd: - Fix an eviction fence leak radeon: - Add late register for connectors - Always set GEM function pointers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008142831.3739244-1-alexander.deucher@amd.com	2024-10-09 16:31:16 +10:00
Dave Airlie	54bc1d3255	Merge tag 'drm-misc-next-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: UAPI Changes: - panthor: Add realtime group priority and priority query. Cross-subsystem Changes: - Add Vivek Kasireddy as udmabuf maintainer. - Assorted udmabuf changes. - Device tree binding updates. - dmabuf documentation fixes. - Move drm_rect to drm core module from kms helper. Core Changes: - Update scheduler documentation and concurrency fixes. - drm/ci updates. - Add memory-agnostic fbdev client and client-agnostic setup helper. - Huge driver conversion for using the above. Driver Changes: - Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms, host1x. - Add panel quirks for AYA NEO panels. - Make module autoloading work for bridge/it6505 and mcde. - Add huge page support to v3d using a custom shmfs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com	2024-10-09 11:58:39 +10:00
Dave Airlie	7fefa1edc2	Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: - Add panthor/DEV_QUERY_TIMESTAMP_INFO query. Cross-subsystem Changes: - Updated dt bindings. - Add documentation explaining default errnos for fences. - Mark dma-buf heaps creation functions as __init. Core Changes: - Split DSC helpers from DP helpers. - Clang build fixes for drm/mm test. - Remove simple pipeline support for gem-vram, no longer any users left after converting bochs. - Add erno to drm_sched_start to distinguish between GPU and queue reset. - Add drm_framebuffer testcases. - Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n. - Use read_trylock instead of read_lock in dma_fence_begin_signalling to quiesce lockdep. Driver Changes: - Assorted small fixes and updates for tegra, host1x, imagination, nouveau, panfrost, panthor, panel/ili9341, mali, exynos, panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a, bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050, panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d. - Add bridge/TI TDP158. - Assorted documentation updates. - Convert bochs from simple drm to gem shmem, and check modes against available memory. - Many VC4 fixes, most related to scaling and YUV support. - Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS. - Rockchip 4k@60 support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com	2024-10-09 09:03:46 +10:00
Sunil Khatri	555cd714bd	drm/amdgpu: no need to log error in multi ring write No need to log error in multi ring write as its taken care during ring commit. This is inline with change done in amdgpu_ring_write. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:46:23 -04:00
Sunil Khatri	ccc0a18748	drm/amdgpu: move error log from ring write to commit Move the error message from ring write as an optimization to avoid printing that message on every write instead print once during commit if it exceeds write the allocated size i.e ring->count_dw. Also we do not want to log the error message in between a ring write and complete the write as its mostly not harmful as it will overwrite stale data only as GPU read from ring is faster than CPU write to ring. This reduces the size of amdgpu.ko module by around 600 Kb as write is very often used function and hence the print. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:46:15 -04:00
Andrew Kreimer	16445e408c	drm/amdgpu: fix typos Fix typos in comments: "wether -> whether". Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:48 -04:00
Tvrtko Ursulin	89cfa73b61	drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job While loop makes it sound like amdgpu_vmid_grab() potentially needs to be called multiple times to produce a fence, while in reality all code paths either return an error, assign a valid job->vmid or assign a vmid which will be valid once the returned fence signals. Therefore we can remove the loop to make it clear the call does not need to be repeated. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:43 -04:00
Tvrtko Ursulin	871f44b4ba	drm/amdgpu: Drop impossible condition from amdgpu_job_prepare_job Fence has been initialised to NULL so no need to test it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:39 -04:00
Tvrtko Ursulin	04bdba4654	drm/amdgpu: Use drm_print_memory_stats helper from fdinfo Convert fdinfo memory stats to use the common drm_print_memory_stats helper. This achieves alignment with the common keys as documented in drm-usage-stats.rst, adding specifically drm-total- key the driver was missing until now. Additionally I made the code stop skipping total size for objects which currently do not have a backing store, and I added resident, active and purgeable reporting. Legacy keys have been preserved, with the outlook of only potentially removing only the drm-memory- when the time gets right. The example output now looks like this: pos: 0 flags: 02100002 mnt_id: 24 ino: 1239 drm-driver: amdgpu drm-client-id: 4 drm-pdev: 0000:04:00.0 pasid: 32771 drm-total-cpu: 0 drm-shared-cpu: 0 drm-active-cpu: 0 drm-resident-cpu: 0 drm-purgeable-cpu: 0 drm-total-gtt: 2392 KiB drm-shared-gtt: 0 drm-active-gtt: 0 drm-resident-gtt: 2392 KiB drm-purgeable-gtt: 0 drm-total-vram: 44564 KiB drm-shared-vram: 31952 KiB drm-active-vram: 0 drm-resident-vram: 44564 KiB drm-purgeable-vram: 0 drm-memory-vram: 44564 KiB drm-memory-gtt: 2392 KiB drm-memory-cpu: 0 KiB amd-memory-visible-vram: 44564 KiB amd-evicted-vram: 0 KiB amd-evicted-visible-vram: 0 KiB amd-requested-vram: 44564 KiB amd-requested-visible-vram: 11952 KiB amd-requested-gtt: 2392 KiB drm-engine-compute: 46464671 ns v2: * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Rob Clark <robdclark@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:25 -04:00
Tvrtko Ursulin	fc282e9e86	drm/amdgpu: Drop unused fence argument from amdgpu_vmid_grab_used Fence argument is unused so lets drop it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:17 -04:00
Lang Yu	d7d7b947a4	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: `9a1c1339ab` ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5fa4362894`) Cc: stable@vger.kernel.org	2024-10-07 14:53:23 -04:00
Lijo Lazar	012be6f22c	drm/amdgpu: Add sysfs interfaces for NPS mode Add a sysfs interface to see available NPS modes to switch to - cat /sys/bus/pci/devices/../available_memory_paritition Make the current_memory_partition sysfs node read/write for requesting a new NPS mode. The request is only cached and at a later point a driver unload/reload is required to switch to the new NPS mode. Ex: echo NPS1 > /sys/bus/pci/devices/../current_memory_paritition echo NPS4 > /sys/bus/pci/devices/../current_memory_paritition The above interfaces will be available only if the SOC supports more than one NPS mode. Also modify the current memory partition sysfs logic to be more generic. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Lijo Lazar	bbc160084e	drm/amdgpu: Add gmc interface to request NPS mode Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Srinivasan Shanmugam	b1cf3ddcc3	drm/amdgpu/gfx10: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v10_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Rajneesh Bhardwaj	212cc24119	drm/amdgpu: Add PSP interface for NPS switch Implement PSP ring command interface for memory partitioning on the fly on the supported asics. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:00 -04:00
Srinivasan Shanmugam	fbca196953	drm/amdgpu/gfx11: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v11_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:34 -04:00
Srinivasan Shanmugam	e7cee54595	drm/amdgpu/gfx12: Implement cleaner shader support for GFX12 hardware This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v12_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v12_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:28 -04:00
Colin Ian King	1845752b2f	drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization" There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:10 -04:00
Srinivasan Shanmugam	8fc279e5e3	drm/amdgpu/gfx11: Implement cleaner shader support for GFX11 hardware The patch modifies the gfx_v11_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v11_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v11_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v11_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:56 -04:00
Sunil Khatri	7e6487ab21	drm/amdgpu: change the comment from handle to ip_block htmldoc generation depend upon the input arguments etc to generate the document. After update of handle to ip_block then update needs in comments too to fix the warnings. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410021904.YyGjlpk9-lkp@intel.com Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:35 -04:00
Srinivasan Shanmugam	2d5f74a867	drm/amdgpu/gfx10: Implement cleaner shader support for GFX10 hardware The patch modifies the gfx_v10_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v10_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v10_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v10_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:29 -04:00
Lang Yu	5fa4362894	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: `9a1c1339ab` ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:52 -04:00
Sunil Khatri	692d2cd180	drm/amdgpu: update the handle ptr in hw_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_fini. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_fini on suspend and some followed clean up. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:25 -04:00
Sunil Khatri	58608034ed	drm/amdgpu: update the handle ptr in hw_init Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_init. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_init on resume. v2: squash in isp fix Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:25 -04:00
Sunil Khatri	7feb4f3ad8	drm/amdgpu: update the handle ptr in resume Update the *handle to amdgpu_ip_block ptr for all functions pointers of resume. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:50 -04:00
Sunil Khatri	982d7f9bfe	drm/amdgpu: update the handle ptr in suspend Update the *handle to amdgpu_ip_block ptr for all functions pointers of suspend. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:45 -04:00
Sunil Khatri	82ae6619a4	drm/amdgpu: update the handle ptr in wait_for_idle Update the *handle to amdgpu_ip_block ptr for all functions pointers of wait_for_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:36 -04:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Sunil Khatri	e15ec812b5	drm/amdgpu: update the handle ptr in post_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of post_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:45:51 -04:00
Sunil Khatri	0ef2a1e7af	drm/amdgpu: update the handle ptr in soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:45:44 -04:00
Srinivasan Shanmugam	e47cb9d253	drm/amdgpu/gfx9: Add Cleaner Shader Deinitialization in gfx_v9_0 Module This commit addresses an omission in the previous patch related to the cleaner shader support for GFX9 hardware. Specifically, it adds the necessary deinitialization code for the cleaner shader in the gfx_v9_0_sw_fini function. The added line amdgpu_gfx_cleaner_shader_sw_fini(adev); ensures that any allocated resources for the cleaner shader are freed correctly, avoiding potential memory leaks and ensuring that the GPU state is clean for the next initialization sequence. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `c2e70d307f` ("drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware") Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:44:47 -04:00
Sunil Khatri	9d5ee7ce88	drm/amdgpu: update the handle ptr in pre_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of pre_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:44:41 -04:00
Lijo Lazar	f0b919960d	drm/amdgpu: Fix logic to determine TOS reload Avoid comparing TOS version on APUs. On APUs driver doesn't take care of TOS load. Fixes: `0ff3822613` ("drm/amdgpu: Add interface for TOS reload cases") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <Rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:50 -04:00
Sunil Khatri	6a9456e0e3	drm/amdgpu: update the handle ptr in check_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of check_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:45 -04:00
Sunil Khatri	94b2e07ad4	drm/amdgpu: update the handle ptr in prepare_suspend Update the *handle to amdgpu_ip_block ptr for all functions pointers of prepare_suspend. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:38 -04:00
Sunil Khatri	47d827f9c7	drm/amdgpu: update the handle ptr in late_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of late_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:33 -04:00
Sunil Khatri	904c402e97	drm/amdgpu: remove the dummy fn acp_early_init acp_early_init is a dummy function and is not being used and hence removed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:42:53 -04:00
Mario Limonciello	b472b8d829	drm/amd: Taint the kernel when enabling overdrive Some distributions have been patching amdgpu to enable overdrive by default which may compromise stability. Furthermore when bug reports are brought upstream it's not obvious that the system has been tampered with. When overdrive is enabled taint the kernel and leave a critical message in the logs for users so that it's obvious in a bug report it's been tampered with. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:41:21 -04:00
Srinivasan Shanmugam	a443852f85	drm/amdkfd: Fix kdoc entry for 'get_wave_count()' function parameters Update kdoc entries to reflect the function's parameters. The descriptor for the 'queue_cnt' parameter has been added, and the incorrect mentions of 'wave_cnt' and 'vmid', which are not parameters but local variables, have been removed. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Function parameter or struct member 'queue_cnt' not described in 'get_wave_count' drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'wave_cnt' description in 'get_wave_count' drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'vmid' description in 'get_wave_count' Cc: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:56 -04:00
Sunil Khatri	90410d3996	drm/amdgpu: update the handle ptr in early_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of early_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:49 -04:00
Sunil Khatri	36aa9ab9c0	drm/amdgpu: update the handle ptr in sw_fini update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:43 -04:00
Sunil Khatri	d5347e8d27	drm/amdgpu: update the handle ptr in sw_init update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:37 -04:00
Sunil Khatri	3138ab2c5b	drm/amdgpu: update the handle ptr in late_init Update the ptr handle to amdgpu_ip_block ptr in all the functions of late_init function ptr. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:31 -04:00
Sunil Khatri	146b085ead	drm/amdgpu: update the handle ptr in early_init update the handle ptr to amdgpu_ip_block ptr for all functions pointers on early_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:22 -04:00
Asad Kamal	1007264254	drm/amdgpu: Add supported partition mode node Add sysfs node to show supported partition modes across all NPS modes Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:15 -04:00
Lijo Lazar	fcd91a95df	drm/amdgpu: Add option to refresh NPS data In certain use cases, NPS data needs to be refreshed again from discovery table. Add API parameter to refresh NPS data from discovery table. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:07 -04:00
Jiadong Zhu	5682cd86d6	drm/amdgpu/sdma5.2: implement ring reset callback for sdma5.2 Implement sdma queue reset callback via MMIO. v2: enter/exit safemode for mmio queue reset. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:39:57 -04:00
YuanShang	1fd7c37e3f	drm/amdgpu: Flush tlb by VM_INVALIDATION packet in sdma_v5_2 In order for SDMA not to be switched between VM_INVALIDATION request and ack, use an single VM_INVALIDATION packet in function sdma_v5_2_ring_emit_vm_flush. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-By: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:39:38 -04:00
Jiadong Zhu	64acf8f69e	drm/amdgpu/sdma5.2: split out per instance resume function Extract the resume sequence from sdma_v5_2_gfx_resume for starting/restarting an individual instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:38:44 -04:00
Jiadong Zhu	5fbba6bb98	drm/amdgpu/sdma5: implement ring reset callback for sdma5 Implement sdma queue reset callback via MMIO. v2: enter/exit safemode when sdma queue reset. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:38:38 -04:00
Sunil Khatri	d60e78bdef	drm/amdgpu: update the handle ptr in print_ip_state Update the ptr handle to amdgpu_ip_block ptr in all the functions affected. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:36:17 -04:00
Lijo Lazar	4ae86dc878	drm/amdgpu: Add sysfs nodes to get xcp details Add partition config nodes in sysfs to get resource instance details for a particular partition mode. A resource could be anything like an xcc, vcn decoder, system dma units etc. Details of various resource instances are available under /sys/bus/pci/devices/.../compute_partition_config/ Select a partition configuration: /sys/bus/pci/devices/.../compute_partition_config/xcp_config Number of instances of a resource: /sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_inst Total partitions sharing the resource: /sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_shared v2: Update node name as per spec Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:56 -04:00
Sunil Khatri	fa73462dc0	drm/amdgpu: update the handle ptr in dump_ip_state Update the ptr handle to amdgpu_ip_block ptr in all the functions. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:51 -04:00
Jiadong Zhu	94daae9744	drm/amdgpu/sdma5: split out per instance resume function Extract the resume sequence from sdma_v5_0_gfx_resume for starting/restarting an individual instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:43 -04:00
Linus Torvalds	994aeacbb3	drm fixes for 6.12-rc1 i915: - Fix BMG support to UHBR13.5 - Two PSR fixes - Fix colorimetry detection for DP xe - Fix macro for checking minimum GuC version - Fix CCS offset calculation for some BMG SKUs - Fix locking on memory usage reporting via fdinfo and BO destroy - Fix GPU page fault handler on a closed VM - Fix overflow in oa batch buffer amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix amdgpu: - CU occupancy logic fix - SDMA queue fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmb3QQoACgkQDHTzWXnE hr7+yQ//dBbBOdxwWcQA6N4p5KOQdhMecU1LaGkv0nbV6ft4mxhHq2XSczr7DMJM C56cgTplw4Lajfo0Q/gwgoLcgK4EzRc3kReb8gC9diyZ+zgolZbS5uSnJsS3IsST xriP2sb+lsPH6UAqtUZABMA6aOE7VYmJnRZlo0tOulRRYSnX++gTPQIi2PprVzBh jFlFmLABCqvZ5md0ux8NITzRpE2sODuawKTpdXoTMTVsrXF+YBtRaJD170eC4mj1 3JDmsY90TpvHWri4BHQ98VqJBpzLiIU8COQHaZab2cfV+yH+KfUo2puH1RS4swW5 gbrOAbK/OXzoX+6aT1rYuDihcrX5+88MZovhRW7Ik0dEm5Ysl8PRlRkftuMa2mg9 tUJjjUfmGDf9eiYCUt7/BuDguN2lc+r/TOM4F+2kmB0dxDkYn3u1W95DxseoiLHt Sq/M2sWm9p/TjDC9XW+vy9dfuoucEyQfdiPqKP27BheckCGF1SskLFW+oZCq3iF9 0RJsvpwQBSxsLR0/oJok9cxmSAhpZoUiV0zKuqCcP+OTIFI4urKujom/XrJIjayU fg0vaXzPd9crzSZX1rqF8/UDx8uV4uf4IHD9MNrCYIXpiVJHWzx0afU1AE5576F5 sT335W/nG6BHsrV/PIRR62v3QU0yLkjQv6VbWqJwMZumuQ2x/iI= =r1M/ -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Regular fixes for the week to end the merge window, i915 and xe have a few each, amdgpu makes up most of it with a bunch of SR-IOV related fixes amongst others. i915: - Fix BMG support to UHBR13.5 - Two PSR fixes - Fix colorimetry detection for DP xe: - Fix macro for checking minimum GuC version - Fix CCS offset calculation for some BMG SKUs - Fix locking on memory usage reporting via fdinfo and BO destroy - Fix GPU page fault handler on a closed VM - Fix overflow in oa batch buffer amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix - CU occupancy logic fix - SDMA queue fix" * tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits) drm/amd/pm: update workload mask after the setting drm/amdgpu: bump driver version for cleared VRAM drm/amdgpu: fix vbios fetching for SR-IOV drm/amdgpu: fix PTE copy corruption for sdma 7 drm/amdkfd: Add SDMA queue quantum support for GFX12 drm/amdgpu/vcn: enable AV1 on both instances drm/amdkfd: Fix CU occupancy for GFX 9.4.3 drm/amdkfd: Update logic for CU occupancy calculations drm/amdgpu: skip coredump after job timeout in SRIOV drm/amdgpu: sync to KFD fences before clearing PTEs drm/amdgpu/mes12: set enable_level_process_quantum_check drm/i915/dp: Fix colorimetry detection drm/amdgpu/mes12: reduce timeout drm/amdgpu/mes11: reduce timeout drm/amdgpu: use GEM references instead of TTMs v2 drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT` drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification' drm/amdgpu: update golden regs for gfx12 drm/amdgpu: clean up vbios fetching code drm/amd/display: handle nulled pipe context in DCE110's set_drr() ...	2024-09-28 08:47:46 -07:00
Lijo Lazar	c75c5285e5	drm/amdgpu: Add PSP reload case to reset-on-init A reset on initialization will be needed if a new PSP TOS needs to be loaded than the one currently active on the system. This is possible only on SOCs which support a full device reset which results in unload of active PSP TOS. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:07:16 -04:00
Lijo Lazar	0ff3822613	drm/amdgpu: Add interface for TOS reload cases Add interface to check if a different TOS needs to be loaded than the one which is which is already active on the SOC. Presently the interface is restricted to specific variants of PSPv13.0. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:07:07 -04:00
Lijo Lazar	c4f00312c1	drm/amdgpu: Support reset-on-init on select SOCs Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:58 -04:00
Lijo Lazar	2accf9d683	drm/amdgpu: Drop delayed reset work handler Drop delayed reset work handler as it is no longer used. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:53 -04:00
Lijo Lazar	631af731ee	drm/amdgpu: Refactor XGMI reset on init handling Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices as well. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <feifxu@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:46 -04:00
Lijo Lazar	b17f87329d	drm/amdgpu: Add helper to initialize badpage info Add a separate function to read badpage data during initialization. Reading bad pages will need hardware access and cannot be done during reset. Hence in cases where device needs a full reset during init itself, attempting to read will cause a deadlock. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:38 -04:00
Dr. David Alan Gilbert	0ee2399116	drm/amdgpu: Remove unused amdgpu_i2c functions amdgpu_i2c_add and amdgpu_i2c_init were added in 2015's commit `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") but never used. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:36 -04:00
Dr. David Alan Gilbert	9d7a8bdb90	drm/amdgpu: Remove unused amdgpu_gfx_bit_to_me_queue amdgpu_gfx_bit_to_me_queue has been unused since it was added in commit `7470bfcf20` ("drm/amdgpu: add helper function for gfx queue/bitmap transition") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:33 -04:00
Dr. David Alan Gilbert	1e10c12263	drm/amdgpu: Remove unused amdgpu_gmc_vram_cpu_pa amdgpu_gmc_vram_cpu_pa has been unused since commit `087451f372` ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:29 -04:00
Dr. David Alan Gilbert	6e261ecbb2	drm/amdgpu: Remove unused amdgpu_atpx functions amdgpu_atpx_dgpu_req_power_for_displays has been unused since commit `bdb1ccb080` ("drm/amdgpu: remove ATPX_DGPU_REQ_POWER_FOR_DISPLAYS check when hotplug-in") amdgpu_atpx_get_dhandle has been unused since commit `f9b7f3703f` ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:24 -04:00
Dr. David Alan Gilbert	632aac6299	drm/amdgpu: Remove unused amdgpu_device_ip_is_idle amdgpu_device_ip_is_idle is unused. It was renamed from 'amdgpu_is_idle' which was originally added in commit `5dbbb60ba6` ("drm/amdgpu: add IP helpers for wait_for_idle and is_idle") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:19 -04:00
Lijo Lazar	1e4acf4d93	drm/amdgpu: Add reset on init handler for XGMI In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <feifxu@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:19 -04:00
Lijo Lazar	f501057aff	drm/amdgpu: Add callback get xcp resource info Add a callback interface to get the resource information of a partition mode. Presently the information has number of resources and number of entities sharing the resource. Add the implementation for aquavanjaram SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	1bc0b33915	drm/amd: Add helper to get partition config modes Add helper to get supported/available partition config modes Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
WangYuli	307b4ab7ba	drm/amdgpu: Fix typo "acccess" and improve the comment style here There are some spelling mistakes of 'acccess' in comments which should be instead of 'access'. And the comment style should be like this: /* * Text * Text */ Suggested-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/all/f75fbe30-528e-404f-97e4-854d27d7a401@amd.com/ Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/all/0c768bf6-bc19-43de-a30b-ff5e3ddfd0b3@suse.de/ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Alex Deucher	b1281b6d55	drm/amdgpu/gfx9: Explicitly halt CP before init Need to make sure it's halted as we don't know what state the GPU may have been left in previously. Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Alex Deucher	993fcc40ae	drm/amdgpu/gfx9: set additional bits on CP halt Need to set the pipe reset and cache invalidation bits on halt otherwise we can get stale state if the CP firmware changes (e.g., on module unload and reload). Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Sunil Khatri	37b993225d	drm/amdgpu: add amdgpu_device reference in ip block To handle amdgpu_device reference for different GPUs we add it's reference in each ip block which can be used to differentiate between difference gpu devices. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	6e37ae8b08	drm/amdgpu: Separate reinitialization after reset Move the reinitialization part after a reset to another function. No functional changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Tim Huang	381ec8161d	drm/amdgpu: check return for setting engine dram timings This resolves the unchecded return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	5839d27d5b	drm/amdgpu: Use init level for pending_reset flag Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
YiPeng Chai	9e0feb7946	amd/amdgpu: Reduce unnecessary repetitive GPU resets In multiple GPUs case, after a GPU has started resetting all GPUs on hive, other GPUs do not need to trigger GPU reset again. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	14f2fe34f5	drm/amdgpu: Add init levels Add init levels to define the level to which device needs to be initialized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Jane Jian	8a84d2a472	drm/amdgpu: Remove unneeded write in JPEG v4.0.3 HDP_DEBUG1(offset = 0x3fbc) is no longer functional, remove the redundant write. Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Lijo Lazar	8c50bf9beb	drm/amdgpu: Fix JPEG v4.0.3 register write EXTERNAL_REG_INTERNAL_OFFSET/EXTERNAL_REG_WRITE_ADDR should be used in pairs. If an external register shouldn't be written, both packets shouldn't be sent. Fixes: `a78b481469` ("drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Feifei Xu	3eebfd5e9c	drm/amdkfd:Add kfd function to config sq perfmon Expose the interface for kfd to config sq perfmon. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Sathishkumar S	f0b19b84d3	drm/amdgpu: add amdgpu_jpeg_sched_mask debugfs JPEG_4_0_3 has up to 32 jpeg cores and a single mjpeg video decode will use all available cores on the hardware. This debugfs entry helps to disable or enable job submission to a cluster of cores or one specific core in the ip for debugging. The entry is populated only if there is at least two or more cores in the jpeg ip. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Feifei Xu	400a7591d9	drm/amdgpu: Add psp command CONFIG_SQ_PERFMON Add support for enable/disable perfmon profiling. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Prike Liang	6704dbf719	drm/amdgpu: update suspend status for aborting from deeper suspend There're some other suspend abort cases which can call the noirq suspend except for executing _S3 method. In those cases need to process as incomplete suspendsion. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Asad Kamal	dc443aa4ab	drm/amd/amdgpu: Add helper to get ip block valid Add helper function to check if ip block is enabled Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	92c9b3e8e4	drm/amdgpu/sdma6: implement ring reset callback for sdma6 Implement sdma queue reset callback using mes_reset_queue_mmio. v2: check instance id before reset queue. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	df190e6753	drm/amdgpu/sdma6: split out per instance resume function Extract the resume sequence for individual sdma instance from sdma_v6_0_gfx_resume. The function could be used for start/restart scenario on a certain instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	ced65debf4	drm/amdgpu/mes11: update mes_reset_queue function to support sdma queue Reset sdma queue through mmio based on me_id and queue_id. v2: simplify callflows and register calculation. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Alex Deucher	34ad56a467	drm/amdgpu: bump driver version for cleared VRAM Driver now clears VRAM on allocation. Bump the driver version so mesa knows when it will get cleared vram by default. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-26 17:04:47 -04:00
Alex Deucher	a8387ddc0d	drm/amdgpu: fix vbios fetching for SR-IOV SR-IOV fetches the vbios from VRAM in some cases. Re-enable the VRAM path for dGPUs and rename the function to make it clear that it is not IGP specific. Fixes: `042658d17a` ("drm/amdgpu: clean up vbios fetching code") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Tested-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:04:10 -04:00
Frank Min	3cb576bc6d	drm/amdgpu: fix PTE copy corruption for sdma 7 Without setting dcc bit, there is ramdon PTE copy corruption on sdma 7. so add this bit and update the packet format accordingly. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-26 17:03:39 -04:00
Thomas Zimmermann	32acc286b2	drm/amdgpu: Run DRM default client setup Call drm_client_setup() to run the kernel's default client setup for DRM. Set fbdev_probe in struct drm_driver, so that the client setup can start the common fbdev client. The amdgpu driver specifies a preferred color mode depending on the available video memory, with a default of 32. Adapt this for the new client interface. v5: - select DRM_CLIENT_SELECTION v2: - style changes Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Tested-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924071734.98201-66-tzimmermann@suse.de	2024-09-26 09:31:28 +02:00
Saleemkhan Jamadar	8048e5ade8	drm/amdgpu/vcn: enable AV1 on both instances v1 - remove cs parse code (Christian) On VCN v4_0_6 AV1 is supported on both the instances. Remove cs IB parse code since explict handling of AV1 schedule is not required. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-09-25 12:56:19 -04:00
Mukul Joshi	e45b011d2c	drm/amdkfd: Fix CU occupancy for GFX 9.4.3 Make CU occupancy calculations work on GFX 9.4.3 by updating the logic to handle multiple XCCs correctly. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:56:07 -04:00
Mukul Joshi	6ae9e1aba9	drm/amdkfd: Update logic for CU occupancy calculations Currently, the code uses the IH_VMID_X_LUT register to map a queue's vmid to the corresponding PASID. This logic is racy since CP can update the VMID-PASID mapping anytime especially when there are more processes than number of vmids. Update the logic to calculate CU occupancy by matching doorbell offset of the queue with valid wave counts against the process's queues. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:56:00 -04:00
ZhenGuo Yin	e1d27f7a9c	drm/amdgpu: skip coredump after job timeout in SRIOV VF FLR will be triggered by host driver before job timeout, hence the error status of GPU get cleared. Performing a coredump here is unnecessary. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:55:52 -04:00
Christian König	126be9b2be	drm/amdgpu: sync to KFD fences before clearing PTEs This patch tries to solve the basic problem we also need to sync to the KFD fences of the BO because otherwise it can be that we clear PTEs while the KFD queues are still running. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:55:44 -04:00
Jack Xiao	4771d2ecb7	drm/amdgpu/mes12: set enable_level_process_quantum_check enable_level_process_quantum_check is requried to enable process quantum based scheduling. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-25 12:55:14 -04:00
Linus Torvalds	f8ffbc365f	struct fd layout change (and conversion to accessor helpers) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ 63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d 592+iDgLsema/H/0/CqfqlaNtDNY8Q0= =HUl5 -----END PGP SIGNATURE----- Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.	2024-09-23 09:35:36 -07:00
Linus Torvalds	de848da12f	drm next for 6.12-rc1 string: - add mem_is_zero() core: - support more device numbers - use XArray for minor ids - add backlight constants - Split dma fence array creation into alloc and arm fbdev: - remove usage of old fbdev hooks kms: - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support dma-buf: - docs cleanup buddy: - Add start address support for trim function printk: - pass description to kmsg_dump scheduler; - Remove full_recover from drm_sched_start ttm: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory panic: - add display QR code (in rust) displayport: - mst: GUID improvements bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR - anx7625: simplify OF array handling - dw-hdmi: simplify clock handling - lontium-lt8912b: fix mode validation - nwl-dsi: fix mode vsync/hsync polarity xe: - Enable LunarLake and Battlemage support - Introducing Xe2 ccs modifiers for integrated and discrete graphics - rename xe perf to xe observation - use wb caching on DGFX for system memory - add fence timeouts - Lunar Lake graphics/media/display workarounds - Battlemage workarounds - Battlemage GSC support - GSC and HuC fw updates for LL/BM - use dma_fence_chain_free - refactor hw engine lookup and mmio access - enable priority mem read for Xe2 - Add first GuC BMG fw - fix dma-resv lock - Fix DGFX display suspend/resume - Use xe_managed for kernel BOs - Use reserved copy engine for user binds on faulting devices - Allow mixing dma-fence jobs and long-running faulting jobs - fix media TLB invalidation - fix rpm in TTM swapout path - track resources and VF state by PF i915: - Type-C programming fix for MTL+ - FBC cleanup - Calc vblank delay more accurately - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates - Fix DP LTTPR detection - limit relocations to INT_MAX - fix long hangs in buddy allocator on DG2/A380 amdgpu: - Per-queue reset support - SDMA devcoredump support - DCN 4.0.1 updates - GFX12/VCN4/JPEG4 updates - Convert vbios embedded EDID to drm_edid - GFX9.3/9.4 devcoredump support - process isolation framework for GFX 9.4.3/4 - take IOMMU mappings into account for P2P DMA amdkfd: - CRIU fixes - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Allow users to target recommended SDMA engines - KFD support for targetting queues on recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking msm: - DPU: - implement DP/PHY mapping on SC8180X - Enable writeback on SM8150, SC8180X, SM6125, SM6350 - DP: - Enable widebus on all relevant chipsets - MSM8998 HDMI support - GPU: - A642L speedbin support - A615/A306/A621 support - A7xx devcoredump support ast: - astdp: Support AST2600 with VGA - Clean up HPD - Fix timeout loop for DP link training - reorganize output code by type (VGA, DP, etc) - convert to struct drm_edid - fix BMC handling for all outputs exynos: - drop stale MAINTAINERS pattern - constify struct loongson: - use GEM refcount over TTM mgag200: - Improve BMC handling - Support VBLANK intterupts - transparently support BMC outputs nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's gm12u320: - convert to struct drm_edid gma500: - update i2c terms lcdif: - pixel clock fix host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling - use common helper for_each_endpoint_of_node() panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: add support for BOE NE140WUM-N6G; revert support for SDC ATNA45AF01 - visionox-vtdr6130: improve error handling; use devm_regulator_bulk_get_const() - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing - panel-edp: fix name for HKC MB116AN01 - jd9365da: fix "exit sleep" commands - jdi-fhd-r63452: simplify error handling with DSI multi-style helpers - mantix-mlaf057we51: simplify error handling with DSI multi-style helpers - simple: support Innolux G070ACE-LH3 plus DT bindings support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings - st7701: decouple DSI and DRM code add SPI support support Anbernic RG28XX plus DT bindings mediatek: - support alpha blending - remove cl in struct cmdq_pkt - ovl adaptor fix - add power domain binding for mediatek DPI controller renesas: - rz-du: add support for RZ/G2UL plus DT bindings rockchip: - Improve DP sink-capability reporting - dw_hdmi: Support 4k@60Hz - vop: Support RGB display on Rockchip RK3066; Support 4096px width sti: - convert to struct drm_edid stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - gr3d: improve PM domain handling - convert to struct drm_edid - Call drm_atomic_helper_shutdown() vc4: - fix PM during detect - replace DRM_ERROR() with drm_error() - v3d: simplify clock retrieval v3d: - Clean up perfmon virtio: - add DRM capset -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmbq43gACgkQDHTzWXnE hr4+lg/+O/r41E7ioitcM0DWeWem0dTlvQr41pJ8jujHvw+bXNdg0BMGWtsTyTLA eOft2AwofsFjg+O7l8IFXOT37mQLdIdfjb3+w5brI198InL3OWC3QV8ZSwY9VGET n8crO9jFoxNmHZnFniBZbtI6egTyl6H+2ey3E0MTnKiPUKZQvsK/4+x532yVLPob UUOze5wcjyGZc7LJEIZPohPVneCb9ki7sabDQqh4cxIQ0Eg+nqPpWjYM4XVd+lTS 8QmssbR49LrJ7z9m90qVE+8TjYUCn+ChDPMs61KZAAnc8k++nK41btjGZ23mDKPb YEguahCYthWJ4U8K18iXBPnLPxZv5+harQ8OIWAUYqdIOWSXHozvuJ2Z84eHV13a 9mQ5vIymXang8G1nEXwX/vml9uhVhBCeWu3qfdse2jfaTWYUb1YzhqUoFvqI0R0K 8wT03MyNdx965CSqAhpH5Jd559ueZmpd+jsHOfhAS+1gxfD6NgoPXv7lpnMUmGWX SnaeC9RLD4cgy7j2Swo7TEqQHrvK5XhZSwX94kU6RPmFE5RRKqWgFVQmwuikDMId UpNqDnPT5NL2UX4TNG4V4coyTXvKgVcSB9TA7j8NSLfwdGHhiz73pkYosaZXKyxe u6qKMwMONfZiT20nhD7RhH0AFnnKosAcO14dhn0TKFZPY6Ce9O8= =7jR+ -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "This adds a couple of patches outside the drm core, all should be acked appropriately, the string and pstore ones are the main ones that come to mind. Otherwise it's the usual drivers, xe is getting enabled by default on some new hardware, we've changed the device number handling to allow more devices, and we added some optional rust code to create QR codes in the panic handler, an idea first suggested I think 10 years ago :-) string: - add mem_is_zero() core: - support more device numbers - use XArray for minor ids - add backlight constants - Split dma fence array creation into alloc and arm fbdev: - remove usage of old fbdev hooks kms: - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support dma-buf: - docs cleanup buddy: - Add start address support for trim function printk: - pass description to kmsg_dump scheduler: - Remove full_recover from drm_sched_start ttm: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory panic: - add display QR code (in rust) displayport: - mst: GUID improvements bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR - anx7625: simplify OF array handling - dw-hdmi: simplify clock handling - lontium-lt8912b: fix mode validation - nwl-dsi: fix mode vsync/hsync polarity xe: - Enable LunarLake and Battlemage support - Introducing Xe2 ccs modifiers for integrated and discrete graphics - rename xe perf to xe observation - use wb caching on DGFX for system memory - add fence timeouts - Lunar Lake graphics/media/display workarounds - Battlemage workarounds - Battlemage GSC support - GSC and HuC fw updates for LL/BM - use dma_fence_chain_free - refactor hw engine lookup and mmio access - enable priority mem read for Xe2 - Add first GuC BMG fw - fix dma-resv lock - Fix DGFX display suspend/resume - Use xe_managed for kernel BOs - Use reserved copy engine for user binds on faulting devices - Allow mixing dma-fence jobs and long-running faulting jobs - fix media TLB invalidation - fix rpm in TTM swapout path - track resources and VF state by PF i915: - Type-C programming fix for MTL+ - FBC cleanup - Calc vblank delay more accurately - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates - Fix DP LTTPR detection - limit relocations to INT_MAX - fix long hangs in buddy allocator on DG2/A380 amdgpu: - Per-queue reset support - SDMA devcoredump support - DCN 4.0.1 updates - GFX12/VCN4/JPEG4 updates - Convert vbios embedded EDID to drm_edid - GFX9.3/9.4 devcoredump support - process isolation framework for GFX 9.4.3/4 - take IOMMU mappings into account for P2P DMA amdkfd: - CRIU fixes - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Allow users to target recommended SDMA engines - KFD support for targetting queues on recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking msm: - DPU: - implement DP/PHY mapping on SC8180X - Enable writeback on SM8150, SC8180X, SM6125, SM6350 - DP: - Enable widebus on all relevant chipsets - MSM8998 HDMI support - GPU: - A642L speedbin support - A615/A306/A621 support - A7xx devcoredump support ast: - astdp: Support AST2600 with VGA - Clean up HPD - Fix timeout loop for DP link training - reorganize output code by type (VGA, DP, etc) - convert to struct drm_edid - fix BMC handling for all outputs exynos: - drop stale MAINTAINERS pattern - constify struct loongson: - use GEM refcount over TTM mgag200: - Improve BMC handling - Support VBLANK intterupts - transparently support BMC outputs nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's gm12u320: - convert to struct drm_edid gma500: - update i2c terms lcdif: - pixel clock fix host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling - use common helper for_each_endpoint_of_node() panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: - add support for BOE NE140WUM-N6G - revert support for SDC ATNA45AF01 - visionox-vtdr6130: - improve error handling - use devm_regulator_bulk_get_const() - boe-th101mb31ig002: - Support for starry-er88577 MIPI-DSI panel plus DT - Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: - Support Melfas lmfbx101117480 MIPI-DSI panel plus DT - Refactor for code sharing - panel-edp: fix name for HKC MB116AN01 - jd9365da: fix "exit sleep" commands - jdi-fhd-r63452: simplify error handling with DSI multi-style helpers - mantix-mlaf057we51: simplify error handling with DSI multi-style helpers - simple: - support Innolux G070ACE-LH3 plus DT bindings - support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings - st7701: - decouple DSI and DRM code - add SPI support - support Anbernic RG28XX plus DT bindings mediatek: - support alpha blending - remove cl in struct cmdq_pkt - ovl adaptor fix - add power domain binding for mediatek DPI controller renesas: - rz-du: add support for RZ/G2UL plus DT bindings rockchip: - Improve DP sink-capability reporting - dw_hdmi: Support 4k@60Hz - vop: - Support RGB display on Rockchip RK3066 - Support 4096px width sti: - convert to struct drm_edid stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: - Fix transparency after disabling plane - Remove unused interrupt tegra: - gr3d: improve PM domain handling - convert to struct drm_edid - Call drm_atomic_helper_shutdown() vc4: - fix PM during detect - replace DRM_ERROR() with drm_error() - v3d: simplify clock retrieval v3d: - Clean up perfmon virtio: - add DRM capset" * tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel: (1326 commits) drm/xe: Fix missing conversion to xe_display_pm_runtime_resume drm/xe/xe2hpg: Add Wa_15016589081 drm/xe: Don't keep stale pointer to bo->ggtt_node drm/xe: fix missing 'xe_vm_put' drm/xe: fix build warning with CONFIG_PM=n drm/xe: Suppress missing outer rpm protection warning drm/xe: prevent potential UAF in pf_provision_vf_ggtt() drm/amd/display: Add all planes on CRTC to state for overlay cursor drm/i915/bios: fix printk format width drm/i915/display: Fix BMG CCS modifiers drm/amdgpu: get rid of bogus includes of fdtable.h drm/amdkfd: CRIU fixes drm/amdgpu: fix a race in kfd_mem_export_dmabuf() drm: new helper: drm_gem_prime_handle_to_dmabuf() drm/amdgpu/atomfirmware: Silence UBSAN warning drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare' drm/amd/amdgpu: apply command submission parser for JPEG v1 drm/amd/amdgpu: apply command submission parser for JPEG v2+ drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3 drm/amd/pm: update the features set on smu v14.0.2/3 ...	2024-09-19 10:18:15 +02:00
Alex Deucher	84f76408ab	drm/amdgpu/mes12: reduce timeout The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Fixes: `94b51a3d01` ("drm/amdgpu/mes12: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:15:13 -04:00
Alex Deucher	856265caa9	drm/amdgpu/mes11: reduce timeout The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3627 Fixes: `f7c161a4c2` ("drm/amdgpu: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-09-18 16:15:13 -04:00
Christian König	6dcba0975d	drm/amdgpu: use GEM references instead of TTMs v2 Instead of a TTM reference grab a GEM reference whenever necessary. v2: fix typo in amdgpu_bo_unref pointed out by Vitaly, initialize the GEM funcs for kernel allocations as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:13 -04:00
Frank Min	7b6df1d732	drm/amdgpu: update golden regs for gfx12 update golden regs for gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:15:09 -04:00
Alex Deucher	042658d17a	drm/amdgpu: clean up vbios fetching code After splitting the logic between APU and dGPU, clean up some of the APU and dGPU specific logic that no longer applied. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Alex Deucher	375b035f68	drm/amdgpu/bios: split vbios fetching between APU and dGPU We need some different logic for dGPUs and the APU path can be simplified because there are some methods which are never used on APUs. This also fixes a regression on some older APUs causing the driver to fetch the unpatched ROM image rather than the patched image. Fixes: `9c081c11c6` ("drm/amdgpu: Reorder to read EFI exported ROM first") Reviewed-by: George Zhang <George.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Christian König	f2be7b39e4	drm/amdgpu: remove amdgpu_pin_restricted() We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Christian König	54b86443fd	drm/amdgpu: explicitely set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag Instead of having that in the amdgpu_bo_pin() function applied for all pinned BOs. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Lijo Lazar	42ac749d5b	drm/amdgpu: Fix XCP instance mask calculation Fix instance mask calculation for VCN IP. There are cases where VCN instance could be shared across partitions. Fix here so that other blocks don't need to check for any shared instances based on partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Asad Kamal	ef126c06a9	drm/amdgpu: Fix get each xcp macro Fix get each xcp macro to loop over each partition correctly Fixes: `4bdca20579` ("drm/amdgpu: Add utility functions for xcp") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:08 -04:00
Le Ma	2778701b16	drm/amdgpu: load sos binary properly on the basis of pmfw version To be compatible with legacy IFWI, driver needs to carry legacy tOS and query pmfw version to load them accordingly. Add psp_firmware_header_v2_1 to handle the combined sos binary. Double the sos count limit for the case of aux sos fw packed. v2: pass the correct fw_bin_desc to parse_sos_bin_descriptor Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Le Ma	2ae6cd583c	drm/amdgpu: add psp funcs callback to check if aux fw is needed Query pmfw version to determine if aux sos fw needs to be loaded in psp v13.0. v2: refine callback to check if aux_fw loading is needed instead of getting pmfw version barely v3: return the comparison directly Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Christian König	7181faaa47	drm/amdgpu: nuke the VM PD/PT shadow handling This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only proplematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Alex Deucher	c1de938fb7	drm/amdgpu/gfx9.4.3: Explicitly halt MEC before init Need to make sure it's halted as we don't know what state the GPU may have been left in previously. Tested-by: Amber Lin <Amber.Lin@amd.com> Acked-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Alex Deucher	797fb15333	drm/amdgpu/gfx9.4.3: set additional bits on MEC halt Need to set the pipe reset and cache invalidation bits on halt otherwise we can get stale state if the CP firmware changes (e.g., on module unload and reload). Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
David Belanger	03b5038c0a	drm/amdgpu: Fix selfring initialization sequence on soc24 Move enable_doorbell_selfring_aperture from common_hw_init to common_late_init in soc24, otherwise selfring aperture is initialized with an incorrect doorbell aperture base. Port changes from this commit from soc21 to soc24: commit `1c312e816c` ("drm/amdgpu: Enable doorbell selfring after resize FB BAR") Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:14:47 -04:00
Jack Xiao	3c75518cf2	drm/amdgpu/mes12: switch SET_SHADER_DEBUGGER pkt to mes schq pipe The SET_SHADER_DEBUGGER packet must work with the added hardware queue, switch the packet submitting to mes schq pipe. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:14:27 -04:00
Andrew Kreimer	c400ec6990	drm/amdgpu: Fix a typo Fix a typo in comments. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Yan Zhen	0110ac1195	drm/amdgpu: fix typo in the comment Correctly spelled comments make it easier for the reader to understand the code. Replace 'udpate' with 'update' in the comment & replace 'recieved' with 'received' in the comment & replace 'dsiable' with 'disable' in the comment & replace 'Initiailize' with 'Initialize' in the comment & replace 'disble' with 'disable' in the comment & replace 'Disbale' with 'Disable' in the comment & replace 'enogh' with 'enough' in the comment & replace 'availabe' with 'available' in the comment. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Alex Deucher	bfc00a7754	drm/amdgpu/gfx9.4.3: drop extra wrapper Drop wrapper used in one place. gfx_v9_4_3_xcc_cp_enable() is used in one place. gfx_v9_4_3_xcc_cp_compute_enable() is used everywhere else. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Bob Zhou	28b0ef9227	drm/amdgpu: Fix missing check pcie_p2p module param The module param pcie_p2p should be checked for kfd p2p feature, so add it. Fixes: `75f0efbc4b` ("drm/amdgpu: Take IOMMU remapping into account for p2p checks") Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Tao Zhou	c389a0604c	drm/amdgpu: disable GPU RAS bad page feature for specific ASIC The feature is not applicable to specific app platform. v2: update the disablement condition and commit description v3: move the setting to amdgpu_ras_check_supported Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Tim Huang	0da531c82a	drm/amdgpu: ensure the connector is not null before using it This resolves the dereference null return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Linus Torvalds	8f72c31f45	vfs-6.12.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ= =9zGL -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual pile of misc updates: Features: - Add F_CREATED_QUERY fcntl() that allows userspace to query whether a file was actually created. Often userspace wants to know whether an O_CREATE request did actually create a file without using O_EXCL. The current logic is that to first attempts to open the file without O_CREAT \| O_EXCL and if ENOENT is returned userspace tries again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink the first openat() without O_CREAT \| O_EXCL will return ENOENT but the second openat() with O_CREAT \| O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT \| O_EXCL follows the symlink while O_CREAT \| O_EXCL doesn't for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW which seems really ugly. All available workarounds are really nasty (fanotify, bpf lsm etc) so add a simple fcntl(). - Try an opportunistic lookup for O_CREAT. Today, when opening a file we'll typically do a fast lookup, but if O_CREAT is set, the kernel always takes the exclusive inode lock. This was likely done with the expectation that O_CREAT means that we always expect to do the create, but that's often not the case. Many programs set O_CREAT even in scenarios where the file already exists (see related F_CREATED_QUERY patch motivation above). The series contained in the pr rearranges the pathwalk-for-open code to also attempt a fast_lookup in certain O_CREAT cases. If a positive dentry is found, the inode_lock can be avoided altogether and it can stay in rcuwalk mode for the last step_into. - Expose the 64 bit mount id via name_to_handle_at() Now that we provide a unique 64-bit mount ID interface in statx(2), we can now provide a race-free way for name_to_handle_at(2) to provide a file handle and corresponding mount without needing to worry about racing with /proc/mountinfo parsing or having to open a file just to do statx(2). While this is not necessary if you are using AT_EMPTY_PATH and don't care about an extra statx(2) call, users that pass full paths into name_to_handle_at(2) need to know which mount the file handle comes from (to make sure they don't try to open_by_handle_at a file handle from a different filesystem) and switching to AT_EMPTY_PATH would require allocating a file for every name_to_handle_at(2) call - Add a per dentry expire timeout to autofs There are two fairly well known automounter map formats, the autofs format and the amd format (more or less System V and Berkley). Some time ago Linux autofs added an amd map format parser that implemented a fair amount of the amd functionality. This was done within the autofs infrastructure and some functionality wasn't implemented because it either didn't make sense or required extra kernel changes. The idea was to restrict changes to be within the existing autofs functionality as much as possible and leave changes with a wider scope to be considered later. One of these changes is implementing the amd options: 1) "unmount", expire this mount according to a timeout (same as the current autofs default). 2) "nounmount", don't expire this mount (same as setting the autofs timeout to 0 except only for this specific mount) . 3) "utimeout=<seconds>", expire this mount using the specified timeout (again same as setting the autofs timeout but only for this mount) To implement these options per-dentry expire timeouts need to be implemented for autofs indirect mounts. This is because all map keys (mounts) for autofs indirect mounts use an expire timeout stored in the autofs mount super block info. structure and all indirect mounts use the same expire timeout. Fixes: - Fix missing fput for FSCONFIG_SET_FD in autofs - Use param->file for FSCONFIG_SET_FD in coda - Delete the 'fs/netfs' proc subtreee when netfs module exits - Make sure that struct uid_gid_map fits into a single cacheline - Don't flush in-flight wb switches for superblocks without cgroup writeback - Correcting the idmapping mount example in the idmapping documentation - Fix a race between evice_inodes() and find_inode() and iput() - Refine the show_inode_state() macro definition in writeback code - Prevent dump_mapping() from accessing invalid dentry.d_name.name - Show actual source for debugfs in /proc/mounts - Annotate data-race of busy_poll_usecs in eventpoll - Don't WARN for racy path_noexec check in exec code - Handle OOM on mnt_warn_timestamp_expiry() - Fix some spelling in the iomap design documentation - Fix typo in procfs comment - Fix typo in fs/namespace.c comment Cleanups: - Add the VFS git tree to the MAINTAINERS file - Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode bit in struct file bringing us to 5 free f_mode bits - Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify the wait mechanism - Remove the unused path_put_init() helper - Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi specific - Replace the unsigned long i_state member with a u32 i_state member in struct inode freeing up 4 bytes in struct inode. Instead of using the bit based wait apis we're now using the var event apis and using the individual bytes of the i_state member to wait on state changes - Explain how per-syscall AT_* flags should be allocated - Use in_group_or_capable() helper to simplify the posix acl mode update code - Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code - Removed comment about d_rcu_to_refcount() as that function doesn't exist anymore - Add kernel documentation for lookup_fast() - Don't re-zero evenpoll fields - Remove outdated comment after close_fd() - Fix imprecise wording in comment about the pipe filesystem - Drop GFP_NOFAIL mode from alloc_page_buffers - Missing blank line warnings and struct declaration improved in file_table - Annotate struct poll_list with __counted_by() - Remove the unused read parameter in percpu-rwsem - Remove linux/prefetch.h include from direct-io code - Use kmemdup_array instead of kmemdup for multiple allocation in mnt_idmapping code - Remove unused mnt_cursor_del() declaration Performance tweaks: - Dodge smp_mb in break_lease and break_deleg in the common case - Only read fops once in fops_{get,put}() - Use RCU in ilookup() - Elide smp_mb in iversion handling in the common case - Drop one lock trip in evict()" * tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits) uidgid: make sure we fit into one cacheline proc: Fix typo in the comment fs/pipe: Correct imprecise wording in comment fhandle: expose u64 mount id to name_to_handle_at(2) uapi: explain how per-syscall AT_* flags should be allocated fs: drop GFP_NOFAIL mode from alloc_page_buffers writeback: Refine the show_inode_state() macro definition fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation netfs: Delete subtree of 'fs/netfs' when netfs module exits fs: use LIST_HEAD() to simplify code inode: make i_state a u32 inode: port __I_LRU_ISOLATING to var event vfs: fix race between evice_inodes() and find_inode()&iput() inode: port __I_NEW to var event inode: port __I_SYNC to var event fs: reorder i_state bits fs: add i_state helpers MAINTAINERS: add the VFS git tree fs: s/__u32/u32/ for s_fsnotify_mask ...	2024-09-16 08:35:09 +02:00
Thomas Zimmermann	61b86391fb	Merge drm/drm-next into drm-misc-next Backmerging to get fixes from v6.12-rc7. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-09-11 09:48:49 +02:00
David (Ming Qiang) Wu	8409fb50ce	drm/amd/amdgpu: apply command submission parser for JPEG v1 Similar to jpeg_v2_dec_ring_parse_cs() but it has different register ranges and a few other registers access. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3d5adbdf1d`) Cc: stable@vger.kernel.org	2024-09-10 17:26:55 -04:00
David (Ming Qiang) Wu	3a23aa0b9c	drm/amd/amdgpu: apply command submission parser for JPEG v2+ This patch extends the same cs parser from JPEG v4.0.3 to other JPEG versions (v2 and above). Rename to more common name as jpeg_v2_dec_ring_parse_cs() from jpeg_v4_0_3_dec_ring_parse_cs(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `88dcad2d07`) Cc: stable@vger.kernel.org	2024-09-10 17:26:49 -04:00
Al Viro	4c3140fea6	drm/amdgpu: get rid of bogus includes of fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:30 -04:00
Al Viro	6c6ca71bc1	drm/amdgpu: fix a race in kfd_mem_export_dmabuf() Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance. Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone... Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:30 -04:00
Srinivasan Shanmugam	b8faa981a7	drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare' This commit updates described non-existent parameters 'resv' and 'sync_mode', and failed to describe the existing 'sync' parameter. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Function parameter or struct member 'sync' not described in 'amdgpu_vm_cpu_prepare' drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Excess function parameter 'resv' description in 'amdgpu_vm_cpu_prepare' drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Excess function parameter 'sync_mode' description in 'amdgpu_vm_cpu_prepare' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
David (Ming Qiang) Wu	3d5adbdf1d	drm/amd/amdgpu: apply command submission parser for JPEG v1 Similar to jpeg_v2_dec_ring_parse_cs() but it has different register ranges and a few other registers access. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
David (Ming Qiang) Wu	88dcad2d07	drm/amd/amdgpu: apply command submission parser for JPEG v2+ This patch extends the same cs parser from JPEG v4.0.3 to other JPEG versions (v2 and above). Rename to more common name as jpeg_v2_dec_ring_parse_cs() from jpeg_v4_0_3_dec_ring_parse_cs(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
Jani Nikula	0df8ef6e1b	drm/amdgpu: drop redundant W=1 warnings from Makefile Since commit `a61ddb4393` ("drm: enable (most) W=1 warnings by default across the subsystem"), most of the extra warnings in the driver Makefile are redundant. Remove them. Note that -Wmissing-declarations and -Wmissing-prototypes are always enabled by default in scripts/Makefile.extrawarn. Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:17 -04:00
Christian König	7ccde2e6c0	drm/amdgpu: revert "use CPU for page table update if SDMA is unavailable" That is clearly not something we should do upstream. The SDMA is mandatory for the driver to work correctly. We could do this for emulation and bringup, but in those cases the engineer should probably enabled CPU based updates manually. This reverts commit `62eefd10ac`. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:06 -04:00
Dan Carpenter	27f9dcb9cc	drm/amdgpu/mes11: Indent an if statment Indent the "break" statement one more tab. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Ramesh Errabolu	01be2b62c0	drm/amdgpu: Surface svm_default_granularity, a RW module parameter Enables users to update SVM's default granularity, used in buffer migration and handling of recoverable page faults. Param value is set in terms of log(numPages(buffer)), e.g. 9 for a 2 MIB buffer Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Jesse Zhang	e8397d327e	drm/amdgpu: fix queue reset issue by mmio Initialize the queue type before resetting the queue using mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:54:54 -04:00
Srinivasan Shanmugam	559a285816	drm/amdgpu: Replace 'amdgpu_job_submit_direct' with 'drm_sched_entity' in cleaner shader This commit replaces the use of amdgpu_job_submit_direct which submits the job to the ring directly, with drm_sched_entity in the cleaner shader job submission process. The change allows the GPU scheduler to manage the cleaner shader job. - The job is then submitted to the GPU using the drm_sched_entity_push_job function, which allows the GPU scheduler to manage the job. This change improves the reliability of the cleaner shader job submission process by leveraging the capabilities of the GPU scheduler. Fixes: `d361ad5d2f` ("drm/amdgpu: Add sysfs interface for running cleaner shader") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:33 -04:00
Srinivasan Shanmugam	2578487ebe	drm/amdgpu/: Add missing kdoc entry in amdgpu_vm_handle_fault function This commit adds a description for the 'ts' parameter in the amdgpu_vm_handle_fault function's comment block. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2781: warning: Function parameter or struct member 'ts' not described in 'amdgpu_vm_handle_fault' Cc: Xiaogang.Chen <Xiaogang.Chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408251419.vgZHg3GV-lkp@intel.com/ Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:05 -04:00
Lijo Lazar	4481df364d	drm/amdgpu: Normalize reg offsets on JPEG v4.0.3 On VFs and SOCs with GC 9.4.4, VCN RRMT is disabled. Only local register offsets should be used on JPEG v4.0.3 as they cannot handle remote access to other AIDs. Since only local offsets are used, the special write to MCM_ADDR register is no longer needed. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:40:47 -04:00
Li Zetao	760e3c8b32	drm/amdgpu: use clamp() in amdgpu_vm_adjust_size() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:57 -04:00
Li Zetao	6fbbb660b1	drm/amd: use clamp() in amdgpu_pll_get_fb_ref_div() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:53 -04:00
Peng Liu	2c7795e245	drm/amdgpu: enable gfxoff quirk on HP 705G4 Enabling gfxoff quirk results in perfectly usable graphical user interface on HP 705G4 DM with R5 2400G. Without the quirk, X server is completely unusable as every few seconds there is gpu reset due to ring gfx timeout. Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:45 -04:00
Peng Liu	0126c0ae11	drm/amdgpu: add raven1 gfxoff quirk Fix screen corruption with openkylin. Link: https://bbs.openkylin.top/t/topic/171497 Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:33 -04:00
Lang Yu	4453808d9e	drm/amdgpu: fix invalid fence handling in amdgpu_vm_tlb_flush CPU based update doesn't produce a fence, handle such cases properly. Fixes: `d8a3f0a034` ("drm/amdgpu: implement TLB flush fence") Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:50 -04:00
Christian König	4da5a95bf1	drm/amdgpu: re-work VM syncing Rework how VM operations synchronize to submissions. Provide an amdgpu_sync container to the backends instead of an reservation object and fill in the amdgpu_sync object in the higher layers of the code. No intended functional change, just prepares for upcomming changes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:24 -04:00
Christian König	b2ef808786	drm/sched: add optional errno to drm_sched_start() The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com	2024-09-06 18:05:52 +02:00
Dmitry Baryshkov	ca097d4d94	drm/display: split DSC helpers from DP helpers Currently the DRM DSC functions are selected by the DRM_DISPLAY_DP_HELPER Kconfig symbol. This is not optimal, since the DSI code (both panel and host drivers) end up selecting the seemingly irrelevant DP helpers. Split the DSC code to be guarded by the separate DRM_DISPLAY_DSC_HELPER Kconfig symbol. Reviewed-by: Jessica Zhang <quic_jesszhan@quicinc.com> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #i915 Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240704-panel-sw43408-fix-v6-1-3ea1c94bbb9b@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2024-09-03 00:13:44 +03:00
Alex Deucher	4de34b0478	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> (cherry picked from commit `6c0a7c3c69`) Cc: stable@vger.kernel.org # 6.10.x	2024-09-02 13:08:51 -04:00
Jack Xiao	34c36a77f4	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `52491d97aa`)	2024-09-02 13:05:39 -04:00
Alex Deucher	ead60e9c4e	drm/amdgpu/gfx10: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:40 -04:00
Alex Deucher	3f2d35c325	drm/amdgpu/gfx11: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:38 -04:00
Alex Deucher	21818f39be	drm/amdgpu/gfx12: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:34 -04:00
Alex Deucher	f8eee864ba	drm/amdgpu/gfx12: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:31 -04:00
Alex Deucher	01d05521f7	drm/amdgpu/gfx11: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:28 -04:00
Alex Deucher	bcee4c3f89	drm/amdgpu/gfx10: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:24 -04:00
Alex Deucher	1a1995b1dc	drm/amdgpu/gfx12: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:22 -04:00
Alex Deucher	01163079e1	drm/amdgpu/gfx11: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:20 -04:00
Alex Deucher	4d5ddfa4b1	drm/amdgpu/gfx10: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:16 -04:00
Jiadong Zhu	178ad0e280	drm/amdgpu/mes11: implement mmio queue reset for gfx11 Implement queue reset for graphic and compute queue. v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode. v3: use gfx_v11_0_request_gfx_index_mutex() v4: fix mutex handling Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:13 -04:00
Jiadong Zhu	01b4ae38e5	drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmio The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:07 -04:00
Jiadong Zhu	8b2429a13f	drm/amdgpu/mes: modify mes api for mmio queue reset Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:03 -04:00
Alex Deucher	8fe4fde381	drm/amdgpu/gfx12: fallback to driver reset compute queue directly Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:01 -04:00
Alex Deucher	2480599890	drm/amdgpu/gfx12: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:58 -04:00
Alex Deucher	d1f2144321	drm/amdgpu/gfx10: rework reset sequence To match other GFX IPs. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:55 -04:00
Jiadong Zhu	097af47d3c	drm/amdgpu/gfx10: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder (Jessie) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:47 -04:00
Jiadong Zhu	2f3806f781	drm/amdgpu/gfx10: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. v2: fix up error handling (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:44 -04:00
Alex Deucher	1741281a15	drm/amdgpu/gfx10: add ring reset callbacks Add ring reset callbacks for gfx and compute. v2: fix gfx handling v3: wait for KIQ to complete Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:38 -04:00
Jiadong Zhu	a10c93931b	drm/amdgpu/gfx11: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:34 -04:00
Alex Deucher	7d8e9e65f2	drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue() Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:31 -04:00
Prike Liang	072b441478	drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2) Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:21 -04:00
Alex Deucher	b3e9bfd866	drm/amdgpu/gfx11: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:12 -04:00
Prike Liang	ad17b124c3	drm/amdgpu/gfx9.4.3: Implement compute pipe reset Implement the compute pipe reset, and the driver will fallback to pipe reset when queue reset fails. The pipe reset only deactivates the queue which is scheduled in the pipe, and meanwhile the MEC pipe will be reset to the firmware _start pointer. So, it seems pipe reset will cost more cycles than the queue reset; therefore, the driver tries to recover by doing queue reset first. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:32:32 -04:00
Christian Brauner	641bb4394f	fs: move FMODE_UNSIGNED_OFFSET to fop_flags This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-30 08:22:36 +02:00
Alex Deucher	6c0a7c3c69	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com>	2024-08-29 14:56:37 -04:00
Jack Xiao	52491d97aa	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:41:02 -04:00
Alex Deucher	1125f95cd2	drm/amdgpu/gfx12: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:57 -04:00
Alex Deucher	1e487c9173	drm/amdgpu/gfx11: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:48 -04:00
Sunil Khatri	30e8f4c2bd	drm/amdgpu: Move the dumping log out of for loop log message "Dumping IP State Completed" needs to be logged only once when state dumping is complete. Hence moving it out of the for loop. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Trigger Huang <Trigger.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:19 -04:00
Victor Zhao	af76ca8e18	drm/amd/amdgpu: move drain_workqueue before shutdown is set [background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:07 -04:00
Trigger Huang	c67db6a6a6	drm/amdgpu: Do core dump immediately when job tmo Do the coredump immediately after a job timeout to get a closer representation of GPU's error status. V2: This will skip printing vram_lost as the GPU reset is not happened yet (Alex) V3: Unconditionally call the core dump as we care about all the reset functions(soft-recovery and queue reset and full adapter reset, Alex) V4: Do the dump after adev->job_hang = true (Sunil) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:00 -04:00
Trigger Huang	6122f5c72e	drm/amdgpu: skip printing vram_lost if needed The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:53 -04:00
Alex Deucher	7c1a2d8aba	drm/amdgpu/gfx9: put queue resets behind a debug option Pending extended validation. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:48 -04:00
Alex Deucher	a9b67c036c	drm/amdgpu: add experimental resets debug flag Add this flag to enable experimental resets for testing before they are fully validated. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:36 -04:00
Likun Gao	6d5064c379	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `875ff9a7ee`)	2024-08-28 10:05:54 -04:00
Alex Deucher	959fc102ff	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `40318a2406`)	2024-08-28 10:04:53 -04:00
Daniel Vetter	e55ef65510	amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: `2f588a2406` `eb30a5bbc7` drm/buddy: - Add start address support for trim function -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZszhcQAKCRC93/aFa7yZ 2M4ZAQD+xgIJkQ9HISQeqER5GblnfrorARd32yP/BH0c+JbGUAD9H/BIB41teZ80 vw2WTx+4TyB39awgvtpDH8iEQdkcSAE= =717w -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targetting queues on recommended SDMA engines Proposed userspace: `2f588a2406` `eb30a5bbc7` drm/buddy: - Add start address support for trim function From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com	2024-08-27 14:33:12 +02:00
Daniel Vetter	4461e9e5c3	Linux 6.11-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbK2B8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFwkH/10QpUgzIfbFKbF+ 5hwcvaqS5myxWwJ4PjN0eR1qGE6RzVO0Tb24+TVql+7pxu+iWm1kYgC3+/T5xJsP ECAszdmPWSco1xaHrh2y3PyCJjaBiqFbIxdjPp7odjDpG9qarbcty8YpWs44u/gd RDXzHUuScEShBhEt0ZhvE1pIDL8jJ8JL3yqOMZ+XaDxtJbjaHw4GHp8efxlBWc8N jZKIVJi22q5NWG5T0tGtPWwzCm0ewA/JNMTEvE9leoSoAgO85NZ0ivxMC76q/tbj BrYk5KnzfhJs4b/n/KtIwWaLTgLyXKGqHMaMq8sbXtp410aUdgnRJO2cl3fI+1vc vxQfAfk= =RemI -----END PGP SIGNATURE----- Merge v6.11-rc5 into drm-next amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-08-27 14:09:45 +02:00
Xiaogang Chen	6ef29715ac	drm/amdkfd: Change kfd/svm page fault drain handling When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and not handle any incoming pages fault of this process until a deferred work item got executed by default system wq. The time period of "not handle page fault" can be long and is unpredicable. That is advese to kfd performance on page faults recovery. This patch uses time stamp of incoming page fault to decide to drop or recover page fault. When app unmap vm ranges kfd records each gpu device's ih ring current time stamp. These time stamps are used at kfd page fault recovery routine. Any page fault happened on unmapped ranges after unmap events is application bug that accesses vm range after unmap. It is not driver work to cover that. By using time stamp of page fault do not need drain page faults at deferred work. So, the time period that kfd does not handle page faults is reduced and can be controlled. Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:55:13 -04:00
Likun Gao	875ff9a7ee	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:54:57 -04:00
Yang Wang	4416377ae1	drm/amdgpu: add list empty check to avoid null pointer issue Add list empty check to avoid null pointer issues in some corner cases. - list_for_each_entry_safe() Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:45 -04:00
Alex Deucher	40318a2406	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:25 -04:00
Hawking Zhang	b05d6476ae	drm/amdgpu: Retire query_utcl2_poison_status callback Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:16 -04:00
Rahul Jain	75f0efbc4b	drm/amdgpu: Take IOMMU remapping into account for p2p checks when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <Rahul.Jain@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:05 -04:00
Alex Deucher	9cead81eff	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com> (cherry picked from commit `c69b07f7bb`) Cc: stable@vger.kernel.org # 6.10.x	2024-08-20 23:07:11 -04:00
Candice Li	c99769bcea	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c0a04e3570`) Cc: stable@vger.kernel.org	2024-08-20 23:04:17 -04:00
Alex Deucher	e3e4bf58ba	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2dc3851ef7`) Cc: stable@vger.kernel.org	2024-08-20 22:51:37 -04:00
Yang Wang	0b43312902	drm/amdgpu: fixing rlc firmware loading failure issue Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit `849e133c97` ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: `3af2c80ae2` ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `89ec85d16e`)	2024-08-20 22:51:31 -04:00
Alex Deucher	88c511dea1	drm/amd/gfx11: move the gfx mutex into the caller Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: `76acba7b7f` ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Victor Zhao	bf2bc61638	drm/amd/amdgpu: allow use kiq to do hdp flush under sriov when use cpu to do page table update under sriov runtime, since mmio access is blocked, kiq has to be used to flush hdp. change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Alex Deucher	c69b07f7bb	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com>	2024-08-20 22:14:14 -04:00
Candice Li	c0a04e3570	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Mukul Joshi	ccf8ef6b75	drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11 Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:04 -04:00
Srinivasan Shanmugam	f846250b8a	drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. The commit also includes a check for the type of the ring. If the type of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the `enforce_isolation` structure in the `gfx` structure of the `amdgpu_device` is set to the `xcp_id` of the ring. This ensures that the correct `xcp_id` is used when enforcing isolation on compute rings. The `xcp_id` is an identifier for an XCP partition, and different rings can be associated with different XCP partitions. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:08:02 -04:00
Srinivasan Shanmugam	b710dbe55d	drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:07:58 -04:00
Srinivasan Shanmugam	afefd6f245	drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:35 -04:00
Amber Lin	234eebe161	drm/amdkfd: APIs to stop/start KFD scheduling Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called. v2: fix build (Alex) Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:28 -04:00
Srinivasan Shanmugam	b1f49ff9cb	drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware This commit extends the cleaner shader feature to support GFX9.4.4 hardware. The cleaner shader feature is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This operation needs to be performed in isolation, while no other tasks should be running on the GPU at the same time. Previously, the cleaner shader feature was implemented for GFX9.4.3 hardware. This commit adds support for GFX9.4.4 hardware by allowing the cleaner shader to be used with this hardware version. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:23 -04:00
Srinivasan Shanmugam	335288315a	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps improve GPU performance and resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. v2: fix copyright date (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:16 -04:00
Srinivasan Shanmugam	d4c3815495	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware The patch modifies the gfx_v9_4_3_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:11 -04:00
Srinivasan Shanmugam	c2e70d307f	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware The patch modifies the gfx_v9_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:07 -04:00
Srinivasan Shanmugam	22ff907d4f	drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet is a command packet used to instruct the GPU to execute the cleaner shader. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution of the cleaner shader on the GPU. The packet consists of a header followed by a RESERVED field, which is programmed to zero. When the GPU receives this packet, it fetches and executes the cleaner shader instructions from the location specified in the packet. The cleaner shader feature helps to enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:02 -04:00
Srinivasan Shanmugam	d361ad5d2f	drm/amdgpu: Add sysfs interface for running cleaner shader This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_cleaner_shader`. Write the number of the partition to this file to trigger the cleaner shader on that partition. There is only one partition on GPUs which do not support partitioning. Changes made in this patch: - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the `run_cleaner_shader` sysfs file. - Added `run_cleaner_shader` to the list of device attributes in `amdgpu_device_attrs`. - Updated `default_attr_update` to handle `run_cleaner_shader`. - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device attributes. v2: fix error handling (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:06:56 -04:00
Srinivasan Shanmugam	e189be9b2e	drm/amdgpu: Add enforce_isolation sysfs attribute This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:06:52 -04:00
Srinivasan Shanmugam	dba1a6cfc3	drm/amdgpu: Enforce isolation as part of the job This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing the VM. The enforce_isolation field of the amdgpu_job structure is set directly after the job is allocated This change allows more fine-grained control over shader isolation, making it possible to enforce isolation on a per-job basis rather than globally. This can be useful in scenarios where only certain jobs require isolation. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:06:43 -04:00
Victor Skvortsov	19cff16559	drm/amdgpu: abort KIQ waits when there is a pending reset Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <Christian.Koenig@amd.com> Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com> Tested-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:50 -04:00
Srinivasan Shanmugam	9659520419	drm/amdgpu: Make enforce_isolation setting per GPU This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:45 -04:00
Alex Deucher	ee7a846ea2	drm/amdgpu: Emit cleaner shader at end of IB submission This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during the VM flush process. The cleaner shader is only emitted if the `enable_cleaner_shader` flag is set in the `amdgpu_device` structure. This allows the cleaner shader emission to be controlled on a per-device basis. By emitting a cleaner shader at the end of the IB submission, we can ensure that the VM state is properly cleaned up after each submission. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:42 -04:00
Srinivasan Shanmugam	aec773a1fb	drm/amdgpu: Add infrastructure for Cleaner Shader feature The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:34 -04:00
Alex Deucher	f49280ffd2	drm/amdgpu: handle enforce isolation on non-0 gfxhub Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:28 -04:00
Alex Deucher	2dc3851ef7	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:04 -04:00
Sunil Khatri	1a2103d685	drm/amdgpu: add vcn ip dump support for vcn_v2_6 Add support for logging the registers in devcoredump buffer for vcn_v2_6. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:57 -04:00
Sunil Khatri	bc62abe1b9	drm/amdgpu: add print support for vcn_v2_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:52 -04:00
Sunil Khatri	0eea81ee2e	drm/amdgpu: add vcn_v2_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:47 -04:00
Sunil Khatri	b910cacb4e	drm/amdgpu: add print support for vcn_v2_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:41 -04:00
Sunil Khatri	2239aaa204	drm/amdgpu: add vcn_v2_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:36 -04:00
Sunil Khatri	ef9f3b5fd9	drm/amdgpu: add print support for vcn_v1_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:31 -04:00
Sunil Khatri	837cc7f1bf	drm/amdgpu: add vcn_v1_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:25 -04:00
Sunil Khatri	439c3b124e	drm/amdgpu: add print support for vcn_v4_0_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:19 -04:00
Sunil Khatri	3a50a51d04	drm/amdgpu: add print support for vcn_v4_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:12 -04:00
Sunil Khatri	dc57edda81	drm/amdgpu: add print support for vcn_v4_0_3 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:06 -04:00
Sunil Khatri	46553db49c	drm/amdgpu: add vcn_v4_0_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:59 -04:00
Sunil Khatri	9d87dac3f9	drm/amdgpu: add vcn_v4_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:53 -04:00
Sunil Khatri	8962915044	drm/amdgpu: add vcn_v4_0_3 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:30 -04:00
Sunil Khatri	f3c958ab85	drm/amdgpu: add print support for vcn_v5_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:12 -04:00
Alex Deucher	32aada4d0a	drm/amdgpu/mes12: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:09 -04:00
Alex Deucher	d4f1fde734	drm/amdgpu/mes11: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:06 -04:00
Alex Deucher	5b7a59de48	drm/amdgpu/mes: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:02 -04:00
Alex Deucher	478efcb90b	drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex() It will be used by the queue reset code. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:56 -04:00
Alex Deucher	76acba7b7f	drm/amdgpu/gfx11: add a mutex for the gfx semaphore This will be used in more places in the future so add a mutex. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:33 -04:00
Alex Deucher	b5be054c58	drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL Need to enter safe mode before touching GC MMIO. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:23 -04:00
Alex Deucher	d479158f65	drm/amdgpu/gfx7: add ring reset callback for gfx Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:18 -04:00
Alex Deucher	4af8071b65	drm/amdgpu/gfx8: add ring reset callback for gfx Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:09 -04:00
Sunil Khatri	f685b38455	drm/amdgpu: add vcn_v5_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:04 -04:00
Sunil Khatri	6d88c0f94a	drm/amdgpu: add print support for vcn_v3_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:59 -04:00
Sunil Khatri	ab10f77487	drm/amdgpu: add vcn_v3_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:53 -04:00
Sunil Khatri	27a74c125d	drm/amdgpu: add vcn ip dump ptr in vcn global struct Add pointer to the vcn ip dump in the vcn global structure to be accessible for all vcn version via global adev. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:45 -04:00
Zhang Zekun	20588d5afc	drm/amd: Remove unused declarations amdgpu_gart_table_vram_pin() and amdgpu_gart_table_vram_unpin() has been removed since commit `575e55ee4f` ("drm/amdgpu: recover gart table at resume") remain the declarations untouched in the header files. Besides, amdgpu_dm_display_resume() has also beed removed since commit `a80aa93de1` ("drm/amd/display: Unify dm resume sequence into a single call"). So, let's remove this unused declarations. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:16 -04:00
Yang Wang	89ec85d16e	drm/amdgpu: fixing rlc firmware loading failure issue Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit `849e133c97` ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: `3af2c80ae2` ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:19:16 -04:00
Sunil Khatri	0f2c243dbf	drm/amdgpu: remove ME0 registers from mi300 dump Remove ME0 registers from MI300 gfx_9_4_3 ipdump MI300 does not have gfx ME and hence those register are just empty one and could be dropped. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:54 -04:00
Alex Deucher	3ec2ad7c34	drm/amdgpu/gfx9: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:51 -04:00
Alex Deucher	d082e5cde4	drm/amdgpu/gfx9.4.3: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:46 -04:00
Alex Deucher	a48f31fb78	drm/amdgpu/gfx9.4.3: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:44 -04:00
Alex Deucher	27ef61f961	drm/amdgpu/gfx9: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:41 -04:00
Alex Deucher	c4f503551f	drm/amdgpu/gfx9: add ring reset callback for gfx Add ring reset callback for gfx. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:38 -04:00
Alex Deucher	31ef969301	drm/amdgpu/gfx9: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:35 -04:00
Jiadong Zhu	4dc4422f11	drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3 Using mmio to do queue reset. Enter safe mode before writing mmio registers. v2: set register instance offset according to xcc id. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:30 -04:00
Jiadong Zhu	2e9bbdd7b7	drm/amdgpu/gfx9: implement reset_hw_queue for gfx9 Using mmio to do queue reset. Enter safe mode when writing registers. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:27 -04:00
Jiadong Zhu	186020c166	drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue Add reset_hw_queue in kiq_pm4_funcs callbacks. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:25 -04:00
Jiadong Zhu	4c953e53cc	drm/amdgpu/gfx_9.4.3: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:22 -04:00
Jiadong Zhu	6f38589e17	drm/amdgpu/gfx9.4.3: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:19 -04:00
Alex Deucher	5d0112f777	drm/amdgpu/gfx9.4.3: add ring reset callback Add ring reset callback for compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:14 -04:00
Jiadong Zhu	fdbd69486b	drm/amdgpu/gfx9: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:06 -04:00
Jiadong Zhu	b5e1a3874f	drm/amdgpu/gfx9: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:03 -04:00
Alex Deucher	5fb4d2a771	drm/amdgpu/gfx9: add ring reset callback Add ring reset callback for compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:00 -04:00
Prike Liang	fb0a5834a3	drm/amdgpu: increase the reset counter for the queue reset Update the reset counter for the amdgpu_cs_query_reset_state() Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:56 -04:00
Alex Deucher	15789fa0f0	drm/amdgpu: add per ring reset support (v5) If a specific job is hung, try and reset just the ring associated with the job. v2: move to amdgpu_job.c v3: fix drm_sched_stop() handling when ring reset fails v4: drop unnecessary amdgpu_fence_driver_clear_job_fences() and drm_sched_increase_karma() v5: rework sched_stop handling Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:52 -04:00
Alex Deucher	57a372f676	drm/amdgpu: add new ring reset callback Use this to reset just a single ring. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:40 -04:00
Soham Dandapat	406792dc2a	drm/amdgpu: Return earlier in amdgpu_sw_ring_ib_end if mcbp is off As we don't trigger preemption is sw ring muxer when mcbp is disabled,so return earlier in amdgpu_sw_ring_ib_end function if mcbp is disabled ,not required to call amdgpu_ring_mux_end_ib Signed-off-by: Soham Dandapat <sdandapa@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:31 -04:00
Sunil Khatri	37ee145623	drm/amdgpu: add cp queue registers print for gfx9_4_3 Add gfx9_4_3 print support of CP queue registers for all queues to be used by devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:26 -04:00
Sunil Khatri	f9e491c863	drm/amdgpu: add cp queue registers for gfx9_4_3 ipdump Add gfx9 support of CP queue registers for all queues to be used by devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:14 -04:00
Thomas Zimmermann	ddda6542c8	drm/amdgpu: Use backlight power constants Replace FB_BLANK_ constants with their counterparts from the backlight subsystem. The values are identical, so there's no change in functionality or semantics. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240731122311.1143153-2-tzimmermann@suse.de	2024-08-16 09:27:15 +02:00
Kenneth Feng	23acd1f344	drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1 add HDP_SD support on gc 12.0.0/1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `61cffacb3a`)	2024-08-13 13:20:43 -04:00
Yinjie Yao	507a2286c0	drm/amdgpu: Update kmd_fw_shared for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aa02486fb1`)	2024-08-13 13:20:36 -04:00
David (Ming Qiang) Wu	470516c292	drm/amd/amdgpu: command submission parser for JPEG Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a7f670d5d8`) Cc: stable@vger.kernel.org	2024-08-13 13:17:36 -04:00
Jack Xiao	4246b1077f	drm/amdgpu/mes12: fix suspend issue Use mes pipe to unmap kcq and kgq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f7fb9d677f`)	2024-08-13 13:17:36 -04:00
Jack Xiao	af401543df	drm/amdgpu/mes12: sw/hw fini for unified mes Free memory for two pipes and unmap pipe0 via pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `98cae695a8`)	2024-08-13 13:17:36 -04:00
Jack Xiao	7254027e1e	drm/amdgpu/mes12: configure two pipes hardware resources Configure two pipes with different hardware resources. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ea5d6db17a`)	2024-08-13 13:17:36 -04:00
Jack Xiao	1097727d6d	drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes Adjust mes12 sw/hw initiailization for both pipe0 and pipe1 enablement. The two pipes are almost identical pipe. Pipe0 behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aa539da8af`)	2024-08-13 13:17:36 -04:00
Jack Xiao	3738a7f0dd	drm/amdgpu/mes12: add mes pipe switch support Add mes pipe switch to let caller choose pipe to submit packet. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b2dee0837a`)	2024-08-13 13:17:36 -04:00
Jack Xiao	a13d91bf3c	drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1 Enable unified mes firmware to load on pipe0 and pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e69c2dd753`)	2024-08-13 13:17:36 -04:00
Jack Xiao	2029b3d7e1	drm/amdgpu/mes: add multiple mes ring instances support Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c7d4355648`)	2024-08-13 13:04:48 -04:00
Bas Nieuwenhuizen	0573a1e2ea	drm/amdgpu: Actually check flags for all context ops. Missing validation ... Checked libdrm and it clears all the structs, so we should be safe to just check everything. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c6b86421f1`) Cc: stable@vger.kernel.org	2024-08-13 13:03:57 -04:00
Alex Deucher	e6c6bd6253	drm/amdgpu/jpeg4: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c6c2e8b6a4`) Cc: stable@vger.kernel.org	2024-08-13 13:03:11 -04:00
Alex Deucher	e414a304f2	drm/amdgpu/jpeg2: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `35c628774e`) Cc: stable@vger.kernel.org	2024-08-13 13:02:48 -04:00
Jack Xiao	11752c013f	drm/amdgpu/mes: fix mes ring buffer overflow wait memory room until enough before writing mes packets to avoid ring buffer overflow. v2: squash in sched_hw_submission fix Fixes: `de32462541` ("drm/amdgpu: cleanup MES11 command submission") Fixes: `fffe347e14` ("drm/amdgpu: cleanup MES12 command submission") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `34e087e892`) Cc: stable@vger.kernel.org	2024-08-13 12:50:01 -04:00
Sunil Khatri	b232c4a63a	drm/amdgpu: add print support for gfx9_4_3 ipdump Add support of gfx9_4_3 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Sunil Khatri	1091796fb1	drm/amdgpu: add gfx9_4_3 register support in ipdump Add general registers of gfx9_4_3 in ipdump for devcoredump support. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu	6a28a072d9	drm/amd/amdgpu: cleanup parse_cs callbacks Because gpu_addr is updated in the calling routine (amdgpu_cs_patch_ibs()),it is removed in the callback. Use .patch_cs_in_place instead of .parse_cs for amdgpu_vce_ring_parse_cs_vm() as there is no need for keeping a temporary IB, therefore ib->sa_bo is NULL and amdgpu_ib_free() is removed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu	a7f670d5d8	drm/amd/amdgpu: command submission parser for JPEG Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	f7fb9d677f	drm/amdgpu/mes12: fix suspend issue Use mes pipe to unmap kcq and kgq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	98cae695a8	drm/amdgpu/mes12: sw/hw fini for unified mes Free memory for two pipes and unmap pipe0 via pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	ea5d6db17a	drm/amdgpu/mes12: configure two pipes hardware resources Configure two pipes with different hardware resources. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	aa539da8af	drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes Adjust mes12 sw/hw initiailization for both pipe0 and pipe1 enablement. The two pipes are almost identical pipe. Pipe0 behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	b2dee0837a	drm/amdgpu/mes12: add mes pipe switch support Add mes pipe switch to let caller choose pipe to submit packet. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	9e823f3070	drm/amdgpu: Block MMR_READ IOCTL in reset Register access from userspace should be blocked until reset is complete. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jonathan Kim	a85c3db6b3	drm/amdkfd: fallback to pipe reset on queue reset fail for gfx9 If queue reset fails, tell the CP to reset the pipe. Since queues multiplex context per pipe and we've issued a device wide preemption prior to the hang, we can assume the hung pipe only has one queue to reset on pipe reset. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Lijo Lazar	9c081c11c6	drm/amdgpu: Reorder to read EFI exported ROM first On EFI BIOSes, PCI ROM may be exported through EFI_PCI_IO_PROTOCOL and expansion ROM BARs may not be enabled. Choose to read from EFI exported ROM data before reading PCI Expansion ROM BAR. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	e69c2dd753	drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1 Enable unified mes firmware to load on pipe0 and pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	f83cec3b3a	drm/amdgpu: Disable dpm_enabled flag while VF is in reset VFs do not perform HW fini/suspend in FLR, so the dpm_enabled is incorrectly kept enabled. Add interface to disable it in virt_pre_reset call. v2: Made implementation generic for all asics v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	35c7152202	Revert "drm/amdgpu: Extend KIQ reg polling wait for VF" KIQ timeouts no longer seen. This reverts commit `3a19a8af64`. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Yinjie Yao	aa02486fb1	drm/amdgpu: Update kmd_fw_shared for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Kenneth Feng	61cffacb3a	drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1 add HDP_SD support on gc 12.0.0/1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:51 -04:00
Victor Zhao	ef6c2cb349	drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to wait all 8 gpus finish mode1 reset and then do re-init. As observed, sometimes the gpu which triggered the reset need to wait 15s for all gpus to finish. If poll msg timeout, guest driver will send the reset message again, and may mess up the following reinit sequence on other gpus. So extend the time to cover the maximum time needed to recover. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:51 -04:00
Sunil Khatri	9b7e697839	drm/amdgpu: fix ptr check warning in gfx12 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:51 -04:00
Sunil Khatri	bd15f805cd	drm/amdgpu: fix ptr check warning in gfx11 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:31 -04:00
Sunil Khatri	98df5a7732	drm/amdgpu: fix ptr check warning in gfx10 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:23 -04:00
Sunil Khatri	07f4f9c00e	drm/amdgpu: fix ptr check warning in gfx9 ip_dump Change if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:33:07 -04:00
Sunil Khatri	311f2b5874	Revert "drm/amdgpu: add vcn ip dump ptr in vcn global struct" This reverts commit `f3392e662e`. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:30:49 -04:00
Sunil Khatri	434b3554d6	Revert "drm/amdgpu: add vcn_v3_0 ip dump support" This reverts commit `58d283801d`. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:30:24 -04:00
Sunil Khatri	2f93ec07ab	Revert "drm/amdgpu: add print support for vcn_v3_0 ip dump" This reverts commit `cd162ae9bc`. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:29:32 -04:00
Jack Xiao	c7d4355648	drm/amdgpu/mes: add multiple mes ring instances support Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:29:25 -04:00
Sunil Khatri	3df3433414	Revert "drm/amdgpu: add vcn_v5_0 ip dump support" This reverts commit `a46a7bef7d`. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:28:51 -04:00
Sunil Khatri	a46a7bef7d	drm/amdgpu: add vcn_v5_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v5_0. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:28:13 -04:00
Alex Deucher	947c080869	drm/amdgpu/mes12: add API for legacy queue reset Add API for resetting kernel queues. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:28:11 -04:00
Alex Deucher	45a2a45143	drm/amdgpu/mes11: add API for legacy queue reset Add API for resetting kernel queues. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:28:07 -04:00
Alex Deucher	c30fb344a2	drm/amdgpu/mes: add API for legacy queue reset Add API for resetting kernel queues. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:27:59 -04:00
Bas Nieuwenhuizen	c6b86421f1	drm/amdgpu: Actually check flags for all context ops. Missing validation ... Checked libdrm and it clears all the structs, so we should be safe to just check everything. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:27:22 -04:00
Arnd Bergmann	020620424b	drm/amd: Use a constant format string for amdgpu_ucode_request Multiple files in amdgpu call amdgpu_ucode_request() with a fw_name variable that the compiler cannot check for being a valid format string, as seen by enabling the (default-disabled) -Wformat-security option: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function 'amdgpu_mes_init_microcode': drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1517:61: error: format not a string literal and no format arguments [-Werror=format-security] 1517 \| r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], fw_name); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c: In function 'amdgpu_uvd_sw_init': drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:9: error: format not a string literal and no format arguments [-Werror=format-security] 263 \| r = amdgpu_ucode_request(adev, &adev->uvd.fw, fw_name); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c: In function 'amdgpu_vce_sw_init': drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:161:9: error: format not a string literal and no format arguments [-Werror=format-security] 161 \| r = amdgpu_ucode_request(adev, &adev->vce.fw, fw_name); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c: In function 'amdgpu_umsch_mm_init_microcode': drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c:590:9: error: format not a string literal and no format arguments [-Werror=format-security] 590 \| r = amdgpu_ucode_request(adev, &adev->umsch_mm.fw, fw_name); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c: In function 'amdgpu_cgs_get_firmware_info': drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:417:72: error: format not a string literal and no format arguments [-Werror=format-security] 417 \| err = amdgpu_ucode_request(adev, &adev->pm.fw, fw_name); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'load_dmcu_fw': drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2221:9: error: format not a string literal and no format arguments [-Werror=format-security] 2221 \| r = amdgpu_ucode_request(adev, &adev->dm.fw_dmcu, fw_name_dmcu); \| ^ drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'dm_init_microcode': drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5147:9: error: format not a string literal and no format arguments [-Werror=format-security] 5147 \| r = amdgpu_ucode_request(adev, &adev->dm.dmub_fw, fw_name_dmub); \| ^ Change these all to use a "%s" format with the actual name as an argument, to let the compiler prove this to be correct. Fixes: `e5a7d047f4` ("drm/amd: Use `amdgpu_ucode_` helpers for CGS") Fixes: `52215e2a5d` ("drm/amd: Use `amdgpu_ucode_` helpers for VCE") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:27:03 -04:00
Tobias Jakobi	8641b81739	drm/amd: Make amd_ip_funcs static for SDMA v5.2 The struct can be static, as it is only used in this translation unit. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:26:59 -04:00
Tobias Jakobi	9a12b1c7a0	drm/amd: Make amd_ip_funcs static for SDMA v5.0 The struct can be static, as it is only used in this translation unit. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:26:53 -04:00
WangYuli	0cee47cde4	drm/amd/amdgpu: Properly tune the size of struct The struct assertion is failed because sparse cannot parse `#pragma pack(push, 1)` and `#pragma pack(pop)` correctly. GCC's output is still 1-byte-aligned. No harm to memory layout. The error can be filtered out by sparse-diff, but sometimes multiple lines queezed into one, making the sparse-diff thinks its a new error. I'm trying to aviod this by fixing errors. Link: https://lore.kernel.org/all/20230620045919.492128-1-suhui@nfschina.com/ Link: https://lore.kernel.org/all/93d10611-9fbb-4242-87b8-5860b2606042@suswa.mountain/ Fixes: `1721bc1b2a` ("drm/amdgpu: Update VF2PF interface") Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: wenlunpeng <wenlunpeng@uniontech.com> Reported-by: Su Hui <suhui@nfschina.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:25:49 -04:00
Alex Deucher	c6c2e8b6a4	drm/amdgpu/jpeg4: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:25:45 -04:00
Alex Deucher	35c628774e	drm/amdgpu/jpeg2: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:25:16 -04:00
Thomas Zimmermann	7a26f18119	drm/amdgpu: Do not set struct drm_driver.lastclose Remove the implementation of struct drm_driver.lastclose. The hook was only necessary before in-kernel DRM clients existed, but is now obsolete. The code in amdgpu_driver_lastclose_kms() is performed by drm_lastclose(). v2: - update commit message Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240812083000.337744-3-tzimmermann@suse.de	2024-08-13 16:21:08 +02:00
Jack Xiao	34e087e892	drm/amdgpu/mes: fix mes ring buffer overflow wait memory room until enough before writing mes packets to avoid ring buffer overflow. v2: squash in sched_hw_submission fix Fixes: `de32462541` ("drm/amdgpu: cleanup MES11 command submission") Fixes: `fffe347e14` ("drm/amdgpu: cleanup MES12 command submission") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 09:57:52 -04:00
Al Viro	1da91ea87a	introduce fd_file(), convert all accessors to it. For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:00:43 -04:00
Daniel Vetter	4e996697a4	drm-misc-next for v6.12: UAPI Changes: - remove Power Saving Policy property Core Changes: - update connector documentation CI: - add tests for mediatek, meson, rockchip Driver Changes: amdgpu: - revert support for Power Saving Policy property bridge: - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR mgag200: - transparently support BMC outputs omapdrm: - use common helper for_each_endpoint_of_node() panel: - panel-edp: fix name for HKC MB116AN01 vkms: - clean up endianess warnings -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAma1wMsACgkQaA3BHVML eiM+/Af9Fuq5tLo9MlRMDS5kF41UJhEiqd805tlz0mofoIbsbynKIxbTD+sCrXxZ mzWX5lVO6tPPudtYbHX9tkD4eV8neiI9QTuTPiBgP4oRpPlJbnJzuG77/qG3gPIe L+ByiEx1e88A7BY0rY/xjBAKirYaQfVkGouD6m3NgZasiop0Bie3hTDtfPoRYh5u odsEHvrOf4tFOTjEZI9/KlEmW2lBziydTtm2yIiFyQIAy0O+FZIfDtMRjpo3ivKR wfPm15SleKGuLvDL8QsBNl+HshFywqn489yvbp0g56aRfIeQgNFF5g9Aj1+DXniE C/Dm7GDQhK98Wso3yECoK0/AuMHBmA== =JEjA -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-08-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: - remove Power Saving Policy property Core Changes: - update connector documentation CI: - add tests for mediatek, meson, rockchip Driver Changes: amdgpu: - revert support for Power Saving Policy property bridge: - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR mgag200: - transparently support BMC outputs omapdrm: - use common helper for_each_endpoint_of_node() panel: - panel-edp: fix name for HKC MB116AN01 vkms: - clean up endianess warnings Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240809071241.GA222501@localhost.localdomain	2024-08-09 10:41:59 +02:00
Daniel Vetter	91dae758bd	drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmareygACgkQaA3BHVML eiO2vwf9FirbMiq4lfHzgcbNIU1dTUtjRAZjrlwGmqk5cb9lUshAMCMBMOEQBDdg XMQQj/RMBvRUuxzsPGk78ObSz5FBaBLgKwFprer0V6uslQaJxj4YRsnkp0l2n+0k +ebhfo2rUgZOdgNOkXH326w9UhqiydIa7GaA2aq1vUzXKFDfvGXtSN75BMlEWlKP rTft56AiwjwcKu7zYFHGlFUMSNpKAQy7lnV3+dBXAfFNHu4zVNoI/yWGEOdR7eVo WhiEcpvismsOh+BfUvMNPP3RKwjXHdwMlJYb+v9XGgH27hqc50lSceWydHtoJTto DTXF9WQhJ+/GQR9ZGmBjos9GVbECDA== =L/1W -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-08-01' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box	2024-08-08 18:58:46 +02:00
Arunpravin Paneer Selvam	6ad9dafba1	drm/amdgpu: Add DCC GFX12 flag to enable address alignment We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other kernel level GFX12 DCC flag to differentiate the DCC buffers and other pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled). If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over allocate for all the pinned display buffers unnecessarily that leads to memory allocation failure. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `46142cc1b9`)	2024-08-07 18:23:59 -04:00
Frank Min	7fc5f252c0	drm/amdgpu: correct sdma7 max dw correct sdma7 max dw into 8 Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `86598c3819`)	2024-08-07 18:23:49 -04:00
Arunpravin Paneer Selvam	4a5ad08f53	drm/amdgpu: Add address alignment support to DCC buffers Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nearest power of two number as the buddy allocator does not support non power of two alignments. This applies only to the contiguous DCC buffers. v3:(Alex) - rewrite the max texture channel caches comparison code in an algorithmic way to determine the alignment size. v4:(Alex) - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c and add a new gmc func callback for dcc alignment. If the callback is non-NULL, call it to get the alignment, otherwise, use the default. v5:(Alex) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. v6: - Fix checkpatch warning reported by Intel CI. v7:(Christian) - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that checks the BO pinning and for a specific hw generation. v8:(Christian) - move this check into gmc_v12_0_get_dcc_alignment. v9: - Fix 32bit build errors Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aa94b623cb`)	2024-08-07 18:23:42 -04:00
Frank Min	5d687a67fd	drm/amdgpu: change non-dcc buffer copy configuration Without setting cpv bit and 7th ib dw, non-dcc buffer copy will have random corruption So set the cpv bit and clear the 7th ib dw for copy non-dcc buffers Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5aacf8917f`)	2024-08-07 18:21:05 -04:00
Joshua Ashton	829798c789	drm/amdgpu: Forward soft recovery errors to userspace As we discussed before[1], soft recovery should be forwarded to userspace, or we can get into a really bad state where apps will keep submitting hanging command buffers cascading us to a hard reset. 1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/ Signed-off-by: Joshua Ashton <joshua@froggi.es> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `434967aadb`) Cc: stable@vger.kernel.org	2024-08-07 18:20:07 -04:00
Likun Gao	8ff3bb44cc	drm/amdgpu: add golden setting for gc v12 Adding Manual GDB golden setting for gc v12 revision 0 ASIC. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c9875d0a78`)	2024-08-07 18:19:38 -04:00
Likun Gao	aa5c9701eb	drm/amdgpu: force to use legacy inv in mmhub MMHUB v4.1.0 only support fixed cache mode, so only use legacy invalidation accordingly. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9192c7613c`)	2024-08-07 18:16:38 -04:00
Arunpravin Paneer Selvam	46142cc1b9	drm/amdgpu: Add DCC GFX12 flag to enable address alignment We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other kernel level GFX12 DCC flag to differentiate the DCC buffers and other pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled). If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over allocate for all the pinned display buffers unnecessarily that leads to memory allocation failure. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:03 -04:00
Tim Huang	92549780e3	drm/amdgpu: fix unchecked return value warning for amdgpu_atombios This resolves the unchecded return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:03 -04:00
Tim Huang	c0277b9d7c	drm/amdgpu: fix unchecked return value warning for amdgpu_gfx This resolves the unchecded return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:03 -04:00
Frank Min	86598c3819	drm/amdgpu: correct sdma7 max dw correct sdma7 max dw into 8 Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:03 -04:00
Arunpravin Paneer Selvam	aa94b623cb	drm/amdgpu: Add address alignment support to DCC buffers Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nearest power of two number as the buddy allocator does not support non power of two alignments. This applies only to the contiguous DCC buffers. v3:(Alex) - rewrite the max texture channel caches comparison code in an algorithmic way to determine the alignment size. v4:(Alex) - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c and add a new gmc func callback for dcc alignment. If the callback is non-NULL, call it to get the alignment, otherwise, use the default. v5:(Alex) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. v6: - Fix checkpatch warning reported by Intel CI. v7:(Christian) - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that checks the BO pinning and for a specific hw generation. v8:(Christian) - move this check into gmc_v12_0_get_dcc_alignment. v9: - Fix 32bit build errors Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:03 -04:00
Frank Min	5aacf8917f	drm/amdgpu: change non-dcc buffer copy configuration Without setting cpv bit and 7th ib dw, non-dcc buffer copy will have random corruption So set the cpv bit and clear the 7th ib dw for copy non-dcc buffers Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:02 -04:00
Tao Zhou	b3a3c9a6b2	drm/amdgpu: report bad status in GPU recovery Instead of printing GPU reset failed. v2: add check for reset_context->src. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:02 -04:00
Tao Zhou	dd3e296289	drm/amdgpu: update bad state check in GPU recovery Return RMA status without message print. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:02 -04:00
Joshua Ashton	434967aadb	drm/amdgpu: Forward soft recovery errors to userspace As we discussed before[1], soft recovery should be forwarded to userspace, or we can get into a really bad state where apps will keep submitting hanging command buffers cascading us to a hard reset. 1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/ Signed-off-by: Joshua Ashton <joshua@froggi.es> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:01 -04:00
Yang Wang	671af06690	drm/amdgpu: remove RAS unused paramter 'err_addr' - amdgpu_ras_error_statistic_ue_count() - amdgpu_ras_error_statistic_ce_count() - amdgpu_ras_error_statistic_de_count() The parameter 'err_addr' is no longer used since following patch. Fixes: `a7e8467fbe` ("drm/amdgpu: Remove unused code") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:01 -04:00
Likun Gao	c9875d0a78	drm/amdgpu: add golden setting for gc v12 Adding Manual GDB golden setting for gc v12 revision 0 ASIC. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:01 -04:00
Tao Zhou	792be2e23a	drm/amdgpu: create function to check RAS RMA status In the convenience of calling it globally. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:01 -04:00
Sunil Khatri	62341f7bc2	drm/amdgpu: optimize the padding for gfx_v9_4_3 Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:00 -04:00
Sunil Khatri	dee44a7cb5	drm/amdgpu: optimize the padding for gfx9 Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:11:00 -04:00
Tvrtko Ursulin	bb670c31e1	drm/amdpgu: Micro-optimise amdgpu_ring_commit For some value of optimisation we can replace the division with an bitwise and. And it even shrinks the code. Before: 6c9: 53 push %rbx 6ca: 4c 8b 47 08 mov 0x8(%rdi),%r8 6ce: 31 d2 xor %edx,%edx 6d0: 48 89 fb mov %rdi,%rbx 6d3: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax 6d9: 41 8b 48 04 mov 0x4(%r8),%ecx 6dd: f7 d0 not %eax 6df: 21 c8 and %ecx,%eax 6e1: 83 c1 01 add $0x1,%ecx 6e4: 83 c0 01 add $0x1,%eax 6e7: f7 f1 div %ecx 6e9: 89 d6 mov %edx,%esi 6eb: 41 ff 90 88 00 00 00 call 0x88(%r8) After: 6c9: 53 push %rbx 6ca: 48 8b 57 08 mov 0x8(%rdi),%rdx 6ce: 48 89 fb mov %rdi,%rbx 6d1: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax 6d7: 8b 72 04 mov 0x4(%rdx),%esi 6da: f7 d0 not %eax 6dc: 21 f0 and %esi,%eax 6de: 83 c0 01 add $0x1,%eax 6e1: 21 c6 and %eax,%esi 6e3: ff 92 88 00 00 00 call 0x88(%rdx) Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:10:40 -04:00
Hawking Zhang	dfe9d047b1	drm/amdgpu: Add more types for boot time error reporting Data abort exception and unknown errors are supported. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:10:17 -04:00
Likun Gao	9192c7613c	drm/amdgpu: force to use legacy inv in mmhub MMHUB v4.1.0 only support fixed cache mode, so only use legacy invalidation accordingly. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 11:10:08 -04:00
Sunil Khatri	f59902ffcc	drm/amdgpu: optimize the padding for gfx12 Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:44:37 -04:00
Yifan Zhang	62eefd10ac	drm/amdgpu: use CPU for page table update if SDMA is unavailable avoid using SDMA if it is unavailable. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:43:57 -04:00
Sunil Khatri	847e387e00	drm/amdgpu: optimize the padding for gfx11 Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:43:49 -04:00
Sunil Khatri	67c4ca9f79	drm/amdgpu: do not call insert_nop fn for zero count Do not make a function call for zero size NOP as it does not add anything in the ring and is unnecessary function call. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:43:28 -04:00
Jonathan Kim	ee0a469cf9	drm/amdkfd: support per-queue reset on gfx9 Support per-queue reset for GFX9. The recommendation is for the driver to target reset the HW queue via a SPI MMIO register write. Since this requires pipe and HW queue info and MEC FW is limited to doorbell reports of hung queues after an unmap failure, scan the HW queue slots defined by SET_RESOURCES first to identify the user queue candidates to reset. Only signal reset events to processes that have had a queue reset. If queue reset fails, fall back to GPU reset. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:43:18 -04:00
Sunil Khatri	e89d2fec4c	drm/amdgpu: optimize the padding for gfx10 Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:43:01 -04:00
Sunil Khatri	836af5be1b	drm/amdgpu: Clean up the register dump via debugfs list debugfs register list for dump is cleaned as it have some issues related to proper power state of the IP before register read. Since the above mentioned is removed we no longer want this to be dumped part of the devcoredump and hence removed. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:42:22 -04:00
Sunil Khatri	17277da266	drm/amdgpu: Remove debugfs amdgpu_reset_dump_register_list There are some problem with existing amdgpu_reset_dump_register_list debugfs node. It is supposed to read a list of registers but there could be cases when the IP is not in correct power state. Register read in such cases could lead to more problems. We are taking care of all such power states in devcoredump and dumping the registers of need for debugging. So cleaning this code and we dont need this functionality via debugfs anymore. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:42:09 -04:00
Hamza Mahfooz	717b432b6d	Revert "drm/amd: Add power_saving_policy drm property to eDP connectors" This reverts commit `9d8c094dda`. It was merged without meeting userspace requirements. Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240802145946.48073-2-hamza.mahfooz@amd.com	2024-08-02 11:29:17 -04:00
Thomas Zimmermann	0e8655b4e8	Merge drm/drm-next into drm-misc-next Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-07-29 09:35:54 +02:00
Michael Chen	9038e25c80	drm/amdgpu: increase mes log buffer size for gfx12 MES firmware requires larger log buffer for gfx12. Allocate proper buffer respectively for gfx11 and gfx12. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `739d0f3e1f`)	2024-07-27 18:10:12 -04:00
Christian König	f3572db3c0	drm/amdgpu: fix contiguous handling for IB parsing v2 Otherwise we won't get correct access to the IB. v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in the VRAM backend. Signed-off-by: Christian König <christian.koenig@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Fixes: `e362b7c8f8` ("drm/amdgpu: Modify the contiguous flags behaviour") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fbfb5f0342`)	2024-07-27 18:09:38 -04:00
Thomas Weißschuh	aeb81b62c7	drm/amdgpu: convert bios_hardcoded_edid to drm_edid Instead of manually passing around 'struct edid *' and its size, use 'struct drm_edid', which encapsulates a validated combination of both. As the drm_edid_ can handle NULL gracefully, the explicit checks can be dropped. Also save a few characters by transforming '&array[0]' to the equivalent 'array' and using 'max_t(int, ...)' instead of manual casts. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:35:05 -04:00
Sunil Khatri	13d8850a33	drm/amdgpu: trigger ip dump before suspend of IP's Problem: IP dump right now is done post suspend of all IP's which for some IP's could change power state and software state too which we do not want to reflect in the dump as it might not be same at the time of hang. Solution: IP should be dumped as close to the HW state when the GPU was in hung state without trying to reinitialize any resource. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:34:26 -04:00
Michael Chen	739d0f3e1f	drm/amdgpu: increase mes log buffer size for gfx12 MES firmware requires larger log buffer for gfx12. Allocate proper buffer respectively for gfx11 and gfx12. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:32:05 -04:00
Sunil Khatri	076362d931	drm/amdgpu: print VCN instance dump for valid instance VCN dump is dependent on power state of the ip. Dump is valid if VCN was powered up at the time of ip dump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:31:10 -04:00
Venkata Narendra Kumar Gutta	25dd25f86e	drm/amdgpu: Add MFD support for ISP I2C bus ISP I2C bus device can't be enumerated via ACPI mechanism since it shares the memory map with the AMDGPU. So use the MFD mechanism for registering the ISP I2C device and add the required resources. Signed-off-by: Venkata Narendra Kumar Gutta <vengutta@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:58 -04:00
Christian König	fbfb5f0342	drm/amdgpu: fix contiguous handling for IB parsing v2 Otherwise we won't get correct access to the IB. v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in the VRAM backend. Signed-off-by: Christian König <christian.koenig@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Fixes: `e362b7c8f8` ("drm/amdgpu: Modify the contiguous flags behaviour") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:47 -04:00
Sunil Khatri	cd162ae9bc	drm/amdgpu: add print support for vcn_v3_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:41 -04:00
Sunil Khatri	58d283801d	drm/amdgpu: add vcn_v3_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:35 -04:00
Sunil Khatri	50d10d9271	drm/amdgpu: add macro to calculate offset with instance Add macro definition which calculate offset of the register with index override. This is useful in case when there is an array of registers which is common for all instances. To read registers in that case it is easy to define registers once and the index value is manually passed to calculate proper offset of register for each instance. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:28 -04:00
Sunil Khatri	f3392e662e	drm/amdgpu: add vcn ip dump ptr in vcn global struct Add pointer to the vcn ip dump in the vcn global structure to be accessible for all vcn version via global adev. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-27 17:28:17 -04:00
Alex Deucher	8155566a26	drm/amdgpu: properly handle vbios fake edid sizing The comment in the vbios structure says: // = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128 This fake edid struct has not been used in a long time, so I'm not sure if there were actually any boards out there with a non-128 byte EDID, but align the code with the comment. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Reported-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/109964.html Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-25 17:41:46 -04:00
Christian König	83b501c179	drm/scheduler: remove full_recover from drm_sched_start This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com	2024-07-25 14:05:12 +02:00
ZhenGuo Yin	5659b0c93a	drm/amdgpu: reset vm state machine after gpu reset(vram lost) [Why] Page table of compute VM in the VRAM will lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->genertaion is not equal to the new generation token. v2: Check vm->generation instead of calling drm_sched_entity_error in amdgpu_vm_validate. v3: Use new generation token instead of vram_lost_counter for check. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org (cherry picked from commit `47c0388b05`)	2024-07-24 17:30:49 -04:00
Tim Huang	fab1ead0ae	drm/amdgpu: add missed harvest check for VCN IP v4/v5 To prevent below probe failure, add a check for models with VCN IP v4.0.6 where VCN1 may be harvested. v2: Apply the same check to VCN IP v4.0 and v5.0. [ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43 c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02 00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00 [ 54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286 [ 54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX: 0000000000000000 [ 54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI: ffff99a82f680000 [ 54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09: 0000000000000000 [ 54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12: 0000000000000000 [ 54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15: 0000000000000001 [ 54.075400] FS: 00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000) knlGS:0000000000000000 [ 54.075988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4: 0000000000750ef0 [ 54.076927] PKRU: 55555554 [ 54.077132] Call Trace: [ 54.077319] <TASK> [ 54.077484] ? show_regs+0x69/0x80 [ 54.077747] ? __die+0x28/0x70 [ 54.077979] ? page_fault_oops+0x180/0x4b0 [ 54.078286] ? do_user_addr_fault+0x2d2/0x680 [ 54.078610] ? exc_page_fault+0x84/0x190 [ 54.078910] ? asm_exc_page_fault+0x2b/0x30 [ 54.079224] ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.079941] ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu] [ 54.080617] vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu] [ 54.081316] amdgpu_device_ip_set_powergating_state+0x64/0xc0 [amdgpu] [ 54.082057] amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu] [ 54.082727] amdgpu_ring_alloc+0x44/0x70 [amdgpu] [ 54.083351] amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu] [ 54.084054] amdgpu_ring_test_helper+0x22/0x90 [amdgpu] [ 54.084698] vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu] [ 54.085307] amdgpu_device_init+0x1f96/0x2780 [amdgpu] [ 54.085951] amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu] [ 54.086591] amdgpu_pci_probe+0x19f/0x550 [amdgpu] [ 54.087215] local_pci_probe+0x48/0xa0 [ 54.087509] pci_device_probe+0xc9/0x250 [ 54.087812] really_probe+0x1a4/0x3f0 [ 54.088101] __driver_probe_device+0x7d/0x170 [ 54.088443] driver_probe_device+0x24/0xa0 [ 54.088765] __driver_attach+0xdd/0x1d0 [ 54.089068] ? __pfx___driver_attach+0x10/0x10 [ 54.089417] bus_for_each_dev+0x8e/0xe0 [ 54.089718] driver_attach+0x22/0x30 [ 54.090000] bus_add_driver+0x120/0x220 [ 54.090303] driver_register+0x62/0x120 [ 54.090606] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu] [ 54.091255] __pci_register_driver+0x62/0x70 [ 54.091593] amdgpu_init+0x67/0xff0 [amdgpu] [ 54.092190] do_one_initcall+0x5f/0x330 [ 54.092495] do_init_module+0x68/0x240 [ 54.092794] load_module+0x201c/0x2110 [ 54.093093] init_module_from_file+0x97/0xd0 [ 54.093428] ? init_module_from_file+0x97/0xd0 [ 54.093777] idempotent_init_module+0x11c/0x2a0 [ 54.094134] __x64_sys_finit_module+0x64/0xc0 [ 54.094476] do_syscall_64+0x58/0x120 [ 54.094767] entry_SYSCALL_64_after_hwframe+0x6e/0x76 Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org (cherry picked from commit `0b071245dd`)	2024-07-24 17:30:23 -04:00
Stanley.Yang	1a8825259a	drm/amdgpu: Fix eeprom max record count The eeprom table is empty before initializing, set eeprom table version first before initializing. Changed from V1: Reuse amdgpu_ras_set_eeprom_table_version function Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `015b8a2fdf`)	2024-07-24 17:30:23 -04:00
YiPeng Chai	afac8c6554	drm/amdgpu: fix ras UE error injection failure issue The ras command shared memory is allocated from VRAM and the response status of the command buffer will not be zero due to gpu being in fatal error state after ras UE error injection. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8284951a6e`)	2024-07-24 17:30:23 -04:00
Jane Jian	6728f55590	drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF For VCN/JPEG 4.0.3, use only the local addressing scheme. - Mask bit higher than AID0 range v2 remain the case for mmhub use master XCC Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `caaf576292`)	2024-07-24 17:30:23 -04:00
Lijo Lazar	485432d090	drm/amdgpu: Add empty HDP flush function to VCN v4.0.3 VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch will do the HDP flush. This change is necessary for VCN v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `49cfaebe48`)	2024-07-24 17:30:23 -04:00
Lijo Lazar	23df34997d	drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3 JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead, mmsch fw will do the flush. This change is necessary for JPEG v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `585e3fdb36`)	2024-07-24 17:30:23 -04:00
Ma Ke	df65aabef3	drm/amd/amdgpu: Fix uninitialized variable warnings Return 0 to avoid returning an uninitialized variable r. Cc: stable@vger.kernel.org Fixes: `230dd6bb61` ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6472de66c0`)	2024-07-24 17:30:23 -04:00
David Belanger	73048bda46	drm/amdgpu: Fix atomics on GFX12 If PCIe supports atomics, configure register to prevent DF from breaking atomics in separate load/store operations. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `666f14cab2`)	2024-07-24 17:30:23 -04:00
Alex Deucher	a03ebf1163	drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell We seem to have a case where SDMA will sometimes miss a doorbell if GFX is entering the powergating state when the doorbell comes in. To workaround this, we can update the wptr via MMIO, however, this is only safe because we disallow gfxoff in begin_ring() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org (cherry picked from commit `f2ac526349`)	2024-07-24 17:29:57 -04:00
ZhenGuo Yin	47c0388b05	drm/amdgpu: reset vm state machine after gpu reset(vram lost) [Why] Page table of compute VM in the VRAM will lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->genertaion is not equal to the new generation token. v2: Check vm->generation instead of calling drm_sched_entity_error in amdgpu_vm_validate. v3: Use new generation token instead of vram_lost_counter for check. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-24 14:45:24 -04:00
Tim Huang	0b071245dd	drm/amdgpu: add missed harvest check for VCN IP v4/v5 To prevent below probe failure, add a check for models with VCN IP v4.0.6 where VCN1 may be harvested. v2: Apply the same check to VCN IP v4.0 and v5.0. [ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43 c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02 00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00 [ 54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286 [ 54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX: 0000000000000000 [ 54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI: ffff99a82f680000 [ 54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09: 0000000000000000 [ 54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12: 0000000000000000 [ 54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15: 0000000000000001 [ 54.075400] FS: 00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000) knlGS:0000000000000000 [ 54.075988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4: 0000000000750ef0 [ 54.076927] PKRU: 55555554 [ 54.077132] Call Trace: [ 54.077319] <TASK> [ 54.077484] ? show_regs+0x69/0x80 [ 54.077747] ? __die+0x28/0x70 [ 54.077979] ? page_fault_oops+0x180/0x4b0 [ 54.078286] ? do_user_addr_fault+0x2d2/0x680 [ 54.078610] ? exc_page_fault+0x84/0x190 [ 54.078910] ? asm_exc_page_fault+0x2b/0x30 [ 54.079224] ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.079941] ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu] [ 54.080617] vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu] [ 54.081316] amdgpu_device_ip_set_powergating_state+0x64/0xc0 [amdgpu] [ 54.082057] amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu] [ 54.082727] amdgpu_ring_alloc+0x44/0x70 [amdgpu] [ 54.083351] amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu] [ 54.084054] amdgpu_ring_test_helper+0x22/0x90 [amdgpu] [ 54.084698] vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu] [ 54.085307] amdgpu_device_init+0x1f96/0x2780 [amdgpu] [ 54.085951] amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu] [ 54.086591] amdgpu_pci_probe+0x19f/0x550 [amdgpu] [ 54.087215] local_pci_probe+0x48/0xa0 [ 54.087509] pci_device_probe+0xc9/0x250 [ 54.087812] really_probe+0x1a4/0x3f0 [ 54.088101] __driver_probe_device+0x7d/0x170 [ 54.088443] driver_probe_device+0x24/0xa0 [ 54.088765] __driver_attach+0xdd/0x1d0 [ 54.089068] ? __pfx___driver_attach+0x10/0x10 [ 54.089417] bus_for_each_dev+0x8e/0xe0 [ 54.089718] driver_attach+0x22/0x30 [ 54.090000] bus_add_driver+0x120/0x220 [ 54.090303] driver_register+0x62/0x120 [ 54.090606] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu] [ 54.091255] __pci_register_driver+0x62/0x70 [ 54.091593] amdgpu_init+0x67/0xff0 [amdgpu] [ 54.092190] do_one_initcall+0x5f/0x330 [ 54.092495] do_init_module+0x68/0x240 [ 54.092794] load_module+0x201c/0x2110 [ 54.093093] init_module_from_file+0x97/0xd0 [ 54.093428] ? init_module_from_file+0x97/0xd0 [ 54.093777] idempotent_init_module+0x11c/0x2a0 [ 54.094134] __x64_sys_finit_module+0x64/0xc0 [ 54.094476] do_syscall_64+0x58/0x120 [ 54.094767] entry_SYSCALL_64_after_hwframe+0x6e/0x76 Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-24 14:44:05 -04:00
Yifan Zhang	3b37e2725a	drm/amdgpu: skip kfd init if GFX is not ready. avoid kfd init crash in that case. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-24 14:43:58 -04:00
Alex Deucher	bd4bea5ab2	drm/amdgpu/gfx9.4.3: Enable bad opcode interrupt For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:46 -04:00
Alex Deucher	238352b494	drm/amdgpu/gfx9: Enable bad opcode interrupt For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:46 -04:00
Jesse Zhang	5ebca62eb8	drm/amdgpu/gfx12: Enable bad opcode interrupt For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq naming (drop priv) (Alex) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:46 -04:00
Jesse Zhang	bc6c2a6f64	drm/amdgpu/gfx10: Enable bad opcode interrupt For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq naming (drop priv) (Alex) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Jesse Zhang	a790902237	drm/amdgpu/gfx11: Enable bad opcode interrupt For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq naming (drop priv) (Alex) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	acddd5cf70	drm/amdgpu/gfx: add bad opcode interrupt Add the irq source for bad opcodes. Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	48695573d2	drm/amdgpu/gfx9: properly handle error ints on all pipes Need to handle the interrupt enables for all pipes. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	3987932176	drm/amdgpu/gfx12: properly handle error ints on all pipes Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	2662b7d9d8	drm/amdgpu/gfx11: properly handle error ints on all pipes Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	4b95cec689	drm/amdgpu/gfx10: properly handle error ints on all pipes Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	af4808ac40	drm/amdgpu/gfx12: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	f53f526f70	drm/amdgpu/gfx11: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Alex Deucher	a2737c404c	drm/amdgpu/gfx10: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
Stanley.Yang	015b8a2fdf	drm/amdgpu: Fix eeprom max record count The eeprom table is empty before initializing, set eeprom table version first before initializing. Changed from V1: Reuse amdgpu_ras_set_eeprom_table_version function Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:45:45 -04:00
YiPeng Chai	8284951a6e	drm/amdgpu: fix ras UE error injection failure issue The ras command shared memory is allocated from VRAM and the response status of the command buffer will not be zero due to gpu being in fatal error state after ras UE error injection. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:43:06 -04:00
Philip Yang	834368eab3	drm/amdkfd: Ensure user queue buffers residency Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap BO from the GPU if the bo_va queue_refcount is not zero. Create queue to increase the bo_va queue_refcount, destroy queue to decrease the bo_va queue_refcount, to ensure the queue buffers mapped on the GPU when queue is active. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:54 -04:00
Alex Deucher	22a9d5cbf8	drm/amdgpu/gfx9.4.3: implement wave kill for compute queues Based on gfx9.0 implementation. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:45 -04:00
Sunil Khatri	eac3b274aa	drm/amdgpu: add print support for sdma_v_4_4_2 ip_dump Add print support for ip dump for sdma_v_4_4_2 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:37 -04:00
Alex Deucher	9c7e69d2e1	drm/amdgpu/gfx9: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:11 -04:00
Alex Deucher	7e60ecc2b7	drm/amdgpu/gfx8: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:08 -04:00
Alex Deucher	12fb3e9c88	drm/amdgpu/gfx7: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:03 -04:00
Philip Yang	fb91065851	drm/amdkfd: Refactor queue wptr_bo GART mapping Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo parameter from pqm_create_queue. Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location with queue ctx_bo GART mapping. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:35:04 -04:00
Sunil Khatri	db54a725d5	drm/amdgpu: Add sdma_v4_4_2 ip dump for devcoredump Add ip dump for sdma_v4_4_2 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:58 -04:00
Sunil Khatri	a11b36ba9c	drm/amdgpu: add print support for sdma_v_4_0 ip_dump Add print support for ip dump for sdma_v_4_0 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:49 -04:00
Philip Yang	c86ad39140	drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original pointer not set to NULL, this could cause use-after-free bug. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:44 -04:00
Philip Yang	f9e292cbba	drm/amdkfd: kfd_bo_mapped_dev support partition Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:36 -04:00
Jane Jian	caaf576292	drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF For VCN/JPEG 4.0.3, use only the local addressing scheme. - Mask bit higher than AID0 range v2 remain the case for mmhub use master XCC Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:28 -04:00
Lijo Lazar	49cfaebe48	drm/amdgpu: Add empty HDP flush function to VCN v4.0.3 VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch will do the HDP flush. This change is necessary for VCN v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:18 -04:00
Lijo Lazar	585e3fdb36	drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3 JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead, mmsch fw will do the flush. This change is necessary for JPEG v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:12 -04:00
Pierre-Eric Pelloux-Prayer	fec5f8e8c6	drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit Before this commit, only submits with both a BO_HANDLES chunk and a 'bo_list_handle' would be rejected (by amdgpu_cs_parser_bos). But if UMD sent multiple BO_HANDLES, what would happen is: * only the last one would be really used * all the others would leak memory as amdgpu_cs_p1_bo_handles would overwrite the previous p->bo_list value This commit rejects submissions with multiple BO_HANDLES chunks to match the implementation of the parser. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:57 -04:00
Sunil Khatri	80237bfc03	drm/amdgpu: Add sdma_v4_0 ip dump for devcoredump Add ip dump for sdma_v4_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:52 -04:00
Sunil Khatri	abf839f5eb	drm/amdgpu: add print support for sdma_v_7_0 ip_dump Add print support for ip dump for sdma_v_7_0 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:45 -04:00
Ma Ke	6472de66c0	drm/amd/amdgpu: Fix uninitialized variable warnings Return 0 to avoid returning an uninitialized variable r. Cc: stable@vger.kernel.org Fixes: `230dd6bb61` ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:38 -04:00
Ma Ke	93381e6b61	drm/amdgpu: fix a possible null pointer dereference In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid npd. Cc: stable@vger.kernel.org Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:25 -04:00
David Belanger	666f14cab2	drm/amdgpu: Fix atomics on GFX12 If PCIe supports atomics, configure register to prevent DF from breaking atomics in separate load/store operations. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:17 -04:00
Sunil Khatri	4df9e2200f	drm/amdgpu: Add sdma_v7_0 ip dump for devcoredump Add ip dump for sdma_v7_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:10 -04:00
Alex Deucher	f2ac526349	drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell We seem to have a case where SDMA will sometimes miss a doorbell if GFX is entering the powergating state when the doorbell comes in. To workaround this, we can update the wptr via MMIO, however, this is only safe because we disallow gfxoff in begin_ring() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:58 -04:00
YiPeng Chai	a7e8467fbe	drm/amdgpu: Remove unused code Remove unused code. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:20 -04:00
YiPeng Chai	56631dee29	drm/amdgpu: optimize logging deferred error info 1. Use pa_pfn as the radix-tree key index to log deferred error info. 2. Use local array to store a row of bad pages. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:14 -04:00
YiPeng Chai	27cdf8c3ca	drm/amdgpu: optimize umc v12 address conversion function Split into 3 parts: 1. Convert soc physical address via ras ta. 2. Expand bad pages from soc physical address. 3. Dump bad address info. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:31:59 -04:00
Sunil Khatri	e84f798a93	drm/amdgpu: add print support for sdma_v_5_0 ip_dump Add support for ip dump for sdma_v_5_0 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	0f1a93704a	drm/amdgpu: Add sdma_v5_0 ip dump for devcoredump Add ip dump for sdma_v5_0 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	ccb54d7d91	drm/amdgpu: add print support for sdma_v_6_0 ip_dump Add print support for ip dump for sdma_v_6_0 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	1eba165aa4	drm/amdgpu: Add sdma_v6_0 ip dump for devcoredump Add ip dump for sdma_v6_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	00bb3223bf	drm/amdgpu: fix the print message in devcoredump Fix the memory type logged for gtt memory size which is wrongly logged as visible vram size. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	43796955a8	drm/amdgpu: fix the extra space between two functions fix extra line space between two functions. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	08bed7e4ff	drm/amdgpu: add print support for sdma_v_5_2 ip_dump Add support for ip dump for sdma_v_5_2 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:08 -04:00
Sunil Khatri	f763c3b543	drm/amdgpu: Add sdma_v5_2 ip dump for devcoredump Add ip dump for sdma_v5_2 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:08 -04:00
YiPeng Chai	b3fb79cda5	drm/amdgpu: add mutex to protect ras shared memory Add mutex to protect ras shared memory. v2: Add TA_RAS_COMMAND__TRIGGER_ERROR command call status check. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:44:30 -04:00
Boyuan Zhang	7d75ef3736	drm/amdgpu/vcn: not pause dpg for unified queue For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel. For VCN3 and below, pausing DPG for encoding in kernel is still needed. v2: add more comments v3: update commit message Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:44:04 -04:00
Boyuan Zhang	ecfa23c8df	drm/amdgpu/vcn: identify unified queue in sw init Determine whether VCN using unified queue in sw_init, instead of calling functions later on. v2: fix coding style Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:43:48 -04:00
Aurabindo Pillai	28814be882	drm/amd: Bump KMS_DRIVER_MINOR version Increase the KMS minor version to indicate GFX12 DCC support since this contains a major change in how DCC is managed across IPs like GFX, DCN etc. This will be used mainly by userspace like Mesa to figure out DCC support on GFX12 hardware. v2: fix version number (Alex) Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-15 17:41:23 -04:00
Alex Deucher	1cff1010be	drm/amdgpu/mes12: add missing opcode string Fixes the indexing of the string array. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-12 11:46:46 -04:00
Alex Deucher	478cb8badf	drm/amdgpu/mes11: update opcode strings Add new packet. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-12 11:46:46 -04:00
Mario Limonciello	9d8c094dda	drm/amd: Add power_saving_policy drm property to eDP connectors When the `power_saving_policy` property is set to bit mask "Require color accuracy" ABM should be disabled immediately and any requests by sysfs to update will return an -EBUSY error. When the `power_saving_policy` property is set to bit mask "Require low latency" PSR should be disabled. When the property is restored to an empty bit mask ABM and PSR can be enabled again. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703051722.328-3-mario.limonciello@amd.com	2024-07-10 17:00:07 -04:00
Alex Deucher	8030f6533e	drm/amdgpu: remove exp hw support check for gfx12 Enable it by default. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 13:45:57 -04:00
YiPeng Chai	e23300dfff	drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed The problem case is as follows: 1. GPU A triggers a gpu ras reset, and GPU A drives GPU B to also perform a gpu ras reset. 2. After gpu B ras reset started, gpu B queried a DE data. Since the DE data was queried in the ras reset thread instead of the page retirement thread, bad page retirement work would not be triggered. Then even if all gpu resets are completed, the bad pages will be cached in RAM until GPU B's bad page retirement work is triggered again and then saved to eeprom. This patch can save the bad pages to eeprom in time after gpu ras reset is completed. v2: 1. Add the above description to code comments. 2. Reuse existing function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:41 -04:00
YiPeng Chai	c04706914d	drm/amdgpu: flush all cached ras bad pages to eeprom Before uninstalling gpu driver, flush all cached ras bad pages to eeprom. v2: Put the same code into a function and reuse the function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:35 -04:00
Sunil Khatri	c39385710c	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX12. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:22 -04:00
Sunil Khatri	a85cc86cce	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX11. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:04 -04:00
Alex Deucher	7d570f56f1	drm/amdgpu/job: Replace DRM_INFO/ERROR logging Use the dev_info/err variants so we get per device logging in multi-GPU cases. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:55 -04:00
Sunil Khatri	948f2828a6	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX10. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:48 -04:00
Lijo Lazar	d02ddefc7e	drm/amdgpu: Initialize VF partition mode For SOCs with GFX v9.4.3, a VF may have multiple compute partitions. Fetch the partition information during init and initialize partition nodes. There is no support to switch partition mode in VF mode, hence disable the same. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:28 -04:00
Gavin Wan	5d64af40e3	drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping. sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For Even numbered vfs, the sdma2 & sdma3 (irq srouce id CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:19 -04:00
Thomas Hellström	4c44f89c5d	drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves To address the problem with hitches moving when bulk move sublists are lru-bumped, register the list cursors with the ttm_lru_bulk_move structure when traversing its list, and when lru-bumping the list, move the cursor hitch to the tail. This also means it's mandatory for drivers to call ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when initializing and finalizing the bulk move structure, so add those calls to the amdgpu- and xe driver. Compared to v1 this is slightly more code but less fragile and hopefully easier to understand. Changes in previous series: - Completely rework the functionality - Avoid a NULL pointer dereference assigning manager->mem_type - Remove some leftover code causing build problems v2: - For hitch bulk tail moves, store the mem_type in the cursor instead of with the manager. v3: - Remove leftover mem_type member from change in v2. v6: - Add some lockdep asserts (Matthew Brost) - Avoid NULL pointer dereference (Matthew Brost) - No need to check bo->resource before dereferencing bo->bulk_move (Matthew Brost) Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240705153206.68526-5-thomas.hellstrom@linux.intel.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-07-09 12:39:33 +02:00
Zhigang Luo	ee98fb71ba	drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to avoid reading wrong WPTR from doorbell in sriov vf, set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:56:27 -04:00
Yang Wang	59f488be76	drm/amdgpu: add ras event state device attribute support add amdgpu ras 'event_state' sysfs device attribute support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:56:13 -04:00
Yang Wang	12b435a40c	drm/amdgpu: add ras POSION_CONSUMPTION event id support add amdgpu ras POSION_CONSUMPTION event id support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:37 -04:00
Yang Wang	5b9de2596f	drm/amdgpu: add ras POSION_CREATION event id support add amdgpu ras POSION_CREATION event id support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:18 -04:00
Yang Wang	75ac6a2506	drm/amdgpu: refine amdgpu ras event id core code v1: - use unified event id to manage ras events - add a new function amdgpu_ras_query_error_status_with_event() to accept event type as parameter. v2: add a warn log to show the location of function failure when calling amdgpu_ras_mark_event(). (Tao Zhou) v3: change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL. v4: rename amdgpu_ras_get_recovery_event() to amdgpu_ras_get_fatal_error_event(). Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:11 -04:00
Christian König	320debca1b	drm/amdgpu: reject gang submit on reserved VMIDs A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar	b6ad109166	drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2 DPG mode is enabled for vcn and jpeg on VCN v4_0_5 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:50:40 -04:00
Yang Wang	332210c13a	drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG remove redundant semicolons in RAS_EVENT_LOG to avoid code format check warning. Fixes: `b712d7c201` ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:43 -04:00
Frank Min	54837bd2be	drm/amdgpu: restore dcc bo tilling configs while moving While moving buffer which has dcc tiling config, it is needed to restore its original dcc tiling. 1. extend copy flag to cover tiling bits 2. add logic to restore original dcc tiling config Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:27 -04:00
Sunil Khatri	c8714ac982	drm/amdgpu: add gfx queue support for gfx12 ipdump Add support of all the CP GFX queues for gfx12 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:22 -04:00
Sunil Khatri	495e6173a4	drm/amdgpu: add cp queue registers for gfx12 ipdump Add gfx12 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:15 -04:00
Sunil Khatri	f0c6b79bfc	drm/amdgpu: enable redirection of irq's for IH v7.0 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:09 -04:00
Sunil Khatri	906219ec94	drm:amdgpu: enable IH ring1 for IH v7.0 We need IH ring1 for handling the pagefault interrupts which over flow in default ring for specific usecases. Enable ring1 allows software to redirect high interrupts to ring1 from default IH ring. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:02 -04:00
Yifan Zha	33f23fc315	drm/amdgpu: Set no_hw_access when VF request full GPU fails [Why] If VF request full GPU access and the request failed, the VF driver can get stuck accessing registers for an extended period during the unload of KMS. [How] Set no_hw_access flag when VF request for full GPU access fails This prevents further hardware access attempts, avoiding the prolonged stuck state. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:56 -04:00
Sunil Khatri	2262acad0a	drm/amdgpu: add print support for gfx12 ipdump Add support of gfx12 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:49 -04:00
Sunil Khatri	fbbbb62112	drm/amdgpu: add gfx12 register support in ipdump Add general registers of gfx12 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:43 -04:00
Frank Min	ffcc5745ed	drm/amdgpu: update gfxhub client id for gfx12 update gfxhub client id for gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:36 -04:00
Tim Huang	064d92436b	drm/amd/pm: avoid to load smu firmware for APUs Certain call paths still load the SMU firmware for APUs, which needs to be skipped. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:30 -04:00
YiPeng Chai	78347b651a	drm/amdgpu: sysfs node disable query error count during gpu reset Sysfs node disable query error count during gpu reset. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:14 -04:00
Daniel Vetter	6be146cf57	amd-drm-next-6.11-2024-07-03: amdgpu: - Use vmalloc for dc_state - Replay fixes - Freesync fixes - DCN 4.0.1 fixes - DML fixes - DCC updates - Misc code cleanups and bug fixes - 8K display fixes - DCN 3.5 fixes - Restructure DIO code - DML1 fixes - DML2 fixes - GFX11 fix - GFX12 updates - GFX12 modifiers fixes - RAS fixes - IP dump fixes - Add some updated IP version checks _ Silence UBSAN warning radeon: - GPUVM fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZoW9RgAKCRC93/aFa7yZ 2AfjAQDJH95ijhdClk4S7t0ofjam7UPhVkJiUl8u73BNbuul/QEAk7RadnWw8SwC wvSXMCyHY3XszMJ1AdWG+4uwq5axPQE= =aWkn -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-07-03: amdgpu: - Use vmalloc for dc_state - Replay fixes - Freesync fixes - DCN 4.0.1 fixes - DML fixes - DCC updates - Misc code cleanups and bug fixes - 8K display fixes - DCN 3.5 fixes - Restructure DIO code - DML1 fixes - DML2 fixes - GFX11 fix - GFX12 updates - GFX12 modifiers fixes - RAS fixes - IP dump fixes - Add some updated IP version checks _ Silence UBSAN warning radeon: - GPUVM fix Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703211314.2041893-1-alexander.deucher@amd.com	2024-07-05 12:02:12 +02:00
Daniel Vetter	71e9f407fd	amd-drm-next-6.11-2024-06-28: amdgpu: - JPEG 5.x fixes - More FW loading cleanups - Misc code cleanups - GC 12.x fixes - ASPM fix - DCN 4.0.1 updates - SR-IOV fixes - HDCP fix - USB4 fixes - Silence UBSAN warnings - MES submission fixes - Update documentation for new products - DCC updates - Initial ISP 4.x plumbing - RAS fixes - Misc small fixes amdkfd: - Fix missing unlock in error path for adding queues -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZn8qfAAKCRC93/aFa7yZ 2H9QAQCZZsu8vps+HVUY5vihNXNL9TdUbrrroQBEyk+DsJzE9QD8CJW8/oiP8/fa /EsAWXNZUH+mPxRaVWTxp0j2P941pww= =uZbG -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-28: amdgpu: - JPEG 5.x fixes - More FW loading cleanups - Misc code cleanups - GC 12.x fixes - ASPM fix - DCN 4.0.1 updates - SR-IOV fixes - HDCP fix - USB4 fixes - Silence UBSAN warnings - MES submission fixes - Update documentation for new products - DCC updates - Initial ISP 4.x plumbing - RAS fixes - Misc small fixes amdkfd: - Fix missing unlock in error path for adding queues Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240628213135.427214-1-alexander.deucher@amd.com	2024-07-05 11:39:23 +02:00
Daniel Vetter	86634fa4e6	Linux 6.10-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg O6fPUV8= =pgUx -----END PGP SIGNATURE----- Merge v6.10-rc6 into drm-next The exynos-next pull is based on a newer -rc than drm-next. hence backmerge first to make sure the unrelated conflicts we accumulated don't end up randomly in the exynos merge pull, but are separated out. Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma code, and cherry-pick conflict in xe. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-07-05 10:47:28 +02:00
Sunil Khatri	ea67deb03c	drm/amdgpu: fix out of bounds access in gfx11 during ip dump During ip dump in gfx11 the index variable is reused but is not reinitialized to 0 and this causes the index calculation to be wrong and access out of bound access. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:16 -04:00
Tim Huang	94845ea057	drm/amdgpu: add firmware for PSP IP v14.0.4 This patch is to add firmware for PSP 14.0.4. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:10 -04:00
Tim Huang	b709f949f0	drm/amdgpu: enable mode2 reset for SMU IP v14.0.4 Set the default reset method to mode2 for SMU 14.0.4. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:02 -04:00
Tim Huang	38a16bfe6f	drm/amdgpu: add SMU IP v14.0.4 discovery support This patch is to add SMU 14.0.4 support Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:57 -04:00
Tim Huang	9cd2ad14d8	drm/amdgpu: add PSP IP v14.0.4 discovery support This patch is to add PSP 14.0.4 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:36 -04:00
Tim Huang	c7c3f786b9	drm/amdgpu: add PSP IP v14.0.4 support This patch is to add PSP 14.0.4 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:30 -04:00
Tim Huang	614a9f5ed5	drm/amdgpu: add firmware for VPE IP v6.1.3 This patch is to add firmware for VPE 6.1.3. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:24 -04:00
Tim Huang	ca15cd559f	drm/amdgpu: add VPE IP v6.1.3 discovery support This patch is to add VPE 6.1.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:19 -04:00
Tim Huang	f3e2a425c6	drm/amdgpu: add VPE IP v6.1.3 support This patch is to add VPE 6.1.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:13 -04:00
Tim Huang	410bb279a8	drm/amdgpu: Add NBIO IP v7.11.3 support Enable setting soc21 common clockgating for NBIO 7.11.3. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:06 -04:00
Tim Huang	5aea871694	drm/amdgpu: add NBIO IP v7.11.3 discovery support This patch is to add NBIO 7.11.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:00 -04:00
Tim Huang	6857669a22	drm/amdgpu: add firmware for SDMA IP v6.1.2 This patch is to add firmware for SDMA 6.1.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:53 -04:00
Tim Huang	dfeccf4d54	drm/amdgpu: add SDMA IP v6.1.2 discovery support This patch is to add SDMA 6.1.2 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:41 -04:00
Tim Huang	4448b1ff4d	drm/amdgpu: add firmware for GC IP v11.5.2 This patch is to add firmware for GC 11.5.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:35 -04:00
Tim Huang	23c1ea0241	drm/amdgpu: add GC IP v11.5.2 to GC 11.5.0 family This patch is to add GC 11.5.2 to GC 11.5.0 family. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:23 -04:00
Tim Huang	43e4cc2299	drm/amdgpu: add GC IP v11.5.2 soc21 support Add CG and PG flags for GFX IP v11.5.2 and PG flags for VCN IP v4.0.5. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Li Ma <li.ma@amd.com> Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:15 -04:00
Tim Huang	98392782df	drm/amdgpu: add tmz support for GC IP v11.5.2 Add tmz support for GC 11.5.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:09 -04:00
Tim Huang	02cf3ed627	drm/amdgpu: add GFXHUB IP v11.5.2 support This patch is to add GFXHUB 11.5.2 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:03 -04:00
Tim Huang	b6a343df46	drm/amdgpu: initialize GC IP v11.5.2 Initialize GC 11.5.2 and set gfx hw configuration. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:04:57 -04:00
Sunil Khatri	425c4a6f8b	drm/amdgpu: fix out of bounds access in gfx10 during ip dump During ip dump in gfx10 the index variable is reused but is not reinitialized to 0 and this causes the index calculation to be wrong and access out of bound access. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:04:49 -04:00
Marek Olšák	f340f2bad1	drm/amdgpu: rewrite convert_tiling_flags_to_modifier_gfx12 There were multiple bugs, like checking SWIZZLE_MODE before checking GFX12_SWIZZLE_MODE, which has undefined behavior. The function had no effect before (it always returned -EINVAL). Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Hawking Zhang	d3dbccacfd	drm/amdgpu: Fix hbm stack id in boot error report To align with firmware, hbm id field 0x1 refers to hbm stack 0, 0x2 refers to hbm statck 1. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	0d3157d04d	drm/amdgpu: add amdgpu_framebuffer::gfx12_dcc amdgpu_framebuffer doesn't have tiling_flags, so we need this. amdgpu_display_get_fb_info never gets NULL parameters, so checking for NULL was useless. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	8dd1426e2c	drm/amdgpu: handle gfx12 in amdgpu_display_verify_sizes It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Hawking Zhang	c2fad73174	drm/amdgpu: Correct register used to clear fault status Driver should write to fault_cntl registers to do one-shot address/status clear. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	fd536d2e12	drm/amdgpu: don't use amdgpu_lookup_format_info on gfx12 It only uses fields for GFX9-11 related to the separate DCC buffer, which doesn't exist in GFX12. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	30fb9cad6f	drm/amdgpu/gfx12: remove GDS leftovers GDS doesn't exist in gfx12. The incomplete packet allows userspace to hang the hw from the kernel. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	e5f6bfe402	drm/amdgpu/gfx12: remove superfluous cache flags If any INV flags are needed, they should be executed via ACQUIRE_MEM before INDIRECT_BUFFER. GLM flags are also removed because the hw ignores them. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	b16ec6300f	drm/amdgpu/gfx11: remove superfluous cache flags If any INV flags are needed, they should be executed via ACQUIRE_MEM before INDIRECT_BUFFER. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	11317d2963	drm/amdgpu: check for LINEAR_ALIGNED correctly in check_tiling_flags_gfx6 Fix incorrect check. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:36 -04:00
Mario Limonciello	15eb8573ad	drm/amd: Don't initialize ISP hardware without FW Although designs may contain an ISP IP block, the camera might be a USB camera. Because of this the ISP firmware is considered optional from amdgpu. However if the firmware doesn't get loaded the hardware should not be initialized. Adjust the return code for early init to ensure the IP block doesn't go through the other init and fini sequences. Also decrease the message about firmware load failure to debug so it's not as alarming to users. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Yang Wang	71fe449484	drm/amdgpu: refine isp firmware loading refine isp firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	75be61aa77	drm/amd/amdgpu: Enable MMHUB prefetch for ISP v4.1.0 and 4.1.1 Remove temporary WA to disable ISP prefetch as MMHUB SAW is initialized to support ISP HW access GART memory using the TLSi path with prefetch enabled. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	7c2d3112b2	drm/amd/amdgpu: Fix 'snprintf' output truncation warning snprintf can truncate the output fw filename if the isp ucode_prefix exceeds 29 characters. Knowing ISP ucode_prefix is in the format isp_x_x_x, limiting the size of ucode_prefix[] to 10 characters to fix the warning. Fixes the below warning: drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c: In function 'isp_early_init': drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:58: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=] 192 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ In function 'isp_load_fw_by_psp', inlined from 'isp_early_init' at drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:218:8: drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:9: note: 'snprintf' output between 12 and 41 bytes into a destination of size 40 192 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	062666ffbc	drm/amd/amdgpu: Disable MMHUB prefetch for ISP v4.1.1 Disable MMHUB prefetch for ISP v4.1.1 as a temporary WA until the GART supports MMHUB TLSi and SAW for ISP HW to access GART memory using the TLSi path. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	05bafe95e5	drm/amd/amdgpu: Add ISP4.1.0 and ISP4.1.1 modules Add independent IP centric modules for ISP4.1.0 and ISP4.1.1 hw blocks. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	0253d718a0	drm/amd/amdgpu: Map ISP interrupts as generic IRQs Map ISP IH interrupts to Linux generic IRQ for ISP driver to handle the interrupts using MFD IORESOURCE_IRQ resource. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Alex Deucher	8930b90be6	drm/amdgpu: fix Kconfig for ISP v2 Add new config option and set proper dependencies for ISP. v2: add missed guards, drop separate Kconfig Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Pratap Nirujogi <pratap.nirujogi@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	d232584ae3	drm/amd/amdgpu: Enable ISP in amdgpu_discovery Enable ISP for ISP V4.1.0 and V4.1.1 in amdgpu_discovery. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Pratap Nirujogi	8fcbfd53ea	drm/amd/amdgpu: Add ISP driver support Add the isp driver in amdgpu to support ISP device on the APUs that supports ISP IP block. ISP hw block is used for camera front-end, pre and post processing operations. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Pratap Nirujogi	772e4d56da	drm/amd/amdgpu: Add ISP support to amdgpu_discovery ISP hw block is supported in some of the AMD GPU versions, add support to discover ISP IP in amdgpu_discovery. v2: squash in documentation update (Alex) Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Sonny Jiang	d4b8386c86	drm/amdgpu/jpeg5: Add support for DPG mode Add DPG support for JPEG 5.0 Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:33 -04:00
Frank Min	291af3f598	drm/amdgpu: tolerate allocating GTT bo with dcc flag Do not return failure for allocating GTT bo with dcc flag on gfx12. This will improve compatibility for UMD. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:17 -04:00
Jane Jian	b17eecc08f	drm/amdgpu: normalize registers as local xcc to read/write in gfx_v9_4_3 [WHY] sriov has the higher bit violation when flushing tlb [HOW] normalize the registers to keep lower 16-bit(dword aligned) to aviod higher bit violation RLCG will mask xcd out and always assume it's accessing its own xcd v2 add check in wait mem that only do the normalization on regspace Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Yiqing Yao <YiQing.Yao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:33:27 -04:00
YiPeng Chai	f852c9795c	drm/amdgpu: add gpu reset check and exception handling Add gpu reset check and exception handling for page retirement. v2: Clear poison consumption messages cached in fifo after non mode-1 reset. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:32:12 -04:00
YiPeng Chai	e278849cb2	drm/amdgpu: refine poison consumption interrupt handler 1. The poison fifo is only used for poison consumption requests. 2. Merge reset requests when poison fifo caches multiple poison consumption messages Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:32:06 -04:00
YiPeng Chai	5f08275cfd	drm/amdgpu: refine poison creation interrupt handler In order to apply to the case where a large number of ras poison interrupts: 1. Change to use variable to record poison creation requests to avoid fifo full. 2. Prioritize handling poison creation requests instead of following the order of requests received by the driver. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:43 -04:00
Vignesh Chander	cbda2758d8	drm/amdgpu: process RAS fatal error MB notification For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset completion message first to avoid accidentally spamming multiple reset requests to host. v2: add another mailbox check for handling case where kfd detects timeout first v3: set host_flr bit and use wait_for_reset Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:37 -04:00
YiPeng Chai	78146c1dcd	drm/amdgpu: add variable to record the deferred error number read by driver Add variable to record the deferred error number read by driver. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:20 -04:00
Vignesh Chander	29b6985de5	drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter So we can get clearer per device logging. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:30:39 -04:00
Danijel Slivka	afbf7955ff	drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:30:27 -04:00
Kenneth Feng	bf826ba9b4	Revert "drm/amd/amdgpu: add module parameter for jpeg" This reverts commit `d3620eeae8`. Revert this due to a final solution: commit `ed3165d660` ("drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback") Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:40 -04:00
Lijo Lazar	fe86c4d1a2	drm/amdgpu: Don't show false warning for reg list If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Julia Zhang	79ea35c7d8	drm/amdgpu: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com> Signed-off-by: Julia Zhang <Julia.Zhang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Hawking Zhang	bdbdc7cecd	drm/amdgpu: Fix smatch static checker warning adev->gfx.imu.funcs could be NULL Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Bob Zhou	9ff2e14cf0	drm/amdgpu: add missing error handling in function amdgpu_gmc_flush_gpu_tlb_pasid Fix the unchecked return value warning reported by Coverity, so add error handling. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Hawking Zhang	9da0f77367	drm/amdgpu: Fix register access violation fault_status is read only register. fault_cntl is not accessible from guest environment. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Frank Min	748bd8ebae	drm/amdgpu: access ltr through pci cfg space Access ltr through pci cfg space instead of mmio while programing aspm on gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Yang Wang	6bab222b8b	drm/amdgpu: refine gfx12 firmware loading refine gfx12 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Frank Min	541fe90ee6	drm/amdgpu: update MTYPE mapping for gfx12 gfx12 only support MTYPE UC and NC, so update it accordingly. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:47 -04:00
Pierre-Eric Pelloux-Prayer	c71c9aafd5	amdgpu: don't dereference a NULL resource in sysfs code dma_resv_trylock being successful doesn't guarantee that bo->tbo.base.resv is not NULL, so check its validity before using it. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Lijo Lazar	e779af8e8b	drm/amdgpu: Fix pci state save during mode-1 reset Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: `5c03e5843e` ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Yang Wang	dab70d9f65	drm/amdgpu: refine gfx11 firmware loading refine gfx11 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Sonny Jiang	ed3165d660	drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback Doorbell needs to be configured after power up during each playback Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:41 -04:00
Alex Deucher	83edf00d89	drm/amdgpu/atomfirmware: fix parsing of vram_info v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:41 -04:00
Dave Airlie	365aa9f573	amd-drm-next-6.11-2024-06-22: amdgpu: - HPD fixes - PSR fixes - DCC updates - DCN 4.0.1 fixes - FAMS fixes - Misc code cleanups - SR-IOV fixes - GPUVM TLB flush cleanups - Make VCN less verbose - ACPI backlight fixes - MES fixes - Firmware loading cleanups - Replay fixes - LTTPR fixes - Trap handler fixes - Cursor and overlay fixes - Primary plane zpos fixes - DML 2.1 fixes - RAS updates - USB4 fixes - MALL fixes - Reserved VMID fix - Silence UBSAN warnings amdkfd: - Misc code cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZnbsMgAKCRC93/aFa7yZ 2GTUAQDGl46/pxpFRmZ8n3Hy6OCnmvoFqI9Re1uAc2RbQNNOAgEAi72PwS2iquU/ 69ectl+oi8P/yNMxP4rO1KgOP3AMsg8= =bvKZ -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-22: amdgpu: - HPD fixes - PSR fixes - DCC updates - DCN 4.0.1 fixes - FAMS fixes - Misc code cleanups - SR-IOV fixes - GPUVM TLB flush cleanups - Make VCN less verbose - ACPI backlight fixes - MES fixes - Firmware loading cleanups - Replay fixes - LTTPR fixes - Trap handler fixes - Cursor and overlay fixes - Primary plane zpos fixes - DML 2.1 fixes - RAS updates - USB4 fixes - MALL fixes - Reserved VMID fix - Silence UBSAN warnings amdkfd: - Misc code cleanups From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240622152523.2267072-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-06-27 17:18:49 +10:00
Lijo Lazar	48880f9686	drm/amdgpu: Don't show false warning for reg list If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-25 14:22:56 -04:00
Julia Zhang	bcfa48ff78	drm/amdgpu: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com> Signed-off-by: Julia Zhang <Julia.Zhang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-06-25 14:22:08 -04:00
Lijo Lazar	74fa02c4a5	drm/amdgpu: Fix pci state save during mode-1 reset Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: `5c03e5843e` ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-25 14:13:12 -04:00
Alex Deucher	f6f49dda49	drm/amdgpu/atomfirmware: fix parsing of vram_info v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-06-25 14:11:46 -04:00
Dave Airlie	f680df51ca	drm-misc-next for 6.11: UAPI Changes: - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION Core Changes: - connector: Create a set of helpers to help with HDMI support - fbdev: Create memory manager optimized fbdev emulation - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer Driver Changes: - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - ivpu: hardware scheduler support, profiling support, improvements to the platform support layer - mgag200: general reworks and improvements - nouveau: Add NVreg_RegistryDwords command line option - rockchip: Conversion to the hdmi helpers - sun4i: Conversion to the hdmi helpers - vc4: Conversion to the hdmi helpers - v3d: Perf counters improvements - zynqmp: IRQ and debugfs improvements - bridge: - Remove redundant checks on bridge->encoder - panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41 -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZlhUKAAKCRAnX84Zoj2+ dgHoAYDTpShgXFXnlnMtqZr+ZuShcjcwiqzwM4qNWdtyji9MONtJJU3ZQnGlnXbI ZU+oZP0Bf0PyT0/8bf+rmZBJ1UdAxt2IQaLkP1tTHOad4E+KlcL5n1opzMi160mB EZSm9f7aNw== =bZPt -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.11: UAPI Changes: - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION Core Changes: - connector: Create a set of helpers to help with HDMI support - fbdev: Create memory manager optimized fbdev emulation - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer Driver Changes: - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - ivpu: hardware scheduler support, profiling support, improvements to the platform support layer - mgag200: general reworks and improvements - nouveau: Add NVreg_RegistryDwords command line option - rockchip: Conversion to the hdmi helpers - sun4i: Conversion to the hdmi helpers - vc4: Conversion to the hdmi helpers - v3d: Perf counters improvements - zynqmp: IRQ and debugfs improvements - bridge: - Remove redundant checks on bridge->encoder - panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat	2024-06-21 10:30:31 +10:00
Likun Gao	ed5a4484f0	drm/amdgpu: init TA fw for psp v14 Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 18:25:58 -04:00
Christian König	e356d321d0	drm/amdgpu: cleanup MES11 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `eef016ba89` ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 18:25:58 -04:00
Christian König	8bd82363e2	drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2 This reverts commit `b8c415e3bf` ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit `425285d39a` ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-19 14:17:25 -04:00
Harish Kasiviswanathan	49c9ffabde	drm/amdgpu: Indicate CU havest info to CP To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 14:17:25 -04:00
Yunxiang Li	84801d4f1e	drm/amdgpu: fix locking scope when flushing tlb Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-19 14:17:25 -04:00
Likun Gao	1ecef55893	drm/amdgpu: init TA fw for psp v14 Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:52:43 -04:00
Yang Wang	017d0b67bf	drm/amdgpu: refine gfx6 firmware loading refine gfx6 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:52:36 -04:00
Yang Wang	a4fcb5f733	Revert "drm/amdgpu: change aca bank error lock type to spinlock" This reverts commit `f6bce954f4`. Revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <yipeng.chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:51:23 -04:00
Yang Wang	8c9ee18019	Revert "drm/amdgpu: change bank cache lock type to spinlock" This reverts commit `258ed689bc` revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670189] ? __schedule+0x37d/0xb30 [ 602.670191] process_one_work+0x176/0x350 [ 602.670194] worker_thread+0x2f7/0x420 [ 602.670197] ? Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:50:31 -04:00
Alex Deucher	19797687e6	drm/amdgpu: remove amdgpu_mes_fence_wait_polling() No longer used so remove it. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:50:24 -04:00
Alex Deucher	fffe347e14	drm/amdgpu: cleanup MES12 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `ade887c633` ("drm/amdgpu/mes12: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:59 -04:00
Yang Wang	3af2c80ae2	drm/amdgpu: refine gfx10 firmware loading refine gfx10 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:53 -04:00
Yang Wang	23fc94795b	drm/amdgpu: refine gfx9 firmware loading refine gfx9 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:28 -04:00
Christian König	de32462541	drm/amdgpu: cleanup MES11 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `eef016ba89` ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:48:25 -04:00
Christian König	a6328c9c3d	drm/amdgpu: fix using the reserved VMID with gang submit We need to ensure that even when using a reserved VMID that the gang members can still run in parallel. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:48:00 -04:00
Victor Lu	b32563859d	drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOV SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked for VF access. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:47:52 -04:00
Mukul Joshi	4d14a74054	Revert "drm/amdgpu: Add missing locking for MES API calls" This reverts commit `3612702852`. This is causing a BUG message during suspend. [ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14 [ 61.603553] preempt_count: 1, expected: 0 [ 61.603555] RCU nest depth: 0, expected: 0 [ 61.603557] Preemption disabled at: [ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7 [ 61.603795] Workqueue: events_unbound async_run_entry_fn [ 61.603801] Call Trace: [ 61.603803] <TASK> [ 61.603806] dump_stack_lvl+0x37/0x50 [ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.604007] dump_stack+0x10/0x20 [ 61.604010] __might_resched+0x16f/0x1d0 [ 61.604016] __might_sleep+0x43/0x70 [ 61.604020] mutex_lock+0x1f/0x60 [ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu] [ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu] [ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5 [ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu] [ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu] [ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu] [ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu] Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:46:39 -04:00
Yang Wang	3a3be8bb97	drm/amdgpu: refine gfx8 firmware loading refine gfx8 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:27 -04:00
Tao Zhou	09a3d8202d	drm/amdgpu: set RAS fed status for more cases Indicate fatal error for each RAS block and NBIO. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Tao Zhou	7e4371676e	drm/amdgpu: create amdgpu_ras_in_recovery to simplify code Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Tao Zhou	5f7697bbc1	drm/amdgpu: trigger mode1 reset for RAS RMA status Check RMA status in bad page retirement flow. v2: fix coding bugs in v1. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Yang Wang	9d26e0cfc2	drm/amdgpu: refine gfx7 firmware loading refine gfx7 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:15 -04:00
Christian König	030631e97b	drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2 This reverts commit `b8c415e3bf` ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit `425285d39a` ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-14 16:17:13 -04:00
Yang Wang	c37b8f7868	drm/amdgpu: refine imu firmware loading refine imu firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	cd093c24ee	drm/amdgpu: refine gmc firmware loading refine gmc firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	8d7ff60f36	drm/amdgpu: refine vpe firmware loading refine vpe firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	b441e9ac9d	drm/amdgpu: refine vcn firmware loading refine vcn firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	9817f06173	drm/amdgpu: move aca/mca init functions into ras_init() stage adjust the function position to better match aca/mca fini code in ras_fini(). Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Bob Zhou	be6a69b21a	drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating() To fix potential overflowed constant warning, modify the variables to u32 for getting the return value of RREG32_SOC15(). Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Harish Kasiviswanathan	199d69d5f9	drm/amdgpu: Indicate CU havest info to CP To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Lijo Lazar	3a86fdc422	drm/amdgpu: Skip coredump during resets for debug Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	3618fa26c8	drm/amdgpu: refine sdma firmware loading refine sdma firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	8cae4b578e	drm/amdgpu: refine psp firmware loading refine psp firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Yang Wang	bf349b036d	drm/amdgpu: refine mes firmware loading v1: refine mes firmware loading v2: use dev_info instead of DRM_INFO Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Mukul Joshi	3612702852	drm/amdgpu: Add missing locking for MES API calls Add missing locking at a few places when calling MES APIs to ensure exclusive access to MES queue. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Mario Limonciello	2fe87f54ab	drm/amd/display: Set default brightness according to ACPI Currently, amdgpu will always set up the brightness at 100% when it loads. However this is jarring when the BIOS has it previously programmed to a much lower value. The ACPI ATIF method includes two members for "ac_level" and "dc_level". These represent the default values that should be used if the system is brought up in AC and DC respectively. Use these values to set up the default brightness when the backlight device is registered. v2: squash in ACPI fix Reviewed-by: Leo Li <sunpeng.li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
David (Ming Qiang) Wu	ee3942d9ab	drm/amdgpu: drop some kernel messages in VCN code Similar to commit `813e7d4cd0` where some kernel log messages are dropped. With this commit, more log messages in older version of VCN/JPEG code are dropped. Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:16:51 -04:00
Yang Wang	a777c9d70a	drm/amdgpu: refine gpu_info firmware loading refine gpu_info firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yang Wang	1bfe5e7746	drm/amdgpu: enhance amdgpu_ucode_request() function flexibility v1: Adding formatting string feature to improve function flexibility. v2: modify macro name to ADMGPU_UCODE_MAX_NAME. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Bob Zhou	37f432481d	drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15() To fix potential overflowed constant warning reported by Coverity, modify the variables to uint32_t. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	18f2525d31	drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb We need to take the reset domain lock before flush hdp. We can't put the lock inside amdgpu_device_flush_hdp itself because it is used during reset where we already take the write side lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	9c33e5fd4f	drm/amdgpu: fix locking scope when flushing tlb Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-14 16:15:59 -04:00
Yunxiang Li	ba531117a8	drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable Here since we are in reset and takes the reset_domain write side lock already. We can't use the flush tlb helper which tries to take the read side. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	c1f9d82b92	drm/amdgpu: use helper in amdgpu_gart_unbind When amdgpu_gart_invalidate_tlb helper is introduced this part was left out of the conversion. Avoid the code duplication here. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	4b0e76e4c1	drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover At this point the gart is not set up, there's no point to invalidate tlb here and it could even be harmful. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Yunxiang Li	5c0a1cdd17	drm/amdgpu: fix sriov host flr handler We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen. In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Yunxiang Li	b3948ad1ac	drm/amdgpu: add skip_hw_access checks for sriov Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Eric Huang	bac640ddb5	drm/amdgpu: add reset source in various cases To fullfill the reset event description. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Eric Huang	7bed1df814	drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Jesse Zhang	27b500b77b	drm/amdgpu: remove dead code in atom_get_src_int Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7. In the case of ATOM_ARG_IMM, the code cannot reach the default case. So there is no need for "break". Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:34:10 -04:00
Frank Min	faa64f633c	drm/amdgpu: add sdma 7.0 support for copy dcc buffer 1. Add dcc buffer flag for copy buffer 2. Add sdma 7.0 support copy dcc buffer Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:22:14 -04:00
Likun Gao	7c85e97083	drm/amdgpu: support for DCC feature Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit when needed. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:21:51 -04:00
Alex Deucher	6b83b94a94	drm/amdgpu: add additional VM bits Add additional VM PTE bits. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:20:56 -04:00
Maxime Ripard	14731a640e	Merge drm/drm-fixes into drm-misc-fixes Roll -rc3 and current drm/fixes in. This will also unstuck our for-next branch. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-06-14 09:55:46 +02:00
Dave Airlie	1ddaaa2440	amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: `08c760622b` `944fe1c141` - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: `f7b4a26991` `26e8530d05` `1d48f2a1ab` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZmNlVQAKCRC93/aFa7yZ 2MLSAP9lflyG//+wC9WWX5OjvqFWO6qkhVm1w55xy6TwN9NkqQEA76TqmcNZ6rk1 4o9RaYpMJQU275FvK1NvwUbl4PPQYAs= =cFxt -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: `08c760622b` `944fe1c141` - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: `f7b4a26991` `26e8530d05` `1d48f2a1ab` Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com	2024-06-11 14:01:55 +10:00
Arunpravin Paneer Selvam	31849bf07e	drm/amdgpu: Fix the BO release clear memory warning This happens when the amdgpu_bo_release_notify running before amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled. check the buffer funcs enablement before calling the fill buffer memory. v2:(Christian) - Apply it only for GEM buffers and since GEM buffers are only allocated/freed while the driver is loaded we never run into the issue to clear with buffer funcs disabled. v3:(Mario) - drop the stable tag as this will presumably go into a -fixes PR for 6.10 Log snip: ERROR Trying to clear memory with ring turned off. RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu] Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Richard Gong <richard.gong@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240610180401.9540-1-Arunpravin.PaneerSelvam@amd.com	2024-06-10 23:46:33 +05:30
Tao Zhou	b95fa494d6	drm/amdgpu: add RAS is_rma flag Set the flag to true if bad page number reaches threshold. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Lin.Cao	c8ad1bbbc2	drm/amdgpu: fix failure mapping legacy queue when FLR Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fini will not be called when function level reset, which will cause mes hw init be skipped during FLR, which will leads to mapping legacy queue fail. Set this flag as false when post reset will fix this issue. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Eric Huang	dbe2c4c8ab	drm/amdkfd: add reset cause in gpu pre-reset smi event reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Frank Min	978f5428c9	drm/amdgpu: Set PTE_IS_PTE bit for gfx12 Set PTE_IS_PTE bit while PRT is enabled on gfx12. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Eric Huang	2656e1ce78	drm/amdgpu: add reset sources in gpu reset context reset source or reset cause is very useful info for reset context, it will be used by events API. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:13 -04:00
Shane Xiao	45bd39fb3b	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10 This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10, clear the bits before setting the new one. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:13 -04:00
Alex Deucher	a9ebd10482	Revert "drm/amdgpu/gfx11: enable gfx pipe1 hardware support" This reverts commit `6670142d25`. Pierre-Eric reported problems with this on his navi33. Revert for now until we understand what is going wrong. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Pierre-eric.Pelloux-prayer@amd.com	2024-06-05 11:03:45 -04:00
Shane Xiao	3928290102	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_NV10 This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10, clear the bits before setting the new one. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:43 -04:00
Yifan Zhang	301dfbfc84	drm/amdgpu: disable lane0 L1TLB and enable lane1 L1TLB This patch to disable lane0 L1TLB and enable lane1 L1TLB. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:37 -04:00
Yifan Zhang	b7e2170b87	drm/amdgpu: init SAW registers for mmhub v3.3 This patch to configure mmhub3.3 SAW registers Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:30 -04:00
Sunil Khatri	a1a049bd59	drm/amdgpu: fix comments and error message for ipdump Fix comments and error messages to rightly represent the information. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:09 -04:00
Sunil Khatri	33837d62a4	drm/amdgpu: rename ip_dump_cp_queues to compute queues Rename the variable ip_dump_cp_queues to ip_dump_compute_queue as it represent compute queues. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:03 -04:00
Sunil Khatri	34b8d94b6c	drm/amdgpu: add cp queue registers for gfx9 ipdump Add gfx9 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:57 -04:00
Sunil Khatri	173ef9182a	drm/amdgpu: add print support for gfx9 ipdump Add support of gfx9 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:50 -04:00
Sunil Khatri	514dc965b2	drm/amdgpu: add gfx9 register support in ipdump Add general registers of gfx9 in ipdump for devcoredump support. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:28 -04:00
Srinivasan Shanmugam	745f7170db	drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ring This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring function triggered by the snprintf function expecting unsigned char arguments due to the '%hhu' format specifier, but receiving int and u32 arguments. The issue occurred because the arguments xcc_id, ring->me, ring->pipe, and ring->queue were of type int and u32, not unsigned char. This led to a type mismatch when these arguments were passed to snprintf. To resolve this, the snprintf line was modified to cast these arguments to unsigned char. This ensures that the arguments are of the correct type for the '%hhu' format specifier and resolves the warning. Fixes the below: >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format >> specifies type 'unsigned char' but the argument has type 'int' >> [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~ >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format >> specifies type 'unsigned char' but the argument has type 'u32' (aka >> 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~~ 4 warnings generated. Fixes: `0ea5544555` ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:43 -04:00
Jesse Zhang	c09d2eff81	drm/amdgu: fix Unintentional integer overflow for mall size Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned). Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:36 -04:00
Hawking Zhang	a474161e84	drm/amdgpu: Update programming for boot error reporting AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid. The polling sequence is also simplifed according to the latest firmware change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:26 -04:00
Alex Deucher	7d3b9668e6	drm/amdgpu/soc24: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:10 -04:00
Hawking Zhang	473af28d3e	drm/amdgpu: Estimate RAS reservation when report capacity v2 Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:53 -04:00
Tao Zhou	76bec2a031	drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xfer And also make sure the value of msg[1].len should be in the range of u16. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:47 -04:00
David (Ming Qiang) Wu	813e7d4cd0	drm/amdgpu: drop some kernel messages in VCN code We have messages when the VCN fails to initialize and there is no need to report on success. Also PSP loading FWs is the default for production. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:34 -04:00
Shane Xiao	eba791dc17	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_GFX12 This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12, clear the bits before setting the new one. This fixed the potential issue that GFX12 setting memory to NC. v2: Clear mtype field before setting the new one (Alex) v3: Fix typo (Felix) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:24 -04:00
Dave Airlie	bb61cf46b6	amd-drm-fixes-6.10-2024-05-30: amdgpu: - RAS fix - Fix colorspace property for MST connectors - Fix for PCIe DPM - Silence UBSAN warning - GPUVM robustness fix - Partition fix - Drop deprecated I2C_CLASS_SPD amdkfd: - Revert unused changes for certain 11.0.3 devices - Simplify APU VRAM handling -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZljezAAKCRC93/aFa7yZ 2PZpAP43w/Q7aNgEpzdViO6QckU2GtOaRskavnzew4yxd5Y4xAD/U4vEQ3V7Rc0Q XRv4c+rJAVXIlfcS4guLvHSTyqzAPQY= =WQzB -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.10-2024-05-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.10-2024-05-30: amdgpu: - RAS fix - Fix colorspace property for MST connectors - Fix for PCIe DPM - Silence UBSAN warning - GPUVM robustness fix - Partition fix - Drop deprecated I2C_CLASS_SPD amdkfd: - Revert unused changes for certain 11.0.3 devices - Simplify APU VRAM handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530202316.2246826-1-alexander.deucher@amd.com	2024-05-31 08:38:15 +10:00
Rajneesh Bhardwaj	a9bc5a19e4	drm/amdgpu: Make CPX mode auto default in NPS4 On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:06:48 -04:00
Alex Deucher	1f327dfc84	drm/amdkfd: simplify APU VRAM handling With commit `89773b8559` ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:06:15 -04:00
Jesse Zhang	a0cf36546c	drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:03:20 -04:00
Alex Deucher	ba46b3bda2	drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth() Use current speed/width on devices which don't support dynamic PCIe switching. Fixes: `466a7d1153` ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:01:49 -04:00
Alex Deucher	6670142d25	drm/amdgpu/gfx11: enable gfx pipe1 hardware support Enable gfx pipe1 hardware support. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Rajneesh Bhardwaj	ef5c0f897e	drm/amdgpu: Make CPX mode auto default in NPS4 On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	2e216b1e6b	drm/amdgpu/gfx11: handle priority setup for gfx pipe1 Set up pipe1 as a high priority queue. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	eab57bf22f	drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe Use correct ref/mask for differnent gfx ring pipe. Ported from ZhenGuo's patch for gfx10. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Victor Skvortsov	e864180ee4	drm/amdgpu: Add lock around VF RLCG interface flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	7978c4d414	drm/amdkfd: simplify APU VRAM handling With commit `89773b8559` ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Yang Wang	3c603b1fa8	drm/amdgpu: fix typo in amdgpu_ras_aca_sysfs_read() function fix typo "info.ue_count" in amdgpu_ras_aca_sysfs_read() function. Fixes: `865d339763` ("drm/amdgpu: add aca deferred error type support") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:48 -04:00
Jesse Zhang	511a623fb4	drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:41 -04:00
David (Ming Qiang) Wu	ab47fa8358	drm/amd/amdgpu: add AMD_PG_SUPPORT_VCN_DPG flag AMD_PG_SUPPORT_VCN_DPG is needed for secure parts and should/can be enabled by now. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:35 -04:00
Alex Deucher	f2a1fbdd1f	drm/amdgpu: drop MES 10.1 support v3 It was an enablement vehicle for MES 11 and was never productized. Remove it. v2: drop additional checks in the GFX10 code. v3: drop mes_api_def.h Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:01 -04:00
Alex Deucher	87dfeb47a5	drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth() Use current speed/width on devices which don't support dynamic PCIe switching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:08:44 -04:00
Maxime Ripard	375c4d1583	Merge drm/drm-next into drm-misc-next Let's start the new release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-27 11:08:31 +02:00
Linus Torvalds	56fb6f9285	drm fixes for 6.10-rc1 nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZQ9rAACgkQDHTzWXnE hr7kew//e+OwUJcock4rqpMsXrVwgse4bJ1H7gnHLmN/AQKsXWcaHKixJu+WkkQI 62gyhOEiSH+J4Je6zi5JbIt9AKgIKyZeJqg/QeSMqUFJzYkPKhrhj0aX4AefuNJB dxCI1+UnkNdQ/LM+Dwb4sq757FX0/yPqhzBiytWEJJVHP4Th5SMRFmSs8YsT1Z4F bn220QFvPv+KrVV2eKEf6plJqYhs92nT8cVkNw8x9LhnivLRhyh7we3Ew6R2zouV Izm1nYx1CtTqaiHo3+UOeqRLzhqpmukzmJyML9zk/qt+s+YMj9yI/ie02EnTPqg6 NQCqqmoDy1BpUa37N486Q7MFTiS076HMaQpYiDmVz3PIWWPvKZDqcC5vonoh99ra b88X4J4w3e747xZP7VDsnAOFsaLgcRah4JNtv8H3rcurbz0FrcV/0g/W/kt+Nenc k2m2s2hv56yWXd6B7bd3tfUe5dOwxc+CgiNddM5X3WPenEvnH+DKBOoFojSRl2o7 LpRL7WhY6fceRUqVrHzfIPLYgdfz/Ho9LYST2jGIdKDnwV0Hw0sNfU9C0KOzm0J6 rjgGEeMHgVdZPzSeXSmaO82mBRvR7Ke6da7FIazbocledOapsB5qxjhLvkRyULAp mlGZPRPJdNVM8slNLxEqrT/ugorIo5owtQdEnJmDJXFdozcLTMY= =pexB -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Some fixes for the end of the merge window, mostly amdgpu and panthor, with one nouveau uAPI change that fixes a bad decision we made a few months back. nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work" * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu/atomfirmware: add intergrated info v2.3 table drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms drm/bridge: adv7511: Attach next bridge without creating connector drm/buddy: Fix the warn on's during force merge drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations drm/panthor: Call panthor_sched_post_reset() even if the reset failed drm/panthor: Reset the FW VM to NULL on unplug drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level drm/panthor: Force an immediate reset on unrecoverable faults drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints drm/panthor: Fix an off-by-one in the heap context retrieval logic drm/panthor: Relax the constraints on the tiler chunk size drm/panthor: Make sure the tiler initial/max chunks are consistent drm/panthor: Fix tiler OOM handling to allow incremental rendering drm: xlnx: zynqmp_dpsub: Fix compilation error drm: xlnx: zynqmp_dpsub: Fix few function comments	2024-05-24 17:28:02 -07:00
Hawking Zhang	ec58991054	drm/amdgpu: correct hbm field in boot status hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 18:13:23 -04:00
Sunil Khatri	498906d376	drm/amdgpu: add gfx queue support for gfx11 ipdump Add support of all the CP GFX queues for gfx11 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:14:08 -04:00
Sunil Khatri	368c33ac8a	drm/amdgpu: add cp queue registers for gfx11 ipdump Add gfx11 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:14:00 -04:00
Sunil Khatri	015a04a59e	drm/amdgpu: add print support for gfx11 ipdump Add support of gfx11 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:48 -04:00
Sunil Khatri	b5812822d9	drm/amdgpu: add gfx11 registers support in ipdump Add general registers of gfx11 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:41 -04:00
Sunil Khatri	836bc350a5	drm/amdgpu: add more device info to the devcoredump Adding more device information: a. PCI info b. VRAM and GTT info c. GDC config Also correct the print layout and section information for in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:28 -04:00
Sunil Khatri	29b1fc665c	drm/amdgpu: add prints in IP State dump add prints before and after ip state is dumped. It avoids user to think of system being stuck/hung as dump could take some time after a gpu hang. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:20 -04:00
Sunil Khatri	8444453dce	drm/amdgpu: add gfx queue support of gfx10 in ipdump Add gfx queue register for all instances in devcoredump for gfx10. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:13 -04:00
Sunil Khatri	0f83227bc8	drm/amdgpu: Add cp queues support fro gfx10 in ipdump Add support to dump registers of all instances of cp queue registers of gfx10 to devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:06 -04:00
Sunil Khatri	74feef5667	drm/amdgpu: rename the ip_dump to ip_dump_core Rename the memory pointer from ip_dump to ip_dump_core to make it specific to core registers and rest other registers to be dumped in their respective memories. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:12:58 -04:00
Lijo Lazar	621a4e9efb	drm/amdgpu: Add CRC16 selection in config KFD uses crc16 for gpu_id generation. Fixes: `3ed181b8ff` ("drm/amdkfd: Ensure gpu_id is unique") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405211405.TidTWIBX-lkp@intel.com/ Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:52 -04:00
Victor Zhao	e21e0b7824	drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance. Fix the miss matched code. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:38 -04:00
Jane Jian	f889f9c68b	drm/amdgpu - optimize rlc spm cntl v1 - driver MMIO read the register to check whether write is required - if write is required, sriov full time to use rlcg, otherwise use KIQ v2 - include gfx v11 sriov runtime case Signed-off-by: Jane Jian <Jane.Jian@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:31 -04:00
Hawking Zhang	cf85764e2b	drm/amdgpu: correct hbm field in boot status hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:12 -04:00
Frank Min	8d7b149675	drm/amdgpu: program device_cntl2 through pci cfg space device_cntl2 is accessible from pci config space, so program it through pci cfg space instead of mmio. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:05 -04:00
Li Ma	19f0edd897	drm/amdgpu/atomfirmware: add intergrated info v2.3 table [Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:10:48 -04:00
Srinivasan Shanmugam	0ea5544555	drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring This commit fixes a format truncation issue arosed by the snprintf function potentially writing more characters into the ring->name buffer than it can hold, in the amdgpu_gfx_kiq_init_ring function The issue occurred because the '%d' format specifier could write between 1 and 10 bytes into a region of size between 0 and 8, depending on the values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf function could output between 12 and 41 bytes into a destination of size 16, leading to potential truncation. To resolve this, the snprintf line was modified to use the '%hhu' format specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu' specifier is used for unsigned char variables and ensures that these values are printed as unsigned decimal integers. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=] 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647] 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333 \| xcc_id, ring->me, ring->pipe, ring->queue); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `345a36c4f1` ("drm/amdgpu: prefer snprintf over sprintf") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:10:03 -04:00
Jesse Zhang	806e8c5579	drm/amdgpu: fix invadate operation for pg_flags Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0 regardless of the values of its operands. So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:50 -04:00
Jack Xiao	fa10408116	drm/amdgpu/mes12: mes hw_fini fix for mode1 reset Port mes11 hw_fini to mes12, fix for mode1 reset. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:42 -04:00
Jesse Zhang	64da71ea76	drm/amdgpu: fix invadate operation for umsch Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 >> 16 is 0 regardless of the values of its operands So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:35 -04:00
Jesse Zhang	030ffd4d43	drm/admgpu: fix dereferencing null pointer context When user space sets an invalid ta type, the pointer context will be empty. So it need to check the pointer context before using it Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:15 -04:00
Yang Wang	9262f411dc	drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabled skip to create 'xxx_err_count' node when ACA is enabled. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:08:25 -04:00
Jani Nikula	42505ab120	drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side effect if amdgpu_connector->edid isn't initialized. However, everywhere that the returned EDID is used, the EDID should have been set beforehands. Only the drm EDID code should look at the EDID property, anyway, so stop using it. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan Cc: amd-gfx@lists.freedesktop.org Reviewed-by: Robert Foss <rfoss@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1463862965d76e9458551598fd4d287a08d3d264.1715353572.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-05-23 14:37:24 +03:00
Steven Rostedt (Google)	2c92ca849f	tracing/treewide: Remove second parameter of __assign_str() With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str \| while read a ; do sed -e 's/$__assign_str([^,][^ ,]$ ,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>	2024-05-22 20:14:47 -04:00
Li Ma	e64e8f7c17	drm/amdgpu/atomfirmware: add intergrated info v2.3 table [Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-05-22 12:01:39 -04:00
Linus Torvalds	f0bae243b2	pci-v6.10-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/ R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK CSeqSUK9XmXxFNA= =xjoS -----END PGP SIGNATURE----- Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...	2024-05-21 10:09:28 -07:00
Lang Yu	eb853413d0	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 17:17:53 -04:00
Lang Yu	2a705f3e49	drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 17:17:53 -04:00
Tvrtko Ursulin	9c1a429217	drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc Align kerneldoc with the function argument name. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `26e20235ce` ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Dr. David Alan Gilbert	191ef65b4e	drm/amdgpu: remove unused struct 'hqd_registers' 'hqd_registers' used to be used in a member of the 'bonaire_mqd' struct. 'bonaire_mqd' was removed by commit `486d807cd9` ("drm/amdgpu: remove duplicate definition of cik_mqd") It's now unused. Remove 'hqd_registers' as well. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Tao Zhou	2aadb520bf	drm/amdgpu: update type of buf size to u32 for eeprom functions Avoid overflow issue. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Victor Skvortsov	5434bc03f5	drm/amdgpu: Queue KFD reset workitem in VF FED The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:25 -04:00
Victor Skvortsov	3a19a8af64	drm/amdgpu: Extend KIQ reg polling wait for VF Runtime KIQ interface to read/write registers in VF may take longer than expected for BM environment. Extend the timeout. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:25 -04:00
Linus Torvalds	ff9a79307f	Kbuild updates for v6.10 - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd ZCSnZ31h7woGfNho =D5B/ -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...	2024-05-18 12:39:20 -07:00
Yang Wang	062a7ce676	drm/amdgpu: fix ACA no query result after gpu reset fix ACA no query result after gpu reset. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Lijo Lazar	546e6309d1	drm/amd/pm: Remove legacy interface for xgmi plpd Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remove that as well. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Yang Wang	258ed689bc	drm/amdgpu: change bank cache lock type to spinlock modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Yang Wang	f6bce954f4	drm/amdgpu: change aca bank error lock type to spinlock modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	50bff04d02	drm/amdgpu: Describe all object placements in debugfs Accurately show all placements when describing objects in debugfs, instead of bunching them up under the 'CPU' placement. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	8fb0efb101	drm/amdgpu: Reduce mem_type to domain double indirection All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT, depending on AMDGPU_GEM_CREATE_PREEMPTIBLE. Simplify a few places in the code which convert the TTM placement into a domain by checking against the current placement directly. In the conversion AMDGPU_PL_PREEMPT either does not have to be handled because amdgpu_mem_type_to_domain() cannot return that value anyway. v2: * Remove AMDGPU_PL_PREEMPT handling. v3: * Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> # v1 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> # v2 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	26e20235ce	drm/amdgpu: Add amdgpu_bo_is_vm_bo helper Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) v3: * Use Christian's kerneldoc. v4: * Fixed logic inversion in amdgpu_vm_bo_get_memory. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Srinivasan Shanmugam	e060c7ba7e	drm/amdgpu: Remove duplicate check for is_queue_unmap in sdma_v7_0_ring_set_wptr This commit removes a duplicate check for is_queue_unmap in the sdma_v7_0_ring_set_wptr function. The check at line 171 was considered dead code because at this point in the code, we already know that is_queue_unmap is false due to the check at line 161. By removing this unnecessary check, improves the readability of the code Fixes the below: drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c:171 sdma_v7_0_ring_set_wptr() warn: duplicate check 'is_queue_unmap' (previous on line 161) drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c 140 static void sdma_v7_0_ring_set_wptr(struct amdgpu_ring ring) 141 { 142 struct amdgpu_device adev = ring->adev; 143 uint32_t wptr_saved; 144 uint32_t is_queue_unmap; 145 uint64_t aggregated_db_index; 146 uint32_t mqd_size = adev->mqds[AMDGPU_HW_IP_DMA].mqd_size; 147 148 DRM_DEBUG("Setting write pointer\n"); 149 150 if (ring->is_mes_queue) { 151 wptr_saved = (uint32_t )(ring->mqd_ptr + mqd_size); 152 is_queue_unmap = (uint32_t )(ring->mqd_ptr + mqd_size + ^^^^^^^^^^^^^^^^ Set here 153 sizeof(uint32_t)); 154 aggregated_db_index = 155 amdgpu_mes_get_aggregated_doorbell_index(adev, 156 ring->hw_prio); 157 158 atomic64_set((atomic64_t )ring->wptr_cpu_addr, 159 ring->wptr << 2); 160 wptr_saved = ring->wptr << 2; 161 if (is_queue_unmap) { ^^^^^^^^^^^^^^^ Checked here 162 WDOORBELL64(aggregated_db_index, ring->wptr << 2); 163 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 164 ring->doorbell_index, ring->wptr << 2); 165 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 166 } else { 167 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 168 ring->doorbell_index, ring->wptr << 2); 169 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 170 --> 171 if (is_queue_unmap) ^^^^^^^^^^^^^^^ This is dead code. We know it's false. 172 WDOORBELL64(aggregated_db_index, 173 ring->wptr << 2); 174 } 175 } else { 176 if (ring->use_doorbell) { 177 DRM_DEBUG("Using doorbell -- " 178 "wptr_offs == 0x%08x " Fixes: `b412351e91` ("drm/amdgpu: Add sdma v7_0 ip block support (v7)") Cc: Likun Gao <Likun.Gao@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tim Van Patten	1446226d32	drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1 The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit `5f3854f1f4` ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is rebooted. Instead, revert back to gmc->noretry=0 so the device is still responsive. Fixes: `5f3854f1f4` ("drm/amdgpu: add more cases to noretry=1") Signed-off-by: Tim Van Patten <timvp@google.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Ruijing Dong	a1a9143c96	drm/amdgpu/vcn: update vcn5 enc/dec capabilities Update the capabilities for supporting 8k encoding. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Jiapeng Chong	72e6ea95c4	drm/amdgpu: Remove duplicate amdgpu_umsch_mm.h header ./drivers/gpu/drm/amd/amdgpu/amdgpu.h: amdgpu_umsch_mm.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9063 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Kenneth Feng	d3620eeae8	drm/amd/amdgpu: add module parameter for jpeg add module parameter for jpeg. this is a temporary workaround for jpeg unit test fail on vcn 5.0 now. will be removed later. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Alex Deucher	736f911204	drm/amdgpu: fix documentation errors in gmc v12.0 Fix up parameter descriptions. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Sreekant Somasekharan	0cce5f285d	drm/amdkfd: Check correct memory types for is_system variable To catch GPU mapping of system memory, TTM_PL_TT and AMDGPU_PL_PREEMPT must be checked. Fixes: `628e1ace23` ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC") Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Alex Deucher	b72fa761fc	drm/amdgpu: fix documentation errors in sdma v7.0 Fix up parameter descriptions. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Yang Wang	b712d7c201	drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG() create a new helper function to avoid compiler 'side-effect' check about RAS_EVENT_LOG() macro. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Frank Min	9a55c77978	drm/amdgpu: fix getting vram info for gfx12 gfx12 query video mem channel/type/width from umc_info of atom list, so fix it accordingly. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Frank Min	b80160a53a	drm/amdgpu/mes: use mc address for wptr in add queue packet use mc address for wptr in add queue packet Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Likun Gao	c14d5b5095	drm/amdgpu: enable gfxoff for gc v12.0.0 Enable GFXOFF for GC v12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Likun Gao	d8cd2d617a	drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.0 Enable mmhub and athub cg on gc 12.0.0 Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	7be73af53b	drm/amdgpu: switch default mes to uni mes Switch the default mes to uni mes for gfx v12. V2: remove uni_mes set for gfx v11. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Yang Wang	b2aa3d4b30	drm/amdgpu: add debug flag to enable RAS ACA Use debug_mask=0x10 (BIT.4) param to help enable RAS ACA. (RAS ACA is disabled by default.) Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	2ac72cbc7e	drm/amdgpu: enable some cg feature for gc 12.0.0 Enable 3D cgcg, 3D cgls, sram fgcg, perfcounter mgcg, repeater fgcg for gc v12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Ma Jun	4c11d30c95	drm/amdgpu: Fix the null pointer dereference to ras_manager Check ras_manager before using it Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Xiaogang Chen	f326d7cc74	drm/kfd: Correct pinned buffer handling at kfd restore and validate process This reverts commit `8a774fe912` ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not related whether it needs mapping And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Lijo Lazar	b194d21b9b	drm/amdgpu: Use NPS ranges from discovery table Add GMC API to fetch NPS range information from discovery table. Use NPS range information in GMC 9.4.3 SOCs when available, otherwise fallback to software method. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Friedrich Vock	0cdb3f9740	drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit The special case for VM passthrough doesn't check adev->nbio.funcs before dereferencing it. If GPUs that don't have an NBIO block are passed through, this leads to a NULL pointer dereference on startup. Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Fixes: `1bece222ea` ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Lijo Lazar	ce798376ef	drm/amdgpu: Fix memory range calculation Consider the 16M reserved region also before range calculation for GMC 9.4.3 SOCs. Fixes: `a433f1f594` ("drm/amdgpu: Initialize memory ranges for GC 9.4.3") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	73fbc3e000	drm/amdgpu: enable gfx cgcg&cgls for gfx v12_0_0 Enable GFX CGCG and CGLS for gfx version 12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:11:02 -04:00
Sonny Jiang	75125e6b4c	drm/amdgpu/jpeg5: enable power gating Enable PG on JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:10:56 -04:00
Likun Gao	ef57158462	drm/amdgpu: support imu for gc 12_0_0 Support IMU for ASIC with GC 12.0.0 Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:10:31 -04:00
Ma Jun	01b3297336	drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr Remove dead code in amdgpu_ras_add_mca_err_addr Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:09:55 -04:00
Ma Jun	d1a6bfff94	drm/amdgpu: Fix null pointer dereference to bo Check bo before using it Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:09:47 -04:00
shaoyunl	4488cd671c	drm/amdgpu: enable unmapped doorbell handling basic mode on mes 12 This reverts commit `fcc5df722d`. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:05:38 -04:00
Linus Torvalds	db5d28c0bf	drm for 6.10-rc1 new drivers: - panthor: ARM Mali/Immortalis CSF-based GPU driver core: - add a CONFIG_DRM_WERROR option - make more headers self-contained - grab resv lock in pin/unpin - fix vmap resv locking - EDID/eDP panel matching - Kconfig cleanups - DT sound bindings - Add SIZE_HINTS property for cursor planes - Add struct drm_edid_product_id and helpers. - Use drm device based logging in more drm functions. - drop seq_file.h from a bunch of places - use drm_edid driver conversions dp: - DP Tunnel documentation - MST read sideband cap - Adaptive sync SDP prep work ttm: - improve placement for TTM BOs in idle/busy handling panic: - Fixes for drm-panic, and option to test it. - Add drm panic to simpledrm, mgag200, imx, ast bridge: - improve init ordering - adv7511: allow GPIO pin sharing - tc358775: add tc358675 support panel: - AUO B120XAN01.0 - Samsung s6e3fa7 - BOE NT116WHM-N44 - CMN N116BCA-EA1, - CrystalClear CMT430B19N00 - Startek KD050HDFIA020-C020A - powertip PH128800T006-ZHC01 - Innolux G121X1-L03 - LG sw43408 - Khadas TS050 V2 - EDO RM69380 OLED - CSOT MNB601LS1-1 amdgpu: - HDCP/ODM/RAS fixes - Devcoredump improvements - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - SMU 14.0.2 support - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - DCN 3.5.x Updates - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - Remove invalid TTM resource start check - UAF fix in VA IOCTL - GPUVM page fault redirection to secondary IH rings for IH 6.x - Initial support for mapping kernel queues via MES - Fix VRAM memory accounting amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix - Enable SQ watchpoint for gfx10 i915: - Adding new DG2 PCI ID - add context hints for GT frequency - enable only one CCS for compute workloads - new workarounds - Fix UAF on destroy against retire race and remove two earlier partial fixes - Limit the reserved VM space to only the platforms that need it - Fix gt reset with GuC submission is disable - Add and use gt_to_guc() wrapper i915/xe display: - Lunar Lake display enabling, including cdclk and other refactors - BIOS/VBT/opregion related refactor - Digital port related refactor/clean-up - Fix 2s boot time regression on DP panel replay init - Remove duplication on audio enable/disable on SDVO and g4x+ DP - Disable AuxCCS framebuffers if built for Xe - Make crtc disable more atomic - Increase DP idle pattern wait timeout to 2ms - Start using container_of_const() for some extra const safety - Fix Jasper Lake boot freeze - Enable MST mode for 128b/132b single-stream sideband - Enable Adaptive Sync SDP Support for DP - Fix MTL supported DP rates - removal of UHBR13.5 - PLL refactoring - Limit eDP MSO pipe only for display version 20 - More display refactor towards independence from i915 dev_priv - Convert i915/xe fbdev to DRM client - More initial work to make display code more independent from i915 xe: - improved error capture - clean up some uAPI leftovers - devcoredump update - Add BMG mocs table - Handle GSCCS ER interrupt - Implement xe2- and GuC workarounds - struct xe_device cleanup - Hwmon updates - Add LRC parsing for more GPU instruction - Increase VM_BIND number of per-ioctl Ops - drm/xe: Add XE_BO_GGTT_INVALIDATE flag - Initial development for SR-IOV support - Add new PCI IDs to DG2 platform - Move userptr over to start using hmm_range_fault msm: - Switched to generating register header files during build process instead of shipping pre-generated headers - Merged DPU and MDP4 format databases. - DP: - Stop using compat string to distinguish DP and eDP cases - Added support for X Elite platform (X1E80100) - Reworked DP aux/audio support - Added SM6350 DP to the bindings - GPU: - a7xx perfcntr reg fixes - MAINTAINERS updates - a750 devcoredump support radeon: - Silence UBSAN warnings related to flexible arrays nouveau: - move some uAPI objects to uapi headers omapdrm: - console fix ast: - add i2c polling qaic: - add debugfs entries exynos: - fix platform_driver .owner - drop cleanup code mediatek: - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe() - Add GAMMA 12-bit LUT support for MT8188 - Rename mtk_drm_* to mtk_* - Drop driver owner initialization - Correct calculation formula of PHY Timing -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZEUU0ACgkQDHTzWXnE hr5qMBAAjUFF0w3YOQMsn0LEAm628kMRHpoVeSXmIfO9z9lTyad30EtiS4ggFgj7 Q/oQ6hHCd5jdsvGSJDgtTTAsTQX+aCkXrgf/18ENbqR5mM3MdefUAPR/zawZ7HR4 8+b2h6p7gHBw8wDjuIvQ5e9InHcnIkKWJc82qnJG5Urgxa05SDh3mu3cosPTJiBw a851vlWaYcxC0yAUwJlWaXDdN8yzdFaSQNboZBS/CMLXF/WE6Ht257uxJmaouc0Y Z0kBybok5x0TPQEXF9IV+kuSW3EYpYcwRi0BFFM9sJjkEBdH3rYRZwuYP1LR+7VZ HKsmIkie8YzCm2VwTquYzUvHgF+swZX4RRch9XJlGz7UvBLc0eBO/2n4X6fNd8Kl QGNNqEfsnUQrAHKvGsOUgoGjSCmEo8voGcMZ3JPIAdJ/GcnJwpMvNxtF6XB08hEu rDxuU6o7WkM4dJbtiaFEHNh0Fmjj6aXdBL23UD9pcqPT1fc9cT3xnUd5RJIRuRwV /tpb2WfkFAoxCkKFiunaC4rE8oG6ME6wr/trYjvoYuhCI5hCVaXRBGzJEtC30IP6 lG2YZ8r0jHjktbgjZ0Cz/hY424H4sxSN9SJAnXXFDzcfjBJ/nOgo5nMD1jKajAD5 SYfqWaD5Y+YygtyLJPMfZQI2XMOpCzteXD8uaNXXFJfpV7Apeyg= =ocVM -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "This is the main pull request for the drm subsystems for 6.10. In drivers the main thing is a new driver for ARM Mali firmware based GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and scattered changes to everything else. In the core a bunch of headers and Kconfig was refactored, along with the addition of a new panic handler which is meant to provide a user friendly message when a panic happens and graphical display is enabled. New drivers: - panthor: ARM Mali/Immortalis CSF-based GPU driver Core: - add a CONFIG_DRM_WERROR option - make more headers self-contained - grab resv lock in pin/unpin - fix vmap resv locking - EDID/eDP panel matching - Kconfig cleanups - DT sound bindings - Add SIZE_HINTS property for cursor planes - Add struct drm_edid_product_id and helpers. - Use drm device based logging in more drm functions. - drop seq_file.h from a bunch of places - use drm_edid driver conversions dp: - DP Tunnel documentation - MST read sideband cap - Adaptive sync SDP prep work ttm: - improve placement for TTM BOs in idle/busy handling panic: - Fixes for drm-panic, and option to test it. - Add drm panic to simpledrm, mgag200, imx, ast bridge: - improve init ordering - adv7511: allow GPIO pin sharing - tc358775: add tc358675 support panel: - AUO B120XAN01.0 - Samsung s6e3fa7 - BOE NT116WHM-N44 - CMN N116BCA-EA1, - CrystalClear CMT430B19N00 - Startek KD050HDFIA020-C020A - powertip PH128800T006-ZHC01 - Innolux G121X1-L03 - LG sw43408 - Khadas TS050 V2 - EDO RM69380 OLED - CSOT MNB601LS1-1 amdgpu: - HDCP/ODM/RAS fixes - Devcoredump improvements - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - SMU 14.0.2 support - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - DCN 3.5.x Updates - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - Remove invalid TTM resource start check - UAF fix in VA IOCTL - GPUVM page fault redirection to secondary IH rings for IH 6.x - Initial support for mapping kernel queues via MES - Fix VRAM memory accounting amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix - Enable SQ watchpoint for gfx10 i915: - Adding new DG2 PCI ID - add context hints for GT frequency - enable only one CCS for compute workloads - new workarounds - Fix UAF on destroy against retire race and remove two earlier partial fixes - Limit the reserved VM space to only the platforms that need it - Fix gt reset with GuC submission is disable - Add and use gt_to_guc() wrapper i915/xe display: - Lunar Lake display enabling, including cdclk and other refactors - BIOS/VBT/opregion related refactor - Digital port related refactor/clean-up - Fix 2s boot time regression on DP panel replay init - Remove duplication on audio enable/disable on SDVO and g4x+ DP - Disable AuxCCS framebuffers if built for Xe - Make crtc disable more atomic - Increase DP idle pattern wait timeout to 2ms - Start using container_of_const() for some extra const safety - Fix Jasper Lake boot freeze - Enable MST mode for 128b/132b single-stream sideband - Enable Adaptive Sync SDP Support for DP - Fix MTL supported DP rates - removal of UHBR13.5 - PLL refactoring - Limit eDP MSO pipe only for display version 20 - More display refactor towards independence from i915 dev_priv - Convert i915/xe fbdev to DRM client - More initial work to make display code more independent from i915 xe: - improved error capture - clean up some uAPI leftovers - devcoredump update - Add BMG mocs table - Handle GSCCS ER interrupt - Implement xe2- and GuC workarounds - struct xe_device cleanup - Hwmon updates - Add LRC parsing for more GPU instruction - Increase VM_BIND number of per-ioctl Ops - drm/xe: Add XE_BO_GGTT_INVALIDATE flag - Initial development for SR-IOV support - Add new PCI IDs to DG2 platform - Move userptr over to start using hmm_range_fault msm: - Switched to generating register header files during build process instead of shipping pre-generated headers - Merged DPU and MDP4 format databases. - DP: - Stop using compat string to distinguish DP and eDP cases - Added support for X Elite platform (X1E80100) - Reworked DP aux/audio support - Added SM6350 DP to the bindings - GPU: - a7xx perfcntr reg fixes - MAINTAINERS updates - a750 devcoredump support radeon: - Silence UBSAN warnings related to flexible arrays nouveau: - move some uAPI objects to uapi headers omapdrm: - console fix ast: - add i2c polling qaic: - add debugfs entries exynos: - fix platform_driver .owner - drop cleanup code mediatek: - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe() - Add GAMMA 12-bit LUT support for MT8188 - Rename mtk_drm_* to mtk_* - Drop driver owner initialization - Correct calculation formula of PHY Timing" * tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel: (1477 commits) drm/xe/ads: Use flexible-array drm/xe: Use ordered WQ for G2H handler drm/msm/gen_header: allow skipping the validation drm/msm/a6xx: Cleanup indexed regs const'ness drm/msm: Add devcoredump support for a750 drm/msm: Adjust a7xx GBIF debugbus dumping drm/msm: Update a6xx registers XML drm/msm: Fix imported a750 snapshot header for upstream drm/msm: Import a750 snapshot registers from kgsl MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails drm/msm/adreno: fix CP cycles stat retrieval on a7xx drm/msm/a7xx: allow writing to CP_BV counter selection registers drm: zynqmp_dpsub: Always register bridge Revert "drm/bridge: ti-sn65dsi83: Fix enable error path" drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer() drm/fbdev-generic: Do not set physical framebuffer address drm/panthor: Fix the FW reset logic drm/panthor: Make sure we handle 'unknown group state' case properly ...	2024-05-15 09:43:42 -07:00
Frank Min	b61467778e	drm/amdgpu: fix mqd corruption for gfx12 1. restore mqd from backup while resuming 2. use copy_toio and copy_fromio while mqd in vram Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Frank Min	ef168e6de9	drm/amdgpu: add initial value for gfx12 AGP aperture add initial value for gfx12 AGP aperture Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	e6ae021adb	drm/amdgpu: fix the warning bad bit shift operation for aca_error_type type Filter invalid aca error types before performing a shift operation. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	d190b459b2	drm/amdgpu: the warning dereferencing obj for nbio_v7_4 if ras_manager obj null, don't print NBIO err data Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	68de5d31b1	drm/amdgpu: remove structurally dead code This code cannot be reached: return "UNKNOWN";. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	269435aef4	drm/amdgu: remove unused code The same code is executed when the condition err is true or false, because the code in the if-then branch and after the if statement is identical Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:01 -04:00
Jesse Zhang	e2bff63ba6	drm/amdgpu: remove structurally dead code for amd_gmc This code cannot be reached: return sysfs_emit(buf, "UNK....) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	f0574a56fb	drm/amd: fix the warning unchecking return vaule for sdma_v7 check ring allocate success before emit preempt ib Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	b55bf19eb9	drm/amdgpu: clear the warning unsigned compared against 0 for xcp_id This greater-than-or-equal-to-zero comparison of an unsigned value is always true. fpriv->xcp_id >= 0U Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	1940708ccf	drm/amdgpu: fix the waring dereferencing hive Check the amdgpu_hive_info *hive that maybe is NULL. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	b1f7810b05	drm/amdgpu: fix dereference after null check check the pointer hive before use. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:52 -04:00
Jesse Zhang	1a00f2ac82	drm/amdgpu: Fix the warning division or modulo by zero Checks the partition mode and returns an error for an invalid mode. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:47 -04:00
Saleemkhan Jamadar	98a2e3a0d1	drm/amdgpu/umsch: add support to capture fw debug log Added support to capture unsch fw debug logs in debugfs. To enable set amdgpu_umschfw_log =1 in boot args. v1 - rename variable to umsch_mm_fwlog (Veera) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:41 -04:00
David (Ming Qiang) Wu	4b0497d25d	drm/amd/amdgpu: update jpeg 5 capability Based on the documentation the maximum resolustion should be 16384x16384. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:34 -04:00
David (Ming Qiang) Wu	a166ec28db	drm/amdgpu/vcn: set VCN5 power gating state to GATE on suspend On suspend, we need to set power gating state to GATE when VCN5 is busy, otherwise we will get following error on resume: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring vcn_unified_0 test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <vcn_v5_0_0> failed -110 amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -110 PM: failed to resume async: error -110 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:21 -04:00
David (Ming Qiang) Wu	10fe1a79cd	drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:12 -04:00
Jack Xiao	745e0a90be	drm/amdgpu/mes: fix mes12 to map legacy queue Adjust mes12 initialization sequence to fix mapping legacy queue. v2: use dev_err. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:44:20 -04:00
Philip Yang	9095e55440	drm/amdkfd: Remove arbitrary timeout for hmm_range_fault On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary timeout value is not accurate, cause memory allocation failure. Remove the arbitrary timeout value, return EAGAIN to application if hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call ioctl again. Change EAGAIN to debug message as this is not error. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:44:02 -04:00
Michel Dänzer	8d2c930735	drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible It incorrectly claimed a resource isn't CPU visible if it's located at the very end of CPU visible VRAM. Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-05-10 13:05:13 -04:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00
Michel Dänzer	c4dcb47d46	drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible It incorrectly claimed a resource isn't CPU visible if it's located at the very end of CPU visible VRAM. Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-05-08 18:45:29 -04:00
Srinivasan Shanmugam	e35ba81bb3	drm/amdgpu: Fix buffer size to prevent truncation in gfx_v12_0_init_microcode This commit fixes multiple potential truncations when writing the strings _pfp.bin, _me.bin, _rlc.bin, and _mec.bin into the fw_name buffer in the gfx_v12_0_init_microcode function in the gfx_v12_0.c file The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf function does not exceed the size of the fw_name buffer. Thus fixing the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c: In function ‘gfx_v12_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 421 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 421 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 428 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40 428 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 436 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 436 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 448 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 448 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Srinivasan Shanmugam	ffd574459d	drm/amdgpu: Fix truncation by resizing ucode_prefix in imu_v12_0_init_microcode This commit fixes potential truncation when writing the string _imu.bin into the fw_name buffer in the imu_v12_0_init_microcode function in the imu_v12_0.c file The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf function does not exceed the size of the fw_name buffer. Thus fixing the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/imu_v12_0.c: In function ‘imu_v12_0_init_microcode’: drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:54: warning: ‘_imu.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 51 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 51 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Tim Huang	51dfc0a4d6	drm/amdgpu: fix mc_data out-of-bounds read warning Clear warning that read mc_data[i-1] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Tim Huang	8944acd0f9	drm/amdgpu: fix ucode out-of-bounds read warning Clear warning that read ucode[] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Ma Jun	0991e49d2b	drm/amdgpu: Fix uninitialized variable warning in amdgpu_info_ioctl Check the return value of amdgpu_xcp_get_inst_details, otherwise we may use an uninitialized variable inst_mask Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Ma Jun	d768394fa9	drm/amdgpu: Fix out-of-bounds read of df_v1_7_channel_number Check the fb_channel_number range to avoid the array out-of-bounds read error Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	cdca89bce4	drm/amdgpu/soc21: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	1dd8b24acc	drm/amdgpu/nv: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	c866201cdc	drm/amdgpu/soc15: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	30f45a8ea4	drm/amdgpu: add set_reg_remap callback for NBIF 6.3.1 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	3345f7ec0d	drm/amdgpu: add set_reg_remap callback for NBIO 7.7 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	ffd3d6e780	drm/amdgpu: add set_reg_remap callback for NBIO 4.3 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	454847c9f4	drm/amdgpu: add set_reg_remap callback for NBIO 2.3 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	cacbbfbd24	drm/amdgpu: add set_reg_remap callback for NBIO 7.2 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	42ad8ac6bd	drm/amdgpu: add set_reg_remap callback for NBIO 7.11 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	f9a2274b33	drm/amdgpu: add set_reg_remap callback for NBIO 7.9 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	9d0e2915c4	drm/amdgpu: add set_reg_remap callback for NBIO 7.4 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	b2648640b9	drm/amdgpu: add set_reg_remap callback for NBIO 7.0 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	cab62e4839	drm/amdgpu: add set_reg_remap callback for NBIO 6.1 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	0617cdde84	drm/amdgpu: add nbio set_reg_remap helper Will be used to consolidate reg remap settings and fix HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
YiPeng Chai	2b3b9d2150	drm/amdgpu: change log level Change log level. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Likun Gao	a735b4a4ad	drm/amdgpu: fix spl component for psp v14 Fix the coding error when load spl component for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	9e5da94259	drm/amdgpu: fix uninitialized variable warning for jpeg_v4 Clear warning that using uninitialized variable r. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Yang Wang	329cec8f18	drm/amdgpu: fix RAS unload driver issue in SRIOV Fix null pointer issue when unload driver in SRIOV mode. Adjust the function position to ensure that the amdgpu_mca/aca_xxx_init() related functions can be initialized properly. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Yang Wang	85a24a3ea0	drm/amdgpu: ignoring unsupported ras blocks when MCA bank dispatches This patch is used to solve the problem of incorrect parsing of error counts. When the UE trigger gpu is reset, the driver will attempt to parse all possible ras blocks. For ras blocks that are not supported by the current ASIC, the driver should ignore this error. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	8f184f8e7a	drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi Clear warning that using uninitialized variable current_node. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	3aa6b72045	drm/amdgpu: fix uninitialized variable warning for sdma_v7 Clear warning that using uninitialized variable index. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Ma Jun	be1684930f	drm/amdgpu: Fix out-of-bounds write warning Check the ring type value to fix the out-of-bounds write warning Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Ma Jun	7e39d7ec35	drm/amdgpu: Fix the uninitialized variable warning Check the user input and phy_id value range to fix "Using uninitialized value phy_id" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Jack Xiao	56fd1f8868	drm/amdgpu/mes11: fix kiq ring ready flag kiq ring test has overwitten ready flag, need disable after gfx hw init. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Lang Yu	89773b8559	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Sunil Khatri	3b3c9e865e	drm/amdgpu: add se registers to ip dump for gfx10 add the registers of SE block of gfx for ip dump for gfx10 IP. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Sunil Khatri	b4e394e843	drm/amdgpu: add CP headers registers to gfx10 dump add registers in the ip dump for CP headers in gfx10 Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Ville Syrjälä	d26238c680	drm/amdgpu: Use drm_crtc_vblank_crtc() Replace the open coded drm_crtc_vblank_crtc() with the real thing. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240408190611.24914-2-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 21:55:30 +03:00
Sunil Khatri	b0923d5d80	drm/amdgpu: remove ip dump reg_count variable reg_count is not used and the register count is directly derived from the array size and hence removed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-03 09:06:47 -04:00
Asad Kamal	6cd2b87264	drm/amd/amdgpu: Check tbo resource pointer Validate tbo resource pointer, skip if NULL Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:18 -04:00
Kenneth Feng	3474e02ed5	drm/amd/pm: support mode1 reset on smu_v14_0_3 support mode1 reset on smu_v14_0_3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	ade887c633	drm/amdgpu/mes12: Use a separate fence per transaction We can't use a shared fence location because each transaction should be considered independently. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	94b51a3d01	drm/amdgpu/mes12: increase mes submission timeout MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	b1d852920b	drm/amdgpu/mes12: print MES opcodes rather than numbers Makes it easier to review the logs when there are MES errors. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	174fdc07c0	drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.1 enable mmhub and athub cg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	dd8707295d	drm/amd/amdgpu: enable gfxoff on gc 12.0.1 Enable gfxoff on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Jack Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Likun Gao	b9f5d0f978	drm/amdgpu: support cg state get for gfx v12 Support to get clockgating state for gfx v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	598a3b753a	drm/amd/amdgpu: enable sram fgcg on gc 12.0.1 enable sram fgcg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	6f6bb3909c	drm/amd/amdgpu: enable perfcounter mgcg and repeater fgcg enable perfcounter mgcg and repeater fgcg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	0b6662eb2a	drm/amd/amdgpu: enable 3D cgcg and 3D cgls enable 3D cgcg and 3D cgls on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	af472f68c7	drm/amd/amdgpu: enable mgcg on gfx 12.0.1 enable mgcg on gfx 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	81b09cedb3	drm/amd/amdgpu: enable cgcg and cgls enable cgcg and cgls on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
David (Ming Qiang) Wu	856d1ed4b2	drm/amdgpu/vcn5: Add VCN5 capabilities Add VCN5 encode and decode capabilities support Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Sonny Jiang	117f851393	drm/amdgpu/vcn5: enable DPG mode support Enable DPG mode Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Sonny Jiang	f19cfce87d	drm/amdgpu/jpeg5: enable power gating Enable PG on JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
David (Ming Qiang) Wu	f8f8e95c5f	amdgpu/vcn: enable AMD_PG_SUPPORT_VCN turn on AMD_PG_SUPPORT_VCN flag for power saving Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
David Belanger	da43e93d1b	drm/amdgpu: Fix physical address mask Mask should be 44-bit. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Likun Gao	0a75dc9831	drm/amdgpu/discovery: add mes v12_0 ip block Add mes v12_0 ip block. v2: squash in update (Alex) v3: rebase on unified mes changes (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Likun Gao	5e676d7180	drm/amdgpu/discovery: add gfx v12_0 ip block Add gfx v12_0 ip block. v2: Squash in update (Alex) v3: add exp flag (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	03f4b8c3ca	drm/amdgpu/mes12: disable logging output Random page fault was oberserved, temporarily disable mes log buffer output. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	3dc434ad26	drm/amdgpu: add module parameter 'amdgpu_uni_mes' Add module parameter 'amdgpu_uni_mes' to enable/disable unified mes fw support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	ad5c0a79df	drm/amdgpu/mes12: add legacy setting hw resource interface For unified mes fw, add the legacy interface to set hardware resources. v2: remove warning (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
shaoyunl	fcc5df722d	drm/amdgpu: Disable unmapped doorbell handling basic mode on mes 12 The new mechanism for unmapped doorbell handling requires both driver side and MES fw side change. The FW side changes are still not released. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	663bbfaf68	drm/amdgpu/gfx: enable mes to map legacy queue support Enable mes to map legacy queue support. v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	4c2439f908	drm/amdgpu/mes12: add mes mapping legacy queue support Add mes12 map legacy queue packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	6ce03bd3a4	drm/amdgpu/mes12: enable uni_mes fw on mes pipe0 Enable the unified mes firmware on mes pipe0. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	d2e2c9be78	drm/amdgpu/mes12: add uni_mes fw loading support Add the unified mes firmware loading support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	15ddc4e693	drm/amdgpu/mes: add uni_mes fw loading support Add the unified mes firmware loading support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Sreekant Somasekharan	628e1ace23	drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC Due to a HW bug, the system memory mappings and peer GPU mappings on GFX12 need to be marked as MTYPE_NC. Cc: Joe Greathouse <joseph.greathouse@amd.com> Cc: David Belanger <david.belanger@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jonathan Kim	984b265ff6	drm/amdkfd: fix support for trap on wave start and end for gfx12 Similar to GFX11, GFX12 supports trapping on wave start and end. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jonathan Kim	fda3f378c4	drm/amdkfd: always enable ttmp setup for gfx12 Similar to GFX11, always enable the setup of trap temporaries on GFX12. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
David Belanger	90e4fc8369	drm/amdkfd: Added gfx_v12_kfd2kgd interface for GFX12. Initial implementation, based on GFX11. v2: Removed functions not needed by cp scheduler. v3: Fixed typos. v4: squash in warning fix (Alex) Signed-off-by: David Belanger <david.belanger@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
shaoyunl	2f983d3ca5	drm/amdgpu: Enable event log on MES 12 Enable event log through the HW specific FW API Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
shaoyunl	19e69a5d28	drm/amdgpu: Enable unmapped doorbell handling basic mode on mes 12 Enable basic mode handling for doorbell ring on unmapped CP queue. In this mode, MES can start schedule the queue mapping based on HW interrupt instead of timer. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Hawking Zhang	a2211e475c	drm/amdgpu: Switch to smuio func to get gpu clk counter Switch to smuio callback to query gpu clock counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Likun Gao	e781af6663	drm/amdgpu: init gfxhub setting to align with mmhub Align gfxhub settings with mmhub when program rlc ram. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Likun Gao	b32edc2340	drm/amdgpu: skip dpm check to init imu fw Skip dpm check to init imu firmware for imu v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	044feb8e2a	drm/amdgpu: fix active rb and cu number for gfx12 Correct the algorithm of active CU and RB to bypass the disabled SA for gfx12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	56159fffaa	drm/amdgpu: use new method to program rlc ram Program rlc ram with golden setting data instead. The old method (program_imu_rlc_ram_old) should be retired in the future. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	043869be5a	drm/amd/amdgpu: add cgcg&cgls interface for gfx 12.0 add cgcg&cgls interface for gfx 12.0 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Tom St Denis	60917ce8f8	drm/amd/amdgpu: update GFX12 wave data registers Update the registers for gfx12. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	53efeba35d	drm/amdgpu: set different fw data addr for mec pipe For MEC fw data, different pipe should programed into different address. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	1e740df77f	drm/amd/amdgpu: workaround for the imu fw loading workaournd for the imu fw loading on gfx 12.0 without psp Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	f5b4c3236f	drm/amd: Move fw init from sw_init to early_init for imu v12 Move microcode loading from sw_init to early_init to align with the perious version of imu init sequence. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	2502af906b	drm/amdgpu: support S&R fw load for gfx v12 Support Save & Restore related fw load with backdoor RLC autoload type on gfx v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Jack Xiao	36b2ce4775	drm/amdgpu/gfx12: recalculate available compute rings to use Recalculate the number of compute rings to use based on the gfx hardware configuration. As needed reserve half of compute rings for mes, kgd can't use up all compute rings. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	29d36a9cfd	drm/amdgpu: skip imu related function if dpm=0 Only execute IMU related functions if dpm>0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	32d1637689	drm/amd/amdgpu: imu fw loading support support imu related function for gfx v12. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	af204b76a7	drm/amdgpu: set cp fw address set for gfx v12 Split PFF/ME/MEC firmware address setting function from related load microcode funtion, as it's also needed for rlc autolad. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	52cb80c12e	drm/amdgpu: Add gfx v12_0 ip block support (v6) Initial support for GFX 12. v1: Add gfx v12_0 ip block support. (Likun) v2: Switch to gfx.kiq array. Move the vmhub from ring callback to ring. (Hawking) v3: Update various callback function impl. (Hawking) v4: Warning fixes (Alex) v5: squash in imu fix, csb, rlc autoload implementations (Alex) v6: Rebase (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	4632bec9fa	drm/amdgpu/mes12: update data cache boundary Enlarge the data cache boundary. v2: use the fix data cache boundary. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jonathan Kim	46c4766610	drm/amdgpu: fix trap enablement for gfx12 Fix request to MES to set SQ_SHADER_TBA_HI.trap_en for GFX12. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
shaoyunl	d817c470cb	drm/amdgpu: Enable MES to handle doorbell ring on unmapped queue On MES12, HW can monitor up to 2048 doorbells that not be mapped currently and trigger the interrupt to MES when these unmapped doorbell been ringed. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	745f46b6a9	drm/amdgpu: enable mes v12 self test 1. fix available compute queue to use 2. enable mes v12 self test Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	6628f7762b	drm/amdgpu: set mes fw address for mes v12 Split the function of mes fimrware address setting from mes firmware load for mes v12, as it's also needed for rlc autoload. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	785f0f9fe7	drm/amdgpu: Add mes v12_0 ip block support (v4) v1: Add mes v12_0 ip block support. (Jack) v2: Switch to gfx.kiq array. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Rebase (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	69d4c44e51	drm/amdgpu: init mes ucode name for gfx v12 Keep gfx v12 mes fw name to gc_12_x_x_mes.bin and gc_12_x_x_mes1.bin. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	00c9035633	drm/amdgpu: add rlc TOC header file for soc24 Add RLC autoload TOC header file for soc24 ASIC. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	e3a911bb38	drm/amdgpu: add new TOC structure Add new RLC_TABLE_OF_CONTENT structure definition. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	d8fd91f905	drm/amdgpu: add gfx12 clearstate header Add gfx12 clearstate register arrays. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	5638b1cfa7	drm/amdgpu/discovery: Set GC family for GC 12.0 IP Set GC family for GC 12.0 IPs. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:09 -04:00
Sonny Jiang	e56b042118	drm/amdgpu: IB test encode test package change for VCN5 VCN5 session info package interface changed Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:17:55 -04:00
Hawking Zhang	5f571c61b9	drm/amdgpu: Add gfx v9_4_4 ip block Add gfx v9_4_4 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:16 -04:00
Hawking Zhang	a6bcffa596	drm/amdgpu: Add smu v13_0_14 ip block Add smu v13_0_14 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:11 -04:00
Hawking Zhang	1dbd59f3f4	drm/amdgpu: Add psp v13_0_14 ip block Add psp v13_0_14 ip block support. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:05 -04:00
Hawking Zhang	3d1bb1a2e0	drm/amdgpu: Add sdma v4_4_5 ip block Add sdma v4_4_5 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:48:57 -04:00
Aurabindo Pillai	c45211adfa	drm/amd: Override DCN410 IP version Override DCN IP version to 4.0.1 from 4.1.0 temporarily until change is made in DC codebase to use 4.1.0 Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:48:46 -04:00
Lang Yu	2d6f49ee84	drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:44:10 -04:00
Yunxiang Li	4752cac300	drm/amdgpu: Move ras resume into SRIOV function This is part of the reset, move it into the reset function. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:44:02 -04:00
Peyton Lee	3f19cffde9	drm/amdgpu/vpe: fix vpe dpm clk ratio setup failed Some version of BIOS does not enable all clock levels, resulting in high level clock frequency of 0. The number of valid CLKs must be confirmed in advance. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:43:48 -04:00
Yunxiang Li	6e4aa08fa9	drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic The retry loop for SRIOV reset have refcount and memory leak issue. Depending on which function call fails it can potentially call amdgpu_amdkfd_pre/post_reset different number of times and causes kfd_locked count to be wrong. This will block all future attempts at opening /dev/kfd. The retry loop also leakes resources by calling amdgpu_virt_init_data_exchange multiple times without calling the corresponding fini function. Align with the bare-metal reset path which doesn't have these issues. This means taking the amdgpu_amdkfd_pre/post_reset functions out of the reset loop and calling amdgpu_device_pre_asic_reset each retry which properly free the resources from previous try by calling amdgpu_virt_fini_data_exchange. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:41:05 -04:00
Aurabindo Pillai	a5b843269a	drm/amd: Enable DCN410 init Enable initializing Display Manager for DCN410 IP Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:57 -04:00
Yunxiang Li	25c01191c2	drm/amdgpu: Add reset_context flag for host FLR There are other reset sources that pass NULL as the job pointer, such as amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if the FLR comes from the host does not work. Add a flag in reset_context to explicitly mark host triggered reset, and set this flag when we receive host reset notification. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:50 -04:00
Yunxiang Li	f4322b9f8a	drm/amdgpu: Fix two reset triggered in a row Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule after we call amdgpu_device_stop_pending_resets. Move amdgpu_device_stop_pending_resets to after the reset is done. Since at this point the GPU is supposedly in a good state, any reset scheduled after this point would be a legitimate reset. Remove unnecessary and incorrect checks for amdgpu_in_reset that was kinda serving this purpose. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:44 -04:00
Zhigang Luo	f5007c67fc	drm/amdgpu: update vf to pf message retry from 2 to 5 increase retry times to wait host has enough time to complete reset. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:38 -04:00
Zhigang Luo	3bcc0ee147	drm/amdgpu: avoid reading vf2pf info size from FB VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:32 -04:00
Geert Uytterhoeven	05b8b6dd22	Revert "drm: Switch DRM_DISPLAY_HELPER to depends on" This reverts commit `e075e496f5`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:23 +02:00
Geert Uytterhoeven	7fe302ae19	Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on" This reverts commit `0323287de8`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:21 +02:00
Geert Uytterhoeven	9573446953	Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on" This reverts commit `3166e7e6d9`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:21 +02:00
Geert Uytterhoeven	d7c128cb77	Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on" This reverts commit `f6d2dc03fa`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:20 +02:00
Thomas Zimmermann	aae4682e5d	drm/fbdev-generic: Convert to fbdev-ttm Only TTM-based drivers use fbdev-generic. Rename it to fbdev-ttm and change the symbol infix from _generic_ to _ttm_. Link the source file into TTM helpers, so that it is only build if TTM-based drivers have been selected. Select DRM_TTM_HELPER for loongson. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-43-tzimmermann@suse.de	2024-05-02 11:33:32 +02:00
Shashank Sharma	705d0480e6	drm/amdgpu: fix doorbell regression This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 21:59:16 -04:00
Christian König	d3a9331a65	drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `94aeb41173` ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-30 21:39:57 -04:00
Mukul Joshi	f06446ef23	drm/amdgpu: Fix VRAM memory accounting Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 21:22:50 -04:00
Tim Huang	0fa4c25db8	drm/amdgpu: fix uninitialized scalar variable warning Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:04:15 -04:00
Likun Gao	f45ed399d7	drm/amdgpu/discovery: add sdma v7_0 ip block Add sdma v7_0 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:52 -04:00
Likun Gao	4badb9999b	drm/amdgpu: provide more ucode name shown via id Provide some lost ucode name shown via firmware ID. v2: fix whitespace (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:47 -04:00
Likun Gao	807d90b5ef	drm/amdgpu: support SDMA v3 struct fw front door load Add support for new SDMA firmware struct (V3) with PSP front door load type. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:42 -04:00
Jack Xiao	5251b56e38	drm/amdgpu/sdma7: set sdma hang watchdog Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:36 -04:00
Likun Gao	b412351e91	drm/amdgpu: Add sdma v7_0 ip block support (v7) v1: Add sdma v7_0 ip block support. (Likun) v2: Move vmhub from ring_funcs to ring. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Move microcode init into early_init. (Likun) v5: Fix warnings (Alex) v6: Squash in various fixes (Alex) v7: Rebase (Alex) v8: Rebase (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:32 -04:00
Likun Gao	9989a924aa	drm/amdgpu: Add sdma fw v3 structure Add sdma firmware struct version 3 to support sdma v7_0 firmware. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:12 -04:00
Dan Carpenter	6769a23697	drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq() The "instance" variable needs to be signed for the error handling to work. Fixes: `8b2faf1a4f` ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <bob.zhou@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:53 -04:00
Likun Gao	e8a31b4e81	drm/amdgpu: Add new members for sdma v7_0 fw Add new members in sdma instance structure for sdma v7_0 firmware. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:49 -04:00
Likun Gao	1b838189ed	drm/amdgpu/discovery: Add gmc v12_0 ip block Add gmc v12_0 ip block. v2: Squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:39 -04:00
Shashank Sharma	04790139c5	drm/amdgpu: fix doorbell regression This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:04 -04:00
Hawking Zhang	980a0a9452	drm/amdgpu: support gfx v12 specific pte/pde fields Add gfx v12 pte/pde support to gmc common helper. v2: squash in fixes (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:01 -04:00
Hawking Zhang	f3c3dd1207	drm/amdgpu: Set pte_is_pte flag in gmc v12 gart pte_is_pte is new flag introduced in gmc v12 that needs to be set by default for pte. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:51 -04:00
Hawking Zhang	075b44aa21	drm/amdgpu: Add gmc v12_0 ip block support (v7) Add initial support for GMC v12. v1: Add gmc v12_0 ip block support. v2: Switch to gfx.kiq array. v3: Switch to vmhubs_mask. v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0) v5: Rebase (Alex) v6: Squash in fixes for AGP handling, gfxhub init order, vmhub index (Alex) v7: Rebase (Alex) v8: squash in ecc fix (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:39 -04:00
Hawking Zhang	2d1d875656	drm/amdgpu: Add gfx v12 pte/pde format change Add gfx v12 pte/pde format change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:35 -04:00
Likun Gao	47677629f6	drm/amdgpu: Add gfxhub v12_0 ip block support (v3) Add initial gfxhub v12 support. v1: Add gfxhub v12_0 ip block support (Likun) v2: Switch to AMDGPU_GFXHUB(0) (Hawking) v3: Squash in keep default error response mode (Hawking) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:30 -04:00
Jack Xiao	27694eace5	drm/amdgpu/mes11: increase waiting time for engine ready mes schq engine require more waiting time for engine ready before packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:23 -04:00
Jack Xiao	f9d8c5c785	drm/amdgpu/gfx: enable mes to map legacy queue support Enable mes to map legacy queue support. v2: kiq_set_resources is required. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:01 -04:00
Philip Yang	d53ce02352	drm/amdkfd: Evict BO itself for contiguous allocation If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:51 -04:00
YiPeng Chai	3ca73073f4	drm/amdgpu: Remove redundant function call Remove redundant function call. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:14 -04:00
YiPeng Chai	2c0410fbee	rm/amdgpu: Remove unused code Remove unused code. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:08 -04:00
Tim Huang	ebbc2ada5c	drm/amdgpu: fix overflowed array index read warning Clear overflowed array index read warning by cast operation. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:03 -04:00
Tim Huang	22a5daaec0	drm/amdgpu: fix potential resource leak warning Clear resource leak warning that when the prepare fails, the allocated amdgpu job object will never be released. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:53 -04:00
Yang Wang	5eccab32c1	drm/amdgpu: avoid dump mca bank log muti times during ras ISR because the ue valid mca count will only be cleared after gpu reset, so only dump mca log on the first time to get mca bank after receive RAS interrupt. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:47 -04:00
Yang Wang	76ad30f51a	drm/amdgpu: add MCA smu cache support v1: because SMU CE valid mca bank will be cleared after reading, this patch adds mca cache at the driver level to ensure that the mca bank is not lost. v2: refine amdgpu_mca_init/fini/reset() function name. v3: add mca_cache.lock support only add CE bank to mca bank cache. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:41 -04:00
Yang Wang	8fb20d9551	drm/amdgpu: add amdgpu MCA bank dispatch function support - Refine mca driver code. - Centralize mca bank dispatch code logic. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:34 -04:00
Hawking Zhang	8e9f1575d1	drm/amdgpu: Add mmhub v4_1_0 ip block support (v4) Add initial support for MMHUB 4.1.0. v1: Add mmhub v4_1_0 ip block support. v2: Switch to AMDGPU_MMHUB0(0). v3: squash in fix for ip version check (Alex) v4: squash in vm_contexts_disable fix (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:25 -04:00
Philip Yang	7005b169da	drm/amdgpu: Evict BOs from same process for contiguous allocation When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:15 -04:00
Philip Yang	b2dba064c9	drm/amdgpu: Handle sg size limit for contiguous allocation Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:08 -04:00
Yang Wang	7e0357bef4	drm/amdgpu: remove unused MCA driver codes - remove unused callback functions. - make part of mca functions static and refine the function order. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:44 -04:00
Likun Gao	d22c075676	drm/amdgpu/discovery: Add common soc24 ip block Add common soc24 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:06 -04:00
Jack Xiao	ff518e13eb	drm/amdgpu/mes11: adjust mes initialization sequence Adjust mes queue initialization before kgq/kcq initialization to enable mes mapping legacy queue. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:47 -04:00
Jack Xiao	486eb6b5a8	drm/amdgpu/mes11: add mes mapping legacy queue support Add mes11 map legacy queue packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:40 -04:00
Christian König	ffda708148	drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `94aeb41173` ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-30 09:49:51 -04:00
Hawking Zhang	98b912c50e	drm/amdgpu: Add soc24 common ip block (v2) Add initial soc24 support. v1: Add soc24 common ip block. v2: Switch to new select_se_sh/enter_safe_mode interface. v3: squash in correct ext rev id, etc. (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:46:51 -04:00
Philip Yang	155ce502e9	drm/amdgpu: Support contiguous VRAM allocation RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:46 -04:00
Ma Jun	c0d6bd3cd2	drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr Assign value to clock to fix the warning below: "Using uninitialized value res. Field res.clock is uninitialized" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:34 -04:00
Dave Airlie	4a56c0ed5a	amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZiwmAQAKCRC93/aFa7yZ 2Hp0AP9SOxHrKDPErqCQbenkZXNoPFtLJWAhhKZ9zEb7YgjufwD/Zpk31mO0FpVu kUYSAQ4OFXinVVLtmVH3fE/vK+1ccwM= =oFW7 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240426221245.1613332-1-alexander.deucher@amd.com	2024-04-30 14:43:00 +10:00
Daniel Vetter	b84bc94852	Linux 6.9-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYutdweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGG5oH/3Ggwz5N+gdEK3np qxUfpgJTWo+fJ6xhRLGy84TnEZ9s9UnK0Su6UuVyOb0F/2Y8hesJ6iwB16yQFKNe Nore/VvuBZ+utshz5N20yNyPugNOP74GGbyOm+d+iJwIKnmSE8jSjWyMwNFHJCZM BfoBxZrpwU/YD/0PD1KkI44jhPX1H/EcEmtNiklLnuYvJydTWiRFeku+CSgcOiRz 6fIFrcZMREZrAytMQSwteBAvI3vWblC0S39ZgJmtZt+oi+s1ksIUNG8Mm5uMAiyF LcGep2tWV06x9uB9XVvrk0qco/kOaUgYuQrHQwNu9LzsaUdYGKqBBMk41SoNXFkB hBXN26U= =N2E2 -----END PGP SIGNATURE----- Merge v6.9-rc6 into drm-next Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas has some fun with i915-gem conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-04-29 20:22:39 +02:00
Aurabindo Pillai	96557f785a	drm/amd: GFX12 changes for converting tiling flags to modifiers GFX12 swizzle mode and GCC formats changed and is much simpler. Use a seperate function for the same. Changes: * Swizzle mode is now 3 bits only * DCC enablement doesn't come from tiling_flags, it is always set in modifiers * DCC max compressed block size of 128B Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:58 -04:00
Jesse Zhang	ea686fef54	drm/amdgpu: fix the warning about the expression (int)size - len Converting size from size_t to int may overflow. v2: keep reverse xmas tree order (Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Jack Xiao	029c2b0389	drm/amdgpu/mes: add mes mapping legacy queue support Add mes mapping legacy queue framework support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Tim Huang	9a5f15d2a2	drm/amdgpu: fix uninitialized scalar variable warning Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Sunil Khatri	2a8f7464d3	drm/amdgpu: skip ip dump if devcoredump flag is set Do not dump the ip registers during driver reload in passthrough environment. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Arunpravin Paneer Selvam	e362b7c8f8	drm/amdgpu: Modify the contiguous flags behaviour Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we setTTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is(Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop(Christian) - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block (Christian) - Keep the kernel BO allocation as is(Christain) - If BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages(Philip) v3(Christian): - keep contiguous flag handling outside of pages_per_block calculation - remove the hacky implementation in contiguous flag error handling code v4(Christian): - use any variable and return value for non-contiguous fallback v5: rebase to amd-staging-drm-next branch Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Lancelot SIX	bd31e5026d	drm/amdkfd: Enable SQ watchpoint for gfx10 There are new control registers introduced in gfx10 used to configure hardware watchpoints triggered by SMEM instructions: SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}. Those registers work in a similar way as the TCP_WATCH* registers currently used for gfx9 and above. This patch adds support to program the SQ_WATCH registers for gfx10. The SQ_WATCH?_CNTL.MASK field has one bit more than TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity than TCP_WATCH watchpoints. In this patch, we keep the capabilities advertised to the debugger unchanged (HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and SQ watchpoints can do and both watchpoints are configured together. Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Srinivasan Shanmugam	acce6479e3	drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode() The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating about potential truncation of output when using the snprintf function. The issue was due to the size of the buffer 'ucode_prefix' being too small to accommodate the maximum possible length of the string being written into it. The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin", where %s is replaced by the value of 'chip_name'. The length of this string without the %s is 16 characters. The warning message indicated that 'chip_name' could be up to 29 characters long, resulting in a total of 45 characters, which exceeds the buffer size of 30 characters. To resolve this issue, the size of the 'ucode_prefix' buffer has been reduced from 30 to 15. This ensures that the maximum possible length of the string being written into the buffer will not exceed its size, thus preventing potential buffer overflow and truncation issues. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~ ...... 439 \| r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~ ...... 443 \| r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `8630112969` ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	6f3b69139c	drm/amdgpu: Fix ras mode2 reset failure in ras aca mode Fix ras mode2 reset failure in ras aca mode. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	506c245f3f	drm/amdgpu: fix double free err_addr pointer warnings In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages will be run many times so that double free err_addr in some special case. So set the err_addr to NULL to avoid the warnings. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	7bfd16d0ec	drm/amdgpu: initialize the last_jump_jiffies in atom_exec_context The parameter "last_jump_jiffies" should be initialized before being used in the function atom_op_jump. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	2d10c3dbde	drm/amdgpu: add check before free wb entry Check if ring is not a mes queue before freeing the wb entry, because we only allocate a wb entry when it's not a mes queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	cd48b97ce7	drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00

... 18 19 20 21 22 ...

15900 Commits