linux

Commit Graph

Author	SHA1	Message	Date
Philip Yang	541c341d2e	Revert "drm/amdkfd: Use partial migrations in GPU page faults" This reverts commit `dc427a473e`. The change prevents migrating the entire range to VRAM because retry fault restore_pages map the remaining system memory range to GPUs. It will work correctly to submit together with partial mapping to GPU patch later. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:23 -04:00
Philip Yang	afaec204d2	Revert "drm/amdkfd:remove unused code" This reverts commit `f9caf6cdd5`. Needed for the next revert patch. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:23 -04:00
Srinivasan Shanmugam	0300882ed6	drm/amdkfd: Address 'remap_list' not described in 'svm_range_add' Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2073: warning: Function parameter or member 'remap_list' not described in 'svm_range_add' Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Jesse Zhang	282c1d7930	drm/amdkfd: Fix shift out-of-bounds issue [ 567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int' [ 567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G OE 6.2.0-34-generic #34~22.04.1-Ubuntu [ 567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023 [ 567.614504] Workqueue: events send_exception_work_handler [amdgpu] [ 567.614748] Call Trace: [ 567.614750] <TASK> [ 567.614753] dump_stack_lvl+0x48/0x70 [ 567.614761] dump_stack+0x10/0x20 [ 567.614763] __ubsan_handle_shift_out_of_bounds+0x156/0x310 [ 567.614769] ? srso_alias_return_thunk+0x5/0x7f [ 567.614773] ? update_sd_lb_stats.constprop.0+0xf2/0x3c0 [ 567.614780] svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu] [ 567.615047] ? srso_alias_return_thunk+0x5/0x7f [ 567.615052] svm_migrate_to_ram+0x185/0x4d0 [amdgpu] [ 567.615286] do_swap_page+0x7b6/0xa30 [ 567.615291] ? srso_alias_return_thunk+0x5/0x7f [ 567.615294] ? __free_pages+0x119/0x130 [ 567.615299] handle_pte_fault+0x227/0x280 [ 567.615303] __handle_mm_fault+0x3c0/0x720 [ 567.615311] handle_mm_fault+0x119/0x330 [ 567.615314] ? lock_mm_and_find_vma+0x44/0x250 [ 567.615318] do_user_addr_fault+0x1a9/0x640 [ 567.615323] exc_page_fault+0x81/0x1b0 [ 567.615328] asm_exc_page_fault+0x27/0x30 [ 567.615332] RIP: 0010:__get_user_8+0x1c/0x30 Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Alex Sierra	7ef6b2d4b7	drm/amdkfd: remap unaligned svm ranges that have split Split SVM ranges that have been mapped into 2MB page table entries, require to be remap in case the split has happened in a non-aligned VA. [WHY]: This condition causes the 2MB page table entries be split into 4KB PTEs. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:29 -04:00
Jesse Zhang	f9caf6cdd5	drm/amdkfd:remove unused code Function svm_range_split_by_grinity is not used, so it is removed. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:26 -04:00
Jiapeng Chong	8e9a110cb2	drm/amdkfd: clean up some inconsistent indenting No functional modification involved. drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:305 svm_range_free() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6804 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:52 -04:00
Dave Airlie	27442758e9	amd-drm-next-6.7-2023-10-13: amdgpu: - DC replay fixes - Misc code cleanups and spelling fixes - Documentation updates - RAS EEPROM Updates - FRU EEPROM Updates - IP discovery updates - SR-IOV fixes - RAS updates - DC PQ fixes - SMU 13.0.6 updates - GC 11.5 Support - NBIO 7.11 Support - GMC 11 Updates - Reset fixes - SMU 11.5 Updates - SMU 13.0 OD support - Use flexible arrays for bo list handling - W=1 Fixes - SubVP fixes - DPIA fixes - DCN 3.5 Support - Devcoredump fixes - VPE 6.1 support - VCN 4.0 Updates - S/G display fixes - DML fixes - DML2 Support - MST fixes - VRR fixes - Enable seamless boot in more cases - Enable content type property for HDMI - OLED fixes - Rework and clean up GPUVM TLB flushing - DC ODM fixes - DP 2.x fixes - AGP aperture fixes - SDMA firmware loading cleanups - Cyan Skillfish GPU clock counter fix - GC 11 GART fix - Cache GPU fault info for userspace queries - DC cursor check fixes - eDP fixes - DC FP handling fixes - Variable sized array fixes - SMU 13.0.x fixes - IB start and size alignment fixes for VCN - SMU 14 Support - Suspend and resume sequence rework - vkms fix amdkfd: - GC 11 fixes - GC 10 fixes - Doorbell fixes - CWSR fixes - SVM fixes - Clean up GC info enumeration - Rework memory limit handling - Coherent memory handling fixes - Use partial migrations in GPU faults - TLB flush fixes - DMA unmap fixes - GC 9.4.3 fixes - SQ interrupt fix - GTT mapping fix - GC 11.5 Support radeon: - Misc code cleanups - W=1 Fixes - Fix possible buffer overflow - Fix possible NULL pointer dereference UAPI: - Add EXT_COHERENT memory allocation flags. These allow for system scope atomics. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 - Add support for new VPE engine. This is a memory to memory copy engine with advanced scaling, CSC, and color management features Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713 - Add INFO IOCTL interface to query GPU faults Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZSmDAQAKCRC93/aFa7yZ 2EdeAQC2lkQ9IHLOon5kIZUK+r9IPYlgFsii+qfmMPLBaMcuwgEA8F4eJln/cc9V 02EKhlapkggYXYa+uhOE2KTnWgMFJgI= =SEXq -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.7-2023-10-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-10-13: amdgpu: - DC replay fixes - Misc code cleanups and spelling fixes - Documentation updates - RAS EEPROM Updates - FRU EEPROM Updates - IP discovery updates - SR-IOV fixes - RAS updates - DC PQ fixes - SMU 13.0.6 updates - GC 11.5 Support - NBIO 7.11 Support - GMC 11 Updates - Reset fixes - SMU 11.5 Updates - SMU 13.0 OD support - Use flexible arrays for bo list handling - W=1 Fixes - SubVP fixes - DPIA fixes - DCN 3.5 Support - Devcoredump fixes - VPE 6.1 support - VCN 4.0 Updates - S/G display fixes - DML fixes - DML2 Support - MST fixes - VRR fixes - Enable seamless boot in more cases - Enable content type property for HDMI - OLED fixes - Rework and clean up GPUVM TLB flushing - DC ODM fixes - DP 2.x fixes - AGP aperture fixes - SDMA firmware loading cleanups - Cyan Skillfish GPU clock counter fix - GC 11 GART fix - Cache GPU fault info for userspace queries - DC cursor check fixes - eDP fixes - DC FP handling fixes - Variable sized array fixes - SMU 13.0.x fixes - IB start and size alignment fixes for VCN - SMU 14 Support - Suspend and resume sequence rework - vkms fix amdkfd: - GC 11 fixes - GC 10 fixes - Doorbell fixes - CWSR fixes - SVM fixes - Clean up GC info enumeration - Rework memory limit handling - Coherent memory handling fixes - Use partial migrations in GPU faults - TLB flush fixes - DMA unmap fixes - GC 9.4.3 fixes - SQ interrupt fix - GTT mapping fix - GC 11.5 Support radeon: - Misc code cleanups - W=1 Fixes - Fix possible buffer overflow - Fix possible NULL pointer dereference UAPI: - Add EXT_COHERENT memory allocation flags. These allow for system scope atomics. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 - Add support for new VPE engine. This is a memory to memory copy engine with advanced scaling, CSC, and color management features Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713 - Add INFO IOCTL interface to query GPU faults Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231013175758.1735031-1-alexander.deucher@amd.com	2023-10-18 16:08:07 +10:00
Thomas Gleixner	b9655e702d	x86/cpu: Encapsulate topology information in cpuinfo_x86 The topology related information is randomly scattered across cpuinfo_x86. Create a new structure cpuinfo_topo and move in a first step initial_apicid and apicid into it. Aside of being better readable this is in preparation for replacing the horribly fragile CPU topology evaluation code further down the road. Consolidate APIC ID fields to u32 as that represents the hardware type. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.269787744@linutronix.de	2023-10-10 14:38:17 +02:00
Arvind Yadav	367a0af433	drm/amdkfd: get doorbell's absolute offset based on the db_size Here, Adding db_size in byte to find the doorbell's absolute offset for both 32-bit and 64-bit doorbell sizes. So that doorbell offset will be aligned based on the doorbell size. v2: - Addressed the review comment from Felix. v3: - Adding doorbell_size as parameter to get db absolute offset. v4: Squash the two patches into one. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 17:02:34 -04:00
Xiaogang Chen	dc427a473e	drm/amdkfd: Use partial migrations in GPU page faults This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. A svm range may include pages from both system ram and vram of one gpu now. These chagnes are expected to improve migration performance and reduce mmu callback and TLB flush workloads. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:58:59 -04:00
Philip Yang	23de7616f3	drm/amdkfd: Fix EXT_COHERENT memory allocation crash If there is no VRAM domain, bo_node is NULL and this causes crash. Refactor the change, and use the module parameter as higher privilege. Need another patch to support override PTE flag on APU. Fixes: `5f248462c6` ("drm/amdgpu: Add EXT_COHERENT memory allocation flags") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:40:30 -04:00
Alex Deucher	0021d70a06	drm/amdkfd: drop struct kfd_cu_info I think this was an abstraction back from when kfd supported both radeon and amdgpu. Since we just support amdgpu now, there is no more need for this and we can use the amdgpu structures directly. This also avoids having the kfd_cu_info structures on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann <arnd@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:41:13 -04:00
Alex Deucher	4ff91f2185	drm/amdkfd: reduce stack size in kfd_topology_add_device() kfd_topology.c:2082:1: warning: the frame size of 1440 bytes is larger than 1024 bytes Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2866 Cc: Arnd Bergmann <arnd@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:40:50 -04:00
Xiaogang Chen	709c348261	drm/amdkfd: Fix a race condition of vram buffer unref in svm code prange->svm_bo unref can happen in both mmu callback and a callback after migrate to system ram. Both are async call in different tasks. Sync svm_bo unref operation to avoid random "use-after-free". Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:44:29 -04:00
Philip Yang	eb3c357bcb	drm/amdkfd: Handle errors from svm validate and map If new range is splited to multiple pranges with max_svm_range_pages alignment and added to update_list, svm validate and map should keep going after error to make sure prange->mapped_to_gpu flag is up to date for the whole range. svm validate and map update set prange->mapped_to_gpu after mapping to GPUs successfully, otherwise clear prange->mapped_to_gpu flag (for update mapping case) instead of setting error flag, we can remove the redundant error flag to simpliy code. Refactor to remove goto and update prange->mapped_to_gpu flag inside svm_range_lock, to guarant we always evict queues or unmap from GPUs if there are invalid ranges. After svm validate and map return error -EAGIN, the caller retry will update the mapping for the whole range again. Fixes: `c22b044070` ("drm/amdkfd: flag added to handle errors from svm validate and map") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:23 -04:00
Xiaogang Chen	7bfaa160ca	drm/amdkfd: fix some race conditions in vram buffer alloc/free of svm code This patch fixes: 1: ref number of prange's svm_bo got decreased by an async call from hmm. When wait svm_bo of prange got released we shoul also wait prang->svm_bo become NULL, otherwise prange->svm_bo may be set to null after allocate new vram buffer. 2: During waiting svm_bo of prange got released in a while loop should reschedule current task to give other tasks oppotunity to run, specially the the workque task that handles svm_bo ref release, otherwise we may enter to softlock. Signed-off-by: Xiaogang.Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:02 -04:00
Philip Yang	101b810430	drm/amdkfd: Move dma unmapping after TLB flush Otherwise GPU may access the stale mapping and generate IOMMU IO_PAGE_FAULT. Move this to inside p->mutex to prevent multiple threads mapping and unmapping concurrently race condition. After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and before free the mem attachment in case failed to unmap from GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:10 -04:00
YuBiao Wang	cc39f9ccb8	drm/amdkfd: Use gpu_offset for user queue's wptr Directly use tbo's start address will miss the domain start offset. Need to use gpu_offset instead. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-09-20 17:30:42 -04:00
Philip Yang	c99b161280	drm/amdkfd: Remove svm range validated_once flag The validated_once flag is not used after the prefault was removed, The prefault was needed to ensure validate all system memory pages at least once before mapping or migrating the range to GPU. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:25:17 -04:00
Xiaogang Chen	df954b695c	drm/amdkfd: Separate dma unmap and free of dma address array operations We do not need free dma address array of svm_range each time we do dma unmap for pages in svm_range as we can reuse the same array. Only free it when free svm_range. Separate these two operations and use them accordingly. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:22 -04:00
YuBiao Wang	b157df66d8	drm/amdkfd: Use gpu_offset for user queue's wptr Directly use tbo's start address will miss the domain start offset. Need to use gpu_offset instead. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:09 -04:00
David Francis	5f248462c6	drm/amdgpu: Add EXT_COHERENT memory allocation flags These flags (for GEM and SVM allocations) allocate memory that allows for system-scope atomic semantics. On GFX943 these flags cause caches to be avoided on non-local memory. On all other ASICs they are identical in functionality to the equivalent COHERENT flags. Corresponding Thunk patch is at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 Reviewed-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:06 -04:00
Jonathan Kim	d92e55565c	drm/amdkfd: fix add queue process context clear without runtime enable There are cases where HSA runtime is not enabled through the AMDKFD_IOC_RUNTIME_ENABLE call when adding queues and the MES ADD_QUEUE API should clear the MES process context instead of SET_SHADER_DEBUGGER. Such examples are legacy HSA runtime builds that do not support the current exception handling and running KFD tests. The only time ADD_QUEUE.skip_process_ctx_clear is required is for debugger use cases where a debugged process is always runtime enabled when adding a queue. Tested-by: Shikai Guo <shikai.guo@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:23:57 -04:00
Lijo Lazar	4e8303cf2c	drm/amdgpu: Use function for IP version check Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:23:28 -04:00
Harish Kasiviswanathan	edcfe22985	drm/amdkfd: Insert missing TLB flush on GFX10 and later Heavy-weight TLB flush is required after unmap on all GPUs for correctness and security. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-09-12 17:45:40 -04:00
Harish Kasiviswanathan	4412f8529c	drm/amdkfd: Insert missing TLB flush on GFX10 and later Heavy-weight TLB flush is required after unmap on all GPUs for correctness and security. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-12 17:30:01 -04:00
David Francis	9296da8c40	drm/amdkfd: Checkpoint and restore queues on GFX11 The code in kfd_mqd_manager_v11.c to support criu dump and restore of queue state was missing. Added it; should be equivalent to kfd_mqd_manager_v10.c. CC: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:22:38 -04:00
Mukul Joshi	fc6efed2c7	drm/amdkfd: Update CU masking for GFX 9.4.3 The CU mask passed from user-space will change based on different spatial partitioning mode. As a result, update CU masking code for GFX9.4.3 to work for all partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:17:27 -04:00
Mukul Joshi	0752e66e91	drm/amdkfd: Update cache info reporting for GFX v9.4.3 Update cache info reporting in sysfs to report the correct number of CUs and associated cache information based on different spatial partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:17:20 -04:00
Mukul Joshi	97e3c6a853	drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3 Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:16:31 -04:00
Mukul Joshi	2f06b27444	drm/amdkfd: Fix unaligned 64-bit doorbell warning This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] ------------[ cut here ]------------ [ +0.000003] Unaligned 64-bit doorbell [ +0.000030] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [ +0.000003] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [ +0.000004] RSP: 0018:ffffc90004287730 EFLAGS: 00010246 [ +0.000005] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ +0.000003] RDX: 0000000000000001 RSI: ffffffff82837c71 RDI: 00000000ffffffff [ +0.000003] RBP: ffffc90004287748 R08: 0000000000000003 R09: 0000000000000001 [ +0.000002] R10: 000000000000001a R11: ffff88a034008198 R12: ffffc900013bd004 [ +0.000003] R13: 0000000000000008 R14: ffffc900042877b0 R15: 000000000000007f [ +0.000003] FS: 00007fa8c7b62000(0000) GS:ffff889f88400000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056111c45aaf0 CR3: 00000001414f2002 CR4: 0000000000770ee0 [ +0.000003] PKRU: 55555554 [ +0.000002] Call Trace: [ +0.000004] <TASK> [ +0.000006] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.000012] ? do_pci_enable_device+0xdc/0x110 [ +0.000008] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: `c318666510` ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:15:49 -04:00
Mukul Joshi	81faf9e0c3	drm/amdkfd: Fix reg offset for setting CWSR grace period This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:15:43 -04:00
André Almeida	887db1e49a	drm/amdgpu: Merge debug module parameters Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:19:54 -04:00
David Francis	e379162adf	drm/amdkfd: Checkpoint and restore queues on GFX11 The code in kfd_mqd_manager_v11.c to support criu dump and restore of queue state was missing. Added it; should be equivalent to kfd_mqd_manager_v10.c. CC: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:16:42 -04:00
Mukul Joshi	68fa72a437	drm/amdgpu: Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with the naming convention followed in amdgpu_gfx.h. No functional change. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:15:41 -04:00
Mukul Joshi	f4fa8fcd25	drm/amdkfd: Update CU masking for GFX 9.4.3 The CU mask passed from user-space will change based on different spatial partitioning mode. As a result, update CU masking code for GFX9.4.3 to work for all partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:11:12 -04:00
Mukul Joshi	610cc82b1f	drm/amdkfd: Update cache info reporting for GFX v9.4.3 Update cache info reporting in sysfs to report the correct number of CUs and associated cache information based on different spatial partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:11:04 -04:00
Mukul Joshi	f705a6f021	drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3 Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:19 -04:00
Mukul Joshi	fe2b830073	drm/amdkfd: Fix unaligned 64-bit doorbell warning This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] ------------[ cut here ]------------ [ +0.000003] Unaligned 64-bit doorbell [ +0.000030] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [ +0.000003] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [ +0.000004] RSP: 0018:ffffc90004287730 EFLAGS: 00010246 [ +0.000005] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ +0.000003] RDX: 0000000000000001 RSI: ffffffff82837c71 RDI: 00000000ffffffff [ +0.000003] RBP: ffffc90004287748 R08: 0000000000000003 R09: 0000000000000001 [ +0.000002] R10: 000000000000001a R11: ffff88a034008198 R12: ffffc900013bd004 [ +0.000003] R13: 0000000000000008 R14: ffffc900042877b0 R15: 000000000000007f [ +0.000003] FS: 00007fa8c7b62000(0000) GS:ffff889f88400000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056111c45aaf0 CR3: 00000001414f2002 CR4: 0000000000770ee0 [ +0.000003] PKRU: 55555554 [ +0.000002] Call Trace: [ +0.000004] <TASK> [ +0.000006] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.000012] ? do_pci_enable_device+0xdc/0x110 [ +0.000008] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: `c318666510` ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:07:47 -04:00
Mukul Joshi	56d6daa3c7	drm/amdkfd: Fix reg offset for setting CWSR grace period This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:07:33 -04:00
Linus Torvalds	a48fa7efaf	drm fixes for 6.6-rc1 amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes i915: - Mark requests for GuC virtual engines to avoid use-after-free nouveau: - Fix fence state in nouveau_fence_emit() ivpu: - replace strncpy -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmT6ihUACgkQDHTzWXnE hr6Vqg/+OGfbxx0qev5C93bLYpg8d4zbply0zTed2hU48zczARkOyDH+h2uYM4tP rmB6mK/0KRy3H8vkngKduR/IF6QBwzOnLDpS+C/TrHYQYqMDwvs3qEDVYqXh3V5H GPIFuu9sb2Nb/o9Fid70pvNACbgGsAUgqMUhQ/is3NjzOR8S/qjdBQ7wSdoUoOqx okTdlwuMK5SEYrihSyTZvhNcwpR/1L8JuxOUXXXUSQ0tRXBer/ZNF2lcEyYmQ0zs bZHKM4dNbdew/EhygbH6LVB5RjFaT5pGw08Xm8zJry+q5tXQV/NIXPQHL3vWqQoX i2QLbvGX/Uu8LJg9YNdsa1kPwNKADAxF64cW38Llv8ybsPHyva/I255j689/TvSG Se7HkTooURKS6GWFHPOkyMNC0Y+Fb/7WG5zUPSDVk9stqJz9pzx48okTsCWeuovD cBOssp8If1QsTyPvDq5A47l5z1oO3J1rdJ9fL0GnpOuXulPhgJGIoUSkftyI6lbw rhUoRd7w6VcgOsA9WIkAf+/325em0Y0AKZBgQnr2jfF56IE4iLa8yFuJNeReZ9oy W9yf14AB0orVm9+P4+PqATXBh+PdCTD1CPcB0MyK1SZAjh836Tc0HPKvJAUU6jp+ 8aw3BKXcaLjIP/dCyhSddXCnuuTPWreff5isktwgEXUtNFv9jx0= =fPeN -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three weeks in one go, one i915, one nouveau and one ivpu. I think there might be a few more fixes in misc that I haven't pulled in yet, but we should get them all for rc2. amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes i915: - Mark requests for GuC virtual engines to avoid use-after-free nouveau: - Fix fence state in nouveau_fence_emit() ivpu: - replace strncpy" * tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits) drm/amdgpu: Restrict bootloader wait to SMUv13.0.6 drm/amd/display: prevent potential division by zero errors drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1 Revert "drm/amd/display: Remove v_startup workaround for dcn3+" drm/amdgpu: fix amdgpu_cs_p1_user_fence Revert "Revert "drm/amd/display: Implement zpos property"" drm/amdkfd: Add missing gfx11 MQD manager callbacks drm/amdgpu: Free ras cmd input buffer properly drm/amdgpu: Hide xcp partition sysfs under SRIOV drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting drm/amdkfd: use mask to get v9 interrupt sq data bits correctly drm/amdgpu: Allocate coredump memory in a nonblocking way drm/amdgpu: Support query ecc cap for aqua_vanjaram drm/amdgpu: Add umc_info v4_0 structure drm/amd/display: always switch off ODM before committing more streams drm/amd/display: Remove wait while locked drm/amd/display: update blank state on ODM changes drm/amd/display: Add smu write msg id fail retry process drm/amdgpu: Add SMU v13.0.6 default reset methods ...	2023-09-07 19:47:04 -07:00
Jay Cornwall	e9dca969b2	drm/amdkfd: Add missing gfx11 MQD manager callbacks mqd_stride function was introduced in commit `2f77b9a242` ("drm/amdkfd: Update MQD management on multi XCC setup") but not assigned for gfx11. Fixes a NULL dereference in debugfs. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.5.x	2023-08-31 18:14:28 -04:00
Alex Sierra	a1fe9e9f73	drm/amdkfd: use mask to get v9 interrupt sq data bits correctly Interrupt sq data bits were not taken properly from contextid0 and contextid1. Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:10:28 -04:00
Alex Sierra	3aca8cca60	drm/amdkfd: retry after EBUSY is returned from hmm_ranges_get_pages if hmm_range_get_pages returns EBUSY error during svm_range_validate_and_map, within the context of a page fault interrupt. This should retry through svm_range_restore_pages callback. Therefore we treat this as EAGAIN error instead, and defer it to restore pages fallback. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:54:41 -04:00
Linus Torvalds	461f35f014	drm for 6.6-rc1 core: - fix gfp flags in drmm_kmalloc gpuva: - add new generic GPU VA manager (for nouveau initially) syncobj: - add new DRM_IOCTL_SYNCOBJ_EVENTFD ioctl dma-buf: - acquire resv lock for mmap() in exporters - support dma-buf self import automatically - docs fixes backlight: - fix fbdev interactions atomic: - improve logging prime: - remove struct gem_prim_mmap plus driver updates gem: - drm_exec: add locking over multiple GEM objects - fix lockdep checking fbdev: - make fbdev userspace interfaces optional - use linux device instead of fbdev device - use deferred i/o helper macros in various drivers - Make FB core selectable without drivers - Remove obsolete flags FBINFO_DEFAULT and FBINFO_FLAG_DEFAULT - Add helper macros and Kconfig tokens for DMA-allocated framebuffer ttm: - support init_on_free - swapout fixes panel: - panel-edp: Support AUO B116XAB01.4 - Support Visionox R66451 plus DT bindings - ld9040: Backlight support, magic improved, Kconfig fix - Convert to of_device_get_match_data() - Fix Kconfig dependencies - simple: Set bpc value to fix warning; Set connector type for AUO T215HVN01; Support Innolux G156HCE-L01 plus DT bindings - ili9881: Support TDO TL050HDV35 LCD panel plus DT bindings - startek: Support KD070FHFID015 MIPI-DSI panel plus DT bindings - sitronix-st7789v: Support Inanbo T28CP45TN89 plus DT bindings; Support EDT ET028013DMA plus DT bindings; Various cleanups - edp: Add timings for N140HCA-EAC - Allow panels and touchscreens to power sequence together - Fix Innolux G156HCE-L01 LVDS clock bridge: - debugfs for chains support - dw-hdmi: Improve support for YUV420 bus format CEC suspend/resume, update EDID on HDMI detect - dw-mipi-dsi: Fix enable/disable of DSI controller - lt9611uxc: Use MODULE_FIRMWARE() - ps8640: Remove broken EDID code - samsung-dsim: Fix command transfer - tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups - adv7511: Fix low refresh rate - anx7625: Switch to macros instead of hardcoded values locking fixes - tc358767: fix hardware delays - sitronix-st7789v: Support panel orientation; Support rotation property; Add support for Jasonic JT240MHQS-HWT-EK-E3 plus DT bindings amdgpu: - SDMA 6.1.0 support - HDP 6.1 support - SMUIO 14.0 support - PSP 14.0 support - IH 6.1 support - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SMU 13.x fixes - PSP 13.x fixes - SubVP fixes - GC 9.4.3 fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes - initial freesync panel replay support - revert zpos properly until igt regression is fixeed - use TTM to manage doorbell BAR - Expose both current and average power via hwmon if supported amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes - Convert older APUs to use dGPU path like newer APUs - Drop IOMMUv2 path as it is no longer used - TBA fix for aldebaran i915: - ICL+ DSI modeset sequence - HDCP improvements - MTL display fixes and cleanups - HSW/BDW PSR1 restored - Init DDI ports in VBT order - General display refactors - Start using plane scale factor for relative data rate - Use shmem for dpt objects - Expose RPS thresholds in sysfs - Apply GuC SLPC min frequency softlimit correctly - Extend Wa_14015795083 to TGL, RKL, DG1 and ADL - Fix a VMA UAF for multi-gt platform - Do not use stolen on MTL due to HW bug - Check HuC and GuC version compatibility on MTL - avoid infinite GPU waits due to premature release of request memory - Fixes and updates for GSC memory allocation - Display SDVO fixes - Take stolen handling out of FBC code - Make i915_coherent_map_type GT-centric - Simplify shmem_create_from_object map_type msm: - SM6125 MDSS support - DPU: SM6125 DPU support - DSI: runtime PM support, burst mode support - DSI PHY: SM6125 support in 14nm DSI PHY driver - GPU: prepare for a7xx - fix a690 firmware - disable relocs on a6xx and newer radeon: - Lots of checkpatch cleanups ast: - improve device-model detection - Represent BMV as virtual connector - Report DP connection status nouveau: - add new exec/bind interface to support Vulkan - document some getparam ioctls - improve VRAM detection - various fixes/cleanups - workraound DPCD issues ivpu: - MMU updates - debugfs support - Support vpu4 virtio: - add sync object support atmel-hlcdc: - Support inverted pixclock polarity etnaviv: - runtime PM cleanups - hang handling fixes exynos: - use fbdev DMA helpers - fix possible NULL ptr dereference komeda: - always attach encoder omapdrm: - use fbdev DMA helpers ingenic: - kconfig regmap fixes loongson: - support display controller mediatek: - Small mtk-dpi cleanups - DisplayPort: support eDP and aux-bus - Fix coverity issues - Fix potential memory leak if vmap() fail mgag200: - minor fixes mxsfb: - support disabling overlay planes panfrost: - fix sync in IRQ handling ssd130x: - Support per-controller default resolution plus DT bindings - Reduce memory-allocation overhead - Improve intermediate buffer size computation - Fix allocation of temporary buffers - Fix pitch computation - Fix shadow plane allocation tegra: - use fbdev DMA helpers - Convert to devm_platform_ioremap_resource() - support bridge/connector - enable PM tidss: - Support TI AM625 plus DT bindings - Implement new connector model plus driver updates vkms: - improve write back support - docs fixes - support gamma LUT zynqmp-dpsub: - misc fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmTukSYACgkQDHTzWXnE hr6vnQ/+J7vBVkBr8JsaEV/twcZwzbNdpivsIagd8U83GQB50nDReVXbNx+Wo0/C WiGlrC6Sw3NVOGbkigd5IQ7fb5C/7RnBmzMi/iS7Qnk2uEqLqgV00VxfGwdm6wgr 0gNB8zuu2xYphHz2K8LzwnmeQRdN+YUQpUa2wNzLO88IEkTvq5vx2rJEn5p9/3hp OxbbPBzpDRRPlkNFfVQCN8todbKdsPc4am81Eqgv7BJf21RFgQodPGW5koCYuv0w 3m+PJh1KkfYAL974EsLr/pkY7yhhiZ6SlFLX8ssg4FyZl/Vthmc9bl14jRq/pqt4 GBp8yrPq1XjrwXR8wv3MiwNEdANQ+KD9IoGlzLxqVgmEFRE+g4VzZZXeC3AIrTVP FPg4iLUrDrmj9RpJmbVqhq9X2jZs+EtRAFkJPrPbq2fItAD2a2dW4X3ISSnnTqDI 6O2dVwuLCU6OfWnvN4bPW9p8CqRgR8Itqv1SI8qXooDy307YZu1eTUf5JAVwG/SW xbDEFVFlMPyFLm+KN5dv1csJKK21vWi9gLg8phK8mTWYWnqMEtJqbxbRzmdBEFmE pXKVu01P6ZqgBbaETpCljlOaEDdJnvO4W+o70MgBtpR2IWFMbMNO+iS0EmLZ6Vgj 9zYZctpL+dMuHV0Of1GMkHFRHTMYEzW4tuctLIQfG13y4WzyczY= =CwV9 -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "The drm core grew a new generic gpu virtual address manager, and new execution locking helpers. These are used by nouveau now to provide uAPI support for the userspace Vulkan driver. AMD had a bunch of new IP core support, loads of refactoring around fbdev, but mostly just the usual amount of stuff across the board. core: - fix gfp flags in drmm_kmalloc gpuva: - add new generic GPU VA manager (for nouveau initially) syncobj: - add new DRM_IOCTL_SYNCOBJ_EVENTFD ioctl dma-buf: - acquire resv lock for mmap() in exporters - support dma-buf self import automatically - docs fixes backlight: - fix fbdev interactions atomic: - improve logging prime: - remove struct gem_prim_mmap plus driver updates gem: - drm_exec: add locking over multiple GEM objects - fix lockdep checking fbdev: - make fbdev userspace interfaces optional - use linux device instead of fbdev device - use deferred i/o helper macros in various drivers - Make FB core selectable without drivers - Remove obsolete flags FBINFO_DEFAULT and FBINFO_FLAG_DEFAULT - Add helper macros and Kconfig tokens for DMA-allocated framebuffer ttm: - support init_on_free - swapout fixes panel: - panel-edp: Support AUO B116XAB01.4 - Support Visionox R66451 plus DT bindings - ld9040: - Backlight support - magic improved - Kconfig fix - Convert to of_device_get_match_data() - Fix Kconfig dependencies - simple: - Set bpc value to fix warning - Set connector type for AUO T215HVN01 - Support Innolux G156HCE-L01 plus DT bindings - ili9881: Support TDO TL050HDV35 LCD panel plus DT bindings - startek: Support KD070FHFID015 MIPI-DSI panel plus DT bindings - sitronix-st7789v: - Support Inanbo T28CP45TN89 plus DT bindings - Support EDT ET028013DMA plus DT bindings - Various cleanups - edp: Add timings for N140HCA-EAC - Allow panels and touchscreens to power sequence together - Fix Innolux G156HCE-L01 LVDS clock bridge: - debugfs for chains support - dw-hdmi: - Improve support for YUV420 bus format - CEC suspend/resume - update EDID on HDMI detect - dw-mipi-dsi: Fix enable/disable of DSI controller - lt9611uxc: Use MODULE_FIRMWARE() - ps8640: Remove broken EDID code - samsung-dsim: Fix command transfer - tc358764: - Handle HS/VS polarity - Use BIT() macro - Various cleanups - adv7511: Fix low refresh rate - anx7625: - Switch to macros instead of hardcoded values - locking fixes - tc358767: fix hardware delays - sitronix-st7789v: - Support panel orientation - Support rotation property - Add support for Jasonic JT240MHQS-HWT-EK-E3 plus DT bindings amdgpu: - SDMA 6.1.0 support - HDP 6.1 support - SMUIO 14.0 support - PSP 14.0 support - IH 6.1 support - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SMU 13.x fixes - PSP 13.x fixes - SubVP fixes - GC 9.4.3 fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes - initial freesync panel replay support - revert zpos properly until igt regression is fixeed - use TTM to manage doorbell BAR - Expose both current and average power via hwmon if supported amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes - Convert older APUs to use dGPU path like newer APUs - Drop IOMMUv2 path as it is no longer used - TBA fix for aldebaran i915: - ICL+ DSI modeset sequence - HDCP improvements - MTL display fixes and cleanups - HSW/BDW PSR1 restored - Init DDI ports in VBT order - General display refactors - Start using plane scale factor for relative data rate - Use shmem for dpt objects - Expose RPS thresholds in sysfs - Apply GuC SLPC min frequency softlimit correctly - Extend Wa_14015795083 to TGL, RKL, DG1 and ADL - Fix a VMA UAF for multi-gt platform - Do not use stolen on MTL due to HW bug - Check HuC and GuC version compatibility on MTL - avoid infinite GPU waits due to premature release of request memory - Fixes and updates for GSC memory allocation - Display SDVO fixes - Take stolen handling out of FBC code - Make i915_coherent_map_type GT-centric - Simplify shmem_create_from_object map_type msm: - SM6125 MDSS support - DPU: SM6125 DPU support - DSI: runtime PM support, burst mode support - DSI PHY: SM6125 support in 14nm DSI PHY driver - GPU: prepare for a7xx - fix a690 firmware - disable relocs on a6xx and newer radeon: - Lots of checkpatch cleanups ast: - improve device-model detection - Represent BMV as virtual connector - Report DP connection status nouveau: - add new exec/bind interface to support Vulkan - document some getparam ioctls - improve VRAM detection - various fixes/cleanups - workraound DPCD issues ivpu: - MMU updates - debugfs support - Support vpu4 virtio: - add sync object support atmel-hlcdc: - Support inverted pixclock polarity etnaviv: - runtime PM cleanups - hang handling fixes exynos: - use fbdev DMA helpers - fix possible NULL ptr dereference komeda: - always attach encoder omapdrm: - use fbdev DMA helpers ingenic: - kconfig regmap fixes loongson: - support display controller mediatek: - Small mtk-dpi cleanups - DisplayPort: support eDP and aux-bus - Fix coverity issues - Fix potential memory leak if vmap() fail mgag200: - minor fixes mxsfb: - support disabling overlay planes panfrost: - fix sync in IRQ handling ssd130x: - Support per-controller default resolution plus DT bindings - Reduce memory-allocation overhead - Improve intermediate buffer size computation - Fix allocation of temporary buffers - Fix pitch computation - Fix shadow plane allocation tegra: - use fbdev DMA helpers - Convert to devm_platform_ioremap_resource() - support bridge/connector - enable PM tidss: - Support TI AM625 plus DT bindings - Implement new connector model plus driver updates vkms: - improve write back support - docs fixes - support gamma LUT zynqmp-dpsub: - misc fixes" * tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drm: (1327 commits) drm/gpuva_mgr: remove unused prev pointer in __drm_gpuva_sm_map() drm/tests/drm_kunit_helpers: Place correct function name in the comment header drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly drm/nouveau: uvmm: fix unset region pointer on remap drm/nouveau: sched: avoid job races between entities drm/i915: Fix HPD polling, reenabling the output poll work as needed drm: Add an HPD poll helper to reschedule the poll work drm/i915: Fix TLB-Invalidation seqno store drm/ttm/tests: Fix type conversion in ttm_pool_test drm/msm/a6xx: Bail out early if setting GPU OOB fails drm/msm/a6xx: Move LLC accessors to the common header drm/msm/a6xx: Introduce a6xx_llc_read drm/ttm/tests: Require MMU when testing drm/panel: simple: Fix Innolux G156HCE-L01 LVDS clock Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0"" drm/amdgpu: Add memory vendor information drm/amd: flush any delayed gfxoff on suspend entry drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix drm/amdgpu: Remove gfxoff check in GFX v9.4.3 drm/amd/pm: Update pci link speed for smu v13.0.6 ...	2023-08-30 13:34:34 -07:00
Asad Kamal	80c74918aa	drm/amdkfd: Replace pr_err with dev_err Replace pr_err with dev_err to show the bus-id of failing device with kfd queue errors Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:17 -04:00
Jay Cornwall	38498908c5	drm/amdkfd: Add missing gfx11 MQD manager callbacks mqd_stride function was introduced in commit `2f77b9a242` ("drm/amdkfd: Update MQD management on multi XCC setup") but not assigned for gfx11. Fixes a NULL dereference in debugfs. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:16 -04:00
Harish Kasiviswanathan	37fb879107	drm/amdkfd: ratelimited SQ interrupt messages No functional change. Use ratelimited version of pr_ to avoid overflowing of dmesg buffer Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:16 -04:00
Alex Sierra	da3a815ccd	drm/amdkfd: use mask to get v9 interrupt sq data bits correctly Interrupt sq data bits were not taken properly from contextid0 and contextid1. Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:16 -04:00
Lang Yu	afac198cd1	drm/amdkfd: add KFD support for GC 11.5.0 Enable KFD for GC 11.5.0. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:00:51 -04:00
Alex Sierra	ebac9414a5	drm/amdkfd: retry after EBUSY is returned from hmm_ranges_get_pages if hmm_range_get_pages returns EBUSY error during svm_range_validate_and_map, within the context of a page fault interrupt. This should retry through svm_range_restore_pages callback. Therefore we treat this as EAGAIN error instead, and defer it to restore pages fallback. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 14:58:07 -04:00
Linus Torvalds	b96a3e9142	- Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6 Nfio7afS1ncD+hPYT8947UnLxTgn+ww= =Afws -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...	2023-08-29 14:25:26 -07:00
Kefeng Wang	f7992bfaf3	drm/amdkfd: use vma_is_initial_stack() and vma_is_initial_heap() Use the helpers to simplify code. Link: https://lkml.kernel.org/r/20230728050043.59880-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Christian Göttsche <cgzones@googlemail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-08-21 13:37:31 -07:00
James Zhu	b25fdc048c	drm/amdgpu: skip xcp drm device allocation when out of drm resource Return 0 when drm device alloc failed with -ENOSPC in order to allow amdgpu drive loading. But the xcp without drm device node assigned won't be visiable in user space. This helps amdgpu driver loading on system which has more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public, which will support up to 2^20 nodes totally. kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/ libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-16 15:46:39 -04:00
James Zhu	400a39f1ec	drm/amdgpu: skip xcp drm device allocation when out of drm resource Return 0 when drm device alloc failed with -ENOSPC in order to allow amdgpu drive loading. But the xcp without drm device node assigned won't be visiable in user space. This helps amdgpu driver loading on system which has more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public, which will support up to 2^20 nodes totally. kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/ libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-16 11:34:11 -04:00
Ruan Jinjie	275e37221b	drm/amdkfd: Remove unnecessary NULL values The NULL initialization of the pointers assigned by kzalloc() first is not necessary, because if the kzalloc() failed, the pointers will be assigned NULL, otherwise it works as usual. so remove it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-15 18:08:27 -04:00
Jonathan Kim	8b4c350c4d	drm/amdkfd: fix double assign skip process context clear Remove redundant assignment when skipping process ctx clear. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-15 18:07:42 -04:00
Atul Raut	bd6040b0ea	drm/amdkfd: Use memdup_user() rather than duplicating its implementation To prevent its redundant implementation and streamline code, use memdup_user. This fixes warnings reported by Coccinelle: ./drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2811:13-20: WARNING opportunity for memdup_user Signed-off-by: Atul Raut <rauji.raut@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-15 18:07:41 -04:00
Arnd Bergmann	475968fe4a	drm/amdkfd: fix build failure without CONFIG_DYNAMIC_DEBUG When CONFIG_DYNAMIC_DEBUG is disabled altogether, calling _dynamic_func_call_no_desc() does not work: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function 'svm_range_set_attr': drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:52:9: error: implicit declaration of function '_dynamic_func_call_no_desc' [-Werror=implicit-function-declaration] 52 \| _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:3564:9: note: in expansion of macro 'dynamic_svm_range_dump' 3564 \| dynamic_svm_range_dump(svms); \| ^~~~~~~~~~~~~~~~~~~~~~ Add a compile-time conditional in addition to the runtime check. Fixes: `8923137dbe` ("drm/amdkfd: avoid svm dump when dynamic debug disabled") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-15 18:07:28 -04:00
Jay Cornwall	a34cab4409	drm/amdkfd: Add missing tba_hi programming on aldebaran Previously asymptomatic because high 32 bits were zero. Fixes: `96c211f1f9` ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-15 17:40:55 -04:00
Alex Deucher	80e28aaf93	drm/amdkfd: rename device_queue_manager_init_v10_navi10() Drop the navi10 in the name for consistency with other families. All gfx10 parts use the same implementation. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-11 14:47:39 -04:00
Alex Deucher	c99a2e7ae2	drm/amdkfd: drop IOMMUv2 support Now that we use the dGPU path for all APUs, drop the IOMMUv2 support. v2: drop the now unused queue manager functions for gfx7/8 APUs Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-11 14:47:25 -04:00
Alex Deucher	2b4adeb34f	drm/amdkfd: disable IOMMUv2 support for Raven Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-11 14:47:14 -04:00
Alex Deucher	99c1501996	drm/amdkfd: disable IOMMUv2 support for KV/CZ Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly. v2: use the dGPU queue manager functions Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-11 14:47:03 -04:00
Alex Deucher	95979df25b	drm/amdkfd: ignore crat by default We are dropping the IOMMUv2 path, so no need to enable this. It's often buggy on consumer platforms anyway. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-11 14:46:18 -04:00
Alex Deucher	091ae5473f	drm/amdkfd: disable IOMMUv2 support for Raven Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-09 10:58:40 -04:00
Alex Deucher	616f92d188	drm/amdkfd: disable IOMMUv2 support for KV/CZ Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly. v2: use the dGPU queue manager functions Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-09 10:58:30 -04:00
Alex Deucher	a6dea2d64f	drm/amdkfd: ignore crat by default We are dropping the IOMMUv2 path, so no need to enable this. It's often buggy on consumer platforms anyway. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-09 10:58:20 -04:00
Shashank Sharma	8da0d694a3	drm/amdgpu: remove unused functions and variables This patch removes some variables and functions from KFD doorbell handling code, which are no more required since doorbell manager is handling doorbell calculations. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:07 -04:00
Shashank Sharma	2105a15a20	drm/amdgpu: use doorbell mgr for kfd process doorbells This patch: - adds a doorbell object in kfd pdd structure. - allocates doorbells for a process while creating its queue. - frees the doorbells with pdd destroy. - moves doorbell bitmap init function to kfd_doorbell.c PS: This patch ensures that we don't break the existing KFD functionality, but now KFD userspace library should also create doorbell pages as AMDGPU GEM objects using libdrm functions in userspace. The reference code for the same is available with AMDGPU Usermode queue libdrm MR. Once this is done, we will not need to create process doorbells in kernel. V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel instead (Alex). - Do not use custom doorbell structure, instead use separate variables for bo and doorbell_bitmap (Alex) V3: - Do not allocate doorbell page with PDD, delay doorbell process page allocation until really needed (Felix) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felilx.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:07 -04:00
Shashank Sharma	c318666510	drm/amdgpu: use doorbell mgr for kfd kernel doorbells This patch: - adds a doorbell bo in kfd device structure. - creates doorbell page for kfd kernel usages. - updates the get_kernel_doorbell and free_kernel_doorbell functions accordingly V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex) V3: - Move single variable declaration below (Christian) - Add a to-do item to reuse the KGD kernel level doorbells for KFD for non-MES cases, instead of reserving one page (Felix) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:07 -04:00
Jay Cornwall	05c899eacc	drm/amdkfd: Sign-extend TMA address in trap handler SMEM instructions can reach addresses above 47 bits but require bit 47 to be sign-extended through bits [63:48]. This allows the TMA to be relocated in a following patch. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:06 -04:00
Jay Cornwall	96c211f1f9	drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole The TBA and TMA, along with an unused IB allocation, reside at low addresses in the VM address space. A stray VM fault which hits these pages must be serviced by making their page table entries invalid. The scheduler depends upon these pages being resident and fails, preventing a debugger from inspecting the failure state. By relocating these pages above 47 bits in the VM address space they can only be reached when bits [63:48] are set to 1. This makes it much less likely for a misbehaving program to generate accesses to them. The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL access with a small offset. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:06 -04:00
Jay Cornwall	631ddc3553	drm/amdkfd: Sync trap handler binaries with source Some changes have been lost during rebases. Rebuild sources. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:14:06 -04:00
Alex Sierra	ab3400eb94	drm/amdkfd: avoid unmap dma address when svm_ranges are split DMA address reference within svm_ranges should be unmapped only after the memory has been released from the system. In case of range splitting, the DMA address information should be copied to the corresponding range after this has split. But leaving dma mapping intact. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 16:36:44 -04:00
Daniel Vetter	3d00c59d14	amd-drm-next-6.6-2023-07-28: amdgpu: - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SubVP fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - Add PSP 14.0 support - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes radeon: - Lots of checkpatch cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZMQ0vAAKCRC93/aFa7yZ 2EOOAQCrsNf1IEynXVj0gVYOWFDpBCdaDkw+gXR73nOlwBeZzgD8DAoismXYDY95 pkKlx/HL5O8qyZ25Lc9ZlgsJnTpnpw4= =c/Jk -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.6-2023-07-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.6-2023-07-28: amdgpu: - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SubVP fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - Add PSP 14.0 support - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes radeon: - Lots of checkpatch cleanups Merge conflicts: - drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c The switch to drm eu helpers in `8a206685d3` ("drm/amdgpu: use drm_exec for GEM and CSA handling v2") clashed with the cosmetic cleanups from `30953c4d00` ("drm/amdgpu: Fix style issues in amdgpu_gem.c"). I kept the former since the cleanup up code is gone. - drivers/gpu/drm/amd/amdgpu/atom.c. `adf64e2142` ("drm/amd: Avoid reading the VBIOS part number twice") removed code that `992b8fe106` ("drm/radeon: Replace all non-returning strlcpy with strscpy") polished. From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230728214228.8102-1-alexander.deucher@amd.com [sima: some merge conflict wrangling as noted] Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2023-08-04 11:10:18 +02:00
Jonathan Kim	fc7f1d9697	drm/amdkfd: fix and enable ttmp setup for gfx11 The MES cached process context must be cleared on adding any queue for the first time. For proper debug support, the MES will clear it's cached process context on the first call to SET_SHADER_DEBUGGER. This allows TTMPs to be pesistently enabled in a safe manner. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Eric Huang <jinhuieric@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-27 14:59:29 -04:00
Ramesh Errabolu	6d67b681f9	drm/amdgpu: Checkpoint and Restore VRAM BOs without VA Extend checkpoint logic to allow inclusion of VRAM BOs that do not have a VA attached Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-27 14:48:07 -04:00
Jonathan Kim	602816c3ee	drm/amdkfd: fix trap handling work around for debugging Update the list of devices that require the cwsr trap handling workaround for debugging use cases. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Ruili Ji <ruili.ji@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-25 16:13:58 -04:00
Alex Sierra	8923137dbe	drm/amdkfd: avoid svm dump when dynamic debug disabled Set dynamic_svm_range_dump macro to avoid iterating over SVM lists from svm_range_debug_dump when dynamic debug is disabled. Otherwise, it could drop performance, specially with big number of SVM ranges. Make sure both svm_range_set_attr and svm_range_debug_dump functions are dynamically enabled to print svm_range_debug_dump debug traces. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-25 13:44:05 -04:00
Jonathan Kim	7a1c5c6753	drm/amdkfd: enable cooperative groups for gfx11 MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS. Similar to the F32 HWS, MES will manage quality of service for these queues. Use this for cooperative groups since cooperative groups are device occupancy limited. Since some GFX11 devices can only be debugged with partial CUs, do not allow the debugging of cooperative groups on these devices as the CU occupancy limit will change on attach. In addition, zero initialize the MES add queue submission vector for MES initialization tests as we do not want these to be cooperative dispatches. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-25 13:35:43 -04:00
Jonathan Kim	cef600e1fd	drm/amdkfd: fix trap handling work around for debugging Update the list of devices that require the cwsr trap handling workaround for debugging use cases. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Ruili Ji <ruili.ji@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-21 16:52:25 -04:00
Daniel Vetter	6c7f27441d	drm-misc-next for v6.6: UAPI Changes: * fbdev: * Make fbdev userspace interfaces optional; only leaves the framebuffer console active * prime: * Support dma-buf self-import for all drivers automatically: improves support for many userspace compositors Cross-subsystem Changes: * backlight: * Fix interaction with fbdev in several drivers * base: Convert struct platform.remove to return void; part of a larger, tree-wide effort * dma-buf: Acquire reservation lock for mmap() in exporters; part of an on-going effort to simplify locking around dma-bufs * fbdev: * Use Linux device instead of fbdev device in many places * Use deferred-I/O helper macros in various drivers * i2c: Convert struct i2c from .probe_new to .probe; part of a larger, tree-wide effort * video: * Avoid including <linux/screen_info.h> Core Changes: * atomic: * Improve logging * prime: * Remove struct drm_driver.gem_prime_mmap plus driver updates: all drivers now implement this callback with drm_gem_prime_mmap() * gem: * Support execution contexts: provides locking over multiple GEM objects * ttm: * Support init_on_free * Swapout fixes Driver Changes: * accel: * ivpu: MMU updates; Support debugfs * ast: * Improve device-model detection * Cleanups * bridge: * dw-hdmi: Improve support for YUV420 bus format * dw-mipi-dsi: Fix enable/disable of DSI controller * lt9611uxc: Use MODULE_FIRMWARE() * ps8640: Remove broken EDID code * samsung-dsim: Fix command transfer * tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups * Cleanups * ingenic: * Kconfig REGMAP fixes * loongson: * Support display controller * mgag200: * Minor fixes * mxsfb: * Support disabling overlay planes * nouveau: * Improve VRAM detection * Various fixes and cleanups * panel: * panel-edp: Support AUO B116XAB01.4 * Support Visionox R66451 plus DT bindings * Cleanups * ssd130x: * Support per-controller default resolution plus DT bindings * Reduce memory-allocation overhead * Cleanups * tidss: * Support TI AM625 plus DT bindings * Implement new connector model plus driver updates * vkms * Improve write-back support * Documentation fixes -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmSvvRAACgkQaA3BHVML eiNpGQgAs8jq1XjN9t8jZsdgXnoCbkZyVUI2NO0HwoVwpRCLgbXp5AX5qq2oRciE TBhe4Fceh/ZsYqHTZQahnguxgRKM5JgXwbI4Z0iiOVcqasNbycaKAqipxJJ7kdo1 qPhGCbgQFVX7oIq2xjfXehh6O0SYX+R9r88X8dMJxMYv/pcLwOHG74kS040WOcQq uATgcnobOf/D8ZmlqvfKGAeTUoFo/RSR2Uhlauka58qgeUbicrTELZT2barY9d+k as6U5vv4wx2zMklTkjrlkMpAT1ZpbB9d3jGHwL27VEnjlfd3wV2bdH7Dzn9qZRf/ gn0ALg/b3u5yBWk/k7YBvijXyNcH6Q== =bBuG -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2023-07-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.6: UAPI Changes: * fbdev: * Make fbdev userspace interfaces optional; only leaves the framebuffer console active * prime: * Support dma-buf self-import for all drivers automatically: improves support for many userspace compositors Cross-subsystem Changes: * backlight: * Fix interaction with fbdev in several drivers * base: Convert struct platform.remove to return void; part of a larger, tree-wide effort * dma-buf: Acquire reservation lock for mmap() in exporters; part of an on-going effort to simplify locking around dma-bufs * fbdev: * Use Linux device instead of fbdev device in many places * Use deferred-I/O helper macros in various drivers * i2c: Convert struct i2c from .probe_new to .probe; part of a larger, tree-wide effort * video: * Avoid including <linux/screen_info.h> Core Changes: * atomic: * Improve logging * prime: * Remove struct drm_driver.gem_prime_mmap plus driver updates: all drivers now implement this callback with drm_gem_prime_mmap() * gem: * Support execution contexts: provides locking over multiple GEM objects * ttm: * Support init_on_free * Swapout fixes Driver Changes: * accel: * ivpu: MMU updates; Support debugfs * ast: * Improve device-model detection * Cleanups * bridge: * dw-hdmi: Improve support for YUV420 bus format * dw-mipi-dsi: Fix enable/disable of DSI controller * lt9611uxc: Use MODULE_FIRMWARE() * ps8640: Remove broken EDID code * samsung-dsim: Fix command transfer * tc358764: Handle HS/VS polarity; Use BIT() macro; Various cleanups * Cleanups * ingenic: * Kconfig REGMAP fixes * loongson: * Support display controller * mgag200: * Minor fixes * mxsfb: * Support disabling overlay planes * nouveau: * Improve VRAM detection * Various fixes and cleanups * panel: * panel-edp: Support AUO B116XAB01.4 * Support Visionox R66451 plus DT bindings * Cleanups * ssd130x: * Support per-controller default resolution plus DT bindings * Reduce memory-allocation overhead * Cleanups * tidss: * Support TI AM625 plus DT bindings * Implement new connector model plus driver updates * vkms * Improve write-back support * Documentation fixes Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230713090830.GA23281@linux-uq9g	2023-07-17 15:37:57 +02:00
Jonathan Kim	8e43632695	drm/amdkfd: report dispatch id always saved in ttmps after gc9.4.2 The feature to save the dispatch ID in trap temporaries 6 & 7 on context save is unconditionally enabled during MQD initialization. Now that TTMPs are always setup regardless of debug mode for GC 9.4.3, we should report that the dispatch ID is always available for debug/trap handling. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 12:22:52 -04:00
Mukul Joshi	1879e009a4	drm/amdkfd: Update CWSR grace period for GFX9.4.3 For GFX9.4.3, setup a reduced default CWSR grace period equal to 1000 cycles instead of 64000 cycles. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 11:12:09 -04:00
Jonathan Kim	41b8a08109	drm/amdkfd: add multi-process debugging support for GC v9.4.3 Similar to GC v9.4.2, GC v9.4.3 should use the 5-Dword extended MAP_PROCESS packet to support multi-process debugging. Update the mutli-process debug support list so that the KFD updates the runlist on debug mode setting and that it allocates enough GTT memory during KFD device initialization. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 11:12:09 -04:00
Jonathan Kim	7a93cc579c	drm/amdkfd: enable watch points globally for gfx943 Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 11:12:08 -04:00
Jonathan Kim	567db9e070	drm/amdkfd: restore debugger additional info for gfx v9_4_3 The additional information that the KFD reports to the debugger was destroyed when the following commit was merged: "drm/amdkfd: convert switches to IP version checking" Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Amber Lin <amber.lin@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 11:10:19 -04:00
Eric Huang	036e348fdc	drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-12 10:58:01 -04:00
Christian König	8abc1eb298	drm/amdkfd: switch over to using drm_exec v3 Avoids quite a bit of logic and kmalloc overhead. v2: fix multiple problems pointed out by Felix v3: two more nit picks from Felix fixed Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-4-christian.koenig@amd.com	2023-07-12 14:14:26 +02:00
Philip Yang	8c45b31909	drm/amdkfd: Skip handle mapping SVM range with no GPU access If the SVM range has no GPU access nor access-in-place attribute, validate and map to GPU should skip the range. Add NULL pointer check if find_first_bit(ctx->bitmap, MAX_GPU_INSTANCE) returns MAX_GPU_INSTANCE as gpuidx if ctx->bitmap is empty. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-07 13:51:48 -04:00
Mukul Joshi	9041b53a59	drm/amdkfd: Use KIQ to unmap HIQ Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-07 13:51:48 -04:00
Ramesh Errabolu	95de7f26b5	drm/amdkfd: Access gpuvm_export_dmabuf() API to get Dmabuf Directly invoking the function amdgpu_gem_prime_export() from within KFD is not correct. By utilizing the KFD API to obtain Dmabuf, the implementation can prevent the creation of multiple instances of struct dma_buf. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-07 13:51:47 -04:00
Mukul Joshi	d4300362a6	drm/amdkfd: Update interrupt handling for GFX 9.4.3 For GFX 9.4.3, interrupt handling needs to be updated for: - Interrupt cookie will have a NodeId field. Each KFD node needs to check the NodeId before processing the interrupt. - For CPX mode, there are additional checks of client ID needed to process the interrupt. - Add NodeId to the process drain interrupt. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-30 13:11:35 -04:00
Mukul Joshi	fc133acc43	drm/amdkfd: Enable GWS on GFX9.4.3 Enable GWS capable queue creation for forward progress gaurantee on GFX 9.4.3. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-30 13:06:49 -04:00
Alex Sierra	03d400e760	drm/amdkfd: set coherent host access capability flag This flag determines whether the host possesses coherent access to the memory of the device. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-23 15:35:34 -04:00
James Zhu	973fddea6f	drm/amdkfd: update user space last_event_age Update user space last_event_age when event age is enabled. It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:37:55 -04:00
James Zhu	96cdb5384d	drm/amdkfd: set activated flag true when event age unmatchs Set waiter's activated flag true when event age unmatchs with last_event_age. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:37:55 -04:00
James Zhu	4057e6ce33	drm/amdkfd: add event_age tracking when receiving interrupt Add event_age tracking when receiving interrupt. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:37:55 -04:00
Jonathan Kim	80a780ab27	drm/amdkfd: decrement queue count on mes queue destroy Queue count should decrement on queue destruction regardless of HWS support type. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:06:58 -04:00
Mukul Joshi	765663b7fa	drm/amdkfd: Remove DUMMY_VRAM_SIZE Remove DUMMY_VRAM_SIZE as it is not needed and can result in reporting incorrect memory size. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:06:58 -04:00
Ruili Ji	fb120e84b0	drm/amdkfd: To enable traps for GC_11_0_4 and up Flag trap_en should be enabled for trap handler. Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 11:06:57 -04:00
Jonathan Kim	8f7bd7010d	drm/amdkfd: fix null queue check on debug setting exceptions Null check should be done on queue struct itself and not on the process queue list node. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 10:46:05 -04:00
Mukul Joshi	41ce6d6d03	drm/amdgpu: Rename DRM schedulers in amdgpu TTM Rename mman.entity to mman.high_pr to make the distinction clearer that this is a high priority scheduler. Similarly, rename the recently added mman.delayed to mman.low_pr to make it clear it is a low priority scheduler. No functional change in this patch. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-15 10:42:33 -04:00
Dave Airlie	901bdf5ea1	Merge tag 'amd-drm-next-6.5-2023-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.5-2023-06-02: amdgpu: - SR-IOV fixes - Warning fixes - Misc code cleanups and spelling fixes - DCN 3.2 updates - Improved DC FAMS support for better power management - Improved DC SubVP support for better power management - DCN 3.1.x fixes - Max IB size query - DC GPU reset fixes - RAS updates - DCN 3.0.x fixes - S/G display fixes - CP shadow buffer support - Implement connector force callback - Z8 power improvements - PSP 13.0.10 vbflash support - Mode2 reset fixes - Store MQDs in VRAM to improve queue switch latency - VCN 3.x fixes - JPEG 3.x fixes - Enable DC_FP on LoongArch - GFXOFF fixes - GC 9.4.3 partition support - SDMA 4.4.2 partition support - VCN/JPEG 4.0.3 partition support - VCN 4.0.3 updates - NBIO 7.9 updates - GC 9.4.3 updates - Take NUMA into account when allocating memory - Handle NUMA for partitions - SMU 13.0.6 updates - GC 9.4.3 RAS updates - Stop including unused swiotlb.h - SMU 13.0.7 fixes - Fix clock output ordering on some APUs - Clean up DC FPGA code - GFX9 preemption fixes - Misc irq fixes - S0ix fixes - Add new DRM_AMDGPU_WERROR config parameter to help with CI - PCIe fix for RDNA2 - kdoc fixes - Documentation updates amdkfd: - Query TTM mem limit rather than hardcoding it - GC 9.4.3 partition support - Handle NUMA for partitions radeon: - Fix possible double free - Stop including unused swiotlb.h - Fix possible division by zero ttm: - Add query for TTM mem limit - Add NUMA awareness to pools - Export ttm_pool_fini() UAPI: - Add new ctx query flag to better handle GPU resets Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290 - Add new interface to query and set shadow buffer for RDNA3 Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986 - Add new INFO query for max IB size Proposed userspace: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commits/ib-rejection-v3 amd-drm-next-6.5-2023-06-09: amdgpu: - S0ix fixes - Initial SMU13 Overdrive support - kdoc fixes - Misc clode cleanups - Flexible array fixes - Display OTG fixes - SMU 13.0.6 updates - Revert some broken clock counter updates - Misc display fixes - GFX9 preemption fixes - Add support for newer EEPROM bad page table format - Add missing radeon secondary id - Add support for new colorspace KMS API - CSA fix - Stable pstate fixes for APUs - make vbl interface admin only - Handle PCI accelerator class amdkfd: - Add debugger support for gdb radeon: - Fix possible UAF drm: - Add Colorspace functionality UAPI: - Add debugger interface for enabling gdb Proposed userspace: https://github.com/ROCm-Developer-Tools/ROCdbgapi/tree/wip-dbgapi - Add KMS colorspace API Discussion: https://lists.freedesktop.org/archives/dri-devel/2023-June/408128.html From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609174817.7764-1-alexander.deucher@amd.com	2023-06-15 14:11:22 +10:00
Jonathan Kim	09d49e14ea	drm/amdkfd: fix and enable debugging for gfx11 There are a couple of fixes required to enable gfx11 debugging. First, ADD_QUEUE.trap_en is an inappropriate place to toggle a per-process register so move it to SET_SHADER_DEBUGGER.trap_en. When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize the SET_SHADER_DEBUGGER.trap_en setting. Second, to preserve correct save/restore priviledged wave states in coordination with the trap enablement setting, resume suspended waves early in the disable call. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:48:19 -04:00
Mukul Joshi	597364adc0	drm/amdkfd: Fix reserved SDMA queues handling This patch fixes a regression caused by a bad merge where the handling of reserved SDMA queues was accidentally removed. With the fix, the reserved SDMA queues are again correctly marked as unavailable for allocation. Fixes: `a805889a15` ("drm/amdkfd: Update SDMA queue management for GFX9.4.3") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:44:56 -04:00
Mario Limonciello	bbcc3514ab	drm/amd: Check that a system is a NUMA system before looking for SRAT It's pointless on laptops to look for the SRAT table as these are not NUMA. Check the number of possible nodes is > 1 to decide whether to look for SRAT. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:44:46 -04:00
Jonathan Kim	7386f88ab1	drm/amdkfd: fix vmfault signalling with additional data. Exception handling for vmfaults should be raised with additional data. Reported-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:44:44 -04:00
Dan Carpenter	8be2950467	drm/amdkfd: potential error pointer dereference in ioctl The "target" either comes from kfd_create_process() which returns error pointers on error or kfd_lookup_process_by_pid() which returns NULL on error. So we need to check for both types of errors. Fixes: `0ab2d7532b` ("drm/amdkfd: prepare per-process debug enable and disable") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:43:00 -04:00
Jonathan Kim	a159afdad2	drm/amdkfd: bump kfd ioctl minor version for debug api availability Bump the minor version to declare debugging capability is now available. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:37:03 -04:00
Jonathan Kim	12976e6a5a	drm/amdkfd: add debug device snapshot operation Similar to queue snapshot, return an array of device information using an entry_size check and return. Unlike queue snapshots, the debugger needs to pass to correct number of devices that exist. If it fails to do so, the KFD will return the number of actual devices so that the debugger can make a subsequent successful call. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:37:00 -04:00
Jonathan Kim	b17bd5dbf6	drm/amdkfd: add debug queue snapshot operation Allow the debugger to get a snapshot of a specified number of queues containing various queue property information that is copied to the debugger. Since the debugger doesn't know how many queues exist at any given time, allow the debugger to pass the requested number of snapshots as 0 to get the actual number of potential snapshots to use for a subsequent snapshot request for actual information. To prevent future ABI breakage, pass in the requested entry_size. The KFD will return it's own entry_size in case the debugger still wants log the information in a core dump on sizing failure. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:57 -04:00
Jonathan Kim	2b36de971d	drm/amdkfd: add debug query exception info operation Allow the debugger to query additional info based on an exception code. For device exceptions, it's currently only memory violation information. For process exceptions, it's currently only runtime information. Queue exception only report the queue exception status. The debugger has the option of clearing the target exception on query. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:53 -04:00
Jonathan Kim	5bc20c224b	drm/amdkfd: add debug query event operation Allow the debugger to query a single queue, device and process exception. The KFD should also return the GPU or Queue id of the exception. The debugger also has the option of clearing exceptions after being queried. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:51 -04:00
Jonathan Kim	103d5f08ff	drm/amdkfd: add debug set flags operation Allow the debugger to set single memory and single ALU operations. Some exceptions are imprecise (memory violations, address watch) in the sense that a trap occurs only when the exception interrupt occurs and not at the non-halting faulty instruction. Trap temporaries 0 & 1 save the program counter address, which means that these values will not point to the faulty instruction address but to whenever the interrupt was raised. Setting the Single Memory Operations flag will inject an automatic wait on every memory operation instruction forcing imprecise memory exceptions to become precise at the cost of performance. This setting is not permitted on debug devices that support only a global setting of this option. Return the previous set flags to the debugger as well. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:48 -04:00
Jonathan Kim	e0f85f4690	drm/amdkfd: add debug set and clear address watch points operation Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception. Allow the debugger to pass in a watch point to a particular memory address per device. Note that there exists only 4 watch points per devices to date, so have the KFD keep track of what watch points are allocated or not. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:46 -04:00
Jonathan Kim	a70a93fa56	drm/amdkfd: add debug suspend and resume process queues operation In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:43 -04:00
Jonathan Kim	aea1b4738b	drm/amdkfd: add debug wave launch mode operation Allow the debugger to set wave behaviour on to either normally operate, halt at launch, trap on every instruction, terminate immediately or stall on allocation. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:39 -04:00
Jonathan Kim	101827e130	drm/amdkfd: add debug wave launch override operation This operation allows the debugger to override the enabled HW exceptions on the device. On debug devices that only support the debugging of a single process, the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK register. Because they are global, only address watch exceptions are allowed to be enabled. In other words, the debugger must preserve all non-address watch exception states in normal mode operation by barring a full replacement override or a non-address watch override request. For multi-process debugging, all HW exception overrides are per-VMID so all exceptions can be overridden or fully replaced. In order for the debugger to know what is permissible, returned the supported override mask back to the debugger along with the previously enable overrides. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:37 -04:00
Jonathan Kim	e90bf919f7	drm/amdkfd: add debug set exceptions enabled operation The debugger subscibes to nofication for requested exceptions on attach. Allow the debugger to change its subsciption later on. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:21 -04:00
Jonathan Kim	12fb1ad70d	drm/amdkfd: update process interrupt handling for debug events The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:17 -04:00
Jay Cornwall	50cff45e27	drm/amdkfd: add debug trap enabled flag to tma Trap handler behavior will differ when a debugger is attached. Make the debug trap flag available in the trap handler TMA. Update it when the debug trap ioctl is invoked. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:10 -04:00
Jonathan Kim	455227c464	drm/amdkfd: add runtime enable operation The debugger can attach to a process prior to HSA enablement (i.e. inferior is spawned by the debugger and attached to immediately before target process has been enabled for HSA dispatches) or it can attach to a running target that is already HSA enabled. Either way, the debugger needs to know the enablement status to know when it can inspect queues. For the scenario where the debugger spawns the target process, it will have to wait for ROCr's runtime enable request from the target. The runtime enable request will be able to see that its process has been debug attached. ROCr raises an EC_PROCESS_RUNTIME signal to the debugger then blocks the target process while waiting the debugger's response. Once the debugger has received the runtime signal, it will unblock the target process. For the scenario where the debugger attaches to a running target process, ROCr will set the target process' runtime status as enabled so that on an attach request, the debugger will be able to see this status and will continue with debug enablement as normal. A secondary requirement is to conditionally enable the trap tempories only if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger attaches with HSA runtime enabled. This is because setting up the trap temporaries incurs a performance overhead that is unacceptable for microbench performance in normal mode for certain customers. In the scenario where the debugger spawns the target process, when ROCr detects that the debugger has attached during the runtime enable request, it will enable the trap temporaries before it blocks the target process while waiting for the debugger to respond. In the scenario where the debugger attaches to a running target process, it will enable to trap temporaries itself. Finally, there is an additional restriction that is required to be enforced with runtime enable and HW debug mode setting. The debugger must first ensure that HW debug mode has been enabled before permitting HW debug mode operations. With single process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that debug HW mode setting can occur before the KFD has reserved the debug VMID (0xf) from the hardware scheduler's VMID allocation resource pool. This can result in the hardware scheduler assigning VMID 0xf to a non-debugged process and having that process inherit debug HW mode settings intended for the debugged target process instead, which is both incorrect and potentially fatal for normal mode operation. With multi process debug devices, allowing the debugger to set debug HW modes prior to trap activation means that non-debugged processes migrating to a new VMID could inherit unintended debug settings. All debug operations that touch HW settings must require trap activation where trap activation is triggered by both debug attach and runtime enablement (target has KFD opened and is ready to dispatch work). Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:05 -04:00
Jonathan Kim	c2d2588c70	drm/amdkfd: add send exception operation Add a debug operation that allows the debugger to send an exception directly to runtime through a payload address. For memory violations, normal vmfault signals will be applied to notify runtime instead after passing in the saved exception data when a memory violation was raised to the debugger. For runtime exceptions, this will unblock the runtime enable function which will be explained and implemented in a follow up patch. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:36:01 -04:00
Jonathan Kim	44b87bb083	drm/amdkfd: add raise exception event function Exception events can be generated from interrupts or queue activitity. The raise event function will save exception status of a queue, device or process then notify the debugger of the status change by writing to a debugger polled file descriptor that the debugger provides during debug attach. For memory violation exceptions, extra exception data will be saved. The debugger will be able to query the saved exception states by query operation that will be provided by follow up patches. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:58 -04:00
Jonathan Kim	69a8c3ae2d	drm/amdkfd: apply trap workaround for gfx11 Due to a HW bug, waves in only half the shader arrays can enter trap. When starting a debug session, relocate all waves to the first shader array of each shader engine and mask off the 2nd shader array as unavailable. When ending a debug session, re-enable the 2nd shader array per shader engine. User CU masking per queue cannot be guaranteed to remain functional if requested during debugging (e.g. user cu mask requests only 2nd shader array as an available resource leading to zero HW resources available) nor can runtime be alerted of any of these changes during execution. Make user CU masking and debugging mutual exclusive with respect to availability. If the debugger tries to attach to a process with a user cu masked queue, return the runtime status as enabled but busy. If the debugger tries to attach and fails to reallocate queue waves to the first shader array of each shader engine, return the runtime status as enabled but with an error. In addition, like any other mutli-process debug supported devices, disable trap temporary setup per-process to avoid performance impact from setup overhead. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:52 -04:00
Jonathan Kim	218895820e	drm/amdkfd: add per process hw trap enable and disable functions To enable HW debug mode per process, all devices must be debug enabled successfully. If a failure occures, rewind the enablement of debug mode on the enabled devices. A power management scenario that needs to be considered is HW debug mode setting during GFXOFF. During GFXOFF, these registers will be unreachable so we have to transiently disable GFXOFF when setting. Also, some devices don't support the RLC save restore function for these debug registers so we have to disable GFXOFF completely during a debug session. Cooperative launch also has debugging restriction based on HW/FW bugs. If such bugs exists, the debugger cannot attach to a process that uses GWS resources nor can GWS resources be requested if a process is being debugged. Multi-process debug devices can only enable trap temporaries based on certain runtime scenerios, which will be explained when the runtime enable functions are implemented in a follow up patch. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:48 -04:00
Jonathan Kim	0de4ec9a03	drm/amdgpu: prepare map process for multi-process debug devices Unlike single process debug devices, multi-process debug devices allow debug mode setting per-VMID (non-device-global). Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows the KFD to forward the required SPI debug register write requests. To request a new debug mode setting change, the KFD must be able to preempt all queues then remap all queues with these new setting requests for MAP_PROCESS to take effect. Note that by default, trap enablement in non-debug mode must be disabled for performance reasons for multi-process debug devices due to setup overhead in FW. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:39 -04:00
Jonathan Kim	97ae3c8cce	drm/amdkfd: prepare map process for single process debug devices Older HW only supports debugging on a single process because the SPI debug mode setting registers are device global. The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS for debug purposes. To pin the VMID, the KFD will remove the VMID from the HWS dynamic VMID allocation via SET_RESOUCES so that a debugged process will never migrate away from its pinned VMID. The KFD is responsible for reserving and releasing this pinned VMID accordingly whenever the debugger attaches and detaches respectively. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:36 -04:00
Jonathan Kim	7cee6a6824	drm/amdgpu: add configurable grace period for unmap queues The HWS schedule allows a grace period for wave completion prior to preemption for better performance by avoiding CWSR on waves that can potentially complete quickly. The debugger, on the other hand, will want to inspect wave status immediately after it actively triggers preemption (a suspend function to be provided). To minimize latency between preemption and debugger wave inspection, allow immediate preemption by setting the grace period to 0. Note that setting the preepmtion grace period to 0 will result in an infinite grace period being set due to a CP FW bug so set it to 1 for now. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:31 -04:00
Jonathan Kim	bb13d763f2	drm/amdkfd: fix kfd_suspend_all_processes Flush delayed restore work in kfd_suspend_all_queues instead of cancelling. Cancelling the work before it runs results in the queues becoming permanently disabled. Flushing the work ensures that the queue suspend/resume state stays balanced. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:18 -04:00
Yang Li	d3116d9f27	drm/amdkfd: clean up one inconsistent indenting drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:35:08 -04:00
Jonathan Kim	4504f14338	drm/amdgpu: setup hw debug registers on driver initialization Add missing debug trap registers references and initialize all debug registers on boot by clearing the hardware exception overrides and the wave allocation ID index. The debugger requires that TTMPs 6 & 7 save the dispatch ID to map waves onto dispatch during compute context inspection. In order to correctly set this up, set the special reserved CP bit by default whenever the MQD is initailized. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:34:56 -04:00
Jonathan Kim	0ab2d7532b	drm/amdkfd: prepare per-process debug enable and disable The ROCm debugger will attach to a process to debug by PTRACE and will expect the KFD to prepare a process for the target PID, whether the target PID has opened the KFD device or not. This patch is to explicity handle this requirement. Further HW mode setting and runtime coordination requirements will be handled in following patches. In the case where the target process has not opened the KFD device, a new KFD process must be created for the target PID. The debugger as well as the target process for this case will have not acquired any VMs so handle process restoration to correctly account for this. To coordinate with HSA runtime, the debugger must be aware of the target process' runtime enablement status and will copy the runtime status information into the debugged KFD process for later query. On enablement, the debugger will subscribe to a set of exceptions where each exception events will notify the debugger through a pollable FIFO file descriptor that the debugger provides to the KFD to manage. Finally on process termination of either the debugger or the target, debugging must be disabled if it has not been done so. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:34:48 -04:00
Jonathan Kim	d230f1bfe7	drm/amdkfd: display debug capabilities Expose debug capabilities in the KFD topology node's HSA capabilities and debug properties flags. Ensure correct capabilities are exposed based on firmware support. Flag definitions can be referenced in uapi/linux/kfd_sysfs.h. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:34:45 -04:00
Jonathan Kim	4f98cf2baf	drm/amdkfd: add debug and runtime enable interface Introduce the GPU debug operations interface. For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU instruction set, provide the necessary interface to allow the debugger to HW debug-mode set and query exceptions per HSA queue, process or device. The runtime_enable interface coordinates exception handling with the HSA runtime. Usage is available in the kern docs at uapi/linux/kfd_ioctl.h. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:34:41 -04:00
Alex Deucher	ba3c87fffb	amd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flag Was leftover from GC 9.4.3 bring up and is currently unused. Drop it for now. Cc: Philip.Yang@amd.com Cc: rajneesh.bhardwaj@amd.com Cc: Felix.Kuehling@amd.com Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:34:36 -04:00
Srinivasan Shanmugam	16cc3a2215	drm/amdgpu: Add function parameter 'event' to kdoc in svm_range_evict() Fixes the following gcc with W=1: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1841: warning: Function parameter or member 'event' not described in 'svm_range_evict' Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:33:58 -04:00
Tom Rix	0df1106bfd	drm/amdkfd: remove unused sq_int_priv variable clang with W=1 reports drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v11.c:282:38: error: variable 'sq_int_priv' set but not used [-Werror,-Wunused-but-set-variable] uint8_t sq_int_enc, sq_int_errtype, sq_int_priv; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:33:38 -04:00
Srinivasan Shanmugam	b3a02e8b61	drm/amdgpu: Fix up missing parameters kdoc in svm_migrate_vma_to_ram Fix these warnings by adding & deleting the deviant arguments. gcc with W=1 drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'node' not described in 'svm_migrate_vma_to_ram' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'trigger' not described in 'svm_migrate_vma_to_ram' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Function parameter or member 'fault_page' not described in 'svm_migrate_vma_to_ram' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:671: warning: Excess function parameter 'adev' description in 'svm_migrate_vma_to_ram' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:771: warning: Function parameter or member 'fault_page' not described in 'svm_migrate_vram_to_ram' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:32:48 -04:00
Alex Sierra	c22b044070	drm/amdkfd: flag added to handle errors from svm validate and map If a return error is raised during validation and mapping of a prange, this flag is set. It is a rare occurrence, but it could happen when `amdgpu_hmm_range_get_pages_done` returns true. In such cases, the caller should retry. However, it is important to ensure that the prange is updated correctly during the retry. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:32:25 -04:00
Lijo Lazar	b695c97b58	drm/amdkfd: Fix MEC pipe interrupt enablement for_each_inst modifies xcc_mask and therefore the loop doesn't initialize properly interrupts on all pipes. Keep looping through xcc as the outer loop to fix this issue. Fixes: `c4050ff1a4` ("drm/amdkfd: Use xcc mask for identifying xcc") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:31:35 -04:00
Graham Sider	07a1475279	drm/amdkfd: Add new gfx_target_versions for GC 9.4.3 For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:31:32 -04:00
Alex Deucher	28ebbb4981	drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices Certain boards with GC IP 11.0.3 need slightly different handling in the shader compiler due to board specific bounding box optimizations. Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 12:31:15 -04:00
Mukul Joshi	55a6dc60b4	drm/amdkfd: Set event interrupt class for GFX 9.4.3 Fix the warning during driver load because the event interrupt class is not set for GFX9.4.3. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:56:42 -04:00
Srinivasan Shanmugam	2ad00e753a	drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitialized] num_xcd, kfd->adev->gfx.num_xcc_per_xcp); ^~~~~~~ include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err' dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~~ include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap' _p_func(dev, fmt, ##__VA_ARGS__); \ ^~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:597:13: note: initialize the variable 'num_xcd' to silence this warning int num_xcd, partition_mode; ^ = 0 1 error generated. Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:42:04 -04:00
Alex Deucher	f2cd6b2692	drm/amdkfd: fix stack size in svm_range_validate_and_map Allocate large local variable on heap to avoid exceeding the stack size: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function ‘svm_range_validate_and_map’: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1690:1: warning: the frame size of 2360 bytes is larger than 2048 bytes [-Wframe-larger-than=] Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:38:06 -04:00
Alex Deucher	45b3a914d4	drm/amdgpu/gmc9: fix 64 bit division in partition code Rework logic or use do_div() to avoid problems on 32 bit. v2: add a missing case for XCP macro v3: fix out of bounds array access v4: fix xcp handling harder Acked-by: Guchun Chen <guchun.chen@amd.com> (v1) Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:37:52 -04:00

1 2 3 4 5 ...

1604 Commits