linux

Commit Graph

Author	SHA1	Message	Date
Sunil Khatri	eac3b274aa	drm/amdgpu: add print support for sdma_v_4_4_2 ip_dump Add print support for ip dump for sdma_v_4_4_2 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:37 -04:00
Alex Deucher	9c7e69d2e1	drm/amdgpu/gfx9: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:11 -04:00
Alex Deucher	7e60ecc2b7	drm/amdgpu/gfx8: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:08 -04:00
Alex Deucher	12fb3e9c88	drm/amdgpu/gfx7: enable wave kill for compute queues It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:42:03 -04:00
Philip Yang	fb91065851	drm/amdkfd: Refactor queue wptr_bo GART mapping Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo parameter from pqm_create_queue. Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue, the same location with queue ctx_bo GART mapping. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:35:04 -04:00
Sunil Khatri	db54a725d5	drm/amdgpu: Add sdma_v4_4_2 ip dump for devcoredump Add ip dump for sdma_v4_4_2 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:58 -04:00
Sunil Khatri	a11b36ba9c	drm/amdgpu: add print support for sdma_v_4_0 ip_dump Add print support for ip dump for sdma_v_4_0 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:49 -04:00
Philip Yang	c86ad39140	drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original pointer not set to NULL, this could cause use-after-free bug. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:44 -04:00
Philip Yang	f9e292cbba	drm/amdkfd: kfd_bo_mapped_dev support partition Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:36 -04:00
Jane Jian	caaf576292	drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF For VCN/JPEG 4.0.3, use only the local addressing scheme. - Mask bit higher than AID0 range v2 remain the case for mmhub use master XCC Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:28 -04:00
Lijo Lazar	49cfaebe48	drm/amdgpu: Add empty HDP flush function to VCN v4.0.3 VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch will do the HDP flush. This change is necessary for VCN v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:18 -04:00
Lijo Lazar	585e3fdb36	drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3 JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead, mmsch fw will do the flush. This change is necessary for JPEG v4.0.3, no need for backward compatibility Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:34:12 -04:00
Pierre-Eric Pelloux-Prayer	fec5f8e8c6	drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit Before this commit, only submits with both a BO_HANDLES chunk and a 'bo_list_handle' would be rejected (by amdgpu_cs_parser_bos). But if UMD sent multiple BO_HANDLES, what would happen is: * only the last one would be really used * all the others would leak memory as amdgpu_cs_p1_bo_handles would overwrite the previous p->bo_list value This commit rejects submissions with multiple BO_HANDLES chunks to match the implementation of the parser. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:57 -04:00
Sunil Khatri	80237bfc03	drm/amdgpu: Add sdma_v4_0 ip dump for devcoredump Add ip dump for sdma_v4_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:52 -04:00
Sunil Khatri	abf839f5eb	drm/amdgpu: add print support for sdma_v_7_0 ip_dump Add print support for ip dump for sdma_v_7_0 in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:45 -04:00
Ma Ke	6472de66c0	drm/amd/amdgpu: Fix uninitialized variable warnings Return 0 to avoid returning an uninitialized variable r. Cc: stable@vger.kernel.org Fixes: `230dd6bb61` ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:38 -04:00
Ma Ke	93381e6b61	drm/amdgpu: fix a possible null pointer dereference In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid npd. Cc: stable@vger.kernel.org Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:25 -04:00
David Belanger	666f14cab2	drm/amdgpu: Fix atomics on GFX12 If PCIe supports atomics, configure register to prevent DF from breaking atomics in separate load/store operations. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:17 -04:00
Sunil Khatri	4df9e2200f	drm/amdgpu: Add sdma_v7_0 ip dump for devcoredump Add ip dump for sdma_v7_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:33:10 -04:00
Alex Deucher	f2ac526349	drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell We seem to have a case where SDMA will sometimes miss a doorbell if GFX is entering the powergating state when the doorbell comes in. To workaround this, we can update the wptr via MMIO, however, this is only safe because we disallow gfxoff in begin_ring() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:58 -04:00
YiPeng Chai	a7e8467fbe	drm/amdgpu: Remove unused code Remove unused code. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:20 -04:00
YiPeng Chai	56631dee29	drm/amdgpu: optimize logging deferred error info 1. Use pa_pfn as the radix-tree key index to log deferred error info. 2. Use local array to store a row of bad pages. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:32:14 -04:00
YiPeng Chai	27cdf8c3ca	drm/amdgpu: optimize umc v12 address conversion function Split into 3 parts: 1. Convert soc physical address via ras ta. 2. Expand bad pages from soc physical address. 3. Dump bad address info. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:31:59 -04:00
Sunil Khatri	e84f798a93	drm/amdgpu: add print support for sdma_v_5_0 ip_dump Add support for ip dump for sdma_v_5_0 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	0f1a93704a	drm/amdgpu: Add sdma_v5_0 ip dump for devcoredump Add ip dump for sdma_v5_0 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	ccb54d7d91	drm/amdgpu: add print support for sdma_v_6_0 ip_dump Add print support for ip dump for sdma_v_6_0 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	1eba165aa4	drm/amdgpu: Add sdma_v6_0 ip dump for devcoredump Add ip dump for sdma_v6_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	00bb3223bf	drm/amdgpu: fix the print message in devcoredump Fix the memory type logged for gtt memory size which is wrongly logged as visible vram size. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	43796955a8	drm/amdgpu: fix the extra space between two functions fix extra line space between two functions. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:09 -04:00
Sunil Khatri	08bed7e4ff	drm/amdgpu: add print support for sdma_v_5_2 ip_dump Add support for ip dump for sdma_v_5_2 in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:08 -04:00
Sunil Khatri	f763c3b543	drm/amdgpu: Add sdma_v5_2 ip dump for devcoredump Add ip dump for sdma_v5_2 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-23 17:07:08 -04:00
YiPeng Chai	b3fb79cda5	drm/amdgpu: add mutex to protect ras shared memory Add mutex to protect ras shared memory. v2: Add TA_RAS_COMMAND__TRIGGER_ERROR command call status check. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:44:30 -04:00
Boyuan Zhang	7d75ef3736	drm/amdgpu/vcn: not pause dpg for unified queue For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel. For VCN3 and below, pausing DPG for encoding in kernel is still needed. v2: add more comments v3: update commit message Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:44:04 -04:00
Boyuan Zhang	ecfa23c8df	drm/amdgpu/vcn: identify unified queue in sw init Determine whether VCN using unified queue in sw_init, instead of calling functions later on. v2: fix coding style Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-16 11:43:48 -04:00
Aurabindo Pillai	28814be882	drm/amd: Bump KMS_DRIVER_MINOR version Increase the KMS minor version to indicate GFX12 DCC support since this contains a major change in how DCC is managed across IPs like GFX, DCN etc. This will be used mainly by userspace like Mesa to figure out DCC support on GFX12 hardware. v2: fix version number (Alex) Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-15 17:41:23 -04:00
Alex Deucher	1cff1010be	drm/amdgpu/mes12: add missing opcode string Fixes the indexing of the string array. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-12 11:46:46 -04:00
Alex Deucher	478cb8badf	drm/amdgpu/mes11: update opcode strings Add new packet. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-12 11:46:46 -04:00
Mario Limonciello	9d8c094dda	drm/amd: Add power_saving_policy drm property to eDP connectors When the `power_saving_policy` property is set to bit mask "Require color accuracy" ABM should be disabled immediately and any requests by sysfs to update will return an -EBUSY error. When the `power_saving_policy` property is set to bit mask "Require low latency" PSR should be disabled. When the property is restored to an empty bit mask ABM and PSR can be enabled again. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703051722.328-3-mario.limonciello@amd.com	2024-07-10 17:00:07 -04:00
Alex Deucher	8030f6533e	drm/amdgpu: remove exp hw support check for gfx12 Enable it by default. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 13:45:57 -04:00
YiPeng Chai	e23300dfff	drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed The problem case is as follows: 1. GPU A triggers a gpu ras reset, and GPU A drives GPU B to also perform a gpu ras reset. 2. After gpu B ras reset started, gpu B queried a DE data. Since the DE data was queried in the ras reset thread instead of the page retirement thread, bad page retirement work would not be triggered. Then even if all gpu resets are completed, the bad pages will be cached in RAM until GPU B's bad page retirement work is triggered again and then saved to eeprom. This patch can save the bad pages to eeprom in time after gpu ras reset is completed. v2: 1. Add the above description to code comments. 2. Reuse existing function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:41 -04:00
YiPeng Chai	c04706914d	drm/amdgpu: flush all cached ras bad pages to eeprom Before uninstalling gpu driver, flush all cached ras bad pages to eeprom. v2: Put the same code into a function and reuse the function. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:35 -04:00
Sunil Khatri	c39385710c	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX12. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:22 -04:00
Sunil Khatri	a85cc86cce	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX11. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:13:04 -04:00
Alex Deucher	7d570f56f1	drm/amdgpu/job: Replace DRM_INFO/ERROR logging Use the dev_info/err variants so we get per device logging in multi-GPU cases. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:55 -04:00
Sunil Khatri	948f2828a6	drm/amdgpu: select compute ME engines dynamically GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX10. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:48 -04:00
Lijo Lazar	d02ddefc7e	drm/amdgpu: Initialize VF partition mode For SOCs with GFX v9.4.3, a VF may have multiple compute partitions. Fetch the partition information during init and initialize partition nodes. There is no support to switch partition mode in VF mode, hence disable the same. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:28 -04:00
Gavin Wan	5d64af40e3	drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping. sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For Even numbered vfs, the sdma2 & sdma3 (irq srouce id CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-10 10:12:19 -04:00
Thomas Hellström	4c44f89c5d	drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves To address the problem with hitches moving when bulk move sublists are lru-bumped, register the list cursors with the ttm_lru_bulk_move structure when traversing its list, and when lru-bumping the list, move the cursor hitch to the tail. This also means it's mandatory for drivers to call ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when initializing and finalizing the bulk move structure, so add those calls to the amdgpu- and xe driver. Compared to v1 this is slightly more code but less fragile and hopefully easier to understand. Changes in previous series: - Completely rework the functionality - Avoid a NULL pointer dereference assigning manager->mem_type - Remove some leftover code causing build problems v2: - For hitch bulk tail moves, store the mem_type in the cursor instead of with the manager. v3: - Remove leftover mem_type member from change in v2. v6: - Add some lockdep asserts (Matthew Brost) - Avoid NULL pointer dereference (Matthew Brost) - No need to check bo->resource before dereferencing bo->bulk_move (Matthew Brost) Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240705153206.68526-5-thomas.hellstrom@linux.intel.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-07-09 12:39:33 +02:00
Zhigang Luo	ee98fb71ba	drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to avoid reading wrong WPTR from doorbell in sriov vf, set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:56:27 -04:00
Yang Wang	59f488be76	drm/amdgpu: add ras event state device attribute support add amdgpu ras 'event_state' sysfs device attribute support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:56:13 -04:00
Yang Wang	12b435a40c	drm/amdgpu: add ras POSION_CONSUMPTION event id support add amdgpu ras POSION_CONSUMPTION event id support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:37 -04:00
Yang Wang	5b9de2596f	drm/amdgpu: add ras POSION_CREATION event id support add amdgpu ras POSION_CREATION event id support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:18 -04:00
Yang Wang	75ac6a2506	drm/amdgpu: refine amdgpu ras event id core code v1: - use unified event id to manage ras events - add a new function amdgpu_ras_query_error_status_with_event() to accept event type as parameter. v2: add a warn log to show the location of function failure when calling amdgpu_ras_mark_event(). (Tao Zhou) v3: change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL. v4: rename amdgpu_ras_get_recovery_event() to amdgpu_ras_get_fatal_error_event(). Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:55:11 -04:00
Christian König	320debca1b	drm/amdgpu: reject gang submit on reserved VMIDs A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar	b6ad109166	drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2 DPG mode is enabled for vcn and jpeg on VCN v4_0_5 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:50:40 -04:00
Yang Wang	332210c13a	drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG remove redundant semicolons in RAS_EVENT_LOG to avoid code format check warning. Fixes: `b712d7c201` ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:43 -04:00
Frank Min	54837bd2be	drm/amdgpu: restore dcc bo tilling configs while moving While moving buffer which has dcc tiling config, it is needed to restore its original dcc tiling. 1. extend copy flag to cover tiling bits 2. add logic to restore original dcc tiling config Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:27 -04:00
Sunil Khatri	c8714ac982	drm/amdgpu: add gfx queue support for gfx12 ipdump Add support of all the CP GFX queues for gfx12 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:22 -04:00
Sunil Khatri	495e6173a4	drm/amdgpu: add cp queue registers for gfx12 ipdump Add gfx12 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:15 -04:00
Sunil Khatri	f0c6b79bfc	drm/amdgpu: enable redirection of irq's for IH v7.0 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:09 -04:00
Sunil Khatri	906219ec94	drm:amdgpu: enable IH ring1 for IH v7.0 We need IH ring1 for handling the pagefault interrupts which over flow in default ring for specific usecases. Enable ring1 allows software to redirect high interrupts to ring1 from default IH ring. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:47:02 -04:00
Yifan Zha	33f23fc315	drm/amdgpu: Set no_hw_access when VF request full GPU fails [Why] If VF request full GPU access and the request failed, the VF driver can get stuck accessing registers for an extended period during the unload of KMS. [How] Set no_hw_access flag when VF request for full GPU access fails This prevents further hardware access attempts, avoiding the prolonged stuck state. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:56 -04:00
Sunil Khatri	2262acad0a	drm/amdgpu: add print support for gfx12 ipdump Add support of gfx12 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:49 -04:00
Sunil Khatri	fbbbb62112	drm/amdgpu: add gfx12 register support in ipdump Add general registers of gfx12 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:43 -04:00
Frank Min	ffcc5745ed	drm/amdgpu: update gfxhub client id for gfx12 update gfxhub client id for gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:36 -04:00
Tim Huang	064d92436b	drm/amd/pm: avoid to load smu firmware for APUs Certain call paths still load the SMU firmware for APUs, which needs to be skipped. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:30 -04:00
YiPeng Chai	78347b651a	drm/amdgpu: sysfs node disable query error count during gpu reset Sysfs node disable query error count during gpu reset. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-08 16:46:14 -04:00
Daniel Vetter	6be146cf57	amd-drm-next-6.11-2024-07-03: amdgpu: - Use vmalloc for dc_state - Replay fixes - Freesync fixes - DCN 4.0.1 fixes - DML fixes - DCC updates - Misc code cleanups and bug fixes - 8K display fixes - DCN 3.5 fixes - Restructure DIO code - DML1 fixes - DML2 fixes - GFX11 fix - GFX12 updates - GFX12 modifiers fixes - RAS fixes - IP dump fixes - Add some updated IP version checks _ Silence UBSAN warning radeon: - GPUVM fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZoW9RgAKCRC93/aFa7yZ 2AfjAQDJH95ijhdClk4S7t0ofjam7UPhVkJiUl8u73BNbuul/QEAk7RadnWw8SwC wvSXMCyHY3XszMJ1AdWG+4uwq5axPQE= =aWkn -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-07-03: amdgpu: - Use vmalloc for dc_state - Replay fixes - Freesync fixes - DCN 4.0.1 fixes - DML fixes - DCC updates - Misc code cleanups and bug fixes - 8K display fixes - DCN 3.5 fixes - Restructure DIO code - DML1 fixes - DML2 fixes - GFX11 fix - GFX12 updates - GFX12 modifiers fixes - RAS fixes - IP dump fixes - Add some updated IP version checks _ Silence UBSAN warning radeon: - GPUVM fix Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703211314.2041893-1-alexander.deucher@amd.com	2024-07-05 12:02:12 +02:00
Daniel Vetter	71e9f407fd	amd-drm-next-6.11-2024-06-28: amdgpu: - JPEG 5.x fixes - More FW loading cleanups - Misc code cleanups - GC 12.x fixes - ASPM fix - DCN 4.0.1 updates - SR-IOV fixes - HDCP fix - USB4 fixes - Silence UBSAN warnings - MES submission fixes - Update documentation for new products - DCC updates - Initial ISP 4.x plumbing - RAS fixes - Misc small fixes amdkfd: - Fix missing unlock in error path for adding queues -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZn8qfAAKCRC93/aFa7yZ 2H9QAQCZZsu8vps+HVUY5vihNXNL9TdUbrrroQBEyk+DsJzE9QD8CJW8/oiP8/fa /EsAWXNZUH+mPxRaVWTxp0j2P941pww= =uZbG -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-28: amdgpu: - JPEG 5.x fixes - More FW loading cleanups - Misc code cleanups - GC 12.x fixes - ASPM fix - DCN 4.0.1 updates - SR-IOV fixes - HDCP fix - USB4 fixes - Silence UBSAN warnings - MES submission fixes - Update documentation for new products - DCC updates - Initial ISP 4.x plumbing - RAS fixes - Misc small fixes amdkfd: - Fix missing unlock in error path for adding queues Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240628213135.427214-1-alexander.deucher@amd.com	2024-07-05 11:39:23 +02:00
Daniel Vetter	86634fa4e6	Linux 6.10-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg O6fPUV8= =pgUx -----END PGP SIGNATURE----- Merge v6.10-rc6 into drm-next The exynos-next pull is based on a newer -rc than drm-next. hence backmerge first to make sure the unrelated conflicts we accumulated don't end up randomly in the exynos merge pull, but are separated out. Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma code, and cherry-pick conflict in xe. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-07-05 10:47:28 +02:00
Sunil Khatri	ea67deb03c	drm/amdgpu: fix out of bounds access in gfx11 during ip dump During ip dump in gfx11 the index variable is reused but is not reinitialized to 0 and this causes the index calculation to be wrong and access out of bound access. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:16 -04:00
Tim Huang	94845ea057	drm/amdgpu: add firmware for PSP IP v14.0.4 This patch is to add firmware for PSP 14.0.4. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:10 -04:00
Tim Huang	b709f949f0	drm/amdgpu: enable mode2 reset for SMU IP v14.0.4 Set the default reset method to mode2 for SMU 14.0.4. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:07:02 -04:00
Tim Huang	38a16bfe6f	drm/amdgpu: add SMU IP v14.0.4 discovery support This patch is to add SMU 14.0.4 support Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:57 -04:00
Tim Huang	9cd2ad14d8	drm/amdgpu: add PSP IP v14.0.4 discovery support This patch is to add PSP 14.0.4 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:36 -04:00
Tim Huang	c7c3f786b9	drm/amdgpu: add PSP IP v14.0.4 support This patch is to add PSP 14.0.4 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:30 -04:00
Tim Huang	614a9f5ed5	drm/amdgpu: add firmware for VPE IP v6.1.3 This patch is to add firmware for VPE 6.1.3. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:24 -04:00
Tim Huang	ca15cd559f	drm/amdgpu: add VPE IP v6.1.3 discovery support This patch is to add VPE 6.1.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:19 -04:00
Tim Huang	f3e2a425c6	drm/amdgpu: add VPE IP v6.1.3 support This patch is to add VPE 6.1.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:13 -04:00
Tim Huang	410bb279a8	drm/amdgpu: Add NBIO IP v7.11.3 support Enable setting soc21 common clockgating for NBIO 7.11.3. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:06 -04:00
Tim Huang	5aea871694	drm/amdgpu: add NBIO IP v7.11.3 discovery support This patch is to add NBIO 7.11.3 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:06:00 -04:00
Tim Huang	6857669a22	drm/amdgpu: add firmware for SDMA IP v6.1.2 This patch is to add firmware for SDMA 6.1.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:53 -04:00
Tim Huang	dfeccf4d54	drm/amdgpu: add SDMA IP v6.1.2 discovery support This patch is to add SDMA 6.1.2 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:41 -04:00
Tim Huang	4448b1ff4d	drm/amdgpu: add firmware for GC IP v11.5.2 This patch is to add firmware for GC 11.5.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:35 -04:00
Tim Huang	23c1ea0241	drm/amdgpu: add GC IP v11.5.2 to GC 11.5.0 family This patch is to add GC 11.5.2 to GC 11.5.0 family. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:23 -04:00
Tim Huang	43e4cc2299	drm/amdgpu: add GC IP v11.5.2 soc21 support Add CG and PG flags for GFX IP v11.5.2 and PG flags for VCN IP v4.0.5. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Li Ma <li.ma@amd.com> Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:15 -04:00
Tim Huang	98392782df	drm/amdgpu: add tmz support for GC IP v11.5.2 Add tmz support for GC 11.5.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:09 -04:00
Tim Huang	02cf3ed627	drm/amdgpu: add GFXHUB IP v11.5.2 support This patch is to add GFXHUB 11.5.2 support. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:05:03 -04:00
Tim Huang	b6a343df46	drm/amdgpu: initialize GC IP v11.5.2 Initialize GC 11.5.2 and set gfx hw configuration. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:04:57 -04:00
Sunil Khatri	425c4a6f8b	drm/amdgpu: fix out of bounds access in gfx10 during ip dump During ip dump in gfx10 the index variable is reused but is not reinitialized to 0 and this causes the index calculation to be wrong and access out of bound access. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-02 18:04:49 -04:00
Marek Olšák	f340f2bad1	drm/amdgpu: rewrite convert_tiling_flags_to_modifier_gfx12 There were multiple bugs, like checking SWIZZLE_MODE before checking GFX12_SWIZZLE_MODE, which has undefined behavior. The function had no effect before (it always returned -EINVAL). Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Hawking Zhang	d3dbccacfd	drm/amdgpu: Fix hbm stack id in boot error report To align with firmware, hbm id field 0x1 refers to hbm stack 0, 0x2 refers to hbm statck 1. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	0d3157d04d	drm/amdgpu: add amdgpu_framebuffer::gfx12_dcc amdgpu_framebuffer doesn't have tiling_flags, so we need this. amdgpu_display_get_fb_info never gets NULL parameters, so checking for NULL was useless. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	8dd1426e2c	drm/amdgpu: handle gfx12 in amdgpu_display_verify_sizes It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Hawking Zhang	c2fad73174	drm/amdgpu: Correct register used to clear fault status Driver should write to fault_cntl registers to do one-shot address/status clear. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:47 -04:00
Marek Olšák	fd536d2e12	drm/amdgpu: don't use amdgpu_lookup_format_info on gfx12 It only uses fields for GFX9-11 related to the separate DCC buffer, which doesn't exist in GFX12. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	30fb9cad6f	drm/amdgpu/gfx12: remove GDS leftovers GDS doesn't exist in gfx12. The incomplete packet allows userspace to hang the hw from the kernel. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	e5f6bfe402	drm/amdgpu/gfx12: remove superfluous cache flags If any INV flags are needed, they should be executed via ACQUIRE_MEM before INDIRECT_BUFFER. GLM flags are also removed because the hw ignores them. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	b16ec6300f	drm/amdgpu/gfx11: remove superfluous cache flags If any INV flags are needed, they should be executed via ACQUIRE_MEM before INDIRECT_BUFFER. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:46 -04:00
Marek Olšák	11317d2963	drm/amdgpu: check for LINEAR_ALIGNED correctly in check_tiling_flags_gfx6 Fix incorrect check. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-07-01 16:10:36 -04:00
Mario Limonciello	15eb8573ad	drm/amd: Don't initialize ISP hardware without FW Although designs may contain an ISP IP block, the camera might be a USB camera. Because of this the ISP firmware is considered optional from amdgpu. However if the firmware doesn't get loaded the hardware should not be initialized. Adjust the return code for early init to ensure the IP block doesn't go through the other init and fini sequences. Also decrease the message about firmware load failure to debug so it's not as alarming to users. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Yang Wang	71fe449484	drm/amdgpu: refine isp firmware loading refine isp firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	75be61aa77	drm/amd/amdgpu: Enable MMHUB prefetch for ISP v4.1.0 and 4.1.1 Remove temporary WA to disable ISP prefetch as MMHUB SAW is initialized to support ISP HW access GART memory using the TLSi path with prefetch enabled. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	7c2d3112b2	drm/amd/amdgpu: Fix 'snprintf' output truncation warning snprintf can truncate the output fw filename if the isp ucode_prefix exceeds 29 characters. Knowing ISP ucode_prefix is in the format isp_x_x_x, limiting the size of ucode_prefix[] to 10 characters to fix the warning. Fixes the below warning: drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c: In function 'isp_early_init': drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:58: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=] 192 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ In function 'isp_load_fw_by_psp', inlined from 'isp_early_init' at drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:218:8: drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:9: note: 'snprintf' output between 12 and 41 bytes into a destination of size 40 192 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	062666ffbc	drm/amd/amdgpu: Disable MMHUB prefetch for ISP v4.1.1 Disable MMHUB prefetch for ISP v4.1.1 as a temporary WA until the GART supports MMHUB TLSi and SAW for ISP HW to access GART memory using the TLSi path. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	05bafe95e5	drm/amd/amdgpu: Add ISP4.1.0 and ISP4.1.1 modules Add independent IP centric modules for ISP4.1.0 and ISP4.1.1 hw blocks. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	0253d718a0	drm/amd/amdgpu: Map ISP interrupts as generic IRQs Map ISP IH interrupts to Linux generic IRQ for ISP driver to handle the interrupts using MFD IORESOURCE_IRQ resource. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:40 -04:00
Alex Deucher	8930b90be6	drm/amdgpu: fix Kconfig for ISP v2 Add new config option and set proper dependencies for ISP. v2: add missed guards, drop separate Kconfig Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Pratap Nirujogi <pratap.nirujogi@amd.com>	2024-06-27 17:34:40 -04:00
Pratap Nirujogi	d232584ae3	drm/amd/amdgpu: Enable ISP in amdgpu_discovery Enable ISP for ISP V4.1.0 and V4.1.1 in amdgpu_discovery. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Pratap Nirujogi	8fcbfd53ea	drm/amd/amdgpu: Add ISP driver support Add the isp driver in amdgpu to support ISP device on the APUs that supports ISP IP block. ISP hw block is used for camera front-end, pre and post processing operations. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Pratap Nirujogi	772e4d56da	drm/amd/amdgpu: Add ISP support to amdgpu_discovery ISP hw block is supported in some of the AMD GPU versions, add support to discover ISP IP in amdgpu_discovery. v2: squash in documentation update (Alex) Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:39 -04:00
Sonny Jiang	d4b8386c86	drm/amdgpu/jpeg5: Add support for DPG mode Add DPG support for JPEG 5.0 Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:33 -04:00
Frank Min	291af3f598	drm/amdgpu: tolerate allocating GTT bo with dcc flag Do not return failure for allocating GTT bo with dcc flag on gfx12. This will improve compatibility for UMD. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:34:17 -04:00
Jane Jian	b17eecc08f	drm/amdgpu: normalize registers as local xcc to read/write in gfx_v9_4_3 [WHY] sriov has the higher bit violation when flushing tlb [HOW] normalize the registers to keep lower 16-bit(dword aligned) to aviod higher bit violation RLCG will mask xcd out and always assume it's accessing its own xcd v2 add check in wait mem that only do the normalization on regspace Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Yiqing Yao <YiQing.Yao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:33:27 -04:00
YiPeng Chai	f852c9795c	drm/amdgpu: add gpu reset check and exception handling Add gpu reset check and exception handling for page retirement. v2: Clear poison consumption messages cached in fifo after non mode-1 reset. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:32:12 -04:00
YiPeng Chai	e278849cb2	drm/amdgpu: refine poison consumption interrupt handler 1. The poison fifo is only used for poison consumption requests. 2. Merge reset requests when poison fifo caches multiple poison consumption messages Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:32:06 -04:00
YiPeng Chai	5f08275cfd	drm/amdgpu: refine poison creation interrupt handler In order to apply to the case where a large number of ras poison interrupts: 1. Change to use variable to record poison creation requests to avoid fifo full. 2. Prioritize handling poison creation requests instead of following the order of requests received by the driver. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:43 -04:00
Vignesh Chander	cbda2758d8	drm/amdgpu: process RAS fatal error MB notification For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset completion message first to avoid accidentally spamming multiple reset requests to host. v2: add another mailbox check for handling case where kfd detects timeout first v3: set host_flr bit and use wait_for_reset Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:37 -04:00
YiPeng Chai	78146c1dcd	drm/amdgpu: add variable to record the deferred error number read by driver Add variable to record the deferred error number read by driver. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:31:20 -04:00
Vignesh Chander	29b6985de5	drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter So we can get clearer per device logging. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:30:39 -04:00
Danijel Slivka	afbf7955ff	drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:30:27 -04:00
Kenneth Feng	bf826ba9b4	Revert "drm/amd/amdgpu: add module parameter for jpeg" This reverts commit `d3620eeae8`. Revert this due to a final solution: commit `ed3165d660` ("drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback") Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:40 -04:00
Lijo Lazar	fe86c4d1a2	drm/amdgpu: Don't show false warning for reg list If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Julia Zhang	79ea35c7d8	drm/amdgpu: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com> Signed-off-by: Julia Zhang <Julia.Zhang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Hawking Zhang	bdbdc7cecd	drm/amdgpu: Fix smatch static checker warning adev->gfx.imu.funcs could be NULL Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Bob Zhou	9ff2e14cf0	drm/amdgpu: add missing error handling in function amdgpu_gmc_flush_gpu_tlb_pasid Fix the unchecked return value warning reported by Coverity, so add error handling. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:39 -04:00
Hawking Zhang	9da0f77367	drm/amdgpu: Fix register access violation fault_status is read only register. fault_cntl is not accessible from guest environment. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Frank Min	748bd8ebae	drm/amdgpu: access ltr through pci cfg space Access ltr through pci cfg space instead of mmio while programing aspm on gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Yang Wang	6bab222b8b	drm/amdgpu: refine gfx12 firmware loading refine gfx12 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:10:35 -04:00
Frank Min	541fe90ee6	drm/amdgpu: update MTYPE mapping for gfx12 gfx12 only support MTYPE UC and NC, so update it accordingly. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:47 -04:00
Pierre-Eric Pelloux-Prayer	c71c9aafd5	amdgpu: don't dereference a NULL resource in sysfs code dma_resv_trylock being successful doesn't guarantee that bo->tbo.base.resv is not NULL, so check its validity before using it. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Lijo Lazar	e779af8e8b	drm/amdgpu: Fix pci state save during mode-1 reset Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: `5c03e5843e` ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Yang Wang	dab70d9f65	drm/amdgpu: refine gfx11 firmware loading refine gfx11 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:46 -04:00
Sonny Jiang	ed3165d660	drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback Doorbell needs to be configured after power up during each playback Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:41 -04:00
Alex Deucher	83edf00d89	drm/amdgpu/atomfirmware: fix parsing of vram_info v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-27 17:09:41 -04:00
Dave Airlie	365aa9f573	amd-drm-next-6.11-2024-06-22: amdgpu: - HPD fixes - PSR fixes - DCC updates - DCN 4.0.1 fixes - FAMS fixes - Misc code cleanups - SR-IOV fixes - GPUVM TLB flush cleanups - Make VCN less verbose - ACPI backlight fixes - MES fixes - Firmware loading cleanups - Replay fixes - LTTPR fixes - Trap handler fixes - Cursor and overlay fixes - Primary plane zpos fixes - DML 2.1 fixes - RAS updates - USB4 fixes - MALL fixes - Reserved VMID fix - Silence UBSAN warnings amdkfd: - Misc code cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZnbsMgAKCRC93/aFa7yZ 2GTUAQDGl46/pxpFRmZ8n3Hy6OCnmvoFqI9Re1uAc2RbQNNOAgEAi72PwS2iquU/ 69ectl+oi8P/yNMxP4rO1KgOP3AMsg8= =bvKZ -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-22: amdgpu: - HPD fixes - PSR fixes - DCC updates - DCN 4.0.1 fixes - FAMS fixes - Misc code cleanups - SR-IOV fixes - GPUVM TLB flush cleanups - Make VCN less verbose - ACPI backlight fixes - MES fixes - Firmware loading cleanups - Replay fixes - LTTPR fixes - Trap handler fixes - Cursor and overlay fixes - Primary plane zpos fixes - DML 2.1 fixes - RAS updates - USB4 fixes - MALL fixes - Reserved VMID fix - Silence UBSAN warnings amdkfd: - Misc code cleanups From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240622152523.2267072-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-06-27 17:18:49 +10:00
Lijo Lazar	48880f9686	drm/amdgpu: Don't show false warning for reg list If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-25 14:22:56 -04:00
Julia Zhang	bcfa48ff78	drm/amdgpu: avoid using null object of framebuffer Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com> Signed-off-by: Julia Zhang <Julia.Zhang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-06-25 14:22:08 -04:00
Lijo Lazar	74fa02c4a5	drm/amdgpu: Fix pci state save during mode-1 reset Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: `5c03e5843e` ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-25 14:13:12 -04:00
Alex Deucher	f6f49dda49	drm/amdgpu/atomfirmware: fix parsing of vram_info v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-06-25 14:11:46 -04:00
Dave Airlie	f680df51ca	drm-misc-next for 6.11: UAPI Changes: - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION Core Changes: - connector: Create a set of helpers to help with HDMI support - fbdev: Create memory manager optimized fbdev emulation - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer Driver Changes: - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - ivpu: hardware scheduler support, profiling support, improvements to the platform support layer - mgag200: general reworks and improvements - nouveau: Add NVreg_RegistryDwords command line option - rockchip: Conversion to the hdmi helpers - sun4i: Conversion to the hdmi helpers - vc4: Conversion to the hdmi helpers - v3d: Perf counters improvements - zynqmp: IRQ and debugfs improvements - bridge: - Remove redundant checks on bridge->encoder - panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41 -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZlhUKAAKCRAnX84Zoj2+ dgHoAYDTpShgXFXnlnMtqZr+ZuShcjcwiqzwM4qNWdtyji9MONtJJU3ZQnGlnXbI ZU+oZP0Bf0PyT0/8bf+rmZBJ1UdAxt2IQaLkP1tTHOad4E+KlcL5n1opzMi160mB EZSm9f7aNw== =bZPt -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.11: UAPI Changes: - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION Core Changes: - connector: Create a set of helpers to help with HDMI support - fbdev: Create memory manager optimized fbdev emulation - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer Driver Changes: - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - ivpu: hardware scheduler support, profiling support, improvements to the platform support layer - mgag200: general reworks and improvements - nouveau: Add NVreg_RegistryDwords command line option - rockchip: Conversion to the hdmi helpers - sun4i: Conversion to the hdmi helpers - vc4: Conversion to the hdmi helpers - v3d: Perf counters improvements - zynqmp: IRQ and debugfs improvements - bridge: - Remove redundant checks on bridge->encoder - panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat	2024-06-21 10:30:31 +10:00
Likun Gao	ed5a4484f0	drm/amdgpu: init TA fw for psp v14 Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 18:25:58 -04:00
Christian König	e356d321d0	drm/amdgpu: cleanup MES11 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `eef016ba89` ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 18:25:58 -04:00
Christian König	8bd82363e2	drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2 This reverts commit `b8c415e3bf` ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit `425285d39a` ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-19 14:17:25 -04:00
Harish Kasiviswanathan	49c9ffabde	drm/amdgpu: Indicate CU havest info to CP To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 14:17:25 -04:00
Yunxiang Li	84801d4f1e	drm/amdgpu: fix locking scope when flushing tlb Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-19 14:17:25 -04:00
Likun Gao	1ecef55893	drm/amdgpu: init TA fw for psp v14 Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:52:43 -04:00
Yang Wang	017d0b67bf	drm/amdgpu: refine gfx6 firmware loading refine gfx6 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:52:36 -04:00
Yang Wang	a4fcb5f733	Revert "drm/amdgpu: change aca bank error lock type to spinlock" This reverts commit `f6bce954f4`. Revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <yipeng.chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:51:23 -04:00
Yang Wang	8c9ee18019	Revert "drm/amdgpu: change bank cache lock type to spinlock" This reverts commit `258ed689bc` revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670189] ? __schedule+0x37d/0xb30 [ 602.670191] process_one_work+0x176/0x350 [ 602.670194] worker_thread+0x2f7/0x420 [ 602.670197] ? Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:50:31 -04:00
Alex Deucher	19797687e6	drm/amdgpu: remove amdgpu_mes_fence_wait_polling() No longer used so remove it. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:50:24 -04:00
Alex Deucher	fffe347e14	drm/amdgpu: cleanup MES12 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `ade887c633` ("drm/amdgpu/mes12: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:59 -04:00
Yang Wang	3af2c80ae2	drm/amdgpu: refine gfx10 firmware loading refine gfx10 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:53 -04:00
Yang Wang	23fc94795b	drm/amdgpu: refine gfx9 firmware loading refine gfx9 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:49:28 -04:00
Christian König	de32462541	drm/amdgpu: cleanup MES11 command submission The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: `eef016ba89` ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:48:25 -04:00
Christian König	a6328c9c3d	drm/amdgpu: fix using the reserved VMID with gang submit We need to ensure that even when using a reserved VMID that the gang members can still run in parallel. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:48:00 -04:00
Victor Lu	b32563859d	drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOV SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked for VF access. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:47:52 -04:00
Mukul Joshi	4d14a74054	Revert "drm/amdgpu: Add missing locking for MES API calls" This reverts commit `3612702852`. This is causing a BUG message during suspend. [ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14 [ 61.603553] preempt_count: 1, expected: 0 [ 61.603555] RCU nest depth: 0, expected: 0 [ 61.603557] Preemption disabled at: [ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7 [ 61.603795] Workqueue: events_unbound async_run_entry_fn [ 61.603801] Call Trace: [ 61.603803] <TASK> [ 61.603806] dump_stack_lvl+0x37/0x50 [ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.604007] dump_stack+0x10/0x20 [ 61.604010] __might_resched+0x16f/0x1d0 [ 61.604016] __might_sleep+0x43/0x70 [ 61.604020] mutex_lock+0x1f/0x60 [ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu] [ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu] [ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5 [ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu] [ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu] [ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu] [ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu] Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-19 12:46:39 -04:00
Yang Wang	3a3be8bb97	drm/amdgpu: refine gfx8 firmware loading refine gfx8 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:27 -04:00
Tao Zhou	09a3d8202d	drm/amdgpu: set RAS fed status for more cases Indicate fatal error for each RAS block and NBIO. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Tao Zhou	7e4371676e	drm/amdgpu: create amdgpu_ras_in_recovery to simplify code Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Tao Zhou	5f7697bbc1	drm/amdgpu: trigger mode1 reset for RAS RMA status Check RMA status in bad page retirement flow. v2: fix coding bugs in v1. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:18:26 -04:00
Yang Wang	9d26e0cfc2	drm/amdgpu: refine gfx7 firmware loading refine gfx7 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:15 -04:00
Christian König	030631e97b	drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2 This reverts commit `b8c415e3bf` ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit `425285d39a` ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-14 16:17:13 -04:00
Yang Wang	c37b8f7868	drm/amdgpu: refine imu firmware loading refine imu firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	cd093c24ee	drm/amdgpu: refine gmc firmware loading refine gmc firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	8d7ff60f36	drm/amdgpu: refine vpe firmware loading refine vpe firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	b441e9ac9d	drm/amdgpu: refine vcn firmware loading refine vcn firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	9817f06173	drm/amdgpu: move aca/mca init functions into ras_init() stage adjust the function position to better match aca/mca fini code in ras_fini(). Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Bob Zhou	be6a69b21a	drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating() To fix potential overflowed constant warning, modify the variables to u32 for getting the return value of RREG32_SOC15(). Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Harish Kasiviswanathan	199d69d5f9	drm/amdgpu: Indicate CU havest info to CP To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Lijo Lazar	3a86fdc422	drm/amdgpu: Skip coredump during resets for debug Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	3618fa26c8	drm/amdgpu: refine sdma firmware loading refine sdma firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:12 -04:00
Yang Wang	8cae4b578e	drm/amdgpu: refine psp firmware loading refine psp firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Yang Wang	bf349b036d	drm/amdgpu: refine mes firmware loading v1: refine mes firmware loading v2: use dev_info instead of DRM_INFO Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Mukul Joshi	3612702852	drm/amdgpu: Add missing locking for MES API calls Add missing locking at a few places when calling MES APIs to ensure exclusive access to MES queue. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
Mario Limonciello	2fe87f54ab	drm/amd/display: Set default brightness according to ACPI Currently, amdgpu will always set up the brightness at 100% when it loads. However this is jarring when the BIOS has it previously programmed to a much lower value. The ACPI ATIF method includes two members for "ac_level" and "dc_level". These represent the default values that should be used if the system is brought up in AC and DC respectively. Use these values to set up the default brightness when the backlight device is registered. v2: squash in ACPI fix Reviewed-by: Leo Li <sunpeng.li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:17:11 -04:00
David (Ming Qiang) Wu	ee3942d9ab	drm/amdgpu: drop some kernel messages in VCN code Similar to commit `813e7d4cd0` where some kernel log messages are dropped. With this commit, more log messages in older version of VCN/JPEG code are dropped. Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:16:51 -04:00
Yang Wang	a777c9d70a	drm/amdgpu: refine gpu_info firmware loading refine gpu_info firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yang Wang	1bfe5e7746	drm/amdgpu: enhance amdgpu_ucode_request() function flexibility v1: Adding formatting string feature to improve function flexibility. v2: modify macro name to ADMGPU_UCODE_MAX_NAME. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Bob Zhou	37f432481d	drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15() To fix potential overflowed constant warning reported by Coverity, modify the variables to uint32_t. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	18f2525d31	drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb We need to take the reset domain lock before flush hdp. We can't put the lock inside amdgpu_device_flush_hdp itself because it is used during reset where we already take the write side lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	9c33e5fd4f	drm/amdgpu: fix locking scope when flushing tlb Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-06-14 16:15:59 -04:00
Yunxiang Li	ba531117a8	drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable Here since we are in reset and takes the reset_domain write side lock already. We can't use the flush tlb helper which tries to take the read side. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	c1f9d82b92	drm/amdgpu: use helper in amdgpu_gart_unbind When amdgpu_gart_invalidate_tlb helper is introduced this part was left out of the conversion. Avoid the code duplication here. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:59 -04:00
Yunxiang Li	4b0e76e4c1	drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover At this point the gart is not set up, there's no point to invalidate tlb here and it could even be harmful. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Yunxiang Li	5c0a1cdd17	drm/amdgpu: fix sriov host flr handler We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen. In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Yunxiang Li	b3948ad1ac	drm/amdgpu: add skip_hw_access checks for sriov Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Eric Huang	bac640ddb5	drm/amdgpu: add reset source in various cases To fullfill the reset event description. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Eric Huang	7bed1df814	drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 16:15:58 -04:00
Jesse Zhang	27b500b77b	drm/amdgpu: remove dead code in atom_get_src_int Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7. In the case of ATOM_ARG_IMM, the code cannot reach the default case. So there is no need for "break". Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:34:10 -04:00
Frank Min	faa64f633c	drm/amdgpu: add sdma 7.0 support for copy dcc buffer 1. Add dcc buffer flag for copy buffer 2. Add sdma 7.0 support copy dcc buffer Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:22:14 -04:00
Likun Gao	7c85e97083	drm/amdgpu: support for DCC feature Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit when needed. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:21:51 -04:00
Alex Deucher	6b83b94a94	drm/amdgpu: add additional VM bits Add additional VM PTE bits. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-14 15:20:56 -04:00
Maxime Ripard	14731a640e	Merge drm/drm-fixes into drm-misc-fixes Roll -rc3 and current drm/fixes in. This will also unstuck our for-next branch. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-06-14 09:55:46 +02:00
Dave Airlie	1ddaaa2440	amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: `08c760622b` `944fe1c141` - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: `f7b4a26991` `26e8530d05` `1d48f2a1ab` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZmNlVQAKCRC93/aFa7yZ 2MLSAP9lflyG//+wC9WWX5OjvqFWO6qkhVm1w55xy6TwN9NkqQEA76TqmcNZ6rk1 4o9RaYpMJQU275FvK1NvwUbl4PPQYAs= =cFxt -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.11-2024-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: `08c760622b` `944fe1c141` - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: `f7b4a26991` `26e8530d05` `1d48f2a1ab` Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com	2024-06-11 14:01:55 +10:00
Arunpravin Paneer Selvam	31849bf07e	drm/amdgpu: Fix the BO release clear memory warning This happens when the amdgpu_bo_release_notify running before amdgpu_ttm_set_buffer_funcs_status set the buffer funcs to enabled. check the buffer funcs enablement before calling the fill buffer memory. v2:(Christian) - Apply it only for GEM buffers and since GEM buffers are only allocated/freed while the driver is loaded we never run into the issue to clear with buffer funcs disabled. v3:(Mario) - drop the stable tag as this will presumably go into a -fixes PR for 6.10 Log snip: ERROR Trying to clear memory with ring turned off. RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu] Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Richard Gong <richard.gong@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240610180401.9540-1-Arunpravin.PaneerSelvam@amd.com	2024-06-10 23:46:33 +05:30
Tao Zhou	b95fa494d6	drm/amdgpu: add RAS is_rma flag Set the flag to true if bad page number reaches threshold. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Lin.Cao	c8ad1bbbc2	drm/amdgpu: fix failure mapping legacy queue when FLR Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fini will not be called when function level reset, which will cause mes hw init be skipped during FLR, which will leads to mapping legacy queue fail. Set this flag as false when post reset will fix this issue. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Eric Huang	dbe2c4c8ab	drm/amdkfd: add reset cause in gpu pre-reset smi event reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Frank Min	978f5428c9	drm/amdgpu: Set PTE_IS_PTE bit for gfx12 Set PTE_IS_PTE bit while PRT is enabled on gfx12. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:14 -04:00
Eric Huang	2656e1ce78	drm/amdgpu: add reset sources in gpu reset context reset source or reset cause is very useful info for reset context, it will be used by events API. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:13 -04:00
Shane Xiao	45bd39fb3b	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10 This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10, clear the bits before setting the new one. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:25:13 -04:00
Alex Deucher	a9ebd10482	Revert "drm/amdgpu/gfx11: enable gfx pipe1 hardware support" This reverts commit `6670142d25`. Pierre-Eric reported problems with this on his navi33. Revert for now until we understand what is going wrong. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Pierre-eric.Pelloux-prayer@amd.com	2024-06-05 11:03:45 -04:00
Shane Xiao	3928290102	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_NV10 This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10, clear the bits before setting the new one. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:43 -04:00
Yifan Zhang	301dfbfc84	drm/amdgpu: disable lane0 L1TLB and enable lane1 L1TLB This patch to disable lane0 L1TLB and enable lane1 L1TLB. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:37 -04:00
Yifan Zhang	b7e2170b87	drm/amdgpu: init SAW registers for mmhub v3.3 This patch to configure mmhub3.3 SAW registers Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:30 -04:00
Sunil Khatri	a1a049bd59	drm/amdgpu: fix comments and error message for ipdump Fix comments and error messages to rightly represent the information. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:09 -04:00
Sunil Khatri	33837d62a4	drm/amdgpu: rename ip_dump_cp_queues to compute queues Rename the variable ip_dump_cp_queues to ip_dump_compute_queue as it represent compute queues. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:02:03 -04:00
Sunil Khatri	34b8d94b6c	drm/amdgpu: add cp queue registers for gfx9 ipdump Add gfx9 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:57 -04:00
Sunil Khatri	173ef9182a	drm/amdgpu: add print support for gfx9 ipdump Add support of gfx9 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:50 -04:00
Sunil Khatri	514dc965b2	drm/amdgpu: add gfx9 register support in ipdump Add general registers of gfx9 in ipdump for devcoredump support. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 11:01:28 -04:00
Srinivasan Shanmugam	745f7170db	drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ring This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring function triggered by the snprintf function expecting unsigned char arguments due to the '%hhu' format specifier, but receiving int and u32 arguments. The issue occurred because the arguments xcc_id, ring->me, ring->pipe, and ring->queue were of type int and u32, not unsigned char. This led to a type mismatch when these arguments were passed to snprintf. To resolve this, the snprintf line was modified to cast these arguments to unsigned char. This ensures that the arguments are of the correct type for the '%hhu' format specifier and resolves the warning. Fixes the below: >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format >> specifies type 'unsigned char' but the argument has type 'int' >> [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~ >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format >> specifies type 'unsigned char' but the argument has type 'u32' (aka >> 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~~ 4 warnings generated. Fixes: `0ea5544555` ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:43 -04:00
Jesse Zhang	c09d2eff81	drm/amdgu: fix Unintentional integer overflow for mall size Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned). Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:36 -04:00
Hawking Zhang	a474161e84	drm/amdgpu: Update programming for boot error reporting AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid. The polling sequence is also simplifed according to the latest firmware change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:26 -04:00
Alex Deucher	7d3b9668e6	drm/amdgpu/soc24: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:58:10 -04:00
Hawking Zhang	473af28d3e	drm/amdgpu: Estimate RAS reservation when report capacity v2 Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:53 -04:00
Tao Zhou	76bec2a031	drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xfer And also make sure the value of msg[1].len should be in the range of u16. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:47 -04:00
David (Ming Qiang) Wu	813e7d4cd0	drm/amdgpu: drop some kernel messages in VCN code We have messages when the VCN fails to initialize and there is no need to report on success. Also PSP loading FWs is the default for production. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:34 -04:00
Shane Xiao	eba791dc17	drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_GFX12 This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12, clear the bits before setting the new one. This fixed the potential issue that GFX12 setting memory to NC. v2: Clear mtype field before setting the new one (Alex) v3: Fix typo (Felix) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: longlyao <Longlong.Yao@amd.com> Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-06-05 10:57:24 -04:00
Dave Airlie	bb61cf46b6	amd-drm-fixes-6.10-2024-05-30: amdgpu: - RAS fix - Fix colorspace property for MST connectors - Fix for PCIe DPM - Silence UBSAN warning - GPUVM robustness fix - Partition fix - Drop deprecated I2C_CLASS_SPD amdkfd: - Revert unused changes for certain 11.0.3 devices - Simplify APU VRAM handling -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZljezAAKCRC93/aFa7yZ 2PZpAP43w/Q7aNgEpzdViO6QckU2GtOaRskavnzew4yxd5Y4xAD/U4vEQ3V7Rc0Q XRv4c+rJAVXIlfcS4guLvHSTyqzAPQY= =WQzB -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.10-2024-05-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.10-2024-05-30: amdgpu: - RAS fix - Fix colorspace property for MST connectors - Fix for PCIe DPM - Silence UBSAN warning - GPUVM robustness fix - Partition fix - Drop deprecated I2C_CLASS_SPD amdkfd: - Revert unused changes for certain 11.0.3 devices - Simplify APU VRAM handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530202316.2246826-1-alexander.deucher@amd.com	2024-05-31 08:38:15 +10:00
Rajneesh Bhardwaj	a9bc5a19e4	drm/amdgpu: Make CPX mode auto default in NPS4 On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:06:48 -04:00
Alex Deucher	1f327dfc84	drm/amdkfd: simplify APU VRAM handling With commit `89773b8559` ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:06:15 -04:00
Jesse Zhang	a0cf36546c	drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:03:20 -04:00
Alex Deucher	ba46b3bda2	drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth() Use current speed/width on devices which don't support dynamic PCIe switching. Fixes: `466a7d1153` ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 17:01:49 -04:00
Alex Deucher	6670142d25	drm/amdgpu/gfx11: enable gfx pipe1 hardware support Enable gfx pipe1 hardware support. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Rajneesh Bhardwaj	ef5c0f897e	drm/amdgpu: Make CPX mode auto default in NPS4 On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	2e216b1e6b	drm/amdgpu/gfx11: handle priority setup for gfx pipe1 Set up pipe1 as a high priority queue. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	eab57bf22f	drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe Use correct ref/mask for differnent gfx ring pipe. Ported from ZhenGuo's patch for gfx10. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Victor Skvortsov	e864180ee4	drm/amdgpu: Add lock around VF RLCG interface flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Alex Deucher	7978c4d414	drm/amdkfd: simplify APU VRAM handling With commit `89773b8559` ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:48:30 -04:00
Yang Wang	3c603b1fa8	drm/amdgpu: fix typo in amdgpu_ras_aca_sysfs_read() function fix typo "info.ue_count" in amdgpu_ras_aca_sysfs_read() function. Fixes: `865d339763` ("drm/amdgpu: add aca deferred error type support") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:48 -04:00
Jesse Zhang	511a623fb4	drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:41 -04:00
David (Ming Qiang) Wu	ab47fa8358	drm/amd/amdgpu: add AMD_PG_SUPPORT_VCN_DPG flag AMD_PG_SUPPORT_VCN_DPG is needed for secure parts and should/can be enabled by now. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:35 -04:00
Alex Deucher	f2a1fbdd1f	drm/amdgpu: drop MES 10.1 support v3 It was an enablement vehicle for MES 11 and was never productized. Remove it. v2: drop additional checks in the GFX10 code. v3: drop mes_api_def.h Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:09:01 -04:00
Alex Deucher	87dfeb47a5	drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth() Use current speed/width on devices which don't support dynamic PCIe switching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-29 14:08:44 -04:00
Maxime Ripard	375c4d1583	Merge drm/drm-next into drm-misc-next Let's start the new release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-27 11:08:31 +02:00
Linus Torvalds	56fb6f9285	drm fixes for 6.10-rc1 nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZQ9rAACgkQDHTzWXnE hr7kew//e+OwUJcock4rqpMsXrVwgse4bJ1H7gnHLmN/AQKsXWcaHKixJu+WkkQI 62gyhOEiSH+J4Je6zi5JbIt9AKgIKyZeJqg/QeSMqUFJzYkPKhrhj0aX4AefuNJB dxCI1+UnkNdQ/LM+Dwb4sq757FX0/yPqhzBiytWEJJVHP4Th5SMRFmSs8YsT1Z4F bn220QFvPv+KrVV2eKEf6plJqYhs92nT8cVkNw8x9LhnivLRhyh7we3Ew6R2zouV Izm1nYx1CtTqaiHo3+UOeqRLzhqpmukzmJyML9zk/qt+s+YMj9yI/ie02EnTPqg6 NQCqqmoDy1BpUa37N486Q7MFTiS076HMaQpYiDmVz3PIWWPvKZDqcC5vonoh99ra b88X4J4w3e747xZP7VDsnAOFsaLgcRah4JNtv8H3rcurbz0FrcV/0g/W/kt+Nenc k2m2s2hv56yWXd6B7bd3tfUe5dOwxc+CgiNddM5X3WPenEvnH+DKBOoFojSRl2o7 LpRL7WhY6fceRUqVrHzfIPLYgdfz/Ho9LYST2jGIdKDnwV0Hw0sNfU9C0KOzm0J6 rjgGEeMHgVdZPzSeXSmaO82mBRvR7Ke6da7FIazbocledOapsB5qxjhLvkRyULAp mlGZPRPJdNVM8slNLxEqrT/ugorIo5owtQdEnJmDJXFdozcLTMY= =pexB -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Some fixes for the end of the merge window, mostly amdgpu and panthor, with one nouveau uAPI change that fixes a bad decision we made a few months back. nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work" * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu/atomfirmware: add intergrated info v2.3 table drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms drm/bridge: adv7511: Attach next bridge without creating connector drm/buddy: Fix the warn on's during force merge drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations drm/panthor: Call panthor_sched_post_reset() even if the reset failed drm/panthor: Reset the FW VM to NULL on unplug drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level drm/panthor: Force an immediate reset on unrecoverable faults drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints drm/panthor: Fix an off-by-one in the heap context retrieval logic drm/panthor: Relax the constraints on the tiler chunk size drm/panthor: Make sure the tiler initial/max chunks are consistent drm/panthor: Fix tiler OOM handling to allow incremental rendering drm: xlnx: zynqmp_dpsub: Fix compilation error drm: xlnx: zynqmp_dpsub: Fix few function comments	2024-05-24 17:28:02 -07:00
Hawking Zhang	ec58991054	drm/amdgpu: correct hbm field in boot status hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 18:13:23 -04:00
Sunil Khatri	498906d376	drm/amdgpu: add gfx queue support for gfx11 ipdump Add support of all the CP GFX queues for gfx11 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:14:08 -04:00
Sunil Khatri	368c33ac8a	drm/amdgpu: add cp queue registers for gfx11 ipdump Add gfx11 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:14:00 -04:00
Sunil Khatri	015a04a59e	drm/amdgpu: add print support for gfx11 ipdump Add support of gfx11 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:48 -04:00
Sunil Khatri	b5812822d9	drm/amdgpu: add gfx11 registers support in ipdump Add general registers of gfx11 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:41 -04:00
Sunil Khatri	836bc350a5	drm/amdgpu: add more device info to the devcoredump Adding more device information: a. PCI info b. VRAM and GTT info c. GDC config Also correct the print layout and section information for in devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:28 -04:00
Sunil Khatri	29b1fc665c	drm/amdgpu: add prints in IP State dump add prints before and after ip state is dumped. It avoids user to think of system being stuck/hung as dump could take some time after a gpu hang. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:20 -04:00
Sunil Khatri	8444453dce	drm/amdgpu: add gfx queue support of gfx10 in ipdump Add gfx queue register for all instances in devcoredump for gfx10. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:13 -04:00
Sunil Khatri	0f83227bc8	drm/amdgpu: Add cp queues support fro gfx10 in ipdump Add support to dump registers of all instances of cp queue registers of gfx10 to devcoredump. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:13:06 -04:00
Sunil Khatri	74feef5667	drm/amdgpu: rename the ip_dump to ip_dump_core Rename the memory pointer from ip_dump to ip_dump_core to make it specific to core registers and rest other registers to be dumped in their respective memories. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:12:58 -04:00
Lijo Lazar	621a4e9efb	drm/amdgpu: Add CRC16 selection in config KFD uses crc16 for gpu_id generation. Fixes: `3ed181b8ff` ("drm/amdkfd: Ensure gpu_id is unique") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405211405.TidTWIBX-lkp@intel.com/ Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:52 -04:00
Victor Zhao	e21e0b7824	drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance. Fix the miss matched code. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:38 -04:00
Jane Jian	f889f9c68b	drm/amdgpu - optimize rlc spm cntl v1 - driver MMIO read the register to check whether write is required - if write is required, sriov full time to use rlcg, otherwise use KIQ v2 - include gfx v11 sriov runtime case Signed-off-by: Jane Jian <Jane.Jian@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:31 -04:00
Hawking Zhang	cf85764e2b	drm/amdgpu: correct hbm field in boot status hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:12 -04:00
Frank Min	8d7b149675	drm/amdgpu: program device_cntl2 through pci cfg space device_cntl2 is accessible from pci config space, so program it through pci cfg space instead of mmio. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:11:05 -04:00
Li Ma	19f0edd897	drm/amdgpu/atomfirmware: add intergrated info v2.3 table [Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:10:48 -04:00
Srinivasan Shanmugam	0ea5544555	drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring This commit fixes a format truncation issue arosed by the snprintf function potentially writing more characters into the ring->name buffer than it can hold, in the amdgpu_gfx_kiq_init_ring function The issue occurred because the '%d' format specifier could write between 1 and 10 bytes into a region of size between 0 and 8, depending on the values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf function could output between 12 and 41 bytes into a destination of size 16, leading to potential truncation. To resolve this, the snprintf line was modified to use the '%hhu' format specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu' specifier is used for unsigned char variables and ensures that these values are printed as unsigned decimal integers. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=] 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647] 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16 332 \| snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333 \| xcc_id, ring->me, ring->pipe, ring->queue); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `345a36c4f1` ("drm/amdgpu: prefer snprintf over sprintf") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:10:03 -04:00
Jesse Zhang	806e8c5579	drm/amdgpu: fix invadate operation for pg_flags Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0 regardless of the values of its operands. So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:50 -04:00
Jack Xiao	fa10408116	drm/amdgpu/mes12: mes hw_fini fix for mode1 reset Port mes11 hw_fini to mes12, fix for mode1 reset. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:42 -04:00
Jesse Zhang	64da71ea76	drm/amdgpu: fix invadate operation for umsch Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 >> 16 is 0 regardless of the values of its operands So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:35 -04:00
Jesse Zhang	030ffd4d43	drm/admgpu: fix dereferencing null pointer context When user space sets an invalid ta type, the pointer context will be empty. So it need to check the pointer context before using it Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:15 -04:00
Yang Wang	9262f411dc	drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabled skip to create 'xxx_err_count' node when ACA is enabled. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:08:25 -04:00
Jani Nikula	42505ab120	drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side effect if amdgpu_connector->edid isn't initialized. However, everywhere that the returned EDID is used, the EDID should have been set beforehands. Only the drm EDID code should look at the EDID property, anyway, so stop using it. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan Cc: amd-gfx@lists.freedesktop.org Reviewed-by: Robert Foss <rfoss@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1463862965d76e9458551598fd4d287a08d3d264.1715353572.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-05-23 14:37:24 +03:00
Steven Rostedt (Google)	2c92ca849f	tracing/treewide: Remove second parameter of __assign_str() With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str \| while read a ; do sed -e 's/$__assign_str([^,][^ ,]$ ,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>	2024-05-22 20:14:47 -04:00
Li Ma	e64e8f7c17	drm/amdgpu/atomfirmware: add intergrated info v2.3 table [Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-05-22 12:01:39 -04:00
Linus Torvalds	f0bae243b2	pci-v6.10-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/ R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK CSeqSUK9XmXxFNA= =xjoS -----END PGP SIGNATURE----- Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...	2024-05-21 10:09:28 -07:00
Lang Yu	eb853413d0	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 17:17:53 -04:00
Lang Yu	2a705f3e49	drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 17:17:53 -04:00
Tvrtko Ursulin	9c1a429217	drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc Align kerneldoc with the function argument name. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `26e20235ce` ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Dr. David Alan Gilbert	191ef65b4e	drm/amdgpu: remove unused struct 'hqd_registers' 'hqd_registers' used to be used in a member of the 'bonaire_mqd' struct. 'bonaire_mqd' was removed by commit `486d807cd9` ("drm/amdgpu: remove duplicate definition of cik_mqd") It's now unused. Remove 'hqd_registers' as well. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Tao Zhou	2aadb520bf	drm/amdgpu: update type of buf size to u32 for eeprom functions Avoid overflow issue. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:26 -04:00
Victor Skvortsov	5434bc03f5	drm/amdgpu: Queue KFD reset workitem in VF FED The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:25 -04:00
Victor Skvortsov	3a19a8af64	drm/amdgpu: Extend KIQ reg polling wait for VF Runtime KIQ interface to read/write registers in VF may take longer than expected for BM environment. Extend the timeout. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-20 16:20:25 -04:00
Linus Torvalds	ff9a79307f	Kbuild updates for v6.10 - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd ZCSnZ31h7woGfNho =D5B/ -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...	2024-05-18 12:39:20 -07:00
Yang Wang	062a7ce676	drm/amdgpu: fix ACA no query result after gpu reset fix ACA no query result after gpu reset. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Lijo Lazar	546e6309d1	drm/amd/pm: Remove legacy interface for xgmi plpd Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remove that as well. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Yang Wang	258ed689bc	drm/amdgpu: change bank cache lock type to spinlock modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:39 -04:00
Yang Wang	f6bce954f4	drm/amdgpu: change aca bank error lock type to spinlock modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	50bff04d02	drm/amdgpu: Describe all object placements in debugfs Accurately show all placements when describing objects in debugfs, instead of bunching them up under the 'CPU' placement. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	8fb0efb101	drm/amdgpu: Reduce mem_type to domain double indirection All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT, depending on AMDGPU_GEM_CREATE_PREEMPTIBLE. Simplify a few places in the code which convert the TTM placement into a domain by checking against the current placement directly. In the conversion AMDGPU_PL_PREEMPT either does not have to be handled because amdgpu_mem_type_to_domain() cannot return that value anyway. v2: * Remove AMDGPU_PL_PREEMPT handling. v3: * Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> # v1 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> # v2 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tvrtko Ursulin	26e20235ce	drm/amdgpu: Add amdgpu_bo_is_vm_bo helper Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) v3: * Use Christian's kerneldoc. v4: * Fixed logic inversion in amdgpu_vm_bo_get_memory. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Srinivasan Shanmugam	e060c7ba7e	drm/amdgpu: Remove duplicate check for is_queue_unmap in sdma_v7_0_ring_set_wptr This commit removes a duplicate check for is_queue_unmap in the sdma_v7_0_ring_set_wptr function. The check at line 171 was considered dead code because at this point in the code, we already know that is_queue_unmap is false due to the check at line 161. By removing this unnecessary check, improves the readability of the code Fixes the below: drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c:171 sdma_v7_0_ring_set_wptr() warn: duplicate check 'is_queue_unmap' (previous on line 161) drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c 140 static void sdma_v7_0_ring_set_wptr(struct amdgpu_ring ring) 141 { 142 struct amdgpu_device adev = ring->adev; 143 uint32_t wptr_saved; 144 uint32_t is_queue_unmap; 145 uint64_t aggregated_db_index; 146 uint32_t mqd_size = adev->mqds[AMDGPU_HW_IP_DMA].mqd_size; 147 148 DRM_DEBUG("Setting write pointer\n"); 149 150 if (ring->is_mes_queue) { 151 wptr_saved = (uint32_t )(ring->mqd_ptr + mqd_size); 152 is_queue_unmap = (uint32_t )(ring->mqd_ptr + mqd_size + ^^^^^^^^^^^^^^^^ Set here 153 sizeof(uint32_t)); 154 aggregated_db_index = 155 amdgpu_mes_get_aggregated_doorbell_index(adev, 156 ring->hw_prio); 157 158 atomic64_set((atomic64_t )ring->wptr_cpu_addr, 159 ring->wptr << 2); 160 wptr_saved = ring->wptr << 2; 161 if (is_queue_unmap) { ^^^^^^^^^^^^^^^ Checked here 162 WDOORBELL64(aggregated_db_index, ring->wptr << 2); 163 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 164 ring->doorbell_index, ring->wptr << 2); 165 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 166 } else { 167 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 168 ring->doorbell_index, ring->wptr << 2); 169 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 170 --> 171 if (is_queue_unmap) ^^^^^^^^^^^^^^^ This is dead code. We know it's false. 172 WDOORBELL64(aggregated_db_index, 173 ring->wptr << 2); 174 } 175 } else { 176 if (ring->use_doorbell) { 177 DRM_DEBUG("Using doorbell -- " 178 "wptr_offs == 0x%08x " Fixes: `b412351e91` ("drm/amdgpu: Add sdma v7_0 ip block support (v7)") Cc: Likun Gao <Likun.Gao@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Tim Van Patten	1446226d32	drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1 The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit `5f3854f1f4` ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is rebooted. Instead, revert back to gmc->noretry=0 so the device is still responsive. Fixes: `5f3854f1f4` ("drm/amdgpu: add more cases to noretry=1") Signed-off-by: Tim Van Patten <timvp@google.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Ruijing Dong	a1a9143c96	drm/amdgpu/vcn: update vcn5 enc/dec capabilities Update the capabilities for supporting 8k encoding. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Jiapeng Chong	72e6ea95c4	drm/amdgpu: Remove duplicate amdgpu_umsch_mm.h header ./drivers/gpu/drm/amd/amdgpu/amdgpu.h: amdgpu_umsch_mm.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9063 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:38 -04:00
Kenneth Feng	d3620eeae8	drm/amd/amdgpu: add module parameter for jpeg add module parameter for jpeg. this is a temporary workaround for jpeg unit test fail on vcn 5.0 now. will be removed later. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Alex Deucher	736f911204	drm/amdgpu: fix documentation errors in gmc v12.0 Fix up parameter descriptions. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Sreekant Somasekharan	0cce5f285d	drm/amdkfd: Check correct memory types for is_system variable To catch GPU mapping of system memory, TTM_PL_TT and AMDGPU_PL_PREEMPT must be checked. Fixes: `628e1ace23` ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC") Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Alex Deucher	b72fa761fc	drm/amdgpu: fix documentation errors in sdma v7.0 Fix up parameter descriptions. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Yang Wang	b712d7c201	drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG() create a new helper function to avoid compiler 'side-effect' check about RAS_EVENT_LOG() macro. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Frank Min	9a55c77978	drm/amdgpu: fix getting vram info for gfx12 gfx12 query video mem channel/type/width from umc_info of atom list, so fix it accordingly. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Frank Min	b80160a53a	drm/amdgpu/mes: use mc address for wptr in add queue packet use mc address for wptr in add queue packet Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Likun Gao	c14d5b5095	drm/amdgpu: enable gfxoff for gc v12.0.0 Enable GFXOFF for GC v12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:37 -04:00
Likun Gao	d8cd2d617a	drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.0 Enable mmhub and athub cg on gc 12.0.0 Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	7be73af53b	drm/amdgpu: switch default mes to uni mes Switch the default mes to uni mes for gfx v12. V2: remove uni_mes set for gfx v11. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Yang Wang	b2aa3d4b30	drm/amdgpu: add debug flag to enable RAS ACA Use debug_mask=0x10 (BIT.4) param to help enable RAS ACA. (RAS ACA is disabled by default.) Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	2ac72cbc7e	drm/amdgpu: enable some cg feature for gc 12.0.0 Enable 3D cgcg, 3D cgls, sram fgcg, perfcounter mgcg, repeater fgcg for gc v12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Ma Jun	4c11d30c95	drm/amdgpu: Fix the null pointer dereference to ras_manager Check ras_manager before using it Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Xiaogang Chen	f326d7cc74	drm/kfd: Correct pinned buffer handling at kfd restore and validate process This reverts commit `8a774fe912` ("drm/amdgpu: avoid restore process run into dead loop") since buffer got pinned is not related whether it needs mapping And skip buffer validation at kfd driver if the buffer has been pinned. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Lijo Lazar	b194d21b9b	drm/amdgpu: Use NPS ranges from discovery table Add GMC API to fetch NPS range information from discovery table. Use NPS range information in GMC 9.4.3 SOCs when available, otherwise fallback to software method. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Friedrich Vock	0cdb3f9740	drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit The special case for VM passthrough doesn't check adev->nbio.funcs before dereferencing it. If GPUs that don't have an NBIO block are passed through, this leads to a NULL pointer dereference on startup. Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Fixes: `1bece222ea` ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Lijo Lazar	ce798376ef	drm/amdgpu: Fix memory range calculation Consider the 16M reserved region also before range calculation for GMC 9.4.3 SOCs. Fixes: `a433f1f594` ("drm/amdgpu: Initialize memory ranges for GC 9.4.3") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:40:36 -04:00
Likun Gao	73fbc3e000	drm/amdgpu: enable gfx cgcg&cgls for gfx v12_0_0 Enable GFX CGCG and CGLS for gfx version 12.0.0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:11:02 -04:00
Sonny Jiang	75125e6b4c	drm/amdgpu/jpeg5: enable power gating Enable PG on JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:10:56 -04:00
Likun Gao	ef57158462	drm/amdgpu: support imu for gc 12_0_0 Support IMU for ASIC with GC 12.0.0 Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:10:31 -04:00
Ma Jun	01b3297336	drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr Remove dead code in amdgpu_ras_add_mca_err_addr Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:09:55 -04:00
Ma Jun	d1a6bfff94	drm/amdgpu: Fix null pointer dereference to bo Check bo before using it Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:09:47 -04:00
shaoyunl	4488cd671c	drm/amdgpu: enable unmapped doorbell handling basic mode on mes 12 This reverts commit `fcc5df722d`. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-17 17:05:38 -04:00
Linus Torvalds	db5d28c0bf	drm for 6.10-rc1 new drivers: - panthor: ARM Mali/Immortalis CSF-based GPU driver core: - add a CONFIG_DRM_WERROR option - make more headers self-contained - grab resv lock in pin/unpin - fix vmap resv locking - EDID/eDP panel matching - Kconfig cleanups - DT sound bindings - Add SIZE_HINTS property for cursor planes - Add struct drm_edid_product_id and helpers. - Use drm device based logging in more drm functions. - drop seq_file.h from a bunch of places - use drm_edid driver conversions dp: - DP Tunnel documentation - MST read sideband cap - Adaptive sync SDP prep work ttm: - improve placement for TTM BOs in idle/busy handling panic: - Fixes for drm-panic, and option to test it. - Add drm panic to simpledrm, mgag200, imx, ast bridge: - improve init ordering - adv7511: allow GPIO pin sharing - tc358775: add tc358675 support panel: - AUO B120XAN01.0 - Samsung s6e3fa7 - BOE NT116WHM-N44 - CMN N116BCA-EA1, - CrystalClear CMT430B19N00 - Startek KD050HDFIA020-C020A - powertip PH128800T006-ZHC01 - Innolux G121X1-L03 - LG sw43408 - Khadas TS050 V2 - EDO RM69380 OLED - CSOT MNB601LS1-1 amdgpu: - HDCP/ODM/RAS fixes - Devcoredump improvements - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - SMU 14.0.2 support - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - DCN 3.5.x Updates - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - Remove invalid TTM resource start check - UAF fix in VA IOCTL - GPUVM page fault redirection to secondary IH rings for IH 6.x - Initial support for mapping kernel queues via MES - Fix VRAM memory accounting amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix - Enable SQ watchpoint for gfx10 i915: - Adding new DG2 PCI ID - add context hints for GT frequency - enable only one CCS for compute workloads - new workarounds - Fix UAF on destroy against retire race and remove two earlier partial fixes - Limit the reserved VM space to only the platforms that need it - Fix gt reset with GuC submission is disable - Add and use gt_to_guc() wrapper i915/xe display: - Lunar Lake display enabling, including cdclk and other refactors - BIOS/VBT/opregion related refactor - Digital port related refactor/clean-up - Fix 2s boot time regression on DP panel replay init - Remove duplication on audio enable/disable on SDVO and g4x+ DP - Disable AuxCCS framebuffers if built for Xe - Make crtc disable more atomic - Increase DP idle pattern wait timeout to 2ms - Start using container_of_const() for some extra const safety - Fix Jasper Lake boot freeze - Enable MST mode for 128b/132b single-stream sideband - Enable Adaptive Sync SDP Support for DP - Fix MTL supported DP rates - removal of UHBR13.5 - PLL refactoring - Limit eDP MSO pipe only for display version 20 - More display refactor towards independence from i915 dev_priv - Convert i915/xe fbdev to DRM client - More initial work to make display code more independent from i915 xe: - improved error capture - clean up some uAPI leftovers - devcoredump update - Add BMG mocs table - Handle GSCCS ER interrupt - Implement xe2- and GuC workarounds - struct xe_device cleanup - Hwmon updates - Add LRC parsing for more GPU instruction - Increase VM_BIND number of per-ioctl Ops - drm/xe: Add XE_BO_GGTT_INVALIDATE flag - Initial development for SR-IOV support - Add new PCI IDs to DG2 platform - Move userptr over to start using hmm_range_fault msm: - Switched to generating register header files during build process instead of shipping pre-generated headers - Merged DPU and MDP4 format databases. - DP: - Stop using compat string to distinguish DP and eDP cases - Added support for X Elite platform (X1E80100) - Reworked DP aux/audio support - Added SM6350 DP to the bindings - GPU: - a7xx perfcntr reg fixes - MAINTAINERS updates - a750 devcoredump support radeon: - Silence UBSAN warnings related to flexible arrays nouveau: - move some uAPI objects to uapi headers omapdrm: - console fix ast: - add i2c polling qaic: - add debugfs entries exynos: - fix platform_driver .owner - drop cleanup code mediatek: - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe() - Add GAMMA 12-bit LUT support for MT8188 - Rename mtk_drm_* to mtk_* - Drop driver owner initialization - Correct calculation formula of PHY Timing -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZEUU0ACgkQDHTzWXnE hr5qMBAAjUFF0w3YOQMsn0LEAm628kMRHpoVeSXmIfO9z9lTyad30EtiS4ggFgj7 Q/oQ6hHCd5jdsvGSJDgtTTAsTQX+aCkXrgf/18ENbqR5mM3MdefUAPR/zawZ7HR4 8+b2h6p7gHBw8wDjuIvQ5e9InHcnIkKWJc82qnJG5Urgxa05SDh3mu3cosPTJiBw a851vlWaYcxC0yAUwJlWaXDdN8yzdFaSQNboZBS/CMLXF/WE6Ht257uxJmaouc0Y Z0kBybok5x0TPQEXF9IV+kuSW3EYpYcwRi0BFFM9sJjkEBdH3rYRZwuYP1LR+7VZ HKsmIkie8YzCm2VwTquYzUvHgF+swZX4RRch9XJlGz7UvBLc0eBO/2n4X6fNd8Kl QGNNqEfsnUQrAHKvGsOUgoGjSCmEo8voGcMZ3JPIAdJ/GcnJwpMvNxtF6XB08hEu rDxuU6o7WkM4dJbtiaFEHNh0Fmjj6aXdBL23UD9pcqPT1fc9cT3xnUd5RJIRuRwV /tpb2WfkFAoxCkKFiunaC4rE8oG6ME6wr/trYjvoYuhCI5hCVaXRBGzJEtC30IP6 lG2YZ8r0jHjktbgjZ0Cz/hY424H4sxSN9SJAnXXFDzcfjBJ/nOgo5nMD1jKajAD5 SYfqWaD5Y+YygtyLJPMfZQI2XMOpCzteXD8uaNXXFJfpV7Apeyg= =ocVM -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "This is the main pull request for the drm subsystems for 6.10. In drivers the main thing is a new driver for ARM Mali firmware based GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and scattered changes to everything else. In the core a bunch of headers and Kconfig was refactored, along with the addition of a new panic handler which is meant to provide a user friendly message when a panic happens and graphical display is enabled. New drivers: - panthor: ARM Mali/Immortalis CSF-based GPU driver Core: - add a CONFIG_DRM_WERROR option - make more headers self-contained - grab resv lock in pin/unpin - fix vmap resv locking - EDID/eDP panel matching - Kconfig cleanups - DT sound bindings - Add SIZE_HINTS property for cursor planes - Add struct drm_edid_product_id and helpers. - Use drm device based logging in more drm functions. - drop seq_file.h from a bunch of places - use drm_edid driver conversions dp: - DP Tunnel documentation - MST read sideband cap - Adaptive sync SDP prep work ttm: - improve placement for TTM BOs in idle/busy handling panic: - Fixes for drm-panic, and option to test it. - Add drm panic to simpledrm, mgag200, imx, ast bridge: - improve init ordering - adv7511: allow GPIO pin sharing - tc358775: add tc358675 support panel: - AUO B120XAN01.0 - Samsung s6e3fa7 - BOE NT116WHM-N44 - CMN N116BCA-EA1, - CrystalClear CMT430B19N00 - Startek KD050HDFIA020-C020A - powertip PH128800T006-ZHC01 - Innolux G121X1-L03 - LG sw43408 - Khadas TS050 V2 - EDO RM69380 OLED - CSOT MNB601LS1-1 amdgpu: - HDCP/ODM/RAS fixes - Devcoredump improvements - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - SMU 14.0.2 support - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - DCN 3.5.x Updates - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - Remove invalid TTM resource start check - UAF fix in VA IOCTL - GPUVM page fault redirection to secondary IH rings for IH 6.x - Initial support for mapping kernel queues via MES - Fix VRAM memory accounting amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix - Enable SQ watchpoint for gfx10 i915: - Adding new DG2 PCI ID - add context hints for GT frequency - enable only one CCS for compute workloads - new workarounds - Fix UAF on destroy against retire race and remove two earlier partial fixes - Limit the reserved VM space to only the platforms that need it - Fix gt reset with GuC submission is disable - Add and use gt_to_guc() wrapper i915/xe display: - Lunar Lake display enabling, including cdclk and other refactors - BIOS/VBT/opregion related refactor - Digital port related refactor/clean-up - Fix 2s boot time regression on DP panel replay init - Remove duplication on audio enable/disable on SDVO and g4x+ DP - Disable AuxCCS framebuffers if built for Xe - Make crtc disable more atomic - Increase DP idle pattern wait timeout to 2ms - Start using container_of_const() for some extra const safety - Fix Jasper Lake boot freeze - Enable MST mode for 128b/132b single-stream sideband - Enable Adaptive Sync SDP Support for DP - Fix MTL supported DP rates - removal of UHBR13.5 - PLL refactoring - Limit eDP MSO pipe only for display version 20 - More display refactor towards independence from i915 dev_priv - Convert i915/xe fbdev to DRM client - More initial work to make display code more independent from i915 xe: - improved error capture - clean up some uAPI leftovers - devcoredump update - Add BMG mocs table - Handle GSCCS ER interrupt - Implement xe2- and GuC workarounds - struct xe_device cleanup - Hwmon updates - Add LRC parsing for more GPU instruction - Increase VM_BIND number of per-ioctl Ops - drm/xe: Add XE_BO_GGTT_INVALIDATE flag - Initial development for SR-IOV support - Add new PCI IDs to DG2 platform - Move userptr over to start using hmm_range_fault msm: - Switched to generating register header files during build process instead of shipping pre-generated headers - Merged DPU and MDP4 format databases. - DP: - Stop using compat string to distinguish DP and eDP cases - Added support for X Elite platform (X1E80100) - Reworked DP aux/audio support - Added SM6350 DP to the bindings - GPU: - a7xx perfcntr reg fixes - MAINTAINERS updates - a750 devcoredump support radeon: - Silence UBSAN warnings related to flexible arrays nouveau: - move some uAPI objects to uapi headers omapdrm: - console fix ast: - add i2c polling qaic: - add debugfs entries exynos: - fix platform_driver .owner - drop cleanup code mediatek: - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe() - Add GAMMA 12-bit LUT support for MT8188 - Rename mtk_drm_* to mtk_* - Drop driver owner initialization - Correct calculation formula of PHY Timing" * tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel: (1477 commits) drm/xe/ads: Use flexible-array drm/xe: Use ordered WQ for G2H handler drm/msm/gen_header: allow skipping the validation drm/msm/a6xx: Cleanup indexed regs const'ness drm/msm: Add devcoredump support for a750 drm/msm: Adjust a7xx GBIF debugbus dumping drm/msm: Update a6xx registers XML drm/msm: Fix imported a750 snapshot header for upstream drm/msm: Import a750 snapshot registers from kgsl MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails drm/msm/adreno: fix CP cycles stat retrieval on a7xx drm/msm/a7xx: allow writing to CP_BV counter selection registers drm: zynqmp_dpsub: Always register bridge Revert "drm/bridge: ti-sn65dsi83: Fix enable error path" drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer() drm/fbdev-generic: Do not set physical framebuffer address drm/panthor: Fix the FW reset logic drm/panthor: Make sure we handle 'unknown group state' case properly ...	2024-05-15 09:43:42 -07:00
Frank Min	b61467778e	drm/amdgpu: fix mqd corruption for gfx12 1. restore mqd from backup while resuming 2. use copy_toio and copy_fromio while mqd in vram Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Frank Min	ef168e6de9	drm/amdgpu: add initial value for gfx12 AGP aperture add initial value for gfx12 AGP aperture Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	e6ae021adb	drm/amdgpu: fix the warning bad bit shift operation for aca_error_type type Filter invalid aca error types before performing a shift operation. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	d190b459b2	drm/amdgpu: the warning dereferencing obj for nbio_v7_4 if ras_manager obj null, don't print NBIO err data Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	68de5d31b1	drm/amdgpu: remove structurally dead code This code cannot be reached: return "UNKNOWN";. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:02 -04:00
Jesse Zhang	269435aef4	drm/amdgu: remove unused code The same code is executed when the condition err is true or false, because the code in the if-then branch and after the if statement is identical Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:12:01 -04:00
Jesse Zhang	e2bff63ba6	drm/amdgpu: remove structurally dead code for amd_gmc This code cannot be reached: return sysfs_emit(buf, "UNK....) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	f0574a56fb	drm/amd: fix the warning unchecking return vaule for sdma_v7 check ring allocate success before emit preempt ib Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	b55bf19eb9	drm/amdgpu: clear the warning unsigned compared against 0 for xcp_id This greater-than-or-equal-to-zero comparison of an unsigned value is always true. fpriv->xcp_id >= 0U Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	1940708ccf	drm/amdgpu: fix the waring dereferencing hive Check the amdgpu_hive_info *hive that maybe is NULL. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:53 -04:00
Jesse Zhang	b1f7810b05	drm/amdgpu: fix dereference after null check check the pointer hive before use. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:52 -04:00
Jesse Zhang	1a00f2ac82	drm/amdgpu: Fix the warning division or modulo by zero Checks the partition mode and returns an error for an invalid mode. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 16:11:47 -04:00
Saleemkhan Jamadar	98a2e3a0d1	drm/amdgpu/umsch: add support to capture fw debug log Added support to capture unsch fw debug logs in debugfs. To enable set amdgpu_umschfw_log =1 in boot args. v1 - rename variable to umsch_mm_fwlog (Veera) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:41 -04:00
David (Ming Qiang) Wu	4b0497d25d	drm/amd/amdgpu: update jpeg 5 capability Based on the documentation the maximum resolustion should be 16384x16384. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:34 -04:00
David (Ming Qiang) Wu	a166ec28db	drm/amdgpu/vcn: set VCN5 power gating state to GATE on suspend On suspend, we need to set power gating state to GATE when VCN5 is busy, otherwise we will get following error on resume: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring vcn_unified_0 test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <vcn_v5_0_0> failed -110 amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -110 PM: failed to resume async: error -110 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:21 -04:00
David (Ming Qiang) Wu	10fe1a79cd	drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend We do not directly enable/disable VCN IRQ in vcn 5.0.0. And we do not handle the IRQ state as well. So the calls to disable IRQ and set state are removed. This effectively gets rid of the warining of "WARN_ON(!amdgpu_irq_enabled(adev, src, type))" in amdgpu_irq_put(). Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:12 -04:00
Jack Xiao	745e0a90be	drm/amdgpu/mes: fix mes12 to map legacy queue Adjust mes12 initialization sequence to fix mapping legacy queue. v2: use dev_err. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:44:20 -04:00
Philip Yang	9095e55440	drm/amdkfd: Remove arbitrary timeout for hmm_range_fault On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary timeout value is not accurate, cause memory allocation failure. Remove the arbitrary timeout value, return EAGAIN to application if hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call ioctl again. Change EAGAIN to debug message as this is not error. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:44:02 -04:00
Michel Dänzer	8d2c930735	drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible It incorrectly claimed a resource isn't CPU visible if it's located at the very end of CPU visible VRAM. Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-05-10 13:05:13 -04:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00
Michel Dänzer	c4dcb47d46	drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible It incorrectly claimed a resource isn't CPU visible if it's located at the very end of CPU visible VRAM. Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-05-08 18:45:29 -04:00
Srinivasan Shanmugam	e35ba81bb3	drm/amdgpu: Fix buffer size to prevent truncation in gfx_v12_0_init_microcode This commit fixes multiple potential truncations when writing the strings _pfp.bin, _me.bin, _rlc.bin, and _mec.bin into the fw_name buffer in the gfx_v12_0_init_microcode function in the gfx_v12_0.c file The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf function does not exceed the size of the fw_name buffer. Thus fixing the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c: In function ‘gfx_v12_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 421 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 421 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 428 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40 428 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 436 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 436 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 448 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 448 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Srinivasan Shanmugam	ffd574459d	drm/amdgpu: Fix truncation by resizing ucode_prefix in imu_v12_0_init_microcode This commit fixes potential truncation when writing the string _imu.bin into the fw_name buffer in the imu_v12_0_init_microcode function in the imu_v12_0.c file The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf function does not exceed the size of the fw_name buffer. Thus fixing the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/imu_v12_0.c: In function ‘imu_v12_0_init_microcode’: drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:54: warning: ‘_imu.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 51 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 51 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Tim Huang	51dfc0a4d6	drm/amdgpu: fix mc_data out-of-bounds read warning Clear warning that read mc_data[i-1] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Tim Huang	8944acd0f9	drm/amdgpu: fix ucode out-of-bounds read warning Clear warning that read ucode[] may out-of-bounds. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Ma Jun	0991e49d2b	drm/amdgpu: Fix uninitialized variable warning in amdgpu_info_ioctl Check the return value of amdgpu_xcp_get_inst_details, otherwise we may use an uninitialized variable inst_mask Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Ma Jun	d768394fa9	drm/amdgpu: Fix out-of-bounds read of df_v1_7_channel_number Check the fb_channel_number range to avoid the array out-of-bounds read error Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	cdca89bce4	drm/amdgpu/soc21: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	1dd8b24acc	drm/amdgpu/nv: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:07 -04:00
Alex Deucher	c866201cdc	drm/amdgpu/soc15: use common nbio callback to set remap offset This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	30f45a8ea4	drm/amdgpu: add set_reg_remap callback for NBIF 6.3.1 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	3345f7ec0d	drm/amdgpu: add set_reg_remap callback for NBIO 7.7 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	ffd3d6e780	drm/amdgpu: add set_reg_remap callback for NBIO 4.3 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	454847c9f4	drm/amdgpu: add set_reg_remap callback for NBIO 2.3 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	cacbbfbd24	drm/amdgpu: add set_reg_remap callback for NBIO 7.2 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	42ad8ac6bd	drm/amdgpu: add set_reg_remap callback for NBIO 7.11 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	f9a2274b33	drm/amdgpu: add set_reg_remap callback for NBIO 7.9 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	9d0e2915c4	drm/amdgpu: add set_reg_remap callback for NBIO 7.4 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	b2648640b9	drm/amdgpu: add set_reg_remap callback for NBIO 7.0 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	cab62e4839	drm/amdgpu: add set_reg_remap callback for NBIO 6.1 This will be used to consolidate the register remap offset configuration and fix HDP flushes on systems non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
Alex Deucher	0617cdde84	drm/amdgpu: add nbio set_reg_remap helper Will be used to consolidate reg remap settings and fix HDP flushes on systems with non-4K pages. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:06 -04:00
YiPeng Chai	2b3b9d2150	drm/amdgpu: change log level Change log level. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Likun Gao	a735b4a4ad	drm/amdgpu: fix spl component for psp v14 Fix the coding error when load spl component for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	9e5da94259	drm/amdgpu: fix uninitialized variable warning for jpeg_v4 Clear warning that using uninitialized variable r. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Yang Wang	329cec8f18	drm/amdgpu: fix RAS unload driver issue in SRIOV Fix null pointer issue when unload driver in SRIOV mode. Adjust the function position to ensure that the amdgpu_mca/aca_xxx_init() related functions can be initialized properly. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Yang Wang	85a24a3ea0	drm/amdgpu: ignoring unsupported ras blocks when MCA bank dispatches This patch is used to solve the problem of incorrect parsing of error counts. When the UE trigger gpu is reset, the driver will attempt to parse all possible ras blocks. For ras blocks that are not supported by the current ASIC, the driver should ignore this error. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	8f184f8e7a	drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi Clear warning that using uninitialized variable current_node. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Tim Huang	3aa6b72045	drm/amdgpu: fix uninitialized variable warning for sdma_v7 Clear warning that using uninitialized variable index. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Ma Jun	be1684930f	drm/amdgpu: Fix out-of-bounds write warning Check the ring type value to fix the out-of-bounds write warning Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:05 -04:00
Ma Jun	7e39d7ec35	drm/amdgpu: Fix the uninitialized variable warning Check the user input and phy_id value range to fix "Using uninitialized value phy_id" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Jack Xiao	56fd1f8868	drm/amdgpu/mes11: fix kiq ring ready flag kiq ring test has overwitten ready flag, need disable after gfx hw init. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Lang Yu	89773b8559	drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Sunil Khatri	3b3c9e865e	drm/amdgpu: add se registers to ip dump for gfx10 add the registers of SE block of gfx for ip dump for gfx10 IP. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Sunil Khatri	b4e394e843	drm/amdgpu: add CP headers registers to gfx10 dump add registers in the ip dump for CP headers in gfx10 Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 15:17:04 -04:00
Ville Syrjälä	d26238c680	drm/amdgpu: Use drm_crtc_vblank_crtc() Replace the open coded drm_crtc_vblank_crtc() with the real thing. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240408190611.24914-2-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-08 21:55:30 +03:00
Sunil Khatri	b0923d5d80	drm/amdgpu: remove ip dump reg_count variable reg_count is not used and the register count is directly derived from the array size and hence removed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-03 09:06:47 -04:00
Asad Kamal	6cd2b87264	drm/amd/amdgpu: Check tbo resource pointer Validate tbo resource pointer, skip if NULL Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:18 -04:00
Kenneth Feng	3474e02ed5	drm/amd/pm: support mode1 reset on smu_v14_0_3 support mode1 reset on smu_v14_0_3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	ade887c633	drm/amdgpu/mes12: Use a separate fence per transaction We can't use a shared fence location because each transaction should be considered independently. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	94b51a3d01	drm/amdgpu/mes12: increase mes submission timeout MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Alex Deucher	b1d852920b	drm/amdgpu/mes12: print MES opcodes rather than numbers Makes it easier to review the logs when there are MES errors. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	174fdc07c0	drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.1 enable mmhub and athub cg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	dd8707295d	drm/amd/amdgpu: enable gfxoff on gc 12.0.1 Enable gfxoff on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Jack Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Likun Gao	b9f5d0f978	drm/amdgpu: support cg state get for gfx v12 Support to get clockgating state for gfx v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	598a3b753a	drm/amd/amdgpu: enable sram fgcg on gc 12.0.1 enable sram fgcg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	6f6bb3909c	drm/amd/amdgpu: enable perfcounter mgcg and repeater fgcg enable perfcounter mgcg and repeater fgcg on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	0b6662eb2a	drm/amd/amdgpu: enable 3D cgcg and 3D cgls enable 3D cgcg and 3D cgls on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	af472f68c7	drm/amd/amdgpu: enable mgcg on gfx 12.0.1 enable mgcg on gfx 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
Kenneth Feng	81b09cedb3	drm/amd/amdgpu: enable cgcg and cgls enable cgcg and cgls on gc 12.0.1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:15 -04:00
David (Ming Qiang) Wu	856d1ed4b2	drm/amdgpu/vcn5: Add VCN5 capabilities Add VCN5 encode and decode capabilities support Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Sonny Jiang	117f851393	drm/amdgpu/vcn5: enable DPG mode support Enable DPG mode Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Sonny Jiang	f19cfce87d	drm/amdgpu/jpeg5: enable power gating Enable PG on JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
David (Ming Qiang) Wu	f8f8e95c5f	amdgpu/vcn: enable AMD_PG_SUPPORT_VCN turn on AMD_PG_SUPPORT_VCN flag for power saving Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
David Belanger	da43e93d1b	drm/amdgpu: Fix physical address mask Mask should be 44-bit. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Likun Gao	0a75dc9831	drm/amdgpu/discovery: add mes v12_0 ip block Add mes v12_0 ip block. v2: squash in update (Alex) v3: rebase on unified mes changes (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Likun Gao	5e676d7180	drm/amdgpu/discovery: add gfx v12_0 ip block Add gfx v12_0 ip block. v2: Squash in update (Alex) v3: add exp flag (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	03f4b8c3ca	drm/amdgpu/mes12: disable logging output Random page fault was oberserved, temporarily disable mes log buffer output. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	3dc434ad26	drm/amdgpu: add module parameter 'amdgpu_uni_mes' Add module parameter 'amdgpu_uni_mes' to enable/disable unified mes fw support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	ad5c0a79df	drm/amdgpu/mes12: add legacy setting hw resource interface For unified mes fw, add the legacy interface to set hardware resources. v2: remove warning (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
shaoyunl	fcc5df722d	drm/amdgpu: Disable unmapped doorbell handling basic mode on mes 12 The new mechanism for unmapped doorbell handling requires both driver side and MES fw side change. The FW side changes are still not released. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	663bbfaf68	drm/amdgpu/gfx: enable mes to map legacy queue support Enable mes to map legacy queue support. v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:14 -04:00
Jack Xiao	4c2439f908	drm/amdgpu/mes12: add mes mapping legacy queue support Add mes12 map legacy queue packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	6ce03bd3a4	drm/amdgpu/mes12: enable uni_mes fw on mes pipe0 Enable the unified mes firmware on mes pipe0. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	d2e2c9be78	drm/amdgpu/mes12: add uni_mes fw loading support Add the unified mes firmware loading support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jack Xiao	15ddc4e693	drm/amdgpu/mes: add uni_mes fw loading support Add the unified mes firmware loading support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Sreekant Somasekharan	628e1ace23	drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC Due to a HW bug, the system memory mappings and peer GPU mappings on GFX12 need to be marked as MTYPE_NC. Cc: Joe Greathouse <joseph.greathouse@amd.com> Cc: David Belanger <david.belanger@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jonathan Kim	984b265ff6	drm/amdkfd: fix support for trap on wave start and end for gfx12 Similar to GFX11, GFX12 supports trapping on wave start and end. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
Jonathan Kim	fda3f378c4	drm/amdkfd: always enable ttmp setup for gfx12 Similar to GFX11, always enable the setup of trap temporaries on GFX12. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:13 -04:00
David Belanger	90e4fc8369	drm/amdkfd: Added gfx_v12_kfd2kgd interface for GFX12. Initial implementation, based on GFX11. v2: Removed functions not needed by cp scheduler. v3: Fixed typos. v4: squash in warning fix (Alex) Signed-off-by: David Belanger <david.belanger@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
shaoyunl	2f983d3ca5	drm/amdgpu: Enable event log on MES 12 Enable event log through the HW specific FW API Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
shaoyunl	19e69a5d28	drm/amdgpu: Enable unmapped doorbell handling basic mode on mes 12 Enable basic mode handling for doorbell ring on unmapped CP queue. In this mode, MES can start schedule the queue mapping based on HW interrupt instead of timer. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Hawking Zhang	a2211e475c	drm/amdgpu: Switch to smuio func to get gpu clk counter Switch to smuio callback to query gpu clock counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Likun Gao	e781af6663	drm/amdgpu: init gfxhub setting to align with mmhub Align gfxhub settings with mmhub when program rlc ram. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:12 -04:00
Likun Gao	b32edc2340	drm/amdgpu: skip dpm check to init imu fw Skip dpm check to init imu firmware for imu v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	044feb8e2a	drm/amdgpu: fix active rb and cu number for gfx12 Correct the algorithm of active CU and RB to bypass the disabled SA for gfx12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	56159fffaa	drm/amdgpu: use new method to program rlc ram Program rlc ram with golden setting data instead. The old method (program_imu_rlc_ram_old) should be retired in the future. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	043869be5a	drm/amd/amdgpu: add cgcg&cgls interface for gfx 12.0 add cgcg&cgls interface for gfx 12.0 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Tom St Denis	60917ce8f8	drm/amd/amdgpu: update GFX12 wave data registers Update the registers for gfx12. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	53efeba35d	drm/amdgpu: set different fw data addr for mec pipe For MEC fw data, different pipe should programed into different address. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	1e740df77f	drm/amd/amdgpu: workaround for the imu fw loading workaournd for the imu fw loading on gfx 12.0 without psp Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	f5b4c3236f	drm/amd: Move fw init from sw_init to early_init for imu v12 Move microcode loading from sw_init to early_init to align with the perious version of imu init sequence. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	2502af906b	drm/amdgpu: support S&R fw load for gfx v12 Support Save & Restore related fw load with backdoor RLC autoload type on gfx v12. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Jack Xiao	36b2ce4775	drm/amdgpu/gfx12: recalculate available compute rings to use Recalculate the number of compute rings to use based on the gfx hardware configuration. As needed reserve half of compute rings for mes, kgd can't use up all compute rings. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	29d36a9cfd	drm/amdgpu: skip imu related function if dpm=0 Only execute IMU related functions if dpm>0. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Kenneth Feng	32d1637689	drm/amd/amdgpu: imu fw loading support support imu related function for gfx v12. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:11 -04:00
Likun Gao	af204b76a7	drm/amdgpu: set cp fw address set for gfx v12 Split PFF/ME/MEC firmware address setting function from related load microcode funtion, as it's also needed for rlc autolad. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	52cb80c12e	drm/amdgpu: Add gfx v12_0 ip block support (v6) Initial support for GFX 12. v1: Add gfx v12_0 ip block support. (Likun) v2: Switch to gfx.kiq array. Move the vmhub from ring callback to ring. (Hawking) v3: Update various callback function impl. (Hawking) v4: Warning fixes (Alex) v5: squash in imu fix, csb, rlc autoload implementations (Alex) v6: Rebase (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	4632bec9fa	drm/amdgpu/mes12: update data cache boundary Enlarge the data cache boundary. v2: use the fix data cache boundary. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jonathan Kim	46c4766610	drm/amdgpu: fix trap enablement for gfx12 Fix request to MES to set SQ_SHADER_TBA_HI.trap_en for GFX12. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
shaoyunl	d817c470cb	drm/amdgpu: Enable MES to handle doorbell ring on unmapped queue On MES12, HW can monitor up to 2048 doorbells that not be mapped currently and trigger the interrupt to MES when these unmapped doorbell been ringed. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	745f46b6a9	drm/amdgpu: enable mes v12 self test 1. fix available compute queue to use 2. enable mes v12 self test Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	6628f7762b	drm/amdgpu: set mes fw address for mes v12 Split the function of mes fimrware address setting from mes firmware load for mes v12, as it's also needed for rlc autoload. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Jack Xiao	785f0f9fe7	drm/amdgpu: Add mes v12_0 ip block support (v4) v1: Add mes v12_0 ip block support. (Jack) v2: Switch to gfx.kiq array. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Rebase (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	69d4c44e51	drm/amdgpu: init mes ucode name for gfx v12 Keep gfx v12 mes fw name to gc_12_x_x_mes.bin and gc_12_x_x_mes1.bin. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	00c9035633	drm/amdgpu: add rlc TOC header file for soc24 Add RLC autoload TOC header file for soc24 ASIC. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	e3a911bb38	drm/amdgpu: add new TOC structure Add new RLC_TABLE_OF_CONTENT structure definition. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	d8fd91f905	drm/amdgpu: add gfx12 clearstate header Add gfx12 clearstate register arrays. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:10 -04:00
Likun Gao	5638b1cfa7	drm/amdgpu/discovery: Set GC family for GC 12.0 IP Set GC family for GC 12.0 IPs. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:18:09 -04:00
Sonny Jiang	e56b042118	drm/amdgpu: IB test encode test package change for VCN5 VCN5 session info package interface changed Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 16:17:55 -04:00
Hawking Zhang	5f571c61b9	drm/amdgpu: Add gfx v9_4_4 ip block Add gfx v9_4_4 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:16 -04:00
Hawking Zhang	a6bcffa596	drm/amdgpu: Add smu v13_0_14 ip block Add smu v13_0_14 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:11 -04:00
Hawking Zhang	1dbd59f3f4	drm/amdgpu: Add psp v13_0_14 ip block Add psp v13_0_14 ip block support. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:49:05 -04:00
Hawking Zhang	3d1bb1a2e0	drm/amdgpu: Add sdma v4_4_5 ip block Add sdma v4_4_5 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:48:57 -04:00
Aurabindo Pillai	c45211adfa	drm/amd: Override DCN410 IP version Override DCN IP version to 4.0.1 from 4.1.0 temporarily until change is made in DC codebase to use 4.1.0 Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:48:46 -04:00
Lang Yu	2d6f49ee84	drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:44:10 -04:00
Yunxiang Li	4752cac300	drm/amdgpu: Move ras resume into SRIOV function This is part of the reset, move it into the reset function. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:44:02 -04:00
Peyton Lee	3f19cffde9	drm/amdgpu/vpe: fix vpe dpm clk ratio setup failed Some version of BIOS does not enable all clock levels, resulting in high level clock frequency of 0. The number of valid CLKs must be confirmed in advance. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:43:48 -04:00
Yunxiang Li	6e4aa08fa9	drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic The retry loop for SRIOV reset have refcount and memory leak issue. Depending on which function call fails it can potentially call amdgpu_amdkfd_pre/post_reset different number of times and causes kfd_locked count to be wrong. This will block all future attempts at opening /dev/kfd. The retry loop also leakes resources by calling amdgpu_virt_init_data_exchange multiple times without calling the corresponding fini function. Align with the bare-metal reset path which doesn't have these issues. This means taking the amdgpu_amdkfd_pre/post_reset functions out of the reset loop and calling amdgpu_device_pre_asic_reset each retry which properly free the resources from previous try by calling amdgpu_virt_fini_data_exchange. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:41:05 -04:00
Aurabindo Pillai	a5b843269a	drm/amd: Enable DCN410 init Enable initializing Display Manager for DCN410 IP Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:57 -04:00
Yunxiang Li	25c01191c2	drm/amdgpu: Add reset_context flag for host FLR There are other reset sources that pass NULL as the job pointer, such as amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if the FLR comes from the host does not work. Add a flag in reset_context to explicitly mark host triggered reset, and set this flag when we receive host reset notification. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:50 -04:00
Yunxiang Li	f4322b9f8a	drm/amdgpu: Fix two reset triggered in a row Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule after we call amdgpu_device_stop_pending_resets. Move amdgpu_device_stop_pending_resets to after the reset is done. Since at this point the GPU is supposedly in a good state, any reset scheduled after this point would be a legitimate reset. Remove unnecessary and incorrect checks for amdgpu_in_reset that was kinda serving this purpose. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:44 -04:00
Zhigang Luo	f5007c67fc	drm/amdgpu: update vf to pf message retry from 2 to 5 increase retry times to wait host has enough time to complete reset. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:38 -04:00
Zhigang Luo	3bcc0ee147	drm/amdgpu: avoid reading vf2pf info size from FB VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-02 15:40:32 -04:00
Geert Uytterhoeven	05b8b6dd22	Revert "drm: Switch DRM_DISPLAY_HELPER to depends on" This reverts commit `e075e496f5`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:23 +02:00
Geert Uytterhoeven	7fe302ae19	Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on" This reverts commit `0323287de8`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:21 +02:00
Geert Uytterhoeven	9573446953	Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on" This reverts commit `3166e7e6d9`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:21 +02:00
Geert Uytterhoeven	d7c128cb77	Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on" This reverts commit `f6d2dc03fa`, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-05-02 17:58:20 +02:00
Thomas Zimmermann	aae4682e5d	drm/fbdev-generic: Convert to fbdev-ttm Only TTM-based drivers use fbdev-generic. Rename it to fbdev-ttm and change the symbol infix from _generic_ to _ttm_. Link the source file into TTM helpers, so that it is only build if TTM-based drivers have been selected. Select DRM_TTM_HELPER for loongson. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-43-tzimmermann@suse.de	2024-05-02 11:33:32 +02:00
Shashank Sharma	705d0480e6	drm/amdgpu: fix doorbell regression This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 21:59:16 -04:00
Christian König	d3a9331a65	drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `94aeb41173` ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-30 21:39:57 -04:00
Mukul Joshi	f06446ef23	drm/amdgpu: Fix VRAM memory accounting Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 21:22:50 -04:00
Tim Huang	0fa4c25db8	drm/amdgpu: fix uninitialized scalar variable warning Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:04:15 -04:00
Likun Gao	f45ed399d7	drm/amdgpu/discovery: add sdma v7_0 ip block Add sdma v7_0 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:52 -04:00
Likun Gao	4badb9999b	drm/amdgpu: provide more ucode name shown via id Provide some lost ucode name shown via firmware ID. v2: fix whitespace (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:47 -04:00
Likun Gao	807d90b5ef	drm/amdgpu: support SDMA v3 struct fw front door load Add support for new SDMA firmware struct (V3) with PSP front door load type. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:42 -04:00
Jack Xiao	5251b56e38	drm/amdgpu/sdma7: set sdma hang watchdog Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:36 -04:00
Likun Gao	b412351e91	drm/amdgpu: Add sdma v7_0 ip block support (v7) v1: Add sdma v7_0 ip block support. (Likun) v2: Move vmhub from ring_funcs to ring. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Move microcode init into early_init. (Likun) v5: Fix warnings (Alex) v6: Squash in various fixes (Alex) v7: Rebase (Alex) v8: Rebase (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:32 -04:00
Likun Gao	9989a924aa	drm/amdgpu: Add sdma fw v3 structure Add sdma firmware struct version 3 to support sdma v7_0 firmware. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:03:12 -04:00
Dan Carpenter	6769a23697	drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq() The "instance" variable needs to be signed for the error handling to work. Fixes: `8b2faf1a4f` ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <bob.zhou@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:53 -04:00
Likun Gao	e8a31b4e81	drm/amdgpu: Add new members for sdma v7_0 fw Add new members in sdma instance structure for sdma v7_0 firmware. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:49 -04:00
Likun Gao	1b838189ed	drm/amdgpu/discovery: Add gmc v12_0 ip block Add gmc v12_0 ip block. v2: Squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:39 -04:00
Shashank Sharma	04790139c5	drm/amdgpu: fix doorbell regression This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `a6ff969fe9` ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:04 -04:00
Hawking Zhang	980a0a9452	drm/amdgpu: support gfx v12 specific pte/pde fields Add gfx v12 pte/pde support to gmc common helper. v2: squash in fixes (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:01:01 -04:00
Hawking Zhang	f3c3dd1207	drm/amdgpu: Set pte_is_pte flag in gmc v12 gart pte_is_pte is new flag introduced in gmc v12 that needs to be set by default for pte. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:51 -04:00
Hawking Zhang	075b44aa21	drm/amdgpu: Add gmc v12_0 ip block support (v7) Add initial support for GMC v12. v1: Add gmc v12_0 ip block support. v2: Switch to gfx.kiq array. v3: Switch to vmhubs_mask. v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0) v5: Rebase (Alex) v6: Squash in fixes for AGP handling, gfxhub init order, vmhub index (Alex) v7: Rebase (Alex) v8: squash in ecc fix (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:39 -04:00
Hawking Zhang	2d1d875656	drm/amdgpu: Add gfx v12 pte/pde format change Add gfx v12 pte/pde format change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:35 -04:00
Likun Gao	47677629f6	drm/amdgpu: Add gfxhub v12_0 ip block support (v3) Add initial gfxhub v12 support. v1: Add gfxhub v12_0 ip block support (Likun) v2: Switch to AMDGPU_GFXHUB(0) (Hawking) v3: Squash in keep default error response mode (Hawking) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:30 -04:00
Jack Xiao	27694eace5	drm/amdgpu/mes11: increase waiting time for engine ready mes schq engine require more waiting time for engine ready before packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:23 -04:00
Jack Xiao	f9d8c5c785	drm/amdgpu/gfx: enable mes to map legacy queue support Enable mes to map legacy queue support. v2: kiq_set_resources is required. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 10:00:01 -04:00
Philip Yang	d53ce02352	drm/amdkfd: Evict BO itself for contiguous allocation If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:51 -04:00
YiPeng Chai	3ca73073f4	drm/amdgpu: Remove redundant function call Remove redundant function call. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:14 -04:00
YiPeng Chai	2c0410fbee	rm/amdgpu: Remove unused code Remove unused code. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:08 -04:00
Tim Huang	ebbc2ada5c	drm/amdgpu: fix overflowed array index read warning Clear overflowed array index read warning by cast operation. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:59:03 -04:00
Tim Huang	22a5daaec0	drm/amdgpu: fix potential resource leak warning Clear resource leak warning that when the prepare fails, the allocated amdgpu job object will never be released. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:53 -04:00
Yang Wang	5eccab32c1	drm/amdgpu: avoid dump mca bank log muti times during ras ISR because the ue valid mca count will only be cleared after gpu reset, so only dump mca log on the first time to get mca bank after receive RAS interrupt. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:47 -04:00
Yang Wang	76ad30f51a	drm/amdgpu: add MCA smu cache support v1: because SMU CE valid mca bank will be cleared after reading, this patch adds mca cache at the driver level to ensure that the mca bank is not lost. v2: refine amdgpu_mca_init/fini/reset() function name. v3: add mca_cache.lock support only add CE bank to mca bank cache. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:41 -04:00
Yang Wang	8fb20d9551	drm/amdgpu: add amdgpu MCA bank dispatch function support - Refine mca driver code. - Centralize mca bank dispatch code logic. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:34 -04:00
Hawking Zhang	8e9f1575d1	drm/amdgpu: Add mmhub v4_1_0 ip block support (v4) Add initial support for MMHUB 4.1.0. v1: Add mmhub v4_1_0 ip block support. v2: Switch to AMDGPU_MMHUB0(0). v3: squash in fix for ip version check (Alex) v4: squash in vm_contexts_disable fix (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:25 -04:00
Philip Yang	7005b169da	drm/amdgpu: Evict BOs from same process for contiguous allocation When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:15 -04:00
Philip Yang	b2dba064c9	drm/amdgpu: Handle sg size limit for contiguous allocation Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:58:08 -04:00
Yang Wang	7e0357bef4	drm/amdgpu: remove unused MCA driver codes - remove unused callback functions. - make part of mca functions static and refine the function order. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:44 -04:00
Likun Gao	d22c075676	drm/amdgpu/discovery: Add common soc24 ip block Add common soc24 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:51:06 -04:00
Jack Xiao	ff518e13eb	drm/amdgpu/mes11: adjust mes initialization sequence Adjust mes queue initialization before kgq/kcq initialization to enable mes mapping legacy queue. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:47 -04:00
Jack Xiao	486eb6b5a8	drm/amdgpu/mes11: add mes mapping legacy queue support Add mes11 map legacy queue packet submission. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:50:40 -04:00
Christian König	ffda708148	drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2 This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `94aeb41173` ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-30 09:49:51 -04:00
Hawking Zhang	98b912c50e	drm/amdgpu: Add soc24 common ip block (v2) Add initial soc24 support. v1: Add soc24 common ip block. v2: Switch to new select_se_sh/enter_safe_mode interface. v3: squash in correct ext rev id, etc. (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:46:51 -04:00
Philip Yang	155ce502e9	drm/amdgpu: Support contiguous VRAM allocation RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:46 -04:00
Ma Jun	c0d6bd3cd2	drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr Assign value to clock to fix the warning below: "Using uninitialized value res. Field res.clock is uninitialized" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-30 09:44:34 -04:00
Dave Airlie	4a56c0ed5a	amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZiwmAQAKCRC93/aFa7yZ 2Hp0AP9SOxHrKDPErqCQbenkZXNoPFtLJWAhhKZ9zEb7YgjufwD/Zpk31mO0FpVu kUYSAQ4OFXinVVLtmVH3fE/vK+1ccwM= =oFW7 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240426221245.1613332-1-alexander.deucher@amd.com	2024-04-30 14:43:00 +10:00
Daniel Vetter	b84bc94852	Linux 6.9-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYutdweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGG5oH/3Ggwz5N+gdEK3np qxUfpgJTWo+fJ6xhRLGy84TnEZ9s9UnK0Su6UuVyOb0F/2Y8hesJ6iwB16yQFKNe Nore/VvuBZ+utshz5N20yNyPugNOP74GGbyOm+d+iJwIKnmSE8jSjWyMwNFHJCZM BfoBxZrpwU/YD/0PD1KkI44jhPX1H/EcEmtNiklLnuYvJydTWiRFeku+CSgcOiRz 6fIFrcZMREZrAytMQSwteBAvI3vWblC0S39ZgJmtZt+oi+s1ksIUNG8Mm5uMAiyF LcGep2tWV06x9uB9XVvrk0qco/kOaUgYuQrHQwNu9LzsaUdYGKqBBMk41SoNXFkB hBXN26U= =N2E2 -----END PGP SIGNATURE----- Merge v6.9-rc6 into drm-next Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas has some fun with i915-gem conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-04-29 20:22:39 +02:00
Aurabindo Pillai	96557f785a	drm/amd: GFX12 changes for converting tiling flags to modifiers GFX12 swizzle mode and GCC formats changed and is much simpler. Use a seperate function for the same. Changes: * Swizzle mode is now 3 bits only * DCC enablement doesn't come from tiling_flags, it is always set in modifiers * DCC max compressed block size of 128B Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:58 -04:00
Jesse Zhang	ea686fef54	drm/amdgpu: fix the warning about the expression (int)size - len Converting size from size_t to int may overflow. v2: keep reverse xmas tree order (Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Jack Xiao	029c2b0389	drm/amdgpu/mes: add mes mapping legacy queue support Add mes mapping legacy queue framework support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Tim Huang	9a5f15d2a2	drm/amdgpu: fix uninitialized scalar variable warning Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Sunil Khatri	2a8f7464d3	drm/amdgpu: skip ip dump if devcoredump flag is set Do not dump the ip registers during driver reload in passthrough environment. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Arunpravin Paneer Selvam	e362b7c8f8	drm/amdgpu: Modify the contiguous flags behaviour Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we setTTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is(Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop(Christian) - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block (Christian) - Keep the kernel BO allocation as is(Christain) - If BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages(Philip) v3(Christian): - keep contiguous flag handling outside of pages_per_block calculation - remove the hacky implementation in contiguous flag error handling code v4(Christian): - use any variable and return value for non-contiguous fallback v5: rebase to amd-staging-drm-next branch Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Lancelot SIX	bd31e5026d	drm/amdkfd: Enable SQ watchpoint for gfx10 There are new control registers introduced in gfx10 used to configure hardware watchpoints triggered by SMEM instructions: SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}. Those registers work in a similar way as the TCP_WATCH* registers currently used for gfx9 and above. This patch adds support to program the SQ_WATCH registers for gfx10. The SQ_WATCH?_CNTL.MASK field has one bit more than TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity than TCP_WATCH watchpoints. In this patch, we keep the capabilities advertised to the debugger unchanged (HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and SQ watchpoints can do and both watchpoints are configured together. Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:44 -04:00
Srinivasan Shanmugam	acce6479e3	drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode() The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating about potential truncation of output when using the snprintf function. The issue was due to the size of the buffer 'ucode_prefix' being too small to accommodate the maximum possible length of the string being written into it. The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin", where %s is replaced by the value of 'chip_name'. The length of this string without the %s is 16 characters. The warning message indicated that 'chip_name' could be up to 29 characters long, resulting in a total of 45 characters, which exceeds the buffer size of 30 characters. To resolve this issue, the size of the 'ucode_prefix' buffer has been reduced from 30 to 15. This ensures that the maximum possible length of the string being written into the buffer will not exceed its size, thus preventing potential buffer overflow and truncation issues. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~ ...... 439 \| r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 379 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~ ...... 443 \| r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 413 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `8630112969` ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	6f3b69139c	drm/amdgpu: Fix ras mode2 reset failure in ras aca mode Fix ras mode2 reset failure in ras aca mode. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	506c245f3f	drm/amdgpu: fix double free err_addr pointer warnings In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages will be run many times so that double free err_addr in some special case. So set the err_addr to NULL to avoid the warnings. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	7bfd16d0ec	drm/amdgpu: initialize the last_jump_jiffies in atom_exec_context The parameter "last_jump_jiffies" should be initialized before being used in the function atom_op_jump. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Jesse Zhang	2d10c3dbde	drm/amdgpu: add check before free wb entry Check if ring is not a mes queue before freeing the wb entry, because we only allocate a wb entry when it's not a mes queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	cd48b97ce7	drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Bob Zhou	8b2faf1a4f	drm/amdgpu: add error handle to avoid out-of-bounds if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should be stop to avoid out-of-bounds read, so directly return -EINVAL. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
Ma Jun	2e55bcf3d7	drm/amdgpu: Initialize timestamp for some legacy SOCs Initialize the interrupt timestamp for some legacy SOCs to fix the coverity issue "Uninitialized scalar variable" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	48fa90718b	drm/amdgpu: Use new interface to reserve bad page Use new interface to reserve bad page. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:43 -04:00
YiPeng Chai	bcc0934885	drm/amdgpu: Fix address translation defect retired_page is page frame and should be expanded to the full address when querying status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	e023874081	drm/amdgpu: support ACA logging ecc errors support ACA logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	370fbff4cc	drm/amdgpu: add poison consumption handler Add poison consumption handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	bfa579b38b	drm/amdgpu: prepare to handle pasid poison consumption Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	314c38cde6	drm/amdgpu: retire bad pages for umc v12_0 Retire bad pages for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:42 -04:00
YiPeng Chai	e74313be5a	drm/amdgpu: add condition check for amdgpu_umc_fill_error_record Add condition check for amdgpu_umc_fill_error_record. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	2cf8e50ec3	drm/amdgpu: Add delay work to retire bad pages Add delay work to retire bad pages. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	f27defca68	drm/amdgpu: umc v12_0 logs ecc errors 1. umc v12_0 logs ecc errors. 2. Reserve newly detected ecc error pages. 3. Add tag for bad pages, so that they can be retired later. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	b2aa6b108d	drm/amdgpu: umc v12_0 converts error address Umc v12_0 converts error address. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	95b4063de4	drm/amdgpu: add interface to update umc v12_0 ecc status Add interface to update umc v12_0 ecc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	a734adfbcd	drm/amdgpu: add poison creation handler Add poison creation handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	f493dd64ee	drm/amdgpu: prepare for logging ecc errors Prepare for logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
YiPeng Chai	98b5bc878d	drm/amdgpu: add message fifo to handle RAS poison events Add message fifo to handle RAS poison events. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Jesse Zhang	88a9a467c5	drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff.(Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Alex Deucher	eef016ba89	drm/amdgpu/mes11: Use a separate fence per transaction We can't use a shared fence location because each transaction should be considered independently. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:41 -04:00
Alex Deucher	497d7cee24	drm/amdgpu: add a spinlock to wb allocation As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sonny Jiang	754c366e41	drm/amdgpu: update fw_share for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Mukul Joshi	8e1d190595	drm/amdgpu: Fix VRAM memory accounting Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sathishkumar S	c551316e15	drm/amdgpu: update jpeg max decode resolution jpeg ip version v2.1 and higher supports 16kx16k resolution decode Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sunil Khatri	af8644121e	drm/amdgpu: add ip dump for each ip in devcoredump Add ip dump for each ip of the asic in the devcoredump for all the ips where a callback is registered for register dump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:40 -04:00
Sunil Khatri	e043a35dc2	drm/amdgpu: dump ip state before reset for each ip Invoke the dump_ip_state function for each ip before the asic resets and save the register values for debugging via devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	c8732c80de	drm/amdgpu: add support for gfx v10 print Add support to print ip information to be used to print registers in devcoredump buffer. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	40356542c3	drm/amdgpu: add protype for print ip state Add the protoype for print ip state to be used to print the registers in devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	c395dbb68b	drm/amdgpu: add support of gfx10 register dump Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Sunil Khatri	e21d253bd7	drm/amdgpu: add prototype for ip dump Add the prototype to dump ip registers for all ips of different asics and set them to NULL for now. Based on the requirement add a function pointer for each of them. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
YiPeng Chai	af730e0820	drm/amdgpu: Add interface to reserve bad page Add interface to reserve bad page. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Ma Jun	60c448439f	drm/amdgpu: Fix uninitialized variable warnings return 0 to avoid returning an uninitialized variable r Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Jack Xiao	f88da7fbf6	drm/amdgpu/mes: fix use-after-free issue Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Alex Deucher	e0a9bbeea0	drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3 This avoids a potential conflict with firmwares with the newer HDP flush mechanism. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:39 -04:00
Rajneesh Bhardwaj	a16b951586	drm/amdgpu: Update CGCG settings for GFXIP 9.4.3 Tune coarse grain clock gating idle threshold and rlc idle timeout to achieve better kernel launch latency. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Frank Min	ea9238a81b	drm/amdgpu: replace tmz flag into buffer flag Replace tmz flag into buffer flag to make it easier to understand and extend Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Le Ma	92ed1e9cd5	drm/amdgpu: init microcode chip name from ip versions To adapt to different gc versions in gfx_v9_4_3.c file. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Prike Liang	26de73bc0a	drm/amdgpu: Fix the ring buffer size for queue VM flush Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Lang Yu	a522ec528c	drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend umsch test needs full GPU functionality(e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may be not ready under these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:38 -04:00
Damien Le Moal	6927b01680	drm/amdgpu: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/20240325070944.3600338-11-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-25 12:53:31 -05:00
Jack Xiao	9482552820	drm/amdgpu/mes: fix use-after-free issue Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 23:23:46 -04:00
Alex Deucher	9792b7cc18	drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3 This avoids a potential conflict with firmwares with the newer HDP flush mechanism. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-23 23:23:46 -04:00
Prike Liang	fe93b0927b	drm/amdgpu: Fix the ring buffer size for queue VM flush Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 23:23:46 -04:00
Lang Yu	661d71ee5a	drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend umsch test needs full GPU functionality(e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may be not ready under these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-23 23:23:28 -04:00
Felix Kuehling	b0b13d5321	drm/amdgpu: Update BO eviction priorities Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 23:23:28 -04:00
Peyton Lee	d59198d2d0	drm/amdgpu/vpe: fix vpe dpm setup failed The vpe dpm settings should be done before firmware is loaded. Otherwise, the frequency cannot be successfully raised. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 23:23:02 -04:00
Lijo Lazar	aebd3eb9d3	drm/amdgpu: Assign correct bits for SDMA HDP flush HDP Flush request bit can be kept unique per AID, and doesn't need to be unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-23 23:22:54 -04:00
Lang Yu	9c783a1121	drm/amdkfd: make sure VM is ready for updating operations When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: `50661eb1a2` ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-23 23:22:23 -04:00
Mukul Joshi	25e9227c6a	drm/amdgpu: Fix leak when GPU memory allocation fails Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-23 23:17:30 -04:00
Felix Kuehling	e76691f45a	drm/amdgpu: Update BO eviction priorities Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:31 -04:00
Pierre-Eric Pelloux-Prayer	6e042cee74	drm/amdgpu/vcn: fix unitialized variable warnings Avoid returning an uninitialized value if we never enter the loop. This case should never be hit in practice, but returning 0 doesn't hurt. The same fix is applied to the 4 places using the same pattern. v2: - fixed typos in commit message (Alex) - use "return 0;" before the done label instead of initializing r to 0 Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Alex Deucher	3f0664110a	drm/amdgpu/mes11: print MES opcodes rather than numbers Makes it easier to review the logs when there are MES errors. v2: use dbg for emitted, add helpers for fetching strings v3: fix missing commas (Harish) v4: drop command prefixes (Felix) v5: squash in bounds fix (Jun) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed by Shaoyun.liu <Shaoyun.liu@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Peyton Lee	2476c6bd95	drm/amdgpu/vpe: fix vpe dpm setup failed The vpe dpm settings should be done before firmware is loaded. Otherwise, the frequency cannot be successfully raised. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Lijo Lazar	7b19f1f346	drm/amdgpu: Assign correct bits for SDMA HDP flush HDP Flush request bit can be kept unique per AID, and doesn't need to be unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Stanley.Yang	939c475181	drm/amdgpu: Support setting reset_method at runtime In order to support more test cases, support user change reset_method at runtime. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-23 12:08:30 -04:00
Maxime Ripard	c058e7a8f8	Merge drm/drm-next into drm-misc-next Maíra needs a backmerge to apply v3d patches, and Danilo for some nouveau patches. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-04-23 08:48:56 +02:00
Arunpravin Paneer Selvam	a68c7eaa7a	drm/amdgpu: Enable clear page functionality Add clear page support in vram memory region. v1(Christian): - Dont handle clear page as TTM flag since when moving the BO back in from GTT again we don't need that. - Make a specialized version of amdgpu_fill_buffer() which only clears the VRAM areas which are not already cleared - Drop the TTM_PL_FLAG_WIPE_ON_RELEASE check in amdgpu_object.c v2: - Modify the function name amdgpu_ttm_* (Alex) - Drop the delayed parameter (Christian) - handle amdgpu_res_cleared(&cursor) just above the size calculation (Christian) - Use AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE for clearing the buffers in the free path to properly wait for fences etc.. (Christian) v3(Christian): - Remove buffer clear code in VRAM manager instead change the AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE handling to set the DRM_BUDDY_CLEARED flag. - Remove ! from amdgpu_res_cleared(&cursor) check. v4(Christian): - vres flag setting move to vram manager file - use dma_fence_get_stub in amdgpu_ttm_clear_buffer function - make fence a mandatory parameter and drop the if and the get/put dance Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419063538.11957-2-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-04-22 19:44:16 +02:00
Arunpravin Paneer Selvam	96950929eb	drm/buddy: Implement tracking clear page feature - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) v5: - remove force merge support to actual range allocation and not to bail out when contains && split(Matthew) - add range support to force merge function. v6: - modify the alloc_range() function clear page non merged blocks allocation(Matthew) - correct the list_insert function name(Matthew). Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419063538.11957-1-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-04-22 19:44:16 +02:00
Dave Airlie	377b5b397d	amd-drm-next-6.10-2024-04-19: amdgpu: - DC resource allocation logic updates - DC IPS fixes - DC YUV fixes - DMCUB fixes - DML2 fixes - Devcoredump updates - USB-C DSC fix - Misc display code cleanups - PSR fixes - MES timeout fix - RAS updates - UAF fix in VA IOCTL - Fix visible VRAM handling during faults - Fix IP discovery handling during PCI rescans - Misc code cleanups - PSP 14 updates - More runtime PM code rework - SMU 14.0.2 support - GPUVM page fault redirection to secondary IH rings for IH 6.x - Suspend/resume fixes - SR-IOV fixes amdkfd: - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix radeon: - Silence UBSAN warnings related to flexible arrays -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZiLyawAKCRC93/aFa7yZ 2BzfAPoDVZjTunizh6SyCFmQamR3eelnxWeY1xaVzmKBHqLCOAEAo2EyThRGyPCH SjD+f+ZlflaXQZtZpiQrOr0rkLvh5Q4= =96hT -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.10-2024-04-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-19: amdgpu: - DC resource allocation logic updates - DC IPS fixes - DC YUV fixes - DMCUB fixes - DML2 fixes - Devcoredump updates - USB-C DSC fix - Misc display code cleanups - PSR fixes - MES timeout fix - RAS updates - UAF fix in VA IOCTL - Fix visible VRAM handling during faults - Fix IP discovery handling during PCI rescans - Misc code cleanups - PSP 14 updates - More runtime PM code rework - SMU 14.0.2 support - GPUVM page fault redirection to secondary IH rings for IH 6.x - Suspend/resume fixes - SR-IOV fixes amdkfd: - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix radeon: - Silence UBSAN warnings related to flexible arrays Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419224332.2938259-1-alexander.deucher@amd.com	2024-04-22 12:28:49 +10:00
Lang Yu	81bf14519a	drm/amdkfd: make sure VM is ready for updating operations When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: `50661eb1a2` ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:54:49 -04:00
Mukul Joshi	e53a1713de	drm/amdgpu: Fix leak when GPU memory allocation fails Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:54:49 -04:00
Zhigang Luo	6a009ca1bf	drm/amdgpu: remove virt_init_data_exchange from poison consumption handler Host will initiate an FLR for all poison consumption. Guest should wait for FLR message to re-init data exchange. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:47:26 -04:00
Lijo Lazar	8954c3fbe7	drm/amdgpu: Change AID detection logic On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be considered as a valid AID. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:47:19 -04:00
Sunil Khatri	93522c1948	drm/amdgpu: enable redirection of irq's for IH V6.1 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:56 -04:00
Ahmad Rehman	ea137071ad	drm/amdgpu: Skip the coredump collection on reset during driver reload In passthrough environment, the driver triggers the mode-1 reset on reload. The reset causes the core dump collection which is delayed task and prevents driver from unloading until it is completed. Since we do not need to collect data on "reset on reload" case, we can skip core dump collection. v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well. Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:45 -04:00
Sunil Khatri	ca0afa2f41	drm/amdgpu: enable redirection of irq's for IH V6.0 Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring i.e ring1 where only these interrupts are processed. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:46:37 -04:00
Sunil Khatri	5adcd78fa2	drm:amdgpu: enable IH ring1 for IH v6.1 We need IH ring1 for handling the pagefault interrupts which over flow in default ring for specific usecases. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:45:37 -04:00
Sunil Khatri	eefc85a277	drm:amdgpu: enable IH RB ring1 for IH v6.0 We need IH ring1 for handling the pagefault interrupts which are overflowing the default ring for specific usecases. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-18 23:45:22 -04:00
Christian König	a6ff969fe9	drm/amdgpu: fix visible VRAM handling during faults When we removed the hacky start code check we actually didn't took into account that all VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-17 11:50:43 -04:00
xinhui pan	6fef2d4c00	drm/amdgpu: validate the parameters of bo mapping operations more clearly Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: `dc54d3d174` ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: stable@vger.kernel.org Reported-by: Vlad Stolyarov <hexed@google.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-17 11:50:43 -04:00
Christian König	ca7c4507ba	drm/amdgpu: remove invalid resource->start check v2 The majority of those where removed in the commit `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: stable@vger.kernel.org Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-17 11:02:57 -04:00
Dave Airlie	34633158b8	amd-drm-next-6.10-2024-04-13: amdgpu: - HDCP fixes - ODM fixes - RAS fixes - Devcoredump improvements - Misc code cleanups - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - SR-IOV fixes - Suspend and Resume fixes - DCN 3.5.x Updates - Z8 fixes - UMSCH fixes - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - VCN partitioning fix - DC DWB fixes - VSC SDP fixes - DCN 3.1.6 fix - GC 11.5 fixes - Remove invalid TTM resource start check - DCN 1.0 fixes amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace radeon: - Misc code cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZhr4EAAKCRC93/aFa7yZ 2B8jAP9z1JpOnjSQvc2mhHAooXRYO4Mj5HCQ25ZE8N4c8ZZhjAEAqefmEx5/UyLh lv2pWILL4o597qhq9nA7hJ6tTICLPAU= =HUwY -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.10-2024-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-13: amdgpu: - HDCP fixes - ODM fixes - RAS fixes - Devcoredump improvements - Misc code cleanups - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - SR-IOV fixes - Suspend and Resume fixes - DCN 3.5.x Updates - Z8 fixes - UMSCH fixes - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - VCN partitioning fix - DC DWB fixes - VSC SDP fixes - DCN 3.1.6 fix - GC 11.5 fixes - Remove invalid TTM resource start check - DCN 1.0 fixes amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace radeon: - Misc code cleanups From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240413213708.3427038-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-04-17 15:48:59 +10:00
Kenneth Feng	0c1195ca0d	drm/amd/swsmu: support smu block discovery for smu v14 Support for smu ip block add for SMU v14. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Hawking Zhang	577cbed318	drm/amdgpu: rename DBG_DRV to HAD_DRV for psp v14 Add a psp bl command enum for HAD_DRV. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Ma Jun	1347853271	drm/amdgpu: refactoring the runtime pm mode detection code refactor the code of runtime pm mode detection to support amdgpu_runtime_pm =2 and 1 two cases Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Hawking Zhang	6c6acc5f33	drm/amdgpu: Load ipkeymgr drv for psp v14 while DBG_DRV is renamed to HAD_DRV for psp v14, part of its APIs/functionality is moved to a new component named Ipkeymgr_Drv. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:16 -04:00
Thorsten Blum	12b8b4e685	drm/amdgpu: Add missing space to DRM_WARN() message s/,please/, please/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Ma Jun	959056982a	drm/amdgpu: Fix discovery initialization failure during pci rescan Waiting for system ready to fix the discovery initialization failure issue. This failure usually occurs when dGPU is removed and then rescanned via command line. It's caused by following two errors: [1] vram size is 0 [2] wrong binary signature Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Christian König	394ae0603a	drm/amdgpu: fix visible VRAM handling during faults When we removed the hacky start code check we actually didn't took into account that all VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2024-04-16 22:39:15 -04:00
xinhui pan	98856136c4	drm/amdgpu: validate the parameters of bo mapping operations more clearly Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: `dc54d3d174` ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: stable@vger.kernel.org Reported-by: Vlad Stolyarov <hexed@google.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Yang Wang	f23558627f	drm/amdgpu: add new aca smu callback func parse_error_code() add new aca smu callback parse_error_code{} to avoid specific asic check in amdgpu_aca.c file Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:39:15 -04:00
Jonathan Kim	f7c161a4c2	drm/amdgpu: increase mes submission timeout MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 22:38:59 -04:00
Sunil Khatri	3c858cf65e	drm/amdgpu: add missing vbios version from devcoredump Add vbios version in the devcoredump along with formatting the information with proper alignment. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 21:25:28 -04:00
Alex Deucher	8b9130bae0	drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset Need to take the srbm_mutex and while we are here, use the helper function soc21_grbm_select(); Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-16 21:25:23 -04:00
Christian König	c8962679af	drm/amdgpu: remove invalid resource->start check v2 The majority of those where removed in the commit `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `aed01a6804` ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: stable@vger.kernel.org Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:51 -04:00
Jack Xiao	a0e002cdac	drm/amdgpu/sdma6: set sdma hang watchdog Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:43 -04:00
Luqmaan Irshad	6b0d78032f	drm/amd/amdgpu: Update PF2VF Header Adding a new field for GPU Capacity to align the header with the host. Signed-off-by: Luqmaan Irshad <Luqmaan.Irshad@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-12 00:33:11 -04:00
Yifan Zhang	6dba20d23e	drm/amdgpu: differentiate external rev id for gfx 11.5.0 This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-10 00:00:32 -04:00
Tim Huang	bbca7f414a	drm/amdgpu: fix incorrect number of active RBs for gfx11 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 23:30:30 -04:00
ZhenGuo Yin	e33997e18d	drm/amdgpu: clear set_q_mode_offs when VM changed [Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: `8bc75586ea` ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 23:27:58 -04:00
Lijo Lazar	f7e232de51	drm/amdgpu: Fix VCN allocation in CPX partition VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:27:23 -04:00
Kenneth Feng	3818708e9c	drm/amd/pm: fix the high voltage issue after unload fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:26:32 -04:00
Tao Zhou	f886b49fea	drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2 SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:16:51 -04:00
Yifan Zhang	533eefb9be	drm/amdgpu: add smu 14.0.1 discovery support This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:15:05 -04:00
shaoyunl	5b0cd091d9	drm/amdgpu : Increase the mes log buffer size as per new MES FW version From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:14:40 -04:00
shaoyunl	a3a4c0b123	drm/amdgpu : Add mes_log_enable to control mes log feature The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:14:18 -04:00
Lijo Lazar	8b2be55f4d	drm/amdgpu: Reset dGPU if suspend got aborted For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 23:12:44 -04:00
Lang Yu	0f1bbcc2ba	drm/amdgpu/umsch: reinitialize write pointer in hw init Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 23:12:20 -04:00
Lijo Lazar	4b18a91faf	drm/amdgpu: Refine IB schedule error logging Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 23:11:59 -04:00
Alex Deucher	65ff8092e4	drm/amdgpu: always force full reset for SOC21 There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 23:10:06 -04:00
Yifan Zhang	526b184e88	drm/amdgpu: differentiate external rev id for gfx 11.5.0 This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:20:01 -04:00
Tim Huang	d6d6561f93	drm/amdgpu: fix incorrect number of active RBs for gfx11 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:14:55 -04:00
Zhigang Luo	d1999b4017	amd/amdgpu: improve VF recover time 1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:14:30 -04:00
Lijo Lazar	b41f742d6f	drm/amdgpu: Set fatal errror detected flag earlier In case of fatal errors, set FED status when interrupt is received. Set the flag on other devices in the hive before RAS recovery work. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:13:36 -04:00
ZhenGuo Yin	05e4014168	drm/amdgpu: clear set_q_mode_offs when VM changed [Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: `8bc75586ea` ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:09:21 -04:00
Tao Zhou	4b0cb230bd	drm/amdgpu: retire UMC v12 mca_addr_to_pa RAS TA will handle it, the function is useless. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:09:15 -04:00
chongli2	f6ac084236	drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov support MES command SET_HW_RESOURCE1 in sriov Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:53 -04:00
Tao Zhou	9ecef5b2d0	drm/amdgpu: update check condition for XGMI ACA UE Check more possible ext error codes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:47 -04:00
Lijo Lazar	91bc860116	drm/amdgpu: Fix VCN allocation in CPX partition VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:34 -04:00
Ma Jun	fcc0735b00	drm/amdgpu: Add support for BAMACO mode checking Optimize the code to add support for BAMACO mode checking Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:18 -04:00
Hawking Zhang	327eec5427	drm/amdgpu: Bypass asd if display hw is not available ASD is not needed by headless GPU. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:08:05 -04:00
Ma Jun	b2207dc698	drm/amdgpu/pm: Add support for MACO flag checking Add support for MACO flag checking. MACO mode only works if BACO is supported. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:07:59 -04:00
Kenneth Feng	7c1d9e10e6	drm/amd/pm: fix the high voltage issue after unload fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:07:30 -04:00
Danijel Slivka	3e2dacca54	drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Moved the check to amdgpu_device_init() to ensure it is done after amdgpu_device_ip_early_init() where the IP versions are discovered. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:02:37 -04:00
Arunpravin Paneer Selvam	b7a1a0ef12	drm/amd/amdgpu: add pipe1 hardware support Enable pipe1 support starting from SIENNA CICHLID asic Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 22:02:24 -04:00
ZhenGuo Yin	0453e5f220	drm/amdgpu: select HDP ref/mask according to gfx ring pipe Use correct ref/mask for differnent gfx ring pipe. This should fix the gfx hang issue after enabling gfx pipe1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:59:44 -04:00
Tao Zhou	029faefb73	drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2 SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:59:20 -04:00
Yifan Zhang	f5a3507c4a	drm/amdgpu: add smu 14.0.1 discovery support This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:51:03 -04:00
shaoyunl	8966c31674	drm/amdgpu : Increase the mes log buffer size as per new MES FW version From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:48 -04:00
shaoyunl	e58acb7613	drm/amdgpu : Add mes_log_enable to control mes log feature The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:27 -04:00
Yang Wang	81d96e8b5a	drm/amdgpu: refine function signature of amdgpu_aca_get_error_data() refine function signature of amdgpu_aca_get_error_data(); Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:50:09 -04:00
Lijo Lazar	df3c7dc5c5	drm/amdgpu: Reset dGPU if suspend got aborted For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-04-09 21:49:42 -04:00
Sunil Khatri	6a0e1bafd7	drm/amdgpu: add IP's FW information to devcoredump Add FW information of all the IP's in the devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:34 -04:00
Lang Yu	108ab31be9	drm/amdgpu/umsch: reinitialize write pointer in hw init Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:19 -04:00
Lijo Lazar	cd409dbc69	drm/amdgpu: Refine IB schedule error logging Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-09 21:49:09 -04:00
Dave Airlie	fee54d08bc	Merge tag 'drm-misc-next-2024-03-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Two misc-next in one. drm-misc-next for v6.10-rc1: The deal of a lifetime! You get ALL of the previous drm-misc-next-2024-03-21-1 tag!! But WAIT, there's MORE! Cross-subsystem Changes: - Assorted DT binding updates. Core Changes: - Clarify how optional wait_hpd_asserted is. - Shuffle Kconfig names around. Driver Changes: - Assorted build fixes for panthor, imagination, - Add AUO B120XAN01.0 panels. - Assorted small fixes to panthor, panfrost. drm-misc-next for v6.10: UAPI Changes: - Move some nouveau magic constants to uapi. Cross-subsystem Changes: - Move drm-misc to gitlab and freedesktop hosting. - Add entries for panfrost. Core Changes: - Improve placement for TTM bo's in idle/busy handling. - Improve drm/bridge init ordering. - Add CONFIG_DRM_WERROR, and use W=1 for drm. - Assorted documentation updates. - Make more (drm and driver) headers self-contained and add header guards. - Grab reservation lock in pin/unpin callbacks. - Fix reservation lock handling for vmap. - Add edp and edid panel matching, use it to fix a nearly identical panel. Driver Changes: - Add drm/panthor driver and assorted fixes. - Assorted small fixes to xlnx, panel-edp, tidss, ci, nouveau, panel and bridge drivers. - Add Samsung s6e3fa7, BOE NT116WHM-N44, CMN N116BCA-EA1, CrystalClear CMT430B19N00, Startek KD050HDFIA020-C020A, powertip PH128800T006-ZHC01 panels. - Fix console for omapdrm. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/bea310a6-6ff6-477e-9363-f9f053cfd12a@linux.intel.com	2024-04-05 13:16:17 +10:00
Thomas Zimmermann	0d21364c6e	Merge drm/drm-next into drm-misc-next Backmerging to get v6.9-rc2 changes into drm-misc-next. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-04-02 09:51:30 +02:00
Maxime Ripard	f6d2dc03fa	drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HDMI_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-12-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:52 +01:00
Maxime Ripard	3166e7e6d9	drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HDCP_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-11-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:51 +01:00
Maxime Ripard	0323287de8	drm: Switch DRM_DISPLAY_DP_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_DP_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-10-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:51 +01:00
Maxime Ripard	e075e496f5	drm: Switch DRM_DISPLAY_HELPER to depends on Most of our helpers have relied on being selected so far through Kconfig, but that creates issues when we have multiple layers of helpers with some depending on others. Indeed, select doesn't select a dependency's dependencies, and thus isn't super intuitive. Depends on however doesn't have that limitation, so we can just switch all the drivers that were selecting DRM_DISPLAY_HELPER to depend on it. Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-8-eafee11b84b3@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-03-28 11:26:49 +01:00
Johannes Weiner	8678b1060a	drm/amdgpu: fix deadlock while reading mqd from debugfs An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. [ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] * DEADLOCK * [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: `445d85e3c1` ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:30:34 -04:00
Lang Yu	68a2afbcca	drm/amdgpu: enable UMSCH 4.0.6 Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:30:05 -04:00
Lang Yu	6b154c00cd	drm/amdgpu/umsch: update UMSCH 4.0 FW interface Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 09:29:42 -04:00
Mario Limonciello	ca299b4512	drm/amd: Flush GFXOFF requests in prepare stage If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 08:55:54 -04:00
Peyton Lee	eed14eb48e	drm/amdgpu/vpe: power on vpe when hw_init To fix mode2 reset failure. Should power on VPE when hw_init. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 08:50:20 -04:00
Alex Deucher	d7f1487643	drm/amdgpu: always force full reset for SOC21 There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:45:22 -04:00
Johannes Weiner	c25d09bcb7	drm/amdgpu: fix deadlock while reading mqd from debugfs An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. [ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] * DEADLOCK * [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: `445d85e3c1` ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:24 -04:00
Lang Yu	b9a8aee136	drm/amdgpu: enable UMSCH 4.0.6 Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:17 -04:00
Lang Yu	f3e698978c	drm/amdgpu/umsch: update UMSCH 4.0 FW interface Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-27 01:44:07 -04:00
Mario Limonciello	0355b24bde	drm/amd: Flush GFXOFF requests in prepare stage If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: <stable@vger.kernel.org> # 6.1.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.1.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.1.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.1.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.6.y: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Cc: <stable@vger.kernel.org> # 6.6.y: `cb11ca3233` ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks") Cc: <stable@vger.kernel.org> # 6.6.y: `2ceec37b0e` ("drm/amd: Add missing kernel doc for prepare_suspend()") Cc: <stable@vger.kernel.org> # 6.6.y: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Cc: <stable@vger.kernel.org> # 6.1+ Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132 Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:57:04 -04:00
Srinivasan Shanmugam	eb4f6eca26	drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode() Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode function. This would ensure that the total number of characters being written into fw_name does not exceed its size of 40. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c: In function ‘gfx_v11_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 523 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 523 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 540 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40 540 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:70: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 557 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:25: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 557 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 569 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 569 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CC [M] drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_clockpowergating.o Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:57 -04:00
Tao Zhou	8e4617c25d	drm/amdgpu: simplify convert_error_address interface for UMC v12 Replace separate parameters with struct ta_ras_query_address_input. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:18 -04:00
Srinivasan Shanmugam	539ff12ee5	drm/amdgpu: Fix truncation issues in gfx_v9_0.c The size of fw_name is increased to ensure that it can accommodate the maximum possible size of the string being written into it. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1255 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name); \| ^~ ...... 1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1255 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1261 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name); \| ^~ ...... 1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30 1261 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1267 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name); \| ^~ ...... 1393 \| r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30 1267 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1303 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name); \| ^~ ...... 1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30 1303 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1309 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name); \| ^~ ...... 1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:17: note: ‘snprintf’ output between 23 and 52 bytes into a destination of size 30 1309 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1311 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~ ...... 1398 \| r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1311 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1344 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30 1344 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1346 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 1346 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1356 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:25: note: ‘snprintf’ output between 21 and 50 bytes into a destination of size 30 1356 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 1358 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name); \| ^~ ...... 1402 \| r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix); \| ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:25: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 30 1358 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:56:10 -04:00
Srinivasan Shanmugam	927a8a800e	drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53 characters. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function ‘gfx_v10_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 3982 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40 3982 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=] 3988 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40 3988 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=] 3994 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40 3994 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 4001 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 4001 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 4017 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks); \| ^~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40 4017 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:54: warning: ‘_mec2’ directive output may be truncated writing 5 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 4024 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks); \| ^~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:9: note: ‘snprintf’ output between 17 and 50 bytes into a destination of size 40 4024 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:54 -04:00
Srinivasan Shanmugam	20fd14460f	drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode The snprintf function is used to write a formatted string into fw_name. The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced by the string in ucode_prefix and the second %s is replaced by either "_2" or "1" depending on the condition pipe == AMDGPU_MES_SCHED_PIPE. The length of the string "amdgpu/%s_mes%s.bin" is 16 characters plus the length of ucode_prefix and the length of the string "_2" or "1". The size of ucode_prefix is 30, so the maximum length of ucode_prefix is 29 characters (since one character is needed for the null terminator). Therefore, the maximum possible length of the string written into fw_name is 16 + 29 + 2 = 47 characters. The size of fw_name is 40, so if the length of the string written into fw_name is more than 39 characters (since one character is needed for the null terminator), it will be truncated by the snprintf function, and thus warnings will be seen. By increasing the size of fw_name to 50, we ensure that fw_name is large enough to hold the maximum possible length of the string, so the snprintf function will not truncate the output. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function ‘amdgpu_mes_init_microcode’: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:66: warning: ‘%s’ directive output may be truncated writing up to 1 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1482 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:17: note: ‘snprintf’ output between 16 and 46 bytes into a destination of size 40 1482 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1483 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1484 \| pipe == AMDGPU_MES_SCHED_PIPE ? "" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 1 byte into a region of size between 0 and 29 [-Wformat-truncation=] 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ 1478 \| ucode_prefix, 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 40 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 2 bytes into a region of size between 0 and 29 [-Wformat-truncation=] 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~ 1478 \| ucode_prefix, 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 18 and 47 bytes into a destination of size 40 1477 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1478 \| ucode_prefix, \| ~~~~~~~~~~~~~ 1479 \| pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:62: warning: ‘_mes.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=] 1489 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", \| ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40 1489 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1490 \| ucode_prefix); \| ~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:44 -04:00
Srinivasan Shanmugam	7c2bc34ab9	drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init() Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init function. This would ensure that the total number of characters being written into fw_name does not exceed its size of 40. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c: In function ‘amdgpu_vcn_early_init’: drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=] 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=] 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40 102 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:73: warning: ‘.bin’ directive output may be truncated writing 4 bytes into a region of size between 2 and 31 [-Wformat-truncation=] 105 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i); \| ^~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:25: note: ‘snprintf’ output between 14 and 43 bytes into a destination of size 40 105 \| snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:55:01 -04:00
Tao Zhou	8b3495eafb	drm/amdgpu: add socket id parameter for psp query address cmd And set the socket id. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:54:54 -04:00
Shashank Sharma	f88a7dd06a	drm/amdgpu: Add a NULL check for freeing root PT This patch adds a NULL check to fix this crash reported during the freeing of root PT entry: BUG: unable to handle page fault for address: ffffc9002d637aa0 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu] PKRU: 55555554 Call Trace: <TASK> amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu] amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu] ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0 [amdgpu] amdgpu_driver_postclose_kms+0x191/0x2d0 [amdgpu] drm_file_free.part.0+0x1e5/0x260 [drm] Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:51:55 -04:00
Sunil Khatri	9022f01b97	drm/amdgpu: refactor code to split devcoredump code Refractor devcoredump code into new files since its functionality is expanded further and better to slit and devcoredump to have its own file. v2: Fix the build failure caught by arm compiler of implicit function declaration with #ifdef v3: squash in fix for implicit declaration error Cc: Ivan Lipski <ivan.lipski@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:51:48 -04:00
Peyton Lee	9ddafd1d14	drm/amdgpu/vpe: power on vpe when hw_init To fix mode2 reset failure. Should power on VPE when hw_init. Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:50:20 -04:00
Yang Wang	31fd330b97	drm/amdgpu: add ras event id support for ACA add ras event id support for ACA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:18 -04:00
Yang Wang	bd15bf742f	drm/amdgpu: avoid update aca bank multi times during ras isr Because the UE Valid MCA count will only be cleared after reset, in order to avoid repeated counting of the error count, the aca bank is only updated once during ras isr. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:11 -04:00
Yang Wang	f7bcfb7a56	drm/amdgpu: retrieve umc odecc error count for aca umc v12.0 retrieve umc odecc error count for aca umc v12.0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:48:03 -04:00
Shashank Sharma	b6c4f90b38	drm/amdgpu: sync page table freeing with tlb flush The idea behind this patch is to delay the freeing of PT entry objects until the TLB flush is done. This patch: - Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the objects that need to be freed after tlb_flush. - Adds PT entries in this list in amdgpu_vm_ptes_update after finding the PT entry. - Changes functionality of amdgpu_vm_pt_free_dfs from (df_search + free) to simply freeing of the BOs, also renames it to amdgpu_vm_pt_free_list to reflect this same. - Exports function amdgpu_vm_pt_free_list to be called directly. - Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range. V2: rebase V4: Addressed review comments from Christian - add only locked PTEs entries in TLB flush waitlist. - do not create a separate function for list flush. - do not create a new lock for TLB flush. - there is no need to wait on tlb_flush_fence exclusively. V5: Addressed review comments from Christian - change the amdgpu_vm_pt_free_dfs's functionality to simple freeing of the objects and rename it. - add all the PTE objects in params->tlb_flush_waitlist - let amdgpu_vm_pt_free_root handle the freeing of BOs independently - call amdgpu_vm_pt_free_list directly V6: Rebase V7: Rebase V8: Added a NULL check to fix this backtrace issue: [ 415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 415.359245] #PF: supervisor write access in kernel mode [ 415.365081] #PF: error_code(0x0002) - not-present page [ 415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0 [ 415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G OE 5.18.2-mi300-build-140423-ubuntu-22.04+ #24 [ 415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024 [ 415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu] [ 415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48 > 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63 [ 415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287 [ 415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000 [ 415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000 [ 415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00 [ 415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20 [ 415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000 [ 415.476312] FS: 00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000 [ 415.485350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0 [ 415.499738] PKRU: 55555554 [ 415.502750] Call Trace: [ 415.505482] <TASK> [ 415.507825] amdgpu_vm_update_range+0x32a/0x880 [amdgpu] [ 415.513869] amdgpu_vm_clear_freed+0x117/0x250 [amdgpu] [ 415.519814] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu] [ 415.527729] kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu] [ 415.534551] kfd_ioctl+0x3b6/0x510 [amdgpu] V9: Addressed review comments from Christian - No NULL check reqd for root PT freeing - Free PT list regardless of needs_flush - Move adding BOs in list in a separate function V10: Added Christian's RB V11: squash in list fix Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-22 15:47:17 -04:00
Linus Torvalds	7ee0490121	drm fixes for 6.9-rc1 core: - fix rounding in drm_fixp2int_round() bridge: - fix documentation for DRM_BRIDGE_OP_EDID sun4i: - fix 64-bit division on 32-bit architectures tests: - fix dependency on DRM_KMS_HELPER probe-helper: - never return negative values from .get_modes() plus driver fixes xe: - invalidate userptr vma on page pin fault - fail early on sysfs file creation error - skip VMA pinning on xe_exec if no batches nouveau: - clear bo resource bus after eviction - documentation fixes - don't check devinit disable on GSP amdgpu: - Freesync fixes - UAF IOCTL fixes - Fix mmhub client ID mapping - IH 7.0 fix - DML2 fixes - VCN 4.0.6 fix - GART bind fix - GPU reset fix - SR-IOV fix - OD table handling fixes - Fix TA handling on boards without display hardware - DML1 fix - ABM fix - eDP panel fix - DPPCLK fix - HDCP fix - Revert incorrect error case handling in ioremap - VPE fix - HDMI fixes - SDMA 4.4.2 fix - Other misc fixes amdkfd: - Fix duplicate BO handling in process restore -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmX833kACgkQDHTzWXnE hr4ZDA/8CC/jW8drOInD52pqPxtFrhLPZ/pD+Vz4BWkcLnzRiM2d5gM3z3/JE7xi BGcxmhPgwT/M9oZENRxtSf2uUV/RZ0OMj+Mpwnew1YpANwWBe8pKeiwbHA4A4qzP Z46VLK0+BXEjh0btC4RY/Ji6yEuNCNAh0FBWTaLYoakN8M1JAJCUYFrXeWp8gYVm 4yZETXO64iGNYy4wz9tD5fohC3xo1t9WRcskBV97uNrntDQlagoEdAnh1VF2K3yC SAwF3O8J60xh7osNx5YE4ENXynvh7UaAc75kliSsWoZKoTb1TFlyu8abE7NQRRXc 9fjzwB6tQ6BJRFpmQGF6RHAhuoddqm9nPaOYOfQ/wXbVV1ajYvXmK+eGhHfR6/VO YYzhXksd9LGX34RWAcs9lJbV+EjG4buYnSThkvCYcPs2Ys+JppwlPYSd9sC8vqgP 6D7slAoa8rh99WWz4mZ7ZuOveiOUS3Yzie5Vms2Dlwl/kHW5E1WpeEw+fXAqq08M m83whU2cod/oUvJtcuYFLoNAlhYYngVOI9XGgdM+eL/dpdKjTpyH9JHHEGj/Nejr W7Kts9CLLBShNKR8Wo2fyTu1n9dwY/eFVA1P48Mt03345G/fMNtPxy+M1Rt6LHQ2 fmeBSU1P6mqoFeji4xCRXdJ4oDveNnHlyW9J9QJGXG44mN89PCc= =EW4i -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu, it's xe and nouveau then a few scattered core fixes. core: - fix rounding in drm_fixp2int_round() bridge: - fix documentation for DRM_BRIDGE_OP_EDID sun4i: - fix 64-bit division on 32-bit architectures tests: - fix dependency on DRM_KMS_HELPER probe-helper: - never return negative values from .get_modes() plus driver fixes xe: - invalidate userptr vma on page pin fault - fail early on sysfs file creation error - skip VMA pinning on xe_exec if no batches nouveau: - clear bo resource bus after eviction - documentation fixes - don't check devinit disable on GSP amdgpu: - Freesync fixes - UAF IOCTL fixes - Fix mmhub client ID mapping - IH 7.0 fix - DML2 fixes - VCN 4.0.6 fix - GART bind fix - GPU reset fix - SR-IOV fix - OD table handling fixes - Fix TA handling on boards without display hardware - DML1 fix - ABM fix - eDP panel fix - DPPCLK fix - HDCP fix - Revert incorrect error case handling in ioremap - VPE fix - HDMI fixes - SDMA 4.4.2 fix - Other misc fixes amdkfd: - Fix duplicate BO handling in process restore" * tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits) drm/amdgpu/pm: Don't use OD table on Arcturus drm/amdgpu: drop setting buffer funcs in sdma442 drm/amd/display: Fix noise issue on HDMI AV mute drm/amd/display: Revert Remove pixle rate limit for subvp Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode" Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()" drm/amd/display: Add a dc_state NULL check in dc_state_release drm/amd/display: Return the correct HDCP error code drm/amd/display: Implement wait_for_odm_update_pending_complete drm/amd/display: Lock all enabled otg pipes even with no planes drm/amd/display: Amend coasting vtotal for replay low hz drm/amd/display: Fix idle check for shared firmware state drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane drm/amd/display: Init DPPCLK from SMU on dcn32 drm/amd/display: Add monitor patch for specific eDP drm/amd/display: Allow dirty rects to be sent to dmub when abm is active drm/amd/display: Override min required DCFCLK in dml1_validate drm/amdgpu: Bypass display ta if display hw is not available drm/amdgpu: correct the KGQ fallback message drm/amdgpu/pm: Check the validity of overdiver power limit ...	2024-03-21 19:04:31 -07:00
Hawking Zhang	a61e2ce9d4	drm/amdgpu: Enable smuio v14_0_2 callbacks Enable smuio v14_0_2_callbacks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:16 -04:00
Hawking Zhang	d80e44a34e	drm/amdgpu: Add smuio callback to get gpu clk counter Add smuio callback to get gpu clk counter Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:16 -04:00
Hawking Zhang	2d93151de8	drm/amdgpu: Add smuio v14_0_2 ip block support Add smuio v14_0_2 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:16 -04:00
Yang Wang	b93d759f54	drm/amdgpu: add umc v12.0.0 deferred error support add umc v12.0.0 deferred error support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Yang Wang	865d339763	drm/amdgpu: add aca deferred error type support add aca deferred error type support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Tao Zhou	2fc46e0b2f	drm/amdgpu: make reset method configurable for RAS poison Each RAS block has different requirement for gpu reset in poison consumption handling. Add support for mmhub RAS poison consumption handling. v2: remove the mmhub poison support for kfd int v10. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Yang Wang	e3d4de8d8b	drm/amdgpu: retire unused aca_bank_report data structure retire unused aca_bank_report data structure. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Candice Li	f26c4e3fc9	drm/amdgpu: Update setting EEPROM table version Use helper function instead of umc callback to set EEPROM table version. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Yang Wang	69bf42fbb2	drm/amdgpu: refine aca error cache for umc v12.0 refine aca error cache for umc v12.0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Yang Wang	87428b4054	drm/amdgpu: refine aca error cache for sdma v4.4.2 refine aca error cache for sdma v4.4.2 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Yang Wang	62d2aaa7d4	drm/amdgpu: refine aca error cache for xgmi v6.4.0 refine aca error cache for xgmi v6.4.0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:15 -04:00
Tao Zhou	d8070c4241	drm/amdgpu: support utcl2 RAS poison query for mmhub Support the query for both gfxhub and mmhub, also replace xcc_id with hub_inst. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Tao Zhou	176c3e8956	drm/amdgpu: add utcl2 RAS poison query for mmhub Add it for mmhub v1.8. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Yang Wang	5275114a70	drm/amdgpu: refine aca error cache for mmhub v1.8 refine aca error cache for mmhub v1.8 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Christian Koenig	d8a3f0a034	drm/amdgpu: implement TLB flush fence The problem is that when (for example) 4k pages are replaced with a single 2M page we need to wait for change to be flushed out by invalidating the TLB before the PT can be freed. Solve this by moving the TLB flush into a DMA-fence object which can be used to delay the freeing of the PT BOs until it is signaled. V2: (Shashank) - rebase - set dma_fence_error only in case of error - add tlb_flush fence only when PT/PD BO is locked (Felix) - use vm->pasid when f is NULL (Mukul) V4: - add a wait for (f->dependency) in tlb_fence_work (Christian) - move the misplaced fence_create call to the end (Philip) V5: - free the f->dependency properly V6: (Shashank) - light code movement, moved all the clean-up in previous patch - introduce params.needs_flush and its usage in this patch - rebase without TLB HW sequence patch V7: - Keep the vm->last_update_fence and tlb_cb code until we can fix the HW sequencing (Christian) - Move all the tlb_fence related code in a separate function so that its easier to read and review V9: Addressed review comments from Christian - start PT update only when we have callback memory allocated V10: - handle device unlock in OOM case (Christian, Mukul) - added Christian's R-B Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Yang Wang	e6136150cd	drm/amdgpu: refine aca error cache for gfx v9.4.3 refine aca error cache for gfx 9.4.3 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Yang Wang	949899cbac	drm/amdgpu: add new api to save error count into aca cache add new api to save error count into aca cache. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Yang Wang	abc3b5d21d	drm/amdgpu: add new aca_smu_type support Add new types to distinguish between ACA error type and smu mca type. e.g.: the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank channel, so add new type 'aca_smu_type' to distinguish aca error type and smu mca type. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:14 -04:00
Sunil Khatri	6fe4dab331	drm/amdgpu: remove the adev check for NULL adev is a global data structure and isn't expected to be NULL and hence removing the redundant adev check from the devcoredump code. Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:13 -04:00
Likun Gao	3cfaadbe0f	drm/amdgpu: add support for atom fw version v3_5 Support for atom_firmware_info_v3_5. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:13 -04:00
Hawking Zhang	765bea0d73	drm/amdgpu: Apply retry to IP discovery v2 and v4 To ensure GPU driver touch the local framebuffer until it is initialized by integrated firmware. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:13 -04:00
Yang Wang	9dc57c2adf	drm/amdgpu: add ras event id support add amdgpu ras event id support to better distinguish different error information sources in dmesg logs. the following log will be identify by event id: {event_id} interrupt to inform RAS event {event_id} ACA logs {event_id} errors statistic since from current injection/error query {event_id} errors statistic since from gpu load Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:13 -04:00
Zhigang Luo	ab66c83284	drm/amdgpu: trigger flr_work if reading pf2vf data failed if reading pf2vf data failed 30 times continuously, it means something is wrong. Need to trigger flr_work to recover the issue. also use dev_err to print the error message to get which device has issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL timeout. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:13 -04:00
Sunil Khatri	d72e2bdac4	drm/amdgpu: add the hw_ip version of all IP's Add all the IP's version information on a SOC to the devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:38:11 -04:00
Victor Skvortsov	6bb89d1340	drm/amdgpu: Skip virt_exchange_init on SDMA poison consumption Host will initiate an FLR in SDMA poison consumption scenario. Guest should wait for FLR message to re-init data exchange. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:38 -04:00
Lijo Lazar	dfe9c3cde2	drm/amdgpu: Do a basic health check before reset Check if the device is present in the bus before trying to recover. It could be that device itself is lost from the bus in some hang situations. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:38 -04:00
Shashank Sharma	97d5aa6030	drm/amdgpu: cleanup unused variable This patch removes an unused input variable in the MES doorbell function. Cc: Christian König <Christian.Koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:37 -04:00
Tao Zhou	0c501d3c11	drm/amdgpu: skip GFX FED error in page fault handling Let kfd interrupt handler process it. v2: return 0 instead of 1 for fed error. drop the usage of strcmp in interrupt handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Tao Zhou	71a8d61ebc	drm/amdgpu: retire gfx ras query_utcl2_poison_status Replace it with related interface in gfxhub functions. v2: replace node id with xcc id. get node id for query_utcl2_poison_status Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Sunil Khatri	3eb899c40a	drm/amdgpu: add ring buffer information in devcoredump Add relevant ringbuffer information such as rptr, wptr,rb mask, ring name, ring size and also the rings content for each ring on a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Sunil Khatri	583681d4a4	drm/amdgpu: add vm fault information to devcoredump Add page fault information to the devcoredump. Output of devcoredump: ** AMDGPU Device Coredump ** version: 1 kernel: 6.7.0-amd-staging-drm-next module: amdgpu time: 29.725011811 process_name: soft_recovery_p PID: 1720 Ring timed out details IP Type: 0 Ring Name: gfx_0.0.0 [gfxhub] Page fault observed Faulty page starting at address: 0x0000000000000000 Protection fault status register: 0x301031 VRAM is lost due to GPU reset! Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Tao Zhou	fb0f5f5414	drm/amdgpu: add utcl2 poison query for gfxhub Implement it for gfxhub 1.0 and 1.2. v2: input logical xcc id for poison query interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Sunil Khatri	dc406d92a0	drm/amdgpu: add recent pagefault info in vm_manager Currently page fault information is stored per vm and which could be freed or stale during reset. Add it pagefault information in the vm_manager which is a global space for vm's and remains valid across. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:37:36 -04:00
Le Ma	ad550dbe8a	drm/amdgpu: drop setting buffer funcs in sdma442 To fix the entity rq NULL issue. This setting has been moved to upper level. Fixes: `b70438004a` ("drm/amdgpu: move buffer funcs setting up a level") Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:36:29 -04:00
Lang Yu	1b7eec6bf3	Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode" Ready now. Remove this workaround. This reverts commit `d40f6213b5`. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Tested-by: Alan Liu <haoping.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:59 -04:00
Ma Jun	03c6284df1	Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()" This patch causes the following iounmap erorr and calltrace iounmap: bad address 00000000d0b3631f The original patch was unjustified because amdgpu_device_fini_sw() will always cleanup the rmmio mapping. This reverts commit `eb4f139888`. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:59 -04:00
Hawking Zhang	9b3fec307f	drm/amdgpu: Bypass display ta if display hw is not available Do not load/invoke display TA if display hardware is not available. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:58 -04:00
Prike Liang	43bda3e782	drm/amdgpu: correct the KGQ fallback message Fix the KGQ fallback function name, as this will help differentiate the failure in the KCQ enablement. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:58 -04:00
ZhenGuo Yin	56b30ac84c	drm/amdgpu: Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV [Why] RLCG interface returns "out-of-range" error under SRIOV VF when accessing PF-only registers. [How] Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Ahmad Rehman	f679fd6057	drm/amdgpu: Init zone device and drm client after mode-1 reset on reload In passthrough environment, when amdgpu is reloaded after unload, mode-1 is triggered after initializing the necessary IPs, That init does not include KFD, and KFD init waits until the reset is completed. KFD init is called in the reset handler, but in this case, the zone device and drm client is not initialized, causing app to create kernel panic. v2: Removing the init KFD condition from amdgpu_amdkfd_drm_client_create. As the previous version has the potential of creating DRM client twice. v3: v2 patch results in SDMA engine hung as DRM open causes VM clear to SDMA before SDMA init. Adding the condition to in drm client creation, on top of v1, to guard against drm client creation call multiple times. Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Philip Yang	6c6064cbe5	drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag Otherwise after the GTT bo is released, the GTT and gart space is freed but amdgpu_ttm_backend_unbind will not clear the gart page table entry and leave valid mapping entry pointing to the stale system page. Then if GPU access the gart address mistakely, it will read undefined value instead page fault, harder to debug and reproduce the real issue. Cc: stable@vger.kernel.org Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Saleemkhan Jamadar	6a7cbbc267	drm/amdgpu/vcn: enable vcn1 fw load for VCN 4_0_6 v1 - update the fw header for each vcn instance (Veera) VCN1 has different FW binary in VCN v4_0_6. Add changes to load the VCN1 fw binary Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Friedrich Vock	c6ba60af01	drm/amdgpu: Reset IH OVERFLOW_EN bit for IH 7.0 IH 7.0 support landed shortly after the original patch for resetting the bit on all other generations, but without that patch applied. Fixes: `12443fc53e` ("drm/amdgpu: Add ih v7_0 ip block support") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Lang Yu	6540ff6482	drm/amdgpu: fix mmhub client id out-of-bounds access Properly handle cid 0x140. Fixes: `aba2be4147` ("drm/amdgpu: add mmhub 3.3.0 support") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:57 -04:00
Vitaly Prosyak	22207fd5c8	drm/amdgpu: fix use-after-free bug The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver on any ASICs with an invalid address and size. The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>. For example the following code: static void Syzkaller1(int fd) { struct drm_amdgpu_gem_userptr arg; int ret; arg.addr = 0xffffffffffff0000; arg.size = 0x80000000; /2 Gb/ arg.flags = 0x7; ret = drmIoctl(fd, 0xc1186451/amdgpu_gem_userptr_ioctl/, &arg); } Due to the address and size are not valid there is a failure in amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert-> check_shl_overflow, but we even the amdgpu_hmm_register failure we still call amdgpu_hmm_unregister into amdgpu_gem_object_free which causes access to a bad address. The following stack is below when the issue is reproduced when Kazan is enabled: [ +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340 [ +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80 [ +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246 [ +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b [ +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260 [ +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25 [ +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00 [ +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260 [ +0.000011] FS: 00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000 [ +0.000012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0 [ +0.000010] Call Trace: [ +0.000006] <TASK> [ +0.000007] ? show_regs+0x6a/0x80 [ +0.000018] ? __warn+0xa5/0x1b0 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000018] ? report_bug+0x24a/0x290 [ +0.000022] ? handle_bug+0x46/0x90 [ +0.000015] ? exc_invalid_op+0x19/0x50 [ +0.000016] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? kasan_save_stack+0x26/0x50 [ +0.000017] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000020] ? __pfx_mmu_interval_notifier_remove+0x10/0x10 [ +0.000017] ? kasan_save_alloc_info+0x1e/0x30 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_kmalloc+0xb1/0xc0 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_read+0x11/0x20 [ +0.000020] amdgpu_hmm_unregister+0x34/0x50 [amdgpu] [ +0.004695] amdgpu_gem_object_free+0x66/0xa0 [amdgpu] [ +0.004534] ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu] [ +0.004291] ? do_syscall_64+0x5f/0xe0 [ +0.000023] ? srso_return_thunk+0x5/0x5f [ +0.000017] drm_gem_object_free+0x3b/0x50 [drm] [ +0.000489] amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu] [ +0.004295] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004270] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __this_cpu_preempt_check+0x13/0x20 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.000022] ? drm_ioctl_kernel+0x17b/0x1f0 [drm] [ +0.000496] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004272] ? drm_ioctl_kernel+0x190/0x1f0 [drm] [ +0.000492] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.000497] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004297] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.000489] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000016] drm_ioctl+0x3da/0x730 [drm] [ +0.000475] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004293] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.000506] ? __pfx_rpm_resume+0x10/0x10 [ +0.000016] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000010] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.000015] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? preempt_count_sub+0x18/0xc0 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000010] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.000019] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.004272] __x64_sys_ioctl+0xcd/0x110 [ +0.000020] do_syscall_64+0x5f/0xe0 [ +0.000021] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.000015] RIP: 0033:0x7ff9ed31a94f [ +0.000012] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.000013] RSP: 002b:00007fff25f66790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000016] RAX: ffffffffffffffda RBX: 000055b3f7e133e0 RCX: 00007ff9ed31a94f [ +0.000012] RDX: 000055b3f7e133e0 RSI: 00000000c1186451 RDI: 0000000000000003 [ +0.000010] RBP: 00000000c1186451 R08: 0000000000000000 R09: 0000000000000000 [ +0.000009] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fff25f66ca8 [ +0.000009] R13: 0000000000000003 R14: 000055b3f7021ba8 R15: 00007ff9ed7af040 [ +0.000024] </TASK> [ +0.000007] ---[ end trace 0000000000000000 ]--- v2: Consolidate any error handling into amdgpu_hmm_register which applied to kfd_bo also. (Christian) v3: Improve syntax and comment (Christian) Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr> Cc: Dokyung Song <dokyungs@yonsei.ac.kr> Cc: <jisoo.jang@yonsei.ac.kr> Cc: <yw9865@yonsei.ac.kr> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:56 -04:00
Mukul Joshi	71b9d19220	drm/amdgpu: Handle duplicate BOs during process restore In certain situations, some apps can import a BO multiple times (through IPC for example). To restore such processes successfully, we need to tell drm to ignore duplicate BOs. While at it, also add additional logging to prevent silent failures when process restore fails. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-20 13:12:56 -04:00
Linus Torvalds	e5eb28f6d1	- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh tKVmvihFxfAhdDthseXcIf1nBjMALwY= =8rVc -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...	2024-03-14 18:03:09 -07:00
Dave Airlie	119b225f01	amd-drm-next-6.9-2024-03-08-1: amdgpu: - DCN 3.5.1 support - Fixes for IOMMUv2 removal - UAF fix - Misc small fixes and cleanups - SR-IOV fixes - MCBP cleanup - devcoredump update - NBIF 6.3.1 support - VPE 6.1.1 support amdkfd: - Misc fixes and cleanups - GFX10.1 trap fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZetE3AAKCRC93/aFa7yZ 2K4iAQC9mqfalMDJQegU7lUHUFzklfyWMMwrxt6Ull9HjaIcBAEAvKzRdXGWl64j R7qt7aF8/U/90RuplIyukCBbrh7FEQI= =bDXu -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-08-1: amdgpu: - DCN 3.5.1 support - Fixes for IOMMUv2 removal - UAF fix - Misc small fixes and cleanups - SR-IOV fixes - MCBP cleanup - devcoredump update - NBIF 6.3.1 support - VPE 6.1.1 support amdkfd: - Misc fixes and cleanups - GFX10.1 trap fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240308170741.3691166-1-alexander.deucher@amd.com	2024-03-11 13:32:12 +10:00
Thomas Zimmermann	d6eb77731c	Merge drm/drm-next into drm-misc-next Backmerging to get the latest fixes from drm-next; specifically the build fix from the patchset at [1]. Also fixes the build by removing an unused variable from rzg2l_du_vsp_atomic_flush(). Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/series/130720/ # 1	2024-03-08 10:53:46 +01:00
Dave Airlie	af165fb00a	amd-drm-next-6.9-2024-03-01: amdgpu: - GC 11.5.1 updates - Misc display cleanups - NBIO 7.9 updates - Backlight fixes - DMUB fixes - MPO fixes - atomfirmware table updates - SR-IOV fixes - VCN 4.x updates - use RMW accessors for pci config registers - PSR fixes - Suspend/resume fixes - RAS fixes - ABM fixes - Misc code cleanups - SI DPM fix - Revert freesync video amdkfd: - Misc cleanups - Error handling fixes radeon: - use RMW accessors for pci config registers -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZeI9fgAKCRC93/aFa7yZ 2DmCAQCVUsteinfHeJiyd7+pWnZdJVKzy5fRayvfOV8InHapawD/RAMlJhxy08IO B8tw7HY+YqRqW12T3klwRiBTk3Iw9Ac= =vBW4 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.9-2024-03-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-01: amdgpu: - GC 11.5.1 updates - Misc display cleanups - NBIO 7.9 updates - Backlight fixes - DMUB fixes - MPO fixes - atomfirmware table updates - SR-IOV fixes - VCN 4.x updates - use RMW accessors for pci config registers - PSR fixes - Suspend/resume fixes - RAS fixes - ABM fixes - Misc code cleanups - SI DPM fix - Revert freesync video amdkfd: - Misc cleanups - Error handling fixes radeon: - use RMW accessors for pci config registers From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-03-08 11:21:13 +10:00
lima1002	7c5fde53b1	drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.1 Set the default reset method to mode2 for SMU IP v14.0.1 Signed-off-by: lima1002 <li.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:33:25 -05:00
Alex Deucher	155d46835c	drm/amdgpu: add VPE 6.1.1 discovery support Enable VPE 6.1.1. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:33:16 -05:00
Lang Yu	f9070b0f2f	drm/amdgpu/vpe: add VPE 6.1.1 support Add initial support for VPE 6.1.1. v2: squash in updates (Alex) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:33:10 -05:00
Lang Yu	d40f6213b5	drm/amdgpu/vpe: don't emit cond exec command under collaborate mode Not ready now. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:33:04 -05:00
Lang Yu	26f5f34e6e	drm/amdgpu/vpe: add collaborate mode support for VPE Under clollaborate mode, multiple VPE instances share a ring buferr and work together to finish a job. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:33:01 -05:00
Lang Yu	72f4ae0a64	drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODE To support multi VPE collaborate mode. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:32:58 -05:00
Lang Yu	709ef39f95	drm/amdgpu/vpe: add multi instance VPE support Add support for multi instance VPE processing. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:32:48 -05:00
Likun Gao	79698b145f	drm/amdgpu/discovery: add nbif v6_3_1 ip block Add nbif v6_3_1 ip block. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:32:45 -05:00
Hawking Zhang	894c6d3522	drm/amdgpu: Add nbif v6_3_1 ip block support Add nbif v6_3_1 ip block support. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-07 15:32:42 -05:00
Sunil Khatri	5e592956cc	drm/amdgpu: add ring timeout information in devcoredump Add ring timeout related information in the amdgpu devcoredump file for debugging purposes. During the gpu recovery process the registered call is triggered and add the debug information in data file created by devcoredump framework under the directory /sys/class/devcoredump/devcdx/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-06 15:24:50 -05:00
Yifan Zhang	2bdebcb1e4	drm/amdgpu: add dcn3.5.1 support This patch to add dcn3.5.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-06 15:24:50 -05:00
Pierre-Eric Pelloux-Prayer	bf909454fe	drm/amdgpu: disable ring_muxer if mcbp is off Using the ring_muxer without preemption adds overhead for no reason since mcbp cannot be triggered. Moving back to a single queue in this case also helps when high priority app are used: in this case the gpu_scheduler priority handling will work as expected - much better than ring_muxer with its 2 independant schedulers competing for the same hardware queue. This change requires moving amdgpu_device_set_mcbp above amdgpu_device_ip_early_init because we use adev->gfx.mcbp. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-06 15:24:49 -05:00
Jesse Zhang	bb8863cc9d	drm/amdgpu: remove unused code Remove the unused function - amdgpu_vm_pt_is_root_clean and remove the impossible condition v1: entries == 0 is not possible any more, so this condition could probably be removed (Felix) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by：Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-06 15:24:24 -05:00
Christian König	8bc75586ea	drm/amdgpu: workaround to avoid SET_Q_MODE packets v2 It turned out that executing the SET_Q_MODE packet on every submission creates to much overhead. Implement a workaround which allows skipping the SET_Q_MODE packet if subsequent submissions all use the same parameters. v2: add a NULL check for ring_obj Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:59:08 -05:00
Christian König	c68cbbfd54	drm/amdgpu: cleanup conditional execution First of all calculating the number of dw to patch into a conditional execution is not something HW generation specific. This is just standard ring buffer calculations. While at it also reduce the BUG_ON() into WARN_ON(). Then instead of a random bit pattern use 0 as default value for the number of dw skipped, this way it's not mandatory any more to patch the conditional execution. And last make the address to check a parameter of the conditional execution instead of getting this from the ring. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:59:08 -05:00
Ma Jun	86e14a7386	drm/amdgpu: Use rpm_mode flag instead of checking it again for rpm Because the rpm_mode flag is already set when the driver is initialized, we use it directly for runtime suspend/resume instead of checking it again Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:59:08 -05:00
Shashank Sharma	b8f67b9ddf	drm/amdgpu: change vm->task_info handling This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:59:08 -05:00
Jesse Zhang	feb13f52c8	Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven fix the issue: "amdgpu: Failed to create process VM object". [Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table. But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean. So they have a conflict, and clinfo always failed. v1: - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:58:45 -05:00
Christian König	216c1282dd	drm/amdgpu: use GTT only as fallback for VRAM\|GTT Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240229134003.3688-2-christian.koenig@amd.com	2024-03-01 17:12:26 +01:00
Bjorn Helgaas	b07395d5d5	drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment After `4020c22802` ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"), "ret" is unconditionally set later before being used, so there's point in initializing it and the associated comment is no longer meaningful. Remove the comment and the unnecessary initialization. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-29 20:35:39 -05:00
Alex Deucher	959143dab1	Revert "drm/amd: Remove freesync video mode amdgpu parameter" This reverts commit `e94e787e37`. This conflicts with how compositors want to handle VRR. Now that compositors actually handle VRR, we probably don't need freesync video. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2985 Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-29 20:35:31 -05:00
Tao Zhou	2c684b9342	drm/amdgpu: add deferred error check for UMC v12 address query Both RAS UE and deferred errors need page retirement. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-29 20:35:14 -05:00
Eric Huang	1761d9a688	amd/amdkfd: remove unused parameter The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart(). Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-28 17:10:53 -05:00
Srinivasan Shanmugam	eb4f139888	drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init() This ensures that the memory mapped by ioremap for adev->rmmio, is properly handled in amdgpu_device_init(). If the function exits early due to an error, the memory is unmapped. If the function completes successfully, the memory remains mapped. Reported by smatch: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4337 amdgpu_device_init() warn: 'adev->rmmio' from ioremap() not released on lines: 4035,4045,4051,4058,4068,4337 Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-27 11:06:58 -05:00
Srinivasan Shanmugam	7cf1ad2fe1	drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int() Missing break statement in the ATOM_ARG_IMM case of a switch statement, adds the missing break statement, ensuring that the program's control flow is as intended. Fixes the below: drivers/gpu/drm/amd/amdgpu/atom.c:323 atom_get_src_int() warn: ignoring unreachable code. Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Cc: Jammy Zhou <Jammy.Zhou@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-27 11:06:29 -05:00
Mario Limonciello	0887054d14	drm/amd: Drop abm_level property This vendor specific property has never been used by userspace software and conflicts with the panel_power_savings sysfs file. That is a compositor and user could fight over the same data. Fixes: `63d0b87213` ("drm/amd/display: add panel_power_savings sysfs entry to eDP connectors") Suggested-by: Harry Wentland <Harry.Wentland@amd.com> Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Cc: "Sun peng Li (Leo)" <Sunpeng.Li@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-27 10:46:59 -05:00
Tim Huang	93d64097f7	drm/amdgpu: reserve more memory for MES runtime DRAM This patch fixes a MES firmware boot failure issue when backdoor loading the MES firmware. MES firmware runtime DRAM size is changed to 512k, the driver needs to reserve this amount of memory in FB, otherwise adjacent memory will be overwritten by the MES firmware startup code. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:15:32 -05:00
Prike Liang	63fcd306c0	drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series Currently, GPU resets can now be performed successfully on the Raven series. While GPU reset is required for the S3 suspend abort case. So now can enable gpu reset for S3 abort cases on the Raven series. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:15:25 -05:00
Victor Lu	56f7d2ac6d	drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOV VF should not program this register. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:14:46 -05:00
Stanley.Yang	7ec11c2f65	drm/amdgpu: Fix ineffective ras_mask settings Check amdgpu_ras_mask to fix ineffective ras_mask setting due to special asic without sram ecc enable but with poison supported. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:14:37 -05:00
Lijo Lazar	e1f6746f33	drm/amdkfd: Skip packet submission on fatal error If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detected. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:14:31 -05:00
Lijo Lazar	1b6ef74b2b	drm/amdgpu: Add fatal error detected flag For a RAS error that needs a full reset to recover, set the fatal error status. Clear the status once the device is reset. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:14:24 -05:00
Ma Jun	f435b5156b	drm/amdgpu: Fix the runtime resume failure issue Don't set power state flag when system enter runtime suspend, or it may cause runtime resume failure issue. Fixes: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-26 11:11:46 -05:00
Daniel Vetter	f112b68f27	Linux 6.8-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmXb0T4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5YQH/3eCV90sNGch0Y94 8rtTdqFrVx7QPNl0pz+Mo6OUIKUUHvTuwime16ckLxG+3x2Y3I0MjP1edd1NB99C Kje//JTpaZBPpTZ/jY4u8B1Shov2Drdx/J4NFnE/9rG6yXzKQBtvON/xAxXDCVHT mLhst2LR0FeCSMk9jAX6CoqUPEgwlylNyAetKxaDQgoHl4GTZC7FDO17WxyjpIxe 1rVHsrV9Eq8kD4uxrzpTYWgZrwTObPmlZjvefa1JfzSwRNABIBJj/C1nra1Zc1oi b7xVaXS1cMOxrtuuG00fmHsPnWivu0tuND7H3/yLd1mRCZAPSsVbVvrI/KNtoeV4 1euINlY= =7IFt -----END PGP SIGNATURE----- Merge v6.8-rc6 into drm-next Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches, there's a few same-area-changed conflicts (xe and amdgpu mostly) that are getting a bit too annoying. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-02-26 11:41:07 +01:00
Daniel Vetter	71ab34f72f	drm-misc-next for v6.9: UAPI Changes: - changes to fdinfo stats Cross-subsystem Changes: agp: - remove unused type field from struct agp_bridge_data Core Changes: ci: - update test names - cleanups gem: - add stats for shared buffers plus updates to amdgpu, i915, xe Documentation: - fixes syncobj: - fixes to waiting and sleeping Driver Changes: bridge: - adv7511: fix crash on irq during probe - dw_hdmi: set bridge type host1x: - cleanups ivpu: - updates to firmware API - refactor BO allocation meson: - fix error handling in probe panel: - revert "drm/panel-edp: Add auo_b116xa3_mode" - add Himax HX83112A plus DT bindings - ltk500hd1829: add support for ltk101b4029w and admatec 9904370 - simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs renesas: - add RZ/G2L DU support plus DT bindings -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmXXUhsACgkQaA3BHVML eiNVFQf+IoOXCACGkWEVmVaen50pjEfLq0OjSGHdbTJqhc9wU7Q/kPC+jEpZLyqo OUMdXlA55BeLX52O+bvLordDPNETUsYH1QX2BYKDwcNIrvj8ISXcvdbnDcbVmttD ZUaaBgZ0g2M6sZQvTVU88/1RtaG64+zuk9VA1dPlh6WnBtXBUeXNtD6YQjH6xY+a MjZpB5VafwJTmQxy7qJ4yTLX291Ao8J2YZK8cCSyEr3FQKkAx9sJyp3hPurVIjLM f1y1rtoHhxUV/OVg4M559fp6F6tUkFauv4qu5VUvmPPihJTaU0eSQxir0za4VJ4e Jr2GOkju0oRRpKfjd0aKvaoWhl+MNg== =aaTQ -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-02-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.9: UAPI Changes: - changes to fdinfo stats Cross-subsystem Changes: agp: - remove unused type field from struct agp_bridge_data Core Changes: ci: - update test names - cleanups gem: - add stats for shared buffers plus updates to amdgpu, i915, xe Documentation: - fixes syncobj: - fixes to waiting and sleeping Driver Changes: bridge: - adv7511: fix crash on irq during probe - dw_hdmi: set bridge type host1x: - cleanups ivpu: - updates to firmware API - refactor BO allocation meson: - fix error handling in probe panel: - revert "drm/panel-edp: Add auo_b116xa3_mode" - add Himax HX83112A plus DT bindings - ltk500hd1829: add support for ltk101b4029w and admatec 9904370 - simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs renesas: - add RZ/G2L DU support plus DT bindings Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240222135841.GA6677@localhost.localdomain	2024-02-26 09:51:49 +01:00
Nathan Chancellor	2947a4567f	treewide: update LLVM Bugzilla links LLVM moved their issue tracker from their own Bugzilla instance to GitHub issues. While all of the links are still valid, they may not necessarily show the most up to date information around the issues, as all updates will occur on GitHub, not Bugzilla. Another complication is that the Bugzilla issue number is not always the same as the GitHub issue number. Thankfully, LLVM maintains this mapping through two shortlinks: https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num> https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num> Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the "https://llvm.org/pr<num>" shortlink so that the links show the most up to date information. Each migrated issue links back to the Bugzilla entry, so there should be no loss of fidelity of information here. Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Fangrui Song <maskray@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-02-22 15:38:51 -08:00
Ma Jun	bbfaf2aea7	drm/amdgpu: Fix the runtime resume failure issue Don't set power state flag when system enter runtime suspend, or it may cause runtime resume failure issue. Fixes: `3a9626c816` ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-02-22 12:28:27 -05:00
Yifan Zhang	a24029cc40	drm/amdgpu: add vcn 4.0.6 discovery support This patch is to add vcn 4.0.6 support Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 12:05:23 -05:00
Ilpo Järvinen	bb87e511b2	drm/amdgpu: Use RMW accessors for changing LNKCTL2 Convert open coded RMW accesses for LNKCTL2 to use pcie_capability_clear_and_set_word() which makes its easier to understand what the code tries to do. LNKCTL2 is not really owned by any driver because it is a collection of control bits that PCI core might need to touch. RMW accessors already have support for proper locking for a selected set of registers (LNKCTL2 is not yet among them but likely will be in the future) to avoid losing concurrent updates. Acked-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 12:05:20 -05:00
Kunwu Chan	a5fc4e5014	drm/amdgpu: Simplify the allocation of sync slab caches Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 12:05:16 -05:00
Veerabadhran Gopalakrishnan	84eaa2c2c6	drm/amdgpu/soc21: Enabling PG and CG flags for VCN 4.0.6 Enabled the VCN Power Gating and Clock Gating flags for VCN 4.0.6. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 12:05:12 -05:00
Kunwu Chan	e4e4618bc1	drm/amdgpu: Simplify the allocation of mux_chunk slab caches Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:28:24 -05:00
Kunwu Chan	3d14cb0263	drm/amdgpu: Simplify the allocation of fence slab caches Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:28:19 -05:00
Veerabadhran Gopalakrishnan	437591d237	drm/amdgpu/soc21: Added Video Capabilities for VCN 406 Updated Query Video codecs for VCN 406 Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:28:16 -05:00
Veerabadhran Gopalakrishnan	2b53b3668e	drm/amdgpu/vcn: Enable VCN 4.0.6 Support Modified driver to use the appropriate FW files and instance. v2: squash in fixes (Alex) Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:28:11 -05:00
Saleemkhan Jamadar	07cb7fd0fd	drm/amdgpu/jpeg: add support for jpeg multi instance Enable support for multi instance on JPEG 4.0.6. v2: squash in fixes (Alex) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:28:04 -05:00
Victor Lu	8f4de8f72e	drm/amdgpu: Use correct SRIOV macro for gmc_v9_0_vm_fault_interrupt_state Under SRIOV, programming to VM_CONTEXT_CNTL regs failed because the current macro does not pass through the correct xcc instance. Use the REG32_XCC macro in this case. The behaviour without SRIOV is the same without this patch. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:42 -05:00
Victor Lu	bea07b215d	drm/amdgpu: Do not program IH_CHICKEN in vega20_ih.c under SRIOV IH_CHICKEN is blocked for VF writes; this access should be skipped. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:31 -05:00
Victor Lu	8093383ae7	drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2) The current error detection only looks for a timeout. This should be changed to also check scratch_reg1 for any errors returned from RLCG. v2: remove new error value Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:23 -05:00
Yifan Zhang	d6a76c0a5a	drm/amdgpu: enable MES discovery for GC 11.5.1 This patch to enable MES for GC 11.5.1 Reviewed-by: shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:18 -05:00
Yifan Zhang	e2442d3e32	drm/amdgpu: add GC 11.5.1 discovery support This patch to add GC 11.5.1 support Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:13 -05:00
Tim Huang	455918cf28	drm/amdgpu: enable CGPG for GFX ip v11.5.1 Enable CGPG support for GFX ip v11.5.1 Signed-off-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:07 -05:00
Yifan Zhang	7c15ac1183	drm/amdgpu: initialize gfx11.5.1 Initialize gfx 11.5.0 and set gfx hw configuration. v2: squash in CG, PG, GFXOFF fixes (Alex) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:03 -05:00
Yifan Zhang	846f7385bf	drm/amdgpu: add mes firmware support for GC 11.5.1 This patch to add MES PIPE0 and PIPE1 firmware support for gc_11_5_1. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:27:01 -05:00
Yifan Zhang	fa744c0dd2	drm/amdgpu: add imu firmware support for GC 11.5.1 This patch is to add imu firmware support for GC 11.5.1 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:58 -05:00
Yifan Zhang	dad4f543ac	drm/amdgpu: add firmware for GC 11.5.1 This patch is to add firmware for GC 11.5.1 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:55 -05:00
Yifan Zhang	93c5cc8312	drm/amdgpu: add GC 11.5.1 to GC 11.5.0 family This patch to add GC 11.5.1 to GC 11.5.0 family. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:52 -05:00
Yifan Zhang	f1c40b6ea4	drm/amdgpu: enable soc21 discovery support for GC 11.5.1 This patch to enable soc21 support for GC 11.5.1 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:49 -05:00
Yifan Zhang	e971995657	drm/amdgpu: add initial GC 11.5.1 soc21 support Disable clock gating and power gating for now. v2: squash in revision fix (Alex) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:44 -05:00
Yifan Zhang	278318d371	drm/amdgpu: enable gmc11 discovery support for GC 11.5.1 This patch to enable gmc11 for GC 11.5.1 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:26:40 -05:00
Asad Kamal	5fe4a8d3c6	drm/amdgpu: Remove pcie bw sys entry Remove pcie bw sys entry for asics not supporting such function Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:23:45 -05:00
Asad Kamal	c607e76e64	Revert "drm/amdgpu: Add pcie usage callback to nbio" pcie usage is now handled by fw This reverts commit `8d759dc664`. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:21:27 -05:00
Ma Jun	4acd31e6c2	drm/amdgpu: Drop redundant parameter in amdgpu_gfx_kiq_init_ring Drop redundant parameters in function amdgpu_gfx_kiq_init_ring to simplify the code Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:17:45 -05:00
Asad Kamal	86a08f1af2	Revert "drm/amdgpu: Add pci usage to nbio v7.9" Remove implementation to get pcie usage for nbio v7.9 as pcie usage is handled by fw This reverts commit `59070fd9cc`. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:15:26 -05:00
Yifan Zhang	2612c8313f	drm/amdgpu: add tmz support for GC IP v11.5.1 Add tmz support for GC 11.5.1. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:14:30 -05:00
Hawking Zhang	5c07015619	drm/amdgpu: Do not toggle bif ras irq from guest Only do this from host side. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:14:24 -05:00
Yifan Zhang	46e5de77b3	drm/amdgpu: add GFXHUB 11.5.1 support This patch to add GFXHUB 11.5.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-22 10:14:17 -05:00
Dave Airlie	40d47c5fb4	amd-drm-next-6.9-2024-02-19: amdgpu: - ATHUB 4.1 support - EEPROM support updates - RAS updates - LSDMA 7.0 support - JPEG DPG support - IH 7.0 support - HDP 7.0 support - VCN 5.0 support - Misc display fixes - Retimer fixes - DCN 3.5 fixes - VCN 4.x fixes - PSR fixes - PSP 14.0 support - VA_RESERVED cleanup - SMU 13.0.6 updates - NBIO 7.11 updates - SDMA 6.1 updates - MMHUB 3.3 updates - Suspend/resume fixes - DMUB updates amdkfd: - Trap handler enhancements - Fix cache size reporting - Relocate the trap handler radeon: - fix typo in print statement -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZdPLUgAKCRC93/aFa7yZ 2AakAQDmhlQkJAxIxJw4/5mEQY5zaMJ033lcZGzBQbj8uL42pQD/aQ/gdN/bOfPZ gsdidzgL5MThBOFfw72pBkEoE+kQXgc= =oP2J -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.9-2024-02-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-19: amdgpu: - ATHUB 4.1 support - EEPROM support updates - RAS updates - LSDMA 7.0 support - JPEG DPG support - IH 7.0 support - HDP 7.0 support - VCN 5.0 support - Misc display fixes - Retimer fixes - DCN 3.5 fixes - VCN 4.x fixes - PSR fixes - PSP 14.0 support - VA_RESERVED cleanup - SMU 13.0.6 updates - NBIO 7.11 updates - SDMA 6.1 updates - MMHUB 3.3 updates - Suspend/resume fixes - DMUB updates amdkfd: - Trap handler enhancements - Fix cache size reporting - Relocate the trap handler radeon: - fix typo in print statement Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com	2024-02-22 13:21:19 +10:00
Yifan Zhang	31e0a586f3	drm/amdgpu: add MMHUB 3.3.1 support This patch to add MMHUB 3.3.1 support. v2: squash in fault info fix (Alex) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-19 14:50:46 -05:00
Mario Limonciello	2bb2ad58f6	drm/amd: Change `jpeg_v4_0_5_start_dpg_mode()` to void jpeg_v4_0_5_start_dpg_mode() always returns 0 and the return value doesn't get used in the caller jpeg_v4_0_5_start(). Modify the function to be void. Reported-by: coverity-bot <keescook+coverity-bot@chromium.org> Addresses-Coverity-ID: 1583635 ("Code maintainability issues") Fixes: `0a119d53f7` ("drm/amdgpu/jpeg: add support for jpeg DPG mode") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:44:30 -05:00
Srinivasan Shanmugam	6f18d7ad9d	drm/amdgpu: Fix missing parameter descriptions in ih_v7_0.c Rectifies kdoc warnings related to the 'ih' parameter in the 'ih_v7_0_get_wptr', 'ih_v7_0_irq_rearm', and 'ih_v7_0_set_rptr' functions within the 'ih_v7_0.c' file. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:392: warning: Function parameter or member 'ih' not described in 'ih_v7_0_get_wptr' drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:432: warning: Function parameter or member 'ih' not described in 'ih_v7_0_irq_rearm' drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:458: warning: Function parameter or member 'ih' not described in 'ih_v7_0_set_rptr' Fixes: `12443fc53e` ("drm/amdgpu: Add ih v7_0 ip block support") Cc: Likun Gao <Likun.Gao@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:26 -05:00
Yifan Zhang	a02cfac90f	drm/amdgpu: add SDMA 6.1.1 discovery support This patch to add SDMA 6.1.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:23 -05:00
Yifan Zhang	c40797d320	drm/amdgpu: add sdma 6.1.1 firmware This patch to add sdma 6.1.1 firmware declaration. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:21 -05:00
Yifan Zhang	aec765a4dc	drm/amdgpu: add psp 14.0.1 discovery support This patch to add psp 14.0.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:18 -05:00
Yifan Zhang	24b5a5df94	drm/amdgpu: add PSP 14.0.1 support This patch to add PSP 14.0.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:15 -05:00
Yifan Zhang	c5ce1f1a21	drm/amdgpu: add smuio 14.0.1 support This patch to add smuio 14.0.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:11 -05:00
Yifan Zhang	bd377b1281	drm/amdgpu: add nbio 7.11.1 discovery support This patch to add nbio 7.11.1 support. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:08 -05:00
Yifan Zhang	dc84f52eb2	drm/amdgpu/nbio: Add NBIO 7.11.1 Support Fix up doorbell setup and clockgating. v2: squash in fixes (Alex) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:42:03 -05:00
Felix Kuehling	34a1de0f79	drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole The TBA and TMA, along with an unused IB allocation, reside at low addresses in the VM address space. A stray VM fault which hits these pages must be serviced by making their page table entries invalid. The scheduler depends upon these pages being resident and fails, preventing a debugger from inspecting the failure state. By relocating these pages above 47 bits in the VM address space they can only be reached when bits [63:48] are set to 1. This makes it much less likely for a misbehaving program to generate accesses to them. The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL access with a small offset. v2: - Move it to the reserved space to avoid concflicts with Mesa - Add macros to make reserved space management easier v3: - Move VM max PFN calculation into AMDGPU_VA_RESERVED macros Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-16 15:41:50 -05:00
Alex Deucher	ba1a58d5b9	drm/amdgpu: add shared fdinfo stats Add shared stats. Useful for seeing shared memory. v2: take dma-buf into account as well v3: use the new gem helper Link: https://lore.kernel.org/all/20231207180225.439482-1-alexander.deucher@amd.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Rob Clark <robdclark@gmail.com> Reviewed-by: Christian König <christian.keonig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2024-02-16 12:52:50 +01:00
Thong	2f542421a4	drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution Update the maximum resolution reported for HEVC encoding on VCN 4 devices to reflect its 8K encoding capability. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159 Signed-off-by: Thong <thong.thai@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-02-15 14:18:43 -05:00
Mario Limonciello	9163616853	Revert "drm/amd: flush any delayed gfxoff on suspend entry" commit `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") caused GFXOFF control to be used more heavily and the codepath that was removed from commit `0dee726395` ("drm/amd: flush any delayed gfxoff on suspend entry") now can be exercised at suspend again. Users report that by using GNOME to suspend the lockscreen trigger will cause SDMA traffic and the system can deadlock. This reverts commit `0dee726395`. Acked-by: Alex Deucher <alexander.deucher@amd.com> Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-15 14:18:43 -05:00
Mario Limonciello	3a9626c816	drm/amd: Stop evicting resources on APUs in suspend commit `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") intentionally moved the eviction of resources to earlier in the suspend process, but this introduced a subtle change that it occurs before adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to evict resources at suspend time as well. Explicitly set s0ix or s3 in the prepare() stage, and unset them if the prepare() stage failed. v2: squash in warning fix from Stephen Rothwell Reported-by: Jürg Billeter <j@bitron.ch> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038 Fixes: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-15 14:18:43 -05:00
Hamza Mahfooz	d16df040c8	drm/amdgpu: make damage clips support configurable We have observed that there are quite a number of PSR-SU panels on the market that are unable to keep up with what user space throws at them, resulting in hangs and random black screens. So, make damage clips support configurable and disable it by default for PSR-SU displays. Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-15 14:18:42 -05:00
Likun Gao	efc11f34e2	drm/amdgpu: support psp ip block discovery for psp v14 Support PSP ip block discovery for psp v14. Add psp ip block for psp v14_0_2 and v14_0_3. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:18:27 -05:00
Likun Gao	8152825498	drm/amdgpu: add psp_timeout to limit PSP related operation Add a new parameter psp_timeout to limit psp related operation to unify the timeout limition for psp. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:18:24 -05:00
Likun Gao	e71658299d	drm/amdgpu/psp: set boot_time_tmr flag Set boot_time_tmr flag for the ASIC which MP0 ip version newer than 14.0.2 For runtime TMR: Init tmr and load tmr should did. For boottime TMR: If do not support autoload, skip init TMR. If support autoload, excute init TMR but skip load tmr. v2: rebase (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:18:20 -05:00
Likun Gao	2fb4460fb8	drm/amdgpu/psp: handle TMR type via flag Add flag boot_time_tmr to indicate boot time TMR or runtime TMR instead of function. v2: rework logic (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:18:16 -05:00
Likun Gao	8d339b0df2	drm/amdgpu/psp: set autoload support by default Set psp->autoload_supported to true by default, as only a few version of ASIC not support autoload, and the furture version of PSP should support this. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:18:06 -05:00
Likun Gao	a78791c2b2	drm/amdgpu: support psp ip block for psp v14 Support PSP ip block for psp v14. Add psp ip block for psp v14_0_2 and v14_0_3. v2: sqaush in 14.0.3 firmware fix (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:16:12 -05:00
Likun Gao	f19cb91615	drm/amdgpu: use spirom update wait_for helper for psp v14 Spirom update typically requires extremely long duration for command execution, and special helper function to wait for it's completion. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:16:07 -05:00
Felix Kuehling	efe0f34c2b	drm/amdgpu: Reduce VA_RESERVED_BOTTOM to 64KB The reservation is there to catch NULL pointer dereferences from the GPU. Reduce the size to 64KB to make sure that shared virtual address programming models can map all CPU-accessible virtual addresses for GPU access. This is also the default for CPU virtual address mappings as seen in /proc/sys/vm/mmap_min_addr. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:15:50 -05:00
Hawking Zhang	876fa5f8a0	drm/amdgpu: Add psp v14_0 ip block support Add psp v14_0 ip block support. v2: rebase (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:15:46 -05:00
Thong	fc2d4230e5	drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution Update the maximum resolution reported for HEVC encoding on VCN 4 devices to reflect its 8K encoding capability. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159 Signed-off-by: Thong <thong.thai@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-14 17:13:46 -05:00
Mario Limonciello	ce311df91d	Revert "drm/amd: flush any delayed gfxoff on suspend entry" commit `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") caused GFXOFF control to be used more heavily and the codepath that was removed from commit `0dee726395` ("drm/amd: flush any delayed gfxoff on suspend entry") now can be exercised at suspend again. Users report that by using GNOME to suspend the lockscreen trigger will cause SDMA traffic and the system can deadlock. This reverts commit `0dee726395`. Acked-by: Alex Deucher <alexander.deucher@amd.com> Fixes: `ab4750332d` ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-13 08:59:50 -05:00
Mario Limonciello	226db36032	drm/amd: Stop evicting resources on APUs in suspend commit `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") intentionally moved the eviction of resources to earlier in the suspend process, but this introduced a subtle change that it occurs before adev->in_s0ix or adev->in_s3 are set. This meant that APUs actually started to evict resources at suspend time as well. Explicitly set s0ix or s3 in the prepare() stage, and unset them if the prepare() stage failed. v2: squash in warning fix from Stephen Rothwell Reported-by: Jürg Billeter <j@bitron.ch> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038 Fixes: `5095d54181` ("drm/amd: Evict resources during PM ops prepare() callback") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-13 08:59:49 -05:00
Dave Airlie	b344e64fbd	amd-drm-next-6.9-2024-02-09: amdgpu: - Validate DMABuf imports in compute VMs - Add RAS ACA framework - PSP 13 fixes - Misc code cleanups - Replay fixes - Atom interpretor PS, WS bounds checking - DML2 fixes - Audio fixes - DCN 3.5 Z state fixes - Remove deprecated ida_simple usage - UBSAN fixes - RAS fixes - Enable seq64 infrastructure - DC color block enablement - Documentation updates - DC documentation updates - DMCUB updates - S3 fixes - VCN 4.0.5 fixes - DP MST fixes - SR-IOV fixes amdkfd: - Validate DMABuf imports in compute VMs - SVM fixes - Trap handler updates radeon: - Atom interpretor PS, WS bounds checking - Misc code cleanups UAPI: - Bump KFD version so UMDs know that the fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present - Add INFO query for input power. This matches the existing INFO query for average power. Used in gaming HUDs, etc. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZcaM8gAKCRC93/aFa7yZ 2L64AP9S8Wh5T2dEm3Nr8zBR008KdFQyOGVoO4qwlmyJMgin3wEA57gHiUrvs3o7 HRR+PU4JMo4OxQZNpVQtYYHc1BL6nQU= =3AqF -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.9-2024-02-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-09: amdgpu: - Validate DMABuf imports in compute VMs - Add RAS ACA framework - PSP 13 fixes - Misc code cleanups - Replay fixes - Atom interpretor PS, WS bounds checking - DML2 fixes - Audio fixes - DCN 3.5 Z state fixes - Remove deprecated ida_simple usage - UBSAN fixes - RAS fixes - Enable seq64 infrastructure - DC color block enablement - Documentation updates - DC documentation updates - DMCUB updates - S3 fixes - VCN 4.0.5 fixes - DP MST fixes - SR-IOV fixes amdkfd: - Validate DMABuf imports in compute VMs - SVM fixes - Trap handler updates radeon: - Atom interpretor PS, WS bounds checking - Misc code cleanups UAPI: - Bump KFD version so UMDs know that the fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present - Add INFO query for input power. This matches the existing INFO query for average power. Used in gaming HUDs, etc. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com	2024-02-13 11:32:23 +10:00
Alex Deucher	ddc23e6e23	drm/amdgpu/psp: update define to better align with its meaning MEM_TRAINING_ENCROACHED_SIZE is for BIST training data. It's not memory type specific. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:14:12 -05:00
Hamza Mahfooz	040fdcde28	drm/amdgpu: respect the abmlevel module parameter value if it is set Currently, if the abmlevel module parameter is set, it is possible for user space to override the ABM level at some point after boot. However, that is undesirable because it means that we aren't respecting the user's wishes with regard to the level that they want to use. So, prevent user space from changing the ABM level if the module parameter is set to a non-auto value. Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:14:05 -05:00
Sonny Jiang	cff9960317	drm/amdgpu: Add jpeg_v5_0_0 ip block support Enable support for jpeg_v5_0_0 ip block. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:13:44 -05:00
Sonny Jiang	75a178926c	drm/amdgpu/jpeg5: Enable doorbell Add doorbell for JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:13:39 -05:00
Sonny Jiang	785e53a83b	drm/amdgpu/jpeg5: add power gating support Add PG support for JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:13:35 -05:00
Sonny Jiang	dfad65c657	drm/amdgpu: Add JPEG5 support Add support for JPEG5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:12:00 -05:00
Sonny Jiang	470675f6bf	amdgpu/drm: Add vcn_v5_0_0_ip_block support Enable support for vcn_v5_0_0_ip_block Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:11:53 -05:00
Hamza Mahfooz	fc184dbe9f	drm/amdgpu: make damage clips support configurable We have observed that there are quite a number of PSR-SU panels on the market that are unable to keep up with what user space throws at them, resulting in hangs and random black screens. So, make damage clips support configurable and disable it by default for PSR-SU displays. Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:10:24 -05:00
Sonny Jiang	b6d1a06320	drm/amdgpu: add VCN_5_0_0 IP block support Add VCN_5_0_0 IP init, ring functions, DPG support. v2: squash in warning fixes (Alex) v3: squash in block and ring init, boot, doorbell enablement, DPG support (Alex) Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:10:18 -05:00
Sonny Jiang	816dae1d69	drm/amdgpu: add VCN_5_0_0 firmware support Add VCN5_0_0 firmware support Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:10:14 -05:00
Likun Gao	ca46c25909	drm/amdgpu/discovery: Add hdp v7_0 ip block Add hdp v7_0 ip block Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:10:00 -05:00
Likun Gao	f3bcdf2d90	drm/amdgpu: Add hdp v7_0 ip block support Add hdp v7_0 ip block support. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:09:57 -05:00
Likun Gao	56018e8363	drm/amdgpu/discovery: Add ih v7_0 ip block Add ih v7_0 ip block. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:09:45 -05:00
Likun Gao	12443fc53e	drm/amdgpu: Add ih v7_0 ip block support Add ih v7_0 ip block support. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:09:42 -05:00
Saleemkhan Jamadar	0a119d53f7	drm/amdgpu/jpeg: add support for jpeg DPG mode Jpeg DPG support for GC IP v11_5_0 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:09:32 -05:00
Saleemkhan Jamadar	617efef4af	drm/amdgpu: add ucode id for jpeg DPG support add ucode id and cmd buffer for jpeg psp sram programming and Jpeg DPG support. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:09:11 -05:00
Likun Gao	39df603d2c	drm/amdgpu/discovery: Add lsdma v7_0 ip block Add lsdma v7_0 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:08:46 -05:00
Likun Gao	aa2fb23605	drm/amdgpu: Add lsdma v7_0 ip block support Add lsdma v7_0 ip block support. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:08:41 -05:00
Yang Wang	f579c06bdc	drm/amdgpu: send smu rma reason event in ras eeprom driver send smu rma reason event to smu in ras eeprom driver. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:08:27 -05:00
Hawking Zhang	53edf77179	drm/amdgpu: Add athub v4_1_0 ip block support Add athub v4_1_0 ip block support. v2: fix clang warning (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:08:12 -05:00
Likun Gao	b35c3feafe	drm/amdgpu: support rlc auotload type set Support to set fw_load_type=3 to use backdoor rlc autoload. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:07:45 -05:00
Likun Gao	45b801c24c	drm/amdgpu: skip ucode bo reserve for RLC AUTOLOAD Skip ucode BO reservation for backdoor RLC autoload. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-12 16:07:42 -05:00
Lijo Lazar	534c8a5b9d	drm/amdgpu: Fix HDP flush for VFs on nbio v7.9 HDP flush remapping is not done for VFs. Keep the original offsets in VF environment. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 18:30:11 -05:00
Lijo Lazar	55173942a6	drm/amdgpu: Avoid fetching VRAM vendor info The present way to fetch VRAM vendor information turns out to be not reliable on GFX 9.4.3 dGPUs as well. Avoid using the data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-02-07 18:28:31 -05:00
Stanley.Yang	2dcf82a8e8	drm/amdgpu: Fix shared buff copy to user ta if invoke node buffer \|-------- ta type ----------\| \|-------- ta id ----------\| \|-------- cmd id ----------\| \|------ shared buf len -----\| \|------ shared buffer ------\| ta if invoke node buffer is as above, copy shared buffer data to correct location Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 18:22:04 -05:00
Li Ma	897925dcc5	drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend A supplement to commit: `615dd56ac5` There is an irq warning of jpeg during resume in s2idle process. No irq enabled in jpeg 4.0.5 resume. Fixes: `615dd56ac5` ("drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend") Signed-off-by: Li Ma <li.ma@amd.com> Acked-By: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 18:19:22 -05:00
Prike Liang	6ef82ac664	drm/amdgpu: reset gpu for s3 suspend abort case In the s3 suspend abort case some type of gfx9 power rail not turn off from FCH side and this will put the GPU in an unknown power status, so let's reset the gpu to a known good power state before reinitialize gpu device. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 18:19:08 -05:00
Prike Liang	93bafa32a6	drm/amdgpu: skip to program GFXDEC registers for suspend abort In the suspend abort cases, the gfx power rail doesn't turn off so some GFXDEC registers/CSB can't reset to default value and at this moment reinitialize GFXDEC/CSB will result in an unexpected error. So let skip those program sequence for the suspend abort case. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 18:19:04 -05:00
Lijo Lazar	d559744403	drm/amdgpu: Fix HDP flush for VFs on nbio v7.9 HDP flush remapping is not done for VFs. Keep the original offsets in VF environment. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:24 -05:00
Lijo Lazar	3d1554d999	drm/amdgpu: Avoid fetching VRAM vendor info The present way to fetch VRAM vendor information turns out to be not reliable on GFX 9.4.3 dGPUs as well. Avoid using the data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:23 -05:00
Stanley.Yang	28b34ad207	drm/amdgpu: Fix shared buff copy to user ta if invoke node buffer \|-------- ta type ----------\| \|-------- ta id ----------\| \|-------- cmd id ----------\| \|------ shared buf len -----\| \|------ shared buffer ------\| ta if invoke node buffer is as above, copy shared buffer data to correct location Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:22 -05:00
Srinivasan Shanmugam	cdb637d339	drm/amdgpu: Fix potential out-of-bounds access in 'amdgpu_discovery_reg_base_init()' The issue arises when the array 'adev->vcn.vcn_config' is accessed before checking if the index 'adev->vcn.num_vcn_inst' is within the bounds of the array. The fix involves moving the bounds check before the array access. This ensures that 'adev->vcn.num_vcn_inst' is within the bounds of the array before it is used as an index. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1289 amdgpu_discovery_reg_base_init() error: testing array offset 'adev->vcn.num_vcn_inst' after use. Fixes: `a0ccc717c4` ("drm/amdgpu/discovery: validate VCN and SDMA instances") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:22 -05:00
Li Ma	0a8ff0cbee	drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend A supplement to commit: `615dd56ac5` There is an irq warning of jpeg during resume in s2idle process. No irq enabled in jpeg 4.0.5 resume. Fixes: `615dd56ac5` ("drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend") Signed-off-by: Li Ma <li.ma@amd.com> Acked-By: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:22 -05:00
Prike Liang	3f719cf22f	drm/amdgpu: reset gpu for s3 suspend abort case In the s3 suspend abort case some type of gfx9 power rail not turn off from FCH side and this will put the GPU in an unknown power status, so let's reset the gpu to a known good power state before reinitialize gpu device. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:22 -05:00
Prike Liang	0326de4c44	drm/amdgpu: skip to program GFXDEC registers for suspend abort In the suspend abort cases, the gfx power rail doesn't turn off so some GFXDEC registers/CSB can't reset to default value and at this moment reinitialize GFXDEC/CSB will result in an unexpected error. So let skip those program sequence for the suspend abort case. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 12:26:22 -05:00
Qiang Ma	aeaf3e6cf8	drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization Problem: The computer in the bios initialization process, unplug the HDMI display, wait until the system up, plug in the HDMI display, did not enter the hotplug interrupt function, the display is not bright. Fix: After the above problem occurs, and the hpd ack interrupt bit is 1, the interrupt should be cleared during hpd_init initialization so that when the driver is ready, it can respond to the hpd interrupt normally. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 10:01:10 -05:00
Alex Deucher	39a82d304b	drm/amdgpu: fix typo in parameter description Missing space. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 10:01:05 -05:00
shaoyunl	3fad156572	drm/amdgpu: Only create mes event log debugfs when mes is enabled Skip the debugfs file creation for mes event log if the GPU doesn't use MES. This to prevent potential kernel oops when user try to read the event log in debugfs on a GPU without MES Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-02-07 10:00:56 -05:00
Thomas Zimmermann	0e85f1ae4a	Merge drm/drm-next into drm-misc-next Backmerging to update drm-misc-next to the state of v6.8-rc3. Also fixes a build problem with xe. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-02-07 13:02:20 +01:00
Friedrich Vock	7330256268	drm/amdgpu: Reset IH OVERFLOW_CLEAR bit Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <joshua@froggi.es> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:39:47 -05:00
Yifan Zhang	4f56acdee4	drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:39:40 -05:00
Yifan Zhang	de4a733868	drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0 No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:38:02 -05:00
Lang Yu	9c29282ecb	drm/amdkfd: reserve the BO before validating it Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: `101b810430` ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:37:53 -05:00
Srinivasan Shanmugam	16da399091	drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()' Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: `fac4ebd79f` ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:36:44 -05:00
David McFarland	8ef85a0ce2	drm/amd: Don't init MEC2 firmware when it fails to load The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: stable@vger.kernel.org Fixes: `9931b67690` ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <corngood@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:34:14 -05:00
Ma Jun	bb34bc2cd3	drm/amdgpu: Fix the warning info in mode1 reset Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: `11b3b9f461` ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 17:34:05 -05:00
Le Ma	db2aad036e	drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: `1819200166` ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 15:33:52 -05:00
Friedrich Vock	9217b91c64	drm/amdgpu: Reset IH OVERFLOW_CLEAR bit Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <joshua@froggi.es> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:28 -05:00
Yifan Zhang	615dd56ac5	drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:20 -05:00
Tao Zhou	01087a1974	drm/amdgpu: use PSP address query command Get UMC physical address from PSP in RAS error address coversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:19 -05:00
Tao Zhou	a1eac5bd91	drm/amdgpu: add PSP RAS address query command Convert mca address to physical address or vice versa via RAS TA. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:19 -05:00
Yifan Zhang	e4d65510e8	drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0 No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:19 -05:00
Lang Yu	0c93bd4957	drm/amdkfd: reserve the BO before validating it Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: `101b810430` ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:19 -05:00
Srinivasan Shanmugam	fa8a91b0e5	drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()' Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: `fac4ebd79f` ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:19 -05:00
YiPeng Chai	adb4d6a40d	drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriov Need to resume ras during gpu reset for gfx v9_4_3 sriov Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:18 -05:00
Tao Zhou	edfdde9013	drm/amdgpu: disable RAS feature when fini Send RAS disable feature command in fini. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:05:18 -05:00
Hawking Zhang	1731ba9b64	drm/amdgpu: Update boot time errors polling sequence Update boot time errors polling sequence to align with the latest firmware change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 14:04:55 -05:00
David McFarland	c3ec8c4f9a	drm/amd: Don't init MEC2 firmware when it fails to load The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: stable@vger.kernel.org Fixes: `9931b67690` ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <corngood@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 13:53:06 -05:00
Ma Jun	9749c86843	drm/amdgpu: Fix the warning info in mode1 reset Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: `11b3b9f461` ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 09:40:42 -05:00
Jani Nikula	345a36c4f1	drm/amdgpu: prefer snprintf over sprintf This will trade the W=1 warning -Wformat-overflow to -Wformat-truncation. This lets us enable -Wformat-overflow subsystem wide. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan, Xinhui <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com	2024-01-31 11:04:25 +02:00
Yang Wang	788686e2d9	drm/amdgpu: use helper macro HW_ERR instead of Hardware error string use helper macro HW_ERR to instead of Hardware error string. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-29 15:47:02 -05:00
Le Ma	c0125b848a	drm/amdgpu: move the drm client creation behind drm device registration This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: `1819200166` ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-29 15:35:13 -05:00
Mario Limonciello	c92c108403	Revert "drm/amd/pm: fix the high voltage and temperature issue" This reverts commit `5f38ac54e6`. This causes issues with rebooting and the 7800XT. Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: stable@vger.kernel.org Fixes: `5f38ac54e6` ("drm/amd/pm: fix the high voltage and temperature issue") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-29 08:56:13 -05:00
Maxime Ripard	4db102dcb0	Merge drm/drm-next into drm-misc-next Kickstart 6.9 development cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-01-29 14:20:23 +01:00
Alex Deucher	3380fcad2c	drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-25 15:48:57 -05:00
Alex Deucher	03ff6d7238	drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-25 15:48:51 -05:00
Tom St Denis	e7a8594cc2	drm/amd/amdgpu: Assign GART pages to AMD device mapping This allows kernel mapped pages like the PDB and PTB to be read via the iomem debugfs when there is no vram in the system. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.7.x	2024-01-25 15:47:36 -05:00
Lijo Lazar	89a7c0bd74	drm/amdgpu: Show vram vendor only if available Ony if vram vendor info is available, show in sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.7.x	2024-01-25 15:44:11 -05:00
Lijo Lazar	90751bdeee	drm/amdgpu: Avoid fetching vram vendor information For GFX 9.4.3 APUs, the current method of fetching vram vendor information is not reliable. Avoid fetching the information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.7.x	2024-01-25 15:41:57 -05:00
Victor Skvortsov	362936d613	amdgpu/drm: Use vram manager for virtualization page retirement In runtime, use vram manager for virtualization page retirement. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:58:03 -05:00
Victor Skvortsov	2474414c60	drm/amdgpu: Add RAS_POISON_READY host response message In a non-FLR page avoidance scenario, the host driver will provide the bad pages in the pf2vf exchange region. Adding a new host response message to indicate when the pf2vf exchange region has been updated. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:58:03 -05:00
YiPeng Chai	ed1e1e42fd	drm/amdgpu: Support passing poison consumption ras block to SRIOV Support passing poison consumption ras blocks to SRIOV. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:58:03 -05:00
Yang Wang	c0c48f0d61	drm/amdgpu: adjust aca init/fini sequence to match gpu reset - move aca init/fini function into ras init/fini to adapt gpu reset sequence. - add new function amdgpu_aca_reset() Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:58:02 -05:00
Yang Wang	6eb726a082	drm/amdgpu: add aca sysfs remove support add aca sysfs remove support. Fixes: `37973b69ea` ("drm/amdgpu: add aca sysfs support") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:58:02 -05:00
Mukul Joshi	c84a7e21db	drm/amdgpu: Fix module unload hang with RAS enabled The driver unload hangs because the page retirement kthread cannot be stopped as it is sleeping and waiting on page retirement event to occur. Add kthread_should_stop() to the event condition to wake up the kthread when kthread stop is called during driver unload. Fixes: `3fdcd0a31d` ("drm/amdgpu: Prepare for asynchronous processing of umc page retirement") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:57:52 -05:00
Alex Deucher	fc8f5a29d4	drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-25 14:49:03 -05:00
Alex Deucher	ca01082353	drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-25 14:48:58 -05:00
Tom St Denis	8352ca1090	drm/amd/amdgpu: Assign GART pages to AMD device mapping This allows kernel mapped pages like the PDB and PTB to be read via the iomem debugfs when there is no vram in the system. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:48:52 -05:00
Srinivasan Shanmugam	8d1717fb64	drm/amdgpu: Fix return type in 'aca_bank_hwip_is_matched()' Change the return type of "if (!bank \|\| type == ACA_HWIP_TYPE_UNKNOW)" to be bool instead of int. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c:185 aca_bank_hwip_is_matched() warn: signedness bug returning '(-22)' Fixes: `f5e4cc8461` ("drm/amdgpu: implement RAS ACA driver framework") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-25 14:47:03 -05:00
Somalapuram Amaranath	a78a8da51b	drm/ttm: replace busy placement with flags v6 Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX v6: implement some missing VMWGFX functionality pointed out by Zack, rename the flags as suggested by Michel, rebase on drm-tip and adjust XE as well Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240112125158.2748-4-christian.koenig@amd.com	2024-01-25 09:59:44 +01:00
Mario Limonciello	7055c5856a	Revert "drm/amd/pm: fix the high voltage and temperature issue" This reverts commit `5f38ac54e6`. This causes issues with rebooting and the 7800XT. Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: stable@vger.kernel.org Fixes: `5f38ac54e6` ("drm/amd/pm: fix the high voltage and temperature issue") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:28 -05:00
Yang Wang	2866a4549c	drm/amdgpu: skip call ras_late_init if ras block is not supported skip call ras_late_init callback if ras block is not supported. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:28 -05:00
Lijo Lazar	9c3f6e2c4e	drm/amdgpu: Show vram vendor only if available Ony if vram vendor info is available, show in sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:28 -05:00
Lijo Lazar	e0eb08dcec	drm/amdgpu: Avoid fetching vram vendor information For GFX 9.4.3 APUs, the current method of fetching vram vendor information is not reliable. Avoid fetching the information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:28 -05:00
Tao Zhou	1757bb7dab	drm/amdgpu: update check condition of query for ras page retire Support page retirement handling in debug mode. v2: revert smu_v13_0_6_get_ecc_info directly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:26 -05:00
Srinivasan Shanmugam	be91a828d0	drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting Cc: Le Ma <Le.Ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:26 -05:00
YiPeng Chai	0795b5d234	drm/amdgpu:Support retiring multiple MCA error address pages Support retiring multiple MCA error address pages in one in-band query for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
YiPeng Chai	afb617f38f	drm/amdgpu: add interface to check mca umc status Add interface to check mca umc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
YiPeng Chai	6c23f3d12a	drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning Use asynchronous polling to handle umc_v12_0 poisoning. v2: 1. Change function name. 2. Change the debugging information content. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
Stanley.Yang	ee9c3031d0	drm/amdgpu: Fix ras features value calltrace The high three bits of ras features mask indicate socket id, it should skip to check high three bits of ras features mask before disable all ras features. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
YiPeng Chai	3fdcd0a31d	drm/amdgpu: Prepare for asynchronous processing of umc page retirement Preparing for asynchronous processing of umc page retirement. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
YiPeng Chai	22f6e3e112	drm/amdgpu: Add log info for umc_v12_0 Add log info for umc_v12_0. v2: Delete redundant logs. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-22 17:13:25 -05:00
Arunpravin Paneer Selvam	00a11f977b	drm/amdgpu: Enable seq64 manager and fix bugs - Enable the seq64 mapping sequence. - Fix wflinfo va conflict and other bugs. v1: - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE otherwise the areas will conflict with user space allocations (Alex) - It needs to be mapped read only in the user VM (Alex) v2: - Instead of just one define for TOP/BOTTOM reserved space separate them into two (Christian) - Fix the CPU and VA calculations and while at it also cleanup error handling and kerneldoc (Christian) Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2024-01-22 17:13:18 -05:00
Linus Torvalds	e08b575815	drm fixes for 6.8-rc1 amdgpu: - DSC fixes - DC resource pool fixes - OTG fix - DML2 fixes - Aux fix - GFX10 RLC firmware handling fix - Revert a broken workaround for SMU 13.0.2 - DC writeback fix - Enable gfxoff when ROCm apps are active on gfx11 with the proper FW version amdkfd: - Fix dma-buf exports using GEM handles nouveau: - fix a unneeded WARN_ON triggering xe: - Fix for definition of wakeref_t - Fix for an error code aliasing - Fix for VM_UNBIND_ALL in the case there are no bound VMAs - Fixes for a number of __iomem address space mismatches reported by sparse - Fixes for the assignment of exec_queue priority - A Fix for skip_guc_pc not taking effect - Workaround for a build problem on GCC 11 - A couple of fixes for error paths - Fix a Flat CCS compression metadata copy issue - Fix a misplace array bounds checking - Don't have display support depend on EXPERT (as discussed on IRC) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmWqGdoACgkQDHTzWXnE hr5tiA//ZIos/mK+70JprAhJkXN/Lo5IDBOsldDQ1BkakkVLU1taHIsrER6iDT8g WmDzuC4ZIkHZyJ1V8zcIZ4wjE+sIOUeje0fSuMRgwPD+rrdn3WjUiAuvofuxQ5fD LFW+O/9hzTl3xBoxidPdupf33WGRMAKhNuYvhwfH14LaqNDVVdAHU7MPTORmFsyY CbCOLze5dwAmlB4rk+LDsO0gFtXjB7/Ewg2vHUzlEmaYmPpRxu9MHCpadQGq2Sal nwxwI5lb8DqR8jbel3pA0/kz06EdSKKr20YTdw+RVrp/tPqDq4njkZfJMPK+2VYf VpSPnGJOvvdtehCrKBBOBK4WRZWgbTUSuxrjgtIy+fp9aWt84NOEG2xDk1W+qWHS sqp5PeFb3meK1bsCdBBVSjj1fKApKmzJWqiCr0Z8dzV8QsZWSpPzhyfp3puxKESV dORGhMWdtdTMSPRysDUiTMGxn+fxaFTTx9Jsd0LbLizf/sob+ol9EjbNihaa5aKM FdHpUgO08X3OAppmcGmGQdIXG2K+TmKMsvguRAK98OVYXhVUpB/BrCGnd9eGyjYY mSQWrwKGm+Y/dGnr+ylHKxcyRQMmhc33gHmPAItNdMC8OemiHWZKKzoKBOzT2aME pZc6ZRJesG46bfrNqID6ASAIw6SEA+Zj3rk3QsWNPAbHVJcZboY= =LInS -----END PGP SIGNATURE----- Merge tag 'drm-next-2024-01-19' of git://anongit.freedesktop.org/drm/drm Pull more drm fixes from Dave Airlie: "This is mostly amdgpu and xe fixes, with an amdkfd and nouveau fix thrown in. The amdgpu ones are just the usual couple of weeks of fixes. The xe ones are bunch of cleanups for the new xe driver, the fix you put in on the merge commit and the kconfig fix that was hiding the problem from me. amdgpu: - DSC fixes - DC resource pool fixes - OTG fix - DML2 fixes - Aux fix - GFX10 RLC firmware handling fix - Revert a broken workaround for SMU 13.0.2 - DC writeback fix - Enable gfxoff when ROCm apps are active on gfx11 with the proper FW version amdkfd: - Fix dma-buf exports using GEM handles nouveau: - fix a unneeded WARN_ON triggering xe: - Fix for definition of wakeref_t - Fix for an error code aliasing - Fix for VM_UNBIND_ALL in the case there are no bound VMAs - Fixes for a number of __iomem address space mismatches reported by sparse - Fixes for the assignment of exec_queue priority - A Fix for skip_guc_pc not taking effect - Workaround for a build problem on GCC 11 - A couple of fixes for error paths - Fix a Flat CCS compression metadata copy issue - Fix a misplace array bounds checking - Don't have display support depend on EXPERT (as discussed on IRC)" * tag 'drm-next-2024-01-19' of git://anongit.freedesktop.org/drm/drm: (71 commits) nouveau/vmm: don't set addr on the fail path to avoid warning drm/amdgpu: Enable GFXOFF for Compute on GFX11 drm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for writeback requests. drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2" drm/amdkfd: init drm_client with funcs hook drm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state() drm/amdgpu: Fix the null pointer when load rlc firmware drm/amd/display: Align the returned error code with legacy DP drm/amd/display: Fix DML2 watermark calculation drm/amd/display: Clear OPTC mem select on disable drm/amd/display: Port DENTIST hang and TDR fixes to OTG disable W/A drm/amd/display: Add logging resource checks drm/amd/display: Init link enc resources in dc_state only if res_pool presents drm/amd/display: Fix late derefrence 'dsc' check in 'link_set_dsc_pps_packet()' drm/amd/display: Avoid enum conversion warning drm/amd/pm: Fix smuv13.0.6 current clock reporting drm/amd/pm: Add error log for smu v13.0.6 reset drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()' drm/amdgpu: drop exp hw support check for GC 9.4.3 drm/amdgpu: move debug options init prior to amdgpu device init ...	2024-01-19 11:50:00 -08:00
Linus Torvalds	ed8d84530a	This cycle, I2C removes the currently unused CLASS_DDC support (controllers set the flag, but there is no client to use it). Also, CLASS_SPD support gets simplified to prepare removal in the future. Class based instantiation is not recommended these days anyhow. Furthermore, I2C core now creates a debugfs directory per I2C adapter. Current bus driver users were converted to use it. Then, there are also quite some driver updates. Standing out are patches for the wmt-driver which is refactored to support more variants. This is the rebased pull request where a large series for the designware driver was dropped. -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmWph0UPHHdzYUBrZXJu ZWwub3JnAAoJEBQN5MwUoCm2kbIQAJotSmX0mM+nNPReYCMMiloxoxUwgpiErNwY WDrYQSezthAJ1LDsGOEeLcE4f4I+UcUHBO1BoERtOZg3cGtE0Ii5N845sp100S9O ktyaKS5utoErymThWFFrnZX60/8yKXUMzZmNzy96560gPcxbFyyyVhKfBSPzK9T+ O8CGu7GRNqgWHlvH3yqGeCbreWYrYVSrluEpBu6807cp3zDxrU+autOnsewm5+md ka3DdqrbxJSblYK8fJKESAUgkRmZgYKbgl0iiCuqX+ib6I4OA3Z68ny7dl0fY3Ws vwt7d88SaBKDdJmUZyb/sm4aJsW69GN+ECZolxrn4TIw45k4tes2s6Ma5+TV3E9h Fd1RuqduFEqQ7cj31UPe2x8rgj5Fo5nbjCWxdZv+/3zF8+cHwi8iwkp2PScsPCsa fmCdehUE5DrgobsRNANe6XJzxY5wp2VNpGEWKeaQz2Z0/d9T1YFS7a8aewvhXoPC isZboi6GQh2XoE8UgGJa29VUuaIkUW513DwCGw8mz1yKN+kHGcsRXXjkjaZoQn3U MMvh/zkI2Hpy/m2R8PWeIq5XhLJvmlZ19JJzUHJIjXh9Fn9EVtXhlUleh6mzMfeM n8NOg7Eukep2sBgmaufkUKz2Jtogs59YDSXZEvqJjIkPM2Wi0hA18Qj+pilES1ff 3ckk3mxY =8D3Q -----END PGP SIGNATURE----- Merge tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "This removes the currently unused CLASS_DDC support (controllers set the flag, but there is no client to use it). Also, CLASS_SPD support gets simplified to prepare removal in the future. Class based instantiation is not recommended these days anyhow. Furthermore, I2C core now creates a debugfs directory per I2C adapter. Current bus driver users were converted to use it. Finally, quite some driver updates. Standing out are patches for the wmt-driver which is refactored to support more variants. This is the rebased pull request where a large series for the designware driver was dropped" * tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits) MAINTAINERS: use proper email for my I2C work i2c: stm32f7: add support for stm32mp25 soc i2c: stm32f7: perform I2C_ISR read once at beginning of event isr dt-bindings: i2c: document st,stm32mp25-i2c compatible i2c: stm32f7: simplify status messages in case of errors i2c: stm32f7: perform most of irq job in threaded handler i2c: stm32f7: use dev_err_probe upon calls of devm_request_irq i2c: i801: Add lis3lv02d for Dell XPS 15 7590 i2c: i801: Add lis3lv02d for Dell Precision 3540 i2c: wmt: Reduce redundant: REG_CR setting i2c: wmt: Reduce redundant: function parameter i2c: wmt: Reduce redundant: clock mode setting i2c: wmt: Reduce redundant: wait event complete i2c: wmt: Reduce redundant: bus busy check i2c: mux: reg: Remove class-based device auto-detection support i2c: make i2c_bus_type const dt-bindings: at24: add ROHM BR24G04 eeprom: at24: use of_match_ptr() i2c: cpm: Remove linux,i2c-index conversion from be32 i2c: imx: Make SDA actually optional for bus recovering ...	2024-01-18 17:29:01 -08:00
Ori Messinger	aa0901a900	drm/amdgpu: Enable GFXOFF for Compute on GFX11 On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware issue, which has since been fixed after version 64. This patch only re-enables GFXOFF for GFX version 11 if the GPU's MES KIQ firmware version is newer than version 64. V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64. V3: Add parentheses to avoid GCC warning for parentheses: "suggest parentheses around comparison in operand of ‘&’" V4: Remove "V3" from commit title V5: Change commit description and insert 'Acked-by' Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-18 16:45:19 -05:00
Christian König	fb1c93c2e9	drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2" Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double free and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both is a complete no-go for driver unload so completely revert the workaround for now. This reverts commit `f5c7e77970`. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-18 16:43:42 -05:00
Flora Cui	3c4e4eb5d8	drm/amdkfd: init drm_client with funcs hook otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Fixes: `1819200166` ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 16:42:54 -05:00
Ma Jun	bc03c02cc1	drm/amdgpu: Fix the null pointer when load rlc firmware If the RLC firmware is invalid because of wrong header size, the pointer to the rlc firmware is released in function amdgpu_ucode_request. There will be a null pointer error in subsequent use. So skip validation to fix it. Fixes: `3da9b71563` ("drm/amd: Use `amdgpu_ucode_*` helpers for GFX10") Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-18 16:34:45 -05:00
Tao Zhou	a9e4f61df1	drm/amdgpu: update error condition check for umc_v12_0_query_error_address Deferred error is also taken into account. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:47:24 -05:00
Stanley.Yang	601429cca9	drm/amdgpu: Skip do PCI error slot reset during RAS recovery Why: The PCI error slot reset maybe triggered after inject ue to UMC multi times, this caused system hang. [ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume [ 557.373718] [drm] PCIE GART of 512M enabled. [ 557.373722] [drm] PTB located at 0x0000031FED700000 [ 557.373788] [drm] VRAM is lost due to GPU reset! [ 557.373789] [drm] PSP is resuming... [ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset [ 557.547067] [drm] PCI error: detected callback, state(1)!! [ 557.547069] [drm] No support for XGMI hive yet... [ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter [ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations [ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered [ 557.610492] [drm] PCI error: slot reset callback!! ... [ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded! [ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded! [ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI [ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu [ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023 [ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu] [ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00 [ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202 [ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0 [ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010 [ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08 [ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000 [ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000 [ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000 [ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0 [ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 560.843444] PKRU: 55555554 [ 560.846480] Call Trace: [ 560.849225] <TASK> [ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea [ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea [ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.867778] ? show_regs.part.0+0x23/0x29 [ 560.872293] ? __die_body.cold+0x8/0xd [ 560.876502] ? die_addr+0x3e/0x60 [ 560.880238] ? exc_general_protection+0x1c5/0x410 [ 560.885532] ? asm_exc_general_protection+0x27/0x30 [ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.904520] process_one_work+0x228/0x3d0 How: In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:47:15 -05:00
Stanley.Yang	2c7a1560e8	drm/amdgpu: Show deferred error count for UMC Show deferred error count for UMC syfs node Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:47:07 -05:00
Ori Messinger	776b0953ab	drm/amdgpu: Enable GFXOFF for Compute on GFX11 On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware issue, which has since been fixed after version 64. This patch only re-enables GFXOFF for GFX version 11 if the GPU's MES KIQ firmware version is newer than version 64. V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64. V3: Add parentheses to avoid GCC warning for parentheses: "suggest parentheses around comparison in operand of ‘&’" V4: Remove "V3" from commit title V5: Change commit description and insert 'Acked-by' Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:46:56 -05:00
Yang Wang	7ed97155b2	drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[] fix array index out of bounds issue for ras_block_string[] array. Fixes: `30df05fb74` ("drm/amdgpu: Align ras block enum with firmware") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:46:07 -05:00
YuanShang	b5387349ca	drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest Submit command of wreg in GFX and COMPUTE ring to update RLC_SPM_MC_CNT in guest machine during runtime. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:45:58 -05:00
Felix Kuehling	5394fb2a5b	drm/amdgpu: Remove unnecessary NULL check A static checker pointed out, that bo_va->base.bo was already derefenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `50661eb1a2` ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:44:49 -05:00
Christian König	087a3e13ec	drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2" Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double free and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both is a complete no-go for driver unload so completely revert the workaround for now. This reverts commit `f5c7e77970`. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:42:20 -05:00
Christophe JAILLET	8a1f7fddab	drm/amdgpu: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:42:13 -05:00
Flora Cui	733965a90f	drm/amdkfd: init drm_client with funcs hook otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Fixes: `1819200166` ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:41:08 -05:00
chenxuebing	7937b6f63f	drm/amdgpu: Clean up errors in umc_v6_0.c Fix the following errors reported by checkpatch: ERROR: space required after that ',' (ctx:VxV) Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:38:00 -05:00
chenxuebing	ac4d654f3d	drm/amdgpu: Clean up errors in clearstate_si.h Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-18 15:37:52 -05:00
Heiner Kallweit	e965a70727	drm: remove I2C_CLASS_DDC support After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC. Class-based device auto-detection is a legacy mechanism and shouldn't be used in new code. So we can remove this class completely now. Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>	2024-01-18 21:10:41 +01:00
chenxuebing	762343f79e	drm/amdgpu: Clean up errors in amdgpu.h Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:40 -05:00
chenxuebing	f5e1f90b67	drm/amdgpu: Clean up errors in amdgpu_gmc.c Fix the following errors reported by checkpatch: ERROR: need consistent spacing around '-' (ctx:WxV) Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:40 -05:00
chenxuebing	0b0fb6da9b	drm/amdgpu: Clean up errors in jpeg_v2_5.c Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:40 -05:00
chenxuebing	7230ebeb0a	drm/amdgpu: Clean up errors in gfx_v9_4.c Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:40 -05:00
chenxuebing	b679566bf0	drm/amdgpu: Clean up errors in amdgpu_drv.c Fix the following errors reported by checkpatch: ERROR: do not initialise globals to 0 Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:40 -05:00
chenxuebing	995d629f74	drm/amd: Clean up errors in amdgpu_vkms.c Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:39 -05:00
Ma Jun	849e133c97	drm/amdgpu: Fix the null pointer when load rlc firmware If the RLC firmware is invalid because of wrong header size, the pointer to the rlc firmware is released in function amdgpu_ucode_request. There will be a null pointer error in subsequent use. So skip validation to fix it. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:39 -05:00
Hawking Zhang	4e2965bd3b	drm/amdgpu: Centralize ras cap query to amdgpu_ras_check_supported Move ras capablity check to amdgpu_ras_check_supported. Driver will query ras capablity through psp interace, or vbios interface, or specific ip callbacks. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:39 -05:00
Hawking Zhang	0e14eb0cef	drm/amdgpu: Query ras capablity from psp v2 Instead of traditional atomfirmware interfaces for RAS capability, host driver can query ras capability from psp starting from psp v13_0_6. v2: drop redundant local variable from get_ras_capability. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:39 -05:00
chenxuebing	73888bad4d	drm/amdgpu: Clean up errors in amdgpu_rlc.c Fix the following errors reported by checkpatch: ERROR: space prohibited before that '++' (ctx:WxB) Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
chenxuebing	1ed8ccf268	drm/amd: Clean up errors in sdma_v2_4.c Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: trailing statements should be on next line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
chenxuebing	32a0a398fc	drm/amd/amdgpu: Clean up errors in amdgpu_umr.h Fix the following errors reported by checkpatch: spaces required around that '=' (ctx:VxV) Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
chenxuebing	2bb012138d	drm/amdgpu: Clean up errors in amdgpu_atomfirmware.h Fix the following errors reported by checkpatch: ERROR: "foo* bar" should be "foo *bar" Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
chenxuebing	05ec623147	drm/amdgpu: Clean up errors in clearstate_gfx9.h Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
chenxuebing	2763da27f9	drm/amdgpu: Clean up errors in navi10_ih.c Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:38 -05:00
Alexander Richards	4630d5031c	drm/amdgpu: check PS, WS index Theoretically, it would be possible for a buggy or malicious VBIOS to overwrite past the bounds of the passed parameters (or its own workspace); add bounds checking to prevent this from happening. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093 Signed-off-by: Alexander Richards <electrodeyt@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Hawking Zhang	30df05fb74	drm/amdgpu: Align ras block enum with firmware Driver and firmware share the same ras block enum. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Yang Wang	1714a1ffaf	drm/amdgpu: replace MCA macro with ACA for XGMI use new ACA macro to instead of MCA Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Candice Li	46e2231ce0	drm/amdgpu: Log deferred error separately Separate deferred error from UE and CE and log it individually. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Candice Li	9c97bf88f4	drm/amdgpu: Do bad page retirement for deferred errors Needs to do bad page retirement for deferred errors. v2: Drop unused dev_info. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Yang Wang	bbcbfd4363	drm/amdgpu: add xgmi v6.4.0 ACA support add xgmi v6.4.0 ACA driver support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:37 -05:00
Yang Wang	f45e6f2b5c	drm/amdgpu: add mmhub v1.8 ACA support v1: add mmhub v1.8 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	373e970a4a	drm/amdgpu: add sdma v4.4.2 ACA support v1: add sdma v4.4.2 ACA driver support v2: use macro to define smn address value. v3: squash in fix for unbalanced irqs Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	0f3cd24e96	drm/amdgpu: add gfx v9.4.3 ACA support v1: add gfx v9.4.3 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Ma Jun	e372baeb3d	drm/amdgpu: Check extended configuration space register when system uses large bar Some customer platforms do not enable mmconfig for various reasons, such as bios bug, and therefore cannot access the GPU extend configuration space through mmio. When the system enters the d3cold state and resumes, the amdgpu driver fails to resume because the extend configuration space registers of GPU can't be restored. At this point, Usually we only see some failure dmesg log printed by amdgpu driver, it is difficult to find the root cause. Therefor print a warnning message if the system can't access the extended configuration space register when using large bar. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	f38765de83	drm/amdgpu: add umc v12.0 ACA support add umc v12.0 ACA driver support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	37973b69ea	drm/amdgpu: add aca sysfs support add aca sysfs node support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	04c4fcd263	drm/amdgpu: add amdgpu ras aca query interface v1: add ACA error query interface v2: Add a new helper function to determine whether to use ACA or MCA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Alex Deucher	26405ff430	drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.c It's used for more than just SR-IOV now, so move it to amdgpu_gmc.c and rename it to better match the functionality and update the comments in the code paths to better document when each path is used and why. No functional change. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun.Liu@amd.com Cc: Christian.Koenig@amd.com	2024-01-15 18:35:36 -05:00
Alex Deucher	d3f452f3a0	drm/amdgpu: add new INFO IOCTL query for input power Some chips provide both average and input power. Previously we just exposed average power, add a new query for input power. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Hawking Zhang	d4b9cfe2c7	drm/amdgpu: Query boot status if boot failed Check and report firmware boot status if it doesn't reach steady status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:36 -05:00
Yang Wang	33dcda51e9	drm/amdgpu: add ACA bank dump debugfs support add ACA bank dump debugfs support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Yang Wang	0599849c32	drm/amdgpu: add ACA kernel hardware error log support add ACA kernel hardware error log support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Hawking Zhang	c8cb7e09db	drm/amdgpu: Query boot status if discovery failed Check and report boot status if discovery failed. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Hawking Zhang	cce4febb27	drm/amdgpu: Add ras helper to query boot errors v2 Add ras helper function to query boot time gpu errors. v2: use aqua_vanjaram smn addressing pattern Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Yang Wang	f5e4cc8461	drm/amdgpu: implement RAS ACA driver framework v1: implement new RAS ACA driver code framework. v2: - rename aca_bank_set to aca_banks. - rename aca_source_xxx to aca_handle_xxx. v3: Optimize some function implementation details. (from Hawking's suggestion) Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Hawking Zhang	ad390542ec	drm/amdgpu: Init pcie_index/data address as fallback (v2) To allow using this helper for indirect access when nbio funcs is not available. For instance, in ip discovery phase. v2: define macro for pcie_index/data/index_hi fallback. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Hawking Zhang	ea0f6dfeec	drm/amdgpu: drop psp v13 query_boot_status implementation Will replace it with new implementation to cover boot fails in ip discovery phase. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Hawking Zhang	ac3ff8a906	drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.c So kernel message has the device pcie bdf information, which helps issue debugging especially in multiple GPU system. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Felix Kuehling	50661eb1a2	drm/amdgpu: Auto-validate DMABuf imports in compute VMs DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dymanic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:35:35 -05:00
Alex Deucher	c3d5e297dc	drm/amdgpu: drop exp hw support check for GC 9.4.3 No longer needed. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.7.x	2024-01-15 18:32:56 -05:00
Le Ma	51258acdc4	drm/amdgpu: move debug options init prior to amdgpu device init To bring debug options into effect in early initialization phase Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:32:54 -05:00
Le Ma	d20e1aec88	drm/amdgpu: add debug flag to place fw bo on vram for frontdoor loading Use debug_mask=0x8 param to help isolating data path issues on new systems in early phase. v2: rename the flag for explicitness (lijo) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:32:49 -05:00
Le Ma	6c5683bd9e	Revert "drm/amdgpu: add param to specify fw bo location for front-door loading" This reverts commit `c572abffe9`. Will use debug module param instead of independent module param. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:32:23 -05:00
Yifan Zhang	2b9a073b73	drm/amdgpu: update regGL2C_CTRL4 value in golden setting This patch to update regGL2C_CTRL4 in golden setting. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.7.x	2024-01-15 18:32:15 -05:00
Srinivasan Shanmugam	8a44fdd3cf	drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()' In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' - 'adev->pm.fw' may not be released before return. Using the function release_firmware() to release adev->pm.fw. Thus fixing the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554. Cc: Monk Liu <Monk.Liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:32:06 -05:00
Srinivasan Shanmugam	8e8272f0dc	drm/amdgpu: Fix unsigned comparison with less than zero in vpe_u1_8_from_fraction() The variables 'numerator' and 'denominator', are unsigned 16-bit integer types, that can never be less than 0. Thus fixing the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:62 vpe_u1_8_from_fraction() warn: unsigned 'numerator' is never less than zero. drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:63 vpe_u1_8_from_fraction() warn: unsigned 'denominator' is never less than zero. Cc: Peyton Lee <peytolee@amd.com> Cc: Lang Yu <lang.yu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Peyton Lee <peyton.lee@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:32:03 -05:00
Srinivasan Shanmugam	fac4ebd79f	drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()' The amdgpu_gmc_vram_checking() function in emulation checks whether all of the memory range of shared system memory could be accessed by GPU, from this aspect, -EIO is returned for error scenarios. Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r' Cc: Xiaojian Du <Xiaojian.Du@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:31:59 -05:00
Victor Lu	30d8dffab7	drm/amdgpu: Do not program VM_L2_CNTL under SRIOV VM_L2_CNTL* should not be programmed on driver unload under SRIOV. These regs are skipped during SRIOV driver init. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:31:56 -05:00
Yifan Zhang	c9edcc1864	drm/amdgpu: update ATHUB_MISC_CNTL offset for athub v3.3 This patch to update ATHUB_MISC_CNTL offset for athub v3.3 v2: correct a typo (Tim) v3: correct patch title (Lang) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:31:45 -05:00
Alex Deucher	d02069850f	drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTL For backwards compatibility with userspace. Fixes: `47f1724db4` ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-15 18:31:24 -05:00
James Zhu	50e60184bf	drm/amdgpu: make a correction on comment Use a generic comment for AMDGPU_VM_RESERVED_VRAM size. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-09 15:44:13 -05:00
Felix Kuehling	c147ddc68e	drm/amdkfd: Fix sparse __rcu annotation warnings Properly mark kfd_process->ef as __rcu and consistently use the right accessor functions. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/ Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-09 15:44:13 -05:00
Hawking Zhang	73cb81dc54	drm/amdgpu: Packed socket_id to ras feature mask Initialize RAS feature mask bit[31:29] with socket_id. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-09 15:44:13 -05:00
Candice Li	fb1e917199	drm/amdgpu: Support poison error injection via ras_ctrl debugfs Support poison error injection. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-09 15:44:13 -05:00
Likun Gao	f4a94dbb6d	drm/amdgpu: correct the cu count for gfx v11 Correct the algorithm of active CU to skip disabled sa for gfx v11. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-01-09 15:43:54 -05:00
Candice Li	90bd01471d	drm/amdgpu: Drop unnecessary sentences about CE and deferred error. Remove "no user action is needed" for correctable and deferred error to avoid confusion. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-09 15:43:54 -05:00
Dave Airlie	e54478fbda	amd-drm-next-6.8-2024-01-05: amdgpu: - VRR fixes - PSR-SU fixes - SubVP fixes - DCN 3.5 fixes - Documentation updates - DMCUB fixes - DML2 fixes - UMC 12.0 updates - GPUVM fix - Misc code cleanups and whitespace cleanups - DP MST fix - Let KFD sync with GPUVM fences - GFX11 reset fix - SMU 13.0.6 fixes - VSC fix for DP/eDP - Navi12 display fix - RN/CZN system aperture fix - DCN 2.1 bandwidth validation fix - DCN INIT cleanup amdkfd: - SVM fixes - Revert TBA/TMA location change -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZZh7ZQAKCRC93/aFa7yZ 2EPJAQDkoOlSjRLZoqwOPvBCo3WIzVO+4N6pOgohbTrjhHvDFAD+ONEgIH/wydk1 IOdtyizh9o7spo2qN2Oi06MDimclDg8= =bFa3 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.8-2024-01-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.8-2024-01-05: amdgpu: - VRR fixes - PSR-SU fixes - SubVP fixes - DCN 3.5 fixes - Documentation updates - DMCUB fixes - DML2 fixes - UMC 12.0 updates - GPUVM fix - Misc code cleanups and whitespace cleanups - DP MST fix - Let KFD sync with GPUVM fences - GFX11 reset fix - SMU 13.0.6 fixes - VSC fix for DP/eDP - Navi12 display fix - RN/CZN system aperture fix - DCN 2.1 bandwidth validation fix - DCN INIT cleanup amdkfd: - SVM fixes - Revert TBA/TMA location change Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240105220522.4976-1-alexander.deucher@amd.com	2024-01-09 09:07:50 +10:00
Alex Deucher	16783d8ef0	drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well These chips needs the same fix. This was previously not seen on then since the AGP aperture expanded the system aperture, but this showed up again when AGP was disabled. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:44 -05:00
Srinivasan Shanmugam	bf2ad4fb8a	drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()' Return value of container_of(...) can't be null, so null check is not required for 'fence'. Hence drop its NULL check. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c:93 to_amdgpu_amdkfd_fence() warn: can 'fence' even be NULL? Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam	13a1851f92	drm/amdgpu: Fix 'fw' from request_firmware() not released in 'amdgpu_ucode_request()' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:1404 amdgpu_ucode_request() warn: 'fw' from request_firmware() not released on lines: 1404. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam	4f32504a2f	drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c:377 amdgpu_mca_smu_get_mca_entry() warn: variable dereferenced before check 'mca_funcs' (see line 368) 357 int amdgpu_mca_smu_get_mca_entry(struct amdgpu_device adev, enum amdgpu_mca_error_type type, 358 int idx, struct mca_bank_entry entry) 359 { 360 const struct amdgpu_mca_smu_funcs *mca_funcs = adev->mca.mca_funcs; 361 int count; 362 363 switch (type) { 364 case AMDGPU_MCA_ERROR_TYPE_UE: 365 count = mca_funcs->max_ue_count; mca_funcs is dereferenced here. 366 break; 367 case AMDGPU_MCA_ERROR_TYPE_CE: 368 count = mca_funcs->max_ce_count; mca_funcs is dereferenced here. 369 break; 370 default: 371 return -EINVAL; 372 } 373 374 if (idx >= count) 375 return -EINVAL; 376 377 if (mca_funcs && mca_funcs->mca_get_mca_entry) ^^^^^^^^^ Checked too late! Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:43 -05:00
Le Ma	c572abffe9	drm/amdgpu: add param to specify fw bo location for front-door loading This param can help isolating data path issues on new systems in early phase. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam	8e317a811f	drm/amdgpu: Remove unreachable code in 'atom_skip_src_int()' Fixes the below: drivers/gpu/drm/amd/amdgpu/atom.c:398 atom_skip_src_int() warn: ignoring unreachable code. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:10:33 -05:00
Alex Deucher	fb915c87ed	drm/amdgpu: skip gpu_info fw loading on navi12 It's no longer required. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:05:08 -05:00
Hawking Zhang	6697dbf0af	Revert "drm/amdgpu: enable mca debug mode on APU by default" Not needed any more with firmware fixes Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-05 16:04:36 -05:00
Srinivasan Shanmugam	b8d55a90fd	drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper() Return invalid error code -EINVAL for invalid block id. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1183 amdgpu_ras_query_error_status_helper() error: we previously assumed 'info' could be null (see line 1176) Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam	5ce9a6ad8e	drm/amdgpu: Drop redundant unsigned >=0 comparision 'amdgpu_gfx_rlc_init_microcode()' unsigned int "version_minor" is always >= 0 Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:534 amdgpu_gfx_rlc_init_microcode() warn: always true condition '(version_minor >= 0) => (0-u16max >= 0)' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam	b57e3ca1fb	drm/amdgpu: Use kvcalloc instead of kvmalloc_array in amdgpu_cs_parser_bos() kvmalloc_array + __GFP_ZERO is the same with kvcalloc. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:873 amdgpu_cs_parser_bos() warn: Please consider using kvcalloc instead of kvmalloc_array Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam	091411be7a	drm/amdgpu: Use kzalloc instead of kmalloc+__GFP_ZERO in amdgpu_ras.c Fixes the below smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2543 amdgpu_ras_recovery_init() warn: Please consider using kzalloc instead of kmalloc drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2830 amdgpu_ras_init() warn: Please consider using kzalloc instead of kmalloc Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam	abaf0666a6	drm/amdgpu: Cleanup indenting in amdgpu_connector_dvi_detect() drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1106 amdgpu_connector_dvi_detect() warn: inconsistent indenting Fixes: `8a1de314d1` ("drm/amdgpu: Refactor 'amdgpu_connector_dvi_detect' in amdgpu_connectors.c") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 11:16:05 -05:00
Jack Xiao	4b5c5f5ad3	drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET v2 It's required to take the gfx mutex before access to CP_VMID_RESET, for there is a race condition with CP firmware to write the register. v2: add extra code to ensure the mutex releasing is successful. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 10:46:52 -05:00
Felix Kuehling	ec9ba4821f	drm/amdgpu: Let KFD sync with VM fences Change the rules for amdgpu_sync_resv to let KFD synchronize with VM fences on page table reservations. This fixes intermittent memory corruption after evictions when using amdgpu_vm_handle_moved to update page tables for VM mappings managed through render nodes. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 10:46:13 -05:00
Stanley.Yang	a32c6f7f57	drm/amdgpu: Fix ecc irq enable/disable unpaired The ecc_irq is disabled while GPU mode2 reset suspending process, but not be enabled during GPU mode2 reset resume process. Changed from V1: only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip delete amdgpu_ras_late_resume function Changed from V2: check umc ras supported before put ecc_irq Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 10:30:49 -05:00
Mangesh Gadre	5eb8094a9b	drm/amdgpu: Add register read/write debugfs support for AID's SMN address is larger than 32 bits for registers on different AID's Updating existing interface to support access to such registers. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 10:30:19 -05:00
Dave Airlie	22a2decedf	Merge tag 'drm-msm-next-2023-12-15' of https://gitlab.freedesktop.org/drm/msm into drm-next Updates for v6.8: Core: - Add support for SDM670, SM8650 - Handle the CFG interconnect to fix the obscure hangs / timeouts on register write - Kconfig fix for QMP dependency - DT schema fixes DPU: - Add support for SDM670, SM8650 - Enable SmartDMA on SM8350 and SM8450 - Correct UBWC settings for SC8280XP - Fix catalog settings for SC8180X - Actually make use of the version to switch between QSEED3/3LITE/4 scalers - Use devres-managed and drm-managed allocations where appropriate - misc other fixes - Enabled YUV writeback on SC7280, SM8250 - Enabled writeback on SM8350, SM8450 - CRC fix when encoder is selected as the input source - other misc fixes MDP4: - Use devres-managed and drm-managed allocations where appropriate - flush vblank event on CRTC disable MDP5: - Use devres-managed and drm-managed allocations where appropriate DP: - Add support for SM8650 - Enable PM runtime support - Merge msm-specific debugfs dir with the generic one - Described DisplayPort on SM8150 in DeviceTree bindings - Moved dp_display_get_next_bridge() to probe() DSI: - Add support for SM8650 - Enable PM runtime support GPU/GEM: - demote userspace triggerable warnings to debug - add GEM object metadata UAPI - move GPU devcoredumps to GPU device - fix hangcheck to skip retired submits - expose UBWC config to userspace - fix a680 chip-id - drm_exec conversion - drm/ci: remove rebase-merge directory (to unblock CI) [airlied: fix drm_exec/amd interaction] Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9auYqmo-7NSd9FsbNBCDf7aBevd=4xkcF3A5G_OGvMQ@mail.gmail.com	2023-12-20 07:54:03 +10:00
ZhenGuo Yin	87825c860e	drm/amdgpu: re-create idle bo's PTE during VM state machine reset Idle bo's PTE needs to be re-created when resetting VM state machine. Set idle bo's vm_bo as moved to mark it as invalid. Fixes: `55bf196f60` ("drm/amdgpu: reset VM when an error is detected") Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-19 14:59:03 -05:00
YiPeng Chai	99cab331a4	drm/amdgpu: Add umc page retirement for umc v12_0 Add umc page retirement for umc v12_0. V2: 1. Changed umc page retirement check condition to call umc_v12_0_is_uncorrectable_error. 2. Use memset to clear the contents of the umc error address structure. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-19 14:59:03 -05:00
YiPeng Chai	a8c77a121c	drm/amdgpu: Add poison mode check error condition for umc v12_0 Add poison mode check error condition for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-19 14:59:03 -05:00
YiPeng Chai	9f91e983ee	drm/amdgpu: MCA supports recording umc address information MCA supports recording umc address information. V2: Move err_addr variable from struct ras_err_node to struct ras_err_info. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-19 14:59:03 -05:00
James Zhu	0c8c0e7a9e	drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages Only schedule when hmm_range_fault returns error. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-15 12:16:42 -05:00
James Zhu	78b4dfd359	drm/amdgpu: increase hmm range get pages timeout When application tries to allocate all system memory and cause memory to swap out. Needs more time for hmm_range_fault to validate the remaining page for allocation. To be safe, increase timeout value to 1 second for 64MB range. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-15 12:16:34 -05:00
Alex Deucher	afe58346d5	drm/amdgpu/debugfs: fix error code when smc register accessors are NULL Should be -EOPNOTSUPP. Fixes: `5104fdf50d` ("drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-14 15:27:42 -05:00
Peyton Lee	5f82a0c90c	drm/amdgpu/vpe: enable vpe dpm enable vpe dpm Signed-off-by: Peyton Lee <peytolee@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-14 15:26:21 -05:00
Melissa Wen	b8b92c1bd7	drm/amd/display: add plane CTM driver-specific property Plane CTM for pre-blending color space conversion. Only enable driver-specific plane CTM property on drivers that support both pre- and post-blending gamut remap matrix, i.e., DCN3+ family. Otherwise it conflits with DRM CRTC CTM property. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-14 15:26:15 -05:00
Wang, Beyond	94aeb41173	drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap Issue: during evict or validate happened on amdgpu_bo, the 'from' and 'to' is always same in ftrace event of amdgpu_bo_move where calling the 'trace_amdgpu_bo_move', the comment says move_notify is called before move happens, but actually it is called after move happens, here the new_mem is same as bo->resource Fix: move trace_amdgpu_bo_move from move_notify to amdgpu_bo_move Signed-off-by: Wang, Beyond <Wang.Beyond@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-14 15:25:48 -05:00
Christian König	65d2765d62	drm/amdgpu: warn when there are still mappings when a BO is destroyed v2 This can only happen when there is a reference counting bug. v2: fix typo Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:33:02 -05:00
Christian König	683b8c7e7a	drm/amdgpu: fix tear down order in amdgpu_vm_pt_free When freeing PD/PT with shadows it can happen that the shadow destruction races with detaching the PD/PT from the VM causing a NULL pointer dereference in the invalidation code. Fix this by detaching the the PD/PT from the VM first and then freeing the shadow instead. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867 Cc: <stable@vger.kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:08:01 -05:00
Jani Nikula	d9501844d5	drm/amd: include drm/drm_edid.h only where needed Including drm_edid.h from amdgpu_mode.h causes the rebuild of literally hundreds of files when drm_edid.h is modified, while there are only a handful of files that actually need to include drm_edid.h. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:08:01 -05:00
Melissa Wen	0f5afa190b	drm/amd/display: add CRTC gamma TF driver-specific property Add AMD pre-defined transfer function property to default DRM CRTC gamma to convert to wire encoding with or without a user gamma LUT. There is no post-blending regamma ROM for pre-defined TF. When setting Gamma TF (!= Identity) and LUT at the same time, the color module will combine the pre-defined TF and the custom LUT values into the LUT that's actually programmed. v2: - enable CRTC prop in the end of driver-specific prop sequence - define inverse EOTFs as supported regamma TFs - reword driver-specific function doc to remove shaper/3D LUT v3: - spell out TF+LUT behavior in the commit and comments (Harry) Reviewed-by: Harry Wentland <harry.wentland@amd.com> Co-developed-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:08:01 -05:00
Joshua Ashton	0ef47454dc	drm/amd/display: add plane blend LUT and TF driver-specific properties Blend 1D LUT or a pre-defined transfer function (TF) can be set to linearize content before blending, so that it's positioned just before blending planes in the AMD color mgmt pipeline, and after 3D LUT (non-linear space). Shaper and Blend LUTs are 1D LUTs that sandwich 3D LUT. Drivers should advertize blend properties according to HW caps. There is no blend ROM for pre-defined TF. When setting blend TF (!= Identity) and LUT at the same time, the color module will combine the pre-defined TF and the custom LUT values into the LUT that's actually programmed. v3: - spell out TF+LUT behavior in the commit and comments (Harry) v5: - get blend blob correctly Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:08:00 -05:00
Melissa Wen	f545d82479	drm/amd/display: add plane shaper LUT and TF driver-specific properties On AMD HW, 3D LUT always assumes a preceding shaper 1D LUT used for delinearizing and/or normalizing the color space before applying a 3D LUT. Add pre-defined transfer function to enable delinearizing content with or without shaper LUT, where AMD color module calculates the resulted shaper curve. We apply an inverse EOTF to go from linear values to encoded values. If we are already in a non-linear space and/or don't need to normalize values, we can bypass shaper LUT with a linear transfer function that is also the default TF value. There is no shaper ROM. When setting shaper TF (!= Identity) and LUT at the same time, the color module will combine the pre-defined TF and the custom LUT values into the LUT that's actually programmed. v2: - squash commits for shaper LUT and shaper TF - define inverse EOTF as supported shaper TFs v3: - spell out TF+LUT behavior in the commit and comments (Harry) - replace BT709 EOTF by inv OETF v5: - get shaper blob correctly (Joshua) Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:08:00 -05:00
Jonathan Kim	bd33bb1409	drm/amdkfd: fix mes set shader debugger process management MES provides the driver a call to explicitly flush stale process memory within the MES to avoid a race condition that results in a fatal memory violation. When SET_SHADER_DEBUGGER is called, the driver passes a memory address that represents a process context address MES uses to keep track of future per-process calls. Normally, MES will purge its process context list when the last queue has been removed. The driver, however, can call SET_SHADER_DEBUGGER regardless of whether a queue has been added or not. If SET_SHADER_DEBUGGER has been called with no queues as the last call prior to process termination, the passed process context address will still reside within MES. On a new process call to SET_SHADER_DEBUGGER, the driver may end up passing an identical process context address value (based on per-process gpu memory address) to MES but is now pointing to a new allocated buffer object during KFD process creation. Since the MES is unaware of this, access of the passed address points to the stale object within MES and triggers a fatal memory violation. The solution is for KFD to explicitly flush the process context address from MES on process termination. Note that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Alice Wong <shiwei.wong@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 16:07:43 -05:00
Mario Limonciello	c01b9be7b2	drm/amd: Fix a probing order problem on SDMA 2.4 commit `751e293f2c` ("drm/amd: Move microcode init from sw_init to early_init for SDMA v2.4") made a fateful mistake in `adev->sdma.num_instances` wasn't declared when sdma_v2_4_init_microcode() was run. This caused probing to fail. Move the declaration to right before sdma_v2_4_init_microcode(). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3043 Fixes: `751e293f2c` ("drm/amd: Move microcode init from sw_init to early_init for SDMA v2.4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:28:55 -05:00
Hawking Zhang	058eb51912	drm/amdgpu: Switch to aca bank for xgmi pcs err cnt Instead of software managed counters. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:28:47 -05:00
Melissa Wen	671994e3bf	drm/amd/display: add plane 3D LUT driver-specific properties Add 3D LUT property for plane color transformations using a 3D lookup table. 3D LUT allows for highly accurate and complex color transformations and is suitable to adjust the balance between color channels. It's also more complex to manage and require more computational resources. Since a 3D LUT has a limited number of entries in each dimension we want to use them in an optimal fashion. This means using the 3D LUT in a colorspace that is optimized for human vision, such as sRGB, PQ, or another non-linear space. Therefore, userpace may need one 1D LUT (shaper) before it to delinearize content and another 1D LUT after 3D LUT (blend) to linearize content again for blending. The next patches add these 1D LUTs to the plane color mgmt pipeline. v3: - improve commit message about 3D LUT - describe the 3D LUT entries and size (Harry) v4: - advertise 3D LUT max size as the size of a single-dimension v5: - get lut3d blob correctly (Joshua) - fix doc about 3d-lut dimension size (Sebastian) Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:28:38 -05:00
Friedrich Vock	91963397c4	drm/amdgpu: Enable tunneling on high-priority compute queues This improves latency if the GPU is already busy with other work. This is useful for VR compositors that submit highly latency-sensitive compositing work on high-priority compute queues while the GPU is busy rendering the next frame. Userspace merge request: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462 v2: bump driver version (Alex) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:23:59 -05:00
Alex Deucher	94b1e028e1	drm/amdgpu/sdma5.2: add begin/end_use ring callbacks Add begin/end_use ring callbacks to disallow GFXOFF when SDMA work is submitted and allow it again afterward. This should avoid corner cases where GFXOFF is erroneously entered when SDMA is still active. For now just allow/disallow GFXOFF in the begin and end helpers until we root cause the issue. This should not impact power as SDMA usage is pretty minimal and GFXOSS should not be active when SDMA is active anyway, this just makes it explicit. v2: move everything into sdma5.2 code. No reason for this to be generic at this point. v3: Add comments in new code Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> (v1) Tested-by: Mario Limonciello <mario.limonciello@amd.com> (v1) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.15+	2023-12-13 15:23:51 -05:00
Evan Quan	b8b39de646	drm/amd/pm: setup the framework to support Wifi RFI mitigation feature With WBRF feature supported, as a driver responding to the frequencies, amdgpu driver is able to do shadow pstate switching to mitigate possible interference(between its (G-)DDR memory clocks and local radio module frequency bands used by Wifi 6/6e/7). -- v1->v2: - update the prompt for feature support(Lijo) v8->v9: - update parameter document for smu_wbrf_event_handler(Simon) v9->v10: v10->v11: - correct the logics for wbrf range sorting(Lijo) v13: - Fix the format issue (IIpo Jarvinen) Signed-off-by: Evan Quan <quanliangl@hotmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:23:50 -05:00
Felix Kuehling	0188006d7c	drm/amdkfd: Import DMABufs for interop through DRM Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This ensures that a GEM handle is created on import and that obj->dma_buf will be set and remain set as long as the object is imported into KFD. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:55 -05:00
Felix Kuehling	1819200166	drm/amdkfd: Export DMABufs from KFD using GEM handles Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM handles are created in a drm_client_dev context to avoid exposing them in user mode contexts through a DMABuf import. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:55 -05:00
Vignesh Chander	4e95669ecb	drm/amdgpu: xgmi_fill_topology_info 1. Use the mirrored topology info to fill links for VF. The new solution is required to simplify and optimize host driver logic. Only use the new solution for VFs that support full duplex and extended_peer_link_info otherwise the info would be incomplete. 2. avoid calling extended_link_info on VF as its not supported Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:55 -05:00
Joshua Ashton	ec7b2a5546	drm/amd/display: add plane HDR multiplier driver-specific property Multiplier to 'gain' the plane. When PQ is decoded using the fixed func transfer function to the internal FP16 fb, 1.0 -> 80 nits (on AMD at least) When sRGB is decoded, 1.0 -> 1.0. Therefore, 1.0 multiplier = 80 nits for SDR content. So if you want, 203 nits for SDR content, pass in (203.0 / 80.0). v4: - comment about the PQ TF need for L-to-NL (from Harry's review) Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Joshua Ashton <joshua@froggi.es> Co-developed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:55 -05:00
Joshua Ashton	d5a348d96e	drm/amd/display: add plane degamma TF driver-specific property Allow userspace to tell the kernel driver the input space and, therefore, uses correct predefined transfer function (TF) to go from encoded values to linear values. v2: - rename TF enum prefix from DRM_ to AMDGPU_ (Harry) - remove HLG TF Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Joshua Ashton <joshua@froggi.es> Co-developed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:54 -05:00
Saleemkhan Jamadar	b70aed8f5d	drm/amdgpu/jpeg: configure doorbell for each playback Doorbell is configured during start of each playback. v1 - add comment for the doorbell programming change Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:53 -05:00
Lijo Lazar	ed342a2e78	drm/amdgpu: Use the right method to get IP version Replace direct usage of adev->ip_versions with amdgpu_ip_version. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:53 -05:00
Melissa Wen	9342a9ae54	drm/amd/display: add driver-specific property for plane degamma LUT Hook up driver-specific atomic operations for managing AMD color properties. Create AMD driver-specific color management properties and attach them according to HW capabilities defined by `struct dc_color_caps`. First add plane degamma LUT properties that means user-blob and its size. We will add more plane color properties in the next patches. In addition, we define AMD_PRIVATE_COLOR to guard these driver-specific plane properties. Plane degamma can be used to linearize input space for arithmetical operations that are more accurate when applied in linear color. v2: - update degamma LUT prop description - move private color operations from amdgpu_display to amdgpu_dm_color v5: - get degamma blob correctly (Joshua) Reviewed-by: Harry Wentland <harry.wentland@amd.com> Co-developed-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:53 -05:00
Alex Deucher	51ea405c47	drm/amdgpu: fix buffer funcs setting order on suspend harder Part of commit `c035819862` ("drm/amdgpu: fix buffer funcs setting order on suspend") got dropped accidently. Add it back. Fixes: `c035819862` ("drm/amdgpu: fix buffer funcs setting order on suspend") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-13 15:09:52 -05:00
Dave Airlie	a0a28956b4	amd-drm-next-6.8-2023-12-08: amdgpu: - SR-IOV fixes - DCN 3.5 updates - Backlight fixes - MST fixes - DMCUB fixes - DPIA fixes - Display powergating updates - Enable writeback connectors - Misc code cleanups - Add more register state debugging for aquavanjaram - Suspend fix - Clockgating fixes - SMU 14 updates - PSR fixes - MES logging updates - Misc fixes amdkfd: - SVM fix radeon: - Fix potential memory leaks in error paths -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZXOBHwAKCRC93/aFa7yZ 2E9hAP9Dqv/NKWf9aMZ01msdOysScu7FmAsnFrXjR0Mx0zX4WQD/er2/UrSR146+ VDaxFbcjINoq0Q8b7CWLG4gy5Q9fDAo= =JU7h -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.8-2023-12-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.8-2023-12-08: amdgpu: - SR-IOV fixes - DCN 3.5 updates - Backlight fixes - MST fixes - DMCUB fixes - DPIA fixes - Display powergating updates - Enable writeback connectors - Misc code cleanups - Add more register state debugging for aquavanjaram - Suspend fix - Clockgating fixes - SMU 14 updates - PSR fixes - MES logging updates - Misc fixes amdkfd: - SVM fix radeon: - Fix potential memory leaks in error paths Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231208205613.4861-1-alexander.deucher@amd.com	2023-12-13 15:55:55 +10:00
Dave Airlie	c1ee197d64	Linux 6.7-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmV2PMQeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgQIAKSdjtpkv+IeDwph umLHxDlCTT8nl+TAhEYuxSdMiHt1i7+zyF0snnHhGthBRnBZBf+PDdkQOmqtjLZj e591VGY7tdNCAVUwmRRaI82JFz9Pl1k8DXL3f+CE1+MfbpsirAecoIAL0rsvg+4f ICKSAOBZonL7GT7IGrsbxgqGKCLC+2PcgNT4pADBdjtbC1nmVSLK21dtWr6i4xsn CT9MxAnej1+EOpreyHhNZUVqfQy0/04x4OY0u/EOXU3PgpfGTrAzkyfwzcd/eit0 x59TZ0owDVXfvvdXUUAC0zG1th2ek5LKNtL+C4rKOoAbq7GZoQfjN5M/Ug+yykwE aqm6CJ8= =tZ+R -----END PGP SIGNATURE----- Backmerge tag 'v6.7-rc5' into drm-next Linux 6.7-rc5 Alex requested this for some amdkfd work relying on the symbols exports. Signed-off-by: Dave Airlie <airlied@redhat.com>	2023-12-12 11:32:33 +10:00
Rob Clark	05d249352f	drm/exec: Pass in initial # of objects In cases where the # is known ahead of time, it is silly to do the table resize dance. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Christian König <christian.koenig@amd.com> Patchwork: https://patchwork.freedesktop.org/patch/568338/	2023-12-10 10:38:47 -08:00
Dave Airlie	a60501d7c2	drm-misc-next for 6.8: UAPI Changes: - Remove Userspace Mode-Setting ioctls - v3d: New uapi to handle jobs involving the CPU Cross-subsystem Changes: Core Changes: - atomic: Add support for FB-less planes which got reverted a bit later for lack of IGT tests and userspace code, Dump private objects state in drm_state_dump. - dma-buf: Add fence deadline support - encoder: Create per-encoder debugfs directory, move the bridge chain file to that directory Driver Changes: - Include drm_auth.h in driver that use it but don't include it, Drop drm_plane_helper.h from drivers that include it but don't use it - imagination: Plenty of small fixes - panfrost: Improve interrupt handling at poweroff - qaic: Convert to persistent DRM devices - tidss: Support for the AM62A7, a few probe improvements, some cleanups - v3d: Support for jobs involving the CPU - bridge: - Create transparent aux-bridge for DP/USB-C - lt8912b: Add suspend/resume support and power regulator support - panel: - himax-hx8394: Drop prepare, unprepare and shutdown logic, Support panel rotation - New panels: BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G, Evervision VGG644804, SDC ATNA45AF01 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZXGXPAAKCRDj7w1vZxhR xblIAP4tz7X0mqa7KFr6L0sIfVILHANho4L81IUFi+kpTn64WgEAnqJ3IqJtdLbi czXzgJPae3Ifstm7CW7+72fLQCR6Cg8= =pU2c -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2023-12-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.8: UAPI Changes: - Remove Userspace Mode-Setting ioctls - v3d: New uapi to handle jobs involving the CPU Cross-subsystem Changes: Core Changes: - atomic: Add support for FB-less planes which got reverted a bit later for lack of IGT tests and userspace code, Dump private objects state in drm_state_dump. - dma-buf: Add fence deadline support - encoder: Create per-encoder debugfs directory, move the bridge chain file to that directory Driver Changes: - Include drm_auth.h in driver that use it but don't include it, Drop drm_plane_helper.h from drivers that include it but don't use it - imagination: Plenty of small fixes - panfrost: Improve interrupt handling at poweroff - qaic: Convert to persistent DRM devices - tidss: Support for the AM62A7, a few probe improvements, some cleanups - v3d: Support for jobs involving the CPU - bridge: - Create transparent aux-bridge for DP/USB-C - lt8912b: Add suspend/resume support and power regulator support - panel: - himax-hx8394: Drop prepare, unprepare and shutdown logic, Support panel rotation - New panels: BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G, Evervision VGG644804, SDC ATNA45AF01 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/yu5heqaufyeo4nlowzieu4s5unwqrqyx4jixbfjmzdon677rpk@t53vceua2dao	2023-12-08 16:27:00 +10:00
shaoyunl	47c4533543	drm/amdgpu: Enable event log on MES 11 Enable event log through the HW specific FW API Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-07 17:43:28 -05:00
shaoyunl	b2662d4cc4	drm/amdgpu: SW part of MES event log enablement This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-07 17:43:13 -05:00
Alex Deucher	dab96d8b61	drm/amdgpu: fix buffer funcs setting order on suspend We need to disable this after the last eviction call, but before we disable the SDMA IP. Fixes: `b70438004a` ("drm/amdgpu: move buffer funcs setting up a level") Link: https://lore.kernel.org/r/87edgv4x3i.fsf@vps.thesusis.net Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Tested-by: Phillip Susi <phill@thesusis.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Phillip Susi <phill@thesusis.net> Cc: Luben Tuikov <ltuikov89@gmail.com>	2023-12-06 16:05:32 -05:00
Lijo Lazar	27b024a88a	drm/amdgpu: Avoid querying DRM MGCG status MP0 v13.0.6 SOCs don't support DRM MGCG. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Lijo Lazar	555e39f027	drm/amdgpu: Update HDP 4.4.2 clock gating flags HDP 4.4.2 clockgating is enabled by default, update the flags accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Lijo Lazar	81577503ef	drm/amdgpu: Add NULL checks for function pointers Check if function is implemented before making the call. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Lijo Lazar	6fce23a4d8	drm/amdgpu: Restrict extended wait to PSP v13.0.6 Only PSPv13.0.6 SOCs take a longer time to reach steady state. Other PSPv13 based SOCs don't need extended wait. Also, reduce PSPv13.0.6 wait time. Cc: stable@vger.kernel.org Fixes: `fc59889071` ("drm/amdgpu: update retry times for psp vmbx wait") Fixes: `d8c1925ba8` ("drm/amdgpu: update retry times for psp BL wait") Link: https://lore.kernel.org/amd-gfx/34dd4c66-f7bf-44aa-af8f-c82889dd652c@amd.com/ Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Yang Wang	dbf3850d12	drm/amdgpu: optimize the printing order of error data sort error data list to optimize the printing order. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Hawking Zhang	0e8af20517	drm/amdgpu: Update fw version for boot time error query Boot time error query is not available until fw a10109 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Yang Wang	37c57631c1	drm/amd/pm: support new mca smu error code decoding support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 16:05:32 -05:00
Jiadong Zhu	d6a5758866	drm/amdgpu: disable MCBP by default Disable MCBP(mid command buffer preemption) by default as old Mesa hangs with it. We shall not enable the feature that breaks old usermode driver. Fixes: `50a7c8765c` ("drm/amdgpu: enable mcbp by default on gfx9") Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-12-06 15:39:59 -05:00
Bokun Zhang	e17768691d	drm/amd/amdgpu: SRIOV full reset issue with VCN - After a full reset, VF's FB will be cleaned. This includes the VCN's fw_shared memory. However, there is no suspend-resume routine for SRIOV VF. Therefore, the data in the fw_shared memory will be lost forever and it causes engine hang later on. We must repopulate the data in fw_shared during SRIOV hw_init Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Alex Deucher	c035819862	drm/amdgpu: fix buffer funcs setting order on suspend We need to disable this after the last eviction call, but before we disable the SDMA IP. Fixes: `b70438004a` ("drm/amdgpu: move buffer funcs setting up a level") Link: https://lore.kernel.org/r/87edgv4x3i.fsf@vps.thesusis.net Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Tested-by: Phillip Susi <phill@thesusis.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Phillip Susi <phill@thesusis.net> Cc: Luben Tuikov <ltuikov89@gmail.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	b12fb29539	drm/amdgpu: Avoid querying DRM MGCG status MP0 v13.0.6 SOCs don't support DRM MGCG. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	828afefd4b	drm/amdgpu: Update HDP 4.4.2 clock gating flags HDP 4.4.2 clockgating is enabled by default, update the flags accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	6146081d58	drm/amdgpu: Add NULL checks for function pointers Check if function is implemented before making the call. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	4657b3e456	drm/amdgpu: Restrict extended wait to PSP v13.0.6 Only PSPv13.0.6 SOCs take a longer time to reach steady state. Other PSPv13 based SOCs don't need extended wait. Also, reduce PSPv13.0.6 wait time. Cc: stable@vger.kernel.org Fixes: `fc59889071` ("drm/amdgpu: update retry times for psp vmbx wait") Fixes: `d8c1925ba8` ("drm/amdgpu: update retry times for psp BL wait") Link: https://lore.kernel.org/amd-gfx/34dd4c66-f7bf-44aa-af8f-c82889dd652c@amd.com/ Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	650f0487d6	drm/amdgpu: Read aquavanjaram USR register state Add support to read state of USR links in aquavanjaram SOC. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:37 -05:00
Lijo Lazar	13ac7c0e30	drm/amdgpu: Read aquavanjaram WAFL register state Add support to read state of WAFL links in aquavanjaram SOC. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:36 -05:00
Yang Wang	04a71f1104	drm/amdgpu: optimize the printing order of error data sort error data list to optimize the printing order. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:36 -05:00
Hawking Zhang	71a9d7a2a1	drm/amdgpu: Update fw version for boot time error query Boot time error query is not available until fw a10109 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:36 -05:00
Yang Wang	0d65efcbe3	drm/amd/pm: support new mca smu error code decoding support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:36 -05:00
Alex Hung	f872e2f5f0	drm/amd/display: Add writeback enable field (wb_enabled) [WHAT] Add a new field to keep track whether a crtc is previously writeback-enabled. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:35 -05:00
Alex Hung	c81e13b929	drm/amd/display: Hande writeback request from userspace [WHAT] Handle writeback requests and fill in the required information for DWB programming and setup. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:35 -05:00
Bokun Zhang	d5e78f1c26	drm/amd/amdgpu: Move vcn4 fw_shared init to a single function - Move VCN4's fw_shared initialization to a separated function. This way, the function can be reused at different locations. Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-06 15:22:20 -05:00
Jiadong Zhu	fd2ef5fa35	drm/amdgpu: disable MCBP by default Disable MCBP(mid command buffer preemption) by default as old Mesa hangs with it. We shall not enable the feature that breaks old usermode driver. Fixes: `50a7c8765c` ("drm/amdgpu: enable mcbp by default on gfx9") Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-12-06 15:21:24 -05:00
Dave Airlie	5edfd7d94b	amd-drm-next-6.8-2023-12-01: amdgpu: - Add new 64 bit sequence number infrastructure. This will ultimately be used for user queue synchronization. - GPUVM updates - Misc code cleanups - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9 updates - NBIO 7.11 updates - SubVP updates - DCN 3.1.4 fixes - ABM fixes - AGP aperture fix - DCN 3.1.5 fix - Fix some potential error path memory leaks - Enable PCIe PMEs - Add XGMI, PCIe state dumping for aqua vanjaram - GFX11 golden register updates - Misc display fixes amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend and reset - Fix possible memory leak in pqm_uninit() radeon: - Fix some possible overflows in command buffer checking - Check for errors in ring_lock -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZWohuwAKCRC93/aFa7yZ 2A6XAP9a6/3noDYxabnLmr3B+13wtweSkvEwD2bjOk7KinyS6AEAr8H3HZ5U0HMP zj35QhcNcYaTsjrip7e53XunpMBQSwQ= =DvId -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.8-2023-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.8-2023-12-01: amdgpu: - Add new 64 bit sequence number infrastructure. This will ultimately be used for user queue synchronization. - GPUVM updates - Misc code cleanups - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9 updates - NBIO 7.11 updates - SubVP updates - DCN 3.1.4 fixes - ABM fixes - AGP aperture fix - DCN 3.1.5 fix - Fix some potential error path memory leaks - Enable PCIe PMEs - Add XGMI, PCIe state dumping for aqua vanjaram - GFX11 golden register updates - Misc display fixes amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend and reset - Fix possible memory leak in pqm_uninit() radeon: - Fix some possible overflows in command buffer checking - Check for errors in ring_lock From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2023-12-05 12:11:41 +10:00
Yang Wang	f875f61b1f	drm/amdgpu: enable mca debug mode on APU by default enable MCA debug mode on APU device by default. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-30 18:26:31 -05:00
Likun Gao	9596ffe1cc	drm/amdgpu: distinguish rlc fw for different SKU For some SKU, rlc firmware should use different one compared with the normal rlc firmware. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-30 18:26:31 -05:00
Yang Wang	00f9d49bce	drm/amdgpu: Fix missing mca debugfs node Use amdgpu_ip_version() helper function to check ip version. The ip version contains other information, use the helper function to avoid reading wrong value. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-30 18:26:22 -05:00
ZhenGuo Yin	04fcc3fec5	drm/amdgpu: Skip access gfx11 golden registers under SRIOV [Why] Golden registers are PF-only registers on gfx11. RLCG interface will return "out-of-range" under SRIOV VF. [How] Skip access gfx11 golden registers under SRIOV. Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-30 18:21:30 -05:00
Lijo Lazar	ed6e4f0a27	drm/amdgpu: Use another offset for GC 9.4.3 remap The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 18:13:35 -05:00
Candice Li	e0409021e3	drm/amdgpu: Update EEPROM I2C address for smu v13_0_0 Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x	2023-11-29 18:11:39 -05:00
Lu Yao	2161e09cd0	drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 18:09:53 -05:00
Mario Limonciello	6967741d26	drm/amd: Enable PCIe PME from D3 When dGPU is put into BOCO it may be in D3cold but still able send PME on display hotplug event. For this to work it must be enabled as wake source from D3. When runpm is enabled use pci_wake_from_d3() to mark wakeup as enabled by default. Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 18:09:34 -05:00
Alex Deucher	e222b36e96	drm/amdgpu: fix AGP addressing when GART is not at 0 This worked by luck if the GART aperture ended up at 0. When we ended up moving GART on some chips, the GART aperture ended up offsetting the AGP address since the resource->start is a GART offset, not an MC address. Fix this by moving the AGP address setup into amdgpu_bo_gpu_offset_no_check(). v2: check mem_type before checking agp v3: check if the ttm bo has a ttm_tt allocated yet Fixes: `67318cb843` ("drm/amdgpu/gmc11: set gart placement GC11") Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Jesse Zhang <Jesse.Zhang@amd.com> Reported-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: christian.koenig@amd.com Cc: mario.limonciello@amd.com	2023-11-29 18:09:00 -05:00
Prike Liang	c6df7f3137	drm/amdgpu: correct the amdgpu runtime dereference usage count Fix the amdgpu runpm dereference usage count. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-29 18:04:55 -05:00
Tim Huang	6b0b7789a7	drm/amdgpu: fix memory overflow in the IB test Fix a memory overflow issue in the gfx IB test for some ASICs. At least 20 bytes are needed for the IB test packet. v2: correct code indentation errors. (Christian) Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-29 18:03:11 -05:00
Li Ma	5c908a3586	drm/amdgpu: add init_registers for nbio v7.11 enable init_registers callback func for nbio v7.11. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 18:02:36 -05:00
Alex Sierra	4b27a33c3b	drm/amdgpu: Force order between a read and write to the same address Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x	2023-11-29 17:57:33 -05:00
Hawking Zhang	884e9b0827	drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interrupt In nbio v7_9, host driver should not issu gpu reset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 17:57:02 -05:00
Perry Yuan	8c4e9105b2	drm/amdgpu: optimize RLC powerdown notification on Vangogh The smu needs to get the rlc power down message to sync the rlc state with smu, the rlc state updating message need to be sent at while smu begin suspend sequence , otherwise SMU will crash while RLC state is not notified by driver, and rlc state probally changed after that notification, so it needs to notify rlc state to smu at the end of the suspend sequence in amdgpu_device_suspend() that can make sure the rlc state is correctly set to SMU. [ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff! [ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <smu> failed -62 [ 110.884394] PM: suspend of devices aborted after 21213.620 msecs [ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs [ 110.884405] PM: Some devices failed to suspend, or early wake event detected Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 17:55:47 -05:00
Jonathan Kim	b9eab9e0aa	drm/amdgpu: update xgmi num links info post gc9.4.2 GC IP 9.4.2 and up support TA reporting of the number of xGMI links between peers. Tested-by: Vignesh Chander <vignesh.chander@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 17:53:16 -05:00
Lijo Lazar	36fd9969fa	drm/amdgpu: Use another offset for GC 9.4.3 remap The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs. Use 0x1A000 offset instead as MMIO register remap region. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:35 -05:00
Lijo Lazar	92e508eaf3	drm/amdgpu: Read aquavanjaram XGMI register state Add support to read state of XGMI links in aquavanjaram SOC. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:35 -05:00
Lijo Lazar	081a6eda2b	drm/amdgpu: Read aquavanjaram PCIE register state Add support to read aqua vanjaram PCIE register state Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:35 -05:00
Lijo Lazar	af39e6f4d8	drm/amdgpu: Add reg_state sysfs attribute Add reg_state attribute to fetch the register snapshot of different IPs like XGMI, WAFL,PCIE and USR. To get a snapshot for a particular IP 1) Open the sysfs file 2) Seek to the offset as defined in amdgpu_sysfs_reg_offset 3) Read Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:35 -05:00
Alex Deucher	9a5095e785	drm/amdgpu: add amdgpu_reg_state.h This header defines the reg state structures exposed via sysfs for umr debugging. v2: add content type Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>	2023-11-29 16:49:24 -05:00
Candice Li	ca0ad76089	drm/amdgpu: Update EEPROM I2C address for smu v13_0_0 Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:23 -05:00
Felix Kuehling	9a1c1339ab	drm/amdkfd: Run restore_workers on freezable WQs Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence dependencies in the GPU reset case where we're not allowed to wait for fences. A side effect of this is, that we can now have multiple concurrent threads trying to signal the same eviction fence. Rework eviction fence signaling and replacement to account for that. The GPU reset path can no longer rely on restore_process_worker to resume queues because evict/restore workers can run independently of it. Instead call a new restore_process_helper directly. This is an RFC and request for testing. v2: - Reworked eviction fence signaling - Introduced restore_process_helper v3: - Handle unsignaled eviction fences in restore_process_bos Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:23 -05:00
Mario Limonciello	bd1f6a31e7	drm/amd: Enable PCIe PME from D3 When dGPU is put into BOCO it may be in D3cold but still able send PME on display hotplug event. For this to work it must be enabled as wake source from D3. When runpm is enabled use pci_wake_from_d3() to mark wakeup as enabled by default. Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:23 -05:00
Lu Yao	7b194fdccb	drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:23 -05:00
Alex Deucher	ca0b006939	drm/amdgpu: fix AGP addressing when GART is not at 0 This worked by luck if the GART aperture ended up at 0. When we ended up moving GART on some chips, the GART aperture ended up offsetting the AGP address since the resource->start is a GART offset, not an MC address. Fix this by moving the AGP address setup into amdgpu_bo_gpu_offset_no_check(). v2: check mem_type before checking agp v3: check if the ttm bo has a ttm_tt allocated yet Fixes: `67318cb843` ("drm/amdgpu/gmc11: set gart placement GC11") Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Jesse Zhang <Jesse.Zhang@amd.com> Reported-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: christian.koenig@amd.com Cc: mario.limonciello@amd.com	2023-11-29 16:49:22 -05:00
Lijo Lazar	201761b5eb	drm/amdgpu: Move mca debug mode decision to ras Refactor code such that ras block decides the default mca debug mode, and not swsmu block. By default mca debug mode is set to false. v2: squash in uninitialized value fix (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:01 -05:00
Prike Liang	0e6a12884c	drm/amdgpu: correct the amdgpu runtime dereference usage count Fix the amdgpu runpm dereference usage count. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:00 -05:00
Tim Huang	88f4b10a79	drm/amdgpu: fix memory overflow in the IB test Fix a memory overflow issue in the gfx IB test for some ASICs. At least 20 bytes are needed for the IB test packet. v2: correct code indentation errors. (Christian) Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:00 -05:00
Li Ma	ee95135bfe	drm/amdgpu: add init_registers for nbio v7.11 enable init_registers callback func for nbio v7.11. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:00 -05:00
Alex Sierra	20b07b0cb3	drm/amdgpu: Force order between a read and write to the same address Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:48:59 -05:00
Hawking Zhang	4b8251e019	drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interrupt In nbio v7_9, host driver should not issu gpu reset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:48:59 -05:00
Perry Yuan	2e9b152325	drm/amdgpu: optimize RLC powerdown notification on Vangogh The smu needs to get the rlc power down message to sync the rlc state with smu, the rlc state updating message need to be sent at while smu begin suspend sequence , otherwise SMU will crash while RLC state is not notified by driver, and rlc state probally changed after that notification, so it needs to notify rlc state to smu at the end of the suspend sequence in amdgpu_device_suspend() that can make sure the rlc state is correctly set to SMU. [ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff! [ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <smu> failed -62 [ 110.884394] PM: suspend of devices aborted after 21213.620 msecs [ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs [ 110.884405] PM: Some devices failed to suspend, or early wake event detected Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:48:58 -05:00
Hawking Zhang	702e2fb579	drm/amdgpu: Retire query/reset_ras_err_status from gfx_v9_4_3 Not needed anymore. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:48:58 -05:00
Jonathan Kim	35c425f5cc	drm/amdgpu: update xgmi num links info post gc9.4.2 GC IP 9.4.2 and up support TA reporting of the number of xGMI links between peers. Tested-by: Vignesh Chander <vignesh.chander@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:48:45 -05:00
André Almeida	613ecd6563	drm/amd: Document device reset methods Document what each amdgpu driver reset method does. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:23:31 -05:00
Thomas Zimmermann	26b9a880d2	Merge drm/drm-next into drm-misc-next Backmerging to get commit `8d6ef26501` ("drm/ast: Disconnect BMC if physical connector is connected") into drm-misc-next. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2023-11-28 15:32:24 +01:00
Luben Tuikov	38f922a563	drm/sched: Reverse run-queue priority enumeration Reverse run-queue priority enumeration such that the higest priority is now 0, and for each consecutive integer the prioirty diminishes. Run-queues correspond to priorities. To an external observer a scheduler created with a single run-queue, and another created with DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule sched->sched_rq[0] with the same "priority", as that index run-queue exists in both schedulers, i.e. a scheduler with one run-queue or many. This patch makes it so. In other words, the "priority" of sched->sched_rq[n], n >= 0, is the same for any scheduler created with any allowable number of run-queues (priorities), 0 to DRM_SCHED_PRIORITY_COUNT. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com	2023-11-24 23:03:53 -05:00
Luben Tuikov	fe375c7480	drm/sched: Rename priority MIN to LOW Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW. This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities in ascending order, DRM_SCHED_PRIORITY_LOW, DRM_SCHED_PRIORITY_NORMAL, DRM_SCHED_PRIORITY_HIGH, DRM_SCHED_PRIORITY_KERNEL. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com	2023-11-24 23:03:53 -05:00
Daniel Vetter	c79b972eb8	drm-misc-next for 6.8: UAPI Changes: - drm: Introduce CLOSE_FB ioctl - drm/dp-mst: Documentation for the PATH property - fdinfo: Do not align to a MB if the size is larger than 1MiB - virtio-gpu: add explicit virtgpu context debug name Cross-subsystem Changes: - dma-buf: Add dma_fence_timestamp helper Core Changes: - client: Do not acquire module reference - edid: split out drm_eld, add SAD helpers - format-helper: Cache format conversion buffers - sched: Move from a kthread to a workqueue, rename some internal functions to make it clearer, implement dynamic job-flow control - gpuvm: Provide more features to handle GEM objects - tests: Remove slow kunit tests Driver Changes: - ivpu: Update FW API, new debugfs file, a new NOP job submission test mode, improve suspend/resume, PM improvements, MMU PT optimizations, firmware profiling frequency support, support for uncached buffers, switch to gem shmem helpers, replace kthread with threaded interrupts - panfrost: PM improvements - qaic: Allow to run with a single MSI, support host/device time synchronization, misc improvements - simplefb: Support memory-regions, support power-domains - ssd130x: Unitialized variable fixes - omapdrm: dma-fence lockdep annotation fix - tidss: dma-fence lockdep annotation fix - v3d: Support BCM2712 (RaspberryPi5), Support fdinfo and gputop - panel: - edp: Support AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0, plus a whole bunch of panels used on Mediatek chromebooks. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZVc02QAKCRDj7w1vZxhR xRXbAPsHQkuUz2n1XWx/8uExPCn+PmS5Lg6IcbtvqpfuFrGt1QEA+DaZf2yfekFj FgY66UExyddH/2LDHzpKa0WJk6l6vQw= =1HSK -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2023-11-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.8: UAPI Changes: - drm: Introduce CLOSE_FB ioctl - drm/dp-mst: Documentation for the PATH property - fdinfo: Do not align to a MB if the size is larger than 1MiB - virtio-gpu: add explicit virtgpu context debug name Cross-subsystem Changes: - dma-buf: Add dma_fence_timestamp helper Core Changes: - client: Do not acquire module reference - edid: split out drm_eld, add SAD helpers - format-helper: Cache format conversion buffers - sched: Move from a kthread to a workqueue, rename some internal functions to make it clearer, implement dynamic job-flow control - gpuvm: Provide more features to handle GEM objects - tests: Remove slow kunit tests Driver Changes: - ivpu: Update FW API, new debugfs file, a new NOP job submission test mode, improve suspend/resume, PM improvements, MMU PT optimizations, firmware profiling frequency support, support for uncached buffers, switch to gem shmem helpers, replace kthread with threaded interrupts - panfrost: PM improvements - qaic: Allow to run with a single MSI, support host/device time synchronization, misc improvements - simplefb: Support memory-regions, support power-domains - ssd130x: Unitialized variable fixes - omapdrm: dma-fence lockdep annotation fix - tidss: dma-fence lockdep annotation fix - v3d: Support BCM2712 (RaspberryPi5), Support fdinfo and gputop - panel: - edp: Support AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0, plus a whole bunch of panels used on Mediatek chromebooks. Note that the one missing s-o-b for `0da611a870` ("dma-buf: add dma_fence_timestamp helper") has been supplied here, and rebasing the entire tree with upsetting committers didn't seem worth the trouble: https://lore.kernel.org/dri-devel/ce94020e-a7d4-4799-b87d-fbea7b14a268@gmail.com/ Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/y4awn5vcfy2lr2hpauo7rc4nfpnc6kksr7btmnwaz7zk63pwoi@gwwef5iqpzva	2023-11-20 09:50:09 +01:00
Srinivasan Shanmugam	699d392903	drm/amdgpu: Add function parameter 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb' Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:30:51 -05:00
Prike Liang	425285d39a	drm/amdgpu: add amdgpu runpm usage trace for separate funcs Add trace for amdgpu runpm separate funcs usage and this will help debugging on the case of runpm usage missed to dereference. In the normal case the runpm usage count referred by one kind of functionality pairwise and usage should be changed from 1 to 0, otherwise there will be an issue in the amdgpu runpm usage dereference. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:30:51 -05:00
Shiwu Zhang	75fb313c55	drm/amdgpu: expose the connected port num info through sysfs By catting the xgmi_port_num sysfs node, it prints out the info in the format of <src node id>:<src port num> -> <dst node id>:<dst port num> for one xgmi link. For example, in case of 4 sockets fully and evenly connected setup, it would be like as below for the first node in the hive. 01:02 -> 02:03 01:03 -> 02:02 01:07 -> 03:04 01:04 -> 03:07 01:06 -> 04:05 01:05 -> 04:06 Based on the fact that there is two xgmi links between each socket pair, "01:02 -> 02:03" means that the current socket in question use the port 2 to connect with port 3 of the second node in the hive and so on. v2: print out the src/dst node id for each xgmi link (lijo) v3: replace the current_node++ with +1 to align with dst node (le) and use the dev_err instead of pr_err (lijo) v4: fix checkpatch warning (alex) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:30:51 -05:00
Mario Limonciello	d9b3a066df	drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks The PCIe speed capabilities advertised by a USB4 or TBT3 link are limited to PCIe gen 1 per the USB4 spec. In reality the speed will change dynamically based on fabric conditions and other traffic. DPM is disabled when dGPUs are connected directly to Intel hosts since the PCIe root port isn't able to handle dynamic speed switching. As this limitation is specifically for PCIe root ports in the SoC, don't apply it when connected to an eGPU enclosure connected to an Intel host. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2885 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:30:51 -05:00
Mario Limonciello	466a7d1153	drm/amd: Use the first non-dGPU PCI device for BW limits When bandwidth limits are looked up using pcie_bandwidth_available() virtual links such as USB4 are analyzed which might not represent the real speed. Furthermore devices may change speeds autonomously which may introduce conditional variation to the results reported in the status registers. Instead look at the capabilities of first PCI device outside of dGPU to decide upper limits that the dGPU will work at. For eGPU this effectively means that it will use the speed of the link partner. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925#note_2145860 Link: https://www.usb.org/document-library/usb4r-specification-v20 USB4 V2 with Errata and ECN through June 2023 Section 11.2.1 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:30:50 -05:00
Srinivasan Shanmugam	8a1de314d1	drm/amdgpu: Refactor 'amdgpu_connector_dvi_detect' in amdgpu_connectors.c Fixes the below: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Missing a blank line after declarations WARNING: Too many leading tabs - consider code refactoring + if (list_connector->connector_type != DRM_MODE_CONNECTOR_VGA) { WARNING: Too many leading tabs - consider code refactoring + if (!amdgpu_display_hpd_sense(adev, amdgpu_connector->hpd.hpd)) { Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:29:54 -05:00
Sam James	b5a52d2afe	amdgpu: Adjust kmalloc_array calls for new -Walloc-size GCC 14 introduces a new -Walloc-size included in -Wextra which errors out on various files in drivers/gpu/drm/amd/amdgpu like: ``` amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size] ``` This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to the size argument so GCC sees we're allocating enough. Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested 'sizeof(**dump)' better communicates the intent. Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/ Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:29:54 -05:00
Felix Kuehling	94e2dae0a8	drm/amdkfd: Move TLB flushing logic into amdgpu This will make it possible for amdgpu GEM ioctls to flush TLBs on compute VMs. This removes VMID-based TLB flushing and always uses PASID-based flushing. This still works because it scans the VMID-PASID mapping registers to find the right VMID. It's only slightly less efficient. This is not a production use case. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:29:53 -05:00
Felix Kuehling	e6ed364efa	drm/amdgpu: update mappings not managed by KFD When restoring after an eviction, use amdgpu_vm_handle_moved to update BO VA mappings in KFD VMs that are not managed through the KFD API. This should allow using the render node API to create more flexible memory mappings in KFD VMs. v2: rebase on drm_exec changes (Alex) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:29:53 -05:00
Arunpravin Paneer Selvam	c8031019dc	drm/amdgpu: Implement a new 64bit sequence memory driver Developed a new driver which allocates a 64bit memory on each request in sequence order. At the moment, user queue fence memory is the main consumer of this seq64 driver. v2: Worked on review comments from Christian for the following modifications - Move driver name from "semaphore" to "seq64" - Remove unnecessary PT/PD mapping - Move enable_mes check into init/fini functions. v3: Worked on review comments from Christian - drop enable_mes check - use DECLARE_BITMAP for bit array - added kerneldoc for seq64 v4: Worked on review comments from Christian - Rename amdgpu_seq64_get name with amdgpu_seq64_alloc v5: Worked on review comments from Christian - Fix seq64 lockdep warning - move fpriv->seq64_va check into amdgpu_seq64_unmap() - make the function amdgpu_seq64_unmap() return as void. - reserve the buffers as not interruptible. v6: port to drm_exec (Alex) v7: disable for now (Arun) Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 09:29:53 -05:00
Alex Deucher	e8c2d3e25b	drm/amdgpu/gmc9: disable AGP aperture We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:58:41 -05:00
Alex Deucher	61fc93695b	drm/amdgpu/gmc10: disable AGP aperture We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:58:34 -05:00
Alex Deucher	0db062eac3	drm/amdgpu/gmc11: disable AGP aperture We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Fixes: `67318cb843` ("drm/amdgpu/gmc11: set gart placement GC11") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:58:28 -05:00
Alex Deucher	6ba5b61383	drm/amdgpu: add a module parameter to control the AGP aperture Add a module parameter to control the AGP aperture. The AGP aperture is an aperture in the GPU's internal address space which provides direct non-paged access to the platform address space. This access is non-snooped so only uncached memory can be accessed. Add a knob so that we can toggle this for debugging. Fixes: `67318cb843` ("drm/amdgpu/gmc11: set gart placement GC11") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:58:20 -05:00
Alex Deucher	564ca1b53e	drm/amdgpu/gmc11: fix logic typo in AGP check Should be && rather than \|\|. Fixes: `b2e1cbe628` ("drm/amdgpu/gmc11: disable AGP on GC 11.5") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:58:11 -05:00
Shiwu Zhang	9ddea8c977	drm/amdgpu: add and populate the port num into xgmi topology info The port num info is firstly introduced with 20.00.01.13 xgmi ta and make them as part of topology info. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:53:49 -05:00
Yang Wang	07ee43faeb	drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.c fix ras err_data null pointer issue in amdgpu_ras.c Fixes: `8cc0f5669e` ("drm/amdgpu: Support multiple error query modes") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:51:58 -05:00
YuanShang	50d51374b4	drm/amdgpu: correct chunk_ptr to a pointer to chunk. The variable "chunk_ptr" should be a pointer pointing to a struct drm_amdgpu_cs_chunk instead of to a pointer of that. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:48:41 -05:00
Srinivasan Shanmugam	8a0173cd90	drm/amdgpu: Address member 'ring' not described in 'amdgpu_ vce, uvd_entity_init()' Fixes the following: drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:237: warning: Function parameter or member 'ring' not described in 'amdgpu_vce_entity_init' drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:405: warning: Function parameter or member 'ring' not described in 'amdgpu_uvd_entity_init' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:47:14 -05:00
Le Ma	bdb72185d3	drm/amdgpu: finalizing mem_partitions at the end of GMC v9 sw_fini The valid num_mem_partitions is required during ttm pool fini, thus move the cleanup at the end of the function. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:47:05 -05:00
Victor Lu	0288603040	drm/amdgpu: Do not program VF copy regs in mmhub v1.8 under SRIOV (v2) MC_VM_AGP_* registers should not be programmed by guest driver. v2: move early return outside of loop Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-17 00:46:27 -05:00
Maxime Ripard	3bf3e21c15	Merge drm/drm-next into drm-misc-next Let's kickstart the v6.8 release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2023-11-15 10:56:44 +01:00
Linus Torvalds	c0d12d7692	drm fixes for 6.7-rc1 - big pile of amd fixes, but mostly hw support newly added in 6.7 - i915 fixes, mostly minor things - qxl memory leak fix - vc4 uaf fix in mock helpers - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmVOlEQACgkQTA9ye/CY qnGawQ//d7M2CZpd9LvkRnaV2+fG9yGONOwZsG5fVtXrfT4RDQmITC9KMEh2TxJb s1W6HA+UijMMx6RtQN6cTYNHeYDaX2b55/g3lMnreXydii0COJwkWe52iFn0Dpcm RpsT5cYLEiRtiTvEzKbrkxS+rrQMu9jwxcA23b+lMkmybVgqQe1m9hYtRxZCFqr7 6BMKOgrCRoY1mYZrNaccXBHvvgOtcpWPOsuNNgjW3MDKB663BmpABT1iDg3xxPdX BI8SAl6PX6ju2Jwi3WPlscmI199fhcUXDuqb8LXhJsuqynhwU940aUlxcbI7Hz6Y LaFwSK4OiaIsIC8yisa7cZ1z2mqnMIiGXasP6mfYVmYqpGYMW+AZcOmzui/MLiGd duOcvK/bLxN7moSqcKgz+mmrLfZzJkPLV8pGEVk0IbTn239AnR76bqrEQJ+Iqukx d4yLGE4OJchD4zzw5RlVmQIhwA8M/5croJBIo6yNyQB5xgN/Krd4QUc6KjjgzNo8 e402NjaW2/PqWIiPtsL5tK4XdkVtvMvVUq+bJSch6Wfn3j9u06/wgFl1FBRyX7zS 3QYvMtF/QM5/QTpbdl82hSSsJ78iO62tPYSOhycIxgB/BHoc/fap+IOFtKKnT3RZ xr7UiwAQA043gAMvS+TkZAc6bFW8U8Dzxu5XxEPE1L+WU6XCSbs= =l45e -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Daniel Vetter: "Dave's VPN to the big machine died, so it's on me to do fixes pr this and next week while everyone else is at plumbers. - big pile of amd fixes, but mostly for hw support newly added in 6.7 - i915 fixes, mostly minor things - qxl memory leak fix - vc4 uaf fix in mock helpers - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE" * tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits) drm/amdgpu: fix error handling in amdgpu_vm_init drm/amdgpu: Fix possible null pointer dereference drm/amdgpu: move UVD and VCE sched entity init after sched init drm/amdgpu: move kfd_resume before the ip late init drm/amd: Explicitly check for GFXOFF to be enabled for s0ix drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2) drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5) drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support drm/amdgpu: fix software pci_unplug on some chips drm/amd/display: remove duplicated argument drm/amdgpu: correct mca debugfs dump reg list drm/amdgpu: correct acclerator check architecutre dump drm/amdgpu: add pcs xgmi v6.4.0 ras support drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3 drm/amdgpu: disable smu v13.0.6 mca debug mode by default drm/amdgpu: Support multiple error query modes drm/amdgpu: refine smu v13.0.6 mca dump driver drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2) drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV drm: amd: Resolve Sphinx unexpected indentation warning ...	2023-11-10 14:59:30 -08:00
Daniel Vetter	03df0fc007	amd-drm-next-6.7-2023-11-10: amdgpu: - SR-IOV fixes - DMCUB fixes - DCN3.5 fixes - DP2 fixes - SubVP fixes - SMU14 fixes - SDMA4.x fixes - Suspend/resume fixes - AGP regression fix - UAF fixes for some error cases - SMU 13.0.6 fixes - Documentation fixes - RAS fixes - Hotplug fixes - Scheduling entity ordering fix - GPUVM fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZU58aAAKCRC93/aFa7yZ 2PvJAQDF1IHj90BAqH3EzOx7p2jkGVeK1p+em2sS051kOvpgiAD/fvZovVUBmt/V tD0NOtkL8bqmIavP3vDV0Yvf9tW48Qs= =Z4Je -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.7-2023-11-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-11-10: amdgpu: - SR-IOV fixes - DMCUB fixes - DCN3.5 fixes - DP2 fixes - SubVP fixes - SMU14 fixes - SDMA4.x fixes - Suspend/resume fixes - AGP regression fix - UAF fixes for some error cases - SMU 13.0.6 fixes - Documentation fixes - RAS fixes - Hotplug fixes - Scheduling entity ordering fix - GPUVM fixes Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110190703.4741-1-alexander.deucher@amd.com	2023-11-10 20:51:38 +01:00
Christian König	8473bfdcb5	drm/amdgpu: fix error handling in amdgpu_vm_init When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-10 11:33:28 -05:00
Felix Kuehling	256503071c	drm/amdgpu: Fix possible null pointer dereference mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: `1802537820` ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-10 11:33:28 -05:00
Alex Deucher	037b98a231	drm/amdgpu: move UVD and VCE sched entity init after sched init We need kernel scheduling entities to deal with handle clean up if apps are not cleaned up properly. With commit `56e449603f` ("drm/sched: Convert the GPU scheduler to variable number of run-queues") the scheduler entities have to be created after scheduler init, so change the ordering to fix this. v2: Leave logic in UVD and VCE code Fixes: `56e449603f` ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: ltuikov89@gmail.com	2023-11-10 11:33:08 -05:00
Tim Huang	8ed79c409e	drm/amdgpu: move kfd_resume before the ip late init The kfd_resume needs to touch GC registers to enable the interrupts, it needs to be done before GFXOFF is enabled to ensure that the GFX is not off and GC registers can be touched. So move kfd_resume before the amdgpu_device_ip_late_init which enables the CGPG/GFXOFF. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-10 11:08:33 -05:00
Mario Limonciello	e4c44b1a19	drm/amd: Explicitly check for GFXOFF to be enabled for s0ix If a user has disabled GFXOFF this may cause problems for the suspend sequence. Ensure that it is enabled in amdgpu_acpi_is_s0ix_active(). The system won't reach the deepest state but it also won't hang. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-10 11:08:20 -05:00
Daniel Vetter	aec3e2e23b	Merge tag 'drm-misc-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-fixes for v6.7-rc1: qxl: - qxl memory leak fix. syncobj: - Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE vc4: - Fix UAF in mock helpers Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [sima: Stitch together both changelogs from Maarten. Also because of branch history this contains a few more bugfixes which are already in v6.6, but I didn't feel like this justifies some backmerge since there wasn't any real conflict.] Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com	2023-11-10 16:57:49 +01:00
Danilo Krummrich	a78422e9df	drm/sched: implement dynamic job-flow control Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com	2023-11-10 02:54:29 +01:00
Victor Lu	1972642843	drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2) W/RREG32_RLC is hardedcoded to use instance 0. W/RREG32_SOC15_RLC should be used instead when inst != 0. v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:03:16 -05:00
Victor Lu	85150626ea	drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5) amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access via kiq or RLC is sufficient for now. v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call v4: avoid using amdgpu_sriov_w/rreg v3: use W/RREG32_XCC to handle non-kiq case v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters of amdgpu_device_wreg/rreg Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:03:07 -05:00
Yang Wang	76d2da18af	drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support add pcs xgmi ras error query support for smu v13.0.6. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:59 -05:00
Vitaly Prosyak	4638e0c29a	drm/amdgpu: fix software pci_unplug on some chips When software 'pci unplug' using IGT is executed we got a sysfs directory entry is NULL for differant ras blocks like hdp, umc, etc. Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group' check that 'sd' is not NULL. [ +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90 [ +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 [ +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246 [ +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000 [ +0.000001] FS: 00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000001] ? show_regs+0x72/0x90 [ +0.000002] ? sysfs_remove_group+0x83/0x90 [ +0.000002] ? __warn+0x8d/0x160 [ +0.000001] ? sysfs_remove_group+0x83/0x90 [ +0.000001] ? report_bug+0x1bb/0x1d0 [ +0.000003] ? handle_bug+0x46/0x90 [ +0.000001] ? exc_invalid_op+0x19/0x80 [ +0.000002] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000003] ? sysfs_remove_group+0x83/0x90 [ +0.000001] dpm_sysfs_remove+0x61/0x70 [ +0.000002] device_del+0xa3/0x3d0 [ +0.000002] ? ktime_get_mono_fast_ns+0x46/0xb0 [ +0.000002] device_unregister+0x18/0x70 [ +0.000001] i2c_del_adapter+0x26d/0x330 [ +0.000002] arcturus_i2c_control_fini+0x25/0x50 [amdgpu] [ +0.000236] smu_sw_fini+0x38/0x260 [amdgpu] [ +0.000241] amdgpu_device_fini_sw+0x116/0x670 [amdgpu] [ +0.000186] ? mutex_lock+0x13/0x50 [ +0.000003] amdgpu_driver_release_kms+0x16/0x40 [amdgpu] [ +0.000192] drm_minor_release+0x4f/0x80 [drm] [ +0.000025] drm_release+0xfe/0x150 [drm] [ +0.000027] __fput+0x9f/0x290 [ +0.000002] ____fput+0xe/0x20 [ +0.000002] task_work_run+0x61/0xa0 [ +0.000002] exit_to_user_mode_prepare+0x150/0x170 [ +0.000002] syscall_exit_to_user_mode+0x2a/0x50 Cc: Hawking Zhang <hawking.zhang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:49 -05:00
Yang Wang	8140b07b0a	drm/amdgpu: correct mca debugfs dump reg list avoid driver to touch invalid mca reg. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:32 -05:00
Hawking Zhang	d406aec8dc	drm/amdgpu: correct acclerator check architecutre dump So driver doesn't touch invalid aca entries. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:26 -05:00
Yang Wang	27d80f7d68	drm/amdgpu: add pcs xgmi v6.4.0 ras support add pcs xgmi v6.4.0 ras support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:20 -05:00
David Yat Sin	4abf0b0bdf	drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3 Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:02:14 -05:00
Hawking Zhang	8cc0f5669e	drm/amdgpu: Support multiple error query modes Direct error query mode and firmware error query mode are supported for now. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:58 -05:00
Yang Wang	07c1db7036	drm/amdgpu: refine smu v13.0.6 mca dump driver refine smu mca driver to support query ras error from pmfw path. - correct gfx smu bank hwid (from mp5 to smu bank) - retire unused callback function in amdgpu_mca_smu_funcs{} - add new mca_bank_set{} structure to collect mca bank - move enum mca_reg_idx into amdgpu_mca.h header - add mca status register field decode macro Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:51 -05:00
Victor Lu	0b1695710a	drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2) The following regs can only be programmed by the PF: HDP_MISC_CNTL HDP_NONSURFACE_BASE HDP_NONSURFACE_BASE_HI v2: update commit message Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:42 -05:00
Victor Lu	a78b481469	drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV PCTL0_MMHUB_DEEPSLEEP_IB is blocked for VF access Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:35 -05:00
Yang Wang	bf13da6ae1	drm/amdgpu: correct smu v13.0.6 umc ras error check correct smu v13.0.0 umc ras error check Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:20 -05:00
Victor Lu	bc3c566071	drm/amdgpu: Add xcc param to SRIOV kiq write and WREG32_SOC15_IP_NO_KIQ (v4) WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait are not using the correct rlcg interface or mec engine, respectively. Add xcc instance parameter to them. v4: Use GET_INST and squash commit with: "drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait" v3: xcc not needed for MMMHUB v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:10 -05:00
Victor Lu	f64c3fce46	drm/amdgpu: Add flag to enable indirect RLCG access for gfx v9.4.3 The "rlcg_reg_access_supported" flag is missing. Add it back in. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:01:01 -05:00
Yang Wang	4eaa007c73	drm/amdgpu: correct amdgpu ip block rev info correct following amdgpu ip block version information: - gfx_v9_4_3 - sdma_v4_4_2 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:00:48 -05:00
Tao Zhou	61d7052216	drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_mode set_xgmi_plpd_mode may be unsupported and this isn't error, no need to print warning for it. v2: add ret2 to save the status of psp_ras_trigger_error. Suggested-by: lijo.lazar@amd.com Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-09 17:00:32 -05:00
Christian König	17daf01ab4	drm/amdgpu: lower CS errors to debug severity Otherwise userspace can spam the logs by using incorrect input values. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-09 17:00:17 -05:00
Christian König	12f76050d8	drm/amdgpu: fix error handling in amdgpu_bo_list_get() We should not leak the pointer where we couldn't grab the reference on to the caller because it can be that the error handling still tries to put the reference then. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-09 16:59:57 -05:00
Alex Deucher	bff3315ba8	drm/amdgpu: fix AGP init order The default AGP settings were overwriting the IP selected ones since the default was getting set after the IP ones were selected. Fixes: `de59b69932` ("drm/amdgpu/gmc: set a default disable value for AGP") Link: https://lists.freedesktop.org/archives/amd-gfx/2023-November/100966.html Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>	2023-11-09 16:59:46 -05:00
Linus Torvalds	25b6377007	drm next and fixes for 6.7-rc1 renesas: - atomic conversion - DT support ssd13xx: - dt binding fix for ssd132x - Initialize ssd130x crtc_state to NULL. amdgpu: - Fix RAS support check - RAS fixes - MES fixes - SMU13 fixes - Contiguous memory allocation fix - BACO fixes - GPU reset fixes - Min power limit fixes - GFX11 fixes - USB4/TB hotplug fixes - ARM regression fix - GFX9.4.3 fixes - KASAN/KCSAN stack size check fixes - SR-IOV fixes - SMU14 fixes - PSP13 fixes - Display blend fixes - Flexible array size fixes amdkfd: - GPUVM fix radeon: - Flexible array size fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVJmdsACgkQDHTzWXnE hr5hphAAoFdk4ma7TauUNyP3JoUwht+Ohm9NcUHq/9P9kOCwPIIehxMZnwPoTGyw VWwXpqpeVsW6zyMgfxWq/+P1S1C5LvZ1HLccbP3xv327fUZ1QnLapHwFxT1SYNpi Tw5qhQN/cwNOX0Pc9uBavYmJzf54OhvPxt2CHHPDShsHBOBc0Gd88gKJr7GWUF5M Ri6i20Tfsgq8AopWQj9628TT+y/aN3rVIfYYBiNejxejlFtt1HODkKFX3DuBNPOI bZOtQm11cbxmX7/2RI92mI20axUb4UMNIQFDYEl3bVlyPyEhhwKPQMSDxhQQodPg 8zMY9Fbl4Z4VaOFEDbpiRv/0/HeWoLefmpQ5LZbz35RhKLTkwsWXHUELPUEj3uTr 7+EnLwuvQdtbT9W2J8btO7v1dHOy86ArlnmqNjB2cEnvGaR3DNM4jxPVLn60SyAc N7CFWNU4EoJf02XhwAludYa5pQHEaTFL8ss6TSoRWXuHHg1vNdu2SorOvkcGkg28 q/t28gZDZaOpXfMmf3ec+PHhO6nxXLXJRbiksVP/rpXQQ42cEI72hr6//UYLRuzg BbYPZ8uXmDjsSZIqYceZwc3Vr6oEmD6EAzHM9+zwS5h0IZ12jKTmFiDEAhG2DwoG 8PCaj5UXkxVK+6iHndz8Qwg4+Fu1j5nodKM+vBNem/iSnxC/OVw= =TsFN -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm Pull more drm updates from Dave Airlie: "Geert pointed out I missed the renesas reworks in my main pull, so this pull contains the renesas next work for atomic conversion and DT support. It also contains a bunch of amdgpu and some small ssd13xx fixes. renesas: - atomic conversion - DT support ssd13xx: - dt binding fix for ssd132x - Initialize ssd130x crtc_state to NULL. amdgpu: - Fix RAS support check - RAS fixes - MES fixes - SMU13 fixes - Contiguous memory allocation fix - BACO fixes - GPU reset fixes - Min power limit fixes - GFX11 fixes - USB4/TB hotplug fixes - ARM regression fix - GFX9.4.3 fixes - KASAN/KCSAN stack size check fixes - SR-IOV fixes - SMU14 fixes - PSP13 fixes - Display blend fixes - Flexible array size fixes amdkfd: - GPUVM fix radeon: - Flexible array size fixes" * tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits) drm/amd/display: Enable fast update on blendTF change drm/amd/display: Fix blend LUT programming drm/amd/display: Program plane color setting correctly drm/amdgpu: Query and report boot status drm/amdgpu: Add psp v13 function to query boot status drm/amd/swsmu: remove fw version check in sw_init. drm/amd/swsmu: update smu v14_0_0 driver if and metrics table drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks drm/amdgpu: Optimize the asic type fix code drm/amdgpu: fix GRBM read timeout when do mes_self_test drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs drm/amdgpu: Attach eviction fence on alloc drm/amdkfd: Improve amdgpu_vm_handle_moved drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2 drm/amd/display: Avoid NULL dereference of timing generator drm/amdkfd: Update cache info for GFX 9.4.3 drm/amdkfd: Populate cache info for GFX 9.4.3 drm/amdgpu: don't put MQDs in VRAM on ARM \| ARM64 drm/amdgpu/smu13: drop compute workload workaround ...	2023-11-07 17:10:02 -08:00
Tao Zhou	20238a2cc9	drm/amdgpu: add RAS reset/query operations for XGMI v6_4 Reset/query RAS error status and count. v2: use XGMI IP version instead of WAFL version. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-07 12:03:31 -05:00
Tao Zhou	61fe5536d0	drm/amdgpu: handle extra UE register entries for gfx v9_4_3 The UE registe list is larger than CE list. Reported-by: yipeng.chai@amd.com Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-07 12:03:31 -05:00
Lijo Lazar	0553eb9f33	drm/amdgpu: Fix sdma 4.4.2 doorbell rptr/wptr init Doorbell rptr/wptr can be set through multiple ways including direct register initialization. Disable doorbell during hw_fini once the ring is disabled so that during next module reload direct initialization takes effect. Also, move the direct initialization after minor update is set to 1 since rptr/wptr are reinitialized back to 0 which could be lower than the previous doorbell value (ex: cases like module reload). Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-07 12:03:30 -05:00
Jiadong Zhu	9c561ca2d3	drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.0 Set the default reset method to mode2 for SMU IP v14.0.0 Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-07 12:03:30 -05:00
Surbhi Kakarya	9256e8d47a	drm/amd: Disable XNACK on SRIOV environment The purpose of this patch is to disable XNACK or set XNACK OFF mode on SRIOV platform which doesn't support it. This will prevent user-space application to fail or result into unexpected behaviour whenever the application need to run test-case in XNACK ON mode. Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-07 11:15:37 -05:00
Hawking Zhang	23618280cc	drm/amdgpu: Query and report boot status Query boot status and report boot errors. A follow up change is needed to stop GPU initialization if boot fails. v2: only invoke the call for dGPU (Le/Lijo) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:33 -04:00
Hawking Zhang	df57e019d5	drm/amdgpu: Add psp v13 function to query boot status Add psp v13 function to query boot status. v2: limit the use case to dGPU only (Lijo) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:33 -04:00
Ma Jun	dbab63561b	drm/amdgpu: Optimize the asic type fix code Use a new struct array to define the asic information which asic type needs to be fixed. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:32 -04:00
Tim Huang	36e7ff5c13	drm/amdgpu: fix GRBM read timeout when do mes_self_test Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers can be touched when initialize the compute and gfx mqd in mes_self_test. Otherwise, we expect no response from CP and an GRBM eventual timeout. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-03 12:18:32 -04:00
Tao Zhou	18eae367cb	drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count Handle xgmi hive case. Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:32 -04:00
Felix Kuehling	0e2e7c5b3d	drm/amdgpu: Attach eviction fence on alloc Instead of attaching the eviction fence when a KFD BO is first mapped, attach it when it is allocated or imported. This in preparation to allow KFD BOs to be mapped using the render node API. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:32 -04:00
Felix Kuehling	5a104cb97c	drm/amdkfd: Improve amdgpu_vm_handle_moved Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by the caller. This will be useful for handling extra BO VA mappings in KFD VMs that are managed through the render node API. v2: rebase against drm_exec changes (Alex) Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 12:18:32 -04:00
Alex Deucher	ba0fb4b48c	drm/amdgpu: don't put MQDs in VRAM on ARM \| ARM64 Issues were reported with commit `1cfb4d6121` ("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere Altra Developer Platform (AVA developer platform). Various ARM systems seem to have problems related to PCIe and MMIO access. In this case, I'm not sure if this is specific to the ADLINK platform or ARM in general. Seems to be some coherency issue with VRAM. For now, just don't put MQDs in VRAM on ARM. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html Fixes: `1cfb4d6121` ("drm/amdgpu: put MQDs in VRAM") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: alexey.klimov@linaro.org	2023-11-03 11:59:51 -04:00
Alex Deucher	3938eb956e	drm/amdgpu: add a retry for IP discovery init AMD dGPUs have integrated FW that runs as soon as the device gets power and initializes the board (determines the amount of memory, provides configuration details to the driver, etc.). For direct PCIe attached cards this happens as soon as power is applied and normally completes well before the OS has even started loading. However, with hotpluggable ports like USB4, the driver needs to wait for this to complete before initializing the device. This normally takes 60-100ms, but could take longer on some older boards periodically due to memory training. Retry for up to a second. In the non-hotplug case, there should be no change in behavior and this should complete on the first try. v2: adjust test criteria v3: adjust checks for the masks, only enable on removable devices v4: skip bif_fb_en check Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-03 11:59:51 -04:00
Perry Yuan	886b92f635	drm/amdgpu: ungate power gating when system suspend [Why] During suspend, if GFX DPM is enabled and GFXOFF feature is enabled the system may get hung. So, it is suggested to disable GFXOFF feature during suspend and enable it after resume. [How] Update the code to disable GFXOFF feature during suspend and enable it after resume. [ 311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <smu> failed -62 Acked-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Kun Liu <kun.liu2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-03 11:59:51 -04:00
Alex Deucher	7b1c6263ea	drm/amdgpu: don't use pci_is_thunderbolt_attached() It's only valid on Intel systems with the Intel VSEC. Use dev_is_removable() instead. This should do the right thing regardless of the platform. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-03 11:59:44 -04:00
Alex Deucher	432e664e7c	drm/amdgpu: don't use ATRM for external devices The ATRM ACPI method is for fetching the dGPU vbios rom image on laptops and all-in-one systems. It should not be used for external add in cards. If the dGPU is thunderbolt connected, don't try ATRM. v2: pci_is_thunderbolt_attached only works for Intel. Use pdev->external_facing instead. v3: dev_is_removable() seems to be what we want Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-11-03 11:59:13 -04:00
Alex Deucher	b3c942bb6c	drm/amdgpu/gfx10,11: use memcpy_to/fromio for MQDs Since they were moved to VRAM, we need to use the IO variants of memcpy. Fixes: `1cfb4d6121` ("drm/amdgpu: put MQDs in VRAM") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 11:38:19 -04:00
Tao Zhou	f7aeee7346	drm/amdgpu: use mode-2 reset for RAS poison consumption Switch from mode-1 reset to mode-2 for poison consumption. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 11:38:13 -04:00
Lin.Cao	b77cc85bdb	drm/amdgpu doorbell range should be set when gpu recovery GFX doorbell range should be set after flr otherwise the gfx doorbell range will be overlap with MEC. v2: remove "amdgpu_sriov_vf" and "amdgpu_in_reset" check, and add grbm select for the case of 2 gfx rings. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-03 11:38:04 -04:00
Linus Torvalds	27beb3ca34	pci-v6.7-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmVBaU8UHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwEdxAAo++s98+ZaaTdUuoV0Zpft1fuY6Yr mR80jUDxjHDbcI1G4iNVUSWG6pGIdlURnrBp5kU74FV9R2Ps3Fl49XQUHowE0HfH D/qmihiJQdnMsQKwzw3XGoTSINrDcF6nLafl9brBItVkgjNxfxSEbnweJMBf+Boc rpRXHzxbVHVjwwhBLODF2Wt/8sQ24w9c+wcQkpo7im8ZZReoigNMKgEa4J7tLlqA vTyPR/K6QeU8IBUk2ObCY3GeYrVuqi82eRK3Uwzu7IkQwA9orE416Okvq3Z026/h TUAivtrcygHaFRdGNvzspYLbc2hd2sEXF+KKKb6GNAjxuDWUhVQW4ObY4FgFkZ65 Gqz/05D6c1dqTS3vTxp3nZYpvPEbNnO1RaGRL4h0/mbU+QSPSlHXWd9Lfg6noVVd 3O+CcstQK8RzMiiWLeyctRPV5XIf7nGVQTJW5aCLajlHeJWcvygNpNG4N57j/hXQ gyEHrz3idXXHXkBKmyWZfre6YpLkxZtKyONZDHWI/AVhU0TgRdJWmqpRfC1kVVUe IUWBRcPUF4/r3jEu6t10N/aDWQN1uQzIsJNnCrKzAddPDTTYQJk8VVzKPo8SVxPD X+OjEMgBB/fXUfkJ7IMwgYnWaFJhxthrs6/3j1UqRvGYRoulE4NdWwJDky9UYIHd qV3dzuAxC/cpv08= =G//C -----END PGP SIGNATURE----- Merge tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Use acpi_evaluate_dsm_typed() instead of open-coding _DSM evaluation to learn device characteristics (Andy Shevchenko) - Tidy multi-function header checks using new PCI_HEADER_TYPE_MASK definition (Ilpo Järvinen) - Simplify config access error checking in various drivers (Ilpo Järvinen) - Use pcie_capability_clear_word() (not pcie_capability_clear_and_set_word()) when only clearing (Ilpo Järvinen) - Add pci_get_base_class() to simplify finding devices using base class only (ignoring subclass and programming interface) (Sui Jingfeng) - Add pci_is_vga(), which includes ancient PCI_CLASS_NOT_DEFINED_VGA devices from before the Class Code was added to PCI (Sui Jingfeng) - Use pci_is_vga() for vgaarb, sysfs "boot_vga", virtio, qxl to include ancient VGA devices (Sui Jingfeng) Resource management: - Make pci_assign_unassigned_resources() non-init because sparc uses it after init (Randy Dunlap) Driver binding: - Retain .remove() and .probe() callbacks (previously __init) because sysfs may cause them to be called later (Uwe Kleine-König) - Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device, so it can be claimed by dwc3 instead (Vicki Pfau) PCI device hotplug: - Add Ampere Altra Attention Indicator extension driver for acpiphp (D Scott Phillips) Power management: - Quirk VideoPropulsion Torrent QN16e with longer delay after reset (Lukas Wunner) - Prevent users from overriding drivers that say we shouldn't use D3cold (Lukas Wunner) - Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4 because wakeup interrupts from those states don't work if amd-pmc has put the platform in a hardware sleep state (Mario Limonciello) IOMMU: - Disable ATS for Intel IPU E2000 devices with invalidation message endianness erratum (Bartosz Pawlowski) Error handling: - Factor out interrupt enable/disable into helpers (Kai-Heng Feng) Peer-to-peer DMA: - Fix flexible-array usage in struct pci_p2pdma_pagemap in case we ever use pagemaps with multiple entries (Gustavo A. R. Silva) ASPM: - Revert a change that broke when drivers disabled L1 and users later enabled an L1.x substate via sysfs, and fix a similar issue when users disabled L1 via sysfs (Heiner Kallweit) Endpoint framework: - Fix double free in __pci_epc_create() (Dan Carpenter) - Use IS_ERR_OR_NULL() to simplify endpoint core (Ruan Jinjie) Cadence PCIe controller driver: - Drop unused "is_rc" member (Li Chen) Freescale Layerscape PCIe controller driver: - Enable 64-bit addressing in endpoint mode (Guanhua Gao) Intel VMD host bridge driver: - Fix multi-function header check (Ilpo Järvinen) Microsoft Hyper-V host bridge driver: - Annotate struct hv_dr_state with __counted_by (Kees Cook) NVIDIA Tegra194 PCIe controller driver: - Drop setting of LNKCAP_MLW (max link width) since dw_pcie_setup() already does this via dw_pcie_link_set_max_link_width() (Yoshihiro Shimoda) Qualcomm PCIe controller driver: - Use PCIE_SPEED2MBS_ENC() to simplify encoding of link speed (Manivannan Sadhasivam) - Add a .write_dbi2() callback so DBI2 register writes, e.g., for setting the BAR size, work correctly (Manivannan Sadhasivam) - Enable ASPM for platforms that use 1.9.0 ops, because the PCI core doesn't enable ASPM states that haven't been enabled by the firmware (Manivannan Sadhasivam) Renesas R-Car Gen4 PCIe controller driver: - Add DesignWare core support (set max link width, EDMA_UNROLL flag, .pre_init(), .deinit(), etc) for use by R-Car Gen4 driver (Yoshihiro Shimoda) - Add driver and DT schema for DesignWare-based Renesas R-Car Gen4 controller in both host and endpoint mode (Yoshihiro Shimoda) Xilinx NWL PCIe controller driver: - Update ECAM size to support 256 buses (Thippeswamy Havalige) - Stop setting bridge primary/secondary/subordinate bus numbers, since PCI core does this (Thippeswamy Havalige) Xilinx XDMA controller driver: - Add driver and DT schema for Zynq UltraScale+ MPSoCs devices with Xilinx XDMA Soft IP (Thippeswamy Havalige) Miscellaneous: - Use FIELD_GET()/FIELD_PREP() to simplify and reduce use of _SHIFT macros (Ilpo Järvinen, Bjorn Helgaas) - Remove logic_outb(), _outw(), outl() duplicate declarations (John Sanpe) - Replace unnecessary UTF-8 in Kconfig help text because menuconfig doesn't render it correctly (Liu Song)" * tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (102 commits) PCI: qcom-ep: Add dedicated callback for writing to DBI2 registers PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word() PCI: endpoint: Fix double free in __pci_epc_create() PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx XDMA PCIe Root Port Bridge PCI: xilinx-cpm: Move IRQ definitions to a common header PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses PCI: xilinx-nwl: Rename the NWL_ECAM_VALUE_DEFAULT macro dt-bindings: PCI: xilinx-nwl: Modify ECAM size in the DT example PCI: xilinx-nwl: Remove redundant code that sets Type 1 header fields PCI: hotplug: Add Ampere Altra Attention Indicator extension driver PCI/AER: Factor out interrupt toggling into helpers PCI: acpiphp: Allow built-in drivers for Attention Indicators PCI/portdrv: Use FIELD_GET() PCI/VC: Use FIELD_GET() PCI/PTM: Use FIELD_GET() PCI/PME: Use FIELD_GET() PCI/ATS: Use FIELD_GET() PCI/ATS: Show PASID Capability register width in bitmasks PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common() ...	2023-11-02 14:05:18 -10:00
Matthew Brost	a6149f0393	drm/sched: Convert drm scheduler to use a work queue rather than kthread In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:21 -04:00
Matthew Brost	35963cf2cd	drm/sched: Add drm_sched_wqueue_* helpers Add scheduler wqueue ready, stop, and start helpers to hide the implementation details of the scheduler from the drivers. v2: - s/sched_wqueue/sched_wqueue (Luben) - Remove the extra white line after the return-statement (Luben) - update drm_sched_wqueue_ready comment (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:20 -04:00
Linus Torvalds	7d461b291e	drm for 6.7-rc1 kernel: - add initial vmemdup-user-array core: - fix platform remove() to return void - drm_file owner updated to reflect owner - move size calcs to drm buddy allocator - let GPUVM build as a module - allow variable number of run-queues in scheduler edid: - handle bad h/v sync_end in EDIDs panfrost: - add Boris as maintainer fbdev: - use fb_ops helpers more - only allow logo use from fbcon - rename fb_pgproto to pgprot_framebuffer - add HPD state to drm_connector_oob_hotplug_event - convert to fbdev i/o mem helpers i915: - Enable meteorlake by default - Early Xe2 LPD/Lunarlake display enablement - Rework subplatforms into IP version checks - GuC based TLB invalidation for Meteorlake - Display rework for future Xe driver integration - LNL FBC features - LNL display feature capability reads - update recommended fw versions for DG2+ - drop fastboot module parameter - added deviceid for Arrowlake-S - drop preproduction workarounds - don't disable preemption for resets - cleanup inlines in headers - PXP firmware loading fix - Fix sg list lengths - DSC PPS state readout/verification - Add more RPL P/U PCI IDs - Add new DG2-G12 stepping - DP enhanced framing support to state checker - Improve shared link bandwidth management - stop using GEM macros in display code - refactor related code into display code - locally enable W=1 warnings - remove PSR watchdog timers on LNL amdgpu: - RAS/FRU EEPROM updatse - IP discovery updatses - GC 11.5 support - DCN 3.5 support - VPE 6.1 support - NBIO 7.11 support - DML2 support - lots of IP updates - use flexible arrays for bo list handling - W=1 fixes - Enable seamless boot in more cases - Enable context type property for HDMI - Rework GPUVM TLB flushing - VCN IB start/size alignment fixes amdkfd: - GC 10/11 fixes - GC 11.5 support - use partial migration in GPU faults radeon: - W=1 Fixes - fix some possible buffer overflow/NULL derefs nouveau: - update uapi for NO_PREFETCH - scheduler/fence fixes - rework suspend/resume for GSP-RM - rework display in preparation for GSP-RM habanalabs: - uapi: expose tsc clock - uapi: block access to eventfd through control device - uapi: force dma-buf export to PAGE_SIZE alignments - complete move to accel subsystem - move firmware interface include files - perform hard reset on PCIe AXI drain event - optimise user interrupt handling msm: - DP: use existing helpers for DPCD - DPU: interrupts reworked - gpu: a7xx (a730/a740) support - decouple msm_drv from kms for headless devices mediatek: - MT8188 dsi/dp/edp support - DDP GAMMA - 12 bit LUT support - connector dynamic selection capability rockchip: - rv1126 mipi-dsi/vop support - add planar formats ast: - rename constants panels: - Mitsubishi AA084XE01 - JDI LPM102A188A - LTK050H3148W-CTA6 ivpu: - power management fixes qaic: - add detach slice bo api komeda: - add NV12 writeback tegra: - support NVSYNC/NHSYNC - host1x suspend fixes ili9882t: - separate into own driver -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVAgzYACgkQDHTzWXnE hr7ZEQ//UXne3tyGOsU3X8r+lstLFDMa90a3hvTg6hX+Q0MjHd/clwkKFkLpkipL n7gIZlaHl11dRs0FzrIZA5EVAAgjMLKmIl10NBDFec6ZFA3VERcggx8y61uifI15 VviMR1VbLHYZaCdyrQOK0A4wcktWnKXyoXp7cwy9crdc2GOBMUZkdIqtvD7jHxQx UMIFnzi1CyKUX/Fjt/JceYcNk9y2ZGkzakYO3sHcUdv4DPu9qX4kNzpjF691AZBP UeKWvCswTRVg2M0kuo/RYIBzqaTmOlk6dHLWBognIeZPyuyhCcaGC2d64c6tShwQ dtHdi+IgyQ8s2qb350ymKTQUP7xA/DfZBwH7LvrZALBxeQGYQN1CnsgDMOS2wcUc XrRFiS7PxEOtMMBctcPBnnoV5ttnsLLlPpzM9puh9sUFMn6CgLzcAMqXdqxzMajH +dz2aD1N0vMqq4varozOg9SC2QamgUiPN/TQfrulhCTCfQaXczy5x1OYiIz65+Sl mKoe2WASuP9Ve8do4N/wEwH5SZY2ItipBdUTRxttY9NTanmV0X5DjZBXH5b9XGci Zl5Ar613f9zwm5T5BVA5k6s3ZbGY6QcP5pDNTCPaSgitfFXIdReBZ2CaYzK3MPg/ Wit/TXrud9yT6VPpI1igboMyasf5QubV1MY1K83kOCWr9u8R2CM= =l79u -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: "Highlights: - AMD adds some more upcoming HW platforms - Intel made Meteorlake stable and started adding Lunarlake - nouveau has a bunch of display rework in prepartion for the NVIDIA GSP firmware support - msm adds a7xx support - habanalabs has finished migration to accel subsystem Detail summary: kernel: - add initial vmemdup-user-array core: - fix platform remove() to return void - drm_file owner updated to reflect owner - move size calcs to drm buddy allocator - let GPUVM build as a module - allow variable number of run-queues in scheduler edid: - handle bad h/v sync_end in EDIDs panfrost: - add Boris as maintainer fbdev: - use fb_ops helpers more - only allow logo use from fbcon - rename fb_pgproto to pgprot_framebuffer - add HPD state to drm_connector_oob_hotplug_event - convert to fbdev i/o mem helpers i915: - Enable meteorlake by default - Early Xe2 LPD/Lunarlake display enablement - Rework subplatforms into IP version checks - GuC based TLB invalidation for Meteorlake - Display rework for future Xe driver integration - LNL FBC features - LNL display feature capability reads - update recommended fw versions for DG2+ - drop fastboot module parameter - added deviceid for Arrowlake-S - drop preproduction workarounds - don't disable preemption for resets - cleanup inlines in headers - PXP firmware loading fix - Fix sg list lengths - DSC PPS state readout/verification - Add more RPL P/U PCI IDs - Add new DG2-G12 stepping - DP enhanced framing support to state checker - Improve shared link bandwidth management - stop using GEM macros in display code - refactor related code into display code - locally enable W=1 warnings - remove PSR watchdog timers on LNL amdgpu: - RAS/FRU EEPROM updatse - IP discovery updatses - GC 11.5 support - DCN 3.5 support - VPE 6.1 support - NBIO 7.11 support - DML2 support - lots of IP updates - use flexible arrays for bo list handling - W=1 fixes - Enable seamless boot in more cases - Enable context type property for HDMI - Rework GPUVM TLB flushing - VCN IB start/size alignment fixes amdkfd: - GC 10/11 fixes - GC 11.5 support - use partial migration in GPU faults radeon: - W=1 Fixes - fix some possible buffer overflow/NULL derefs nouveau: - update uapi for NO_PREFETCH - scheduler/fence fixes - rework suspend/resume for GSP-RM - rework display in preparation for GSP-RM habanalabs: - uapi: expose tsc clock - uapi: block access to eventfd through control device - uapi: force dma-buf export to PAGE_SIZE alignments - complete move to accel subsystem - move firmware interface include files - perform hard reset on PCIe AXI drain event - optimise user interrupt handling msm: - DP: use existing helpers for DPCD - DPU: interrupts reworked - gpu: a7xx (a730/a740) support - decouple msm_drv from kms for headless devices mediatek: - MT8188 dsi/dp/edp support - DDP GAMMA - 12 bit LUT support - connector dynamic selection capability rockchip: - rv1126 mipi-dsi/vop support - add planar formats ast: - rename constants panels: - Mitsubishi AA084XE01 - JDI LPM102A188A - LTK050H3148W-CTA6 ivpu: - power management fixes qaic: - add detach slice bo api komeda: - add NV12 writeback tegra: - support NVSYNC/NHSYNC - host1x suspend fixes ili9882t: - separate into own driver" * tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits) drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo drm/amdgpu: Remove duplicate fdinfo fields drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table drm/amdgpu: Identify data parity error corrected in replay mode drm/amdgpu: Fix typo in IP discovery parsing drm/amd/display: fix S/G display enablement drm/amdxcp: fix amdxcp unloads incompletely drm/amd/amdgpu: fix the GPU power print error in pm info drm/amdgpu: Use pcie domain of xcc acpi objects drm/amd: check num of link levels when update pcie param drm/amdgpu: Add a read to GFX v9.4.3 ring test drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported. drm/amdgpu: get RAS poison status from DF v4_6_2 drm/amdgpu: Use discovery table's subrevision drm/amd/display: 3.2.256 drm/amd/display: add interface to query SubVP status drm/amd/display: Read before writing Backlight Mode Set Register drm/amd/display: Disable SYMCLK32_SE RCO on DCN314 ...	2023-11-01 06:28:35 -10:00
Yang Wang	2bfb0ca3dd	drm/amdgpu: remove unused macro HW_REV remove unused macro HW_REV Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 17:13:59 -04:00
Arunpravin Paneer Selvam	9ae587f850	drm/amdgpu: Fix the vram base start address If the size returned by drm buddy allocator is higher than the required size, we take the higher size to calculate the buffer start address. This is required if we couldn't trim the buffer to the requested size. This will fix the display corruption issue on APU's which has limited VRAM size. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2859 Fixes: `0a1844bf0b` ("drm/buddy: Improve contiguous memory allocation") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 17:10:13 -04:00
Tao Zhou	d539b0ad7c	drm/amdgpu: set XGMI IP version manually for v6_4 The version can't be queried from discovery table. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 17:09:53 -04:00
Tong Liu01	853eebe6ec	drm/amdgpu: add unmap latency when gfx11 set kiq resources [why] If driver does not set unmap latency for KIQ, the default value of KIQ unmap latency is zero. When do unmap queue, KIQ will return that almost immediately after receiving unmap command. So, the queue status will be saved to MQD incorrectly or lost in some chance. [how] Set unmap latency when do kiq set resources. The unmap latency is set to be 1 second that is synchronized with Windows driver. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 16:40:16 -04:00
Kenneth Feng	5f38ac54e6	drm/amd/pm: fix the high voltage and temperature issue fix the high voltage and temperature issue after the driver is unloaded on smu 13.0.0, smu 13.0.7 and smu 13.0.10 v2 - fix the code format and make sure it is used on the unload case only. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 16:40:16 -04:00
Yifan Zhang	a17f574ab4	drm/amdgpu: remove amdgpu_mes_self_test in gpu recover gpu tlb flush is skipped if reset sem is held, it makes mes_self_test fail since it involves add_hw_queue/remove_hw_queue which needs tlb flush functional. Remove mes_self_test in gpu recover sequence. This patch is to fix the recover failure in gfx11. [ 1831.768292] [drm] ring sdma_32769.3.3 was added [ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass [ 1831.768337] [drm] ring compute_32769.2.2 ib test pass [ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process pid 0 thread pid 0) [ 1831.768434] amdgpu 0000:c2:00.0: amdgpu: in page starting at address 0x0000aec200000000 from client 10 [ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30 [ 1831.768473] amdgpu 0000:c2:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [ 1831.768489] amdgpu 0000:c2:00.0: amdgpu: MORE_FAULTS: 0x0 [ 1831.768501] amdgpu 0000:c2:00.0: amdgpu: WALKER_ERROR: 0x0 [ 1831.768513] amdgpu 0000:c2:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [ 1831.768521] amdgpu 0000:c2:00.0: amdgpu: MAPPING_ERROR: 0x0 [ 1831.768529] amdgpu 0000:c2:00.0: amdgpu: RW: 0x0 [ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring sdma_32769.3.3 test failed (-110) [ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=3 [ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] ERROR failed to remove hardware queue, queue id = 3 Fixes: `e2e3788850` ("drm/amdgpu: rework lock handling for flush_tlb v2") Reported-by: Li Ma <li.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 16:40:15 -04:00
Candice Li	e020d01575	drm/amdgpu: Drop deferred error in uncorrectable error check Drop checking deferred error which can be handled by poison consumption. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 16:40:15 -04:00
Tao Zhou	d1d4c0b7b6	drm/amdgpu: check RAS supported first in ras_reset_error_count Not all platforms support RAS. Fixes: `73582be11a` ("drm/amdgpu: bypass RAS error reset in some conditions") Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-31 16:40:15 -04:00
Dave Airlie	631808095a	amd-drm-next-6.7-2023-10-27: amdgpu: - RAS fixes - Seamless boot fixes - NBIO 7.7 fix - SMU 14.0 fixes - GC 11.5 fixes - DML2 fixes - ASPM fixes - VPE fixes - Misc code cleanups - SRIOV fixes - Add some missing copyright notices - DCN 3.5 fixes - FAMS fixes - Backlight fix - S/G display fix - fdinfo cleanups - EXT_COHERENT fixes for APU and NUMA systems amdkfd: - Misc fixes - Misc code cleanups - SVM fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTwWOAAKCRC93/aFa7yZ 2Lz3AP0cInVS4qZYTZCh2O/k5AoidJjcmRl2DVm8OdowBPCa4wEAoNsekTIQnZsI Ru4SoVKhT2bs1LEMOcdzexsVrwlaxA8= =G3QX -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.7-2023-10-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-10-27: amdgpu: - RAS fixes - Seamless boot fixes - NBIO 7.7 fix - SMU 14.0 fixes - GC 11.5 fixes - DML2 fixes - ASPM fixes - VPE fixes - Misc code cleanups - SRIOV fixes - Add some missing copyright notices - DCN 3.5 fixes - FAMS fixes - Backlight fix - S/G display fix - fdinfo cleanups - EXT_COHERENT fixes for APU and NUMA systems amdkfd: - Misc fixes - Misc code cleanups - SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com	2023-10-31 12:37:19 +10:00
Dave Airlie	915b6d034b	Merge tag 'drm-misc-next-2023-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.7-rc1: drm-misc-next-2023-10-19 + following: UAPI Changes: Cross-subsystem Changes: - Convert fbdev drivers to use fbdev i/o mem helpers. Core Changes: - Use cross-references for macros in docs. - Make drm_client_buffer_addb use addfb2. - Add NV20 and NV30 YUV formats. - Documentation updates for create_dumb ioctl. - CI fixes. - Allow variable number of run-queues in scheduler. Driver Changes: - Rename drm/ast constants. - Make ili9882t its own driver. - Assorted fixes in ivpu, vc4, bridge/synopsis, amdgpu. - Add planar formats to rockchip. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/3d92fae8-9b1b-4165-9ca8-5fda11ee146b@linux.intel.com	2023-10-31 10:47:50 +10:00
Umio Yasuno	dd3dd9829b	drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo Remove unused variables from amdgpu_show_fdinfo Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:23:01 -04:00
Rob Clark	e8e696c307	drm/amdgpu: Remove duplicate fdinfo fields Some of the fields that are handled by drm_show_fdinfo() crept back in when rebasing the patch. Remove them again. Fixes: `376c25f8ca` ("drm/amdgpu: Switch to fdinfo helper") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: <alexander.deucher@amd.com> Co-developed-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:23:01 -04:00
Kenneth Feng	3ea8dd3758	drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded avoid to disable gfxhub interrupt when driver is unloaded on gmc 11 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:15:39 -04:00
David Francis	142262a1c0	drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and MTYPE_UC for nonlocal memory. On NUMA systems, local memory gets the local mtype, set by an override callback. If EXT_COHERENT is set, memory will be set as MTYPE_UC by default, with local memory MTYPE_CC. Add an option in the override function for this case, and add a check to ensure it is not used on UNCACHED memory. V2: Combined APU and NUMA code into one patch V3: Fixed a potential nullptr in amdgpu_vm_bo_update Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:15:16 -04:00
Candice Li	a395f7ffce	drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table Retrieve correctable error count from ce_count_lo_chip instead of mca_umc_status. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:15:10 -04:00
Candice Li	d59fcfb084	drm/amdgpu: Identify data parity error corrected in replay mode Use ErrorCodeExt field to identify data parity error in replay mode. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:15:03 -04:00
Mukul Joshi	f7a17b2b36	drm/amdgpu: Fix typo in IP discovery parsing Fix a typo in parsing of the GC info table header when reading the IP discovery table. Fixes: `0e64c9aad0` ("drm/amdgpu: add type conversion for gc info") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-27 14:13:27 -04:00
Dave Airlie	44117828ed	amd-drm-fixes-6.6-2023-10-25: amdgpu: - Extend VI APSM quirks to more platforms -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTngEQAKCRC93/aFa7yZ 2AJrAP432Pqn+TmmQnm564esdTzy/aH8m/5eeB5KSTE3SQt+agD/bAoDPRCk/jgi PMgVEWp8pQo1Kezzcb/+iJq0tUfxQgc= =kRQ2 -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.6-2023-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-10-25: amdgpu: - Extend VI APSM quirks to more platforms Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231026035452.14921-1-alexander.deucher@amd.com	2023-10-27 12:17:26 +10:00
Dave Airlie	6366ffa6ed	Short summary of fixes pull: amdgpu: - ignore duplicated BOs in CS parser - remove redundant call to amdgpu_ctx_priority_is_valid() amdkfd: - reserve fence slot while locking BO dp_mst: - Fix NULL deref in get_mst_branch_device_by_guid_helper() logicvc: - Kconfig: Select REGMAP and REGMAP_MMIO ivpu: - Fix missing VPUIP interrupts -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmU6RugACgkQaA3BHVML eiOH1ggAsc+3EcnoNbeKuPRTefAGb7xsoljmyY0EJriSQZtiropmVGpXYmLSSSSC 4x8TLztWsDXkM/+AKJEQWT56egKwP4Z+QgASOo25p050z56IlPScSsENasuwumW1 vI/hlUQRkDjAixEd6WHcD/hGPu9kKNPhS/DUVNeWYr2237rFJChHg+hWw3PycTzm 2aD8fx1275sx6/OXkKW5bQKmshjHtGYnT86w3xgAJUfvTDPq/eZtwqCKhqCqaQja 5tAccKVmnWAaXneCvs7XwAXYou1Qny3GOBOOpCrBQYaj49N9cMyflovY2y2s1uyR pQyquVzLZLVhFHc2B58++49NyES8bw== =d+WA -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2023-10-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: amdgpu: - ignore duplicated BOs in CS parser - remove redundant call to amdgpu_ctx_priority_is_valid() amdkfd: - reserve fence slot while locking BO dp_mst: - Fix NULL deref in get_mst_branch_device_by_guid_helper() logicvc: - Kconfig: Select REGMAP and REGMAP_MMIO ivpu: - Fix missing VPUIP interrupts Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231026110132.GA10591@linux-uq9g.fritz.box	2023-10-27 11:51:35 +10:00
Lijo Lazar	d055714a21	drm/amdgpu: Use pcie domain of xcc acpi objects PCI domain/segment information of xccs is available through ACPI DSM methods. Consider that also while looking for devices. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 19:04:39 -04:00
Lijo Lazar	3f69d5860f	drm/amdgpu: Add a read to GFX v9.4.3 ring test Issue a read to confirm the register write before ringing doorbell. With multiple XCCs there is chance for race condition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 19:04:13 -04:00
Tao Zhou	2cea7bb911	drm/amdgpu: get RAS poison status from DF v4_6_2 Add DF block and RAS poison mode query for DF v4_6_2. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 19:02:52 -04:00
Lijo Lazar	dd2687f5d9	drm/amdgpu: Use discovery table's subrevision Use subrevision of IP version in discovery table to identify SOC revision id for NBIO v7.9 SOCs. Only newer bootloaders update subrevision field. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 19:02:44 -04:00
Mario Limonciello	2757a848cb	drm/amd: Explicitly disable ASPM when dynamic switching disabled Currently there are separate but related checks: * amdgpu_device_should_use_aspm() * amdgpu_device_aspm_support_quirk() * amdgpu_device_pcie_dynamic_switching_supported() Simplify into checking whether DPM was enabled or not in the auto case. This works because amdgpu_device_pcie_dynamic_switching_supported() populates that value. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:23 -04:00
Mario Limonciello	1a6513de49	drm/amd: Move AMD_IS_APU check for ASPM into top level function There is no need for every ASIC driver to perform the same check. Move the duplicated code into amdgpu_device_should_use_aspm(). Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:23 -04:00
Mario Limonciello	fbf1035b03	drm/amd: Disable PP_PCIE_DPM_MASK when dynamic speed switching not supported Rather than individual ASICs checking for the quirk, set the quirk at the driver level. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:23 -04:00
Lin.Cao	9ee819285c	drm/amdgpu remove restriction of sriov max_pfn on Vega10 Remove restriction of sriov max_pfn so that TBA and TMA can move to high 47 bits address. Regression test: change range alloc flag of libdrm as AMDGPU_VA_RANGE_HIGH and there is no flr occur when testing amdgpu_test of drm. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Qu Huang	5104fdf50d	drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log: 1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 Signed-off-by: Qu Huang <qu.huang@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Tao Zhou	73582be11a	drm/amdgpu: bypass RAS error reset in some conditions PMFW is responsible for RAS error reset in some conditions, driver can skip the operation. v2: add check for ras->in_recovery, it's set earlier than amdgpu_in_reset. v3: fix error in gpu reset check. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Tao Zhou	f3a3bbf156	drm/amdgpu: enable RAS poison mode for APU Enable it by default on APU platform. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Lang Yu	fc4981b69c	drm/amdgpu/vpe: correct queue stop programing Otherwise IB test would fail during GPU reset. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Mario Limonciello	e5f52a84bf	drm/amd: Disable ASPM for VI w/ all Intel systems Originally we were quirking ASPM disabled specifically for VI when used with Alder Lake, but it appears to have problems with Rocket Lake as well. Like we've done in the case of dpm for newer platforms, disable ASPM for all Intel systems. Cc: stable@vger.kernel.org # 5.15+ Fixes: `0064b0ce85` ("drm/amd/pm: enable ASPM by default") Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
Lijo Lazar	8eece69ace	drm/amdgpu: Add API to get full IP version Fetch the full version of IP including variant and subrevision. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Jiadong Zhu	037fb9c600	drm/amdgpu: add tmz support for GC IP v11.5.0 Add tmz support for GC 11.5.0. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Li Ma	493c75bbe3	drm/amdgpu: modify if condition in nbio_v7_7.c remove unnecessary "enable" in if condition. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Yang Wang	ec3e0a9167	drm/amdgpu: refine ras error kernel log print refine ras error kernel log to avoid user-ridden ambiguity. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Yang Wang	53d4d77927	drm/amdgpu: fix find ras error node error the origin function might return the wrong node. Fixes: `5b1270beb3` ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:21 -04:00
Alex Deucher	b70438004a	drm/amdgpu: move buffer funcs setting up a level Rather than doing this in the IP code for the SDMA paging engine, move it up to the core device level init level. This should fix the scheduler init ordering. v2: drop extra parens v3: drop SDMA helpers v4: Added a Fixes tag because amdgpu dereferences an uninitialized scheduler without this patch, and this patch fixes this. (Luben) Tested-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231025171928.3318505-1-alexander.deucher@amd.com Acked-by: Christian König <christian.koenig@amd.com> Fixes: `56e449603f` ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-10-26 16:04:24 -04:00
Luben Tuikov	56e449603f	drm/sched: Convert the GPU scheduler to variable number of run-queues The GPU scheduler has now a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) per each GPU scheduler it creates. Note, that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com	2023-10-26 12:03:47 -04:00
Mario Limonciello	64ffd2f1d0	drm/amd: Disable ASPM for VI w/ all Intel systems Originally we were quirking ASPM disabled specifically for VI when used with Alder Lake, but it appears to have problems with Rocket Lake as well. Like we've done in the case of dpm for newer platforms, disable ASPM for all Intel systems. Cc: stable@vger.kernel.org # 5.15+ Fixes: `0064b0ce85` ("drm/amd/pm: enable ASPM by default") Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-25 09:53:17 -04:00
Dave Airlie	0ecf4aa32b	amd-drm-next-6.7-2023-10-20: amdgpu: - SMU 13 updates - UMSCH updates - DC MPO fixes - RAS updates - MES 11 fixes - Fix possible memory leaks in error pathes - GC 11.5 fixes - Kernel doc updates - PSP updates - APU IMU fixes - Misc code cleanups - SMU 11 fixes - OD fix - Frame size warning fixes - SR-IOV fixes - NBIO 7.11 updates - NBIO 7.7 updates - XGMI fixes - devcoredump updates amdkfd: - Misc code cleanups - SVM fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTLX4QAKCRC93/aFa7yZ 2ORJAP9lnoyvUPIP63Hx5TADQtZHiA+ShkATGQmDia94ABtCxwEAlo88TipxAo7c tRX8Mn+rix3M739FDFxV0bp7hCXsbgQ= =fY9I -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.7-2023-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-10-20: amdgpu: - SMU 13 updates - UMSCH updates - DC MPO fixes - RAS updates - MES 11 fixes - Fix possible memory leaks in error pathes - GC 11.5 fixes - Kernel doc updates - PSP updates - APU IMU fixes - Misc code cleanups - SMU 11 fixes - OD fix - Frame size warning fixes - SR-IOV fixes - NBIO 7.11 updates - NBIO 7.7 updates - XGMI fixes - devcoredump updates amdkfd: - Misc code cleanups - SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020195043.4937-1-alexander.deucher@amd.com	2023-10-25 10:54:22 +10:00
Christian König	4984fc578a	drm/amdkfd: reserve a fence slot while locking the BO Looks like the KFD still needs this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `8abc1eb298` ("drm/amdkfd: switch over to using drm_exec v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com	2023-10-23 14:48:47 +02:00
Dave Airlie	7cd62eab9b	Linux 6.6-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf xZAbzbA= =NoQv -----END PGP SIGNATURE----- BackMerge tag 'v6.6-rc7' into drm-next This is needed to add the msm pr which is based on a higher base. Signed-off-by: Dave Airlie <airlied@redhat.com>	2023-10-23 18:20:06 +10:00
Luben Tuikov	d3df66fd98	drm/amdgpu: Remove redundant call to priority_is_valid() Remove a redundant call to amdgpu_ctx_priority_is_valid() from amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where we've called amdgpu_ctx_priority_is_valid() already first thing in the function. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231018010359.30393-1-luben.tuikov@amd.com	2023-10-21 20:27:15 -04:00
André Almeida	de009982c6	drm/amdgpu: Create version number for coredumps Even if there's nothing currently parsing amdgpu's coredump files, if we eventually have such tools they will be glad to find a version field to properly read the file. Create a version number to be displayed on top of coredump file, to be incremented when the file format or content get changed. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:29 -04:00
André Almeida	69619868d3	drm/amdgpu: Move coredump code to amdgpu_reset file Giving that we use codedump just for device resets, move it's functions and structs to a more semantic file, the amdgpu_reset.{c, h}. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:29 -04:00
André Almeida	2d6a2a28cd	drm/amdgpu: Encapsulate all device reset info To better organize struct amdgpu_device, keep all reset information related fields together in a separated struct. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Shiwu Zhang	723fac64d0	drm/amdgpu: support the port num info based on the capability flag XGMI TA will set the capability flag to indicate whether the port_num info is supported or not. KGD checks the flag and accordingly picks up the right buffer format and send the right command to TA to retrieve the info. v2: simplify the code by reusing the same statement (lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Shiwu Zhang	e8a5ded36b	drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command Per the xgmi ta implementation, KGD needs to fill in node_ids in concern into the shared command output buffer rather than the command input buffer. Input buffer is not used for GET_PEER_LINKS command execution. In this way, xgmi ta can reuse the node info in the output buffer just filled in and populate the same buffer with link info directly. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Tao Zhou	d9443ac4f9	drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8 PMFW will be responsible for them. v2: remove query interfaces. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Shiwu Zhang	626121fce4	drm/amdgpu: update the xgmi ta interface header Update the header file to the v20.00.00.13 v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to TA_COMMAND_XGMI__GET_TOPOLOGY_INFO And also rename struct ta_xgmi_cmd_get_peer_link_info_output to ta_xgmi_cmd_get_peer_link_info accordingly v2: add structs to support xgmi GET_EXTEND_PEER_LINK command Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Tao Zhou	8096df7664	drm/amdgpu: add set/get mca debug mode operations Record the debug mode status in RAS. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Tao Zhou	21226f02d7	drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count Simplify the code. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Li Ma	9d7a965e22	drm/amdgpu: add clockgating support for NBIO v7.7.1 add clockgating support for NBIO ip 7.7.1 Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Li Ma	fa9dd7a285	drm/amdgpu: fix missing stuff in NBIO v7.11 add get_clockgating_state, update_medium_grain_light_sleep and update_medium_grain_clock_gating in nbio_v7_11_funcs v1: add missing funcs in nbio_v7_11.c v2: modify the if condition and add spport for nbio v7.11 clockgating. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Stanley.Yang	66d64e4e03	drm/amdgpu: Enable RAS feature by default for APU Enable RAS feature by default for aqua vanjaram on apu platform. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Yang Wang	49c260bef3	drm/amdgpu: fix typo for amdgpu ras error data print typo fix. Fixes: `5b1270beb3` ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Bokun Zhang	017634a68d	drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P4 - In VCN 4 SRIOV code path, add code to enable RB decouple feature Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Bokun Zhang	eb9d6256b9	drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P3 - Update VCN header for RB decouple feature - Add metadata struct, metadata will be placed after each RB Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Bokun Zhang	fc3136730b	drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P2 - Add function to check if RB decouple is enabled under SRIOV Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Bokun Zhang	97b2821643	drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1 - Update SRIOV header with RB decouple flag Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Stanley.Yang	8a65661114	drm/amdgpu: Fix delete nodes that have been relesed Fix delete nodes that it has been freed. Fixes: `5b1270beb3` ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Hawking Zhang	f2176d7063	drm/amdgpu: Add UVD_VCPU_INT_EN2 to dpg sram Add RAS sepcifc programming to dpg sram. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:27 -04:00
Hawking Zhang	9248462d7e	drm/amdgpu: Enable software RAS in vcn v4_0_3 Set VCN/JPEG RAS masks to enable software RAS for VCN and JPEG. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:26 -04:00
Tao Zhou	472c5fb297	drm/amdgpu: define ras_reset_error_count function Make the code architecture more simple. v2: reuse ras_reset_error_count in ras_reset_error_status. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:26 -04:00
Candice Li	afcf949cf3	drm/amdgpu: Log UE corrected by replay as correctable error Support replay mode where UE could be converted to CE. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:26 -04:00
Dave Airlie	d43c76c820	Short summary of fixes pull: amdgpu: - Disable AMD_CTX_PRIORITY_UNSET bridge: - ti-sn65dsi86: Fix device lifetime edid: - Add quirk for BenQ GW2765 ivpu: - Extend address range for MMU mmap nouveau: - DP-connector fixes - Documentation fixes panel: - Move AUX B116XW03 into panel-simple scheduler: - Eliminate DRM_SCHED_PRIORITY_UNSET ttm: - Fix possible NULL-ptr deref in cleanup -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmUxFsoACgkQaA3BHVML eiNkkwf/YORvSPLN2LEqh4P8to7X92moFR7XwkA1GVofKOC3C04wDBq+Ezdy5pnw 7T/uvEnmrnDP5Iucerly8bE5IdJGkafIB8+oBIcGhGvPgtYNGSV8NWoeA/lhBmsS av56+zBPpBF3s5fZa4rB/WciGRuCsLhbcYUN/LGGVBXdhLgb8LRb/seTv/Ah9Sra LQtFmrDNNE0+FIWuKkUsY/CQYysbMzuHYFiLumtY59lE1R5kyvlzE8OrdcqlDXhC HCFt0KuA+4onzdHKnge/as5T3jBJucGV9mTcgal3158n94U/ODrXEJqVD/KFKle6 052vhpApOL3+RyDKGjXQLm/GumEc9w== =LJB1 -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2023-10-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: amdgpu: - Disable AMD_CTX_PRIORITY_UNSET bridge: - ti-sn65dsi86: Fix device lifetime edid: - Add quirk for BenQ GW2765 ivpu: - Extend address range for MMU mmap nouveau: - DP-connector fixes - Documentation fixes panel: - Move AUX B116XW03 into panel-simple scheduler: - Eliminate DRM_SCHED_PRIORITY_UNSET ttm: - Fix possible NULL-ptr deref in cleanup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231019114605.GA22540@linux-uq9g	2023-10-20 14:07:58 +10:00
Felix Kuehling	316baf09d3	drm/amdgpu: Reserve fences for VM update In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:56:57 -04:00
Felix Kuehling	51b79f3381	drm/amdgpu: Fix possible null pointer dereference abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: `1802537820` ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:56:50 -04:00
Felix Kuehling	207430b76a	drm/amdgpu: Reserve fences for VM update In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:52 -04:00
Felix Kuehling	e6f8588733	drm/amdgpu: Fix possible null pointer dereference abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: `1802537820` ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:52 -04:00
Stanley.Yang	b1338a8e71	drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery This is workaround, kiq ring test failed in suspend stage when do ras recovery. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:52 -04:00
Mario Limonciello	e56690bb37	drm/amd: Read IMU FW version from scratch register during hw_init If the IMU version wasn't discovered from the header, such as when the firmware was directly loaded by PSP then there is no firmware version to show to userspace from sysfs or IOCTL. The IMU F/W stores the version in the first scratch register though, so fetch it in these cases to let the driver export. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Mario Limonciello	4916615fe9	drm/amd: Don't parse IMU ucode version if it won't be loaded When the IMU ucode is loaded by the PSP parsing the version that comes from Linux will vary. Rather than showing the wrong data to kernel interface consumers, avoid populating it in this case. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Mario Limonciello	d757dfd667	drm/amd: Move microcode init step to early_init() The intention for early init is to find any missing microcode early and fail the driver load if it's missing. Move this step to earlier in driver init to match other IP blocks. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Asad Kamal	d8c1925ba8	drm/amdgpu: update retry times for psp BL wait Increase retry time for PSP BL wait, to compensate for longer time to set c2pmsg 35 ready bit during mode1 with RAS Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Alex Deucher	28ab9a02b6	drm/amdgpu/mes11: remove aggregated doorbell code It's not enabled in hardware so the code is dead. Remove it. Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Asad Kamal	53dd920c1f	drm/amdgpu : Add hive ras recovery check If one of the devices in the hive detects a fatal error, need to send ras recovery reset message to PMFW of all devices in the hive. For that add a flag in hive to indicate that it's undergoing ras recovery Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:51 -04:00
Mangesh Gadre	2d955a06a5	Revert "drm/amdgpu: Program xcp_ctl registers as needed" This reverts commit `0bdebfef3f`. XCP_CTL register is programmed by firmware and register access is protected. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:50 -04:00
Lang Yu	ab29ac57ad	drm/amdgpu/umsch: add suspend and resume callback Add missing IP callbacks. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-19 18:26:50 -04:00
Christian König	6b18ef481f	drm/amdgpu: ignore duplicate BOs again Looks like RADV is actually hitting this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `ca6c1e210a` ("drm/amdgpu: use the new drm_exec object for CS v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017121015.1336786-1-christian.koenig@amd.com	2023-10-19 13:19:44 +02:00
Dave Airlie	27442758e9	amd-drm-next-6.7-2023-10-13: amdgpu: - DC replay fixes - Misc code cleanups and spelling fixes - Documentation updates - RAS EEPROM Updates - FRU EEPROM Updates - IP discovery updates - SR-IOV fixes - RAS updates - DC PQ fixes - SMU 13.0.6 updates - GC 11.5 Support - NBIO 7.11 Support - GMC 11 Updates - Reset fixes - SMU 11.5 Updates - SMU 13.0 OD support - Use flexible arrays for bo list handling - W=1 Fixes - SubVP fixes - DPIA fixes - DCN 3.5 Support - Devcoredump fixes - VPE 6.1 support - VCN 4.0 Updates - S/G display fixes - DML fixes - DML2 Support - MST fixes - VRR fixes - Enable seamless boot in more cases - Enable content type property for HDMI - OLED fixes - Rework and clean up GPUVM TLB flushing - DC ODM fixes - DP 2.x fixes - AGP aperture fixes - SDMA firmware loading cleanups - Cyan Skillfish GPU clock counter fix - GC 11 GART fix - Cache GPU fault info for userspace queries - DC cursor check fixes - eDP fixes - DC FP handling fixes - Variable sized array fixes - SMU 13.0.x fixes - IB start and size alignment fixes for VCN - SMU 14 Support - Suspend and resume sequence rework - vkms fix amdkfd: - GC 11 fixes - GC 10 fixes - Doorbell fixes - CWSR fixes - SVM fixes - Clean up GC info enumeration - Rework memory limit handling - Coherent memory handling fixes - Use partial migrations in GPU faults - TLB flush fixes - DMA unmap fixes - GC 9.4.3 fixes - SQ interrupt fix - GTT mapping fix - GC 11.5 Support radeon: - Misc code cleanups - W=1 Fixes - Fix possible buffer overflow - Fix possible NULL pointer dereference UAPI: - Add EXT_COHERENT memory allocation flags. These allow for system scope atomics. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 - Add support for new VPE engine. This is a memory to memory copy engine with advanced scaling, CSC, and color management features Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713 - Add INFO IOCTL interface to query GPU faults Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZSmDAQAKCRC93/aFa7yZ 2EdeAQC2lkQ9IHLOon5kIZUK+r9IPYlgFsii+qfmMPLBaMcuwgEA8F4eJln/cc9V 02EKhlapkggYXYa+uhOE2KTnWgMFJgI= =SEXq -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.7-2023-10-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-10-13: amdgpu: - DC replay fixes - Misc code cleanups and spelling fixes - Documentation updates - RAS EEPROM Updates - FRU EEPROM Updates - IP discovery updates - SR-IOV fixes - RAS updates - DC PQ fixes - SMU 13.0.6 updates - GC 11.5 Support - NBIO 7.11 Support - GMC 11 Updates - Reset fixes - SMU 11.5 Updates - SMU 13.0 OD support - Use flexible arrays for bo list handling - W=1 Fixes - SubVP fixes - DPIA fixes - DCN 3.5 Support - Devcoredump fixes - VPE 6.1 support - VCN 4.0 Updates - S/G display fixes - DML fixes - DML2 Support - MST fixes - VRR fixes - Enable seamless boot in more cases - Enable content type property for HDMI - OLED fixes - Rework and clean up GPUVM TLB flushing - DC ODM fixes - DP 2.x fixes - AGP aperture fixes - SDMA firmware loading cleanups - Cyan Skillfish GPU clock counter fix - GC 11 GART fix - Cache GPU fault info for userspace queries - DC cursor check fixes - eDP fixes - DC FP handling fixes - Variable sized array fixes - SMU 13.0.x fixes - IB start and size alignment fixes for VCN - SMU 14 Support - Suspend and resume sequence rework - vkms fix amdkfd: - GC 11 fixes - GC 10 fixes - Doorbell fixes - CWSR fixes - SVM fixes - Clean up GC info enumeration - Rework memory limit handling - Coherent memory handling fixes - Use partial migrations in GPU faults - TLB flush fixes - DMA unmap fixes - GC 9.4.3 fixes - SQ interrupt fix - GTT mapping fix - GC 11.5 Support radeon: - Misc code cleanups - W=1 Fixes - Fix possible buffer overflow - Fix possible NULL pointer dereference UAPI: - Add EXT_COHERENT memory allocation flags. These allow for system scope atomics. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 - Add support for new VPE engine. This is a memory to memory copy engine with advanced scaling, CSC, and color management features Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713 - Add INFO IOCTL interface to query GPU faults Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231013175758.1735031-1-alexander.deucher@amd.com	2023-10-18 16:08:07 +10:00
Luben Tuikov	fa8391ad68	gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to index sched->sched_rq[]. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231017035656.8211-2-luben.tuikov@amd.com	2023-10-17 20:35:38 -04:00
Luben Tuikov	eab0261967	drm/amdgpu: Unset context priority is now invalid A context priority value of AMD_CTX_PRIORITY_UNSET is now invalid--instead of carrying it around and passing it to the Direct Rendering Manager--and it becomes AMD_CTX_PRIORITY_NORMAL in amdgpu_ctx_ioctl(), the gateway to context creation. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231017035656.8211-1-luben.tuikov@amd.com	2023-10-17 20:35:38 -04:00
Ma Ke	cd90511557	drm/amdgpu/vkms: fix a possible null pointer dereference In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid null pointer dereference. Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:36:25 -04:00
Yang Wang	3bba4bc6a0	drm/amdgpu: add RAS error info support for umc_v12_0 add RAS error info support for umc_v12_0. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:36:11 -04:00
Yang Wang	8736d17a7f	drm/amdgpu: add RAS error info support for mmhub_v1_8 add RAS error info support for mmhub_v1_8. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:36:03 -04:00
Yang Wang	156c2814c2	drm/amdgpu: add RAS error info support for gfx_v9_4_3 add RAS error info support for gfx_v9_4_3. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:35:55 -04:00
Yang Wang	dd401cd29a	drm/amdgpu: add RAS error info support for sdma_v4_4_2. add RAS error info support for sdma_v4_4_2. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:35:45 -04:00
Yang Wang	5b1270beb3	drm/amdgpu: add ras_err_info to identify RAS error source introduced "ras_err_info" to better identify a RAS ERROR source. NOTE: For legacy chips, keep the original RAS error print format. v1: RAS errors may come from different dies during a RAS error query, therefore, need a new data structure to identify the source of RAS ERROR. v2: - use new data structure 'amdgpu_smuio_mcm_config_info' instead of ras_err_id (in v1 patch) - refine ras error dump function name - refine ras error dump log format Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:35:35 -04:00
Yifan Zhang	6a1c31c7a8	drm/amdgpu: flush the correct vmid tlb for specific pasid flush the correct vmid tlb for specific pasid on gmc 11. Fixes: `041a574388` ("drm/amdgpu: fix and cleanup gmc_v11_0_flush_gpu_tlb_pasid") Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:34:29 -04:00
Yang Wang	1a00cfab37	drm/amdgpu: make err_data structure built-in for ras_manager (No effect outside the ras_mgr data structure) Since a new member was added to the ras_err_data data structure, it becomes unreasonable for the ras_mgr instance to contain this data, because ras mgr only uses the 2 member information of ue_count/ce_count in err_data. This patch changes the code err_data into built-in structure members, making the code directly compatible. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:34:13 -04:00
Jesse Zhang	e341631f4a	drm/amdgpu: disable GFXOFF and PG during compute for GFX9 Temporary workaround to fix issues observed in some compute applications when GFXOFF is enabled on GFX9. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:56 -04:00
Lang Yu	ef2354c70f	drm/amdgpu/umsch: fix missing stuff during rebase These are missed during rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:49 -04:00
Lang Yu	fb5b73acf7	drm/amdgpu/umsch: correct IP version format FW uses IP_VERSION_MAJ_MIN_REV format. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:42 -04:00
Lang Yu	1c1f14a472	drm/amdgpu: don't use legacy invalidation on MMHUB v3.3 Legacy invalidation is not supported. This is missed during rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:29 -04:00
Lang Yu	4661482b9c	drm/amdgpu: correct NBIO v7.11 programing Use v7.7 before, switch to v7.11 now. Fix incorrect programing. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:21 -04:00
Xiaogang Chen	ffa88b0019	drm/amdgpu: Correctly use bo_va->ref_count in compute VMs This is needed to correctly handle BOs imported into compute VM from gfx. Both kfd and gfx should use same bo_va and set bo_va->ref_count correctly when map the Bos into same VM, otherwise we may trigger kernel general protection when iterate mappings over bo_va's valids or invalids list. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Tested-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:08 -04:00
Lijo Lazar	f20f3b0d6c	drm/amd/pm: Add P2S tables for SMU v13.0.6 Add P2S table load support on SMU v13.0.6 ASICs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:33:01 -04:00
Lijo Lazar	79daf69246	drm/amdgpu: Add support to load P2S tables Add support to load P2S tables through PSP. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:32:55 -04:00
Lijo Lazar	cd21cb1fcb	drm/amdgpu: Update PSP interface header Adds FW id for P2S table. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:32:47 -04:00
Lijo Lazar	a8558fce7a	drm/amdgpu: Avoid FRU EEPROM access on APU FRU EEPROM access is not valid for APU devices. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:32:41 -04:00
Lin.Cao	f74f19c440	drm/amdgpu: save VCN instances init info before jpeg init JPEG init header will overwirte vcn init header info which will loss some debug information Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:32:34 -04:00
Alex Hung	98a80bb3dd	Revert "drm/amd/display: Hande writeback request from userspace" This reverts commit `cd1a4bc228`. [WHY & HOW] The writeback series cause a regression in thunderbolt display. Signed-off-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:24:47 -04:00
Alex Hung	731a20cb89	Revert "drm/amd/display: Add writeback enable field (wb_enabled)" This reverts commit `f6893fcb10`. [WHY & HOW] The writeback series cause a regression in thunderbolt display. Signed-off-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:08:57 -04:00
Asad Kamal	625e5f3851	drm/amdgpu: Expose ras version & schema info Expose ras table version & schema info to sysfs v2: Updated schema to get poison support info from ras context, removed asic specific checks Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:02:54 -04:00
Lijo Lazar	d4a02673b3	drm/amdgpu: Read PSPv13 OS version from register PSP OS updates the version information in register. On APUs with PSPv13, PSP OS will already be loaded with SBIOS. Hence use the version register instead of using information in driver binary header. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:02:43 -04:00
Lang Yu	faeddb6eab	drm/amdgpu/umsch: enable doorbell for umsch Program vcn_doorbell_range with vcn_ring0_1. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:01:37 -04:00
Mario Limonciello	db99889065	drm/amd: Split up UVD suspend into prepare and suspend steps amdgpu_uvd_suspend() allocates memory and copies objects into that allocated memory. This fails under memory pressure. Instead move majority of this code into a prepare step when swap can still be allocated. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:01:04 -04:00
Mario Limonciello	cb11ca3233	drm/amd: Add concept of running prepare_suspend() sequence for IP blocks If any IP blocks allocate memory during their hw_fini() sequence this can cause the suspend to fail under memory pressure. Introduce a new phase that IP blocks can use to allocate memory before suspend starts so that it can potentially be evicted into swap instead. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:00:58 -04:00
Mario Limonciello	5095d54181	drm/amd: Evict resources during PM ops prepare() callback Linux PM core has a prepare() callback run before suspend. If the system is under high memory pressure, the resources may need to be evicted into swap instead. If the storage backing for swap is offlined during the suspend() step then such a call may fail. So move this step into prepare() to move evict majority of resources and update all non-pmops callers to call the same callback. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:00:18 -04:00
Li Ma	31715a8620	drm/amdgpu: enable GFX IP v11.5.0 CG and PG support Add CG support for GFX/MC/HDP/ATHUB/IH/BIF. Add PG support for GFX. Signed-off-by: Li Ma <li.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:00:15 -04:00
Li Ma	ad3e54ab9e	drm/amdgpu/discovery: add SMU 14 support add smu 14 into the IP discovery list. Signed-off-by: Li Ma <li.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 11:00:00 -04:00
Lang Yu	4acf679f86	drm/amdgpu/umsch: power on/off UMSCH by DLDO VCN 4.0.5 uses DLDO. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:59:32 -04:00
Lang Yu	617b472431	drm/amdgpu/umsch: fix psp frontdoor loading These changes are missed in rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:59:24 -04:00
Lijo Lazar	558fcb7d11	drm/amdgpu: Increase IP discovery region size IP discovery region has increased to > 8K on some SOCs.Maximum reserve size is upto 12K, but not used. For now increase to 10K. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:59:16 -04:00
Lin.Cao	b053117e86	drm/amdgpu: Return -EINVAL when MMSCH init status incorrect Return -EINVAL when MMSCH init fail which can be handle by function amdgpu_device_reset_sriov correctly. Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:58:48 -04:00
Lang Yu	9a37f65c4e	drm/amdgpu/vpe: fix insert_nop ops Avoid infinite loop when count is 0. This is missed in rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:58:33 -04:00
Srinivasan Shanmugam	54967d5683	drm/amdgpu: Address member 'gart_placement' not described in 'amdgpu_gmc_gart_location' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or member 'gart_placement' not described in 'amdgpu_gmc_gart_location' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:58:22 -04:00
Lang Yu	84aa39ab1e	drm/amdgpu/vpe: align with mcbp changes MCBP is decided by adev->gfx.mcbp now. This is missed in rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:58:13 -04:00
Lang Yu	99ea82f424	drm/amdgpu/vpe: remove IB end boundary requirement Remove IB end boundary requirement, VPE has no such limitions, use existing amdgpu_ring_generic_pad_ib() instead. This is missed in rebase. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:58:01 -04:00
Jay Cornwall	757920585d	drm/amdgpu: Improve MES responsiveness during oversubscription When MES is oversubscribed it may not frequently check for new command submissions from driver if the scheduling load is high. Response latency as high as 5 seconds has been observed. Enable a flag which adds a check for new commands between scheduling quantums. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Cc: Alexandru Tudor <alexandru.tudor@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-13 10:57:46 -04:00
Thomas Zimmermann	57390019b6	Merge drm/drm-next into drm-misc-next Updating drm-misc-next to the state of Linux v6.6-rc2. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2023-10-11 09:50:59 +02:00
Icenowy Zheng	3806a8c647	drm/amdgpu: fix SI failure due to doorbells allocation SI hardware does not have doorbells at all, however currently the code will try to do the allocation and thus fail, makes SI AMDGPU not usable. Fix this failure by skipping doorbells allocation when doorbells count is zero. Fixes: `54c30d2a8d` ("drm/amdgpu: create kernel doorbell pages") Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 17:59:29 -04:00
Christian König	ff89f064dc	drm/amdgpu: add missing NULL check bo->tbo.resource can easily be NULL here. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2902 Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2023-10-09 17:59:29 -04:00
Icenowy Zheng	219223eca4	drm/amdgpu: fix SI failure due to doorbells allocation SI hardware does not have doorbells at all, however currently the code will try to do the allocation and thus fail, makes SI AMDGPU not usable. Fix this failure by skipping doorbells allocation when doorbells count is zero. Fixes: `54c30d2a8d` ("drm/amdgpu: create kernel doorbell pages") Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 17:02:52 -04:00
Aaron Liu	ce862c4995	drm/amdgpu/discovery: enable DCN 3.5.0 support Enable DCN 3.5.0 support. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 17:02:46 -04:00
Arvind Yadav	367a0af433	drm/amdkfd: get doorbell's absolute offset based on the db_size Here, Adding db_size in byte to find the doorbell's absolute offset for both 32-bit and 64-bit doorbell sizes. So that doorbell offset will be aligned based on the doorbell size. v2: - Addressed the review comment from Felix. v3: - Adding doorbell_size as parameter to get db absolute offset. v4: Squash the two patches into one. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 17:02:34 -04:00
Christian König	31220ee9dc	drm/amdgpu: add missing NULL check bo->tbo.resource can easily be NULL here. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2902 Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org	2023-10-09 17:01:32 -04:00
Yifan Zhang	061863e5db	drm/amdgpu: add hub->ctx_distance in setup_vmid_config add hub->ctx_distance when read CONTEXT1_CNTL, align w/ write back operation. v2: fix coding style errors reported by checkpatch.pl (Christian) Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:59:06 -04:00
Stanley.Yang	80285ae1ec	drm/amdgpu: Fix potential null pointer derefernce The amdgpu_ras_get_context may return NULL if device not support ras feature, so add check before using. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:52:46 -04:00
Lijo Lazar	b3e73b5a8f	Documentation/amdgpu: Add FRU attribute details Add documentation for the newly added manufacturer and fru_id attributes in sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:52:25 -04:00
Lijo Lazar	ac6b1f275f	drm/amdgpu: Add more FRU field information Add support to read Manufacturer Name and FRU File Id fields. Also add sysfs device attributes for external usage. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:52:17 -04:00
Lijo Lazar	8a2b51392a	drm/amdgpu: Refactor FRU product information Keep FRU related information together in a separate structure. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:52:08 -04:00
Yang Wang	be2e8aca06	drm/amdgpu: enable FRU device for SMU v13.0.6 v1: enable GFX v9.4.3 FRU device to query board information. v2: use MP1 version to identify different asic Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:51:58 -04:00
Boyuan Zhang	6cb8e3ee3a	drm/amdgpu: update ib start and size alignment Update IB starting address alignment and size alignment with correct values for decode and encode IPs. Decode IB starting address alignment: 256 bytes Decode IB size alignment: 64 bytes Encode IB starting address alignment: 256 bytes Encode IB size alignment: 4 bytes Also bump amdgpu driver version for this update. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-09 16:51:39 -04:00
Kees Cook	c8e7df374b	drm/amdgpu: Annotate struct amdgpu_bo_list with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct amdgpu_bo_list. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-hardening@vger.kernel.org Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Srinivasan Shanmugam	e0a3e7bf62	drm/amdgpu: Drop unnecessary return statements There is no reason to call return at the end of function that returns void. Fixes the below: WARNING: void function return statements are not generally useful Thus remove such a statement in the affected functions. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Lijo Lazar	4798db85b7	Documentation/amdgpu: Add board info details Add documentation for board info sysfs attribute. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Lijo Lazar	76da73f026	drm/amdgpu: Add sysfs attribute to get board info Add a sysfs attribute which shows the board form factor like OAM or CEM. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Lijo Lazar	b0a4553336	drm/amdgpu: Get package types for smuio v13.0 Add support to query package types supported in smuio v13.0 ASICs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Lijo Lazar	4365d2ed09	drm/amdgpu: Add more smuio v13.0.3 package types Expand support to get other board types like OAM or CEM. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Sathishkumar S	cbad0dd13a	drm/amdgpu: fix ip count query for xcp partitions fix wrong ip count INFO on spatial partitions. update the query to return the instance count corresponding to the partition id. v2: initialize variables only when required to be (Christian) move variable declarations to the beginning of function (Christian) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Lijo Lazar	28a3f49609	drm/amdgpu: Move package type enum to amdgpu_smuio Move definition of package type to amdgpu_smuio header and add new package types for CEM and OAM. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Srinivasan Shanmugam	2b6b29f33f	drm/amdgpu: Fix complex macros error Fixes the below: ERROR: Macros with complex values should be enclosed in parentheses WARNING: macros should not use a trailing semicolon +#define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter)); Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:59:35 -04:00
Alex Deucher	7d3f1d76f3	drm/amdgpu: refine fault cache updates Don't update the fault cache if status is 0. In the multiple fault case, subsequent faults will return a 0 status which is useless for userspace and replaces the useful fault status, so only update if status is non-0. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:49:51 -04:00
Alex Deucher	7a41ed8b59	drm/amdgpu: add new INFO ioctl query for the last GPU page fault Add a interface to query the last GPU page fault for the process. Useful for debugging context lost errors. v2: split vmhub representation between kernel and userspace v3: add locking when fetching fault info in INFO IOCTL Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 libdrm MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Cc: samuel.pitoiset@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-05 17:49:39 -04:00
Kees Cook	ac8e62ab25	drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ip_hw_instance. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230922173216.3823169-2-keescook@chromium.org	2023-10-05 11:29:09 +02:00
Mario Limonciello	134b8c5d86	drm/amd: Fix detection of _PR3 on the PCIe root port On some systems with Navi3x dGPU will attempt to use BACO for runtime PM but fails to resume properly. This is because on these systems the root port goes into D3cold which is incompatible with BACO. This happens because in this case dGPU is connected to a bridge between root port which causes BOCO detection logic to fail. Fix the intent of the logic by looking at root port, not the immediate upstream bridge for _PR3. Cc: stable@vger.kernel.org Suggested-by: Jun Ma <Jun.Ma2@amd.com> Tested-by: David Perry <David.Perry@amd.com> Fixes: `b10c1c5b3a` ("drm/amdgpu: add check for ACPI power resources") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 22:52:05 -04:00
Luben Tuikov	5d061675b7	drm/amdgpu: Fix a memory leak Fix a memory leak in amdgpu_fru_get_product_info(). Cc: Alex Deucher <Alexander.Deucher@amd.com> Reported-by: Yang Wang <kevinyang.wang@amd.com> Fixes: `0dbf2c5626` ("drm/amdgpu: Interpret IPMI data for product information (v2)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 22:43:26 -04:00
Philip Yang	fdac890966	drm/amdgpu: ratelimited override pte flags messages Use ratelimited version of dev_dbg to avoid flooding dmesg log. No functional change. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:40:11 -04:00
Mario Limonciello	b8e6aec146	drm/amd: Drop all hand-built MIN and MAX macros in the amdgpu base driver Several files declare MIN() or MAX() macros that ignore the types of the values being compared. Drop these macros and switch to min() min_t(), and max() from `linux/minmax.h`. Suggested-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:39:52 -04:00
Alex Deucher	8dbf1ba867	drm/amdgpu: cache gpuvm fault information for gmc7+ Cache the current fault info in the vm struct. This can be queried by userspace later to help debug UMDs. Cc: samuel.pitoiset@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:37:07 -04:00
Alex Deucher	2e8ef6a561	drm/amdgpu: add cached GPU fault structure to vm struct When we get a GPU page fault, cache the fault for later analysis. Cc: samuel.pitoiset@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:36:57 -04:00
Rajneesh Bhardwaj	f4bff6e0b9	drm/amdgpu: Use ttm_pages_limit to override vram reporting On GFXIP9.4.3 APU, allow the memory reporting as per the ttm pages limit in NPS1 mode. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:36:42 -04:00
Rajneesh Bhardwaj	9b37d45d79	drm/amdgpu: Rework KFD memory max limits To allow bigger allocations specially on systems such as GFXIP 9.4.3 that use GTT memory for VRAM allocations, relax the limits to maximize ROCm allocations. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:36:17 -04:00
Alex Deucher	67318cb843	drm/amdgpu/gmc11: set gart placement GC11 Needed to avoid a hardware issue. v2: force high for all GC11 parts for consistency (Alex) v3: rebase Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:36:12 -04:00
Alex Deucher	917f91d8d8	drm/amdgpu/gmc: add a way to force a particular placement for GART We normally place GART based on the location of VRAM and the available address space around that, but provide an option to force a particular location for hardware that needs it. v2: Switch to passing the placement via parameter Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-04 18:36:07 -04:00
Lang Yu	a19d934986	drm/amdgpu: correct gpu clock counter query on cyan skilfish Cayn skilfish uses SMUIO v11.0.8 offset. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: <stable@vger.kernel.org> # v5.15+	2023-10-04 18:35:28 -04:00
Mario Limonciello	c4c8955b8a	drm/amd: Fix detection of _PR3 on the PCIe root port On some systems with Navi3x dGPU will attempt to use BACO for runtime PM but fails to resume properly. This is because on these systems the root port goes into D3cold which is incompatible with BACO. This happens because in this case dGPU is connected to a bridge between root port which causes BOCO detection logic to fail. Fix the intent of the logic by looking at root port, not the immediate upstream bridge for _PR3. Cc: stable@vger.kernel.org Suggested-by: Jun Ma <Jun.Ma2@amd.com> Tested-by: David Perry <David.Perry@amd.com> Fixes: `b10c1c5b3a` ("drm/amdgpu: add check for ACPI power resources") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:43:11 -04:00
Alex Hung	f6893fcb10	drm/amd/display: Add writeback enable field (wb_enabled) [WHAT] Add a new field to keep track whether a crtc is previously writeback-enabled. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:42:38 -04:00
Alex Hung	cd1a4bc228	drm/amd/display: Hande writeback request from userspace [WHAT] Handle writeback requests and fill in the required information for DWB programming and setup. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:42:06 -04:00
Alex Deucher	0021d70a06	drm/amdkfd: drop struct kfd_cu_info I think this was an abstraction back from when kfd supported both radeon and amdgpu. Since we just support amdgpu now, there is no more need for this and we can use the amdgpu structures directly. This also avoids having the kfd_cu_info structures on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann <arnd@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-03 15:41:13 -04:00
Dave Airlie	79fb229b88	Merge tag 'drm-misc-next-2023-09-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.7-rc1: UAPI Changes: - drm_file owner is now updated during use, in the case of a drm fd opened by the display server for a client, the correct owner is displayed. - Qaic gains support for the QAIC_DETACH_SLICE_BO ioctl to allow bo recycling. Cross-subsystem Changes: - Disable boot logo for au1200fb, mmpfb and unexport logo helpers. Only fbcon should manage display of logo. - Update freescale in MAINTAINERS. - Add some bridge files to bridge in MAINTAINERS. - Update gma500 driver repo in MAINTAINERS to point to drm-misc. Core Changes: - Move size computations to drm buddy allocator. - Make drm_atomic_helper_shutdown(NULL) a nop. - Assorted small fixes in drm_debugfs, DP-MST payload addition error handling. - Fix DRM_BRIDGE_ATTACH_NO_CONNECTOR handling. - Handle bad (h/v)sync_end in EDID by clipping to htotal. - Build GPUVM as a module. Driver Changes: - Simple drivers don't need to cache prepared result. - Call drm_atomic_helper_shutdown() in shutdown/unbind for a whole lot more drm drivers. - Assorted small fixes in amdgpu, ssd130x, bridge/it6621, accel/qaic, nouveau, tc358768. - Add NV12 for komeda writeback. - Add arbitration lost event to synopsis/dw-hdmi-cec. - Speed up s/r in nouveau by not restoring some big bo's. - Assorted nouveau display rework in preparation for GSP-RM, especially related to how the modeset sequence works and the DP sequence in relation to link training. - Update anx7816 panel. - Support NVSYNC and NHSYNC in tegra. - Allow multiple power domains in simple driver. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f1fae5eb-25b8-192a-9a53-215e1184ce81@linux.intel.com	2023-09-29 08:27:15 +10:00
Sui Jingfeng	18bf400530	drm/amdgpu: Use pci_get_base_class() to reduce duplicated code Use pci_get_base_class() to reduce duplicated code. No functional change intended. Link: https://lore.kernel.org/r/20230825062714.6325-5-sui.jingfeng@linux.dev Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 16:54:54 -05:00
Tao Zhou	fc59889071	drm/amdgpu: update retry times for psp vmbx wait Increase the retry loops and replace the constant number with macro. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:44:36 -04:00
Tao Zhou	1934907234	drm/amdgpu: exit directly if gpu reset fails No need to perform the full reset operation in case of gpu reset failure. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:44:36 -04:00
Mario Limonciello	93499bd6cd	drm/amd: Move microcode init from sw_init to early_init for CIK SDMA As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:38:17 -04:00
Mario Limonciello	751e293f2c	drm/amd: Move microcode init from sw_init to early_init for SDMA v2.4 As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:38:11 -04:00
Mario Limonciello	cc76630483	drm/amd: Move microcode init from sw_init to early_init for SDMA v3.0 As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:38:05 -04:00
Mario Limonciello	e0d4fbb58c	drm/amd: Move microcode init from sw_init to early_init for SDMA v5.2 As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:37:50 -04:00
Mario Limonciello	95b456d3b0	drm/amd: Move microcode init from sw_init to early_init for SDMA v6.0 As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:37:43 -04:00
Mario Limonciello	ed1c1053cd	drm/amd: Move microcode init from sw_init to early_init for SDMA v5.0 As part of IP discovery early_init is run for all HW IP blocks. During this phase all firmware is supposed to be identified that may be missing so that the driver can avoid releasing resources used by the EFI framebuffer or simpledrm until the last possible moment. Move microcode loading from sw_init to early_init. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:37:34 -04:00
Mario Limonciello	161d076c2d	drm/amd: Drop error message about failing to load SDMA firmware The error path for SDMA firmware loading is unnecessarily noisy. When a firmware is missing 3 errors show up: ``` amdgpu 0000:07:00.0: Direct firmware load for amdgpu/green_sardine_sdma.bin failed with error -2 [drm:sdma_v4_0_early_init [amdgpu]] ERROR Failed to load sdma firmware! [drm:amdgpu_device_init [amdgpu]] ERROR early_init of IP block <sdma_v4_0> failed -19 ``` The error code for the device init is bubbled up already, remove the second one. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:36:41 -04:00
Le Ma	3152d01e88	drm/amd/pm: deprecate allow_xgmi_power_down interface Replace with set_plpd_mode uniformly for places to use. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:36:23 -04:00
Mario Limonciello	3657a1d5ac	drm/amd: Limit seamless boot by default to APUs A hang is reported on DCN 3.2 with seamless boot enabled. As the benefits come from an eDP setup, limit it to only enabled by default with APUs. Suggested-by: Alexander.Deucher@amd.com Reported-by: feifei.xu@amd.com Closes: https://lore.kernel.org/amd-gfx/85b427f6-11ec-4249-bf6f-eadf9c375f88@amd.com/T/#m2887e919d7c01b2a4860d2261b366d22e070f309 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-28 15:35:31 -04:00
Alex Deucher	b2e1cbe628	drm/amdgpu/gmc11: disable AGP on GC 11.5 AGP aperture is deprecated and no longer functional. v2: fix typo (Alex) v3: just skip the agp setup call v4: revert back to the original model v5: back to v3 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:23 -04:00
David (Ming Qiang) Wu	fa1f1cc09d	drm/amdgpu: not to save bo in the case of RAS err_event_athub err_event_athub will corrupt VCPU buffer and not good to be restored in amdgpu_vcn_resume() and in this case the VCPU buffer needs to be cleared for VCN firmware to work properly. Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:23 -04:00
Luben Tuikov	9ed630c5c4	drm/amdgpu: Fix a memory leak Fix a memory leak in amdgpu_fru_get_product_info(). Cc: Alex Deucher <Alexander.Deucher@amd.com> Reported-by: Yang Wang <kevinyang.wang@amd.com> Fixes: `0dbf2c5626` ("drm/amdgpu: Interpret IPMI data for product information (v2)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:23 -04:00
Alex Deucher	de59b69932	drm/amdgpu/gmc: set a default disable value for AGP To disable AGP, the start needs to be set to a higher value than the end. Set a default disable value for the AGP aperture and allow the IP specific GMC code to enable it selectively be calling amdgpu_gmc_agp_location(). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:22 -04:00
Alex Deucher	29495d8145	drm/amdgpu/gmc6-8: properly disable the AGP aperture The BOT register needs to be larger than the TOP register for this to be properly disabled. The lower 22 bits of the BOT address are always 0 and the lower 22 bits of the TOP register are always 1 so you need to make the upper bits of BOT larger than the upper bits of BOT. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:22 -04:00
Mangesh Gadre	cd956e7531	drm/amdgpu:Expose physical id of device in XGMI hive This identifies the physical ordering of devices in the hive v2: fix compilation issue Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:22 -04:00
Lang Yu	7021b397c6	drm/amdgpu/vpe: fix truncation warnings Fix truncation warnings. Fixes: `9d4346bdbc` ("drm/amdgpu: add VPE 6.1.0 support") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202309200028.aUVuM8os-lkp@intel.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 17:00:21 -04:00
Philip Yang	101b810430	drm/amdkfd: Move dma unmapping after TLB flush Otherwise GPU may access the stale mapping and generate IOMMU IO_PAGE_FAULT. Move this to inside p->mutex to prevent multiple threads mapping and unmapping concurrently race condition. After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and before free the mem attachment in case failed to unmap from GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:10 -04:00
Christian König	08abccc9a7	drm/amdgpu: further move TLB hw workarounds a layer up For the PASID flushing we already handled that at a higher layer, apply those workarounds to the standard flush as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	e2e3788850	drm/amdgpu: rework lock handling for flush_tlb v2 Instead of each implementation doing this more or less correctly move taking the reset lock at a higher level. v2: fix typo Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	3983c9fd2d	drm/amdgpu: drop error return from flush_gpu_tlb_pasid That function never fails, drop the error return. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	041a574388	drm/amdgpu: fix and cleanup gmc_v11_0_flush_gpu_tlb_pasid The same PASID can be used by more than one VMID, reset each of them. Use the common KIQ handling. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	72cc99205c	drm/amdgpu: cleanup gmc_v10_0_flush_gpu_tlb_pasid The same PASID can be used by more than one VMID, reset each of them. Use the common KIQ handling. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	e7b90e99fa	drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb_pasid Testing for reset is pointless since the reset can start right after the test. The same PASID can be used by more than one VMID, invalidate each of them. Move the KIQ and all the workaround handling into common GMC code. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	0c525aa406	drm/amdgpu: fix and cleanup gmc_v8_0_flush_gpu_tlb_pasid Testing for reset is pointless since the reset can start right after the test. Grab the reset semaphore instead. The same PASID can be used by more than once VMID, build a mask of VMIDs to invalidate instead of just restting the first one. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	fb4c52db69	drm/amdgpu: fix and cleanup gmc_v7_0_flush_gpu_tlb_pasid Testing for reset is pointless since the reset can start right after the test. Grab the reset semaphore instead. The same PASID can be used by more than once VMID, build a mask of VMIDs to invalidate instead of just restting the first one. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	a54db42ff3	drm/amdgpu: cleanup gmc_v11_0_flush_gpu_tlb Remove leftovers from copying this from the gmc v10 code. v2: squash in fix from Yifan Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:55:09 -04:00
Christian König	a70cb2176f	drm/amdgpu: rework gmc_v10_0_flush_gpu_tlb v2 Move the SDMA workaround necessary for Navi 1x into a higher layer. v2: use dev_err Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:52 -04:00
Tao Zhou	8c14a67bdf	drm/amdgpu: change if condition for bad channel bitmap update The amdgpu_ras_eeprom_control.bad_channel_bitmap is u32 type, but the channel index could be larger than 32. For the ASICs whose channel number is more than 32, the amdgpu_dpm_send_hbm_bad_channel_flag interface is not supported, so we simply bypass channel bitmap update under this condition. v2: replace sizeof with BITS_PER_TYPE, we should check bit number instead of byte number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:52 -04:00
Tao Zhou	6205b558e1	drm/amdgpu: fix value of some UMC parameters for UMC v12 Prepare for bad page retirement. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:52 -04:00
Philip Yang	e61801f162	drm/amdkfd: Don't use sw fault filter if retry cam enabled If retry cam enabled, we don't use sw retry fault filter and add fault into sw filter ring, so we shouldn't remove fault from sw filter. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:51 -04:00
Christian König	24a6eb92b7	drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb The KIQ code path was ignoring the second flush. Also avoid long lines and re-calculating the register offsets over and over again. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:51 -04:00
Philip Yang	bcfb9cee61	drm/amdgpu: Increase IH soft ring size for GFX v9.4.3 dGPU On GFX v9.4.3 dGPU, applications have random timeout failure when XNACK on, dmesg log has "amdgpu: IH soft ring buffer overflow 0x900, 0x900", because dGPU mode has 272 cam entries. After increasing IH soft ring to 512 entries, no more IH soft ring overflow message and application passed. Fixes: `bf80d34b6c` ("drm/amdgpu: Increase soft IH ring size") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:51 -04:00
Lijo Lazar	c45e38f217	drm/amdgpu: Restore partition mode after reset On a full device reset, PSP FW gets unloaded. Hence restore the partition mode by placing a new request. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-26 16:54:51 -04:00
Cong Liu	f387bb578d	drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: `9f051d6ff1` ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 17:27:04 -04:00
Lijo Lazar	06cce38ef5	Revert "drm/amdgpu: Report vbios version instead of PN" This reverts commit `7748ce5b69`. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 17:26:15 -04:00
Lijo Lazar	ff96ddc3f2	drm/amdgpu: Add more fields to IP version Include subrevision and variant fileds also to IP version. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:25:17 -04:00
Tao Zhou	f8754f58d6	drm/amdgpu: print channel index for UMC bad page Print channel index for UMC v12. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:25:17 -04:00
Sathishkumar S	5aba51233b	drm/amdgpu: update IP count INFO query update the query to return the number of functional instances where there is more than an instance of the requested type and for others continue to return one. v2: count must reflect the actual number of engines (Alex) v3: fix wrong number of engines for vcn (Alex) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:09 -04:00
Stanley.Yang	a83f2bf1f4	drm/amdgpu: Fix false positive error log It should first check block ras obj whether be set, it should return 0 directly if block ras obj or hw_ops is not set. If block doesn't support RAS just return 0 is fine. Changed from V1: return 0 directly if block ras obj or hw ops is not set Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:09 -04:00
Vignesh Chander	8c95cda3e1	drm/amdgpu/jpeg: skip set pg for sriov Host handles PG. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:09 -04:00
André Almeida	a769178585	drm/amdgpu: Rework coredump to use memory dynamically Instead of storing coredump information inside amdgpu_device struct, move if to a proper separated struct and allocate it dynamically. This will make it easier to further expand the logged information. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:07 -04:00
Cong Liu	5838f74c29	drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: `9f051d6ff1` ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:07 -04:00
Lijo Lazar	24f60ddc4b	drm/amdgpu: Fix vbios version string search Search for vbios version string in STRING_OFFSET-ATOM_ROM_HEADER region first. If those offsets are not populated, use the hardcoded region. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:06 -04:00
Lijo Lazar	2af351d692	Revert "drm/amdgpu: Report vbios version instead of PN" This reverts commit `7748ce5b69`. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:06 -04:00
David Francis	5f248462c6	drm/amdgpu: Add EXT_COHERENT memory allocation flags These flags (for GEM and SVM allocations) allocate memory that allows for system-scope atomic semantics. On GFX943 these flags cause caches to be avoided on non-local memory. On all other ASICs they are identical in functionality to the equivalent COHERENT flags. Corresponding Thunk patch is at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 Reviewed-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 16:24:06 -04:00
Yang Wang	4051844c66	drm/amdgpu: add amdgpu mca debug sysfs support add amdgpu mca debug sysfs support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:25:19 -04:00
Alex Deucher	d11bbacee3	drm/amdgpu: add VPE IP discovery info to HW IP info query Add missing IP discovery info. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:25:11 -04:00
Yang Wang	7ff607e272	drm/amdgpu: add amdgpu smu mca dump feature support add amdgpu smu mca dump feature support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:25:01 -04:00
Mario Limonciello	7f4ce7b50a	drm/amd: Enable seamless boot by default on newer ASICs Seamless boot can technically be supported as far back as DCN1 but to avoid regressions on older hardware, enable it for DCN3 and later. If users report using the module parameter that it works on older ASICs as well, this can be adjusted. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:24:46 -04:00
Mario Limonciello	5dc270d366	drm/amd: Add a module parameter for seamless boot The module parameter can be used to test more easily enabling seamless boot support on additional ASICs. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:24:39 -04:00
Timmy Tsai	2fa73a101c	drm/amd: Add HDP flush during jpeg init During jpeg init, CPU writes to frame buffer which can be cached by HDP, occasionally causing invalid header to be sent to MMSCH. Perform HDP flush after writing to frame buffer before continuing with jpeg init sequence. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:24:28 -04:00
Mario Limonciello	bb0f84293e	drm/amd: Move seamless boot check out of display This will allow base driver to dictate whether seamless should be enabled. No intended functional changes. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:24:21 -04:00
Mario Limonciello	3ef07651a5	drm/amd: Drop special case for yellow carp without discovery `amdgpu_gmc_get_vbios_allocations` has a special case for how to bring up yellow carp when amdgpu discovery is turned off. As this ASIC ships with discovery turned on, it's generally dead code and worse it causes `adev->mman.keep_stolen_vga_memory` to not be initialized for yellow carp. Remove it. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:24:04 -04:00
Lijo Lazar	4e8303cf2c	drm/amdgpu: Use function for IP version check Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-20 12:23:28 -04:00
Tvrtko Ursulin	1c7a387ffe	drm: Update file owner during use With the typical model where the display server opens the file descriptor and then hands it over to the client(), we were showing stale data in debugfs. Fix it by updating the drm_file->pid on ioctl access from a different process. The field is also made RCU protected to allow for lockless readers. Update side is protected with dev->filelist_mutex. Before: $ cat /sys/kernel/debug/dri/0/clients command pid dev master a uid magic Xorg 2344 0 y y 0 0 Xorg 2344 0 n y 0 2 Xorg 2344 0 n y 0 3 Xorg 2344 0 n y 0 4 After: $ cat /sys/kernel/debug/dri/0/clients command tgid dev master a uid magic Xorg 830 0 y y 0 0 xfce4-session 880 0 n y 0 1 xfwm4 943 0 n y 0 2 neverball 1095 0 n y 0 3 ) More detailed and historically accurate description of various handover implementation kindly provided by Emil Velikov: """ The traditional model, the server was the orchestrator managing the primary device node. From the fd, to the master status and authentication. But looking at the fd alone, this has varied across the years. IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open fd(s) and reuse those whenever needed, DRI2 the client was responsible for open() themselves and with DRI3 the fd was passed to the client. Around the inception of DRI3 and systemd-logind, the latter became another possible orchestrator. Whereby Xorg and Wayland compositors could ask it for the fd. For various reasons (hysterical and genuine ones) Xorg has a fallback path going the open(), whereas Wayland compositors are moving to solely relying on logind... some never had fallback even. Over the past few years, more projects have emerged which provide functionality similar (be that on API level, Dbus, or otherwise) to systemd-logind. """ v2: * Fixed typo in commit text and added a fine historical explanation from Emil. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230621094824.2348732-1-tvrtko.ursulin@linux.intel.com Signed-off-by: Christian König <christian.koenig@amd.com>	2023-09-20 15:27:44 +02:00
Dave Airlie	1216d49178	amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZQIRSAAKCRC93/aFa7yZ 2O/nAP4zB0fdLB46Hhz11aYsE9Zghe91b2rcmF4EYpEAQs7awwEAhSjy0Wiy6EYb prEGCdW0O8Tq7fdjr7+JrPmF7dasAQk= =SUbg -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.6-2023-09-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com	2023-09-15 09:50:50 +10:00
Daniel Vetter	15794f9dc3	One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZPl/TAAKCRDj7w1vZxhR xWZZAP0b3k5vIuQdbiZBdXy7+guakiJ2DqOMxJJ+sYS5Mun53AEA73Cu1gmBNMoT d8H1uBjOfvPcXANNI0t0OgJfrESOdg8= =atPC -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2023-09-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef	2023-09-14 14:00:51 +02:00
Alex Deucher	addd7aef25	drm/amdgpu: add remap_hdp_registers callback for nbio 7.11 Implement support for remapping the HDP aperture registers for NBIO 7.11. Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-12 17:30:22 -04:00
Alex Deucher	b85a17d354	drm/amdgpu: add vcn_doorbell_range callback for nbio 7.11 Implement support for setting up the VCN doorbell range for NBIO 7.11. Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-12 17:30:18 -04:00
Thomas Zimmermann	c900529f3d	Merge drm/drm-fixes into drm-misc-fixes Forwarding to v6.6-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2023-09-12 08:53:30 +02:00
David Francis	5e7e822542	drm/amdgpu: Handle null atom context in VBIOS info ioctl On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:25:26 -04:00
Hawking Zhang	ffd6bde302	drm/amdgpu: fallback to old RAS error message for aqua_vanjaram So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:20:07 -04:00
Alex Deucher	ab43213e7a	drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:19:48 -04:00
Alex Deucher	1832403cd4	drm/amdgpu/soc21: don't remap HDP registers for SR-IOV This matches the behavior for soc15 and nv. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:19:42 -04:00
Hamza Mahfooz	169ed4ece8	Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory" This reverts commit `70e64c4d52`. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: stable@vger.kernel.org # 6.1+ Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:18:17 -04:00
Mukul Joshi	97e3c6a853	drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3 Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:16:31 -04:00
Mukul Joshi	81faf9e0c3	drm/amdkfd: Fix reg offset for setting CWSR grace period This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 18:15:43 -04:00
David Francis	86f2ec2265	drm/amdgpu: Handle null atom context in VBIOS info ioctl On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:22:32 -04:00
André Almeida	ffde72107b	drm/amdgpu: Create an option to disable soft recovery Create a module option to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes easier to force device resets for testing and debugging purposes. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:22:23 -04:00
André Almeida	887db1e49a	drm/amdgpu: Merge debug module parameters Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:19:54 -04:00
Yifan Zhang	0e64c9aad0	drm/amdgpu: add type conversion for gc info gc info usage misses type conversion. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Li Ma <li.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:17:30 -04:00
Mukul Joshi	68fa72a437	drm/amdgpu: Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with the naming convention followed in amdgpu_gfx.h. No functional change. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:15:41 -04:00
Tao Zhou	ced575203a	drm/amdgpu: print more address info of UMC bad page Print out row, column and bank value of UMC error address for UMC v12. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:15:15 -04:00
Hawking Zhang	9f9d4651f7	drm/amdgpu: fallback to old RAS error message for aqua_vanjaram So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:15:02 -04:00
Alex Deucher	6a82822b90	drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:14:59 -04:00
Alex Deucher	8a6e26e7ef	drm/amdgpu/soc21: don't remap HDP registers for SR-IOV This matches the behavior for soc15 and nv. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:14:50 -04:00
Hamza Mahfooz	601c63ad8e	Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory" This reverts commit `70e64c4d52`. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: stable@vger.kernel.org # 6.1+ Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:12:20 -04:00
Tao Zhou	3cb9ebc9d6	drm/amdgpu: add channel index table for UMC v12 Get UMC phyical channel index according to node id, umc instance and channel instance. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:58 -04:00
Tao Zhou	40a08fe890	drm/amdgpu: add address conversion for UMC v12 Convert MCA error address to physical address and find out all pages in one physical row. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:35 -04:00
Lijo Lazar	ca7aa3bf31	drm/amdgpu: Use default reset method handler When reset method is not passed in reset context, look for the handler for default reset method. On Aldebaran, default reset method for SOCs connected to CPU over XGMI is MODE2. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:25 -04:00
Mukul Joshi	f705a6f021	drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3 Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:19 -04:00
Ma Jun	a1ce3e1f7c	drm/amd: Fix the flag setting code for interrupt request [1] Remove the irq flags setting code since pci_alloc_irq_vectors() handles these flags. [2] Free the msi vectors in case of error. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:10:09 -04:00
Lang Yu	dbb8052151	drm/amdgpu: fix unsigned error codes Fixes: `5d5eac7e83` ("drm/amdgpu: add selftest framework for UMSCH") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/all/ZPhddADtKmOuVyDq@lang-desktop Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:09:12 -04:00
Mukul Joshi	56d6daa3c7	drm/amdkfd: Fix reg offset for setting CWSR grace period This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-11 17:07:33 -04:00
Arunpravin Paneer Selvam	2eb412aa25	drm/amdgpu: Move the size computations to drm buddy - Move roundup_power_of_two() and IS_ALIGNED() computations to drm buddy file to support the new try harder mechanism for contiguous allocation. - Move trim function call to drm_buddy_alloc_blocks() function. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230909160902.15644-2-Arunpravin.PaneerSelvam@amd.com Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2023-09-11 20:18:06 +02:00
Linus Torvalds	a48fa7efaf	drm fixes for 6.6-rc1 amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes i915: - Mark requests for GuC virtual engines to avoid use-after-free nouveau: - Fix fence state in nouveau_fence_emit() ivpu: - replace strncpy -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmT6ihUACgkQDHTzWXnE hr6Vqg/+OGfbxx0qev5C93bLYpg8d4zbply0zTed2hU48zczARkOyDH+h2uYM4tP rmB6mK/0KRy3H8vkngKduR/IF6QBwzOnLDpS+C/TrHYQYqMDwvs3qEDVYqXh3V5H GPIFuu9sb2Nb/o9Fid70pvNACbgGsAUgqMUhQ/is3NjzOR8S/qjdBQ7wSdoUoOqx okTdlwuMK5SEYrihSyTZvhNcwpR/1L8JuxOUXXXUSQ0tRXBer/ZNF2lcEyYmQ0zs bZHKM4dNbdew/EhygbH6LVB5RjFaT5pGw08Xm8zJry+q5tXQV/NIXPQHL3vWqQoX i2QLbvGX/Uu8LJg9YNdsa1kPwNKADAxF64cW38Llv8ybsPHyva/I255j689/TvSG Se7HkTooURKS6GWFHPOkyMNC0Y+Fb/7WG5zUPSDVk9stqJz9pzx48okTsCWeuovD cBOssp8If1QsTyPvDq5A47l5z1oO3J1rdJ9fL0GnpOuXulPhgJGIoUSkftyI6lbw rhUoRd7w6VcgOsA9WIkAf+/325em0Y0AKZBgQnr2jfF56IE4iLa8yFuJNeReZ9oy W9yf14AB0orVm9+P4+PqATXBh+PdCTD1CPcB0MyK1SZAjh836Tc0HPKvJAUU6jp+ 8aw3BKXcaLjIP/dCyhSddXCnuuTPWreff5isktwgEXUtNFv9jx0= =fPeN -----END PGP SIGNATURE----- Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three weeks in one go, one i915, one nouveau and one ivpu. I think there might be a few more fixes in misc that I haven't pulled in yet, but we should get them all for rc2. amdgpu: - Display replay fixes - Fixes for headless boards - Fix documentation breakage - RAS fixes - Handle newer IP discovery tables - SMU 13.0.6 fixes - SR-IOV fixes - Display vstartup fixes - NBIO 7.9 fixes - Display scaling mode fixes - Debugfs power reporting fix - GC 9.4.3 fixes - Dirty framebuffer fixes for fbcon - eDP fixes - DCN 3.1.5 fix - Display ODM fixes - GPU core dump fix - Re-enable zops property now that IGT test is fixed - Fix possible UAF in CS code - Cursor degamma fix amdkfd: - HMM fixes - Interrupt masking fix - GFX11 MQD fixes i915: - Mark requests for GuC virtual engines to avoid use-after-free nouveau: - Fix fence state in nouveau_fence_emit() ivpu: - replace strncpy" * tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits) drm/amdgpu: Restrict bootloader wait to SMUv13.0.6 drm/amd/display: prevent potential division by zero errors drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1 Revert "drm/amd/display: Remove v_startup workaround for dcn3+" drm/amdgpu: fix amdgpu_cs_p1_user_fence Revert "Revert "drm/amd/display: Implement zpos property"" drm/amdkfd: Add missing gfx11 MQD manager callbacks drm/amdgpu: Free ras cmd input buffer properly drm/amdgpu: Hide xcp partition sysfs under SRIOV drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting drm/amdkfd: use mask to get v9 interrupt sq data bits correctly drm/amdgpu: Allocate coredump memory in a nonblocking way drm/amdgpu: Support query ecc cap for aqua_vanjaram drm/amdgpu: Add umc_info v4_0 structure drm/amd/display: always switch off ODM before committing more streams drm/amd/display: Remove wait while locked drm/amd/display: update blank state on ODM changes drm/amd/display: Add smu write msg id fail retry process drm/amdgpu: Add SMU v13.0.6 default reset methods ...	2023-09-07 19:47:04 -07:00
Lijo Lazar	fbe1a9e0c7	drm/amdgpu: Restrict bootloader wait to SMUv13.0.6 Restrict the wait for boot loader steady state only to SMUv13.0.6. For older SOCs, ASIC init has a longer wait period and that takes care. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 22:11:51 -04:00
Candice Li	7e6ec09974	drm/amdgpu: Add umc v12_0 ras functions Add umc v12_0 ras error querying. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:38:00 -04:00
Lijo Lazar	6b7d211740	drm/amdgpu: Fix refclk reporting for SMU v13.0.6 SMU v13.0.6 SOCs have 100MHz reference clock. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:37:03 -04:00
Hawking Zhang	c2c23a10f1	drm/amdgpu: Correct se_num and reg_inst for gfx v9_4_3 ras counters gfx_v9_4_3_ue\|ce_reg_list is an array per gfx core instance correct the settings of se_num and reg_inst for some of gfx ras counters so all the available register instances can be polled for ras status. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:36:57 -04:00
Lijo Lazar	1b8e56b994	drm/amdgpu: Restrict bootloader wait to SMUv13.0.6 Restrict the wait for boot loader steady state only to SMUv13.0.6. For older SOCs, ASIC init has a longer wait period and that takes care. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:36:52 -04:00
Lijo Lazar	b93fb0fe24	drm/amdgpu: Add only valid firmware version nodes Show only firmware version attributes that have valid version. Hide others. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:36:44 -04:00
Lang Yu	d519072d26	drm/amdgpu: fix incompatible types in conditional expression Use proper type. Fixes: `9d4346bdbc` ("drm/amdgpu: add VPE 6.1.0 support") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Solomon Chiu <solomon.chiu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202309020608.FwP8QMht-lkp@intel.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:35:29 -04:00
Srinivasan Shanmugam	eb3b214c37	drm/amdgpu: Use min_t to replace min Use min_t to replace min, min_t is a bit fast because min use twice typeof. And using min_t is cleaner here since the min/max macros do a typecheck while min_t()/max_t() to an explicit type cast. Fixes the below checkpatch warning: WARNING: min() should probably be min_t() Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:35:22 -04:00
Candice Li	d57e24aa56	drm/amdgpu: Update amdgpu_device_indirect_r/wreg_ext Only calculate pcie_index_hi for register address greater than 32bits. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:34:26 -04:00
Candice Li	a76b2870bd	drm/amdgpu: Add RREG64_PCIE_EXT/WREG64_PCIE_EXT functions Add 64bits register access support on register whose address is greater than 32bits. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:34:18 -04:00
Srinivasan Shanmugam	9b70a1d414	drm/amdgpu: Declare array with strings as pointers constant This warning is for the declaration of a static array, and it is recommended to declare it as type "static const char * const" instead of "static const char ". an array pointer declared as type "static const char " can point to a different character constant because the pointer is mutable. However, if it is declared as type "static const char * const", the pointer will point to an immutable character constant, preventing it from being modified which can better ensure the safety and stability of the program. Fixes the below: WARNING: static const char * array should probably be static const char * const Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:34:11 -04:00
Jiapeng Chong	df04434cb5	drm/amdgpu: clean up some inconsistent indenting No functional modification involved. drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c:34 nbio_v7_11_get_rev_id() warn: inconsistent indenting. v2: drop leftover printk (Alex) Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6316 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:32:25 -04:00
Yifan Zhang	0bdf09cc5e	drm/amdgpu: calling address translation functions to simplify codes Use amdgpu_gmc_vram_pa to simplify codes. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-09-06 14:31:52 -04:00
Simon Pilkington	e2884fe84a	drm/amd: Make fence wait in suballocator uninterruptible Commit `c103a23f2f` ("drm/amd: Convert amdgpu to use suballocation helper.") made the fence wait in amdgpu_sa_bo_new() interruptible but there is no code to handle an interrupt. This caused the kernel to randomly explode in high-VRAM-pressure situations so make it uninterruptible again. Signed-off-by: Simon Pilkington <simonp.git@gmail.com> Fixes: `c103a23f2f` ("drm/amd: Convert amdgpu to use suballocation helper.") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2761 CC: stable@vger.kernel.org # 6.4+	2023-09-01 15:12:07 +02:00
Christian König	35588314e9	drm/amdgpu: fix amdgpu_cs_p1_user_fence The offset is just 32bits here so this can potentially overflow if somebody specifies a large value. Instead reduce the size to calculate the last possible offset. The error handling path incorrectly drops the reference to the user fence BO resulting in potential reference count underflow. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2023-08-31 18:14:49 -04:00
Hawking Zhang	9f051d6ff1	drm/amdgpu: Free ras cmd input buffer properly Do not access the pointer for ras input cmd buffer if it is even not allocated. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:12:13 -04:00
Rajneesh Bhardwaj	2031c46b09	drm/amdgpu: Hide xcp partition sysfs under SRIOV XCP partitions should not be visible for the VF for GFXIP 9.4.3. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:11:41 -04:00
Tao Zhou	e23b10675a	drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting Instead of using direct update, avoid touching unrelated fields. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:11:09 -04:00
André Almeida	6d1b345548	drm/amdgpu: Allocate coredump memory in a nonblocking way During a GPU reset, a normal memory reclaim could block to reclaim memory. Giving that coredump is a best effort mechanism, it shouldn't disturb the reset path. Change its memory allocation flag to a nonblocking one. Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:10:11 -04:00
Hawking Zhang	e0c5c387ac	drm/amdgpu: Support query ecc cap for aqua_vanjaram Driver queries umc_info v4_0 to identify ecc cap for aqua_vanjaram Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:09:45 -04:00
Lijo Lazar	05347402d1	drm/amdgpu: Add SMU v13.0.6 default reset methods For APUs with SMU v13.0.6, mode-2 reset is kept as default and for others mode-1 is the default reset method. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:05:43 -04:00
Lijo Lazar	7c2949c12e	drm/amdgpu: Add bootloader wait for PSP v13 Implement the wait for bootloader call back for PSP v13.0 ASICs. Only for ASICs with PSP v13.0.6, it needs an additional check for VBIOS mailbox status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:04:10 -04:00
Hamza Mahfooz	0a611560f5	drm/amdgpu: register a dirty framebuffer callback for fbcon fbcon requires that we implement &drm_framebuffer_funcs.dirty. Otherwise, the framebuffer might take a while to flush (which would manifest as noticeable lag). However, we can't enable this callback for non-fbcon cases since it may cause too many atomic commits to be made at once. So, implement amdgpu_dirtyfb() and only enable it for fbcon framebuffers (we can use the "struct drm_file file" parameter in the callback to check for this since it is only NULL when called by fbcon, at least in the mainline kernel) on devices that support atomic KMS. Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org # 6.1+ Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:03:48 -04:00
Mangesh Gadre	7b9f623530	drm/amdgpu: Updated TCP/UTCL1 programming Update TCP/UTCL1 thrashing control settings v2: updated rev_id check Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:02:49 -04:00
Hawking Zhang	7d4424373d	drm/amdgpu: Fix the return for gpu mode1_reset amdgpu_device_mode1_reset will return gpu mode1_reset succeed (ret = 0) as long as wait_for_bootloader call succeed, regardless of the status reported by smu or psp firmware. This results to driver continue executing recovery even smu or psp fail to perform mode1 reset. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 18:02:10 -04:00
Mangesh Gadre	3f16096795	drm/amdgpu: Remove SRAM clock gater override by driver rlc firmware does required setting, driver need not do it. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:59:17 -04:00
Lijo Lazar	7656168a8a	drm/amdgpu: Add bootloader status check Add a function to wait till bootloader has reached steady state. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:58:58 -04:00
Horace Chen	8c97e87c13	drm/amdkfd: use correct method to get clock under SRIOV [What] Current SRIOV still using adev->clock.default_XX which gets from atomfirmware. But these fields are abandoned in atomfirmware long ago. Which may cause function to return a 0 value. [How] We don't need to check whether SR-IOV. For SR-IOV one-vf-mode, pm is enabled and VF is able to read dpm clock from pmfw, so we can use dpm clock interface directly. For multi-VF mode, VF pm is disabled, so driver can just react as pm disabled. One-vf-mode is introduced from GFX9 so it shall not have any backward compatibility issue. Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:58:29 -04:00
Lijo Lazar	8f1778939b	drm/amdgpu: Unset baco dummy mode on nbio v7.9 BACO dummy mode could be set under reset conditions and that affects framebuffer access. Check If baco dummy mode is set, unset it if so. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Tested-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:58:23 -04:00
YiPeng Chai	e81c455685	drm/amdgpu: Enable ras for mp0 v13_0_6 sriov Enable ras for mp0 v13_0_6 sriov Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:55:02 -04:00
Samir Dhume	bae44a8fcb	drm/amdgpu/jpeg - skip change of power-gating state for sriov Powergating is handled in the host driver. Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:54:18 -04:00
Le Ma	46b55e25c9	drm/amdgpu: update gc_info v2_1 from discovery Several new fields are exposed in gc_info v2_1 Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:53:19 -04:00
Le Ma	d4f6425a56	drm/amdgpu: update mall info v2 from discovery Mall info v2 is introduced in ip discovery Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:53:08 -04:00
Candice Li	4b721ed87e	drm/amdgpu: Only support RAS EEPROM on dGPU platform RAS EEPROM device is only supported on dGPU platform for smu v13_0_6. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:52:49 -04:00
Lang Yu	983ac45a06	drm/amdgpu: update SET_HW_RESOURCES definition for UMSCH Align with FW changes. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	eebb06d121	drm/amdgpu: add amdgpu_umsch_mm module parameter Enable Multi Media User Mode Scheduler (0 = disabled (default), 1 = enabled). Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	822f780829	drm/amdgpu/discovery: enable UMSCH 4.0 in IP discovery Enable UMSCH to support VPE and VCN user queues. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	4f94903332	drm/amdgpu: add PSP loading support for UMSCH Add front door loading support. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	40748f9a0a	drm/amdgpu: reserve mmhub engine 3 for UMSCH FW UMSCH FW uses mmhub engine 3 for invalidation. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	d591ae0c9f	drm/amdgpu: add VPE queue submission test Submit a fence command through indirect buffer. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	5d5eac7e83	drm/amdgpu: add selftest framework for UMSCH Prepare for VPE and VCN queue submission test. v2: rebase on drm_exec (Alex) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 17:14:21 -04:00
Lang Yu	dc6f3d6ff2	drm/amdgpu: enable UMSCH scheduling for VPE Add VPE into UMSCH hw resourses, set vmid mask to 0xf00, set hqd mask to 0xfe, then UMSCH can schedule VPE queues. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:40:56 -04:00
Lang Yu	3488c79bea	drm/amdgpu: add initial support for UMSCH Add basic data structure, dummy ring functions and ip functions for UMSCH. Implement sw_init(ring_init and init_microcodede) and hw_init(load_microcode), UMSCH can boot up now. Implement hw_init(ring_start) and hw_fini(ring_stop), UMSCH is ready for command submission now. Implement set_hw_resources and add/remove_queue, UMSCH is ready for scheduling now. Aggregated doorbell is used to notify UMSCH FW that there is unmapped queue with corresponding priority level (e.g., AGDB[0] for Real time band, etc.) is updating its job. v2: squash together initial patches to avoid breaking the build (Alex) Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:40:53 -04:00
Lang Yu	9c852a42a9	drm/amdgpu: add UMSCH firmware header definition Add firmware header definition for UMSCH. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:40:45 -04:00
Lang Yu	1a29f36781	drm/amdgpu: add UMSCH RING TYPE definition Add RING TYPE definition for Multi Mdeia User Mode Scheduler. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:40:40 -04:00
Saleemkhan Jamadar	1cf36599b9	drm/amdgpu/jpeg: initialize number of jpeg ring Initialize number of jpeg ring for vcn 4.0.5. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:39:41 -04:00
Christian König	a5492fe27f	drm/amdgpu: fix amdgpu_cs_p1_user_fence The offset is just 32bits here so this can potentially overflow if somebody specifies a large value. Instead reduce the size to calculate the last possible offset. The error handling path incorrectly drops the reference to the user fence BO resulting in potential reference count underflow. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:39:28 -04:00
Evan Quan	90bcb9b595	drm/amdgpu: revise the device initialization sequences By placing the sysfs interfaces creation after `.late_int`. Since some operations performed during `.late_init` may affect how the sysfs interfaces should be created. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:35:33 -04:00
Evan Quan	3e38b634f9	drm/amd/pm: introduce a new set of OD interfaces There will be multiple interfaces(sysfs files) exposed with each representing a single OD functionality. And all those interface will be arranged in a tree liked hierarchy with the top dir as "gpu_od". Meanwhile all functionalities for the same component will be arranged under the same directory. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:35:26 -04:00
Saleemkhan Jamadar	6be6e74b7d	drm/amdgpu: enable PG flags for VCN Enable PG flags for VCN and Jpeg on IP 11_5_0 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:35:02 -04:00
Saleemkhan Jamadar	844d8dd5b9	drm/amdgpu/discovery: add VCN 4.0.5 Support Enable VCN 4.0.5 on gc 11_5_0. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:58 -04:00
Saleemkhan Jamadar	c64f389506	drm/amdgpu/soc21: Add video cap query support for VCN_4_0_5 Added the video capability query support for VCN version 4_0_5 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:52 -04:00
Saleemkhan Jamadar	cc308acc9b	drm/amdgpu:enable CG and PG flags for VCN Enable CG and PG flags for VCN on IP 11_5_0 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:48 -04:00
Saleemkhan Jamadar	1827b37582	drm/amdgpu: add VCN_4_0_5 firmware support Add VCN_4_0_5 firmware support Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:43 -04:00
Saleemkhan Jamadar	8f98a715da	drm/amdgpu/jpeg: add jpeg support for VCN4_0_5 Add jpeg support for VCN4_0_5 v2 - update license year (Leo Liu) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:36 -04:00
Saleemkhan Jamadar	547aad32ed	drm/amdgpu: add VCN4 ip block support Add VCN 4.0.5 initialization and decoder/encoder ring functions. v2 - update license year (Leo Liu) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:32 -04:00
Lang Yu	f9ecae9a4e	drm/amdgpu: fix VPE front door loading issue Implement proper front door loading for vpe 6.1. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:22 -04:00
Lang Yu	5f6e9cdc83	drm/amdgpu: add VPE FW version query support Add support to query VPE FW version. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:19 -04:00
Lang Yu	3ee8fb7005	drm/amdgpu: enable VPE for VPE 6.1.0 Enable Video Processing Engine on SoCs that contain VPE 6.1.0. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:16 -04:00
Lang Yu	523c12802d	drm/amdgpu: add user space CS support for VPE Enable command submission to VPE from user space. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:14 -04:00
Lang Yu	c5d67a0ec3	drm/amdgpu: add PSP loading support for VPE Add PSP loading support for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:10 -04:00
Lang Yu	9d4346bdbc	drm/amdgpu: add VPE 6.1.0 support Add skeleton driver code. (Ray) Add initial support for Video Processing Engine. (Lang) Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:34:05 -04:00
Lang Yu	5861e47731	drm/amdgpu: add nbio 7.11 callback for VPE Add nbio callback to configure doorbell settings. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:56 -04:00
Lang Yu	75fdd738ff	drm/amdgpu: add nbio callback for VPE Add nbio callback to configure doorbell settings. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:53 -04:00
Lang Yu	964a36d7a4	drm/amdgpu: add PSP FW TYPE for VPE Add PSP FW TYPE for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:50 -04:00
Lang Yu	4c63735fa8	drm/amdgpu: add UCODE ID for VPE Add UCODE ID for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:47 -04:00
Lang Yu	ce7b59c1e6	drm/amdgpu: add support for VPE firmware name decoding Add decoding VPE firmware name support. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:43 -04:00
Lang Yu	2f3916bedb	drm/amdgpu: add doorbell index for VPE Add doorbell index for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:40 -04:00
Lang Yu	0b233357a6	drm/amdgpu: add HWID for VPE Add HWID for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:31 -04:00
Lang Yu	b0fa855cab	drm/amdgpu: add VPE firmware interface Add initial firmware interface. (Ray) Add more opcodes and rename to vpe_v6_1. (Lang) v2: Update copyright date (Alex) Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:27 -04:00
Lang Yu	878fe05116	drm/amdgpu: add VPE firmware header definition Add firmware header definition for Video Processing Engine. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:25 -04:00
Huang Rui	5b28f1c720	drm/amdgpu: add VPE HW IP BLOCK definition Add HW IP BLOCK for Video Processing Engine. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:21 -04:00
Huang Rui	2d6ea3b07c	drm/amdgpu: add VPE RING TYPE definition Add RING TYPE for Video Processing Engine. Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-31 16:33:09 -04:00
Linus Torvalds	b6f6167ea8	pci-v6.6-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmTvfQgUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vyDKA//UBxniXTyxvN8L/agMZngFJd9jLkE p2lnk5eTW6y/aJp1g+ujc7IJEmHG/B1Flp0b5mK8XL7S6OBtAGlPwnuPPpXb0ZxV ofSuQpYoNZGpkYrQMYvATfdLnH2WF3Yj3WCqh5jd2EldPEyqhMV68l7NMzf6+td2 KWJPli1XO8e60JAzbhpXH9vn1I0T8e6Qx8z/ulcydfiOH3PGDPnVrEo8gw9CvJOr aDqSPW7uhTk2SjjUJcAlQVpTGclE4yBxOOhEbuSGc7L6Ab04Y6D0XKx1589AUK6Z W2dQFK3cFYNQQ9aS/2DMUG88H09ca5t8kgUf7Iz3uan1soPzSYK8SLNBgxAPs11S 1jY093rDXXoaCJqxWUwDc/JUpWq6T3g4m445SNvFIOMcSwmMOIfAwfug4UexE1zC Ie8u3Um35Mp25o0o6V1J2EjdBsUsm0p//CsslfoAAIWi85W02Z/46bLLcITchkCe bP05H+c55ZN6maRJiaeghcpY+iWO4XCRCKS9mF1v9yn7FOhNxhBcwgTNPyGBVrYz T9w3ynTHAmuwNqtd6jhpTR/b1902up/Qv9I8uHhBDMqJAXfHocGEXHZblNuZMgfE bu9cjcbFghUPdrhUHYmbEqAzhdlL2SFuMYfn8D4QV4A6x+32xCdwsi39I0Effm5V wl0HmemjKjTYbLw= =iFFM -----END PGP SIGNATURE----- Merge tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Add locking to read/modify/write PCIe Capability Register accessors for Link Control and Root Control - Use pci_dev_id() when possible instead of manually composing ID from dev->bus->number and dev->devfn Resource management: - Move prototypes for __weak sysfs resource files to linux/pci.h to fix 'no previous prototype' warnings - Make more I/O port accesses depend on HAS_IOPORT - Use devm_platform_get_and_ioremap_resource() instead of open-coding platform_get_resource() followed by devm_ioremap_resource() Power management: - Ensure devices are powered up while accessing VPD - If device is powered-up, keep it that way while polling for PME - Only read PCI_PM_CTRL register when available, to avoid reading the wrong register and corrupting dev->current_state Virtualization: - Avoid Secondary Bus Reset on NVIDIA T4 GPUs Error handling: - Remove unused pci_disable_pcie_error_reporting() - Unexport pci_enable_pcie_error_reporting(), used only by aer.c - Unexport pcie_port_bus_type, used only by PCI core VGA: - Simplify and clean up typos in VGA arbiter Apple PCIe controller driver: - Initialize pcie->nvecs (number of available MSIs) before use Broadcom iProc PCIe controller driver: - Use of_property_read_bool() instead of low-level accessors for boolean properties Broadcom STB PCIe controller driver: - Assert PERST# when probing BCM2711 because some bootloaders don't do it Freescale i.MX6 PCIe controller driver: - Add .host_deinit() callback so we can clean up things like regulators on probe failure or driver unload Freescale Layerscape PCIe controller driver: - Add support for link-down notification so the endpoint driver can process LINK_DOWN events - Add suspend/resume support, including manual PME_Turn_off/PME_TO_Ack handshake - Save Link Capabilities during probe so they can be restored when handling a link-up event, since the controller loses the Link Width and Link Speed values during reset Intel VMD host bridge driver: - Fix disable of bridge windows during domain reset; previously we cleared the base/limit registers, which actually left the windows enabled Marvell MVEBU PCIe controller driver: - Remove unused busn member Microchip PolarFlare PCIe controller driver: - Fix interrupt bit definitions so the SEC and DED interrupt handlers work correctly - Make driver buildable as a module - Read FPGA MSI configuration parameters from hardware instead of hard-coding them Microsoft Hyper-V host bridge driver: - To avoid a NULL pointer dereference, skip MSI restore after hibernate if MSI/MSI-X hasn't been enabled NVIDIA Tegra194 PCIe controller driver: - Revert 'PCI: tegra194: Enable support for 256 Byte payload' because Linux doesn't know how to reduce MPS from to 256 to 128 bytes for endpoints below a switch (because other devices below the switch might already be operating), which leads to 'Malformed TLP' errors Qualcomm PCIe controller driver: - Add DT and driver support for interconnect bandwidth voting for 'pcie-mem' and 'cpu-pcie' interconnects - Fix broken SDX65 'compatible' DT property - Configure controller so MHI bus master clock will be switched off while in ASPM L1.x states - Use alignment restriction from EPF core in EPF MHI driver - Add Endpoint eDMA support - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driversupport - Add MHI eDMA support - Add Snapdragon SM8450 support to the EPF MHI driver - Use iATU for EPF MHI transfers smaller than 4K to avoid eDMA setup latency - Add sa8775p DT binding and driver support Rockchip PCIe controller driver: - Use 64-bit mask on MSI 64-bit PCI address to avoid zeroing out the upper 32 bits SiFive FU740 PCIe controller driver: - Set the supported number of MSI vectors so we can use all available MSI interrupts Synopsys DesignWare PCIe controller driver: - Add generic dwc suspend/resume APIs (dw_pcie_suspend_noirq() and dw_pcie_resume_noirq()) to be called by controller driver suspend/resume ops, and a controller callback to send PME_Turn_Off MicroSemi Switchtec management driver: - Add support for PCIe Gen5 devices Miscellaneous: - Reorder and compress to reduce size of struct pci_dev - Fix race in DOE destroy_work_on_stack() - Add stubs to avoid casts between incompatible function types - Explicitly include correct DT includes to untangle headers" * tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (96 commits) PCI: qcom-ep: Add ICC bandwidth voting support dt-bindings: PCI: qcom: ep: Add interconnects path PCI: qcom-ep: Treat unknown IRQ events as an error dt-bindings: PCI: qcom: Fix SDX65 compatible PCI: endpoint: Add kernel-doc for pci_epc_mem_init() API PCI: epf-mhi: Use iATU for small transfers PCI: epf-mhi: Add support for SM8450 PCI: epf-mhi: Add eDMA support PCI: qcom-ep: Add eDMA support PCI: epf-mhi: Make use of the alignment restriction from EPF core PCI/PM: Only read PCI_PM_CTRL register when available PCI: qcom: Add support for sa8775p SoC dt-bindings: PCI: qcom: Add sa8775p compatible PCI: qcom-ep: Pass alignment restriction to the EPF core PCI: Simplify pcie_capability_clear_and_set_word() control flow PCI: Tidy config space save/restore messages PCI: Fix code formatting inconsistencies PCI: Fix typos in docs and comments PCI: Fix pci_bus_resetable(), pci_slot_resetable() name typos PCI: Simplify pci_dev_driver() ...	2023-08-30 20:23:07 -07:00
Srinivasan Shanmugam	8254e05c82	drm/amdgpu: Fix printk_ratelimit() with DRM_ERROR_RATELIMITED in 'amdgpu_cs_ioctl' Replaced printk_ratelimit() with its DRM equivalent to avoid flooding of dmesg logs & hence fixes the following: WARNING: Prefer printk_ratelimited or pr_<level>_ratelimited to printk_ratelimit + if (printk_ratelimit()) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:17 -04:00
Srinivasan Shanmugam	bf227a4f05	drm/amdgpu: Use READ_ONCE() when reading the values in 'sdma_v4_4_2_ring_get_rptr' Use READ_ONCE() instead of declaring the pointer volatile. To prevent the compiler from refetching or reordering the read, so that the read value is always consistent. Link: https://lwn.net/Articles/624126/ Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Le Ma <le.ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:17 -04:00
Yifan Zhang	4d5dc6260c	drm/amdgpu: remove unused parameter in amdgpu_vmid_grab_idle amdgpu_vm is not used in amdgpu_vmid_grab_idle. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:17 -04:00
Hawking Zhang	bf7aa8bea9	drm/amdgpu: Free ras cmd input buffer properly Do not access the pointer for ras input cmd buffer if it is even not allocated. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:16 -04:00
Ma Jun	8f9a9a09af	drm/amd: Simplify the bo size check funciton Simplify the code logic of size check function amdgpu_bo_validate_size Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-30 15:51:16 -04:00

... 28 29 30 31 32 ...

15900 Commits