Commit Graph

491 Commits

Author SHA1 Message Date
Lijo Lazar b3c6871692 drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFs
Use the memory ranges published in discovery table to deduce NPS mode
of GC v9.4.3 VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Tested-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:19:46 -04:00
Lijo Lazar c29aeadf0b drm/amdgpu: Add NPS switch support for GC 9.4.3
Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and
GC v9.4.4 currently support this. NPS switch is only supported if an SOC
supports multiple NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:17:25 -04:00
Sunil Khatri 692d2cd180 drm/amdgpu: update the handle ptr in hw_fini
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_fini.

Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_fini on suspend
and some followed clean up.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri 58608034ed drm/amdgpu: update the handle ptr in hw_init
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_init.

Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_init on resume.

v2: squash in isp fix

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri 7feb4f3ad8 drm/amdgpu: update the handle ptr in resume
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of resume.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:50 -04:00
Sunil Khatri 982d7f9bfe drm/amdgpu: update the handle ptr in suspend
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:45 -04:00
Sunil Khatri 82ae6619a4 drm/amdgpu: update the handle ptr in wait_for_idle
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of wait_for_idle.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:36 -04:00
Sunil Khatri 0ef2a1e7af drm/amdgpu: update the handle ptr in soft_reset
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of soft_reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:45:44 -04:00
Sunil Khatri 36aa9ab9c0 drm/amdgpu: update the handle ptr in sw_fini
update the *handle to amdgpu_ip_block ptr for all
functions pointers of sw_fini.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:43 -04:00
Sunil Khatri d5347e8d27 drm/amdgpu: update the handle ptr in sw_init
update the *handle to amdgpu_ip_block ptr for all
functions pointers of sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:37 -04:00
Sunil Khatri 3138ab2c5b drm/amdgpu: update the handle ptr in late_init
Update the ptr handle to amdgpu_ip_block ptr in all
the functions of late_init function ptr.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:31 -04:00
Sunil Khatri 146b085ead drm/amdgpu: update the handle ptr in early_init
update the handle ptr to amdgpu_ip_block ptr
for all functions pointers on early_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:22 -04:00
Lijo Lazar 631af731ee drm/amdgpu: Refactor XGMI reset on init handling
Use XGMI hive information to rely on resetting XGMI devices on
initialization rather than using mgpu structure. mgpu structure may have
other devices as well.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <feifxu@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26 17:06:46 -04:00
Xiaogang Chen 6ef29715ac drm/amdkfd: Change kfd/svm page fault drain handling
When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and
not handle any incoming pages fault of this process until a deferred work item
got executed by default system wq. The time period of "not handle page fault"
can be long and is unpredicable. That is advese to kfd performance on page
faults recovery.

This patch uses time stamp of incoming page fault to decide to drop or recover
page fault. When app unmap vm ranges kfd records each gpu device's ih ring
current time stamp. These time stamps are used at kfd page fault recovery
routine.

Any page fault happened on unmapped ranges after unmap events is application
bug that accesses vm range after unmap. It is not driver work to cover that.

By using time stamp of page fault do not need drain page faults at deferred
work. So, the time period that kfd does not handle page faults is reduced
and can be controlled.

Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-23 10:55:13 -04:00
Hawking Zhang 9da0f77367 drm/amdgpu: Fix register access violation
fault_status is read only register. fault_cntl
is not accessible from guest environment.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Shane Xiao 45bd39fb3b drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10
This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10,
clear the bits before setting the new one.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:13 -04:00
Victor Zhao e21e0b7824 drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw
the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance.
Fix the miss matched code.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:38 -04:00
Lijo Lazar b194d21b9b drm/amdgpu: Use NPS ranges from discovery table
Add GMC API to fetch NPS range information from discovery table. Use NPS
range information in GMC 9.4.3 SOCs when available, otherwise fallback
to software method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Lijo Lazar ce798376ef drm/amdgpu: Fix memory range calculation
Consider the 16M reserved region also before range calculation for GMC
9.4.3 SOCs.

Fixes: a433f1f594 ("drm/amdgpu: Initialize memory ranges for GC 9.4.3")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Hawking Zhang 5f571c61b9 drm/amdgpu: Add gfx v9_4_4 ip block
Add gfx v9_4_4 ip block support

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:49:16 -04:00
Tao Zhou 4b0cb230bd drm/amdgpu: retire UMC v12 mca_addr_to_pa
RAS TA will handle it, the function is useless.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:09:15 -04:00
Tao Zhou d8070c4241 drm/amdgpu: support utcl2 RAS poison query for mmhub
Support the query for both gfxhub and mmhub, also replace
xcc_id with hub_inst.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Tao Zhou 0c501d3c11 drm/amdgpu: skip GFX FED error in page fault handling
Let kfd interrupt handler process it.

v2: return 0 instead of 1 for fed error.
drop the usage of strcmp in interrupt handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Shashank Sharma b8f67b9ddf drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
    - amdgpu_vm_get_task_info: reference counts up task_info before
      returning this info
    - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

V3: Fixed review comments from Felix
   - Fix wrong indentation
   - No debug message for -ENOMEM
   - Add NULL check for task_info
   - Do not duplicate the debug messages (ti vs no ti)
   - Get first reference of task_info in vm_init(), put last
     in vm_fini()

V4: Fixed review comments from Felix
   - fix double reference increment in create_task_info
   - change amdgpu_vm_get_task_info_pasid
   - additional changes in amdgpu_gem.c while porting

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:59:08 -05:00
Victor Lu 8f4de8f72e drm/amdgpu: Use correct SRIOV macro for gmc_v9_0_vm_fault_interrupt_state
Under SRIOV, programming to VM_CONTEXT*_CNTL regs failed because the
current macro does not pass through the correct xcc instance.
Use the *REG32_XCC macro in this case.

The behaviour without SRIOV is the same without this patch.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:42 -05:00
Lijo Lazar 3d1554d999 drm/amdgpu: Avoid fetching VRAM vendor info
The present way to fetch VRAM vendor information turns out to be not
reliable on GFX 9.4.3 dGPUs as well. Avoid using the data.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:23 -05:00
Srinivasan Shanmugam fa8a91b0e5 drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'

Fixes: fac4ebd79f ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Lijo Lazar e0eb08dcec drm/amdgpu: Avoid fetching vram vendor information
For GFX 9.4.3 APUs, the current method of fetching vram vendor
information is not reliable. Avoid fetching the information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:28 -05:00
Alex Deucher 26405ff430 drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.c
It's used for more than just SR-IOV now, so move it to
amdgpu_gmc.c and rename it to better match the functionality and
update the comments in the code paths to better document
when each path is used and why.  No functional change.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun.Liu@amd.com
Cc: Christian.Koenig@amd.com
2024-01-15 18:35:36 -05:00
Stanley.Yang a32c6f7f57 drm/amdgpu: Fix ecc irq enable/disable unpaired
The ecc_irq is disabled while GPU mode2 reset suspending process,
but not be enabled during GPU mode2 reset resume process.

Changed from V1:
	only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip
	delete amdgpu_ras_late_resume function

Changed from V2:
	check umc ras supported before put ecc_irq

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 10:30:49 -05:00
Lijo Lazar ed342a2e78 drm/amdgpu: Use the right method to get IP version
Replace direct usage of adev->ip_versions with amdgpu_ip_version.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:53 -05:00
Alex Deucher e8c2d3e25b drm/amdgpu/gmc9: disable AGP aperture
We've had misc reports of random IOMMU page faults when
this is used.  It's just a rarely used optimization anyway, so
let's just disable it.  It can still be toggled via the
module parameter for testing.

v2: leave it configurable via module parameter

Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:41 -05:00
Alex Deucher 6ba5b61383 drm/amdgpu: add a module parameter to control the AGP aperture
Add a module parameter to control the AGP aperture.  The AGP
aperture is an aperture in the GPU's internal address space
which provides direct non-paged access to the platform address
space.  This access is non-snooped so only uncached memory
can be accessed.

Add a knob so that we can toggle this for debugging.

Fixes: 67318cb843 ("drm/amdgpu/gmc11: set gart placement GC11")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:20 -05:00
Le Ma bdb72185d3 drm/amdgpu: finalizing mem_partitions at the end of GMC v9 sw_fini
The valid num_mem_partitions is required during ttm pool fini,
thus move the cleanup at the end of the function.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:47:05 -05:00
David Yat Sin 4abf0b0bdf drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
Change local memory type to MTYPE_UC on revision id 0

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:14 -05:00
Victor Lu bc3c566071 drm/amdgpu: Add xcc param to SRIOV kiq write and WREG32_SOC15_IP_NO_KIQ (v4)
WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait
are not using the correct rlcg interface or mec engine, respectively.

Add xcc instance parameter to them.

v4: Use GET_INST and squash commit with:
"drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait"

v3: xcc not needed for MMMHUB

v2: rebase

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:10 -05:00
Alex Deucher bff3315ba8 drm/amdgpu: fix AGP init order
The default AGP settings were overwriting the IP selected
ones since the default was getting set after the IP ones
were selected.

Fixes: de59b69932 ("drm/amdgpu/gmc: set a default disable value for AGP")
Link: https://lists.freedesktop.org/archives/amd-gfx/2023-November/100966.html
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
2023-11-09 16:59:46 -05:00
David Francis 142262a1c0 drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and
MTYPE_UC for nonlocal memory.

On NUMA systems, local memory gets the local mtype, set by an
override callback. If EXT_COHERENT is set, memory will be set as
MTYPE_UC by default, with local memory MTYPE_CC.

Add an option in the override function for this case, and
add a check to ensure it is not used on UNCACHED memory.

V2: Combined APU and NUMA code into one patch
V3: Fixed a potential nullptr in amdgpu_vm_bo_update

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:15:16 -04:00
Lin.Cao 9ee819285c drm/amdgpu remove restriction of sriov max_pfn on Vega10
Remove restriction of sriov max_pfn so that TBA and TMA can move to high
47 bits address.

Regression test: change range alloc flag of libdrm as
AMDGPU_VA_RANGE_HIGH and there is no flr occur when testing amdgpu_test
of drm.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Tao Zhou 21226f02d7 drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count
Simplify the code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Philip Yang fdac890966 drm/amdgpu: ratelimited override pte flags messages
Use ratelimited version of dev_dbg to avoid flooding dmesg log. No
functional change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:40:11 -04:00
Alex Deucher 8dbf1ba867 drm/amdgpu: cache gpuvm fault information for gmc7+
Cache the current fault info in the vm struct.  This can be queried
by userspace later to help debug UMDs.

Cc: samuel.pitoiset@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:37:07 -04:00
Rajneesh Bhardwaj f4bff6e0b9 drm/amdgpu: Use ttm_pages_limit to override vram reporting
On GFXIP9.4.3 APU, allow the memory reporting as per the ttm pages
limit in NPS1 mode.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:42 -04:00
Alex Deucher 917f91d8d8 drm/amdgpu/gmc: add a way to force a particular placement for GART
We normally place GART based on the location of VRAM and the
available address space around that, but provide an option
to force a particular location for hardware that needs it.

v2: Switch to passing the placement via parameter

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:07 -04:00
Alex Deucher de59b69932 drm/amdgpu/gmc: set a default disable value for AGP
To disable AGP, the start needs to be set to a higher
value than the end.  Set a default disable value for
the AGP aperture and allow the IP specific GMC code
to enable it selectively be calling amdgpu_gmc_agp_location().

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Christian König 08abccc9a7 drm/amdgpu: further move TLB hw workarounds a layer up
For the PASID flushing we already handled that at a higher layer, apply
those workarounds to the standard flush as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König e2e3788850 drm/amdgpu: rework lock handling for flush_tlb v2
Instead of each implementation doing this more or less correctly
move taking the reset lock at a higher level.

v2: fix typo

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König 3983c9fd2d drm/amdgpu: drop error return from flush_gpu_tlb_pasid
That function never fails, drop the error return.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König e7b90e99fa drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb_pasid
Testing for reset is pointless since the reset can start right after the
test.

The same PASID can be used by more than one VMID, invalidate each of them.

Move the KIQ and all the workaround handling into common GMC code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Tao Zhou 6205b558e1 drm/amdgpu: fix value of some UMC parameters for UMC v12
Prepare for bad page retirement.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:52 -04:00