Commit Graph

11690 Commits

Author SHA1 Message Date
Luben Tuikov da858deab8 drm/amdgpu: Remove redundant I2C EEPROM address
Remove redundant EEPROM_I2C_MADDR_54H address, since we already have it
represented (ARCTURUS), and since we don't include the I2C device type
identifier in EEPROM memory addresses, i.e. that high up in the device
abstraction--we only use EEPROM memory addresses, as memory is continuously
represented by EEPROM device(s) on the I2C bus.

Add a comment describing what these memory addresses are, how they come
about and how they're usually extracted from the device address byte.

Cc: Candice Li <candice.li@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: c9bdc6c3cf ("drm/amdgpu: Add EEPROM I2C address support for ip discovery")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00
Kenneth Feng 60cfad329a drm/amd/pm: enable mode1 reset on smu_v13_0_10
enable mode1 reset and prioritize debug port on smu_v13_0_10
as a more reliable message processing

v2 - move mode1 reset callback to smu_v13_0_0_ppt.c

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00
Philip Yang 0bc71adc8b drm/amdgpu: Drop eviction lock when allocating PT BO
Re-take the eviction lock immediately again after the allocation is
completed, to fix circular locking warning with drm_buddy allocator.

Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are
called from multiple files.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:41 -05:00
Philip Yang c70e216696 drm/amdgpu: Unlock bo_list_mutex after error handling
Get below kernel WARNING backtrace when pressing ctrl-C to kill kfdtest
application.

If amdgpu_cs_parser_bos returns error after taking bo_list_mutex, as
caller amdgpu_cs_ioctl will not unlock bo_list_mutex, this generates the
kernel WARNING.

Add unlock bo_list_mutex after amdgpu_cs_parser_bos error handling to
cleanup bo_list userptr bo.

 WARNING: kfdtest/2930 still has locks held!
 1 lock held by kfdtest/2930:
  (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0xce5/0x1f10 [amdgpu]
  stack backtrace:
   dump_stack_lvl+0x44/0x57
   get_signal+0x79f/0xd00
   arch_do_signal_or_restart+0x36/0x7b0
   exit_to_user_mode_prepare+0xfd/0x1b0
   syscall_exit_to_user_mode+0x19/0x40
   do_syscall_64+0x40/0x80

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:41 -05:00
Rajneesh Bhardwaj adf65dff5d drm/amdgpu: Fix the kerneldoc description
amdgpu_ttm_tt_set_userptr() is also called by the KFD as part of
initializing the user pages for userptr BOs and also while initializing
the GPUVM for a KFD process so update the function description.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:41 -05:00
Christian König f9e6949645 drm/amdgpu: workaround for TLB seq race
It can happen that we query the sequence value before the callback
had a chance to run.

Workaround that by grabbing the fence lock and releasing it again.
Should be replaced by hw handling soon.

Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.19+
Fixes: 5255e146c9 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Stefan Springer <stefanspr94@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:22:48 -05:00
Kefeng Wang cc03817c0e amdgpu: use VM_ACCESS_FLAGS
Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS.

Link: https://lkml.kernel.org/r/20221019034945.93081-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:19 -08:00
Ma Jun e0b26b9482 drm/amdgpu: Fix the lpfn checking condition in drm buddy
Because the value of man->size is changed during suspend/resume process,
use mgr->mm.size instead of man->size here for lpfn checking.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220914125331.2467162-1-Jun.Ma2@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-11-08 14:00:18 +01:00
Dave Airlie 49e8e6343d Merge tag 'amd-drm-next-6.2-2022-11-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.2-2022-11-04:

amdgpu:
- Add TMZ support for GC 11.0.1
- More IP version check conversions
- Mode2 reset fixes for sienna cichlid
- SMU 13.x fixes
- RAS enablement on MP 13.x
- Replace kmap with kmap_local_page()
- Misc Clang warning fixes
- SR-IOV fixes for GC 11.x
- PCI AER fix
- DCN 3.2.x commit sequence rework
- SDMA 4.x doorbell fix
- Expose additional new GC 11.x firmware versions
- Misc code cleanups
- S0i3 fixes
- More DC FPU cleanup
- Add more DC kerneldoc
- Misc spelling and grammer fixes
- DCN 3.1.x fixes
- Plane modifier fix
- MCA RAS enablement
- Secure display locking fix
- RAS TA rework
- RAS EEPROM fixes
- Fail suspend if eviction fails
- Drop AMD specific DSC workarounds in favor of drm EDID quirks
- SR-IOV suspend/resume fixes
- Enable DCN support for ARM
- Enable secure display on DCN 2.1

amdkfd:
- Cache size fixes for GC 10.3.x
- kfd_dev struct cleanup
- GC11.x CWSR trap handler fix
- Userptr fixes
- Warning fixes

radeon:
- Replace kmap with kmap_local_page()

UAPI:
- Expose additional new GC 11.x firmware versions via the existing INFO query

drm:
- Add some new EDID DSC quirks

Signed-off-by: Dave Airlie <airlied@redhat.com>

# Conflicts:
#	drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221104205827.6008-1-alexander.deucher@amd.com
2022-11-08 16:32:31 +10:00
Thomas Zimmermann 45b64fd9f7 drm/fb-helper: Remove unnecessary include statements
Remove include statements for <drm/drm_fb_helper.h> where it is not
required (i.e., most of them). In a few places include other header
files that are required by the source code.

v3:
	* fix amdgpu include statements
	* fix rockchip include statements

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-23-tzimmermann@suse.de
2022-11-05 17:12:04 +01:00
Thomas Zimmermann 8ab59da26b drm/fb-helper: Move generic fbdev emulation into separate source file
Move the generic fbdev implementation into its own source and header
file. Adapt drivers. No functional changes, but some of the internal
helpers have been renamed to fit into the drm_fbdev_ naming scheme.

v3:
	* rename drm_fbdev.{c,h} to drm_fbdev_generic.{c,h}
	* rebase onto vmwgfx changes
	* rebase onto xlnx changes
	* fix include statements in amdgpu

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-22-tzimmermann@suse.de
2022-11-05 17:12:04 +01:00
Thomas Zimmermann 0e3172bac3 drm/amdgpu: Don't set struct drm_driver.output_poll_changed
Don't set struct drm_driver.output_poll_changed. It's used to restore
the fbdev console. But as amdgpu uses generic fbdev emulation, the
console is being restored by the DRM client helpers already. See the
functions drm_kms_helper_hotplug_event() and
drm_kms_helper_connector_hotplug_event() in drm_probe_helper.c.

v2:
	* fix commit description (Christian)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-5-tzimmermann@suse.de
2022-11-05 17:05:53 +01:00
Thomas Zimmermann 8e4e4c2f53 Merge drm/drm-next into drm-misc-next
Backmerging drm/drm-next to get the latest changes in the xlnx driver.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-11-05 16:08:36 +01:00
Xiaogang Chen fcf00f8d29 drm/amdkfd: Remove skiping userptr buffer mapping when mmu notifier marks it as invalid
mmu notifier does not always hold mm->sem during call back. That causes
a race condition between kfd userprt buffer mapping and mmu notifier
which leds to gpu shadder or SDMA access userptr buffer before it has been
mapped to gpu VM. Always map userptr buffer to avoid that though it may make
some userprt buffers mapped two times.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:54 -04:00
Victor Zhao ec4927d463 drm/amdgpu: fix for suspend/resume sequence under sriov
- clear kiq ring after suspend/resume under sriov to aviod kiq ring
test failure
- update irq after resume to fix kiq interrput loss

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Kenneth Feng 5cefe31b2a drm/amd/amdgpu: temporary workaround to skip ras error for gc_v11_0_3
temporary workaround to skip ras error for gc_v11_0_3 until IFWI release later

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Hawking Zhang cfa61b8f9e drm/amdgpu: switch to select_se_sh wrapper for gfx v9_0
To allow invoking ip specific callbacks

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Nathan Chancellor f0d0f10873 drm/amdgpu: Fix type of second parameter in trans_msg() callback
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:

  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:412:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict]
          .trans_msg = xgpu_ai_mailbox_trans_msg,
                      ^~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

  drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:435:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict]
          .trans_msg = xgpu_nv_mailbox_trans_msg,
                      ^~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

The type of the second parameter in the prototype should be 'enum
idh_request' instead of 'u32'. Update it to clear up the warnings.

Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Paulo Miguel Almeida 320e2590e2 drm/amdgpu: Replace one-element array with flexible-array member
One-element arrays are deprecated, and we are replacing them with
flexible array members instead. So, replace one-element array with
flexible-array member in struct _ATOM_FAKE_EDID_PATCH_RECORD and
refactor the rest of the code accordingly.

Important to mention is that doing a build before/after this patch
results in no binary output differences.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/238
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 [1]

Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Alex Deucher e053d71f8c drm/amdgpu/gfx11: set gfx.funcs in early init
So the callbacks are set early in case we need them.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Alex Deucher 105195af02 drm/amdgpu/gfx10: set gfx.funcs in early init
So the callbacks are set early in case we need them.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Alex Deucher 33034c5c2e drm/amdgpu/gfx9: set gfx.funcs in early init
So the callbacks are set before we use them.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Peng Ju Zhou f8638ad7fc drm/amdgpu: Remove unnecessary register program in SRIOV
Remove unnecessary register program in SRIOV

Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Yiqing Yao 8a1fbb4a5e drm/amdgpu: Disable MCBP from soc21 for SRIOV
[why]
Start from soc21, CP does not support MCBP, so disable it.

[how]
Used amgpu_mcbp flag alone instead of checking if is in SRIOV to
enable/disable MCBP.
Only set flag to enable on asic_type prior to soc21 in SRIOV.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Yiqing Yao 0cfce2401e drm/amdgpu: Clean up soc21 early init for SRIOV
Use virt_init_setting instead of per ip version setting.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Graham Sider 9a1662f549 drm/amdgpu: extend halt_if_hws_hang to MES
Hang on MES timeout if halt_if_hws_hang is set to 1.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-04 16:05:53 -04:00
Dave Airlie 441f0ec0ae drm-misc-next for 6.2:
UAPI Changes:
 
 Cross-subsystem Changes:
 - dma-buf: locking improvements
 - firmware: New API in the RaspberryPi firmware driver used by vc4
 
 Core Changes:
 - client: Null pointer dereference fix in drm_client_buffer_delete()
 - mm/buddy: Add back random seed log
 - ttm: Convert ttm_resource to use size_t for its size, fix for an
   undefined behaviour
 
 Driver Changes:
 - bridge:
   - adv7511: use dev_err_probe
   - it6505: Fix return value check of pm_runtime_get_sync
 - panel:
   - sitronix: Fixes and clean-ups
 - lcdif: Increase DMA burst size
 - rockchip: runtime_pm improvements
 - vc4: Fix for a regression preventing the use of 4k @ 60Hz, and
   further HDMI rate constraints check.
 - vmwgfx: Cursor improvements
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY2N8/gAKCRDj7w1vZxhR
 xT1TAQDiwdSyL/bzOk/WYggmj5wmFqb2PPFbFRt/DOTLl52ZdwEAmtE9I2WKvY8w
 sxZYF9gfFnQC3id7YwNs+CDb1kAangc=
 =6GBd
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.2:

UAPI Changes:

Cross-subsystem Changes:
- dma-buf: locking improvements
- firmware: New API in the RaspberryPi firmware driver used by vc4

Core Changes:
- client: Null pointer dereference fix in drm_client_buffer_delete()
- mm/buddy: Add back random seed log
- ttm: Convert ttm_resource to use size_t for its size, fix for an
  undefined behaviour

Driver Changes:
- bridge:
  - adv7511: use dev_err_probe
  - it6505: Fix return value check of pm_runtime_get_sync
- panel:
  - sitronix: Fixes and clean-ups
- lcdif: Increase DMA burst size
- rockchip: runtime_pm improvements
- vc4: Fix for a regression preventing the use of 4k @ 60Hz, and
  further HDMI rate constraints check.
- vmwgfx: Cursor improvements

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103083437.ksrh3hcdvxaof62l@houat
2022-11-04 12:33:04 +10:00
Christian König a82f30b04c drm/scheduler: rename dependency callback into prepare_job
This now matches much better what this is doing.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König 1728baa7e4 drm/amdgpu: use scheduler dependencies for CS
Entirely remove the sync obj in the job.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-11-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König 46e0270c71 drm/amdgpu: use scheduler dependencies for UVD msgs
Instead of putting that into the job sync object.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-10-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König aab9cf7b69 drm/amdgpu: use scheduler dependencies for VM updates
Instead of putting that into the job sync object.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-9-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König 1b2d5eda5a drm/amdgpu: move explicit sync check into the CS
This moves the memory allocation out of the critical code path.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-8-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König f7d66fb2ea drm/amdgpu: cleanup scheduler job initialization v2
Init the DRM scheduler base class while allocating the job.

This makes the whole handling much more cleaner.

v2: fix coding style

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-7-christian.koenig@amd.com
2022-11-03 12:45:20 +01:00
Christian König 940ca22b7e drm/amdgpu: drop amdgpu_sync from amdgpu_vmid_grab v2
Instead return the fence directly. Avoids memory allocation to store the
fence.

v2: cleanup coding style as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-6-christian.koenig@amd.com
2022-11-03 12:45:19 +01:00
Christian König c5093cddf5 drm/amdgpu: drop the fence argument from amdgpu_vmid_grab
This is always the job anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-5-christian.koenig@amd.com
2022-11-03 12:45:19 +01:00
Christian König 4f91790b42 drm/amdgpu: use drm_sched_job_add_resv_dependencies for moves
Use the new common scheduler functions to figure out what to wait for.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-4-christian.koenig@amd.com
2022-11-03 12:45:19 +01:00
Gavin Wan 8fe8ce896c drm/amdgpu: Disable GPU reset on SRIOV before remove pci.
The recent change brought a bug on SRIOV envrionment. It caused
unloading amdgpu failed on Guest VM. The reason is that the VF
FLR was requested while unloading amdgpu driver, but the VF FLR
of SRIOV sequence is wrong while removing PCI device.

For SRIOV, the guest driver should not trigger the whole XGMI hive
to do the reset. Host driver control how the device been reset.

Fixes: f5c7e77970 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-02 17:16:25 -04:00
Graham Sider a3e5ce56f3 drm/amdgpu: disable GFXOFF during compute for GFX11
Temporary workaround to fix issues observed in some compute applications
when GFXOFF is enabled on GFX11.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-11-02 17:16:11 -04:00
Mario Limonciello 8d4de331f1 drm/amd: Fail the suspend if resources can't be evicted
If a system does not have swap and memory is under 100% usage,
amdgpu will fail to evict resources.  Currently the suspend
carries on proceeding to reset the GPU:

```
[drm] evicting device resources failed
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
[drm] free PSP TMR buffer
[TTM] Failed allocating page table
[drm] evicting device resources failed
amdgpu 0000:03:00.0: amdgpu: MODE1 reset
amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
```

At this point if the suspend actually succeeded I think that amdgpu
would have recovered because the GPU would have power cut off and
restored.  However the kernel fails to continue the suspend from the
memory pressure and amdgpu fails to run the "resume" from the aborted
suspend.

```
ACPI: PM: Preparing to enter system sleep state S3
SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
  cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
  node 0: slabs: 22, objs: 1122, free: 0
ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)

[drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
amdgpu 0000:03:00.0: PM: failed to resume async: error -62
```

To avoid this series of unfortunate events, fail amdgpu's suspend
when the memory eviction fails.  This will let the system gracefully
recover and the user can try suspend again when the memory pressure
is relieved.

Reported-by: post@davidak.de
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-02 17:00:39 -04:00
Graham Sider e542ca6e3e drm/amdgpu: correct MES debugfs versions
Use mes.sched_version, mes.kiq_version for debugfs as
mes.ucode_fw_version does not contain correct versioning information.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-02 16:57:32 -04:00
Yifan Zhang 89b3554782 drm/amdgpu: set fb_modifiers_not_supported in vkms
This patch to fix the gdm3 start failure with virual display:

/usr/libexec/gdm-x-session[1711]: (II) AMDGPU(0): Setting screen physical size to 270 x 203
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to make import prime FD as pixmap: 22
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument
/usr/libexec/gdm-x-session[1711]: (WW) AMDGPU(0): Failed to set mode on CRTC 0
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to enable any CRTC
gnome-shell[1840]: Running GNOME Shell (using mutter 42.2) as a X11 window and compositing manager
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument

vkms doesn't have modifiers support, set fb_modifiers_not_supported to bring the gdm back.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-02 16:56:13 -04:00
Yifan Zha 2c763f37d0 drm/amdgpu: Skip program gfxhub_v3_0_3 system aperture registers under SRIOV
[Why]
gfxhub_v3_0_3 system aperture registers are removed from RLCG register access range.

[How]
Skip access gfxhub_v3_0_3 system aperture registers under SRIOV VF.
These registers will be programmed on host side.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-01 11:45:56 -04:00
Yifan Zha e1a29b28e7 drm/amdgpu: Skip access SDMA0_F32_CNTL in sdma_v6_0_enable under SRIOV
[Why]
SDMA0_F32_CNTL is a PF_only regitser which will be blocked by L1.
RLCG will not program the register as well.

[How]
Skip to program SDMA0_F32_CNTL under SRIOV VF.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-01 11:45:50 -04:00
Yifan Zha 47a7470bb2 drm/amdgpu: Skip access GRBM_CNTL under SRIOV on gfx_v11
[Why]
GRBM_CNTL is a PF_only register on gfx_v11.
RLCG interface will return "out of range" under SRIOV VF.

[How]
Skip access GRBM_CNTL under gfx_v11 SRIOV VF.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-01 11:45:44 -04:00
Gavin Wan 2103c42198 drm/amdgpu: Disable GPU reset on SRIOV before remove pci.
The recent change brought a bug on SRIOV envrionment. It caused
unloading amdgpu failed on Guest VM. The reason is that the VF
FLR was requested while unloading amdgpu driver, but the VF FLR
of SRIOV sequence is wrong while removing PCI device.

For SRIOV, the guest driver should not trigger the whole XGMI hive
to do the reset. Host driver control how the device been reset.

Fixes: f5c7e77970 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-01 11:44:49 -04:00
Candice Li 9d1b073d01 drm/amdgpu: Enable GFX RAS feature for gfx v11_0_3
v1: Support gfx ras feature enablement for gfx v11_0_3.
v2: Update function name and error message.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-01 11:44:33 -04:00
Mukul Joshi d69a3b762d drm/amdkfd: Cleanup kfd_dev struct
Cleanup kfd_dev struct by removing ddev and pdev as both
drm_device and pci_dev can be fetched from amdgpu_device.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:09 -04:00
Graham Sider 087b8542c0 drm/amdgpu: disable GFXOFF during compute for GFX11
Temporary workaround to fix issues observed in some compute applications
when GFXOFF is enabled on GFX11.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:09 -04:00
Mario Limonciello 7863c15526 drm/amd: Fail the suspend if resources can't be evicted
If a system does not have swap and memory is under 100% usage,
amdgpu will fail to evict resources.  Currently the suspend
carries on proceeding to reset the GPU:

```
[drm] evicting device resources failed
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
[drm] free PSP TMR buffer
[TTM] Failed allocating page table
[drm] evicting device resources failed
amdgpu 0000:03:00.0: amdgpu: MODE1 reset
amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
```

At this point if the suspend actually succeeded I think that amdgpu
would have recovered because the GPU would have power cut off and
restored.  However the kernel fails to continue the suspend from the
memory pressure and amdgpu fails to run the "resume" from the aborted
suspend.

```
ACPI: PM: Preparing to enter system sleep state S3
SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
  cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
  node 0: slabs: 22, objs: 1122, free: 0
ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)

[drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
[drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
amdgpu 0000:03:00.0: PM: failed to resume async: error -62
```

To avoid this series of unfortunate events, fail amdgpu's suspend
when the memory eviction fails.  This will let the system gracefully
recover and the user can try suspend again when the memory pressure
is relieved.

Reported-by: post@davidak.de
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:09 -04:00
Candice Li c9bdc6c3cf drm/amdgpu: Add EEPROM I2C address support for ip discovery
1. Update EEPROM_I2C_MADDR_SMU_13_0_0 to EEPROM_I2C_MADDR_54H
2. Add EEPROM I2C address support for smu v13_0_0 and v13_0_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:09 -04:00
Candice Li bc22f8ec46 drm/amdgpu: Update ras eeprom support for smu v13_0_0 and v13_0_10
Enable RAS EEPROM support for smu v13_0_0 and v13_0_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Candice Li 896b7addf2 drm/amdgpu: Optimize TA load/unload/invoke debugfs interfaces
1. Add a function pointer structure ta_funcs to psp context
2. Make the interfaces generic to all TAs
3. Leverage exisitng TA context and remove unused functions
4. Fix return code bugs

v2: Add comments for ta funcs macros and correct typo

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Candice Li bf7d777289 drm/amdgpu: Optimize RAS TA initialization and TA unload funcs
1. Save TA unload psp response status
2. Add RAS TA loading status check for initializaiton
3. Drop RAS context teardown to allow RAS TA to be reloaded

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Graham Sider 6040517e4a drm/amdgpu: remove deprecated MES version vars
MES scheduler and kiq versions are stored in mes.sched_version and
mes.kiq_version, respectively, which are read from a register after
their queues are initialized. Remove mes.ucode_fw_version and
mes.data_fw_version which tried to read this versioning info from the
firmware headers (which don't contain this information).

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Graham Sider 1d522b51e3 drm/amdgpu: correct MES debugfs versions
Use mes.sched_version, mes.kiq_version for debugfs as
mes.ucode_fw_version does not contain correct versioning information.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Alan Liu 7117007eda drm/amdgpu: Move the mutex_lock to protect the return status of securedisplay command buffer
[Why]
Before we call psp_securedisplay_invoke(), we call
psp_prep_securedisplay_cmd_buf() to prepare and initialize the command
buffer.

However, we didn't use the mutex_lock to protect the status of command
buffer. So when multiple threads are using the command buffer, after
thread A return from psp_securedisplay_invoke() and the command buffer
status is set to SUCCESS, another thread B may call
psp_prep_securedisplay_cmd_buf() and initialize the status to FAILURE
again, and cause Thread A to get a failure return status.

[How]
Move the mutex_lock out of psp_securedisplay_invoke() to its caller to
cover psp_prep_securedisplay_cmd_buf() and the code checking the return
status of command buffer.

Signed-off-by: Alan Liu <HaoPing.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Tao Zhou 1ed0e17690 drm/amdgpu: remove ras_error_status parameter for UMC poison handler
Make the code simpler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Tao Zhou ae45a18b80 drm/amdgpu: add RAS poison handling for MCA
For MCA poison, if unmap queue fails, only gpu reset should be
triggered without page retirement handling, MCA notifier will do it.

v2: handle MCA poison consumption in umc_poison_handler directly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Tao Zhou 24b822928b drm/amdgpu: use page retirement API in MCA notifier
Make the code more readable.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Tao Zhou cbe4d43ea5 drm/amdgpu: add RAS page retirement functions for MCA
Define page retirement functions for MCA platform.

v2: remove page retirement handling from MCA poison handler,
    let MCA notifier do page retirement.

v3: remove specific poison handler for MCA to simplify code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 15:12:08 -04:00
Yifan Zhang 887e8cec55 drm/amdgpu: set fb_modifiers_not_supported in vkms
This patch to fix the gdm3 start failure with virual display:

/usr/libexec/gdm-x-session[1711]: (II) AMDGPU(0): Setting screen physical size to 270 x 203
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to make import prime FD as pixmap: 22
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument
/usr/libexec/gdm-x-session[1711]: (WW) AMDGPU(0): Failed to set mode on CRTC 0
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): Failed to enable any CRTC
gnome-shell[1840]: Running GNOME Shell (using mutter 42.2) as a X11 window and compositing manager
/usr/libexec/gdm-x-session[1711]: (EE) AMDGPU(0): failed to set mode: Invalid argument

vkms doesn't have modifiers support, set fb_modifiers_not_supported to bring the gdm back.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27 14:46:02 -04:00
Somalapuram Amaranath e3c92eb4a8 drm/ttm: rework on ttm_resource to use size_t type
Change ttm_resource structure from num_pages to size_t size in bytes.
v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages
v1 -> v2: change bo->resource->size to bo->base.size at some places
v1 -> v2: remove the local variable
v1 -> v2: cleanup cmp_size_smaller_first()
v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027091237.983582-1-Amaranath.Somalapuram@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-10-27 11:42:58 +02:00
Prike Liang d61e1d1d52 drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume
In the S2idle suspend/resume phase the gfxoff is keeping functional so
some IP blocks will be likely to reinitialize at gfxoff entry and that
will result in failing to program GC registers.Therefore, let disallow
gfxoff until AMDGPU IPs reinitialized completely.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.15.x
2022-10-26 17:48:43 -04:00
Dave Airlie b837d3db9a drm-misc-next for 6.2:
UAPI Changes:
   - Documentation for page-flip flags
 
 Cross-subsystem Changes:
   - dma-buf: Add unlocked variant of vmapping and attachment-mapping
     functions
 
 Core Changes:
   - atomic-helpers: CRTC primary plane test fixes
   - connector: TV API consistency improvements, cmdline parsing
     improvements
   - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
   - edid: Fixes for HFVSDB parsing,
   - fourcc: Addition of the Vivante tiled modifier
   - makefile: Sort and reorganize the objects files
   - mode_config: Remove fb_base from drm_mode_config_funcs
   - sched: Add a module parameter to change the scheduling policy,
     refcounting fix for fences
   - tests: Sort the Kunit tests in the Makefile, improvements to the
     DP-MST tests
   - ttm: Remove unnecessary drm_mm_clean() call
 
 Driver Changes:
   - New driver: ofdrm
   - Move all drivers to a common dma-buf locking convention
   - bridge:
     - adv7533: Remove dynamic lane switching
     - it6505: Runtime PM support
     - ps8640: Handle AUX defer messages
     - tc358775: Drop soft-reset over I2C
   - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
     improvements
   - lcdif: Support for YUV planes
   - mgag200: Fix PLL Setup on some revisions
   - udl: Modesetting improvements, hot-unplug support
   - vc4: Fix support for PAL-M
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY1D3fQAKCRDj7w1vZxhR
 xZZgAPoCSfyU3+M3ALT0vQ/OpktE5tuwUBMac2Lxkqgx4dnQ0gEAnYeez0Dedod8
 HNdgBH7FTklYa/zT8n17SwHUOzJJ5gc=
 =YvuW
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.2:

UAPI Changes:
  - Documentation for page-flip flags

Cross-subsystem Changes:
  - dma-buf: Add unlocked variant of vmapping and attachment-mapping
    functions

Core Changes:
  - atomic-helpers: CRTC primary plane test fixes
  - connector: TV API consistency improvements, cmdline parsing
    improvements
  - crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
  - edid: Fixes for HFVSDB parsing,
  - fourcc: Addition of the Vivante tiled modifier
  - makefile: Sort and reorganize the objects files
  - mode_config: Remove fb_base from drm_mode_config_funcs
  - sched: Add a module parameter to change the scheduling policy,
    refcounting fix for fences
  - tests: Sort the Kunit tests in the Makefile, improvements to the
    DP-MST tests
  - ttm: Remove unnecessary drm_mm_clean() call

Driver Changes:
  - New driver: ofdrm
  - Move all drivers to a common dma-buf locking convention
  - bridge:
    - adv7533: Remove dynamic lane switching
    - it6505: Runtime PM support
    - ps8640: Handle AUX defer messages
    - tc358775: Drop soft-reset over I2C
  - ast: Atomic Gamma LUT Support, Convert to SHMEM, various
    improvements
  - lcdif: Support for YUV planes
  - mgag200: Fix PLL Setup on some revisions
  - udl: Modesetting improvements, hot-unplug support
  - vc4: Fix support for PAL-M

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221020072405.g3o4hxuk75gmeumw@houat
2022-10-25 11:42:18 +10:00
YuBiao Wang e105b6212f drm/amdgpu: skip mes self test for gc 11.0.3 in recover
Temporary disable mes self teset for gc 11.0.3 during gpu_recovery.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:44:03 -04:00
David Francis 68bc147363 drm/amd: Add IMU fw version to fw version queries
IMU is a new firmware for GFX11.

There are four means by which firmware version can be queried
from the driver: device attributes, vf2pf, debugfs,
and the AMDGPU_INFO_FW_VERSION option in the amdgpu info ioctl.

Add IMU as an option for those four methods.

V2: Added debugfs

Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:44:03 -04:00
Kenneth Feng 08841950db drm/amd/pm: allow gfxoff on gc_11_0_3
allow gfxoff on gc_11_0_3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:44:03 -04:00
Rafael Mendonca 90bfee142a drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()
If the number of pages from the userptr BO differs from the SG BO then the
allocated memory for the SG table doesn't get freed before returning
-EINVAL, which may lead to a memory leak in some error paths. Fix this by
checking the number of pages before allocating memory for the SG table.

Fixes: 264fb4d332 ("drm/amdgpu: Add multi-GPU DMA mapping helpers")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:44:03 -04:00
Lijo Lazar d2c4c1569a drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x
MMHUB 2.1.x versions don't have ATCL2. Remove accesses to ATCL2 registers.

Since they are non-existing registers, read access will cause a
'Completer Abort' and gets reported when AER is enabled with the below patch.
Tagging with the patch so that this is backported along with it.

v2: squash in uninitialized warning fix (Nathan Chancellor)

Fixes: 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-10-24 14:44:03 -04:00
wangjianli 12024b1761 amd/amdgpu: fix repeated words in comments
Delete the redundant word 'the'.

Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:38:46 -04:00
Prike Liang f543d28687 drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume
In the S2idle suspend/resume phase the gfxoff is keeping functional so
some IP blocks will be likely to reinitialize at gfxoff entry and that
will result in failing to program GC registers.Therefore, let disallow
gfxoff until AMDGPU IPs reinitialized completely.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:34:48 -04:00
YuBiao Wang 693073a04d drm/amdgpu: skip mes self test for gc 11.0.3 in recover
Temporary disable mes self teset for gc 11.0.3 during gpu_recovery.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:34:47 -04:00
David Francis b72362962a drm/amd: Add IMU fw version to fw version queries
IMU is a new firmware for GFX11.

There are four means by which firmware version can be queried
from the driver: device attributes, vf2pf, debugfs,
and the AMDGPU_INFO_FW_VERSION option in the amdgpu info ioctl.

Add IMU as an option for those four methods.

V2: Added debugfs

Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:34:47 -04:00
Alex Deucher 6c16afdcec drm/amdgpu: fix sdma doorbell init ordering on APUs
Commit 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ff ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
2022-10-24 14:34:47 -04:00
Kenneth Feng fa16dec204 drm/amd/pm: allow gfxoff on gc_11_0_3
allow gfxoff on gc_11_0_3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:34:47 -04:00
Rafael Mendonca 7b5a4d7b9e drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()
If the number of pages from the userptr BO differs from the SG BO then the
allocated memory for the SG table doesn't get freed before returning
-EINVAL, which may lead to a memory leak in some error paths. Fix this by
checking the number of pages before allocating memory for the SG table.

Fixes: 264fb4d332 ("drm/amdgpu: Add multi-GPU DMA mapping helpers")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-24 14:34:27 -04:00
Lijo Lazar 20293269d8 drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x
MMHUB 2.1.x versions don't have ATCL2. Remove accesses to ATCL2 registers.

Since they are non-existing registers, read access will cause a
'Completer Abort' and gets reported when AER is enabled with the below patch.
Tagging with the patch so that this is backported along with it.

v2: squash in uninitialized warning fix (Nathan Chancellor)

Fixes: 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-10-24 14:34:26 -04:00
Yiqing Yao 226dcfad34 drm/amdgpu: Adjust MES polling timeout for sriov
[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 16:13:26 -04:00
Chengming Gui 79610d3041 drm/amdgpu: fix pstate setting issue
[WHY]
0, original pstate X
1, ctx_A_create -> ctx_A->stable_pstate = X
2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD)
3, ctx_B_create -> ctx_B->stable_pstate =  Y
4, ctx_A_destroy -> restore pstate to X
5, ctx_B_destroy -> restore pstate to Y
Above sequence will cause final pstate is wrong (Y), should be original X.

[HOW]
When ctx_B create,
if  ctx_A touched pstate setting
(not auto, stable_pstate_ctx != NULL),
set ctx_B->stable_pstate the same value as ctx_A saved,
if stable_pstate_ctx == NULL,
fetch current pstate to fill
ctx_B->stable_pstate.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-10-21 16:12:09 -04:00
Yiqing Yao bb3c846ad2 drm/amdgpu: Adjust MES polling timeout for sriov
[why]
MES response time in sriov may be longer than default value
due to reset or init in other VF. A timeout value specific
to sriov is needed.

[how]
When in sriov, adjust the timeout value to calculated
worst case scenario.

Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 15:45:43 -04:00
Chengming Gui 8a7a5b5f23 drm/amdgpu: fix pstate setting issue
[WHY]
0, original pstate X
1, ctx_A_create -> ctx_A->stable_pstate = X
2, ctx_A_set_pstate (Y) -> current pstate is Y (PEAK or STANDARD)
3, ctx_B_create -> ctx_B->stable_pstate =  Y
4, ctx_A_destroy -> restore pstate to X
5, ctx_B_destroy -> restore pstate to Y
Above sequence will cause final pstate is wrong (Y), should be original X.

[HOW]
When ctx_B create,
if  ctx_A touched pstate setting
(not auto, stable_pstate_ctx != NULL),
set ctx_B->stable_pstate the same value as ctx_A saved,
if stable_pstate_ctx == NULL,
fetch current pstate to fill
ctx_B->stable_pstate.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-21 15:44:56 -04:00
Dave Airlie cbc543c59e drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
 - Set DDC pointer in drmm_connector_init.
 - Compiler fixes for panfrost.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmNRMiEACgkQ/lWMcqZw
 E8MJExAAhVv/bEcg9Ib1e2gIirtOKB3w0V20rEvkvplzsfKALMH3oXHx0cIbk2sj
 YBICPiuc2x4sDQW9xlQmPa5Gv+6aoqjSOpu+Zpc5MQAb8qnO/vxDCOlYDJwcicjP
 DdxXwJcJ48+x3FFWmrg6gcU8fmH6Rckb2BgBsF3fyzOhIMF2ME6a02S+bYuHNOxK
 sQfo1Tlbibi5pfrxHFBE9V+Wjf3ohQwxQHmcpAC6wwgtGKOLQSxkU7o+wCFUNsn1
 dlWTVe9+2xgNQenTue091PAkFP5R4T3wv654EPudyUYjybQL8ci8aFjaFTQgR3BT
 s4LsViTViTbm5OrSga2hGA+GGH2OGkW70yN+tqXPVxWyPaMA9GuoTh9xKhY3zVsB
 69HlWBzfhAoyixp0rWmilZIGAvKXn8xf+HzOxQ/ihDk6a/VsyLbEdYP72AQTRCi8
 A+h6vBNxP6AVPwDCA//hpCupG75bI8h/Plf1R/W7uzVTKXCSPAGdzROI8dxjsFAX
 Y4OR9kd8yn+bpf6G6D0q+tJV8BqUAzs5AUVkXWU5i1XaAK6fNpqh066yFQt8xCHX
 hFRcr7StYqI9ZliGP1ZEjE8nsiaqPZfDLMqSQlHl392JEfmf5uwskKwVadGHVfuC
 L+iATevGiNsY4JHHk+YBXHMSWrX9XIRNDio+LZQgFttr/KTK4ac=
 =rzib
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.1-rc2:
- Fix a buffer overflow in format_helper_test.
- Set DDC pointer in drmm_connector_init.
- Compiler fixes for panfrost.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c4d05683-8ebe-93b8-d24c-d1d2c68f12c4@linux.intel.com
2022-10-21 09:56:14 +10:00
Alex Deucher 50b0e4d4da drm/amdgpu: fix sdma doorbell init ordering on APUs
Commit 8795e182b0 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
uncovered a bug in amdgpu that required a reordering of the driver
init sequence to avoid accessing a special register on the GPU
before it was properly set up leading to an PCI AER error.  This
reordering uncovered a different hw programming ordering dependency
in some APUs where the SDMA doorbells need to be programmed before
the GFX doorbells. To fix this, move the SDMA doorbell programming
back into the soc15 common code, but use the actual doorbell range
values directly rather than the values stored in the ring structure
since those will not be initialized at this point.

This is a partial revert, but with the doorbell assignment
fixed so the proper doorbell index is set before it's used.

Fixes: e3163bc8ff ("drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: skhan@linuxfoundation.org
Cc: stable@vger.kernel.org
2022-10-20 09:35:51 -04:00
Thomas Zimmermann 1aca5ce036 Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-10-20 09:09:00 +02:00
Zack Rusin 7c99616e3f drm: Remove drm_mode_config::fb_base
The fb_base in struct drm_mode_config has been unused for a long time.
Some drivers set it and some don't leading to a very confusing state
where the variable can't be relied upon, because there's no indication
as to which driver sets it and which doesn't.

The only usage of fb_base is internal to two drivers so instead of trying
to force it into all the drivers to get it into a coherent state
completely remove it.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Thomas Zimmermann <tzimemrmann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019024401.394617-1-zack@kde.org
2022-10-19 21:46:16 -04:00
Christian König 01f2cf5384 drm/amdgpu: use DRM_SCHED_FENCE_DONT_PIPELINE for VM updates
Make sure that we always have a CPU round trip to let the submission
code correctly decide if a TLB flush is necessary or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-2-christian.koenig@amd.com
2022-10-19 12:45:00 +02:00
Arunpravin Paneer Selvam 8273b40486 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:14:07 -04:00
YuBiao Wang 2abe92c7ad drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:59 -04:00
Yifan Zha 97a3d6090f drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:13:24 -04:00
Evan Quan 3059cd8c5f drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:20 -04:00
YiPeng Chai 3bd026c3e3 drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
V2:
Add sriov vf ras support in amdgpu_ras_asic_supported.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:48 -04:00
YiPeng Chai 001ebcf5b9 drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10
V1:
Enable ras support for CHIP_IP_DISCOVERY asic type.

V2:
1. Change commit comment.
2. Enable ras support for mp0 v13_0_0 and v13_0_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:11:37 -04:00
YiPeng Chai 7cd3f6c3ac drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3
Enable gmc soft reset on gmc_v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:48 -04:00
Likun Gao b7a76a2914 drm/amdgpu: skip mes self test for gc 11.0.3
Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:41 -04:00
Kenneth Feng 657e07221c drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10
enable gfx clock gating features on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:09:23 -04:00
Victor Zhao a31e62873f drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask

v2: squash in unused variable removal (Alex)

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:40 -04:00
Victor Zhao a340847b02 Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:33 -04:00
Victor Zhao afbaa15501 Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6f.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6f ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:08:25 -04:00
Danijel Slivka 65f8682b9a drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.

v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:07:58 -04:00
Arunpravin Paneer Selvam df768a9770 drm/amdgpu: Fix for BO move issue
A user reported a bug on CAPE VERDE system where uvd_v3_1
IP component failed to initialize as there is an issue with
BO move code from one memory to other.

In function amdgpu_mem_visible() called by amdgpu_bo_move(),
when there are no blocks to compare or if we have a single
block then break the loop.

Fixes: 312b4dc11d ("drm/amdgpu: Fix VRAM BO swap issue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:49 -04:00
YuBiao Wang c4dfad81e4 drm/amdgpu: dequeue mes scheduler during fini
[Why]
If mes is not dequeued during fini, mes will be in an uncleaned state
during reload, then mes couldn't receive some commands which leads to
reload failure.

[How]
Perform MES dequeue via MMIO after all the unmap jobs are done by mes
and before kiq fini.

v2: Move the dequeue operation inside kiq_hw_fini.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:40 -04:00
Yifan Zha 7e2c58320e drm/amdgpu: Program GC registers through RLCG interface in gfx_v11/gmc_v11
[Why]
L1 blocks most of GC registers accessing by MMIO.

[How]
Use RLCG interface to program GC registers under SRIOV VF in full access time.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 14:52:06 -04:00
Fabio M. De Francesco a2c554262d drm/amd/amdgpu: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.

Since its use in amdgpu/amdgpu_ttm.c is safe, it should be preferred.

Therefore, replace kmap() with kmap_local_page() in amdgpu/amdgpu_ttm.c.

Suggested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Hawking Zhang 6c0ca74820 drm/amdgpu: move convert_error_address out of umc_ras
RAS error address translation algorithm is common
across dGPU and A + A platform as along as the SOC
integrates the same generation of UMC IP.

UMC RAS is managed by x86 MCA on A + A platform,
umc_ras in GPU driver is not initialized at all on
A + A platform. In such case, any umc_ras callback
implemented for dGPU config shouldn't be invoked
from A + A specific callback.

The change moves convert_error_address out of dGPU
umc_ras structure and makes it share between A + A
and dGPU config.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:21 -04:00
Evan Quan b31d6ada83 drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-17 17:41:21 -04:00
YiPeng Chai 82835055c6 drm/amdgpu: Add sriov vf ras support in amdgpu_ras_asic_supported
V2:
Add sriov vf ras support in amdgpu_ras_asic_supported.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
YiPeng Chai 073285efde drm/amdgpu: Enable ras support for mp0 v13_0_0 and v13_0_10
V1:
Enable ras support for CHIP_IP_DISCOVERY asic type.

V2:
1. Change commit comment.
2. Enable ras support for mp0 v13_0_0 and v13_0_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
YiPeng Chai 2e26bf1e46 drm/amdgpu: Enable gmc soft reset on gmc_v11_0_3
Enable gmc soft reset on gmc_v11_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Likun Gao bbce8cdb83 drm/amdgpu: skip mes self test for gc 11.0.3
Temporary disable mes self teset for gc 11.0.3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Kenneth Feng 4ecdb30ec4 drm/amd/amdgpu: enable gfx clock gating features on smu_v13_0_10
enable gfx clock gating features on smu_v13_0_10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Victor Zhao 16e3116124 drm/amdgpu: Refactor mode2 reset logic for v11.0.7
- refactor mode2 on v11.0.7 to align with aldebaran
- comment out using mode2 reset as default for now, will introduce
another controller to replace previous reset_level_mask

v2: squash in unused variable removal (Alex)

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Victor Zhao b98a1648d6 Revert "drm/amdgpu: let mode2 reset fallback to default when failure"
This reverts commit dac6b80818.

This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with
the original design of reset handler. Will redesign it.

Fixes: dac6b80818 ("drm/amdgpu: let mode2 reset fallback to default when failure")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Victor Zhao 6aa5893926 Revert "drm/amdgpu: add debugfs amdgpu_reset_level"
This reverts commit 5bd8d53f6f.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6f ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Danijel Slivka a7310d8de3 drm/amdgpu: set vm_update_mode=0 as default for Sienna Cichlid in SRIOV case
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to mmBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.

v3: introduce virtualization capability flag AMDGPU_VF_MMIO_ACCESS_PROTECT
which indicates that VF MMIO write access is not allowed in sriov runtime

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Yang Yingliang 7fe441d8b7 drm/amdgpu/si_dma: remove unused variable in si_dma_stop()
After commit 571c053658 ("drm/amdgpu: switch sdma buffer function
tear down to a helper"), the variable 'ring' is not used anymore, it
can be removed.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:20 -04:00
Alex Deucher eb1670787e drm/amdgpu: convert amdgpu_amdkfd_gpuvm.c to IP version checks
For consistency with the rest of the code.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Alex Deucher 886f1816c2 drm/amdgpu: convert vega20_ih.c to IP version checks
For consistency with newer asics.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Hawking Zhang e9ff000b5a drm/amdgpu: update psp_fw_type enum in amdgpu_ucode header
To match with the definition in psp firmware

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Hawking Zhang 7a94c8602f drm/amdgpu: extend HWIP_MAX_INSTANCE to 28
more ip instances are available

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Yifan Zhang bfa8cb055f drm/amdgpu: allow secure submission on gfx11 and sdma6
This patch to allow secure submission on gfx11 and sdma6.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Yifan Zhang 9707421691 drm/amdgpu: add tmz support for GC 11.0.1
this patch to add tmz support for GC 11.0.1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-17 17:41:19 -04:00
Linus Torvalds 9c9155a350 drm fixes for 6.1-rc1
amdgpu:
 - DC mutex fix
 - DC SubVP fixes
 - DCN 3.2.x fixes
 - DCN 3.1.x fixes
 - SDMA 6.x fixes
 - Enable DPIA for 3.1.4
 - VRR fixes
 - VRAM BO swapping fix
 - Revert dirty fb helper change
 - SR-IOV suspend/resume fixes
 - Work around GCC array bounds check fail warning
 - UMC 8.10 fixes
 - Misc fixes and cleanups
 
 i915:
 - Round to closest in g4x+ HDMI clock readout
 - Update MOCS table for EHL
 - Fix PSR_IMR/IIR field handling
 - Fix watermark calculations for gen12+/DG2 modifiers
 - Reject excessive dotclocks early
 - Fix revocation of non-persistent contexts
 - Handle migration for dpt
 - Fix display problems after resume
 - Allow control over the flags when migrating
 - Consider DG2_RC_CCS_CC when migrating buffers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmNIrI0ACgkQDHTzWXnE
 hr4muRAAm6mywF0G9LhtVuYhpl5A8EZR5as4PxE1nSdLs4t1+OSJ6MfJoY/Ihvc6
 LMzMadDMHoZbY/9z0MZDJEfeBSTRjTu1+g8u63d0tMf0YqI8/3D53sOodlULftc4
 9YwjyB4G2xol/fk81Z5IgIcFilW5M8OjqMHuswxb0k92aF3arkHkVT+w0Njd26dK
 Fqxm7Z8FbdQFlWeZLaY43K4CGQUqMtg2Xcc+F/ciNAs9a6QwTudaenchSdXMBzCW
 OMv14KLDG3YO+tndOESmKVE9qJ0X/7J8mzUpxfzbcAalGJQpctTGPiy/UBLQvOzx
 BKOSCVRg+1LPRnpnj9nZWCrs5oWCaDWr1XjYNP7za8wukHIr3sGRST1sDkVqzeKw
 Ct62V9H5ix3I9UG/QXSguY30Vq/w7zPJQuQy+CWNKGjsXR8hVyCc2NTIk1FlDMww
 vRjOD9stbSdX5hGS/lSi2xZ9ERPi2L73bPQblosKl3Gi65kScvdzl2F9PSuMGsik
 rK1PB2I7dj1gJp4f8RmQZCrN0UlfH/YAVn5rpZSejAE+mFGG0/qzgBkIkNhAHB9s
 3cg/EiDRbGQ6zxffE6GGP4HN6E9vVtuOVJl8sGR90aDRvVyTXKgTXiXRfHGc9BZj
 xc6sRPQjjWvHVopN354A9bJs3InT9rwEOw/0PHhK1GXE4htkGpc=
 =X1UY
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm

Pull more drm updates from Dave Airlie:
 "Round of fixes for the merge window stuff, bunch of amdgpu and i915
  changes, this should have the gcc11 warning fix, amongst other
  changes.

  amdgpu:
   - DC mutex fix
   - DC SubVP fixes
   - DCN 3.2.x fixes
   - DCN 3.1.x fixes
   - SDMA 6.x fixes
   - Enable DPIA for 3.1.4
   - VRR fixes
   - VRAM BO swapping fix
   - Revert dirty fb helper change
   - SR-IOV suspend/resume fixes
   - Work around GCC array bounds check fail warning
   - UMC 8.10 fixes
   - Misc fixes and cleanups

  i915:
   - Round to closest in g4x+ HDMI clock readout
   - Update MOCS table for EHL
   - Fix PSR_IMR/IIR field handling
   - Fix watermark calculations for gen12+/DG2 modifiers
   - Reject excessive dotclocks early
   - Fix revocation of non-persistent contexts
   - Handle migration for dpt
   - Fix display problems after resume
   - Allow control over the flags when migrating
   - Consider DG2_RC_CCS_CC when migrating buffers"

* tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm: (110 commits)
  drm/amd/display: Add HUBP surface flip interrupt handler
  drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers
  drm/i915: allow control over the flags when migrating
  drm/amd/display: Simplify bool conversion
  drm/amd/display: fix transfer function passed to build_coefficients()
  drm/amd/display: add a license to cursor_reg_cache.h
  drm/amd/display: make virtual_disable_link_output static
  drm/amd/display: fix indentation in dc.c
  drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static
  drm/amd/display: fix build error on arm64
  drm/amd/display: 3.2.207
  drm/amd/display: Clean some DCN32 macros
  drm/amdgpu: Add poison mode query for umc v8_10_0
  drm/amdgpu: Update umc v8_10_0 headers
  drm/amdgpu: fix coding style issue for mca notifier
  drm/amdgpu: define convert_error_address for umc v8.7
  drm/amdgpu: define RAS convert_error_address API
  drm/amdgpu: remove check for CE in RAS error address query
  drm/i915: Fix display problems after resume
  drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2]
  ...
2022-10-13 21:56:34 -07:00
Candice Li 832e72dd0d drm/amdgpu: Add poison mode query for umc v8_10_0
Add poison mode query support on umc v8_10_0.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-11 11:05:41 -04:00
Tao Zhou 38dbbfa57c drm/amdgpu: fix coding style issue for mca notifier
Fix some issues found by checkpatch script.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-11 11:05:29 -04:00
Tao Zhou fb4d5891ce drm/amdgpu: define convert_error_address for umc v8.7
So the code can be simplified.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-11 11:05:23 -04:00
Tao Zhou 44420ac5f8 drm/amdgpu: define RAS convert_error_address API
Make the code reusable and remove redundant code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-11 11:05:17 -04:00
Tao Zhou cdbb816b5b drm/amdgpu: remove check for CE in RAS error address query
Only RAS UE error address is queried currently, no need to check CE status.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-11 11:04:55 -04:00
Alex Deucher a98cec220a drm/amdgpu: fix SDMA suspend/resume on SR-IOV
Update all SDMA versions that support SR-IOV to properly
tear down the ttm buffer functions on suspend.

Tested-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-10 17:32:56 -04:00
Alex Deucher 571c053658 drm/amdgpu: switch sdma buffer function tear down to a helper
Switch all of the SDMA implementations to use the helper to
tear down the ttm buffer manager.

Tested-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-10 17:32:56 -04:00
Bokun Zhang e5da651985 drm/amdgpu: Fix SDMA engine resume issue under SRIOV
- Under SRIOV, SDMA engine is shared between VFs. Therefore,
  we will not stop SDMA during hw_fini. This is not an issue
  with normal dirver loading and unloading.

- However, when we put the SDMA engine to suspend state and resume
  it, the issue starts to show up. Something could attempt to use
  that SDMA engine to clear or move memory before the engine is
  initialized since the DRM entity is still there.

- Therefore, we will call sdma_v5_2_enable(false) during hw_fini,
  and if we are under SRIOV, we will call sdma_v5_2_enable(true)
  afterwards to allow other VFs to use SDMA. This way, the DRM
  entity of SDMA engine is emptied and it will follow the flow
  of resume code path.

Tested-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-10 17:32:55 -04:00
Linus Torvalds e8bc52cb8d Driver core changes for 6.1-rc1
Here is the big set of driver core and debug printk changes for 6.1-rc1.
 Included in here is:
 	- dynamic debug updates for the core and the drm subsystem.  The
 	  drm changes have all been acked by the relevant maintainers.
 	- kernfs fixes for syzbot reported problems
 	- kernfs refactors and updates for cgroup requirements
 	- magic number cleanups and removals from the kernel tree (they
 	  were not being used and they really did not actually do
 	  anything.)
 	- other tiny cleanups
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY0BYUA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylozwCdFRlcghaf7XBUyNgRZRwMC+oQI8EAn1G/nEDE
 6aFd2er41uK0IGQnSmYO
 =OK0k
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the big set of driver core and debug printk changes for
  6.1-rc1. Included in here is:

   - dynamic debug updates for the core and the drm subsystem. The drm
     changes have all been acked by the relevant maintainers

   - kernfs fixes for syzbot reported problems

   - kernfs refactors and updates for cgroup requirements

   - magic number cleanups and removals from the kernel tree (they were
     not being used and they really did not actually do anything)

   - other tiny cleanups

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits)
  docs: filesystems: sysfs: Make text and code for ->show() consistent
  Documentation: NBD_REQUEST_MAGIC isn't a magic number
  a.out: restore CMAGIC
  device property: Add const qualifier to device_get_match_data() parameter
  drm_print: add _ddebug descriptor to drm_*dbg prototypes
  drm_print: prefer bare printk KERN_DEBUG on generic fn
  drm_print: optimize drm_debug_enabled for jump-label
  drm-print: add drm_dbg_driver to improve namespace symmetry
  drm-print.h: include dyndbg header
  drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro
  drm_print: interpose drm_*dbg with forwarding macros
  drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
  drm_print: condense enum drm_debug_category
  debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops
  driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs()
  Documentation: ENI155_MAGIC isn't a magic number
  Documentation: NBD_REPLY_MAGIC isn't a magic number
  nbd: remove define-only NBD_MAGIC, previously magic number
  Documentation: FW_HEADER_MAGIC isn't a magic number
  Documentation: EEPROM_MAGIC_VALUE isn't a magic number
  ...
2022-10-07 17:04:10 -07:00
Hamza Mahfooz 17d819e282 Revert "drm/amdgpu: use dirty framebuffer helper"
This reverts commit 66f99628eb.

Unfortunately, that commit causes performance regressions on non-PSR
setups. So, just revert it until FB_DAMAGE_CLIPS support can be added.

Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2189
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216554
Fixes: 66f99628eb ("drm/amdgpu: use dirty framebuffer helper")
Fixes: abbc7a3daf ("drm/amdgpu: don't register a dirty callback for non-atomic")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:27 -04:00
Philip Yang 2302d50714 drm/amdgpu: Correct amdgpu_amdkfd_total_mem_size calculation
amdkfd_total_mem_size is the size of total GPUs vram plus system memory
to estimate page tables memory usage and leave enough VRAM room for page
tables allocation.

Calculate amdkfd_total_mem_size in amdgpu_amdkfd_device_probe is
incorrect because adev->gmc.real_vram_size is still 0 called from
amdgpu_device_ip_early_init. Move the calculation
to amdgpu_amdkfd_device_init to get the correct VRAM size.

Do reverse calculation in amdgpu_amdkfd_device_fini_sw to support
hot-unplugging GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:18 -04:00
Philip Yang 9a3c6067bd drm/amdgpu: Set vmbo destroy after pt bo is created
Under VRAM usage pression, map to GPU may fail to create pt bo and
vmbo->shadow_list is not initialized, then ttm_bo_release calling
amdgpu_bo_vm_destroy to access vmbo->shadow_list generates below
dmesg and NULL pointer access backtrace:

Set vmbo destroy callback to amdgpu_bo_vm_destroy only after creating pt
bo successfully, otherwise use default callback amdgpu_bo_destroy.

amdgpu: amdgpu_vm_bo_update failed
amdgpu: update_gpuvm_pte() failed
amdgpu: Failed to map bo to gpuvm
amdgpu 0000:43:00.0: amdgpu: Failed to map peer:0000:43:00.0 mem_domain:2
BUG: kernel NULL pointer dereference, address:
 RIP: 0010:amdgpu_bo_vm_destroy+0x4d/0x80 [amdgpu]
 Call Trace:
  <TASK>
  ttm_bo_release+0x207/0x320 [amdttm]
  amdttm_bo_init_reserved+0x1d6/0x210 [amdttm]
  amdgpu_bo_create+0x1ba/0x520 [amdgpu]
  amdgpu_bo_create_vm+0x3a/0x80 [amdgpu]
  amdgpu_vm_pt_create+0xde/0x270 [amdgpu]
  amdgpu_vm_ptes_update+0x63b/0x710 [amdgpu]
  amdgpu_vm_update_range+0x2e7/0x6e0 [amdgpu]
  amdgpu_vm_bo_update+0x2bd/0x600 [amdgpu]
  update_gpuvm_pte+0x160/0x420 [amdgpu]
  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x313/0x1130 [amdgpu]
  kfd_ioctl_map_memory_to_gpu+0x115/0x390 [amdgpu]
  kfd_ioctl+0x24a/0x5b0 [amdgpu]

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:08:09 -04:00
Arunpravin Paneer Selvam 312b4dc11d drm/amdgpu: Fix VRAM BO swap issue
DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(incase of low vram memory) we should consider all the blocks to
compute the total memory size which compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
    variables (Christian)

Fixes: c9cad937c0 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:07:37 -04:00
Ruili Ji 21a550de5f drm/amdgpu: Enable F32_WPTR_POLL_ENABLE in mqd
This patch is to fix the SDMA user queue doorbell missing issue on
SDMA 6.0. F32_WPTR_POLL_ENABLE has to be set if doorbell mode is
used. Otherwise ringing SDMA user queue doorbell can't wake up
system from gfxoff.

Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-06 12:05:44 -04:00
Yang Yingliang 525530ad9a drm/amdgpu/sdma: add missing release_firmware() in amdgpu_sdma_init_microcode()
In some error path in amdgpu_sdma_init_microcode(), release_firmware() is
not called, the memory allocated in request_firmware() will be leaked,
calling amdgpu_sdma_destroy_inst_ctx() which calls release_firmware() to
avoid memory leak.

Fixes: 15aa13056d ("drm/amdgpu: add function to init SDMA microcode")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-06 12:05:01 -04:00
Sonny Jiang e626d9b9c6 drm/amdgpu: Enable VCN PG on GC11_0_1
Enable VCN PG on GC11_0_1

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-10-06 12:02:49 -04:00
Linus Torvalds 7e6739b933 drm pull for 6.1-rc1
core:
 - convert selftests to kunit
 - managed init for more objects
 - move to idr_init_base
 - rename fb and gem cma helpers to dma
 - hide unregistered connectors from getconnector ioctl
 - DSC passthrough aux support
 - backlight handling improvements
 - add dma_resv_assert_held to vmap/vunmap
 
 edid:
 - move luminance calculation to core
 
 fbdev:
 - fix aperture helper usage
 
 fourcc:
 - add more format helpers
 - add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx, DRM_FORMAT_Dxx
 - add packed AYUV8888, XYUV8888
 - add some kunit tests
 
 ttm:
 - allow bos without backing store
 - rewrite placement to use intersect/compatible functions
 
 dma-buf:
 - docs update
 - improve signalling when debugging
 
 udmabuf:
 - fix failure path GPF
 
 dp:
 - drop dp/mst legacy code
 - atomic mst state support
 - audio infoframe packing
 
 panel:
 - Samsung LTL101AL01
 - B120XAN01.0
 - R140NWF5 RH
 - DMT028VGHMCMI-1A T
 - AUO B133UAN02.1
 - IVO M133NW4J-R3
 - Innolux N120ACA-EA1
 
 amdgpu:
 - Gang submit support
 - Mode2 reset for RDNA2
 - New IP support:
   DCN 3.1.4, 3.2
   SMU 13.x
   NBIO 7.7
   GC 11.x
   PSP 13.x
   SDMA 6.x
   GMC 11.x
 - DSC passthrough support
 - PSP fixes for TA support
 - vangogh GFXOFF stats
 - clang fixes
 - gang submit CS cleanup prep work
 - fix VRAM eviction issues
 
 amdkfd:
 - GC 10.3 IP ISA fixes
 - fix CRIU regression
 - CPU fault on COW mapping fixes
 
 i915:
 - align fw versioning with kernel practices
 - add display substruct to i915 private
 - add initial runtime info to driver info
 - split out HDCP and backlight registers
 - MEI XeHP SDV GSC support
 - add per-gt sysfs defaults
 - TLB invalidation improvements
 - Disable PCI BAR resize on 32-bit
 - GuC firmware updates and compat changes
 - GuC log timestamp translation
 - DG2 preemption workaround changes
 - DG2 improved HDMI pixel clocks support
 - PCI BAR sanity checks
 - Enable DC5 on DG2
 - DG2 DMC fw bumped
 - ADL-S PCI ID added
 - Meteorlake enablement
 - Rename ggtt_view to gtt_view
 - host RPS fixes
 - release mmaps on rpm suspend on discrete
 - clocking and dpll refactoring
 - VBT definitions and parsing updates
 - SKL watermark code extracted to separate file
 - allow seamless M/N changes on eDP panels
 - BUG_ON removal and cleanups
 
 msm:
 - DPU: simplified VBIF configuration
 -      cleanup CTL interfaces
 - DSI: removed unused msm_display_dsc_config struct
 -      switch regulator calls to new API
 -      switched to PANEL_BRIDGE for direct attached panels
 - DSI_PHY: convert drivers to parent_hws
 - DP: cleanup pixel_rate handling
 - HDMI: turned hdmi-phy-8996 into OF clk provider
 - misc dt-bindings fixes
 - choose eDP as primary display if it's available
 - support getting interconnects from either the mdss or the mdp5/dpu
   device nodes
 - gem: Shrinker + LRU re-work:
 - adds a shared GEM LRU+shrinker helper and moves msm over to that
 - reduces lock contention between retire and submit by avoiding the
   need to acquire obj lock in retire path (and instead using resv
   seeing obj's busyness in the shrinker
 - fix reclaim vs submit issues
 - GEM fault injection for triggering userspace error paths
 - Map/unmap optimization
 - Improved robustness for a6xx GPU recovery
 
 virtio:
 - Improve error and edge conditions handling
 - Convert to use managed helpers
 - stop exposing LINEAR modifier
 
 mgag200:
 - split modeset handling per model
 
 udl:
 - suspend/disconnect handling improvements
 
 vc4:
 - rework HDMI power up
 - depend on PM
 - better unplugging support
 
 ast:
 - resolution handling improvements
 
 ingenic:
 - Add JZ4760(B) support
 - avoid a modeset when sharpness property is unchanged
 - use the new PM ops
 
 it6505:
 - power seq and clock updates
 
 ssd130x:
 - regmap bulk write
 - use atomic helpers instead of simple helpers
 
 via:
 - rename via_drv to via_dri1, consolidate all code.
 
 radeon:
 - drop DP MST experimental support
 - delayed work flush fix
 - use time_after
 
 ti-sn65dsi86:
 - DP support
 
 mediatek:
 - MT8195 DP support
 - drop of_gpio header
 - remove unneeded result
 - small DP code improvements
 
 vkms:
 - RGB565, XRGB64 and ARGB64 support
 
 sun4i:
 - tv: convert to atomic
 
 rcar-du:
 - Synopsys DW HDMI bridge DT bindings update
 
 exynos:
 - use drm_display_info.is_hdmi
 - correct return of mixer_mode_valid and hdmi_mode_valid
 
 omap:
 - refcounting fix
 
 rockchip:
 - RK3568 support
 - RK3399 gamma support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmM894sACgkQDHTzWXnE
 hr7EYw//WdVe69TNauCAQiOYdmPp1twmr2o5gDOFLoo4IZw5v+qK0HL/nTrDkBq6
 xIu1GLTScOh0AItW1rhFmrtKhO/u/QPQ15P6cO7x8AzlUIhVOYqM79+OA0X6zIV8
 IZjpc6EEWPSKJTCRud9HdzsV06DIa+QlwShLCaOFxRiGSuUqsxzacIHUqnFekRnV
 PBG7RzcmdWwe6Gy/7T2wegsFjw1mh4S4FypEGs53emru3PGvcau5dwXcE5Jro7Br
 k4BFFknuXahVJ2ynVfIFn3QUQRMLgAKRWflqxo7McLeKVQEt4gfB6+PaMwGpSiRQ
 iC9QPy69TWEx6X015q2DvvlQDewnCbPOlzyoj9O991QDGLPIim8srPblr8DPeeOz
 Y7IW1PRVnPdKReMJvTyrIVED/XT9fUoR7N+F9sfPnEee5HsvjXNGumEHbOE8avFf
 rB6CFdby+Ecd9cSeINXowFy4ss0d5zCHMiKEVyQWTZOJysp29vLyKezNqU5m37FK
 LAQHtsRdn1+V3o22H5y1PJyqssbOMImMV1ffqW/urRLLefPVHIKCKI8Ycgh0qxqc
 B+gebHMgF8j6RR0DHAcQby+PIVi/Pn36TAMI3lPsVjFWGS5s5EQwpKlMNj46H0Cr
 yE2Vr4w29+Nsv0b4Uz16AZ0mHcauqx4bvWMyT0frJfNcE86x8MI=
 =u/MJ
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Lots of stuff all over, some new AMD IP support and gang submit
  support. i915 has further DG2 and Meteorlake pieces, and a bunch of
  i915 display refactoring. msm has a shrinker rework. There are also a
  bunch of conversions to use kunit.

  This has two external pieces, some MEI changes needed for future Intel
  discrete GPUs. These should be acked by Greg. There is also a cross
  maintainer shared tree with some backlight rework from Hans in here.

  Core:
   - convert selftests to kunit
   - managed init for more objects
   - move to idr_init_base
   - rename fb and gem cma helpers to dma
   - hide unregistered connectors from getconnector ioctl
   - DSC passthrough aux support
   - backlight handling improvements
   - add dma_resv_assert_held to vmap/vunmap

  edid:
   - move luminance calculation to core

  fbdev:
   - fix aperture helper usage

  fourcc:
   - add more format helpers
   - add DRM_FORMAT_Cxx, DRM_FORMAT_Rxx, DRM_FORMAT_Dxx
   - add packed AYUV8888, XYUV8888
   - add some kunit tests

  ttm:
   - allow bos without backing store
   - rewrite placement to use intersect/compatible functions

  dma-buf:
   - docs update
   - improve signalling when debugging

  udmabuf:
   - fix failure path GPF

  dp:
   - drop dp/mst legacy code
   - atomic mst state support
   - audio infoframe packing

  panel:
   - Samsung LTL101AL01
   - B120XAN01.0
   - R140NWF5 RH
   - DMT028VGHMCMI-1A T
   - AUO B133UAN02.1
   - IVO M133NW4J-R3
   - Innolux N120ACA-EA1

  amdgpu:
   - Gang submit support
   - Mode2 reset for RDNA2
   - New IP support:
        DCN 3.1.4, 3.2
        SMU 13.x
        NBIO 7.7
        GC 11.x
        PSP 13.x
        SDMA 6.x
        GMC 11.x
   - DSC passthrough support
   - PSP fixes for TA support
   - vangogh GFXOFF stats
   - clang fixes
   - gang submit CS cleanup prep work
   - fix VRAM eviction issues

  amdkfd:
   - GC 10.3 IP ISA fixes
   - fix CRIU regression
   - CPU fault on COW mapping fixes

  i915:
   - align fw versioning with kernel practices
   - add display substruct to i915 private
   - add initial runtime info to driver info
   - split out HDCP and backlight registers
   - MEI XeHP SDV GSC support
   - add per-gt sysfs defaults
   - TLB invalidation improvements
   - Disable PCI BAR resize on 32-bit
   - GuC firmware updates and compat changes
   - GuC log timestamp translation
   - DG2 preemption workaround changes
   - DG2 improved HDMI pixel clocks support
   - PCI BAR sanity checks
   - Enable DC5 on DG2
   - DG2 DMC fw bumped
   - ADL-S PCI ID added
   - Meteorlake enablement
   - Rename ggtt_view to gtt_view
   - host RPS fixes
   - release mmaps on rpm suspend on discrete
   - clocking and dpll refactoring
   - VBT definitions and parsing updates
   - SKL watermark code extracted to separate file
   - allow seamless M/N changes on eDP panels
   - BUG_ON removal and cleanups

  msm:
   - DPU:
       simplified VBIF configuration
       cleanup CTL interfaces
   - DSI:
       removed unused msm_display_dsc_config struct
       switch regulator calls to new API
       switched to PANEL_BRIDGE for direct attached panels
   - DSI_PHY: convert drivers to parent_hws
   - DP: cleanup pixel_rate handling
   - HDMI: turned hdmi-phy-8996 into OF clk provider
   - misc dt-bindings fixes
   - choose eDP as primary display if it's available
   - support getting interconnects from either the mdss or the mdp5/dpu
     device nodes
   - gem: Shrinker + LRU re-work:
   - adds a shared GEM LRU+shrinker helper and moves msm over to that
   - reduce lock contention between retire and submit by avoiding the
     need to acquire obj lock in retire path (and instead using resv
     seeing obj's busyness in the shrinker
   - fix reclaim vs submit issues
   - GEM fault injection for triggering userspace error paths
   - Map/unmap optimization
   - Improved robustness for a6xx GPU recovery

  virtio:
   - improve error and edge conditions handling
   - convert to use managed helpers
   - stop exposing LINEAR modifier

  mgag200:
   - split modeset handling per model

  udl:
   - suspend/disconnect handling improvements

  vc4:
   - rework HDMI power up
   - depend on PM
   - better unplugging support

  ast:
   - resolution handling improvements

  ingenic:
   - add JZ4760(B) support
   - avoid a modeset when sharpness property is unchanged
   - use the new PM ops

  it6505:
   - power seq and clock updates

  ssd130x:
   - regmap bulk write
   - use atomic helpers instead of simple helpers

  via:
   - rename via_drv to via_dri1, consolidate all code.

  radeon:
   - drop DP MST experimental support
   - delayed work flush fix
   - use time_after

  ti-sn65dsi86:
   - DP support

  mediatek:
   - MT8195 DP support
   - drop of_gpio header
   - remove unneeded result
   - small DP code improvements

  vkms:
   - RGB565, XRGB64 and ARGB64 support

  sun4i:
   - tv: convert to atomic

  rcar-du:
   - Synopsys DW HDMI bridge DT bindings update

  exynos:
   - use drm_display_info.is_hdmi
   - correct return of mixer_mode_valid and hdmi_mode_valid

  omap:
   - refcounting fix

  rockchip:
   - RK3568 support
   - RK3399 gamma support"

* tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm: (1374 commits)
  drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
  drm/amdkfd: Track unified memory when switching xnack mode
  drm/amdgpu: Enable sram on vcn_4_0_2
  drm/amdgpu: Enable VCN DPG for GC11_0_1
  drm/msm: Fix build break with recent mm tree
  drm/panel: simple: Use dev_err_probe() to simplify code
  drm/panel: panel-edp: Use dev_err_probe() to simplify code
  drm/panel: simple: Add Multi-Inno Technology MI0800FT-9
  dt-bindings: display: simple: Add Multi-Inno Technology MI0800FT-9 panel
  drm/amdgpu: correct the memcpy size for ip discovery firmware
  drm/amdgpu: Skip put_reset_domain if it doesn't exist
  drm/amdgpu: remove switch from amdgpu_gmc_noretry_set
  drm/amdgpu: Fix mc_umc_status used uninitialized warning
  drm/amd/display: Prevent OTG shutdown during PSR SU
  drm/amdgpu: add page retirement handling for CPU RAS
  drm/amdgpu: use RAS error address convert api in mca notifier
  drm/amdgpu: support to convert dedicated umc mca address
  drm/amdgpu: export umc error address convert interface
  drm/amdgpu: fix sdma v4 init microcode error
  drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()
  ...
2022-10-05 11:24:12 -07:00
Linus Torvalds 7fb68b6c82 platform-drivers-x86 for v6.1-1
Highlights:
  -  AMD Platform Management Framework (PMF) driver with AMT and QnQF support
  -  AMD PMC: Improved logging for debugging s2idle issues
  -  Big refactor of the ACPI/x86 backlight handling, ensuring that we only
     register 1 /sys/class/backlight device per LCD panel
  -  Microsoft Surface:
     -  Surface Laptop Go 2 support
     -  Surface Pro 8 HID sensor support
  -  Asus WMI:
     -  Lots of cleanups
     -  Support for TUF RGB keyboard backlight control
     -  Add support for ROG X13 tablet mode
  -  Siemens Simatic: IPC227G and IPC427G support
  -  Toshiba ACPI laptop driver: Fan hwmon and battery ECO mode support
  -  tools/power/x86/intel-speed-select: Various improvements
  -  Various cleanups
  -  Various small bugfixes
 
 The following is an automated git shortlog grouped by driver:
 
 ACPI:
  -  video: Change disable_backlight_sysfs_if quirks to acpi_backlight=native
  -  s2idle: Add a new ->check() callback for platform_s2idle_ops
  -  video: Fix indentation of video_detect_dmi_table[] entries
  -  video: Drop NL5x?U, PF4NU1F and PF5?U?? acpi_backlight=native quirks
  -  video: Drop "Samsung X360" acpi_backlight=native quirk
  -  video: Remove acpi_video_set_dmi_backlight_type()
  -  video: Add Apple GMUX brightness control detection
  -  video: Add Nvidia WMI EC brightness control detection (v3)
  -  video: Refactor acpi_video_get_backlight_type() a bit
  -  video: Remove code to unregister acpi_video backlight when a native backlight registers
  -  video: Make backlight class device registration a separate step (v2)
  -  video: Simplify acpi_video_unregister_backlight()
  -  video: Remove acpi_video_bus from list before tearing it down
  -  video: Drop backlight_device_get_by_type() call from acpi_video_get_backlight_type()
  -  video: Add acpi_video_backlight_use_native() helper
 
 acer-wmi:
  -  Move backlight DMI quirks to acpi/video_detect.c
  -  Acer Aspire One AOD270/Packard Bell Dot keymap fixes
 
 apple-gmux:
  -  Stop calling acpi/video.h functions
 
 asus-wmi:
  -  Expand support of GPU fan to read RPM and label
  -  Make kbd_rgb_mode_groups static
  -  Move acpi_backlight=native quirks to ACPI video_detect.c
  -  Move acpi_backlight=vendor quirks to ACPI video_detect.c
  -  Drop DMI chassis-type check from backlight handling
  -  Increase FAN_CURVE_BUF_LEN to 32
  -  Fix the name of the mic-mute LED classdev
  -  Implement TUF laptop keyboard power states
  -  Implement TUF laptop keyboard LED modes
  -  Support the GPU fan on TUF laptops
  -  Modify behaviour of Fn+F5 fan key
  -  Update tablet_mode_sw module-param help text
  -  Simplify tablet-mode-switch handling
  -  Simplify tablet-mode-switch probing
  -  Add support for ROG X13 tablet mode
  -  Adjust tablet/lidflip handling to use enum
  -  Support the hardware GPU MUX on some laptops
  -  Simplify some of the *_check_present() helpers
  -  Refactor panel_od attribute
  -  Refactor egpu_enable attribute
  -  Refactor disable_gpu attribute
  -  Document the panel_od sysfs attribute
  -  Document the egpu_enable sysfs attribute
  -  Document the dgpu_disable sysfs attribute
  -  Use kobj_to_dev()
  -  Convert all attr-show to use sysfs_emit
 
 compal-laptop:
  -  Get rid of a few forward declarations
 
 dell-privacy:
  -  convert to use dev_groups
 
 dell-smbios-base:
  -  Use sysfs_emit()
 
 dell-wmi:
  -  Add WMI event 0x0012 0x0003 to the list
 
 docs:
  -  ABI: charge_control_end_threshold may not support all values
 
 drivers/platform:
  -  toshiba_acpi: Call HCI_PANEL_POWER_ON on resume on some models
 
 drm/amdgpu:
  -  Register ACPI video backlight when skipping amdgpu backlight registration
  -  Don't register backlight when another backlight should be used (v3)
 
 drm/i915:
  -  Call acpi_video_register_backlight() (v3)
  -  Don't register backlight when another backlight should be used (v2)
 
 drm/nouveau:
  -  Register ACPI video backlight when nv_backlight registration fails (v2)
  -  Don't register backlight when another backlight should be used (v2)
 
 drm/radeon:
  -  Register ACPI video backlight when skipping radeon backlight registration
  -  Don't register backlight when another backlight should be used (v3)
 
 drm/todo:
  -  Add entry about dealing with brightness control on devices with > 1 panel
 
 gpio-f7188x:
  -  use unique labels for banks/chips
  -  Add GPIO support for Nuvoton NCT6116
  -  add a prefix to macros to keep gpio namespace clean
  -  switch over to using pr_fmt
 
 hp-wmi:
  -  Support touchpad on/off
  -  Setting thermal profile fails with 0x06
 
 int3472/discrete:
  -  Drop a forward declaration
 
 intel-uncore-freq:
  -  Use sysfs_emit() to instead of scnprintf()
 
 intel_cht_int33fe:
  -  Fix comment according to the code flow
 
 leds:
  -  simatic-ipc-leds-gpio: Make simatic_ipc_led_gpio_table static
  -  simatic-ipc-leds-gpio: add new model 227G
 
 move from strlcpy with unused retval to strscpy:
  - move from strlcpy with unused retval to strscpy
 
 msi-laptop:
  -  Change DMI match / alias strings to fix module autoloading
  -  Add msi_scm_disable_hw_fn_handling() helper
  -  Add msi_scm_model_exit() helper
  -  Fix resource cleanup
  -  Simplify ec_delay handling
  -  Fix old-ec check for backlight registering
  -  Drop MSI_DRIVER_VERSION
  -  Use MODULE_DEVICE_TABLE()
 
 nvidia-wmi-ec-backlight:
  -  Use acpi_video_get_backlight_type()
  -  Move fw interface definitions to a header (v2)
 
 p2sb:
  -  Fix UAF when caller uses resource name
 
 platform/mellanox:
  -  mlxreg-lc: Make error handling flow consistent
  -  Remove redundant 'NULL' check
  -  Remove unnecessary code
  -  mlxreg-lc: Fix locking issue
  -  mlxreg-lc: Fix coverity warning
 
 platform/surface:
  -  Split memcpy() of struct ssam_event flexible array
  -  aggregator_registry: Add HID devices for sensors and UCSI client to SP8
  -  aggregator_registry: Rename HID device nodes based on new findings
  -  aggregator_registry: Rename HID device nodes based on their function
  -  aggregator_registry: Add support for Surface Laptop Go 2
 
 platform/x86:
  - use PLATFORM_DEVID_NONE instead of -1
 
 platform/x86/amd:
  -  pmc: Dump idle mask during "check" stage instead
  -  pmc: remove CONFIG_DEBUG_FS checks
  -  pmc: Fix build without debugfs
  -  pmc: Add sysfs files for SMU
  -  pmc: Add an extra STB message for checking s2idle entry
  -  pmc: Always write to the STB
  -  pmc: Add defines for STB events
 
 platform/x86/amd/pmf:
  -  Remove unused power_delta instances
  -  install notify handler after acpi init
  -  Add sysfs to toggle CnQF
  -  Add support for CnQF
  -  Fix clang unused variable warning
  -  Fix undefined reference to platform_profile
  -  Force load driver on older supported platforms
  -  Handle AMT and CQL events for Auto mode
  -  Add support for Auto mode feature
  -  Get performance metrics from PMFW
  -  Add fan control support
  -  Add heartbeat signal support
  -  Add debugfs information
  -  Add support SPS PMF feature
  -  Add support for PMF APCI layer
  -  Add support for PMF core layer
  -  Add ABI doc for AMD PMF
  -  Add AMD PMF driver entry
 
 platform/x86/intel/wmi:
  -  thunderbolt: Use dev_groups callback
 
 pmc_atom:
  -  Amend comment style and grammar
  -  Make terminator entry uniform
  -  Improve quirk message to be less cryptic
  -  Fix SLP_TYPx bitfield mask
 
 samsung-laptop:
  -  Move acpi_backlight=[vendor|native] quirks to ACPI video_detect.c
 
 simatic-ipc:
  -  add new model 427G
  -  enable watchdog for 227G
 
 thinkpad_acpi:
  -  Explicitly set to balanced mode on startup
 
 tools/power/x86/intel-speed-select:
  -  Release v1.13
  -  Optimize CPU initialization
  -  Utilize cpu_map to get physical id
  -  Remove unused struct clos_config fields
  -  Enforce isst_id value
  -  Do not export get_physical_id
  -  Introduce is_cpu_in_power_domain helper
  -  Cleanup get_physical_id usage
  -  Convert more function to use isst_id
  -  Add pkg and die in isst_id
  -  Introduce struct isst_id
  -  Remove unused core_mask array
  -  Remove dead code
  -  Fix cpu count for TDP level display
 
 toshiba_acpi:
  -  change turn_on_panel_on_resume to static
  -  Remove duplicate include
  -  Set correct parent for input device.
  -  Add fan RPM reading (hwmon interface)
  -  Add fan RPM reading (internals)
  -  Stop using acpi_video_set_dmi_backlight_type()
  -  Fix ECO LED control on Toshiba Z830
  - Battery charge mode in toshiba_acpi (internals)
  - Battery charge mode in toshiba_acpi (sysfs)
 
 wmi:
  -  Drop forward declaration of static functions
  -  Allow duplicate GUIDs for drivers that use struct wmi_driver
 
 x86-android-tablets:
  -  Fix broken touchscreen on Chuwi Hi8 with Windows BIOS
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmM9eLgUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9zIzAf+JG058wucK0Zsnu4POzGp+uHjnWuu
 AJUmTeRvCD7MidwIv5PiwEtTucQ8OuXYR+tPc8LIwCVZtc05FBmh7Y8le/CX0SS6
 n9EZIvCk3Owosti5w2TPnCK920kh+Wfxl/fmfDbpi6+lpAL8r+F/mZEGKGFdZpWu
 Q+yM/eyxwPH8q8gjrXOUC7rN43aYeO3OCpNG3GYkQ/2S5qrjuz39dXhNVzdSsxm7
 aYOqJRNjZQEjQ3kJcp65kC6oWp3UaI1zdpGwhBG/SY8BCtCYZzlRy7gN2FCJfAa/
 EyYayOvdy0zNwewbIYzck4W80hUDtfidgZgZ9crQscO/JjbGi6LhveD4YA==
 =afGw
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:

 - AMD Platform Management Framework (PMF) driver with AMT and QnQF
   support

 - AMD PMC: Improved logging for debugging s2idle issues

 - Big refactor of the ACPI/x86 backlight handling, ensuring that we
   only register 1 /sys/class/backlight device per LCD panel

 - Microsoft Surface:
    - Surface Laptop Go 2 support
    - Surface Pro 8 HID sensor support

 - Asus WMI:
    - Lots of cleanups
    - Support for TUF RGB keyboard backlight control
    - Add support for ROG X13 tablet mode

 - Siemens Simatic: IPC227G and IPC427G support

 - Toshiba ACPI laptop driver: Fan hwmon and battery ECO mode support

 - tools/power/x86/intel-speed-select: Various improvements

 - Various cleanups

 - Various small bugfixes

* tag 'platform-drivers-x86-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (153 commits)
  platform/x86: use PLATFORM_DEVID_NONE instead of -1
  platform/x86/amd: pmc: Dump idle mask during "check" stage instead
  platform/x86/intel/wmi: thunderbolt: Use dev_groups callback
  platform/x86/amd: pmc: remove CONFIG_DEBUG_FS checks
  platform/surface: Split memcpy() of struct ssam_event flexible array
  platform/x86: compal-laptop: Get rid of a few forward declarations
  platform/x86: intel-uncore-freq: Use sysfs_emit() to instead of scnprintf()
  platform/x86: dell-smbios-base: Use sysfs_emit()
  platform/x86/amd/pmf: Remove unused power_delta instances
  platform/x86/amd/pmf: install notify handler after acpi init
  Documentation/ABI/testing/sysfs-amd-pmf: Add ABI doc for AMD PMF
  platform/x86/amd/pmf: Add sysfs to toggle CnQF
  platform/x86/amd/pmf: Add support for CnQF
  platform/x86/amd: pmc: Fix build without debugfs
  platform/x86: hp-wmi: Support touchpad on/off
  platform/x86: int3472/discrete: Drop a forward declaration
  platform/x86: toshiba_acpi: change turn_on_panel_on_resume to static
  platform/x86: wmi: Drop forward declaration of static functions
  platform/x86: toshiba_acpi: Remove duplicate include
  platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
  ...
2022-10-05 10:38:24 -07:00
Dave Airlie 65898687cf Merge tag 'amd-drm-next-6.1-2022-09-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.1-2022-09-30:

amdgpu:
- RLC FW code cleanup
- RLC fixes for GC 11.x
- SMU 13.x fixes
- CP FW code cleanup
- SDMA FW code cleanup
- GC 11.x fixes
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- Misc fixes
- RAS fixes
- SR-IOV fixes
- VCN 4.x fixes

amdkfd:
- GC 11.x fixes
- Xnack fixes
- UBSAN warning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930162012.5823-1-alexander.deucher@amd.com
2022-10-04 09:42:24 +10:00
Hawking Zhang 0fd85e89b5 drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx11

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:59:06 -04:00
Hawking Zhang 04fa38cce6 drm/amdgpu: add helper to init rlc firmware
To initialzie rlc firmware according to rlc
firmware header version

v2: squash in backwards compat fix

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:59:00 -04:00
Hawking Zhang b33139ee15 drm/amdgpu: add helper to init rlc fw in header v2_4
To initialize rlc firmware in header v2_4

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:57 -04:00
Hawking Zhang c1c3f41ffb drm/amdgpu: add helper to init rlc fw in header v2_3
To initialize rlc firmware in header v2_3

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:47 -04:00
Hawking Zhang bcecb65248 drm/amdgpu: add helper to init rlc fw in header v2_2
To initialize rlc firmware in header v2_2

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:40 -04:00
Hawking Zhang 90df151245 drm/amdgpu: add helper to init rlc fw in header v2_1
To initialize rlc firmware in header v2_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:37 -04:00
Hawking Zhang 2f3f958602 drm/amdgpu: add helper to init rlc fw in header v2_0
To initialize rlc firmware in header v2_0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:35 -04:00
Hawking Zhang af81a9201e drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx
cache rlcv/rlcvp ucode version info in amdgpu_gfx
structure

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 16:58:23 -04:00
Sonny Jiang 730548ba02 drm/amdgpu: Enable sram on vcn_4_0_2
Enable sram on vcn_4_0_2

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 11:21:02 -04:00
Sonny Jiang 0b37f47494 drm/amdgpu: Enable VCN DPG for GC11_0_1
Enable VCN DPG on GC11_0_1

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-30 11:20:43 -04:00
Sonny Jiang 541540b904 drm/amdgpu: Enable sram on vcn_4_0_2
Enable sram on vcn_4_0_2

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 15:05:11 -04:00
Sonny Jiang a3aded135e drm/amdgpu: Enable VCN DPG for GC11_0_1
Enable VCN DPG on GC11_0_1

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 15:05:01 -04:00
Le Ma a79852a393 drm/amdgpu: correct the memcpy size for ip discovery firmware
Use fw->size instead of discovery_tmr_size for fallback path.

Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:44:02 -04:00
Vignesh Chander f61a825aa8 drm/amdgpu: Skip put_reset_domain if it doesn't exist
For xgmi sriov, the reset is handled by host driver and hive->reset_domain
is not initialized so need to check if it exists before doing a put.

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:43:52 -04:00
Graham Sider e67135571e drm/amdgpu: remove switch from amdgpu_gmc_noretry_set
Simplify the logic in amdgpu_gmc_noretry_set by getting rid of the
switch. Also set noretry=1 as default for GFX 10.3.0 and greater since
retry faults are not supported.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:43:41 -04:00
Leo Li 3ff4ccc3e9 drm/amdgpu: Fix mc_umc_status used uninitialized warning
On ChromeOS clang build, the following warning is seen:

/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:6: error: variable 'mc_umc_status' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (mca_addr == UMC_INVALID_ADDR) {
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:485:21: note: uninitialized use occurs here
        if ((REG_GET_FIELD(mc_umc_status, MCA_UMC_UMC0_MCUMC_STATUST0, Val) == 1 &&
                           ^~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu.h:1208:5: note: expanded from macro 'REG_GET_FIELD'
        (((value) & REG_FIELD_MASK(reg, field)) >> REG_FIELD_SHIFT(reg, field))
           ^~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:463:2: note: remove the 'if' if its condition is always true
        if (mca_addr == UMC_INVALID_ADDR) {
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:460:24: note: initialize the variable 'mc_umc_status' to silence this warning
        uint64_t mc_umc_status, mc_umc_addrt0;
                              ^
                               = 0
1 error generated.
make[5]: *** [/mnt/host/source/src/third_party/kernel/v5.15/scripts/Makefile.build:289: drivers/gpu/drm/amd/amdgpu/umc_v6_7.o] Error 1

Fix by initializing mc_umc_status = 0.

Fixes: 1014bd1cb3 ("drm/amdgpu: support to convert dedicated umc mca address")
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:50 -04:00
Tao Zhou 5e1fdf76cf drm/amdgpu: add page retirement handling for CPU RAS
Do RAS page retirement in poison consumption handler unconditionally.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:36 -04:00
Tao Zhou cd4c99f103 drm/amdgpu: use RAS error address convert api in mca notifier
Use the convert interface to simplify code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:31 -04:00
Tao Zhou 1014bd1cb3 drm/amdgpu: support to convert dedicated umc mca address
Update umc error address query interface, the mca address can be read
from register or input from parameter.

TODO: define a common address conversion function to simplify the code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:21 -04:00
Tao Zhou c19a5f325a drm/amdgpu: export umc error address convert interface
Make it global so we can convert specific mca address.

v2: rename query_error_address_per_channel to
convert_ras_error_address

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:15 -04:00
Likun Gao baf28cc10a drm/amdgpu: fix sdma v4 init microcode error
Fix init SDMA microcode error for sdma v4, which caused by mistake when
rearch sdma init microcode function (coding 4.2.2 to 4.2.0).

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:42:08 -04:00
Bokun Zhang d7274ec723 drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Likun Gao 2d89e2ddfd drm/amdgpu: fix compiler warning for amdgpu_gfx_cp_init_microcode
Change the type of parameter on amdgpu_gfx_cp_init_microcode to fix
compiler warning.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Hawking Zhang 940d4dd402 drm/amdgpu: add rlc_sr_cntl_list to firmware array
To allow upload the list via psp

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Jiadong.Zhu e7b8e90add drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call
amdgpu_fence_process which must be used in irq handler.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Jiadong.Zhu 415be17fb2 drm/amdgpu: Correct the position in patch_cond_exec
The current position calulated in gfx_v9_0_ring_emit_patch_cond_exec
underflows when the wptr is divisible by ring->buf_mask + 1.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:46 -04:00
Graham Sider 3e9cf23428 drm/amdgpu: pass queue size and is_aql_queue to MES
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:44 -04:00
David Belanger 585a82618b drm/amdgpu: Enable SA software trap.
Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Ruijing Dong 167be85228 drm/amdgpu/vcn: update vcn4 fw shared data structure
update VF_RB_SETUP_FLAG, add SMU_DPM_INTERFACE_FLAG,
and corresponding change in VCN4.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao b077656b8c drm/amdgpu/sdma6: use common function to init sdma fw
Use common function to init sdma v6 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao 52642d13d6 drm/amdgpu: support sdma struct v2 fw init
Support SDMA firmware init on common function for sdma v2 struct.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao 108db8decf drm/amdgpu/sdma5: use common function to init sdma fw
Use common function to init sdma v5 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao a2d3b4b81f drm/amdgpu/sdma4: use common function to init sdma fw
Use common function to init sdma v4 firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao 15aa13056d drm/amdgpu: add function to init SDMA microcode
Add an common function to init SDMA related microcode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao e268df1d20 drm/amdgpu/gfx11: use common function to init cp fw
Use common function to init gfx v11 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao 5993e4c68a drm/amdgpu/gfx10: use common function to init CP fw
Use common function to init gfx v10 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao 93cad722d3 drm/amdgpu/gfx9: use common function to init cp fw
Use common function to init gfx v9 CP firmware ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Likun Gao ec71b25017 drm/amdgpu: add function to init CP microcode
Add an common function to init CP related microcode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:43 -04:00
Evan Quan 7436538899 drm/amdgpu: avoid gfx register accessing during gfxoff
Make sure gfxoff is disabled before gfx register accessing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar bb66ecbf12 drm/amdgpu: Use simplified API for p2p dist calc
Use the simpified API that calculates distance between two devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar d0fa84f174 drm/amdgpu: Disable verbose for p2p dist calc
Disable verbose while getting p2p distance. With verbose, it shows
warning if ACS redirect is set between the devices. Adds noise
to dmesg logs when a few GPU devices are on the same platform.

Example log:

amdgpu 0000:34:00.0: ACS redirect is set between the client and provider (0000:31:00.0)
amdgpu 0000:34:00.0: to disable ACS redirect for this path, add the kernel parameter:
	pci=disable_acs_redir=0000:30:00.0;0000:2e:00.0;0000:33:00.0;0000:2e:10.0

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
YiPeng Chai 642c040113 drm/amdgpu: Fixed ras warning when uninstalling amdgpu
For the asic using smu v13_0_2, there is the following
warning when uninstalling amdgpu:
  amdgpu: ras disable gfx failed poison:1 ret:-22.

[Why]:
  For the asic using smu v13_0_2, the psp .suspend and
  mode1reset is called before amdgpu_ras_pre_fini during
  amdgpu uninstall, it has disabled all ras features and
  reset the psp. Since the psp is reset, calling
  amdgpu_ras_disable_all_features in amdgpu_ras_pre_fini
  to disable ras features will fail.

[How]:
  If all ras features are disabled, amdgpu_ras_disable_all_features
  will not be called to disable all ras features again.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang 7c32d4e37f drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx11

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang 39a35d52d4 drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx10

v2: squash in size validation fix (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Dave Airlie e8573000f4 Merge tag 'amd-drm-next-6.1-2022-09-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.1-2022-09-23:

amdgpu:
- SDMA fix
- Add new firmware types to debugfs/IOCTL version queries
- Misc spelling and grammar fixes
- Misc code cleanups
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- CS cleanup
- Gang submit support
- Clang fixes
- Non-DC audio fix
- GPUVM locking fixes
- Vega10 PWN fan speed fix

amdkgd:
- MQD manager cleanup
- Misc spelling and grammar fixes

UAPI:
- Add new firmware types to the FW version query IOCTL

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923215729.6061-1-alexander.deucher@amd.com
2022-09-28 14:56:09 +10:00
Dave Airlie 907cc346ff drm-misc-next for 6.1:
UAPI Changes:
 
 Cross-subsystem Changes:
   - dma-buf: Improve signaling when debugging
 
 Core Changes:
   - Backlight handling improvements
   - format-helper: Add drm_fb_build_fourcc_list()
   - fourcc: Kunit tests improvements
   - modes: Add DRM_MODE_INIT() macro
   - plane: Remove drm_plane_init(), Allocate planes with drm_universal_plane_alloc()
   - plane-helper: Add drm_plane_helper_atomic_check()
   - probe-helper: Add drm_connector_helper_get_modes_fixed() and
     drm_crtc_helper_mode_valid_fixed()
   - tests: Conversion to parametrized tests, test name consistency
 
 Driver Changes:
   - amdgpu: Fix for a VRAM eviction issue
   - ast: Resolution handling improvements
   - mediatek: small code improvements for DP
   - omap: Refcounting fix, small improvements
   - rockchip: RK3568 support, Gamma support for RK3399
   - sun4i: Build failure fix when !OF
   - udl: Multiple fixes here and there
   - vc4: HDMI hotplug handling improvements
   - vkms: Warning fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYy1ikAAKCRDj7w1vZxhR
 xWepAQC6CjJl40F3t8Xp/hH3MJtrr0jCA3zFLS5boIk3v7/Q8QEA9FW8zr4Svujz
 9RJe8oN3ZVX133Zp+61WhziYI0XJuwg=
 =Veif
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-09-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.1:

UAPI Changes:

Cross-subsystem Changes:
  - dma-buf: Improve signaling when debugging

Core Changes:
  - Backlight handling improvements
  - format-helper: Add drm_fb_build_fourcc_list()
  - fourcc: Kunit tests improvements
  - modes: Add DRM_MODE_INIT() macro
  - plane: Remove drm_plane_init(), Allocate planes with drm_universal_plane_alloc()
  - plane-helper: Add drm_plane_helper_atomic_check()
  - probe-helper: Add drm_connector_helper_get_modes_fixed() and
    drm_crtc_helper_mode_valid_fixed()
  - tests: Conversion to parametrized tests, test name consistency

Driver Changes:
  - amdgpu: Fix for a VRAM eviction issue
  - ast: Resolution handling improvements
  - mediatek: small code improvements for DP
  - omap: Refcounting fix, small improvements
  - rockchip: RK3568 support, Gamma support for RK3399
  - sun4i: Build failure fix when !OF
  - udl: Multiple fixes here and there
  - vc4: HDMI hotplug handling improvements
  - vkms: Warning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923073943.d43tne5hni3iknlv@houat
2022-09-28 13:50:46 +10:00
Bokun Zhang 3b7329cf5a drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-09-27 18:03:36 -04:00
Jiadong.Zhu 11e38360cc drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call
amdgpu_fence_process which must be used in irq handler.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 18:03:09 -04:00
Jiadong.Zhu b3e45b18e5 drm/amdgpu: Correct the position in patch_cond_exec
The current position calulated in gfx_v9_0_ring_emit_patch_cond_exec
underflows when the wptr is divisible by ring->buf_mask + 1.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 18:02:58 -04:00
Graham Sider 91ef6cfd30 drm/amdgpu: pass queue size and is_aql_queue to MES
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:54:12 -04:00
Evan Quan 7516777434 drm/amdgpu: avoid gfx register accessing during gfxoff
Make sure gfxoff is disabled before gfx register accessing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:46:52 -04:00
Hawking Zhang f6f8bb5989 drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx9

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:39 -04:00
Hawking Zhang 5b41521268 drm/amdgpu: add helper to init rlc firmware
To initialzie rlc firmware according to rlc
firmware header version

v2: squash in backwards compat fix

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:38 -04:00
Jim Cromie f158936b60 drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
Use DECLARE_DYNDBG_CLASSMAP across DRM:

 - in .c files, since macro defines/initializes a record

 - in drivers, $mod_{drv,drm,param}.c
   ie where param setup is done, since a classmap is param related

 - in drm/drm_print.c
   since existing __drm_debug param is defined there,
   and we ifdef it, and provide an elaborated alternative.

 - in drm_*_helper modules:
   dp/drm_dp - 1st item in makefile target
   drivers/gpu/drm/drm_crtc_helper.c - random pick iirc.

Since these modules all use identical CLASSMAP declarations (ie: names
and .class_id's) they will all respond together to "class DRM_UT_*"
query-commands:

  :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control

NOTES:

This changes __drm_debug from int to ulong, so BIT() is usable on it.

DRM's enum drm_debug_category values need to sync with the index of
their respective class-names here.  Then .class_id == category, and
dyndbg's class FOO mechanisms will enable drm_dbg(DRM_UT_KMS, ...).

Though DRM needs consistent categories across all modules, thats not
generally needed; modules X and Y could define FOO differently (ie a
different NAME => class_id mapping), changes are made according to
each module's private class-map.

No callsites are actually selected by this patch, since none are
class'd yet.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220912052852.1123868-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24 15:02:01 +02:00
Arunpravin Paneer Selvam 39dd0cc2e5 drm/amdgpu: Fix VRAM eviction issue
A user reported that when he starts a game (MTGA) with wine,
he observed an error msg failed to pin framebuffer with error -12.
Found an issue with the VRAM mem type eviction decision condition
logic. This patch will fix the if condition code error.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220922151447.265696-1-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-09-22 19:53:06 +02:00
Alex Deucher abbc7a3daf drm/amdgpu: don't register a dirty callback for non-atomic
Some asics still support non-atomic code paths.

Fixes: 66f99628eb ("drm/amdgpu: use dirty framebuffer helper")
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 17:36:43 -04:00
Mukul Joshi 37a0bad677 drm/amdgpu: Update PTE flags with TF enabled
This patch updates the PTE flags when translate further (TF) is
enabled:
- With translate_further enabled, invalid PTEs can be 0. Reading
  consecutive invalid PTEs as 0 is considered a fault. To prevent
  this, ensure invalid PTEs have at least 1 bit set.
- The current invalid PTE flags settings to translate a retry fault
  into a no-retry fault, doesn't work with TF enabled. As a result,
  update invalid PTE flags settings which works for both TF enabled
  and disabled case.

Fixes: 352e683b72 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 17:14:09 -04:00
Hawking Zhang 435d6e6f02 drm/amdgpu: add helper to init rlc fw in header v2_4
To initialize rlc firmware in header v2_4

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:27 -04:00
Hawking Zhang a0d9084d7f drm/amdgpu: add helper to init rlc fw in header v2_3
To initialize rlc firmware in header v2_3

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:21 -04:00