Commit Graph

24 Commits

Author SHA1 Message Date
Tomasz Rusinowicz 87a23fe296 accel/ivpu: Enable MCA ECC signalling based on MSR
Add new boot parameter for NPU5+ that enables
ECC signalling for on-chip memory based on the value
of MSR_INTEGRITY_CAPS register.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20250925145020.1446208-1-maciej.falkowski@linux.intel.com
2025-10-01 09:58:14 +02:00
Karol Wachowski 2007e210b6 accel/ivpu: Split FW runtime and global memory buffers
Split firmware boot parameters (4KB) and FW version (4KB) into dedicated
buffer objects, separating them from the FW runtime memory buffer. This
creates three distinct buffers with independent allocation control.

This enables future modifications, particularly allowing the FW
image memory to be moved into a read-only buffer.

Fix user range starting address from incorrect 0x88000000 (2GB + 128MB)
overlapping global aperture on 37XX to intended 0xa0000000 (2GB + 512MB).
This caused no issues as FW aligned this range to 512MB anyway, but
corrected for consistency.

Convert ivpu_hw_range_init() from inline helper to function with overflow
validation to prevent potential security issues from address range
arithmetic overflows.

Improve readability of ivpu_fw_parse() function, remove unrelated constant
defines and validate firmware header values based on vdev->hw->ranges.

Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20250925074209.1148924-1-karol.wachowski@linux.intel.com
2025-09-25 21:03:57 +02:00
Thomas Zimmermann c598d5eb9f Merge drm/drm-next into drm-misc-next
Backmerging to forward to v6.16-rc1

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-06-11 09:01:34 +02:00
Karol Wachowski 8395204aeb accel/ivpu: Add inference_timeout_ms module parameter
Add new inference_timeout_ms parameter that allows specifying
maximum allowed duration in milliseconds that inference can take before
triggering a recovery.

Calculate maximum number of heartbeat retries based on ratio between
inference timeout and tdr timeout.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250515093128.252041-1-jacek.lawrynowicz@linux.intel.com
2025-06-02 14:39:50 +02:00
Jacek Lawrynowicz c4eb2f88d2 accel/ivpu: Increase state dump msg timeout
Increase JMS message state dump command timeout to 100 ms. On some
platforms, the FW may take a bit longer than 50 ms to dump its state
to the log buffer and we don't want to miss any debug info during TDR.

Fixes: 5e162f872d ("accel/ivpu: Add FW state dump on TDR")
Cc: stable@vger.kernel.org # v6.13+
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250425092822.2194465-1-jacek.lawrynowicz@linux.intel.com
2025-04-30 11:26:20 +02:00
Karol Wachowski 011529fe81 accel/ivpu: Implement D0i2 disable test mode
Add power_profile firmware boot param and set it to 0 by default
which is default FW power profile.

Implement IVPU_TEST_MODE_D0I2_DISABLE which is used for setting
power profile boot param value to 1 which prevents NPU from entering
d0i2 power state.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-7-jacek.lawrynowicz@linux.intel.com
2025-02-10 10:45:43 +01:00
Karol Wachowski 55e856c344 accel/ivpu: Add test modes to toggle clock relinquish disable
Add IVPU_TEST_MODE_CLK_RELINQ_[DISABLE|ENABLE] that overrides
workaround for disabling clock relinquish for testing purposes.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-6-jacek.lawrynowicz@linux.intel.com
2025-02-10 10:45:43 +01:00
Karol Wachowski 320323d2e5 accel/ivpu: Add debugfs interface for setting HWS priority bands
Add debugfs interface to modify following priority bands properties:
 * grace period
 * process grace period
 * process quantum

This allows for the adjustment of hardware scheduling algorithm parameters
for each existing priority band, facilitating validation and fine-tuning.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-4-jacek.lawrynowicz@linux.intel.com
2025-02-10 10:45:41 +01:00
Tomasz Rusinowicz af80fe138b accel/ivpu: Enable recovery and adjust timeouts for fpga
Recovery now works on fpga. JSM state dump timeout needs to
be really long for the new fpga model releases.

Enable punit on fpga.

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129125636.1047413-6-jacek.lawrynowicz@linux.intel.com
2025-02-03 10:40:34 +01:00
Jacek Lawrynowicz b8c00323ae accel/ivpu: Update last_busy in IRQ handler
Call pm_runtime_mark_last_busy() in top half of IRQ handler to prevent
device from being runtime suspended before bottom half is executed on
a workqueue.

Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129125636.1047413-3-jacek.lawrynowicz@linux.intel.com
2025-02-03 10:40:34 +01:00
Jacek Lawrynowicz 3477696345 accel/ivpu: Add support for hardware fault injection
Introduces the capability to simulate hardware faults for testing
purposes. The new `fail_hw` fault can be injected in
`ivpu_hw_reg_poll_fld()`, which is used in various parts of the driver
to wait for the hardware to reach a specific state. This allows to test
failures during NPU boot and shutdown, IPC message handling and more.

Fault injection can be enabled using debugfs or a module parameter.

Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129125636.1047413-2-jacek.lawrynowicz@linux.intel.com
2025-02-03 10:39:45 +01:00
Karol Wachowski dd4f78ec6a accel/ivpu: Add platform detection for presilicon
Use highest buttress VPU_STATUS register bits(15:13) that encode
platform type as follows:
	0 - Silicon
	2 - Simics
	3 - FPGA
	4 - Hybrid SLE

Remove old DMI based method.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-14-maciej.falkowski@linux.intel.com
2025-01-09 09:35:45 +01:00
Maciej Falkowski bc3e5f48b7 accel/ivpu: Use workqueue for IRQ handling
Convert IRQ bottom half from the thread handler into workqueue.
This increases a stability in rare scenarios where driver on
debugging/hardening kernels processes IRQ too slow and misses
some interrupts due to it.
Workqueue handler also gives a very minor performance increase.

Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-6-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Dave Airlie 30169bb645 Backmerge v6.12-rc6 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Backmerge Linus tree for some drm-fixes needed for msm and xe merges.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-04 14:25:33 +10:00
Karol Wachowski 83b6fa5844 accel/ivpu: Increase DMA address range
Increase DMA address range to:
 * 128 GB on 37xx (due to MMU limitations)
 * 256 GB on other generations
Merge User and DMA ranges on 40xx and above as it is possible
to access whole 256 GBs from both FW and DMA.

Increase User range on 37xx from 255MB to 511MB
to allow loading very large models.

Do not set global_alias_pio_base/size on other generations than 37xx
as it's only used on 37xx anyway.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-11-jacek.lawrynowicz@linux.intel.com
2024-10-30 10:23:57 +01:00
Andrzej Kacprowski 72f7e16ecc accel/ivpu: Fix NOC firewall interrupt handling
The NOC firewall interrupt means that the HW prevented
unauthorized access to a protected resource, so there
is no need to trigger device reset in such case.

To facilitate security testing add firewall_irq_counter
debugfs file that tracks firewall interrupts.

Fixes: 8a27ad81f7 ("accel/ivpu: Split IP and buttress code")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017144958.79327-1-jacek.lawrynowicz@linux.intel.com
2024-10-30 10:17:00 +01:00
Karol Wachowski 03b3b6657d accel/ivpu: Turn on autosuspend on Simics
With recent Simics update DVFS flows using cdyn were fixed
and it is possible to enable D0i3/D3 entry flows on autosuspend.
Set autosuspend timeout to 100 ms by default on Simics.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-11-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:38 +02:00
Tomasz Rusinowicz 5e162f872d accel/ivpu: Add FW state dump on TDR
Send JSM state dump message at the beginning of TDR handler. This allows
FW to collect debug info in the FW log before the state of the NPU is
lost allowing to analyze the cause of a TDR.

Wait a predefined timeout (10 ms) so the FW has a chance to write debug
logs. We cannot wait for JSM response at this point because IRQs are
already disabled before TDR handler is invoked.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-9-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
2024-10-11 12:44:38 +02:00
Andrzej Kacprowski cf1d06ac53 accel/ivpu: Increase autosuspend delay to 100ms on 40xx
The new HW is more power efficient and there is no
need to enter the D0i3/D3 so quickly. Increasing
autosuspend delay reduces latency in certain usage
scenarios.

Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-14-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:15:08 +02:00
Maciej Falkowski 3f440e0b48 accel/ivpu: Add test mode flag for disabling timeouts
Add new test mode flag that will disable all timeouts
defined in timeout fields of struct ivpu_device.
Remove also reschedule_suspend field as it is unused.

Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-11-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:14:36 +02:00
Wachowski, Karol 52ab5be191 accel/ivpu: Disable disable_clock_relinquish WA for LNL B0+
This WA is only needed for LNL revision A.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-5-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:12:58 +02:00
Wachowski, Karol d9dfc4eaa3 accel/ivpu: Add wp0_during_power_up WA
Send workpoint 0 request during power up on 37xx.
This is needed in rare case where WP0 was not sent
during power down due to device hang.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-2-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:09:36 +02:00
Jacek Lawrynowicz 2f7ffb06d6 accel/ivpu: Replace wake_thread with kfifo
Use kfifo to pass IRQ sources to IRQ thread so it will be possible to
use IRQ thread by multiple IRQ types.

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240515113006.457472-4-jacek.lawrynowicz@linux.intel.com
2024-05-17 08:30:27 +02:00
Wachowski, Karol 8a27ad81f7 accel/ivpu: Split IP and buttress code
The NPU device consists of two parts: NPU buttress and NPU IP.
Buttress is a platform specific part that integrates the NPU IP with
the CPU.
NPU IP is the platform agnostic part that does the inference.

This separation enables support for multiple platforms using
a single NPU IP, so for example NPU IP 37XX could be integrated into
MTL and LNL platforms.

Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240515113006.457472-3-jacek.lawrynowicz@linux.intel.com
2024-05-17 08:30:24 +02:00