Commit Graph

527 Commits

Author SHA1 Message Date
Linus Torvalds 43dfc13ca9 pci-v6.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmkwoyoUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vztdBAAjvZO0NOafCbhn6lUAz/T4VxPY0R7
 L5RqeLci33rQzbQ0yhJYXsd8VemMo6Zk0qmlSwjddlOMPboHoC1jK3i4C16QoP+3
 R5ecab0VoImLl2Ffig0BZoHQpVq01q2kTGQ2YyrryzDCgBCsBG3U10ZD380pGsTW
 ypqEgOCxaQCq2mtqr5CavaCcquq2krrnHkkVQOP1ryWzRq1C3wDXcQXFYNdzXpDP
 Lq8pBIh8WN5pYwrqjrFMrtNhj7BHPmowLEaAbNIWmH8WjGav624XcKq2O+arx3Hl
 BDHFKVjWtiYikrWmAODZhlY3HgEj546h2HtQYwWPOKuSKkgzNVn28LFIDgL3oXDP
 sJ6gWIDWMgRpEI6VzmqzRXJWbTAkIrRfHv3QFzvATSZV7eHk3eUpMZtG/qOi5z3P
 rPwW7NSXKbPg5qi+zKpqC20Im1Wm6fF4qQtim3uNlz2KXlVGvQl5/Ww27nIZMn6B
 Kkv5RGzdodePbKGqd+CADAQoJOpR6kOng5ZaRdZx+6aTTooyM7KSk8bFq7j/CoYM
 PzVtO6IHIvV42di7H/NP8/qtQA3xwSLue3lWpTh+tzYmZvCdldZiIvBo9YXnlsEM
 kCetDIGBxUWBj7OdYK4hC1BPBVw3XtCWM/51ElslDZTPvT543lApBfMnVTv3JmFp
 gzvZ7XfZx443JOg=
 =qSzr
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan
     Williams)

   - Switch vmd from custom domain number allocator to the common
     allocator to prevent a potential race with new non-VMD buses (Dan
     Williams)

   - Enable Precision Time Measurement (PTM) only if device advertises
     support for a relevant role, to prevent invalid PTM Requests that
     cause ACS violations that are reported as AER Uncorrectable
     Non-Fatal errors (Mika Westerberg)

  Resource management:

   - Prevent resource tree corruption when BAR resize fails (Ilpo
     Järvinen)

   - Restore BARs to the original size if a BAR resize fails (Ilpo
     Järvinen)

   - Remove BAR release from BAR resize attempts by the xe, i915, and
     amdgpu drivers so the PCI core can restore BARs if the resize fails
     (Ilpo Järvinen)

   - Move Resizable BAR code to rebar.c (Ilpo Järvinen)

   - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo
     Järvinen)

   - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo
     Järvinen)

  Power management and error handling:

   - For drivers using PCI legacy suspend, save config state at suspend
     so that state (not any earlier state from enumeration, probe, or
     error recovery) will be restored when resuming (Lukas Wunner)

   - For devices with no driver or a driver that lacks power management,
     save config state at hibernate so that state (not any earlier state
     from enumeration, probe, or error recovery) will be restored when
     resuming (Lukas Wunner)

   - Save device config space on device addition, before driver binding,
     so error recovery works more reliably (Lukas Wunner)

   - Drop pci_save_state() from several drivers that no longer need it
     since the PCI core always does it and pci_restore_state() no longer
     invalidates the saved state (Lukas Wunner)

   - Document use of pci_save_state() by drivers to capture the state
     they want restored during error recovery (Lukas Wunner)

  Power control:

   - Add a struct pci_ops.assert_perst() function pointer to
     assert/deassert PCIe PERST# and implement it for the qcom driver
     (Krishna Chaitanya Chundru)

   - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe
     switch, which must be held in reset after poweron so the pwrctrl
     driver can configure the switch via I2C before bringing up the
     links (Krishna Chaitanya Chundru)

  Endpoint framework:

   - Convert the endpoint doorbell test to use a threaded IRQ to fix a
     'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri)

   - Add endpoint VNTB MSI doorbell support to reduce latency between
     host and endpoint (Frank Li)

  New native PCIe controller drivers:

   - Add CIX Sky1 host controller DT binding and driver (Hans Zhang)

   - Add NXP S32G host controller DT binding and driver (Vincent
     Guittot)

   - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu
     Beznea)

   - Add SpacemiT K1 host controller DT binding and driver (Alex Elder)

  Amlogic Meson PCIe controller driver:

   - Update DT binding to name DBI region 'dbi', not 'elbi', and update
     driver to support both (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Move struct pci_host_bridge allocation from pci_host_common_init()
     to callers, which significantly simplifies pcie-apple (Marc
     Zyngier)

  Broadcom STB PCIe controller driver:

   - Disable advertising ASPM L0s support correctly (Jim Quinlan)

   - Add a panic/die handler to print diagnostic info in case PCIe
     caused an unrecoverable abort (Jim Quinlan)

  Cadence PCIe controller driver:

   - Add module support for Cadence platform host and endpoint
     controller driver (Manikandan K Pillai)

   - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare
     for new CIX Sky1 driver (Manikandan K Pillai)

  MediaTek PCIe controller driver:

   - Convert DT binding to YAML schema (Christian Marangi)

   - Add Airoha AN7583 DT compatible and driver support (Christian
     Marangi)

  Qualcomm PCIe controller driver:

   - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu)

   - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280,
     sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT
     schemas (Krzysztof Kozlowski)

   - Look up OPP using both frequency and data rate (not just frequency)
     so RPMh votes can account for both (Krishna Chaitanya Chundru)

  Rockchip DesignWare PCIe controller driver:

   - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Fix a race between link training and endpoint register
     initialization (Christian Bruel)

   - Align endpoint allocations to match the ATU requirements (Christian
     Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Clear L1 PM Substate Capability 'Supported' bits unless glue driver
     says it's supported, which prevents users from enabling non-working
     L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas)

   - Remove now-superfluous L1SS disable code from tegra194 (Bjorn
     Helgaas)

   - Configure L1SS support in dw-rockchip when DT says
     'supports-clkreq' (Shawn Lin)

  TI Keystone PCIe controller driver:

   - Fail the probe instead of silently succeeding if ks_pcie_of_data
     didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli)

   - Make keystone buildable as a loadable module, except on ARM32 where
     hook_fault_code() is __init (Siddharth Vadapalli)"

* tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits)
  MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
  MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer
  PCI: sky1: Add PCIe host support for CIX Sky1
  dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings
  PCI: cadence: Add support for High Perf Architecture (HPA) controller
  MAINTAINERS: Add NXP S32G PCIe controller driver maintainer
  PCI: s32g: Add NXP S32G PCIe controller driver (RC)
  PCI: dwc: Add register and bitfield definitions
  dt-bindings: PCI: s32g: Add NXP S32G PCIe controller
  PCI: Add Renesas RZ/G3S host controller driver
  PCI: host-generic: Move bridge allocation outside of pci_host_common_init()
  dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding
  PCI: Validate pci_rebar_size_supported() input
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
  PCI: dw-rockchip: Configure L1SS support
  PCI: tegra194: Remove unnecessary L1SS disable code
  ...
2025-12-04 17:29:41 -08:00
Ilpo Järvinen a337869885 PCI: Move pci_rebar_size_to_bytes() and export it
pci_rebar_size_to_bytes() is in drivers/pci/pci.h but would be useful for
endpoint drivers as well.

Move the function to rebar.c and export it.

In addition, convert the literal to where the number comes from
(PCI_REBAR_MIN_SIZE).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113180053.27944-4-ilpo.jarvinen@linux.intel.com
2025-11-14 12:34:21 -06:00
Ilpo Järvinen 9f71938cd7 PCI: Move Resizable BAR code to rebar.c
For lack of a better place to put it, Resizable BAR code has been placed
inside pci.c and setup-res.c that do not use it for anything.  Upcoming
changes are going to add more Resizable BAR related functions, increasing
the code size.

As pci.c is huge as is, move the Resizable BAR related code and the BAR
resize code from setup-res.c to rebar.c.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113180053.27944-2-ilpo.jarvinen@linux.intel.com
2025-11-14 12:34:21 -06:00
Ilpo Järvinen 337b1b566d PCI: Fix restoring BARs on BAR resize rollback path
BAR resize operation is implemented in the pci_resize_resource() and
pbus_reassign_bridge_resources() functions. pci_resize_resource() can be
called either from __resource_resize_store() from sysfs or directly by the
driver for the Endpoint Device.

The pci_resize_resource() requires that caller has released the device
resources that share the bridge window with the BAR to be resized as
otherwise the bridge window is pinned in place and cannot be changed.

pbus_reassign_bridge_resources() rolls back resources if the resize
operation fails, but rollback is performed only for the bridge windows.
Because releasing the device resources are done by the caller of the BAR
resize interface, these functions performing the BAR resize do not have
access to the device resources as they were before the resize.

pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources()
after rolling back the bridge windows as they were, however, it will not
guarantee the resource are assigned due to differences in how FW and the
kernel assign the resources (alignment of the start address and tail).

To perform rollback robustly, the BAR resize interface has to be altered to
also release the device resources that share the bridge window with the BAR
to be resized.

Also, remove restoring from the entries failed list as saved list should
now contain both the bridge windows and device resources so the extra
restore is duplicated work.

Some drivers (currently only amdgpu) want to prevent releasing some
resources. Add exclude_bars param to pci_resize_resource() and make amdgpu
pass its register BAR (BAR 2 or 5), which should never be released during
resize operation. Normally 64-bit prefetchable resources do not share a
bridge window with the 32-bit only register BAR, but there are various
fallbacks in the resource assignment logic which may make the resources
share the bridge window in rare cases.

This change (together with the driver side changes) is to counter the
resource releases that had to be done to prevent resource tree corruption
in the ("PCI: Release assigned resource before restoring them") change. As
such, it likely restores functionality in cases where device resources were
released to avoid resource tree conflicts which appeared to be "working"
when such conflicts were not correctly detected by the kernel.

Reported-by: Simon Richter <Simon.Richter@hogyros.de>
Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash amdgpu BAR selection from
https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com
2025-11-14 12:33:14 -06:00
Ilpo Järvinen 4687b3315a PCI/IOV: Adjust ->barsz[] when changing BAR size
pci_rebar_set_size() adjusts BAR size for both normal and IOV BARs. The
struct pci_sriov keeps a cached copy of BAR size in ->barsz[] which is not
adjusted by pci_rebar_set_size() but by pci_iov_resource_set_size().
pci_iov_resource_set_size() is called also from
pci_resize_resource_set_size().

The current arrangement is problematic once BAR resize algorithm starts to
roll back changes properly in case of a failure. The normal resource
fitting algorithm rolls back resource size using the struct
pci_dev_resource easily but also calling pci_resize_resource_set_size() or
pci_iov_resource_set_size() to roll back BAR size would be an extra burden,
whereas combining ->barsz[] update with pci_rebar_set_size() naturally
rolls back it when restoring the old BAR size on a different layer of the
BAR resize operation.

Thus, rework pci_rebar_set_size() to also update ->barsz[].

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU
Link: https://patch.msgid.link/20251113162628.5946-3-ilpo.jarvinen@linux.intel.com
2025-11-14 12:32:47 -06:00
Bjorn Helgaas 575b98e39d PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
Add pcie_aspm_remove_cap().  A quirk can use this to prevent use of ASPM
L0s or L1 link states, even if the device advertised support for them.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-3-helgaas@kernel.org
2025-11-12 18:51:27 -06:00
Bjorn Helgaas fef3530379 Merge branch 'pci/capability-search'
- Simplify __pci_find_next_cap_ttl() by replacing magic numbers with
  #defines, extracting fields with FIELD_GET(), etc (Hans Zhang)

- Convert __pci_find_next_cap_ttl() to a PCI_FIND_NEXT_CAP() macro that
  takes a config space accessor function so we can also use it in cases
  where the usual config accessors aren't available (Hans Zhang)

- Similarly convert pci_find_next_ext_capability() to a
  PCI_FIND_NEXT_EXT_CAP() macro (Hans Zhang)

- Implement dwc, dwc endpoint, and cadence capability search interfaces on
  top of PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP(), replacing the
  previous duplicated code (Hans Zhang)

- Search for capabilities in the cadence core instead of hard-coding their
  offsets, which are subject to change (Hans Zhang)

* pci/capability-search:
  PCI: cadence: Use cdns_pcie_find_*capability() to avoid hardcoding offsets
  PCI: cadence: Implement capability search using PCI core APIs
  PCI: dwc: ep: Implement capability search using PCI core APIs
  PCI: dwc: Implement capability search using PCI core APIs
  PCI: Refactor extended capability search into PCI_FIND_NEXT_EXT_CAP()
  PCI: Refactor capability search into PCI_FIND_NEXT_CAP()
  PCI: Clean up __pci_find_next_cap_ttl() readability
2025-10-03 12:13:14 -05:00
Bjorn Helgaas 3d56c86318 Merge branch 'pci/virtualization'
- Add rescan/remove locking when enabling/disabling SR-IOV, which solves
  list corruption on s390, where disabling SR-IOV also generates hotplug
  events (Niklas Schnelle)

- Add lockdep assertion in pci_stop_and_remove_bus_device() to catch
  device removal without appropriate locking (Niklas Schnelle)

* pci/virtualization:
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
2025-10-03 12:13:12 -05:00
Bjorn Helgaas fead6a0b15 Merge branch 'pci/resource'
- Ensure relaxed tail alignment does not increase min_align when computing
  bridge window size, to fix a regression (Ilpo Järvinen)

- Fix bridge window size computation to fix a regression for devices with
  undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)

- Fix error handling during resource resize to fix a regression in amdgpu
  (Ilpo Järvinen)

- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)

- Remove several sparc pcibios_enable_device() implementations that don't
  do anything beyond what pci_enable_resources() does (Ilpo Järvinen)

- Remove mips pcibios_enable_resources() and use pci_enable_resources()
  instead (Ilpo Järvinen)

- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)

- Claim bridge windows before setting them up (Ilpo Järvinen)

- Disable non-claimed bridge windows so the kernel's view matches the
  hardware configuration (Ilpo Järvinen)

- Use pci_release_resource() instead of release_resource() to reduce code
  duplication and increase consistency (Ilpo Järvinen)

- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)

- Preserve bridge window resource type flags when assignment fails because
  we may need it later (Ilpo Järvinen)

- Add bridge window selection functions to make the selection consistent
  across the several places that do this (Ilpo Järvinen)

- Warn if bridge window cannot be released when resizing BAR (Ilpo
  Järvinen)

- Set up bridge resources before enumerating children so we can check
  whether child resources are inside bridge windows (Ilpo Järvinen)

* pci/resource:
  PCI: Set up bridge resources earlier
  PCI: Don't print stale information about resource
  PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
  PCI: Pass bridge window to pci_bus_release_bridge_resources()
  PCI: Add pci_setup_one_bridge_window()
  PCI: Refactor remove_dev_resources() to use pbus_select_window()
  PCI: Refactor distributing available memory to use loops
  PCI: Use pbus_select_window_for_type() during mem window sizing
  PCI: Use pbus_select_window() in space available checker
  PCI: Rename resource variable from r to res
  PCI: Use pbus_select_window_for_type() during IO window sizing
  PCI: Use pbus_select_window() during BAR resize
  PCI: Warn if bridge window cannot be released when resizing BAR
  PCI: Fix finding bridge window in pci_reassign_bridge_resources()
  PCI: Add bridge window selection functions
  PCI: Add defines for bridge window indexing
  PCI: Preserve bridge window resource type flags
  PCI: Enable bridge even if bridge window fails to assign
  PCI: Use pci_release_resource() instead of release_resource()
  PCI: Disable non-claimed bridge window
  PCI: Always claim bridge window before its setup
  PCI: Refactor find_bus_resource_of_type() logic checks
  PCI: Move find_bus_resource_of_type() earlier
  MIPS: PCI: Use pci_enable_resources()
  sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
  m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
  PCI: Fix failure detection during resource resize
  PCI: Fix pdev_resources_assignable() disparity
  PCI: Ensure relaxed tail alignment does not increase min_align
2025-10-03 12:13:12 -05:00
Niklas Schnelle 60e7b5aa85 PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted by
this being missed in sriov_disable() and going unnoticed since its
inception, add a lockdep assert so this doesn't get missed again in the
future.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-2-2d0bc938f2a3@linux.ibm.com
2025-09-26 16:01:17 -05:00
Ilpo Järvinen 3ab10f83e2 PCI: Fix finding bridge window in pci_reassign_bridge_resources()
pci_reassign_bridge_resources() walks upwards in the PCI bus hierarchy,
locates the relevant bridge window on each level using flags check, and
attempts to release the bridge window. The flags-based check is fragile due
to various fallbacks in the bridge window selection logic. As such, the
algorithm might not locate the correct bridge window.

Refactor pci_reassign_bridge_resources() to determine the correct bridge
window using pbus_select_window(), which contains logic to handle all
fallback cases correctly. Change function prefix to pbus as it now inputs
struct bus and resource for which to locate the bridge window.

The main purpose is to make bridge window selection logic consistent across
the entire PCI core (one step at a time). While this technically also fixes
the commit 8bb705e3e7 ("PCI: Add pci_resize_resource() for resizing
BARs") making the bridge window walk algorithm more robust, the normal
setup having a 64-bit resizable BAR underneath bridge(s) with 64-bit
prefetchable windows does not need to use any fallbacks. As such, the
practical impact is low (requiring BAR resize use case and a non-typical
bridge device).

The way to detect if unrelated resource failed again is left to use the
type based approximation which should not behave worse than before.

Fixes: 8bb705e3e7 ("PCI: Add pci_resize_resource() for resizing BARs")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-14-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:31 -05:00
Ilpo Järvinen 74afce3dfc PCI: Add bridge window selection functions
Various places in the PCI core code independently decide into which bridge
window a child resource should be placed. It is hard to see whether these
decisions always end up in agreement, especially in the corner cases, and
in some places it requires complex logic to pass multiple resource types
and/or bridge windows around.

Add pbus_select_window() and pbus_select_window_for_type() for cases where
the former cannot be used so that eventually the same helper can be used to
select the bridge window everywhere. Using the same function ensures the
selected bridge window remains always the same and it can be easily
recalculated in-situ allowing simplifying the interfaces between internal
functions in upcoming changes.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-13-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:28 -05:00
Ilpo Järvinen e4934832c5 PCI: Add defines for bridge window indexing
include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines,
however, they're based on the resource array indexing in the pci_dev
struct. The struct pci_bus also has pointers to those same resources but
they start from zeroth index.

Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal
indexing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-12-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:24 -05:00
Hans Zhang 4d909bf1a5 PCI: Refactor extended capability search into PCI_FIND_NEXT_EXT_CAP()
Move the extended Capability search logic into a PCI_FIND_NEXT_EXT_CAP()
macro that accepts a config space accessor function as an argument. This
enables controller drivers to perform Capability discovery using their
early access mechanisms prior to full device initialization while sharing
the Capability search code.

Convert the existing PCI core extended Capability search implementation to
use PCI_FIND_NEXT_EXT_CAP().

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://patch.msgid.link/20250813144529.303548-4-18255117159@163.com
2025-08-14 15:03:45 -05:00
Hans Zhang 3c02084d8f PCI: Refactor capability search into PCI_FIND_NEXT_CAP()
The PCI Capability search functionality is duplicated across the PCI core
and several controller drivers. The core's current implementation requires
fully initialized PCI device and bus structures, which prevents controller
drivers from using it during early initialization phases before these
structures are available.

Move the Capability search logic into a PCI_FIND_NEXT_CAP() macro that
accepts a config space accessor function as an argument. This enables
controller drivers to perform Capability discovery using their early
access mechanisms prior to full device initialization while sharing the
Capability search code.

Convert the existing PCI core Capability search implementation to use
PCI_FIND_NEXT_CAP().  Controller drivers can later use this with their
early access mechanisms while maintaining the existing protection against
infinite loops through preserved TTL checks.

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://patch.msgid.link/20250813144529.303548-3-18255117159@163.com
2025-08-14 15:03:40 -05:00
Ilpo Järvinen c763fae8c4 PCI: Clean up pci_scan_child_bus_extend() loop
pci_scan_child_bus_extend() open-codes device number iteration in the for
loop. Convert to use PCI_DEVFN() and add PCI_MAX_NR_DEVS (there seems to be
no pre-existing define for this purpose).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-3-ilpo.jarvinen@linux.intel.com
2025-08-11 15:00:51 -05:00
Bjorn Helgaas 480b315376 Merge branch 'pci/controller/linkup-fix'
- Rename PCIE_RESET_CONFIG_DEVICE_WAIT_MS to PCIE_RESET_CONFIG_WAIT_MS (the
  required delay before sending config requests after a reset) (Niklas
  Cassel)

- PCIE_T_RRS_READY_MS and PCIE_RESET_CONFIG_WAIT_MS were two names for the
  same delay; replace PCIE_T_RRS_READY_MS with PCIE_RESET_CONFIG_WAIT_MS
  and remove PCIE_T_RRS_READY_MS (Niklas Cassel)

- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ to
  dw-rockchip, qcom (Niklas Cassel)

- Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
  Ports that support > 5.0 GT/s in dwc core (Niklas Cassel)

- Move LINK_WAIT_SLEEP_MS and LINK_WAIT_MAX_RETRIES to pci.h and prefix
  with 'PCIE_' for potential sharing across drivers (Niklas Cassel)

* pci/controller/linkup-fix:
  PCI: Move link up wait time and max retries macros to pci.h
  PCI: dwc: Ensure that dw_pcie_wait_for_link() waits 100 ms after link up
  PCI: qcom: Wait PCIE_RESET_CONFIG_WAIT_MS after link-up IRQ
  PCI: dw-rockchip: Wait PCIE_RESET_CONFIG_WAIT_MS after link-up IRQ
  PCI: rockchip-host: Use macro PCIE_RESET_CONFIG_WAIT_MS
  PCI: Rename PCIE_RESET_CONFIG_DEVICE_WAIT_MS to PCIE_RESET_CONFIG_WAIT_MS
2025-07-31 16:11:47 -05:00
Bjorn Helgaas 29ccb7b97e Merge branch 'pci/resources'
- Restore VF resizable BAR state after reset (Michał Winiarski)

- Add pci_resource_num_to_vf_bar() and pci_resource_num_from_vf_bar() to
  convert between VF BAR number and the dev->resource[] index (Michał
  Winiarski)

- Allow IOV resources (VF BARs) to be resized (Michał Winiarski)

- Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size (Michał
  Winiarski)

* pci/resources:
  PCI/IOV: Allow drivers to control VF BAR size
  PCI/IOV: Check that VF BAR fits within the reservation
  PCI/IOV: Allow IOV resources to be resized in pci_resize_resource()
  PCI/IOV: Add pci_resource_num_to_vf_bar() to convert VF BAR number to/from IOV resource
  PCI/IOV: Restore VF resizable BAR state after reset
2025-07-31 16:11:43 -05:00
Michał Winiarski e200f4f7ea PCI/IOV: Allow IOV resources to be resized in pci_resize_resource()
Similar to regular resizable BARs, VF BARs can also be resized.

The capability layout is the same as PCI_EXT_CAP_ID_REBAR, which means we
can reuse most of the implementation, the only difference being resource
size calculation (which is multiplied by total VFs) and memory decoding
(which is controlled by a separate VF MSE field in SR-IOV cap).

Extend the pci_resize_resource() function to accept IOV resources.

See PCIe r6.2, sec 7.8.7.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250702093522.518099-4-michal.winiarski@intel.com
2025-07-14 15:04:18 -05:00
Michał Winiarski 535bdbeaac PCI/IOV: Add pci_resource_num_to_vf_bar() to convert VF BAR number to/from IOV resource
There are multiple places where conversions between IOV resources and
corresponding VF BAR numbers are done.

Extract the logic to pci_resource_num_from_vf_bar() and
pci_resource_num_to_vf_bar() helpers.

Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20250702093522.518099-3-michal.winiarski@intel.com
2025-07-14 15:03:23 -05:00
Michał Winiarski 5a8f77e24a PCI/IOV: Restore VF resizable BAR state after reset
Similar to regular resizable BARs, VF BARs can also be resized, e.g. by the
system firmware or the PCI subsystem itself.

The capability layout is the same as PCI_EXT_CAP_ID_REBAR.

Add the capability ID and restore it as a part of IOV state.

See PCIe r6.2, sec 7.8.7.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patch.msgid.link/20250702093522.518099-2-michal.winiarski@intel.com
2025-07-14 14:58:13 -05:00
Niklas Cassel d7467bc72c PCI: Move link up wait time and max retries macros to pci.h
Move the LINK_WAIT_SLEEP_MS and LINK_WAIT_MAX_RETRIES macros to pci.h.
Prefix the macros with PCIE_ in order to avoid redefining these for
drivers that already have macros named like this.

No functional changes.

Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250625102347.1205584-15-cassel@kernel.org
2025-06-25 07:26:04 -06:00
Niklas Cassel bbc6a829ad PCI: rockchip-host: Use macro PCIE_RESET_CONFIG_WAIT_MS
Macro PCIE_RESET_CONFIG_DEVICE_WAIT_MS was added to pci.h in commit
d5ceb9496c ("PCI: Add PCIE_RESET_CONFIG_DEVICE_WAIT_MS waiting time
value").

Later, in commit 70a7bfb1e5 ("PCI: rockchip-host: Wait 100ms after reset
before starting configuration"), PCIE_T_RRS_READY_MS was added to pci.h.

These macros are duplicates, and represent the exact same delay in the
PCIe specification.

Since the comment above PCIE_RESET_CONFIG_WAIT_MS is strictly more correct
than the comment above PCIE_T_RRS_READY_MS, change rockchip-host to use
PCIE_RESET_CONFIG_WAIT_MS, and remove PCIE_T_RRS_READY_MS, as
rockchip-host is the only user of this macro.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Link: https://patch.msgid.link/20250625102347.1205584-11-cassel@kernel.org
2025-06-25 07:25:07 -06:00
Niklas Cassel 817f989700 PCI: Rename PCIE_RESET_CONFIG_DEVICE_WAIT_MS to PCIE_RESET_CONFIG_WAIT_MS
Rename PCIE_RESET_CONFIG_DEVICE_WAIT_MS to PCIE_RESET_CONFIG_WAIT_MS.

Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250625102347.1205584-10-cassel@kernel.org
2025-06-25 07:25:00 -06:00
Jiwei Sun 9989e0ca74 PCI: Fix link speed calculation on retrain failure
When pcie_failed_link_retrain() fails to retrain, it tries to revert to the
previous link speed.  However it calculates that speed from the Link
Control 2 register without masking out non-speed bits first.

PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to
PCI_SPEED_UNKNOWN (0xff), which in turn causes a WARN splat in
pcie_set_target_speed():

  pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port
  pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
  pci 0000:00:01.1: retraining failed
  WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed
  RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000
  pcie_failed_link_retrain
  pci_device_add
  pci_scan_single_device

Mask out the non-speed bits in PCIE_LNKCTL2_TLS2SPEED() and
PCIE_LNKCAP_SLS2SPEED() so they don't incorrectly return PCI_SPEED_UNKNOWN.

Fixes: de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Reported-by: Andrew <andreasx0@protonmail.com>
Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Jiwei Sun <sunjw10@lenovo.com>
[bhelgaas: commit log, add details from https://lore.kernel.org/r/1c92ef6bcb314ee6977839b46b393282e4f52e74.1750684771.git.lukas@wunner.de]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org	# v6.13+
Link: https://patch.msgid.link/20250123055155.22648-2-sjiwei@163.com
2025-06-24 15:48:35 -05:00
Linus Torvalds 3719a04a80 pci-v6.16-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmhAa9EUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyA3w//aX8d73z/xVxkYLMN/6XQA5fdmd4d
 Dv4n0Pjf0WCMKbsgRCdXEYLvcHV8VhH5iCR/b2UsFm9LjxSIRuqE5XosY3bNhrHn
 xVKEh2prq2XZOibWrFkJ+RZ0FF7Ogq1Uy5gUBbBHbE1q1byZzrOALaF3FWGaDIZQ
 6QLLAFtd3UtqOOUu8J8P9N15uFR8gunyfuM9U7TLMcy4B8txk6T6m/9xAWtRURuJ
 I6WN8lO+g8Nl2mL9m27+wyWiVT3tKqoMwp8rVtym/L5JQOmHycYhn0WQAr2dPCMs
 Xbgmoeei0je7mZvk5btpt68NAKQ3ZnCVkxbbINBkUxAjI0dbI6h37EhW18ShYVUk
 CCo4fmaFtwP8qNN9tSvDN8vZdGB44fN5tIz4lmGzKk5gt+oV50RC/APrzC+PJBQ0
 +2SdDVKj71Gr2H1VnI6uLB7oQ+tp7TOdhg+DGV4bdc6QFnsM+BpKWRq5f1UQcau/
 XVDmorM/2t6z0DNktAv3NFwSodUjk1loWESr/pRBH1AqAWZTK98PWIg97XYsal59
 zbJ3dLrnCqUNozeVgjtZo1LWD2FZaVTvhq2NY7D+QPpnMGhFUhHxNliZUXiQa1q4
 boI2hEFdu3IQP/OC2a1zGJyMRLU43d5rhZ1U5xQSVtM0c3lgCY7rn/t26LymQVPA
 SYdg2jBcnhe6gXo=
 =eWJw
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Print the actual delay time in pci_bridge_wait_for_secondary_bus()
     instead of assuming it was 1000ms (Wilfred Mallawa)

   - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
     devices', which broke resume from system sleep on AMD platforms and
     has been fixed by other commits (Lukas Wunner)

  Resource management:

   - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated
     and unnecessary (Philipp Stanner)

   - Remove pcim_iounmap_regions() and pcim_request_region_exclusive()
     and related flags since all uses have been removed (Philipp
     Stanner)

   - Rework devres 'request' functions so they are no longer 'hybrid',
     i.e., their behavior no longer depends on whether
     pcim_enable_device or pci_enable_device() was used, and remove
     related code (Philipp Stanner)

   - Warn (not BUG()) about failure to assign optional resources (Ilpo
     Järvinen)

  Error handling:

   - Log the DPC Error Source ID only when it's actually valid (when
     ERR_FATAL or ERR_NONFATAL was received from a downstream device)
     and decode into bus/device/function (Bjorn Helgaas)

   - Determine AER log level once and save it so all related messages
     use the same level (Karolina Stolarek)

   - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable
     Errors (Karolina Stolarek)

   - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs
     controls on interval and burst count, to avoid flooding logs and
     RCU stall warnings (Jon Pan-Doh)

  Power management:

   - Increment PM usage counter when probing reset methods so we don't
     try to read config space of a powered-off device (Alex Williamson)

   - Set all devices to D0 during enumeration to ensure ACPI opregion is
     connected via _REG (Mario Limonciello)

  Power control:

   - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match
     the filename paths. Retain old deprecated symbols for
     compatibility, except for the pwrctrl slot driver
     (PCI_PWRCTRL_SLOT) (Johan Hovold)

   - When unregistering pwrctrl, cancel outstanding rescan work before
     cleaning up data structures to avoid use-after-free issues (Brian
     Norris)

  Bandwidth control:

   - Simplify link bandwidth controller by replacing the count of Link
     Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN
     flag (Ilpo Järvinen)

   - Update the Link Speed after retraining, since the Link Speed may
     have changed (Ilpo Järvinen)

  PCIe native device hotplug:

   - Ignore Presence Detect Changed caused by DPC.

     pciehp already ignores Link Down/Up events caused by DPC, but on
     slots using in-band presence detect, DPC causes a spurious Presence
     Detect Changed event (Lukas Wunner)

   - Ignore Link Down/Up caused by Secondary Bus Reset.

     On hotplug ports using in-band presence detect, the reset causes a
     Presence Detect Changed event, which mistakenly caused teardown and
     re-enumeration of the device. Drivers may need to annotate code
     that resets their device (Lukas Wunner)

  Virtualization:

   - Add an ACS quirk for Loongson Root Ports that don't advertise ACS
     but don't allow peer-to-peer transactions between Root Ports; the
     quirk allows each Root Port to be in a separate IOMMU group (Huacai
     Chen)

  Endpoint framework:

   - For fixed-size BARs, retain both the actual size and the possibly
     larger size allocated to accommodate iATU alignment requirements
     (Jerome Brunet)

   - Simplify ctrl/SPAD space allocation and avoid allocating more space
     than needed (Jerome Brunet)

   - Correct MSI-X PBA offset calculations for DesignWare and Cadence
     endpoint controllers (Niklas Cassel)

   - Align the return value (number of interrupts) encoding for
     pci_epc_get_msi()/pci_epc_ops::get_msi() and
     pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)

   - Align the nr_irqs parameter encoding for
     pci_epc_set_msi()/pci_epc_ops::set_msi() and
     pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)

  Common host controller library:

   - Convert pci-host-common to a library so platforms that don't need
     native host controller drivers don't need to include these helper
     functions (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Extract ECAM bridge creation helper from pci_host_common_probe() to
     separate driver-specific things like MSI from PCI things (Marc
     Zyngier)

   - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with
     varying capabilities (Marc Zyngier)

   - Skip ports disabled in DT when setting up ports (Janne Grunau)

   - Add t6020 compatible string (Alyssa Rosenzweig)

   - Add T602x PCIe support (Hector Martin)

   - Directly set/clear INTx mask bits because T602x dropped the
     accessors that could do this without locking (Marc Zyngier)

   - Move port PHY registers to their own reg items to accommodate
     T602x, which moves them around; retain default offsets for existing
     DTs that lack phy%d entries with the reg offsets (Hector Martin)

   - Stop polling for core refclk, which doesn't work on T602x and the
     bootloader has already done anyway (Hector Martin)

   - Use gpiod_set_value_cansleep() when asserting PERST# in probe
     because we're allowed to sleep there (Hector Martin)

  Cadence PCIe controller driver:

   - Drop a runtime PM 'put' to resolve a runtime atomic count underflow
     (Hans Zhang)

   - Make the cadence core buildable as a module (Kishon Vijay Abraham I)

   - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by
     loadable drivers when they are removed (Siddharth Vadapalli)

  Freescale i.MX6 PCIe controller driver:

   - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP
     (Richard Zhu)

   - Remove redundant dw_pcie_wait_for_link() from
     imx_pcie_start_link(); since the DWC core does this, imx6 only
     needs it when retraining for a faster link speed (Richard Zhu)

   - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)

   - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in
     some cases, the controller can't exit 'L23 Ready' through Beacon or
     PERST# deassertion (Richard Zhu)

   - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum:
     controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8
     GT/s, causing timeouts in L1 (Richard Zhu)

   - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)

   - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)

  Mobiveil PCIe controller driver:

   - Return bool (not int) for link-up check in
     mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans
     Zhang)

  NVIDIA Tegra194 PCIe controller driver:

   - Create debugfs directory for 'aspm_state_cnt' only when
     CONFIG_PCIEASPM is enabled, since there are no other entries (Hans
     Zhang)

  Qualcomm PCIe controller driver:

   - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
     equalization presets (Krishna Chaitanya Chundru)

   - Read Maximum Link Width from the Link Capabilities register if DT
     lacks 'num-lanes' property (Krishna Chaitanya Chundru)

   - Add Physical Layer 64 GT/s Capability ID and register offsets for
     8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya
     Chundru)

   - Add generic dwc support for configuring lane equalization presets
     (Krishna Chaitanya Chundru)

   - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)

  Renesas R-Car PCIe controller driver:

   - Describe endpoint BAR 4 as being fixed size (Jerome Brunet)

   - Document how to obtain R-Car V4H (r8a779g0) controller firmware
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Reorder rockchip_pci_core_rsts because
     reset_control_bulk_deassert() deasserts in reverse order, to fix a
     link training regression (Jensen Huang)

   - Mark RK3399 as being capable of raising INTx interrupts (Niklas
     Cassel)

  Rockchip DesignWare PCIe controller driver:

   - Check only PCIE_LINKUP, not LTSSM status, to determine whether the
     link is up (Shawn Lin)

   - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s
     for Root Complex and Endpoint modes (Shawn Lin)

   - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead
     of rockchip_pcie_ep_pre_init() so it stays hidden after PERST#
     resets non-sticky registers (Shawn Lin)

   - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit()
     (Diederik de Haas)

  Synopsys DesignWare PCIe controller driver:

   - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training
     more robust; this will not affect the intended link width if all
     lanes are functional (Wenbin Yao)

   - Return bool (not int) for link-up check in dw_pcie_ops.link_up()
     and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay,
     keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx,
     tegra194, uniphier, visconti (Hans Zhang)

   - Add debugfs support for exposing DWC device-specific PTM context
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Make j721e buildable as a loadable and removable module (Siddharth
     Vadapalli)

   - Fix j721e host/endpoint dependencies that result in link failures
     in some configs (Arnd Bergmann)

  Device tree bindings:

   - Add qcom DT binding for 'global' interrupt (PCIe controller and
     link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p,
     sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan
     Sadhasivam)

   - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074,
     ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)

   - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)

   - Correct indentation and style of examples in brcm,stb-pcie,
     cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie,
     microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm
     (Krzysztof Kozlowski)

   - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and
     armada8k from text to schema DT bindings (Rob Herring)

   - Remove obsolete .txt DT bindings for content that has been moved to
     schemas (Rob Herring)

   - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074
     and IPQ9574 (Varadarajan Narayanan)

   - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)

   - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since
     PolarFire may be configured that way (Conor Dooley)

  Miscellaneous:

   - Drop 'pci' suffix from intel_mid_pci.c filename to match similar
     files (Andy Shevchenko)

   - All platforms with PCI have an MMU, so add PCI Kconfig dependency
     on MMU to simplify build testing and avoid inadvertent build
     regressions (Arnd Bergmann)

   - Update Krzysztof Wilczyński's email address in MAINTAINERS
     (Krzysztof Wilczyński)

   - Update Manivannan Sadhasivam's email address in MAINTAINERS
     (Manivannan Sadhasivam)"

* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
  MAINTAINERS: Update Manivannan Sadhasivam email address
  PCI: j721e: Fix host/endpoint dependencies
  PCI: j721e: Add support to build as a loadable module
  PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
  PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
  PCI: cadence: Add support to build pcie-cadence library as a kernel module
  MAINTAINERS: Update Krzysztof Wilczyński email address
  PCI: Remove unnecessary linesplit in __pci_setup_bridge()
  PCI: WARN (not BUG()) when we fail to assign optional resources
  PCI: Remove unused pci_printk()
  PCI: qcom: Replace PERST# sleep time with proper macro
  PCI: dw-rockchip: Replace PERST# sleep time with proper macro
  PCI: host-common: Convert to library for host controller drivers
  PCI/ERR: Remove misleading TODO regarding kernel panic
  PCI: cadence: Remove duplicate message code definitions
  PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
  PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
  PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
  PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
  PCI: cadence-ep: Correct PBA offset in .set_msix() callback
  ...
2025-06-04 11:26:17 -07:00
Bjorn Helgaas 05cf00aa05 Merge branch 'pci/controller/qcom'
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane
  equalization presets (Krishna Chaitanya Chundru)

- Read Maximum Link Width from the Link Capabilities register if DT lacks
  'num-lanes' property (Krishna Chaitanya Chundru)

- Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32,
  and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru)

- Add generic dwc support for configuring lane equalization presets
  (Krishna Chaitanya Chundru)

- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)

* pci/controller/qcom:
  PCI: qcom: Add support for IPQ5018
  dt-bindings: PCI: qcom: Add IPQ5018 SoC
  PCI: dwc: Add support for configuring lane equalization presets
  PCI: Add lane equalization register offsets
  PCI: dwc: Update pci->num_lanes to maximum supported link width
  PCI: of: Add of_pci_get_equalization_presets() API
2025-06-04 10:50:42 -05:00
Bjorn Helgaas f377d9cb25 Merge branch 'pci/pm'
- Add pm_runtime_put() cleanup helper for use with __free() to
  automatically drop the device usage count when a pointer goes out of
  scope (Alex Williamson)

- Increment PM usage counter when probing reset methods so we don't try to
  read config space of a powered-off device (Alex Williamson)

- Set all devices to D0 during enumeration to ensure ACPI opregion is
  connected via _REG (Mario Limonciello)

* pci/pm:
  PCI: Explicitly put devices into D0 when initializing
  PCI: Increment PM usage counter when probing reset methods
  PM: runtime: Define pm_runtime_put cleanup helper
2025-06-04 10:50:01 -05:00
Bjorn Helgaas bb5c909e6a Merge branch 'pci/hotplug'
- Ignore Presence Detect Changed caused by DPC.  pciehp already ignores
  Link Down/Up events caused by DPC, but on slots using in-band presence
  detect, DPC causes a spurious Presence Detect Changed event (Lukas
  Wunner)

- Ignore Link Down/Up caused by Secondary Bus Reset.  On hotplug ports
  using in-band presence detect, the reset causes a Presence Detect Changed
  event, which mistakenly caused teardown and re-enumeration of the device.
  Drivers may need to annotate code that resets their device (Lukas Wunner)

* pci/hotplug:
  PCI: hotplug: Drop superfluous #include directives
  PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  PCI: pciehp: Ignore Presence Detect Changed caused by DPC

# Conflicts:
#	drivers/pci/pci.h
2025-06-04 10:49:59 -05:00
Bjorn Helgaas 68d0370e4e Merge branch 'pci/enumeration'
- Remove pci_fixup_cardbus(), which has no users left (Heiner Kallweit)

- Print the actual delay time in pci_bridge_wait_for_secondary_bus()
  instead of assuming it was 1000ms (Wilfred Mallawa)

- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI
  devices', which broke resume from system sleep on AMD platforms and has
  been fixed by other commits (Lukas Wunner)

- Restrict visibility of pci_dev.match_driver since it's no longer used
  outside the PCI core (Lukas Wunner)

* pci/enumeration:
  PCI: Limit visibility of match_driver flag to PCI core
  Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
  PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
  PCI: Use PCI_STD_NUM_BARS instead of 6
  PCI: Remove pci_fixup_cardbus()

# Conflicts:
#	drivers/pci/pci.h
2025-06-04 10:49:56 -05:00
Bjorn Helgaas f56278a46d Merge branch 'pci/devres'
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and
  unnecessary (Philipp Stanner)

- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and
  related flags since all uses have been removed (Philipp Stanner)

- Rework devres 'request' functions so they are no longer 'hybrid', i.e.,
  their behavior no longer depends on whether pcim_enable_device or
  pci_enable_device() was used, and remove related code (Philipp Stanner)

* pci/devres:
  PCI: Remove function pcim_intx() prototype from pci.h
  PCI: Remove hybrid-devres usage warnings from kernel-doc
  PCI: Remove redundant set of request functions
  PCI: Remove exclusive requests flags from _pcim_request_region()
  PCI: Remove pcim_request_region_exclusive()
  Documentation/driver-api: Update pcim_enable_device()
  PCI: Remove hybrid devres nature from request functions
  PCI: Remove pcim_iounmap_regions()
  mtip32xx: Remove unnecessary pcim_iounmap_regions() calls
2025-06-04 10:49:50 -05:00
Bjorn Helgaas 1acf6a5e79 Merge branch 'pci/bwctrl'
- Simplify link bandwidth controller by replacing the count of Link
  Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag
  (Ilpo Järvinen)

- Update the Link Speed after retraining, since the Link Speed may have
  changed (Ilpo Järvinen)

* pci/bwctrl:
  PCI: Update Link Speed after retraining
  PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
2025-06-04 10:49:49 -05:00
Jon Pan-Doh b4fe7398de PCI/AER: Add sysfs attributes for log ratelimits
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.

The new sysfs files are:

  /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
  /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms

The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.

Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).

  Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats ->
    sysfs-bus-pci-devices-aer

Tested using aer-inject[1]. Configured correctable log ratelimit to 5.
Sent 6 AER errors. Observed 5 errors logged while AER stats
(cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6.

Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors
logged and accounted in AER stats (12 total errors).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
"aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Jon Pan-Doh a57f2bfb4a PCI/AER: Ratelimit correctable and non-fatal error logging
Spammy devices can flood kernel logs with AER errors and slow/stall
execution. Add per-device ratelimits for AER correctable and non-fatal
uncorrectable errors that use the kernel defaults (10 per 5s).  Logging of
fatal errors is not ratelimited.

There are two AER logging entry points:

  - aer_print_error() is used by DPC and native AER

  - pci_print_aer() is used by GHES and CXL

The native AER aer_print_error() case includes a loop that may log details
from multiple devices, which are ratelimited individually.  If we log
details for any device, we also log the Error Source ID from the Root Port
or RCEC.

If no such device details are found, we still log the Error Source from the
ERR_* Message, ratelimited by the Root Port or RCEC that received it.

The DPC aer_print_error() case is not ratelimited, since this only happens
for fatal errors.

The CXL pci_print_aer() case is ratelimited by the Error Source device.

The GHES pci_print_aer() case is via aer_recover_work_func(), which
searches for the Error Source device.  If the device is not found, there's
no per-device ratelimit, so we use a system-wide ratelimit that covers all
error types (correctable, non-fatal, and fatal).

Sargun at Meta reported internally that a flood of AER errors causes RCU
CPU stall warnings and CSD-lock warnings.

Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged
while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show
true count of 11.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: commit log, factor out trace_aer_event() and aer_print_rp_info()
changes to previous patches, enable Error Source logging if any downstream
detail will be printed, don't ratelimit fatal errors, "aer_report" ->
"aer_info", "cor_log_ratelimit" -> "correctable_ratelimit",
"uncor_log_ratelimit" -> "nonfatal_ratelimit"]

Reported-by: Sargun Dhillon <sargun@meta.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-19-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Bjorn Helgaas 94bc15c348 PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
Previously aer_get_device_error_info() and aer_print_error() took a pointer
to struct aer_err_info and a pointer to a pci_dev.  Typically the pci_dev
was one of the elements of the aer_err_info.dev[] array (DPC was an
exception, where the dev[] array was unused).

Convert aer_get_device_error_info() and aer_print_error() to take an index
into the aer_err_info.dev[] array instead.  A future patch will add
per-device ratelimit information, so the index makes it convenient to find
the ratelimit associated with the device.

To accommodate DPC, set info->dev[0] to the DPC port before using these
interfaces.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-17-helgaas@kernel.org
2025-05-23 11:11:36 -05:00
Bjorn Helgaas 82013ff394 PCI/ERR: Add printk level to pcie_print_tlp_log()
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/
etc) that depends on the kind of error, and it calls pcie_print_tlp_log(),
which previously always produced output at KERN_ERR.

Add a "level" parameter so aer_print_error() can control the level of the
pcie_print_tlp_log() output to match.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-14-helgaas@kernel.org
2025-05-23 11:02:03 -05:00
Karolina Stolarek c8f6791e33 PCI/AER: Check log level once and remember it
When reporting an AER error, we check its type multiple times to determine
the log level for each message. Do this check only in the top-level
functions (aer_isr_one_error(), pci_print_aer()) and save the level in
struct aer_err_info.

[bhelgaas: save log level in struct aer_err_info instead of passing it
as a parameter]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
2025-05-23 11:01:52 -05:00
Philipp Stanner dfc970ad61
PCI: Remove function pcim_intx() prototype from pci.h
The subsystem-internal header pci.h still contains the function
prototype of pcim_intx(), which has since been made public in the global
header.

Remove the redundant function prototype.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250522084626.150148-2-phasta@kernel.org
2025-05-22 10:54:04 +00:00
Philipp Stanner 8e9987485d
PCI: Remove pcim_request_region_exclusive()
pcim_request_region_exclusive() was only needed for redirecting the
relatively exotic exclusive request functions in pci.c in case of them
operating in managed mode.

The managed nature has been removed from those functions and no one else
uses pcim_request_region_exclusive().

Remove pcim_request_region_exclusive().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20250519112959.25487-5-phasta@kernel.org
2025-05-19 12:35:15 +00:00
Philipp Stanner 51f6aec99c
PCI: Remove hybrid devres nature from request functions
All functions based on __pci_request_region() and its release counter
part support "hybrid mode", where the functions become managed if the
PCI device was enabled with pcim_enable_device().

Removing this undesirable feature requires to remove all users who
activated their device with that function and use one of the affected
request functions.

These users were:
	ASoC
	alsa
	cardreader
	cirrus
	i2c
	mmc
	mtd
	mtd
	mxser
	net
	spi
	vdpa
	vmwgfx

all of which have been ported to always-managed pcim_ functions by now.

The hybrid nature can, thus, be removed from the aforementioned PCI
functions.

Remove all function guards and documentation in pci.c related to the
hybrid redirection. Adjust the visibility of pcim_release_region().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20250519112959.25487-3-phasta@kernel.org
2025-05-19 12:34:58 +00:00
Lukas Wunner ce45dc4bb2
PCI: Limit visibility of match_driver flag to PCI core
Since commit 58d9a38f6f ("PCI: Skip attaching driver in device_add()"),
PCI enumeration is split into two steps:  In the first step, all devices
are published in sysfs with device_add().  In the second step, drivers are
bound to the devices with device_attach().  To delay driver binding until
the second step, a "bool match_driver" in struct pci_dev is used.

Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink
struct pci_dev a little and prevent use of the bool outside the PCI core
(as has happened with commit cbbc00be2c ("iommu/amd: Prevent binding
other PCI drivers to IOMMU PCI devices")).

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/d22a9e5b81d6bd8dd1837607d6156679b3b1199c.1745572340.git.lukas@wunner.de
2025-05-15 13:40:52 +00:00
Ilpo Järvinen 2389d8dc38
PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
PCIe BW controller counted LBMS assertions for the purposes of the Target
Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the
LBMS count through sysfs to allow better diagnosing link related issues.
Lukas Wunner suggested, however, that adding a trace event would be better
for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user
of the lbms_count.

The logic in pcie_failed_link_retrain() does not require keeping count of
LBMS assertions, so replace lbms_count with a simple flag in pci_dev's
priv_flags.  The reduced complexity allows removing pcie_bwctrl_lbms_rwsem.

Since pcie_failed_link_retrain() runs before bwctrl is probed during boot,
the LBMS in Link Status register still has to be checked by the quirk.

The priv_flags numbering is not continuous because hotplug code added a few
flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in
different branches).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squashed a fix to resolve build failures from
https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-05-15 08:38:40 +00:00
Mario Limonciello 4d4c10f763 PCI: Explicitly put devices into D0 when initializing
AMD BIOS team has root caused an issue that NVMe storage failed to come
back from suspend to a lack of a call to _REG when NVMe device was probed.

112a7f9c8e ("PCI/ACPI: Call _REG when transitioning D-states") added
support for calling _REG when transitioning D-states, but this only works
if the device actually "transitions" D-states.

967577b062 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
added support for runtime PM on PCI devices, but never actually
'explicitly' sets the device to D0.

To make sure that devices are in D0 and that platform methods such as
_REG are called, explicitly set all devices into D0 during initialization.

Fixes: 967577b062 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Denis Benato <benato.denis96@gmail.com>
Tested-By: Yijun Shen <Yijun_Shen@Dell.com>
Tested-By: David Perry <david.perry@amd.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250424043232.1848107-1-superm1@kernel.org
2025-05-05 18:04:44 -05:00
Krishna Chaitanya Chundru 57a4591df7 PCI: of: Add of_pci_get_equalization_presets() API
PCIe equalization presets are predefined settings used to optimize
signal integrity by compensating for signal loss and distortion in
high-speed data transmission.

As per PCIe spec 6.0.1 revision section 8.3.3.3 & 4.2.4 for data rates
of 8.0 GT/s, 16.0 GT/s, 32.0 GT/s, and 64.0 GT/s, there is a way to
configure lane equalization presets for each lane to enhance the PCIe
link reliability. Each preset value represents a different combination
of pre-shoot and de-emphasis values. For each data rate, different
registers are defined: for 8.0 GT/s, registers are defined in section
7.7.3.4; for 16.0 GT/s, in section 7.7.5.9, etc. The 8.0 GT/s rate has
an extra receiver preset hint, requiring 16 bits per lane, while the
remaining data rates use 8 bits per lane.

Based on the number of lanes and the supported data rate,
of_pci_get_equalization_presets() reads the device tree property and
stores in the presets structure.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-2-22cfa0490518@oss.qualcomm.com
2025-04-19 19:42:33 +05:30
Lukas Wunner 2af781a9ed PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
When a Secondary Bus Reset is issued at a hotplug port, it causes a Data
Link Layer State Changed event as a side effect.  On hotplug ports using
in-band presence detect, it additionally causes a Presence Detect Changed
event.

These spurious events should not result in teardown and re-enumeration of
the device in the slot.  Hence commit 2e35afaefe ("PCI: pciehp: Add
reset_slot() method") masked the Presence Detect Changed Enable bit in the
Slot Control register during a Secondary Bus Reset.  Commit 06a8d89af5
("PCI: pciehp: Disable link notification across slot reset") additionally
masked the Data Link Layer State Changed Enable bit.

However masking those bits only disables interrupt generation (PCIe r6.2
sec 6.7.3.1).  The events are still visible in the Slot Status register
and picked up by the IRQ handler if it runs during a Secondary Bus Reset.
This can happen if the interrupt is shared or if an unmasked hotplug event
occurs, e.g. Attention Button Pressed or Power Fault Detected.

The likelihood of this happening used to be small, so it wasn't much of a
problem in practice.  That has changed with the recent introduction of
bandwidth control in v6.13-rc1 with commit 665745f274 ("PCI/bwctrl:
Re-add BW notification portdrv as PCIe BW controller"):

Bandwidth control shares the interrupt with PCIe hotplug.  A Secondary Bus
Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler
runs, picks up the masked events and tears down the device in the slot.

As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo
root-caused to the incorrect handling of masked hotplug events.

Clearly, a more reliable way is needed to ignore spurious hotplug events.

For Downstream Port Containment, a new ignore mechanism was introduced by
commit a97396c6eb ("PCI: pciehp: Ignore Link Down/Up caused by DPC").
It has been working reliably for the past four years.

Adapt it for Secondary Bus Resets.

Introduce two helpers to annotate code sections which cause spurious link
changes:  pci_hp_ignore_link_change() and pci_hp_unignore_link_change()
Use those helpers in lieu of masking interrupts in the Slot Control
register.

Introduce a helper to check whether such a code section is executing
concurrently and if so, await it:  pci_hp_spurious_link_change()
Invoke the helper in the hotplug IRQ thread pciehp_ist().  Re-use the
IRQ thread's existing code which ignores DPC-induced link changes unless
the link is unexpectedly down after reset recovery or the device was
replaced during the bus reset.

That code block in pciehp_ist() was previously only executed if a Data
Link Layer State Changed event has occurred.  Additionally execute it for
Presence Detect Changed events.  That's necessary for compatibility with
PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist
before PCIe r1.1.  DPC was added with PCIe r3.1 and thus DPC-capable
hotplug ports always support Data Link Layer State Changed events.
But the same cannot be assumed for Secondary Bus Reset, which already
existed in PCIe r1.0.

Secondary Bus Reset is only one of many causes of spurious link changes.
Others include runtime suspend to D3cold, firmware updates or FPGA
reconfiguration.  The new pci_hp_{,un}ignore_link_change() helpers may be
used by all kinds of drivers to annotate such code sections, hence their
declarations are publicly visible in <linux/pci.h>.  A case in point is
the Mellanox Ethernet driver which disables a firmware reset feature if
the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612
("net/mlx5: Nack sync reset request when HotPlug is enabled").  Going
forward, PCIe hotplug will be able to cope gracefully with all such use
cases once the code sections are properly annotated.

The new helpers internally use two bits in struct pci_dev's priv_flags as
well as a wait_queue.  This mirrors what was done for DPC by commit
a97396c6eb ("PCI: pciehp: Ignore Link Down/Up caused by DPC").  That may
be insufficient if spurious link changes are caused by multiple sources
simultaneously.  An example might be a Secondary Bus Reset issued by AER
during FPGA reconfiguration.  If this turns out to happen in real life,
support for it can easily be added by replacing the PCI_LINK_CHANGING flag
with an atomic_t counter incremented by pci_hp_ignore_link_change() and
decremented by pci_hp_unignore_link_change().  Instead of awaiting a zero
PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would
then simply await a zero counter.

Fixes: 665745f274 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
2025-04-15 16:03:44 -05:00
Thomas Gleixner d5124a9957 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
2025-04-09 20:47:30 +02:00
Bjorn Helgaas a1aed6b34f Merge branch 'pci/devtree-create'
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they
  haven't been set already (Herve Codina)

- Allow of_pci_set_address() to set the DT address property for root bus
  nodes, where there is no PCI bridge to supply the PCI bus/device/function
  part of the property (Herve Codina)

- Create DT nodes for PCI host bridges to enable loading device tree
  overlays to create platform devices for PCI devices that have several
  features that require multiple drivers (Herve Codina)

* pci/devtree-create:
  PCI: of: Create device tree PCI host bridge node
  PCI: of_property: Constify parameter in of_pci_get_addr_flags()
  PCI: of_property: Add support for NULL pdev in of_pci_set_address()
  PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
  driver core: Introduce device_{add,remove}_of_node()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas 38d42a6612 Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
  Järvinen)

- Fix typo that repeatedly distributed resources to a bridge instead of
  iterating over subordinate bridges, which resulted in too little space to
  assign some BARs (Kai-Heng Feng)

- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
  to avoid failures when removing and re-adding devices (Ilpo Järvinen)

- Fix a double counting error for I/O resources, as we previously did for
  memory resources (Ilpo Järvinen)

- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)

- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)

- Add pci_resource_num() to look up the BAR number from the resource
  pointer (Ilpo Järvinen)

- Add restore_dev_resource() to simplify code that resources saved device
  resources (Ilpo Järvinen)

- Allow drivers to enable devices even if we haven't assigned optional IOV
  resources to them (Ilpo Järvinen)

- Improve debug output during resource reallocation (Ilpo Järvinen)

- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
  if we can't allocate them (Ilpo Järvinen)

- Move declarations of pci_rescan_bus_bridge_resize(),
  pci_reassign_bridge_resources(), and CardBus-related sizes from
  include/linux/pci.h to drivers/pci/pci.h since they're not used outside
  the PCI core (Ilpo Järvinen)

- Make pci_setup_bridge() static (Ilpo Järvinen)

- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)

- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
  cases, which broke vfio-pci lazy mapping on first access (Niklas
  Schnelle)

- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
  disabled only for s390 (Niklas Schnelle)

- Support mmap of PCI resources on s390 except for ISM devices (Niklas
  Schnelle)

* pci/resource:
  s390/pci: Support mmap() of PCI resources except for ISM devices
  s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
  s390/pci: Fix s390_mmio_read/write syscall page fault handling
  PCI: Fix NULL dereference in SR-IOV VF creation error path
  PCI: Move cardbus IO size declarations into pci/pci.h
  PCI: Make pci_setup_bridge() static
  PCI: Move resource reassignment func declarations into pci/pci.h
  PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
  PCI: Fix BAR resizing when VF BARs are assigned
  PCI: Do not claim to release resource falsely
  PCI: Increase Resizable BAR support from 512 GB to 128 TB
  PCI: Rework optional resource handling
  PCI: Perform reset_resource() and build fail list in sync
  PCI: Use res->parent to check if resource is assigned
  PCI: Add debug print when releasing resources before retry
  PCI: Indicate optional resource assignment failures
  PCI: Always have realloc_head in __assign_resources_sorted()
  PCI: Extend enable to check for any optional resource
  PCI: Add restore_dev_resource()
  PCI: Remove incorrect comment from pci_reassign_resource()
  PCI: Consolidate assignment loop next round preparation
  PCI: Rename retval to ret
  PCI: Use while loop and break instead of gotos
  PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
  PCI: Converge return paths in __assign_resources_sorted()
  PCI: Add dev & res local variables to resource assignment funcs
  PCI: Add pci_resource_num() helper
  PCI: Check resource_size() separately
  PCI: Add pci_resource_is_iov() to identify IOV resources
  PCI: Use resource_set_{range,size}() helpers
  PCI: Use SZ_* instead of literals in setup-bus.c
  PCI: Fix old_size lower bound in calculate_iosize() too
  PCI: Allow relaxed bridge window tail sizing for optional resources
  PCI: Simplify size1 assignment logic
  PCI: Use min_align, not unrelated add_align, for size0
  PCI: Remove add_align overwrite unrelated to size0
  PCI: Use downstream bridges for distributing resources
  PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas e9e224dadd Merge branch 'pci/enumeration'
- Enable Configuration RRS SV early instead of during child bus scanning
  (Bjorn Helgaas)

- Cache offset of Resizable BAR capability to avoid redundant searches for
  it (Bjorn Helgaas)

- Fix reference leaks in pci_register_host_bridge() and
  pci_alloc_child_bus() (Ma Ke)

- Drop put_device() in pci_register_host_bridge() left over from converting
  device_register() to device_add() (Dan Carpenter)

* pci/enumeration:
  PCI: Remove stray put_device() in pci_register_host_bridge()
  PCI: Fix reference leak in pci_alloc_child_bus()
  PCI: Fix reference leak in pci_register_host_bridge()
  PCI: Cache offset of Resizable BAR capability
  PCI: Enable Configuration RRS SV early
2025-03-27 13:14:43 -05:00
Bjorn Helgaas 651aa9052c Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
  Francis)

- Expose supported DOE features via sysfs (Alistair Francis)

- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
  Francis)

* pci/doe:
  PCI/DOE: Allow enabling DOE without CXL
  PCI/DOE: Expose DOE features via sysfs
  PCI/DOE: Rename Discovery Response Data Object Contents to type
  PCI/DOE: Rename DOE protocol to feature
2025-03-27 13:14:43 -05:00