mirror of https://github.com/torvalds/linux.git
[v9] vfio/pci: Allow MMIO regions to be exported through dma-buf
Merge tag 'vfio-v6.19-dma-buf-v9+' into v6.19/vfio/next

[v9] vfio/pci: Allow MMIO regions to be exported through dma-buf
https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com

Signed-off-by: Alex Williamson <alex@shazbot.org>
This commit is contained in commit fa804aa4ac.
@@ -9,22 +9,48 @@ between two devices on the bus. This type of transaction is henceforth
 called Peer-to-Peer (or P2P). However, there are a number of issues that
 make P2P transactions tricky to do in a perfectly safe way.
 
-One of the biggest issues is that PCI doesn't require forwarding
-transactions between hierarchy domains, and in PCIe, each Root Port
-defines a separate hierarchy domain. To make things worse, there is no
-simple way to determine if a given Root Complex supports this or not.
-(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
-only supports doing P2P when the endpoints involved are all behind the
-same PCI bridge, as such devices are all in the same PCI hierarchy
-domain, and the spec guarantees that all transactions within the
-hierarchy will be routable, but it does not require routing
-between hierarchies.
+For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up
+until they reach a host bridge or root port. If the path includes PCIe switches
+then based on the ACS settings the transaction can route entirely within
+the PCIe hierarchy and never reach the root port. The kernel will evaluate
+the PCIe topology and always permit P2P in these well-defined cases.
 
-The second issue is that to make use of existing interfaces in Linux,
-memory that is used for P2P transactions needs to be backed by struct
-pages. However, PCI BARs are not typically cache coherent so there are
-a few corner case gotchas with these pages so developers need to
-be careful about what they do with them.
+However, if the P2P transaction reaches the host bridge then it might have to
+hairpin back out the same root port, be routed inside the CPU SOC to another
+PCIe root port, or routed internally to the SOC.
+
+The PCIe specification doesn't define the forwarding of transactions between
+hierarchy domains and the kernel defaults to blocking such routing. There is an
+allow list to allow detecting known-good HW, in which case P2P between any
+two PCIe devices will be permitted.
+
+Since P2P inherently is doing transactions between two devices it requires two
+drivers to be co-operating inside the kernel. The providing driver has to convey
+its MMIO to the consuming driver. To meet the driver model lifecycle rules the
+MMIO must have all DMA mappings removed, all CPU accesses prevented, and all page
+table mappings undone before the providing driver completes remove().
+
+This requires the providing and consuming driver to actively work together to
+guarantee that the consuming driver has stopped using the MMIO during a removal
+cycle. This is done by either a synchronous invalidation shutdown or waiting
+for all usage refcounts to reach zero.
+
+At the lowest level the P2P subsystem offers a naked struct p2p_provider that
+delegates lifecycle management to the providing driver. It is expected that
+drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
+to provide an invalidation shutdown. These MMIO addresses have no struct page, and
+if used with mmap() must create special PTEs. As such there are very few
+kernel uAPIs that can accept pointers to them; in particular they cannot be used
+with read()/write(), including O_DIRECT.
+
+Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
+pgmap of MEMORY_DEVICE_PCI_P2PDMA to create struct pages. The lifecycle of the
+pgmap ensures that when the pgmap is destroyed all other drivers have stopped
+using the MMIO. This option works with O_DIRECT flows, in some cases, if the
+underlying subsystem supports handling MEMORY_DEVICE_PCI_P2PDMA through
+FOLL_PCI_P2PDMA. The use of FOLL_LONGTERM is prevented. As this relies on pgmap
+it also relies on architecture support along with alignment and minimum size
+limitations.
 
 
 Driver Writer's Guide
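For orientation, the following is a minimal sketch of the struct-page (pgmap) option described above, using the existing pci_p2pdma_add_resource(), pci_alloc_p2pmem() and pci_p2pmem_publish() helpers. The BAR number, sizes and the demo_pgmap_probe() wrapper are invented purely for illustration; they are not part of this series.

    #include <linux/pci.h>
    #include <linux/pci-p2pdma.h>
    #include <linux/sizes.h>

    /* Hedged sketch: a providing driver exposing part of BAR 4 as P2P memory. */
    static int demo_pgmap_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
            void *buf;
            int rc;

            /* Create MEMORY_DEVICE_PCI_P2PDMA struct pages for 1 MiB of BAR 4. */
            rc = pci_p2pdma_add_resource(pdev, 4, SZ_1M, 0);
            if (rc)
                    return rc;

            /* Carve out a chunk for this driver's own use ... */
            buf = pci_alloc_p2pmem(pdev, SZ_64K);
            if (!buf)
                    return -ENOMEM;

            /* ... and let orchestrators discover the rest through sysfs. */
            pci_p2pmem_publish(pdev, true);
            return 0;
    }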
@@ -114,14 +140,39 @@ allocating scatter-gather lists with P2P memory.
 Struct Page Caveats
 -------------------
 
-Driver writers should be very careful about not passing these special
-struct pages to code that isn't prepared for it. At this time, the kernel
-interfaces do not have any checks for ensuring this. This obviously
-precludes passing these pages to userspace.
+While the MEMORY_DEVICE_PCI_P2PDMA pages can be installed in VMAs,
+pin_user_pages() and related will not return them unless FOLL_PCI_P2PDMA is set.
 
-P2P memory is also technically IO memory but should never have any side
-effects behind it. Thus, the order of loads and stores should not be important
-and ioreadX(), iowriteX() and friends should not be necessary.
+The MEMORY_DEVICE_PCI_P2PDMA pages require care to support in the kernel. The
+KVA is still MMIO and must still be accessed through the normal
+readX()/writeX()/etc helpers. Direct CPU access (e.g. memcpy) is forbidden, just
+like any other MMIO mapping. While this will actually work on some
+architectures, others will experience corruption or just crash in the kernel.
+Supporting FOLL_PCI_P2PDMA in a subsystem requires scrubbing it to ensure no CPU
+access happens.
+
+
+Usage With DMABUF
+=================
+
+DMABUF provides an alternative to the above struct page-based
+client/provider/orchestrator system and should be used when struct page
+doesn't exist. In this mode the exporting driver will wrap
+some of its MMIO in a DMABUF and give the DMABUF FD to userspace.
+
+Userspace can then pass the FD to an importing driver which will ask the
+exporting driver to map it to the importer.
+
+In this case the initiator and target pci_devices are known and the P2P subsystem
+is used to determine the mapping type. The phys_addr_t-based DMA API is used to
+establish the dma_addr_t.
+
+Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants
+to remove() it must deliver an invalidation shutdown to all DMABUF importing
+drivers through move_notify() and synchronously DMA unmap all the MMIO.
+
+No importing driver can continue to have a DMA map to the MMIO after the
+exporting driver has destroyed its p2p_provider.
 
 
 P2P DMA Support Library
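A hedged sketch of the move_notify() lifecycle described above, seen from the exporting driver's side. The my_exporter structure and my_unmap_all_mmio() helper are illustrative names only, not APIs from this series; dma_resv_lock(), dma_buf_move_notify() and dma_resv_unlock() are the existing dma-buf core calls.

    #include <linux/dma-buf.h>
    #include <linux/dma-resv.h>

    struct my_exporter {                    /* illustrative only */
            struct dma_buf *dmabuf;
    };

    static void my_unmap_all_mmio(struct my_exporter *exp)
    {
            /* Tear down every DMA mapping handed out for the MMIO (not shown). */
    }

    /* Called on the path towards remove(): revoke importer access first. */
    static void my_exporter_revoke(struct my_exporter *exp)
    {
            dma_resv_lock(exp->dmabuf->resv, NULL);
            dma_buf_move_notify(exp->dmabuf);       /* invalidation shutdown */
            my_unmap_all_mmio(exp);                 /* synchronous DMA unmap */
            dma_resv_unlock(exp->dmabuf->resv);
    }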
@@ -85,7 +85,7 @@ static inline bool blk_can_dma_map_iova(struct request *req,
 
 static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 {
-	iter->addr = pci_p2pdma_bus_addr_map(&iter->p2pdma, vec->paddr);
+	iter->addr = pci_p2pdma_bus_addr_map(iter->p2pdma.mem, vec->paddr);
 	iter->len = vec->len;
 	return true;
 }
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	 dma-fence-unwrap.o dma-resv.o
+	 dma-fence-unwrap.o dma-resv.o dma-buf-mapping.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= dma-heap.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= heaps/
 obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
@@ -0,0 +1,248 @@ (new file, all lines added)
// SPDX-License-Identifier: GPL-2.0-only
/*
 * DMA BUF Mapping Helpers
 *
 */
#include <linux/dma-buf-mapping.h>
#include <linux/dma-resv.h>

static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
					 dma_addr_t addr)
{
	unsigned int len, nents;
	int i;

	nents = DIV_ROUND_UP(length, UINT_MAX);
	for (i = 0; i < nents; i++) {
		len = min_t(size_t, length, UINT_MAX);
		length -= len;
		/*
		 * DMABUF abuses scatterlist to create a scatterlist
		 * that does not have any CPU list, only the DMA list.
		 * Always set the page related values to NULL to ensure
		 * importers can't use it. The phys_addr based DMA API
		 * does not require the CPU list for mapping or unmapping.
		 */
		sg_set_page(sgl, NULL, 0, 0);
		sg_dma_address(sgl) = addr + i * UINT_MAX;
		sg_dma_len(sgl) = len;
		sgl = sg_next(sgl);
	}

	return sgl;
}

static unsigned int calc_sg_nents(struct dma_iova_state *state,
				  struct dma_buf_phys_vec *phys_vec,
				  size_t nr_ranges, size_t size)
{
	unsigned int nents = 0;
	size_t i;

	if (!state || !dma_use_iova(state)) {
		for (i = 0; i < nr_ranges; i++)
			nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
	} else {
		/*
		 * In IOVA case, there is only one SG entry which spans
		 * for whole IOVA address space, but we need to make sure
		 * that it fits sg->length, maybe we need more.
		 */
		nents = DIV_ROUND_UP(size, UINT_MAX);
	}

	return nents;
}

/**
 * struct dma_buf_dma - holds DMA mapping information
 * @sgt:   Scatter-gather table
 * @state: DMA IOVA state relevant in IOMMU-based DMA
 * @size:  Total size of DMA transfer
 */
struct dma_buf_dma {
	struct sg_table sgt;
	struct dma_iova_state *state;
	size_t size;
};

/**
 * dma_buf_phys_vec_to_sgt - Returns the scatterlist table of the attachment
 * from arrays of physical vectors. This function is intended for MMIO memory
 * only.
 * @attach:	[in]	attachment whose scatterlist is to be returned
 * @provider:	[in]	p2pdma provider
 * @phys_vec:	[in]	array of physical vectors
 * @nr_ranges:	[in]	number of entries in phys_vec array
 * @size:	[in]	total size of phys_vec
 * @dir:	[in]	direction of DMA transfer
 *
 * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
 * on error. May return -EINTR if it is interrupted by a signal.
 *
 * On success, the DMA addresses and lengths in the returned scatterlist are
 * PAGE_SIZE aligned.
 *
 * A mapping must be unmapped by using dma_buf_free_sgt().
 *
 * NOTE: This function is intended for exporters. If direct traffic routing is
 * mandatory the exporter should call pci_p2pdma_map_type() before calling
 * this function.
 */
struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
					 struct p2pdma_provider *provider,
					 struct dma_buf_phys_vec *phys_vec,
					 size_t nr_ranges, size_t size,
					 enum dma_data_direction dir)
{
	unsigned int nents, mapped_len = 0;
	struct dma_buf_dma *dma;
	struct scatterlist *sgl;
	dma_addr_t addr;
	size_t i;
	int ret;

	dma_resv_assert_held(attach->dmabuf->resv);

	if (WARN_ON(!attach || !attach->dmabuf || !provider))
		/* This function is supposed to work on MMIO memory only */
		return ERR_PTR(-EINVAL);

	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
	if (!dma)
		return ERR_PTR(-ENOMEM);

	switch (pci_p2pdma_map_type(provider, attach->dev)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/*
		 * There is no need in IOVA at all for this flow.
		 */
		break;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		dma->state = kzalloc(sizeof(*dma->state), GFP_KERNEL);
		if (!dma->state) {
			ret = -ENOMEM;
			goto err_free_dma;
		}

		dma_iova_try_alloc(attach->dev, dma->state, 0, size);
		break;
	default:
		ret = -EINVAL;
		goto err_free_dma;
	}

	nents = calc_sg_nents(dma->state, phys_vec, nr_ranges, size);
	ret = sg_alloc_table(&dma->sgt, nents, GFP_KERNEL | __GFP_ZERO);
	if (ret)
		goto err_free_state;

	sgl = dma->sgt.sgl;

	for (i = 0; i < nr_ranges; i++) {
		if (!dma->state) {
			addr = pci_p2pdma_bus_addr_map(provider,
						       phys_vec[i].paddr);
		} else if (dma_use_iova(dma->state)) {
			ret = dma_iova_link(attach->dev, dma->state,
					    phys_vec[i].paddr, 0,
					    phys_vec[i].len, dir,
					    DMA_ATTR_MMIO);
			if (ret)
				goto err_unmap_dma;

			mapped_len += phys_vec[i].len;
		} else {
			addr = dma_map_phys(attach->dev, phys_vec[i].paddr,
					    phys_vec[i].len, dir,
					    DMA_ATTR_MMIO);
			ret = dma_mapping_error(attach->dev, addr);
			if (ret)
				goto err_unmap_dma;
		}

		if (!dma->state || !dma_use_iova(dma->state))
			sgl = fill_sg_entry(sgl, phys_vec[i].len, addr);
	}

	if (dma->state && dma_use_iova(dma->state)) {
		WARN_ON_ONCE(mapped_len != size);
		ret = dma_iova_sync(attach->dev, dma->state, 0, mapped_len);
		if (ret)
			goto err_unmap_dma;

		sgl = fill_sg_entry(sgl, mapped_len, dma->state->addr);
	}

	dma->size = size;

	/*
	 * No CPU list included - set orig_nents = 0 so others can detect
	 * this via SG table (use nents only).
	 */
	dma->sgt.orig_nents = 0;

	/*
	 * SGL must be NULL to indicate that SGL is the last one
	 * and we allocated correct number of entries in sg_alloc_table()
	 */
	WARN_ON_ONCE(sgl);
	return &dma->sgt;

err_unmap_dma:
	if (!i || !dma->state) {
		; /* Do nothing */
	} else if (dma_use_iova(dma->state)) {
		dma_iova_destroy(attach->dev, dma->state, mapped_len, dir,
				 DMA_ATTR_MMIO);
	} else {
		for_each_sgtable_dma_sg(&dma->sgt, sgl, i)
			dma_unmap_phys(attach->dev, sg_dma_address(sgl),
				       sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
	}
	sg_free_table(&dma->sgt);
err_free_state:
	kfree(dma->state);
err_free_dma:
	kfree(dma);
	return ERR_PTR(ret);
}
EXPORT_SYMBOL_NS_GPL(dma_buf_phys_vec_to_sgt, "DMA_BUF");

/**
 * dma_buf_free_sgt - unmaps the buffer
 * @attach:	[in]	attachment to unmap buffer from
 * @sgt:	[in]	scatterlist info of the buffer to unmap
 * @dir:	[in]	direction of DMA transfer
 *
 * This unmaps a DMA mapping for @attach obtained
 * by dma_buf_phys_vec_to_sgt().
 */
void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt,
		      enum dma_data_direction dir)
{
	struct dma_buf_dma *dma = container_of(sgt, struct dma_buf_dma, sgt);
	int i;

	dma_resv_assert_held(attach->dmabuf->resv);

	if (!dma->state) {
		; /* Do nothing */
	} else if (dma_use_iova(dma->state)) {
		dma_iova_destroy(attach->dev, dma->state, dma->size, dir,
				 DMA_ATTR_MMIO);
	} else {
		struct scatterlist *sgl;

		for_each_sgtable_dma_sg(sgt, sgl, i)
			dma_unmap_phys(attach->dev, sg_dma_address(sgl),
				       sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
	}

	sg_free_table(sgt);
	kfree(dma->state);
	kfree(dma);
}
EXPORT_SYMBOL_NS_GPL(dma_buf_free_sgt, "DMA_BUF");
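For orientation, a hedged sketch of an exporter's map_dma_buf/unmap_dma_buf callbacks built on the two helpers added above. It mirrors the VFIO exporter later in this merge; the my_buf structure and the my_* names are invented for illustration only.

    #include <linux/dma-buf.h>
    #include <linux/dma-buf-mapping.h>
    #include <linux/dma-resv.h>

    struct my_buf {                         /* illustrative only */
            struct p2pdma_provider *provider;
            struct dma_buf_phys_vec *phys_vec;
            size_t nr_ranges;
            size_t size;
    };

    static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                           enum dma_data_direction dir)
    {
            struct my_buf *buf = attach->dmabuf->priv;

            dma_resv_assert_held(attach->dmabuf->resv);
            /* Build a DMA-only sg_table for this importer's device. */
            return dma_buf_phys_vec_to_sgt(attach, buf->provider, buf->phys_vec,
                                           buf->nr_ranges, buf->size, dir);
    }

    static void my_unmap_dma_buf(struct dma_buf_attachment *attach,
                                 struct sg_table *sgt,
                                 enum dma_data_direction dir)
    {
            dma_buf_free_sgt(attach, sgt, dir);
    }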
@@ -1439,8 +1439,8 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 			 * as a bus address, __finalise_sg() will copy the dma
 			 * address into the output segment.
 			 */
-			s->dma_address = pci_p2pdma_bus_addr_map(&p2pdma_state,
-								 sg_phys(s));
+			s->dma_address = pci_p2pdma_bus_addr_map(
+				p2pdma_state.mem, sg_phys(s));
 			sg_dma_len(s) = sg->length;
 			sg_dma_mark_bus_address(s);
 			continue;
@@ -25,12 +25,12 @@ struct pci_p2pdma {
 	struct gen_pool *pool;
 	bool p2pmem_published;
 	struct xarray map_types;
+	struct p2pdma_provider mem[PCI_STD_NUM_BARS];
 };
 
 struct pci_p2pdma_pagemap {
-	struct pci_dev *provider;
-	u64 bus_offset;
 	struct dev_pagemap pgmap;
+	struct p2pdma_provider *mem;
 };
 
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
@@ -204,8 +204,8 @@ static void p2pdma_page_free(struct page *page)
 {
 	struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_pgmap(page));
 	/* safe to dereference while a reference is held to the percpu ref */
-	struct pci_p2pdma *p2pdma =
-		rcu_dereference_protected(pgmap->provider->p2pdma, 1);
+	struct pci_p2pdma *p2pdma = rcu_dereference_protected(
+		to_pci_dev(pgmap->mem->owner)->p2pdma, 1);
 	struct percpu_ref *ref;
 
 	gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page),
@@ -228,56 +228,136 @@ static void pci_p2pdma_release(void *data)
 
 	/* Flush and disable pci_alloc_p2p_mem() */
 	pdev->p2pdma = NULL;
+	if (p2pdma->pool)
 	synchronize_rcu();
+	xa_destroy(&p2pdma->map_types);
+
+	if (!p2pdma->pool)
+		return;
+
 	gen_pool_destroy(p2pdma->pool);
 	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
-	xa_destroy(&p2pdma->map_types);
 }
 
-static int pci_p2pdma_setup(struct pci_dev *pdev)
+/**
+ * pcim_p2pdma_init - Initialise peer-to-peer DMA providers
+ * @pdev: The PCI device to enable P2PDMA for
+ *
+ * This function initializes the peer-to-peer DMA infrastructure
+ * for a PCI device. It allocates and sets up the necessary data
+ * structures to support P2PDMA operations, including mapping type
+ * tracking.
+ */
+int pcim_p2pdma_init(struct pci_dev *pdev)
 {
-	int error = -ENOMEM;
 	struct pci_p2pdma *p2p;
+	int i, ret;
+
+	p2p = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (p2p)
+		return 0;
 
 	p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL);
 	if (!p2p)
 		return -ENOMEM;
 
 	xa_init(&p2p->map_types);
+	/*
+	 * Iterate over all standard PCI BARs and record only those that
+	 * correspond to MMIO regions. Skip non-memory resources (e.g. I/O
+	 * port BARs) since they cannot be used for peer-to-peer (P2P)
+	 * transactions.
+	 */
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		if (!(pci_resource_flags(pdev, i) & IORESOURCE_MEM))
+			continue;
 
-	p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
-	if (!p2p->pool)
-		goto out;
+		p2p->mem[i].owner = &pdev->dev;
+		p2p->mem[i].bus_offset =
+			pci_bus_address(pdev, i) - pci_resource_start(pdev, i);
+	}
 
-	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
-	if (error)
-		goto out_pool_destroy;
+	ret = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+	if (ret)
+		goto out_p2p;
 
-	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
-	if (error)
-		goto out_pool_destroy;
-
 	rcu_assign_pointer(pdev->p2pdma, p2p);
 	return 0;
 
-out_pool_destroy:
-	gen_pool_destroy(p2p->pool);
-out:
+out_p2p:
 	devm_kfree(&pdev->dev, p2p);
-	return error;
+	return ret;
 }
+EXPORT_SYMBOL_GPL(pcim_p2pdma_init);
+
+/**
+ * pcim_p2pdma_provider - Get peer-to-peer DMA provider
+ * @pdev: The PCI device to enable P2PDMA for
+ * @bar: BAR index to get provider
+ *
+ * This function gets peer-to-peer DMA provider for a PCI device. The lifetime
+ * of the provider (and of course the MMIO) is bound to the lifetime of the
+ * driver. A driver calling this function must ensure that all references to the
+ * provider, and any DMA mappings created for any MMIO, are all cleaned up
+ * before the driver remove() completes.
+ *
+ * Since P2P is almost always shared with a second driver this means some system
+ * to notify, invalidate and revoke the MMIO's DMA must be in place to use this
+ * function. For example a revoke can be built using DMABUF.
+ */
+struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar)
+{
+	struct pci_p2pdma *p2p;
+
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+		return NULL;
+
+	p2p = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (WARN_ON(!p2p))
+		/* Someone forgot to call to pcim_p2pdma_init() before */
+		return NULL;
+
+	return &p2p->mem[bar];
+}
+EXPORT_SYMBOL_GPL(pcim_p2pdma_provider);
+
+static int pci_p2pdma_setup_pool(struct pci_dev *pdev)
+{
+	struct pci_p2pdma *p2pdma;
+	int ret;
+
+	p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+	if (p2pdma->pool)
+		/* We already setup pools, do nothing, */
+		return 0;
+
+	p2pdma->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev));
+	if (!p2pdma->pool)
+		return -ENOMEM;
+
+	ret = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
+	if (ret)
+		goto out_pool_destroy;
+
+	return 0;
+
+out_pool_destroy:
+	gen_pool_destroy(p2pdma->pool);
+	p2pdma->pool = NULL;
+	return ret;
 }
 
 static void pci_p2pdma_unmap_mappings(void *data)
 {
-	struct pci_dev *pdev = data;
+	struct pci_p2pdma_pagemap *p2p_pgmap = data;
 
 	/*
 	 * Removing the alloc attribute from sysfs will call
 	 * unmap_mapping_range() on the inode, teardown any existing userspace
 	 * mappings and prevent new ones from being created.
 	 */
-	sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr,
+	sysfs_remove_file_from_group(&p2p_pgmap->mem->owner->kobj,
+				     &p2pmem_alloc_attr.attr,
 				     p2pmem_group.name);
 }
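A minimal sketch of how a providing driver might use the new pcim_p2pdma_init()/pcim_p2pdma_provider() pair during probe(). The BAR number and the demo_provider_probe() wrapper are assumptions for illustration; only the two pcim_* calls come from this series.

    #include <linux/pci.h>
    #include <linux/pci-p2pdma.h>

    static int demo_provider_probe(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
    {
            struct p2pdma_provider *provider;
            int rc;

            rc = pcim_p2pdma_init(pdev);    /* devres managed, safe to repeat */
            if (rc)
                    return rc;

            provider = pcim_p2pdma_provider(pdev, 2); /* NULL for non-MMIO BARs */
            if (!provider)
                    return -EINVAL;

            /*
             * All references to the provider, and any DMA mappings created
             * from its MMIO, must be gone before this driver's remove()
             * completes, e.g. via a DMABUF based revoke.
             */
            return 0;
    }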
@@ -295,6 +375,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 				 u64 offset)
 {
 	struct pci_p2pdma_pagemap *p2p_pgmap;
+	struct p2pdma_provider *mem;
 	struct dev_pagemap *pgmap;
 	struct pci_p2pdma *p2pdma;
 	void *addr;
@@ -312,11 +393,21 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	if (size + offset > pci_resource_len(pdev, bar))
 		return -EINVAL;
 
-	if (!pdev->p2pdma) {
-		error = pci_p2pdma_setup(pdev);
-		if (error)
-			return error;
-	}
+	error = pcim_p2pdma_init(pdev);
+	if (error)
+		return error;
+
+	error = pci_p2pdma_setup_pool(pdev);
+	if (error)
+		return error;
+
+	mem = pcim_p2pdma_provider(pdev, bar);
+	/*
+	 * We checked validity of BAR prior to call
+	 * to pcim_p2pdma_provider. It should never return NULL.
+	 */
+	if (WARN_ON(!mem))
+		return -EINVAL;
 
 	p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
 	if (!p2p_pgmap)
@@ -328,10 +419,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	pgmap->nr_range = 1;
 	pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
 	pgmap->ops = &p2pdma_pgmap_ops;
-	p2p_pgmap->provider = pdev;
-	p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) -
-		pci_resource_start(pdev, bar);
+	p2p_pgmap->mem = mem;
 
 	addr = devm_memremap_pages(&pdev->dev, pgmap);
 	if (IS_ERR(addr)) {
@@ -340,7 +428,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
 	}
 
 	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings,
-					 pdev);
+					 p2p_pgmap);
 	if (error)
 		goto pages_free;
@@ -972,16 +1060,26 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-						    struct device *dev)
+/**
+ * pci_p2pdma_map_type - Determine the mapping type for P2PDMA transfers
+ * @provider: P2PDMA provider structure
+ * @dev: Target device for the transfer
+ *
+ * Determines how peer-to-peer DMA transfers should be mapped between
+ * the provider and the target device. The mapping type indicates whether
+ * the transfer can be done directly through PCI switches or must go
+ * through the host bridge.
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
+					     struct device *dev)
 {
 	enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
-	struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
+	struct pci_dev *pdev = to_pci_dev(provider->owner);
 	struct pci_dev *client;
 	struct pci_p2pdma *p2pdma;
 	int dist;
 
-	if (!provider->p2pdma)
+	if (!pdev->p2pdma)
 		return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
 	if (!dev_is_pci(dev))
@@ -990,7 +1088,7 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
 	client = to_pci_dev(dev);
 
 	rcu_read_lock();
-	p2pdma = rcu_dereference(provider->p2pdma);
+	p2pdma = rcu_dereference(pdev->p2pdma);
 
 	if (p2pdma)
 		type = xa_to_value(xa_load(&p2pdma->map_types,
@@ -998,7 +1096,7 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
 	rcu_read_unlock();
 
 	if (type == PCI_P2PDMA_MAP_UNKNOWN)
-		return calc_map_type_and_dist(provider, client, &dist, true);
+		return calc_map_type_and_dist(pdev, client, &dist, true);
 
 	return type;
 }
@@ -1006,9 +1104,13 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
 			       struct device *dev, struct page *page)
 {
-	state->pgmap = page_pgmap(page);
-	state->map = pci_p2pdma_map_type(state->pgmap, dev);
-	state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+	struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page_pgmap(page));
+
+	if (state->mem == p2p_pgmap->mem)
+		return;
+
+	state->mem = p2p_pgmap->mem;
+	state->map = pci_p2pdma_map_type(p2p_pgmap->mem, dev);
 }
 
 /**
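A hedged sketch of the decision pci_p2pdma_map_type() drives for a single range, mirroring the logic of dma_buf_phys_vec_to_sgt() earlier in this merge. The demo_map_one() wrapper is illustrative only, not part of the series.

    #include <linux/dma-mapping.h>
    #include <linux/pci-p2pdma.h>

    static dma_addr_t demo_map_one(struct p2pdma_provider *provider,
                                   struct device *importer,
                                   phys_addr_t paddr, size_t len,
                                   enum dma_data_direction dir)
    {
            switch (pci_p2pdma_map_type(provider, importer)) {
            case PCI_P2PDMA_MAP_BUS_ADDR:
                    /* Traffic stays below the host bridge: no IOMMU mapping,
                     * just apply the provider's bus offset. */
                    return pci_p2pdma_bus_addr_map(provider, paddr);
            case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
                    /* Routed through the host bridge: map as MMIO with the
                     * phys_addr_t based DMA API. */
                    return dma_map_phys(importer, paddr, len, dir, DMA_ATTR_MMIO);
            default:
                    return DMA_MAPPING_ERROR;
            }
    }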
@@ -55,6 +55,9 @@ config VFIO_PCI_ZDEV_KVM
 
 	  To enable s390x KVM vfio-pci extensions, say Y.
 
+config VFIO_PCI_DMABUF
+	def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER
+
 source "drivers/vfio/pci/mlx5/Kconfig"
 
 source "drivers/vfio/pci/hisilicon/Kconfig"
@@ -2,6 +2,7 @@
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) += vfio_pci_dmabuf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
@@ -7,6 +7,7 @@
 #include <linux/vfio_pci_core.h>
 #include <linux/delay.h>
 #include <linux/jiffies.h>
+#include <linux/pci-p2pdma.h>
 
 /*
  * The device memory usable to the workloads running in the VM is cached
@@ -652,6 +653,50 @@ nvgrace_gpu_write(struct vfio_device *core_vdev,
 	return vfio_pci_core_write(core_vdev, buf, count, ppos);
 }
 
+static int nvgrace_get_dmabuf_phys(struct vfio_pci_core_device *core_vdev,
+				   struct p2pdma_provider **provider,
+				   unsigned int region_index,
+				   struct dma_buf_phys_vec *phys_vec,
+				   struct vfio_region_dma_range *dma_ranges,
+				   size_t nr_ranges)
+{
+	struct nvgrace_gpu_pci_core_device *nvdev = container_of(
+		core_vdev, struct nvgrace_gpu_pci_core_device, core_device);
+	struct pci_dev *pdev = core_vdev->pdev;
+	struct mem_region *mem_region;
+
+	/*
+	 * if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
+	 *	The P2P properties of the non-BAR memory is the same as the
+	 *	BAR memory, so just use the provider for index 0. Someday
+	 *	when CXL gets P2P support we could create CXLish providers
+	 *	for the non-BAR memory.
+	 * } else if (region_index == USEMEM_REGION_INDEX) {
+	 *	This is actually cachable memory and isn't treated as P2P in
+	 *	the chip. For now we have no way to push cachable memory
+	 *	through everything and the Grace HW doesn't care what caching
+	 *	attribute is programmed into the SMMU. So use BAR 0.
+	 * }
+	 */
+	mem_region = nvgrace_gpu_memregion(region_index, nvdev);
+	if (mem_region) {
+		*provider = pcim_p2pdma_provider(pdev, 0);
+		if (!*provider)
+			return -EINVAL;
+		return vfio_pci_core_fill_phys_vec(phys_vec, dma_ranges,
+						   nr_ranges,
+						   mem_region->memphys,
+						   mem_region->memlength);
+	}
+
+	return vfio_pci_core_get_dmabuf_phys(core_vdev, provider, region_index,
+					     phys_vec, dma_ranges, nr_ranges);
+}
+
+static const struct vfio_pci_device_ops nvgrace_gpu_pci_dev_ops = {
+	.get_dmabuf_phys = nvgrace_get_dmabuf_phys,
+};
+
 static const struct vfio_device_ops nvgrace_gpu_pci_ops = {
 	.name = "nvgrace-gpu-vfio-pci",
 	.init = vfio_pci_core_init_dev,
@@ -673,6 +718,10 @@ static const struct vfio_device_ops nvgrace_gpu_pci_ops = {
 	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
+static const struct vfio_pci_device_ops nvgrace_gpu_pci_dev_core_ops = {
+	.get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys,
+};
+
 static const struct vfio_device_ops nvgrace_gpu_pci_core_ops = {
 	.name = "nvgrace-gpu-vfio-pci-core",
 	.init = vfio_pci_core_init_dev,
@@ -936,6 +985,9 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
 					    memphys, memlength);
 		if (ret)
 			goto out_put_vdev;
+		nvdev->core_device.pci_ops = &nvgrace_gpu_pci_dev_ops;
+	} else {
+		nvdev->core_device.pci_ops = &nvgrace_gpu_pci_dev_core_ops;
 	}
 
 	ret = vfio_pci_core_register_device(&nvdev->core_device);
@@ -148,6 +148,10 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
 };
 
+static const struct vfio_pci_device_ops vfio_pci_dev_ops = {
+	.get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys,
+};
+
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct vfio_pci_core_device *vdev;
@@ -162,6 +166,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return PTR_ERR(vdev);
 
 	dev_set_drvdata(&pdev->dev, vdev);
+	vdev->pci_ops = &vfio_pci_dev_ops;
 	ret = vfio_pci_core_register_device(vdev);
 	if (ret)
 		goto out_put_vdev;
@@ -589,10 +589,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 	new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-	if (!new_mem)
+	if (!new_mem) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
-	else
+		vfio_pci_dma_buf_move(vdev, true);
+	} else {
 		down_write(&vdev->memory_lock);
+	}
 
 	/*
 	 * If the user is writing mem/io enable (new_mem/io) and we
@@ -627,6 +629,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	*virt_cmd &= cpu_to_le16(~mask);
 	*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
@@ -707,12 +711,16 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
 static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
 					  pci_power_t state)
 {
-	if (state >= PCI_D3hot)
+	if (state >= PCI_D3hot) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
-	else
+		vfio_pci_dma_buf_move(vdev, true);
+	} else {
 		down_write(&vdev->memory_lock);
+	}
 
 	vfio_pci_set_power_state(vdev, state);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
@@ -900,7 +908,10 @@ static int vfio_exp_config_write(struct vfio_pci_core_device *vdev, int pos,
 
 	if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
+		vfio_pci_dma_buf_move(vdev, true);
 		pci_try_reset_function(vdev->pdev);
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
 	}
 }
@@ -982,7 +993,10 @@ static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
 
 	if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
+		vfio_pci_dma_buf_move(vdev, true);
 		pci_try_reset_function(vdev->pdev);
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
 	}
 }
@@ -28,6 +28,7 @@
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
+#include <linux/pci-p2pdma.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -286,6 +287,8 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
 	 * semaphore.
 	 */
 	vfio_pci_zap_and_down_write_memory_lock(vdev);
+	vfio_pci_dma_buf_move(vdev, true);
+
 	if (vdev->pm_runtime_engaged) {
 		up_write(&vdev->memory_lock);
 		return -EINVAL;
@@ -299,11 +302,9 @@ static int vfio_pci_runtime_pm_entry(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
-static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags,
+static int vfio_pci_core_pm_entry(struct vfio_pci_core_device *vdev, u32 flags,
 				  void __user *arg, size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	int ret;
 
 	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
@@ -320,12 +321,10 @@ static int vfio_pci_core_pm_entry(struct vfio_pci_core_device *vdev, u32 flags,
 }
 
 static int vfio_pci_core_pm_entry_with_wakeup(
-	struct vfio_device *device, u32 flags,
+	struct vfio_pci_core_device *vdev, u32 flags,
 	struct vfio_device_low_power_entry_with_wakeup __user *arg,
 	size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	struct vfio_device_low_power_entry_with_wakeup entry;
 	struct eventfd_ctx *efdctx;
 	int ret;
@@ -373,14 +372,14 @@ static void vfio_pci_runtime_pm_exit(struct vfio_pci_core_device *vdev)
 	 */
 	down_write(&vdev->memory_lock);
 	__vfio_pci_runtime_pm_exit(vdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
 
-static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags,
+static int vfio_pci_core_pm_exit(struct vfio_pci_core_device *vdev, u32 flags,
 				 void __user *arg, size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	int ret;
 
 	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
@@ -695,6 +694,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 #endif
 	vfio_pci_core_disable(vdev);
 
+	vfio_pci_dma_buf_cleanup(vdev);
+
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -1205,7 +1206,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1449,11 +1453,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
 
-static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
-				       uuid_t __user *arg, size_t argsz)
+static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
+				       u32 flags, uuid_t __user *arg,
+				       size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	uuid_t uuid;
 	int ret;
 
@@ -1480,16 +1483,21 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
-		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
+		return vfio_pci_core_pm_entry(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP:
-		return vfio_pci_core_pm_entry_with_wakeup(device, flags,
+		return vfio_pci_core_pm_entry_with_wakeup(vdev, flags,
 							  arg, argsz);
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
-		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
+		return vfio_pci_core_pm_exit(vdev, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
-		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF:
+		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2061,6 +2069,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
@@ -2070,6 +2079,10 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
+	ret = pcim_p2pdma_init(vdev->pdev);
+	if (ret && ret != -EOPNOTSUPP)
+		return ret;
+	INIT_LIST_HEAD(&vdev->dmabufs);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
 
@@ -2434,6 +2447,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 			break;
 		}
 
+		vfio_pci_dma_buf_move(vdev, true);
 		vfio_pci_zap_bars(vdev);
 	}
 
@@ -2462,8 +2476,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 
 err_undo:
 	list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
-					 vdev.dev_set_list)
+					 vdev.dev_set_list) {
+		if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
+	}
 
 	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
 		pm_runtime_put(&vdev->pdev->dev);
@@ -0,0 +1,316 @@ (new file, all lines added; the view is truncated partway through the file)
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
 */
#include <linux/dma-buf-mapping.h>
#include <linux/pci-p2pdma.h>
#include <linux/dma-resv.h>

#include "vfio_pci_priv.h"

MODULE_IMPORT_NS("DMA_BUF");

struct vfio_pci_dma_buf {
	struct dma_buf *dmabuf;
	struct vfio_pci_core_device *vdev;
	struct list_head dmabufs_elm;
	size_t size;
	struct dma_buf_phys_vec *phys_vec;
	struct p2pdma_provider *provider;
	u32 nr_ranges;
	u8 revoked : 1;
};

static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
				   struct dma_buf_attachment *attachment)
{
	struct vfio_pci_dma_buf *priv = dmabuf->priv;

	if (!attachment->peer2peer)
		return -EOPNOTSUPP;

	if (priv->revoked)
		return -ENODEV;

	return 0;
}

static struct sg_table *
vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
		     enum dma_data_direction dir)
{
	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;

	dma_resv_assert_held(priv->dmabuf->resv);

	if (priv->revoked)
		return ERR_PTR(-ENODEV);

	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
				       priv->phys_vec, priv->nr_ranges,
				       priv->size, dir);
}

static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
				   struct sg_table *sgt,
				   enum dma_data_direction dir)
{
	dma_buf_free_sgt(attachment, sgt, dir);
}

static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
{
	struct vfio_pci_dma_buf *priv = dmabuf->priv;

	/*
	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
	 * The refcount prevents both.
	 */
	if (priv->vdev) {
		down_write(&priv->vdev->memory_lock);
		list_del_init(&priv->dmabufs_elm);
		up_write(&priv->vdev->memory_lock);
		vfio_device_put_registration(&priv->vdev->vdev);
	}
	kfree(priv->phys_vec);
	kfree(priv);
}

static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
	.attach = vfio_pci_dma_buf_attach,
	.map_dma_buf = vfio_pci_dma_buf_map,
	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
	.release = vfio_pci_dma_buf_release,
};

int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
				struct vfio_region_dma_range *dma_ranges,
				size_t nr_ranges, phys_addr_t start,
				phys_addr_t len)
{
	phys_addr_t max_addr;
	unsigned int i;

	max_addr = start + len;
	for (i = 0; i < nr_ranges; i++) {
		phys_addr_t end;

		if (!dma_ranges[i].length)
			return -EINVAL;

		if (check_add_overflow(start, dma_ranges[i].offset,
				       &phys_vec[i].paddr) ||
		    check_add_overflow(phys_vec[i].paddr,
				       dma_ranges[i].length, &end))
			return -EOVERFLOW;
		if (end > max_addr)
			return -EINVAL;

		phys_vec[i].len = dma_ranges[i].length;
	}
	return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_fill_phys_vec);

int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev,
				  struct p2pdma_provider **provider,
				  unsigned int region_index,
				  struct dma_buf_phys_vec *phys_vec,
				  struct vfio_region_dma_range *dma_ranges,
				  size_t nr_ranges)
{
	struct pci_dev *pdev = vdev->pdev;

	*provider = pcim_p2pdma_provider(pdev, region_index);
	if (!*provider)
		return -EINVAL;

	return vfio_pci_core_fill_phys_vec(
		phys_vec, dma_ranges, nr_ranges,
		pci_resource_start(pdev, region_index),
		pci_resource_len(pdev, region_index));
}
EXPORT_SYMBOL_GPL(vfio_pci_core_get_dmabuf_phys);

static int validate_dmabuf_input(struct vfio_device_feature_dma_buf *dma_buf,
				 struct vfio_region_dma_range *dma_ranges,
				 size_t *lengthp)
{
	size_t length = 0;
	u32 i;

	for (i = 0; i < dma_buf->nr_ranges; i++) {
		u64 offset = dma_ranges[i].offset;
		u64 len = dma_ranges[i].length;

		if (!len || !PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
			return -EINVAL;

		if (check_add_overflow(length, len, &length))
			return -EINVAL;
	}

	/*
	 * dma_iova_try_alloc() will WARN on if userspace proposes a size that
	 * is too big, eg with lots of ranges.
	 */
	if ((u64)(length) & DMA_IOVA_USE_SWIOTLB)
		return -EINVAL;

	*lengthp = length;
	return 0;
}

int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
				  struct vfio_device_feature_dma_buf __user *arg,
				  size_t argsz)
{
	struct vfio_device_feature_dma_buf get_dma_buf = {};
	struct vfio_region_dma_range *dma_ranges;
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
	struct vfio_pci_dma_buf *priv;
	size_t length;
	int ret;

	if (!vdev->pci_ops || !vdev->pci_ops->get_dmabuf_phys)
		return -EOPNOTSUPP;

	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
				 sizeof(get_dma_buf));
	if (ret != 1)
		return ret;

	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
		return -EFAULT;

	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
		return -EINVAL;

	/*
	 * For PCI the region_index is the BAR number like everything else.
	 */
	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
		return -ENODEV;

	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
				       sizeof(*dma_ranges));
	if (IS_ERR(dma_ranges))
		return PTR_ERR(dma_ranges);

	ret = validate_dmabuf_input(&get_dma_buf, dma_ranges, &length);
	if (ret)
		goto err_free_ranges;

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv) {
		ret = -ENOMEM;
		goto err_free_ranges;
	}
	priv->phys_vec = kcalloc(get_dma_buf.nr_ranges, sizeof(*priv->phys_vec),
				 GFP_KERNEL);
	if (!priv->phys_vec) {
		ret = -ENOMEM;
		goto err_free_priv;
	}

	priv->vdev = vdev;
	priv->nr_ranges = get_dma_buf.nr_ranges;
	priv->size = length;
	ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
					     get_dma_buf.region_index,
					     priv->phys_vec, dma_ranges,
					     priv->nr_ranges);
	if (ret)
		goto err_free_phys;

	kfree(dma_ranges);
	dma_ranges = NULL;

	if (!vfio_device_try_get_registration(&vdev->vdev)) {
		ret = -ENODEV;
		goto err_free_phys;
	}

	exp_info.ops = &vfio_pci_dmabuf_ops;
	exp_info.size = priv->size;
	exp_info.flags = get_dma_buf.open_flags;
|
||||||
|
exp_info.priv = priv;
|
||||||
|
|
||||||
|
priv->dmabuf = dma_buf_export(&exp_info);
|
||||||
|
if (IS_ERR(priv->dmabuf)) {
|
||||||
|
ret = PTR_ERR(priv->dmabuf);
|
||||||
|
goto err_dev_put;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* dma_buf_put() now frees priv */
|
||||||
|
INIT_LIST_HEAD(&priv->dmabufs_elm);
|
||||||
|
down_write(&vdev->memory_lock);
|
||||||
|
dma_resv_lock(priv->dmabuf->resv, NULL);
|
||||||
|
priv->revoked = !__vfio_pci_memory_enabled(vdev);
|
||||||
|
list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
|
||||||
|
dma_resv_unlock(priv->dmabuf->resv);
|
||||||
|
up_write(&vdev->memory_lock);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* dma_buf_fd() consumes the reference, when the file closes the dmabuf
|
||||||
|
* will be released.
|
||||||
|
*/
|
||||||
|
ret = dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
|
||||||
|
if (ret < 0)
|
||||||
|
goto err_dma_buf;
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
err_dma_buf:
|
||||||
|
dma_buf_put(priv->dmabuf);
|
||||||
|
err_dev_put:
|
||||||
|
vfio_device_put_registration(&vdev->vdev);
|
||||||
|
err_free_phys:
|
||||||
|
kfree(priv->phys_vec);
|
||||||
|
err_free_priv:
|
||||||
|
kfree(priv);
|
||||||
|
err_free_ranges:
|
||||||
|
kfree(dma_ranges);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
|
||||||
|
{
|
||||||
|
struct vfio_pci_dma_buf *priv;
|
||||||
|
struct vfio_pci_dma_buf *tmp;
|
||||||
|
|
||||||
|
lockdep_assert_held_write(&vdev->memory_lock);
|
||||||
|
|
||||||
|
list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
|
||||||
|
if (!get_file_active(&priv->dmabuf->file))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
if (priv->revoked != revoked) {
|
||||||
|
dma_resv_lock(priv->dmabuf->resv, NULL);
|
||||||
|
priv->revoked = revoked;
|
||||||
|
dma_buf_move_notify(priv->dmabuf);
|
||||||
|
dma_resv_unlock(priv->dmabuf->resv);
|
||||||
|
}
|
||||||
|
fput(priv->dmabuf->file);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
|
||||||
|
{
|
||||||
|
struct vfio_pci_dma_buf *priv;
|
||||||
|
struct vfio_pci_dma_buf *tmp;
|
||||||
|
|
||||||
|
down_write(&vdev->memory_lock);
|
||||||
|
list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
|
||||||
|
if (!get_file_active(&priv->dmabuf->file))
|
||||||
|
continue;
|
||||||
|
|
||||||
|
dma_resv_lock(priv->dmabuf->resv, NULL);
|
||||||
|
list_del_init(&priv->dmabufs_elm);
|
||||||
|
priv->vdev = NULL;
|
||||||
|
priv->revoked = true;
|
||||||
|
dma_buf_move_notify(priv->dmabuf);
|
||||||
|
dma_resv_unlock(priv->dmabuf->resv);
|
||||||
|
vfio_device_put_registration(&vdev->vdev);
|
||||||
|
fput(priv->dmabuf->file);
|
||||||
|
}
|
||||||
|
up_write(&vdev->memory_lock);
|
||||||
|
}
|
||||||
|
|
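The revoke paths above lean on the dynamic dma-buf importer contract: an importer that sets allow_peer2peer must provide a move_notify() callback and drop its mapping when the exporter signals a move, then re-map on next use. A minimal importer-side sketch of that contract, not part of this series; the my_importer names and the quiescing step are illustrative assumptions:

/* Hypothetical importer context; only the move_notify wiring is shown. */
struct my_importer {
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
};

/* Called by dma_buf_move_notify() with the dma_resv lock already held. */
static void my_importer_move_notify(struct dma_buf_attachment *attach)
{
	struct my_importer *imp = attach->importer_priv;

	if (!imp->sgt)
		return;
	/* Quiesce device DMA first (device specific), then drop the mapping. */
	dma_buf_unmap_attachment(attach, imp->sgt, DMA_BIDIRECTIONAL);
	imp->sgt = NULL;
	/* Re-map lazily on next use; mapping fails while the BAR is revoked. */
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
	.allow_peer2peer = true,	/* required: vfio only exports MMIO */
	.move_notify = my_importer_move_notify,
};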
@@ -107,4 +107,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
}

#ifdef CONFIG_VFIO_PCI_DMABUF
int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
				  struct vfio_device_feature_dma_buf __user *arg,
				  size_t argsz);
void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
#else
static inline int
vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
			      struct vfio_device_feature_dma_buf __user *arg,
			      size_t argsz)
{
	return -ENOTTY;
}
static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
{
}
static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
					 bool revoked)
{
}
#endif

#endif
@@ -172,11 +172,13 @@ void vfio_device_put_registration(struct vfio_device *device)
	if (refcount_dec_and_test(&device->refcount))
		complete(&device->comp);
}
EXPORT_SYMBOL_GPL(vfio_device_put_registration);

bool vfio_device_try_get_registration(struct vfio_device *device)
{
	return refcount_inc_not_zero(&device->refcount);
}
EXPORT_SYMBOL_GPL(vfio_device_try_get_registration);

/*
 * VFIO driver API
@@ -0,0 +1,17 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * DMA BUF Mapping Helpers
 *
 */
#ifndef __DMA_BUF_MAPPING_H__
#define __DMA_BUF_MAPPING_H__
#include <linux/dma-buf.h>

struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
					 struct p2pdma_provider *provider,
					 struct dma_buf_phys_vec *phys_vec,
					 size_t nr_ranges, size_t size,
					 enum dma_data_direction dir);
void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt,
		      enum dma_data_direction dir);
#endif
@@ -22,6 +22,7 @@
#include <linux/fs.h>
#include <linux/dma-fence.h>
#include <linux/wait.h>
#include <linux/pci-p2pdma.h>

struct device;
struct dma_buf;

@@ -530,6 +531,16 @@ struct dma_buf_export_info {
	void *priv;
};

/**
 * struct dma_buf_phys_vec - describe continuous chunk of memory
 * @paddr: physical address of that chunk
 * @len: Length of this chunk
 */
struct dma_buf_phys_vec {
	phys_addr_t paddr;
	size_t len;
};

/**
 * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters
 * @name: export-info name
@@ -16,7 +16,58 @@
struct block_device;
struct scatterlist;

/**
 * struct p2pdma_provider
 *
 * A p2pdma provider is a range of MMIO address space available to the CPU.
 */
struct p2pdma_provider {
	struct device *owner;
	u64 bus_offset;
};

enum pci_p2pdma_map_type {
	/*
	 * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before
	 * the mapping type has been calculated. Exported routines for the API
	 * will never return this value.
	 */
	PCI_P2PDMA_MAP_UNKNOWN = 0,

	/*
	 * Not a PCI P2PDMA transfer.
	 */
	PCI_P2PDMA_MAP_NONE,

	/*
	 * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
	 * traverse the host bridge and the host bridge is not in the
	 * allowlist. DMA Mapping routines should return an error when
	 * this is returned.
	 */
	PCI_P2PDMA_MAP_NOT_SUPPORTED,

	/*
	 * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to
	 * each other directly through a PCI switch and the transaction will
	 * not traverse the host bridge. Such a mapping should program
	 * the DMA engine with PCI bus addresses.
	 */
	PCI_P2PDMA_MAP_BUS_ADDR,

	/*
	 * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
	 * to each other, but the transaction traverses a host bridge on the
	 * allowlist. In this case, a normal mapping either with CPU physical
	 * addresses (in the case of dma-direct) or IOVA addresses (in the
	 * case of IOMMUs) should be used to program the DMA engine.
	 */
	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
};

#ifdef CONFIG_PCI_P2PDMA
int pcim_p2pdma_init(struct pci_dev *pdev);
struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar);
int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
		u64 offset);
int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,

@@ -33,7 +84,18 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
			    bool *use_p2pdma);
ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
			       bool use_p2pdma);
enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
					     struct device *dev);
#else /* CONFIG_PCI_P2PDMA */
static inline int pcim_p2pdma_init(struct pci_dev *pdev)
{
	return -EOPNOTSUPP;
}
static inline struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev,
							   int bar)
{
	return NULL;
}
static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
					  size_t size, u64 offset)
{

@@ -85,6 +147,11 @@ static inline ssize_t pci_p2pdma_enable_show(char *page,
{
	return sprintf(page, "none\n");
}
static inline enum pci_p2pdma_map_type
pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev)
{
	return PCI_P2PDMA_MAP_NOT_SUPPORTED;
}
#endif /* CONFIG_PCI_P2PDMA */
@@ -99,51 +166,12 @@ static inline struct pci_dev *pci_p2pmem_find(struct device *client)
 	return pci_p2pmem_find_many(&client, 1);
 }
 
-enum pci_p2pdma_map_type {
-	/*
-	 * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before
-	 * the mapping type has been calculated. Exported routines for the API
-	 * will never return this value.
-	 */
-	PCI_P2PDMA_MAP_UNKNOWN = 0,
-
-	/*
-	 * Not a PCI P2PDMA transfer.
-	 */
-	PCI_P2PDMA_MAP_NONE,
-
-	/*
-	 * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
-	 * traverse the host bridge and the host bridge is not in the
-	 * allowlist. DMA Mapping routines should return an error when
-	 * this is returned.
-	 */
-	PCI_P2PDMA_MAP_NOT_SUPPORTED,
-
-	/*
-	 * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to
-	 * each other directly through a PCI switch and the transaction will
-	 * not traverse the host bridge. Such a mapping should program
-	 * the DMA engine with PCI bus addresses.
-	 */
-	PCI_P2PDMA_MAP_BUS_ADDR,
-
-	/*
-	 * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
-	 * to each other, but the transaction traverses a host bridge on the
-	 * allowlist. In this case, a normal mapping either with CPU physical
-	 * addresses (in the case of dma-direct) or IOVA addresses (in the
-	 * case of IOMMUs) should be used to program the DMA engine.
-	 */
-	PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma_map_state {
-	struct dev_pagemap *pgmap;
+	struct p2pdma_provider *mem;
 	enum pci_p2pdma_map_type map;
-	u64 bus_off;
 };
 
 /* helper for pci_p2pdma_state(), do not use directly */
 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
 			       struct device *dev, struct page *page);

@@ -162,7 +190,6 @@ pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev,
 		 struct page *page)
 {
 	if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
-		if (state->pgmap != page_pgmap(page))
 			__pci_p2pdma_update_state(state, dev, page);
 		return state->map;
 	}

@@ -172,16 +199,15 @@ pci_p2pdma_state(struct pci_p2pdma_map_state *state, struct device *dev,
 /**
  * pci_p2pdma_bus_addr_map - Translate a physical address to a bus address
  * for a PCI_P2PDMA_MAP_BUS_ADDR transfer.
- * @state: P2P state structure
+ * @provider: P2P provider structure
  * @paddr: physical address to map
  *
  * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer.
  */
 static inline dma_addr_t
-pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t paddr)
+pci_p2pdma_bus_addr_map(struct p2pdma_provider *provider, phys_addr_t paddr)
 {
-	WARN_ON_ONCE(state->map != PCI_P2PDMA_MAP_BUS_ADDR);
-	return paddr + state->bus_off;
+	return paddr + provider->bus_offset;
 }
 
 #endif /* _LINUX_PCI_P2P_H */
@@ -301,6 +301,8 @@ static inline void vfio_put_device(struct vfio_device *device)
int vfio_register_group_dev(struct vfio_device *device);
int vfio_register_emulated_iommu_dev(struct vfio_device *device);
void vfio_unregister_group_dev(struct vfio_device *device);
bool vfio_device_try_get_registration(struct vfio_device *device);
void vfio_device_put_registration(struct vfio_device *device);

int vfio_assign_device_set(struct vfio_device *device, void *set_id);
unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
@@ -26,6 +26,8 @@

struct vfio_pci_core_device;
struct vfio_pci_region;
struct p2pdma_provider;
struct dma_buf_phys_vec;

struct vfio_pci_regops {
	ssize_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,

@@ -49,9 +51,48 @@ struct vfio_pci_region {
	u32 flags;
};

struct vfio_pci_device_ops {
	int (*get_dmabuf_phys)(struct vfio_pci_core_device *vdev,
			       struct p2pdma_provider **provider,
			       unsigned int region_index,
			       struct dma_buf_phys_vec *phys_vec,
			       struct vfio_region_dma_range *dma_ranges,
			       size_t nr_ranges);
};

#if IS_ENABLED(CONFIG_VFIO_PCI_DMABUF)
int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
				struct vfio_region_dma_range *dma_ranges,
				size_t nr_ranges, phys_addr_t start,
				phys_addr_t len);
int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev,
				  struct p2pdma_provider **provider,
				  unsigned int region_index,
				  struct dma_buf_phys_vec *phys_vec,
				  struct vfio_region_dma_range *dma_ranges,
				  size_t nr_ranges);
#else
static inline int
vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
			    struct vfio_region_dma_range *dma_ranges,
			    size_t nr_ranges, phys_addr_t start,
			    phys_addr_t len)
{
	return -EINVAL;
}
static inline int vfio_pci_core_get_dmabuf_phys(
	struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider,
	unsigned int region_index, struct dma_buf_phys_vec *phys_vec,
	struct vfio_region_dma_range *dma_ranges, size_t nr_ranges)
{
	return -EOPNOTSUPP;
}
#endif

struct vfio_pci_core_device {
	struct vfio_device	vdev;
	struct pci_dev		*pdev;
	const struct vfio_pci_device_ops *pci_ops;
	void __iomem		*barmap[PCI_STD_NUM_BARS];
	bool			bar_mmap_supported[PCI_STD_NUM_BARS];
	u8			*pci_config_map;

@@ -94,6 +135,7 @@ struct vfio_pci_core_device {
	struct vfio_pci_core_device	*sriov_pf_core_dev;
	struct notifier_block	nb;
	struct rw_semaphore	memory_lock;
	struct list_head	dmabufs;
};

/* Will be exported for vfio pci drivers usage */
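The vfio_pci_device_ops hook above is how a vfio-pci variant driver opts into dma-buf export; for regions that map 1:1 onto PCI BARs it can simply forward to the exported core helper. A hedged wiring sketch follows; the my_variant names are illustrative, not part of this series:

/* Hypothetical variant driver; only the dma-buf wiring is shown. */
static const struct vfio_pci_device_ops my_variant_pci_ops = {
	/* BARs map directly to PCI resources, so the core helper suffices. */
	.get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys,
};

static int my_variant_init_dev(struct vfio_device *core_vdev)
{
	struct vfio_pci_core_device *vdev =
		container_of(core_vdev, struct vfio_pci_core_device, vdev);
	int ret;

	ret = vfio_pci_core_init_dev(core_vdev);
	if (ret)
		return ret;

	/* Opt in to VFIO_DEVICE_FEATURE_DMA_BUF for this device. */
	vdev->pci_ops = &my_variant_pci_ops;
	return 0;
}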
@@ -14,6 +14,7 @@

#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/stddef.h>

#define VFIO_API_VERSION 0

@@ -1478,6 +1479,33 @@ struct vfio_device_feature_bus_master {
};
#define VFIO_DEVICE_FEATURE_BUS_MASTER 10

/**
 * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
 * regions selected.
 *
 * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
 * etc. offset/length specify a slice of the region to create the dmabuf from.
 * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
 *
 * flags should be 0.
 *
 * Return: The fd number on success, -1 and errno is set on failure.
 */
#define VFIO_DEVICE_FEATURE_DMA_BUF 11

struct vfio_region_dma_range {
	__u64 offset;
	__u64 length;
};

struct vfio_device_feature_dma_buf {
	__u32 region_index;
	__u32 open_flags;
	__u32 flags;
	__u32 nr_ranges;
	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
};
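A rough userspace sketch of how this feature might be exercised, not part of the patch: the generic struct vfio_device_feature header carries argsz and the GET flag, and the dma_buf payload follows it in the same buffer. The get_bar0_dmabuf helper name and the 1MiB slice of BAR 0 are illustrative assumptions.

/* Userspace sketch (illustrative only): export the first 1MiB of BAR 0. */
#include <linux/vfio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>

static int get_bar0_dmabuf(int device_fd)
{
	size_t sz = sizeof(struct vfio_device_feature) +
		    sizeof(struct vfio_device_feature_dma_buf) +
		    sizeof(struct vfio_region_dma_range);
	struct vfio_device_feature *feature = calloc(1, sz);
	struct vfio_device_feature_dma_buf *get_dma_buf;
	struct vfio_region_dma_range *range;
	int fd;

	feature->argsz = sz;
	feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;

	get_dma_buf = (void *)feature->data;
	get_dma_buf->region_index = VFIO_PCI_BAR0_REGION_INDEX;
	get_dma_buf->open_flags = O_RDWR | O_CLOEXEC;
	get_dma_buf->nr_ranges = 1;

	range = get_dma_buf->dma_ranges;
	range->offset = 0;
	range->length = 1024 * 1024;	/* must be page aligned */

	/* Returns the new dmabuf fd on success, -1 with errno on failure. */
	fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feature);
	free(feature);
	return fd;
}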
/* -------- API for Type1 VFIO IOMMU -------- */

/**
@@ -479,8 +479,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
 		}
 		break;
 	case PCI_P2PDMA_MAP_BUS_ADDR:
-		sg->dma_address = pci_p2pdma_bus_addr_map(&p2pdma_state,
-							  sg_phys(sg));
+		sg->dma_address = pci_p2pdma_bus_addr_map(
+			p2pdma_state.mem, sg_phys(sg));
 		sg_dma_mark_bus_address(sg);
 		continue;
 	default:

mm/hmm.c
@@ -811,7 +811,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
 		break;
 	case PCI_P2PDMA_MAP_BUS_ADDR:
 		pfns[idx] |= HMM_PFN_P2PDMA_BUS | HMM_PFN_DMA_MAPPED;
-		return pci_p2pdma_bus_addr_map(p2pdma_state, paddr);
+		return pci_p2pdma_bus_addr_map(p2pdma_state->mem, paddr);
 	default:
 		return DMA_MAPPING_ERROR;
 	}