mirror of https://github.com/torvalds/linux.git
Pinning one 4K page at a time is inefficient, so do it in batches of 512
instead. This is just an optimization with no functional change
intended, and in particular the driver still calls iommu_map() with the
largest physically contiguous range possible.
Add two fields in vfio_batch to remember where to start between calls to
vfio_pin_pages_remote(), and use vfio_batch_unpin() to handle remaining
pages in the batch in case of error.
qemu pins pages for guests around 8% faster on my test system, a
two-node Broadwell server with 128G memory per node. The qemu process
was bound to one node with its allocations constrained there as well.
base test
guest ---------------- ----------------
mem (GB) speedup avg sec (std) avg sec (std)
1 7.4% 0.61 (0.00) 0.56 (0.00)
2 8.3% 0.93 (0.00) 0.85 (0.00)
4 8.4% 1.46 (0.00) 1.34 (0.00)
8 8.6% 2.54 (0.01) 2.32 (0.00)
16 8.3% 4.66 (0.00) 4.27 (0.01)
32 8.3% 8.94 (0.01) 8.20 (0.01)
64 8.2% 17.47 (0.01) 16.04 (0.03)
120 8.5% 32.45 (0.13) 29.69 (0.01)
perf diff confirms less time spent in pup. Here are the top ten
functions:
Baseline Delta Abs Symbol
78.63% +6.64% clear_page_erms
1.50% -1.50% __gup_longterm_locked
1.27% -0.78% __get_user_pages
+0.76% kvm_zap_rmapp.constprop.0
0.54% -0.53% vmacache_find
0.55% -0.51% get_pfnblock_flags_mask
0.48% -0.48% __get_user_pages_remote
+0.39% slot_rmap_walk_next
+0.32% vfio_pin_map_dma
+0.26% kvm_handle_hva_range
...
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
||
|---|---|---|
| .. | ||
| accessibility | ||
| acpi | ||
| amba | ||
| android | ||
| ata | ||
| atm | ||
| auxdisplay | ||
| base | ||
| bcma | ||
| block | ||
| bluetooth | ||
| bus | ||
| cdrom | ||
| char | ||
| clk | ||
| clocksource | ||
| connector | ||
| counter | ||
| cpufreq | ||
| cpuidle | ||
| crypto | ||
| dax | ||
| dca | ||
| devfreq | ||
| dio | ||
| dma | ||
| dma-buf | ||
| edac | ||
| eisa | ||
| extcon | ||
| firewire | ||
| firmware | ||
| fpga | ||
| fsi | ||
| gnss | ||
| gpio | ||
| gpu | ||
| greybus | ||
| hid | ||
| hsi | ||
| hv | ||
| hwmon | ||
| hwspinlock | ||
| hwtracing | ||
| i2c | ||
| i3c | ||
| ide | ||
| idle | ||
| iio | ||
| infiniband | ||
| input | ||
| interconnect | ||
| iommu | ||
| ipack | ||
| irqchip | ||
| isdn | ||
| leds | ||
| lightnvm | ||
| macintosh | ||
| mailbox | ||
| mcb | ||
| md | ||
| media | ||
| memory | ||
| memstick | ||
| message | ||
| mfd | ||
| misc | ||
| mmc | ||
| most | ||
| mtd | ||
| mux | ||
| net | ||
| nfc | ||
| ntb | ||
| nubus | ||
| nvdimm | ||
| nvme | ||
| nvmem | ||
| of | ||
| opp | ||
| parisc | ||
| parport | ||
| pci | ||
| pcmcia | ||
| perf | ||
| phy | ||
| pinctrl | ||
| platform | ||
| pnp | ||
| power | ||
| powercap | ||
| pps | ||
| ps3 | ||
| ptp | ||
| pwm | ||
| rapidio | ||
| ras | ||
| regulator | ||
| remoteproc | ||
| reset | ||
| rpmsg | ||
| rtc | ||
| s390 | ||
| sbus | ||
| scsi | ||
| sfi | ||
| sh | ||
| siox | ||
| slimbus | ||
| soc | ||
| soundwire | ||
| spi | ||
| spmi | ||
| ssb | ||
| staging | ||
| target | ||
| tc | ||
| tee | ||
| thermal | ||
| thunderbolt | ||
| tty | ||
| uio | ||
| usb | ||
| vdpa | ||
| vfio | ||
| vhost | ||
| video | ||
| virt | ||
| virtio | ||
| visorbus | ||
| vlynq | ||
| vme | ||
| w1 | ||
| watchdog | ||
| xen | ||
| zorro | ||
| Kconfig | ||
| Makefile | ||