linux/drivers
Ming Lei 396eaf21ee blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback
blk_insert_cloned_request() is called in the fast path of a dm-rq driver
(e.g. blk-mq request-based DM mpath).  blk_insert_cloned_request() uses
blk_mq_request_bypass_insert() to directly append the request to the
blk-mq hctx->dispatch_list of the underlying queue.

1) This is inefficient because the hctx spinlock is always taken.

2) With blk_insert_cloned_request(), we completely bypass underlying
queue's elevator and depend on the upper-level dm-rq driver's elevator
to schedule IO.  But dm-rq currently can't get the underlying queue's
dispatch feedback at all.  Without knowing whether a request was issued
or not (e.g. due to underlying queue being busy) the dm-rq elevator will
not be able to provide effective IO merging (as a side-effect of dm-rq
currently blindly destaging a request from its elevator only to requeue
it after a delay, which kills any opportunity for merging).  This
obviously causes very bad sequential IO performance.

Fix this by updating blk_insert_cloned_request() to use
blk_mq_request_direct_issue().  blk_mq_request_direct_issue() allows a
request to be issued directly to the underlying queue and returns the
dispatch feedback (blk_status_t).  If blk_mq_request_direct_issue()
returns BLK_STS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE
to _not_ destage the request, thereby preserving the opportunity to
merge IO.
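
The feedback loop described above can be illustrated with a small userspace
model.  This is only a sketch, not the kernel code: every toy_* name is
hypothetical, the lower queue is reduced to a depth counter, and merging is
reduced to a simple back-merge of contiguous sectors.  It assumes that a
"busy" status makes the upper layer keep the request so that a later
contiguous request can still be merged into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for blk_status_t values (illustrative names only). */
enum toy_status { TOY_STS_OK, TOY_STS_RESOURCE };

/* Toy stand-ins for the dm-rq map results (illustrative names only). */
enum toy_mapio { TOY_MAPIO_REMAPPED, TOY_MAPIO_REQUEUE };

/* Hypothetical lower queue: accepts at most `depth` in-flight requests. */
struct toy_queue {
	unsigned int depth;
	unsigned int in_flight;
};

struct toy_request {
	unsigned long sector;
	unsigned int nr_sectors;
};

/* Model of direct issue: tell the caller whether the queue took the request. */
static enum toy_status toy_direct_issue(struct toy_queue *q,
					const struct toy_request *rq)
{
	assert(q != NULL && rq != NULL);
	if (q->in_flight >= q->depth)
		return TOY_STS_RESOURCE;	/* busy: caller keeps the request */
	q->in_flight++;
	return TOY_STS_OK;
}

/* Model of the upper layer: a busy lower queue means "requeue, do not destage". */
static enum toy_mapio toy_map_request(struct toy_queue *q,
				      const struct toy_request *rq)
{
	if (toy_direct_issue(q, rq) == TOY_STS_RESOURCE)
		return TOY_MAPIO_REQUEUE;
	return TOY_MAPIO_REMAPPED;
}

/* Back-merge `next` into `rq` when the two are contiguous on disk. */
static bool toy_try_back_merge(struct toy_request *rq,
			       const struct toy_request *next)
{
	if (rq->sector + rq->nr_sectors != next->sector)
		return false;
	rq->nr_sectors += next->nr_sectors;
	return true;
}

/* Model of a completion freeing one slot in the lower queue. */
static void toy_complete_one(struct toy_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}
```

In this model, a request that gets TOY_MAPIO_REQUEUE stays with the upper
layer, so toy_try_back_merge() can still grow it before the next issue
attempt.  A request that had already been handed down could not be merged.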

With this, request-based DM's blk-mq sequential IO performance is vastly
improved (as much as 3X in mpath/virtio-scsi testing).

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[blk-mq.c changes heavily influenced by Ming Lei's initial solution, but
they were refactored to make them less fragile and easier to read/review]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17 09:46:54 -07:00
..
accessibility
acpi PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume() 2017-12-11 14:32:56 +01:00
amba
android
ata libata: sata_down_spd_limit should return if driver has not recorded sstatus speed 2017-12-04 13:57:03 -08:00
atm
auxdisplay
base PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume() 2017-12-11 14:32:56 +01:00
bcma
block aoe: use ktime_t instead of timeval 2018-01-17 08:41:07 -07:00
bluetooth
bus bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left. 2017-12-04 17:15:20 +00:00
cdrom
char The big changes for IPMI that just went in had a few problems. These 2017-12-11 17:01:59 -08:00
clk
clocksource
connector
cpufreq
cpuidle
crypto
dax
dca
devfreq
dio
dma dmaengine: fsl-edma: disable clks on all error paths 2017-12-15 09:53:04 +05:30
dma-buf
edac
eisa
extcon
firewire
firmware ARM: SoC fixes for 4.15-rc 2017-12-10 08:26:59 -08:00
fmc
fpga
fsi
gpio gpio: pca953x: fix vendor prefix for PCA9654 2017-12-02 22:41:43 +01:00
gpu Merge branch 'akpm' (patches from Andrew) 2017-12-14 16:35:20 -08:00
hid
hsi
hv
hwmon hwmon: (jc42) optionally try to disable the SMBUS timeout 2017-11-30 13:12:44 -08:00
hwspinlock
hwtracing tracing: Pass export pointer as argument to ->write() 2017-12-04 07:14:30 -05:00
i2c i2c: piix4: Fix port number check on release 2017-12-12 23:27:04 +01:00
ide
idle
iio iio: health: max30102: Temperature should be in milli Celsius 2017-12-02 11:15:14 +00:00
infiniband Second pull request for 4.15-rc 2017-12-16 13:43:08 -08:00
input
iommu IOMMU fixes for Linux v4.15-rc3 2017-12-06 10:53:02 -08:00
ipack
irqchip
isdn
leds
lightnvm lightnvm: pblk: refactor pblk_ppa_comp function 2018-01-05 08:50:12 -07:00
macintosh
mailbox
mcb
md blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback 2018-01-17 09:46:54 -07:00
media media fixes for v4.15-rc3 2017-12-08 13:18:47 -08:00
memory
memstick
message
mfd
misc Merge branch 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-12-17 13:54:31 -08:00
mmc mmc: core: apply NO_CMD23 quirk to some specific cards 2017-12-11 13:43:27 +01:00
mtd
mux
net net: qcom/emac: Reduce timeout for mdio read/write 2017-12-15 15:46:19 -05:00
nfc
ntb
nubus
nvdimm
nvme nvme/multipath: Use blk_path_error 2018-01-10 10:52:18 -07:00
nvmem
of of_mdio / mdiobus: ensure mdio devices have fwnode correctly populated 2017-12-13 15:01:47 -05:00
opp
oprofile
parisc
parport
pci Power management fix for v4.15-rc4 2017-12-14 18:25:03 -08:00
pcmcia
perf
phy
pinctrl pinctrl: sunxi: Disable strict mode for H5 driver 2017-11-30 16:50:43 +01:00
platform platform/x86: dell-wmi: check for kmalloc() errors 2017-12-11 17:26:03 -08:00
pnp
power
powercap
pps
ps3
ptp
pwm
rapidio
ras
regulator
remoteproc
reset
rpmsg
rtc
s390 s390/qeth: update takeover IPs after configuration change 2017-12-15 11:29:43 -05:00
sbus
scsi SCSI fixes on 20171215 2017-12-15 12:51:42 -08:00
sfi
sh
sn
soc meson-gx-socinfo: Fix package id parsing 2017-11-30 15:29:44 -08:00
spi
spmi
ssb
staging Merge tag 'staging-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging 2017-12-15 12:59:48 -08:00
target target: Use sgl_alloc_order() and sgl_free() 2018-01-06 09:18:00 -07:00
tc
tee
thermal
thunderbolt
tty
uio
usb USB: core: prevent malicious bNumInterfaces overflow 2017-12-13 12:28:43 +01:00
uwb
vfio
vhost vhost: fix skb leak in handle_rx() 2017-12-02 21:31:03 -05:00
video
virt
virtio virtio_mmio: fix devm cleanup 2017-12-14 21:01:40 +02:00
vlynq
vme
w1
watchdog
xen xen: fixes for 4.15-rc4 2017-12-15 11:32:09 -08:00
zorro
Kconfig
Makefile