mirror of https://github.com/torvalds/linux.git
blk_insert_cloned_request() is called in the fast path of a dm-rq driver (e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses blk_mq_request_bypass_insert() to directly append the request to the blk-mq hctx->dispatch_list of the underlying queue. 1) This way isn't efficient enough because the hctx spinlock is always used. 2) With blk_insert_cloned_request(), we completely bypass underlying queue's elevator and depend on the upper-level dm-rq driver's elevator to schedule IO. But dm-rq currently can't get the underlying queue's dispatch feedback at all. Without knowing whether a request was issued or not (e.g. due to underlying queue being busy) the dm-rq elevator will not be able to provide effective IO merging (as a side-effect of dm-rq currently blindly destaging a request from its elevator only to requeue it after a delay, which kills any opportunity for merging). This obviously causes very bad sequential IO performance. Fix this by updating blk_insert_cloned_request() to use blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a request to be issued directly to the underlying queue and returns the dispatch feedback (blk_status_t). If blk_mq_request_direct_issue() returns BLK_SYS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE to _not_ destage the request. Whereby preserving the opportunity to merge IO. With this, request-based DM's blk-mq sequential IO performance is vastly improved (as much as 3X in mpath/virtio-scsi testing). Signed-off-by: Ming Lei <ming.lei@redhat.com> [blk-mq.c changes heavily influenced by Ming Lei's initial solution, but they were refactored to make them less fragile and easier to read/review] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|---|---|---|
| .. | ||
| accessibility | ||
| acpi | ||
| amba | ||
| android | ||
| ata | ||
| atm | ||
| auxdisplay | ||
| base | ||
| bcma | ||
| block | ||
| bluetooth | ||
| bus | ||
| cdrom | ||
| char | ||
| clk | ||
| clocksource | ||
| connector | ||
| cpufreq | ||
| cpuidle | ||
| crypto | ||
| dax | ||
| dca | ||
| devfreq | ||
| dio | ||
| dma | ||
| dma-buf | ||
| edac | ||
| eisa | ||
| extcon | ||
| firewire | ||
| firmware | ||
| fmc | ||
| fpga | ||
| fsi | ||
| gpio | ||
| gpu | ||
| hid | ||
| hsi | ||
| hv | ||
| hwmon | ||
| hwspinlock | ||
| hwtracing | ||
| i2c | ||
| ide | ||
| idle | ||
| iio | ||
| infiniband | ||
| input | ||
| iommu | ||
| ipack | ||
| irqchip | ||
| isdn | ||
| leds | ||
| lightnvm | ||
| macintosh | ||
| mailbox | ||
| mcb | ||
| md | ||
| media | ||
| memory | ||
| memstick | ||
| message | ||
| mfd | ||
| misc | ||
| mmc | ||
| mtd | ||
| mux | ||
| net | ||
| nfc | ||
| ntb | ||
| nubus | ||
| nvdimm | ||
| nvme | ||
| nvmem | ||
| of | ||
| opp | ||
| oprofile | ||
| parisc | ||
| parport | ||
| pci | ||
| pcmcia | ||
| perf | ||
| phy | ||
| pinctrl | ||
| platform | ||
| pnp | ||
| power | ||
| powercap | ||
| pps | ||
| ps3 | ||
| ptp | ||
| pwm | ||
| rapidio | ||
| ras | ||
| regulator | ||
| remoteproc | ||
| reset | ||
| rpmsg | ||
| rtc | ||
| s390 | ||
| sbus | ||
| scsi | ||
| sfi | ||
| sh | ||
| sn | ||
| soc | ||
| spi | ||
| spmi | ||
| ssb | ||
| staging | ||
| target | ||
| tc | ||
| tee | ||
| thermal | ||
| thunderbolt | ||
| tty | ||
| uio | ||
| usb | ||
| uwb | ||
| vfio | ||
| vhost | ||
| video | ||
| virt | ||
| virtio | ||
| vlynq | ||
| vme | ||
| w1 | ||
| watchdog | ||
| xen | ||
| zorro | ||
| Kconfig | ||
| Makefile | ||