linux/Documentation
Jens Axboe f35546e072 Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers
Konrad writes:

It has the 'feature-max-indirect-segments' implemented in both backend
and frontend. The current problem with the backend and frontend is that the
segment size is limited to 11 pages. It means we can at most squeeze in 44kB per
request. The ring can hold 32 (next power of two below 36) requests, meaning we
can do 1.4M of outstanding requests. Nowadays that is not enough.

The problem in the past was addressed in two ways - but neither one went upstream.
The first solution to this proposed by Justin from Spectralogic was to negotiate
the segment size.  This means that the ‘struct blkif_sring_entry’ is now a variable size.
It can expand from 112 bytes (cover 11 pages of data - 44kB) to 1580 bytes
(256 pages of data - so 1MB). It is a simple extension by just making the array in the
request expand from 11 to a variable size negotiated. But it had limits: this extension
still limits the number of segments per request to 255 (as the total number must be
specified in the request, which only has an 8-bit field for that purpose).

The other solution (from Intel - Ronghui) was to create one extra ring that only has the
‘struct blkif_request_segment’ in them. The ‘struct blkif_request’ would be changed to have
an index in said ‘segment ring’. There is only one segment ring. This means that the size of
the initial ring is still the same. The requests would point to the segment and enumerate out
how many of the indexes it wants to use. The limit is of course the size of the segment.
If one assumes a one-page segment this means we can in one request cover ~4MB.

Those patches were posted as RFC and the author never followed up on the ideas on changing
it to be a bit more flexible.

There is yet another mechanism that could be employed  (which these patches implement) - and it
borrows from VirtIO protocol. And that is the ‘indirect descriptors’. This very similar to
what Intel suggests, but with a twist. The twist is to negotiate how many of these
'segment' pages (aka indirect descriptor pages) we want to support (in reality we negotiate
how many entries in the segment we want to cover, and we module the number if it is
bigger than the segment size).

This means that with the existing 36 slots in the ring (single page) we can cover:
32 slots * each blkif_request_indirect covers: 512 * 4096 ~= 64M. Since we ample space
in the blkif_request_indirect to span more than one indirect page, that number (64M)
can be also multiplied by eight = 512MB.

Roger Pau Monne took the idea and implemented them in these patches. They work
great and the corner cases (migration between backends with and without this extension)
work nicely. The backend has a limit right now off how many indirect entries
it can handle: one indirect page, and at maximum 256 entries (out of 512 - so  50% of the page
is used). That comes out to 32 slots * 256 entries in a indirect page * 1 indirect page
per request * 4096 = 32MB.

This is a conservative number that can change in the future. Right now it strikes
a good balance between giving excellent performance, memory usage in the backend, and
balancing the needs of many guests.

In the patchset there is also the split of the blkback structure to be per-VBD.
This means that the spinlock contention we had with many guests trying to do I/O and
all the blkback threads hitting the same lock has been eliminated.

Also there are bug-fixes to deal with oddly sized sectors, insane amounts on
th ring, and also a security fix (posted earlier).
2013-06-28 16:01:14 +02:00
..
ABI Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers 2013-06-28 16:01:14 +02:00
DocBook
EDID
PCI
RCU
accounting
acpi Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma 2013-05-09 09:46:45 -07:00
aoe
arm
arm64
auxdisplay
backlight
blackfin
block
blockdev
bus-devices
cdrom
cgroups blk-throttle: implement proper hierarchy support 2013-05-14 13:52:38 -07:00
connector
console
cpu-freq Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-05-05 13:23:27 -07:00
cpuidle
cris
crypto
development-process
device-mapper
devicetree Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-05-10 07:48:05 -07:00
driver-model
dvb
early-userspace
extcon
fault-injection
fb
filesystems Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-05-09 13:07:40 -07:00
firmware_class
frv
hid
hwmon
i2c
i2o
ia64
ide
infiniband
input
ioctl
isdn
ja_JP
kbuild Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2013-05-07 07:59:19 -07:00
kdump
ko_KR
laptops
leds
m68k
make
memory-devices
metag
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking
nfc
parisc
pcmcia
power
powerpc
pps
prctl
pti
ptp
rapidio
s390
scheduler
scsi
security
serial
sh
sound
spi
sysctl
target
thermal
timers
trace
usb
vDSO
video4linux
virtual Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2013-05-05 14:47:31 -07:00
vm
w1
watchdog
wimax
x86
xtensa xtensa: document MMUv3 setup sequence 2013-05-09 01:07:09 -07:00
zh_CN
.gitignore
00-INDEX
BUG-HUNTING
Changes
CodingStyle
DMA-API-HOWTO.txt
DMA-API.txt
DMA-ISA-LPC.txt
DMA-attributes.txt
HOWTO
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
Intel-IOMMU.txt
Makefile
ManagementStyle
SAK.txt
SM501.txt
SecurityBugs
SubmitChecklist
SubmittingDrivers
SubmittingPatches
VGA-softcursor.txt
applying-patches.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
cachetlb.txt
circular-buffers.txt
clk.txt
coccinelle.txt
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
dma-buf-sharing.txt
dmaengine.txt
dmatest.txt
dontdiff
dynamic-debug-howto.txt
edac.txt
eisa.txt
email-clients.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gpio.txt
highuid.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt IOMMU Updates for Linux v3.10 2013-05-06 14:59:13 -07:00
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
ldm.txt
local_ops.txt
lockdep-design.txt
lockstat.txt
lockup-watchdogs.txt
logo.gif
logo.txt
magic-number.txt
md.txt
media-framework.txt
memory-barriers.txt
memory-hotplug.txt
mono.txt
mutex-design.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
pwm.txt
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rt-mutex-design.txt
rt-mutex.txt
rtc.txt
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
smsc_ece1099.txt
sparse.txt
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xz.txt
zorro.txt