linux/kernel
Davidlohr Bueso 4fc828e24c locking/rwsem: Support optimistic spinning
We have reached the point where our mutexes are quite fine tuned
for a number of situations. This includes the use of heuristics
and optimistic spinning, based on MCS locking techniques.

Exclusive ownership of read-write semaphores are, conceptually,
just about the same as mutexes, making them close cousins. To
this end we need to make them both perform similarly, and
right now, rwsems are simply not up to it. This was discovered
by both reverting commit 4fc3f1d6 (mm/rmap, migration: Make
rmap_walk_anon() and try_to_unmap_anon() more scalable) and
similarly, converting some other mutexes (ie: i_mmap_mutex) to
rwsems. This creates a situation where users have to choose
between a rwsem and mutex taking into account this important
performance difference. Specifically, biggest difference between
both locks is when we fail to acquire a mutex in the fastpath,
optimistic spinning comes in to play and we can avoid a large
amount of unnecessary sleeping and overhead of moving tasks in
and out of wait queue. Rwsems do not have such logic.

This patch, based on the work from Tim Chen and I, adds support
for write-side optimistic spinning when the lock is contended.
It also includes support for the recently added cancelable MCS
locking for adaptive spinning. Note that is is only applicable
to the xadd method, and the spinlock rwsem variant remains intact.

Allowing optimistic spinning before putting the writer on the wait
queue reduces wait queue contention and provided greater chance
for the rwsem to get acquired. With these changes, rwsem is on par
with mutex. The performance benefits can be seen on a number of
workloads. For instance, on a 8 socket, 80 core 64bit Westmere box,
aim7 shows the following improvements in throughput:

 +--------------+---------------------+-----------------+
 |   Workload   | throughput-increase | number of users |
 +--------------+---------------------+-----------------+
 | alltests     | 20%                 | >1000           |
 | custom       | 27%, 60%            | 10-100, >1000   |
 | high_systime | 36%, 30%            | >100, >1000     |
 | shared       | 58%, 29%            | 10-100, >1000   |
 +--------------+---------------------+-----------------+

There was also improvement on smaller systems, such as a quad-core
x86-64 laptop running a 30Gb PostgreSQL (pgbench) workload for up
to +60% in throughput for over 50 clients. Additionally, benefits
were also noticed in exim (mail server) workloads. Furthermore, no
performance regression have been seen at all.

Based-on-work-from: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
[peterz: rej fixup due to comment patches, sched/rt.h header]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Scott J Norton" <scott.norton@hp.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1399055055.6275.15.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 10:38:21 +02:00
..
debug arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
events Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-04-05 13:20:43 -07:00
gcov
irq genirq: Export symbol no_action() 2014-03-22 11:33:09 +01:00
locking locking/rwsem: Support optimistic spinning 2014-06-05 10:38:21 +02:00
power kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
printk printk: fix one circular lockdep warning about console_lock 2014-04-03 16:21:08 -07:00
rcu arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
sched arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
time kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
acct.c
async.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
audit_tree.c
audit_watch.c
auditfilter.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
auditsc.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
backtracetest.c
bounds.c
capability.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-04-03 09:26:18 -07:00
cgroup.c cgroup: newly created dirs and files should be owned by the creator 2014-04-07 16:44:47 -04:00
cgroup_freezer.c cgroup: drop const from @buffer of cftype->write_string() 2014-03-19 10:23:54 -04:00
compat.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 12:51:41 -07:00
configs.c
context_tracking.c
cpu.c CPU hotplug: Provide lockless versions of callback registration functions 2014-03-20 13:43:40 +01:00
cpu_pm.c
cpuset.c Merge branch 'akpm' (incoming from Andrew) 2014-04-03 16:22:16 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process 2014-04-07 16:36:06 -07:00
extable.c
fork.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
freezer.c
futex.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
futex_compat.c
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hrtimer.c timer: Remove code redundancy while calling get_nohz_timer_target() 2014-03-20 12:35:46 +01:00
hung_task.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
irq_work.c
itimer.c
jump_label.c
kallsyms.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kcmp.c
kexec.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kmod.c arch: Mass conversion of smp_mb__*() 2014-04-18 14:20:48 +02:00
kprobes.c
ksysfs.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kthread.c kthread: ensure locality of task_struct allocations 2014-04-03 16:20:49 -07:00
latencytop.c
module-internal.h
module.c modules: use raw_cpu_write for initialization of per cpu refcount. 2014-04-07 16:36:14 -07:00
module_signing.c
notifier.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-26 06:35:13 -08:00
nsproxy.c
padata.c
panic.c kernel/panic.c: display reason at end + pr_emerg 2014-04-07 16:36:08 -07:00
params.c
pid.c
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
posix-cpu-timers.c
posix-timers.c
profile.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
ptrace.c kernel/compat: convert to COMPAT_SYSCALL_DEFINE 2014-03-06 15:35:10 +01:00
range.c
reboot.c
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c res_counter: remove interface for locked charging and uncharging 2014-04-07 16:35:54 -07:00
resource.c kernel/resource.c: make reallocate_resource() static 2014-04-03 16:21:07 -07:00
seccomp.c seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF 2014-04-14 16:26:47 -04:00
signal.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
smp.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
smpboot.c
smpboot.h
softirq.c softirq: Add linux/irq.h to make it compile again 2014-03-19 11:28:14 +01:00
stacktrace.c
stop_machine.c stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() 2014-03-11 11:33:47 +01:00
sys.c mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE 2014-04-07 16:35:52 -07:00
sys_ni.c fs, kernel: permit disabling the uselib syscall 2014-04-03 16:21:05 -07:00
sysctl.c hung_task: check the value of "sysctl_hung_task_timeout_sec" 2014-04-07 16:36:07 -07:00
sysctl_binary.c
system_certificates.S
system_keyring.c
task_work.c
taskstats.c
test_kprobes.c
time.c
timeconst.bc
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
torture.c rcutorture: Gracefully handle NULL cleanup hooks 2014-02-23 09:04:39 -08:00
tracepoint.c This includes the final patch to clean up and fix the issue with the 2014-04-12 13:06:10 -07:00
tsacct.c
uid16.c
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user-return-notifier.c
user.c kernel: audit/fix non-modular users of module_init in core code 2014-04-03 16:21:07 -07:00
user_namespace.c user namespace: fix incorrect memory barriers 2014-04-14 16:03:02 -07:00
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one 2014-04-03 16:20:58 -07:00
workqueue.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:00:07 -07:00
workqueue_internal.h