mirror of https://github.com/torvalds/linux.git
Fix possible cpuset_sem ABBA deadlock if 'notify_on_release' set.
For a particular usage pattern, creating and destroying cpusets fairly
frequently using notify_on_release, on a very large system, this deadlock
can be seen every few days. If you are not using the cpuset
notify_on_release feature, you will never see this deadlock.
The existing code, on task exit (or cpuset deletion) did:
get cpuset_sem
if cpuset marked notify_on_release and is ready to release:
compute cpuset path relative to /dev/cpuset mount point
call_usermodehelper() forks /sbin/cpuset_release_agent with path
drop cpuset_sem
Unfortunately, the fork in call_usermodehelper can allocate memory, and
allocating memory can require cpuset_sem, if the mems_generation values
changed in the interim. This results in an ABBA deadlock, trying to obtain
cpuset_sem when it is already held by the current task.
To fix this, I put the cpuset path (which must be computed while holding
cpuset_sem) in a temporary buffer, to be used in the call_usermodehelper
call of /sbin/cpuset_release_agent only _after_ dropping cpuset_sem.
So the new logic is:
get cpuset_sem
if cpuset marked notify_on_release and is ready to release:
compute cpuset path relative to /dev/cpuset mount point
stash path in kmalloc'd buffer
drop cpuset_sem
call_usermodehelper() forks /sbin/cpuset_release_agent with path
free path
The sharp eyed reader might notice that this patch does not contain any
calls to kmalloc. The existing code in the check_for_release() routine was
already kmalloc'ing a buffer to hold the cpuset path. In the old code, it
just held the buffer for a few lines, over the cpuset_release_agent() call
that in turn invoked call_usermodehelper(). In the new code, with the
application of this patch, it returns that buffer via the new char
**ppathbuf parameter, for later use and freeing in cpuset_release_agent(),
which is called after cpuset_sem is dropped. Whereas the old code has just
one call to cpuset_release_agent(), right in the check_for_release()
routine, the new code has three calls to cpuset_release_agent(), from the
various places that a cpuset can be released.
This patch has been build and booted on SN2, and passed a stress test that
previously hit the deadlock within a few seconds.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
||
|---|---|---|
| .. | ||
| irq | ||
| power | ||
| Kconfig.hz | ||
| Kconfig.preempt | ||
| Makefile | ||
| acct.c | ||
| audit.c | ||
| auditsc.c | ||
| capability.c | ||
| compat.c | ||
| configs.c | ||
| cpu.c | ||
| cpuset.c | ||
| crash_dump.c | ||
| dma.c | ||
| exec_domain.c | ||
| exit.c | ||
| extable.c | ||
| fork.c | ||
| futex.c | ||
| intermodule.c | ||
| itimer.c | ||
| kallsyms.c | ||
| kexec.c | ||
| kfifo.c | ||
| kmod.c | ||
| kprobes.c | ||
| ksysfs.c | ||
| kthread.c | ||
| module.c | ||
| panic.c | ||
| params.c | ||
| pid.c | ||
| posix-cpu-timers.c | ||
| posix-timers.c | ||
| printk.c | ||
| profile.c | ||
| ptrace.c | ||
| rcupdate.c | ||
| resource.c | ||
| sched.c | ||
| seccomp.c | ||
| signal.c | ||
| softirq.c | ||
| spinlock.c | ||
| stop_machine.c | ||
| sys.c | ||
| sys_ni.c | ||
| sysctl.c | ||
| time.c | ||
| timer.c | ||
| uid16.c | ||
| user.c | ||
| wait.c | ||
| workqueue.c | ||