mirror of https://github.com/torvalds/linux.git
16098 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
fab840fc2d |
ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)
Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child). This
frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
ensures that the tracee won't be killed by SIGTRAP triggered by the
active breakpoints.
Test-case:
unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
{
unsigned long dr7;
dr7 = ((len | type) & 0xf)
<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
if (enable)
dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
return dr7;
}
int write_dr(int pid, int dr, unsigned long val)
{
return ptrace(PTRACE_POKEUSER, pid,
offsetof (struct user, u_debugreg[dr]),
val);
}
void func(void)
{
}
int main(void)
{
int pid, stat;
unsigned long dr7;
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
kill(getpid(), SIGHUP);
func();
return 0x13;
}
assert(pid == waitpid(-1, &stat, 0));
assert(WSTOPSIG(stat) == SIGHUP);
assert(write_dr(pid, 0, (long)func) == 0);
dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
assert(write_dr(pid, 7, dr7) == 0);
assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
assert(pid == waitpid(-1, &stat, 0));
assert(stat == 0x1300);
return 0;
}
Before this patch the child is killed after PTRACE_DETACH.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
7c8df28633 |
ptrace: revert "Prepare to fix racy accesses on task breakpoints"
This reverts commit |
|
|
|
dcb6b45254 |
panic: add cpu/pid to warn_slowpath_common in WARNING printk()s
Add the cpu/pid that called WARN() so that the stack traces can be matched up with the WARNING messages. [akpm@linux-foundation.org: remove stray quote] Signed-off-by: Alex Thorlton <athorlton@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Vikram Mulukutla <markivx@codeaurora.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
b9ce54c9f5 |
audit: Fix decimal constant description
Use proper decimal type for comparison with u32.
Compilation warning was introduced by
|
|
|
|
2f992ee85a |
kernel/auditfilter.c: fix leak in audit_add_rule() error path
If both 'tree' and 'watch' are valid we must call audit_put_tree(), just like the preceding code within audit_add_rule(). Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
6beb8a23b5 |
kernel/auditfilter.c: fixing build warning
kernel/auditfilter.c:426: warning: this decimal constant is unsigned only in ISO C90 Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
79f6530cb5 |
audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record
The old audit PATH records for mq_open looked like this: type=PATH msg=audit(1366282323.982:869): item=1 name=(null) inode=6777 dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmpfs_t:s15:c0.c1023 type=PATH msg=audit(1366282323.982:869): item=0 name="test_mq" inode=26732 dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023 ...with the audit related changes that went into 3.7, they now look like this: type=PATH msg=audit(1366282236.776:3606): item=2 name=(null) inode=66655 dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023 type=PATH msg=audit(1366282236.776:3606): item=1 name=(null) inode=6926 dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmpfs_t:s15:c0.c1023 type=PATH msg=audit(1366282236.776:3606): item=0 name="test_mq" Both of these look wrong to me. As Steve Grubb pointed out: "What we need is 1 PATH record that identifies the MQ. The other PATH records probably should not be there." Fix it to record the mq root as a parent, and flag it such that it should be hidden from view when the names are logged, since the root of the mq filesystem isn't terribly interesting. With this change, we get a single PATH record that looks more like this: type=PATH msg=audit(1368021604.836:484): item=0 name="test_mq" inode=16914 dev=00:0c mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmpfs_t:s0 In order to do this, a new audit_inode_parent_hidden() function is added. If we do it this way, then we avoid having the existing callers of audit_inode needing to do any sort of flag conversion if auditing is inactive. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: Jiri Jaburek <jjaburek@redhat.com> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
21884a83b2 |
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
"The timer changes contain:
- posix timer code consolidation and fixes for odd corner cases
- sched_clock implementation moved from ARM to core code to avoid
duplication by other architectures
- alarm timer updates
- clocksource and clockevents unregistration facilities
- clocksource/events support for new hardware
- precise nanoseconds RTC readout (Xen feature)
- generic support for Xen suspend/resume oddities
- the usual lot of fixes and cleanups all over the place
The parts which touch other areas (ARM/XEN) have been coordinated with
the relevant maintainers. Though this results in an handful of
trivial to solve merge conflicts, which we preferred over nasty cross
tree merge dependencies.
The patches which have been committed in the last few days are bug
fixes plus the posix timer lot. The latter was in akpms queue and
next for quite some time; they just got forgotten and Frederic
collected them last minute."
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
hrtimer: Remove unused variable
hrtimers: Move SMP function call to thread context
clocksource: Reselect clocksource when watchdog validated high-res capability
posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
posix_timers: fix racy timer delta caching on task exit
posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
selftests: add basic posix timers selftests
posix_cpu_timers: consolidate expired timers check
posix_cpu_timers: consolidate timer list cleanups
posix_cpu_timer: consolidate expiry time type
tick: Sanitize broadcast control logic
tick: Prevent uncontrolled switch to oneshot mode
tick: Make oneshot broadcast robust vs. CPU offlining
x86: xen: Sync the CMOS RTC as well as the Xen wallclock
x86: xen: Sync the wallclock when the system time is set
timekeeping: Indicate that clock was set in the pvclock gtod notifier
timekeeping: Pass flags instead of multiple bools to timekeeping_update()
xen: Remove clock_was_set() call in the resume path
hrtimers: Support resuming with two or more CPUs online (but stopped)
timer: Fix jiffies wrap behavior of round_jiffies_common()
...
|
|
|
|
2cb7b5a38c |
irqdomain refactoring for v3.11
This is the long awaited simplification of irqdomain. It gets rid of the different types of irq domains and instead both linear and tree mappings can be supported in a single domain. Doing this removes a lot of special case code and makes irq domains simpler to understand overall. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJR1mnQAAoJEEFnBt12D9kBW7gP/jQpLqFU6PTdsCMX459Ib7fE JGLQWYjAsqVbVT3H8LZYOZl+ItIbXSQVk+wf0mislnOzxtGxNCxZeJdniFK2RWyz sGNm/eEZfp9bvm9jlrPE615QE7U7c9fQ5dqETNS4UMMFFmjgrsY4Z82zYH2/Y/jK YbxFUT6+fep5QaF7lYM7phfrs7FrvnpCoxCio6LO+GgxnMxHSsGJ8L9JcojZ5Qoa MwlSwsMaQrWJ6PNVbB2YrtlMxjHkNpQ7EV7bNF693PjACVpXf3ItpvQ1k6W/ns06 j6Nysj4nbjsTdWOVh5mY7tPfknZUDs38rLT5s+RbfHqySMIGQXp6CzieYaJ+lxV9 YsItyfZE7h3anebjwRv/9XqLNsDM4KmSIJGcnq+QxNSGxO3rXkItWtpCKaqYdWnN wI359rSZ//x2ltFzvB14mZJm2j7r9Hh7gH8M5h8W8YTqCbe3oSES3iRhJqnyPxMX of4PsQaTAicuLwberGdVkKafg9XIsHPoebBzDxssB4nxPrEKIILL4g54x1rV17TZ kBXurd79FhP4gOQxJHf5uEpdxrQ3Cm/PQpD5CG2rZJrrcV7hfP4QMMFHqqbFSIOu 2h4M4fZX2gsDh3RLboGRGn0GckSPTHiO05yeeBriLzMCVGcZ1JMqAxLUSJpH/UYl EzFy9RpwnwLoUPi8jUXE =yJRw -----END PGP SIGNATURE----- Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux Pull irqdomain refactoring from Grant Likely: "This is the long awaited simplification of irqdomain. It gets rid of the different types of irq domains and instead both linear and tree mappings can be supported in a single domain. Doing this removes a lot of special case code and makes irq domains simpler to understand overall" * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux: irq: fix checkpatch error irqdomain: Include hwirq number in /proc/interrupts irqdomain: make irq_linear_revmap() a fast path again irqdomain: remove irq_domain_generate_simple() irqdomain: Refactor irq_domain_associate_many() irqdomain: Beef up debugfs output irqdomain: Clean up aftermath of irq_domain refactoring irqdomain: Eliminate revmap type irqdomain: merge linear and tree reverse mappings. irqdomain: Add a name field irqdomain: Replace LEGACY mapping with LINEAR irqdomain: Relax failure path on setting up mappings |
|
|
|
73b0cd674c |
hrtimer: Remove unused variable
Sigh, should have noticed myself. Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
|
|
5ec2481b7b |
hrtimers: Move SMP function call to thread context
smp_call_function_* must not be called from softirq context. But clock_was_set() which calls on_each_cpu() is called from softirq context to implement a delayed clock_was_set() for the timer interrupt handler. Though that almost never gets invoked. A recent change in the resume code uses the softirq based delayed clock_was_set to support Xens resume mechanism. linux-next contains a new warning which warns if smp_call_function_* is called from softirq context which gets triggered by that Xen change. Fix this by moving the delayed clock_was_set() call to a work context. Reported-and-tested-by: Artem Savkov <artem.savkov@gmail.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com>, Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: xen-devel@lists.xen.org Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
|
|
332962f2c8 |
clocksource: Reselect clocksource when watchdog validated high-res capability
Up to commit
|
|
|
|
2b0f89317e |
Merge branch 'timers/posix-cpu-timers-for-tglx' of
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core Frederic sayed: "Most of these patches have been hanging around for several month now, in -mmotm for a significant chunk. They already missed a few releases." Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
|
|
80cc38b163 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ... |
|
|
|
fa18f7bde3 |
posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
When tsk->signal->cputimer->running is 1, signal->cputimer (i.e. per process timer account) and tsk->sum_sched_runtime (i.e. per thread timer account) increase at the same pace because update_curr() increases both accounting. However, there is one exception. When thread exiting, __exit_signal() turns over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop signal->cputimer accounting. This inconsistency makes POSIX timer wake up too early. This patch fixes it. Original-patch-by: Olivier Langlois <olivier@trillion01.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> |
|
|
|
7f0ef0267e |
Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ... |
|
|
|
0786f7b225 |
kernel/resource.c: remove the unneeded assignment in function __find_resource
This line was introduced by
|
|
|
|
8f75af44ee |
kernel/pid.c: move statement
Move statement to static initilization of init_pid_ns. Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
18c830df77 |
kernel/fork.c:copy_process(): consolidate the lockless CLONE_THREAD checks
copy_process() does a lot of "chaotic" initializations and checks CLONE_THREAD twice before it takes tasklist. In particular it sets "p->group_leader = p" and then changes it again under tasklist if !thread_group_leader(p). This looks a bit confusing, lets create a single "if (CLONE_THREAD)" block which initializes ->exit_signal, ->group_leader, and ->tgid. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
8190773985 |
kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists
copy_process() adds the new child to thread_group/init_task.tasks list and then does attach_pid(child, PIDTYPE_PID). This means that the lockless next_thread() or next_task() can see this thread with the wrong pid. Say, "ls /proc/pid/task" can list the same inode twice. We could move attach_pid(child, PIDTYPE_PID) up, but in this case find_task_by_vpid() can find the new thread before it was fully initialized. And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch copy_process() initializes child->pids[*].pid first, then calls attach_pid() to insert the task into the pid->tasks list. attach_pid() no longer need the "struct pid*" argument, it is always called after pid_link->pid was already set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
80628ca06c |
kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
Cleanup and preparation for the next changes. Move the "if (clone_flags & CLONE_THREAD)" code down under "if (likely(p->pid))" and turn it into into the "else" branch. This makes the process/thread initialization more symmetrical and removes one check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
b57922b6c7 |
fork: reorder permissions when violating number of processes limits
When a task is attempting to violate the RLIMIT_NPROC limit we have a check to see if the task is sufficiently priviledged. The check first looks at CAP_SYS_ADMIN, then CAP_SYS_RESOURCE, then if the task is uid=0. A result is that tasks which are allowed by the uid=0 check are first checked against the security subsystem. This results in the security subsystem auditting a denial for sys_admin and sys_resource and then the task passing the uid=0 check. This patch rearranges the code to first check uid=0, since if we pass that we shouldn't hit the security system at all. We then check sys_resource, since it is the smallest capability which will solve the problem. Lastly we check the fallback everything cap_sysadmin. We don't want to give this capability many places since it is so powerful. This will eliminate many of the false positive/needless denial messages we get when a root task tries to violate the nproc limit. (note that kthreads count against root, so on a sufficiently large machine we can actually get past the default limits before any userspace tasks are launched.) Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
81dabb4641 |
exit.c: unexport __set_special_pids()
Move __set_special_pids() from exit.c to sys.c close to its single caller and make it static. And rename it to set_special_pids(), another helper with this name has gone away. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
7f57cfa4e2 |
usermodehelper: kill the sub_info->path[0] check
call_usermodehelper_exec() does nothing but returns success if path[0] ==
0. The only user which needs this strange feature is request_module(), it
can check modprobe_path[0] itself like other users do if they want to
detect the "disabled by admin" case.
Kill it. Not only it looks strange, it can confuse other callers. And
this allows us to revert
|
|
|
|
29000caecb |
ptrace: add ability to get/set signal-blocked mask
crtools uses a parasite code for dumping processes. The parasite code is injected into a process with help PTRACE_SEIZE. Currently crtools blocks signals from a parasite code. If a process has pending signals, crtools wait while a process handles these signals. This method is not suitable for stopped tasks. A stopped task can have a few pending signals, when we will try to execute a parasite code, we will need to drop SIGSTOP, but all other signals must remain pending, because a state of processes must not be changed during checkpointing. This patch adds two ptrace commands to set/get signal-blocked mask. I think gdb can use this commands too. [akpm@linux-foundation.org: be consistent with brace layout] Signed-off-by: Andrey Vagin <avagin@openvz.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
10fb46d5f7 |
kprobes: handle empty/invalid input to debugfs "enabled" file
When writing invalid input to 'debug/kprobes/enabled' it'll silently be ignored. Even worse, when writing an empty string to this file, the outcome is purely random as the switch statement will make its decision based on the value of an uninitialized stack variable. Fix this by handling invalid/empty input as error returning -EINVAL. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
45c64940c8 |
kernel/sys.c:do_sysinfo(): use get_monotonic_boottime()
Change do_sysinfo() to use get_monotonic_boottime() instead of do_posix_clock_monotonic_gettime() + monotonic_to_bootbased(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
7ec75e1ca1 |
kernel/sys.c: sys_reboot(): fix malformed panic message
If LINUX_REBOOT_CMD_HALT for reboot failed, the message "cannot halt" will stay on the same line with the next message, so append a '\n'. Signed-off-by: liguang <lig.fnst@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
f170168b9a |
drivers: avoid parsing names as kthread_run() format strings
Calling kthread_run with a single name parameter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
0ed5fd1385 |
mm: use totalram_pages instead of num_physpages at runtime
The global variable num_physpages is scheduled to be removed, so use totalram_pages instead of num_physpages at runtime. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
f991fae5c6 |
Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail gracefully (instead of crashing the kernel) if they cannot be carried out completely. From Rafael J Wysocki and Toshi Kani. - Freezer update from Colin Cross and Mandeep Singh Baines targeted at making the freezing of tasks a bit less heavy weight operation. - cpufreq resume fix from Srivatsa S Bhat for a regression introduced during the 3.10 cycle causing some cpufreq sysfs attributes to return wrong values to user space after resume. - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to provide information previously available via related_cpus from Lan Tianyu. - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and Tang Yuantian. - Fix for an ACPICA regression causing suspend/resume issues to appear on some systems introduced during the 3.4 development cycle from Lv Zheng. - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and Zhang Rui. - New cupidle driver for Xilinx Zynq processors from Michal Simek. - cpuidle fixes and cleanups from Daniel Lezcano. - Changes to make suspend/resume work correctly in Xen guests from Konrad Rzeszutek Wilk. - ACPI device power management fixes and cleanups from Fengguang Wu and Rafael J Wysocki. - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo. - Fix for the IA-64 issue that was the reason for reverting commit |
|
|
|
a0b2062b09 |
posix_timers: fix racy timer delta caching on task exit
When a task exits, we perform a caching of the remaining cputime delta
before expiring of its timers.
This is done from the following places:
* When the task is reaped. We iterate through its list of
posix cpu timers and store the remaining timer delta to
the timer struct instead of the absolute value.
(See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )
* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
If the timer's task is considered dying when watched from these
places, the same conversion from absolute to relative expiry time
is performed. Then the given task's reference is released.
(See clear_dead_task() ).
The relevance of this caching is questionable but this is another
and deeper debate.
The big issue here is that these two sources of caching don't mix
up very well together.
More specifically, the caching can easily be done twice, resulting
in a wrong delta as it gets spuriously substracted a second time by
the elapsed clock. This can happen in the following scenario:
1) The task exits and gets reaped: we call posix_cpu_timers_exit()
and the absolute timer expiry values are converted to a relative
delta.
2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
clear_dead_task() because tsk->exit_state == EXIT_DEAD.
The delta gets substracted again by the elapsed clock and we return
a wrong result.
To fix this, just remove the caching done on task reaping time. It
doesn't bring much value on its own. The caching done from
posix_cpu_timer_get/schedule is enough.
And it would also be hard to get it really right: we could make it put and
clear the target task in the timer struct so that readers know if they are
dealing with a relative cached of absolute value. But it would be racy.
The only safe way to do it would be to lock the itimer->it_lock so that we
know nobody reads the cputime expiry value while we modify it and its
target task reference. Doing so would involve some funny workarounds to
avoid circular lock against the sighand lock. There is just no reason to
maintain this.
The user visible effect of this patch can be observed by running the
following code: it creates a subthread that launches a posix cputimer
which expires after 10 seconds. But then the subthread only busy loops for 2
seconds and exits. The parent reaps the subthread and read the timer value.
Its expected value should the be the initial timer's expiration value
minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
static timer_t id;
static struct itimerspec val = { .it_value.tv_sec = 10, }, new;
static void *thread(void *unused)
{
int err;
struct timeval start, end, diff;
timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
if (err < 0) {
perror("Can't create timer\n");
return NULL;
}
/* Arm 10 sec timer */
err = timer_settime(id, 0, &val, NULL);
if (err < 0) {
perror("Can't set timer\n");
return NULL;
}
/* Exit after 2 seconds of execution */
gettimeofday(&start, NULL);
do {
gettimeofday(&end, NULL);
timersub(&end, &start, &diff);
} while (diff.tv_sec < 2);
return NULL;
}
int main(int argc, char **argv)
{
pthread_t pthread;
int err;
err = pthread_create(&pthread, NULL, thread, NULL);
if (err) {
perror("Can't create thread\n");
return -1;
}
pthread_join(pthread, NULL);
/* Just wait a little bit to make sure the child got reaped */
sleep(1);
err = timer_gettime(id, &new);
if (err)
perror("Can't get timer value\n");
printf("%d %ld\n", new.it_value.tv_sec, new.it_value.tv_nsec);
return 0;
}
Before the patch:
$ ./posix_cpu_timers
6 2278074
After the patch:
$ ./posix_cpu_timers
8 1158766
Before the patch, the elapsed time got two more seconds spuriously accounted.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
76cdcdd979 |
posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
In order to re-arm a timer after it fired, we take a sample of the current
process or thread cputime.
If the task is dying though, we don't arm anything but we cache the
remaining timer expiration delta for further reads.
Something similar is performed in posix_cpu_timer_get() but here we forget
to take the process wide cputime sample before caching it.
As a result we are storing random stack content, leading every further
reads of that timer to return junk values.
Fix this by taking the appropriate sample in the case of process wide
timers.
This probably doesn't matter much in practice because, at this stage, the
thread is the last one in the group and we reached exit_notify(). This
implies that we called exit_itimers() and there should be no more timers
to handle for that task.
So this is likely dead code anyway but let's fix the current logic
and the warning that came along:
kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function
Then we can start to think further about cleaning up that code.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen@asianux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
2473f3e7a9 |
posix_cpu_timers: consolidate expired timers check
Consolidate the common code amongst per thread and per process timers list on tick time. List traversal, expiry check and subsequent updates can be shared in a common helper. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
1a7fa510b3 |
posix_cpu_timers: consolidate timer list cleanups
Cleaning up the posix cpu timers on task exit shares some common code among timer list types, most notably the list traversal and expiry time update. Unify this in a common helper. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
55ccb616a6 |
posix_cpu_timer: consolidate expiry time type
The posix cpu timer expiry time is stored in a union of two types: a 64 bits field if we rely on scheduler precise accounting, or a cputime_t if we rely on jiffies. This results in quite some duplicate code and special cases to handle the two types. Just unify this into a single 64 bits field. cputime_t can always fit into it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
0b0585c3e1 |
Merge branch 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cpuset changes from Tejun Heo:
"cpuset has always been rather odd about its configurations - a cgroup
right after creation didn't allow any task executions before
configuration, changing configuration in the parent modifies the
descendants irreversibly and so on. These behaviors are inherently
nasty and almost hostile against sharing the hierarchy with other
controllers making it very difficult to use in unified hierarchy.
Li is currently in the process of updating the behaviors for
__DEVEL__sane_behavior which is the bulk of changes in this pull
request. It isn't complete yet and the behaviors will change further
but all changes are gated behind sane_behavior. In the process, the
rather hairy work-item punting which was used to work around the
limitations of cgroup descendant iterator was simplified."
* 'for-3.11-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: rename @cont to @cgrp
cpuset: fix to migrate mm correctly in a corner case
cpuset: allow to move tasks to empty cpusets
cpuset: allow to keep tasks in empty cpusets
cpuset: introduce effective_{cpumask|nodemask}_cpuset()
cpuset: record old_mems_allowed in struct cpuset
cpuset: remove async hotplug propagation work
cpuset: let hotplug propagation work wait for task attaching
cpuset: re-structure update_cpumask() a bit
cpuset: remove cpuset_test_cpumask()
cpuset: remove unnecessary variable in cpuset_attach()
cpuset: cleanup guarantee_online_{cpus|mems}()
cpuset: remove redundant check in cpuset_cpus_allowed_fallback()
|
|
|
|
b028161fbb |
Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"This pull request contains the following changes.
- cgroup_subsys_state (css) reference counting has been converted to
percpu-ref. css is what each resource controller embeds into its
own control structure and perform reference count against. It may
be used in hot paths of various subsystems and is similar to module
refcnt in that aspect. For example, block-cgroup's css refcnting
was showing up a lot in Mikulaus's device-mapper scalability work
and this should alleviate it.
- cgroup subtree iterator has been updated so that RCU read lock can
be released after grabbing reference. This allows simplifying its
users which requires blocking which used to build iteration list
under RCU read lock and then traverse it outside. This pull
request contains simplification of cgroup core and device-cgroup.
A separate pull request will update cpuset.
- Fixes for various bugs including corner race conditions and RCU
usage bugs.
- A lot of cleanups and some prepartory work for the planned unified
hierarchy support."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (48 commits)
cgroup: CGRP_ROOT_SUBSYS_BOUND should also be ignored when mounting an existing hierarchy
cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options
cgroup: fix deadlock on cgroup_mutex via drop_parsed_module_refcounts()
cgroup: always use RCU accessors for protected accesses
cgroup: fix RCU accesses around task->cgroups
cgroup: fix RCU accesses to task->cgroups
cgroup: grab cgroup_mutex in drop_parsed_module_refcounts()
cgroup: fix cgroupfs_root early destruction path
cgroup: reserve ID 0 for dummy_root and 1 for unified hierarchy
cgroup: implement for_each_[builtin_]subsys()
cgroup: move init_css_set initialization inside cgroup_mutex
cgroup: s/for_each_subsys()/for_each_root_subsys()/
cgroup: clean up find_css_set() and friends
cgroup: remove cgroup->actual_subsys_mask
cgroup: prefix global variables with "cgroup_"
cgroup: convert CFTYPE_* flags to enums
cgroup: rename cont to cgrp
cgroup: clean up cgroup_serial_nr_cursor
cgroup: convert cgroup_cft_commit() to use cgroup_for_each_descendant_pre()
cgroup: make serial_nr_cursor available throughout cgroup.c
...
|
|
|
|
f317ff9eed |
Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo: "Surprisingly, Lai and I didn't break too many things implementing custom pools and stuff last time around and there aren't any follow-up changes necessary at this point. The only change in this pull request is Viresh's patches to make some per-cpu workqueues to behave as unbound workqueues dependent on a boot param whose default can be configured via a config option. This leads to higher processing overhead / lower bandwidth as more work items are bounced across CPUs; however, it can lead to noticeable powersave in certain configurations - ~10% w/ idlish constant workload on a big.LITTLE configuration according to Viresh. This is because per-cpu workqueues interfere with how the scheduler perceives whether or not each CPU is idle by forcing pinned tasks on them, which makes the scheduler's power-aware scheduling decisions less effective. Its effectiveness is likely less pronounced on homogenous configurations and this type of optimization can probably be made automatic; however, the changes are pretty minimal and the affected workqueues are clearly marked, so it's an easy gain for some configurations for the time being with pretty unintrusive changes." * 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: fbcon: queue work on power efficient wq block: queue work on power efficient wq PHYLIB: queue work on system_power_efficient_wq workqueue: Add system wide power_efficient workqueues workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues |
|
|
|
3045f94a20 |
Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
"The changes in this tree are:
- ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
Gong
- misc MCE fixes/cleanups"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Update MCE severity condition check
mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
ACPI/APEI: Update einj documentation for param1/param2
ACPI/APEI: Add parameter check before error injection
ACPI, APEI, EINJ: Fix error return code in einj_init()
x86, mce: Fix "braodcast" typo
|
|
|
|
2d722f6d56 |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes:
- load-calculation cleanups and improvements, by Alex Shi
- various nohz related tidying up of statisics, by Frederic
Weisbecker
- factor out /proc functions to kernel/sched/proc.c, by Paul
Gortmaker
- simplify the RT policy scheduler, by Kirill Tkhai
- various fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask
sched/debug: Fix formatting of /proc/<PID>/sched
sched: Fix typo in struct sched_avg member description
sched/fair: Fix typo describing flags in enqueue_entity
sched/debug: Add load-tracking statistics to task
sched: Change get_rq_runnable_load() to static and inline
sched/tg: Remove tg.load_weight
sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t
sched/tg: Use 'unsigned long' for load variable in task group
sched: Change cfs_rq load avg to unsigned long
sched: Consider runnable load average in move_tasks()
sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task
sched: Update cpu load after task_tick
sched: Fix sleep time double accounting in enqueue entity
sched: Set an initial value of runnable avg for new forked task
sched: Move a few runnable tg variables into CONFIG_SMP
Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
sched: Don't mix use of typedef ctl_table and struct ctl_table
sched: Remove WARN_ON(!sd) from init_sched_groups_power()
sched: Fix memory leakage in build_sched_groups()
...
|
|
|
|
f0bb4c0ab0 |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Kernel improvements:
- watchdog driver improvements by Li Zefan
- Power7 CPI stack events related improvements by Sukadev Bhattiprolu
- event multiplexing via hrtimers and other improvements by Stephane
Eranian
- kernel stack use optimization by Andrew Hunter
- AMD IOMMU uncore PMU support by Suravee Suthikulpanit
- NMI handling rate-limits by Dave Hansen
- various hw_breakpoint fixes by Oleg Nesterov
- hw_breakpoint overflow period sampling and related signal handling
fixes by Jiri Olsa
- Intel Haswell PMU support by Andi Kleen
Tooling improvements:
- Reset SIGTERM handler in workload child process, fix from David
Ahern.
- Makefile reorganization, prep work for Kconfig patches, from Jiri
Olsa.
- Add automated make test suite, from Jiri Olsa.
- Add --percent-limit option to 'top' and 'report', from Namhyung
Kim.
- Sorting improvements, from Namhyung Kim.
- Expand definition of sysfs format attribute, from Michael Ellerman.
Tooling fixes:
- 'perf tests' fixes from Jiri Olsa.
- Make Power7 CPI stack events available in sysfs, from Sukadev
Bhattiprolu.
- Handle death by SIGTERM in 'perf record', fix from David Ahern.
- Fix printing of perf_event_paranoid message, from David Ahern.
- Handle realloc failures in 'perf kvm', from David Ahern.
- Fix divide by 0 in variance, from David Ahern.
- Save parent pid in thread struct, from David Ahern.
- Handle JITed code in shared memory, from Andi Kleen.
- Fixes for 'perf diff', from Jiri Olsa.
- Remove some unused struct members, from Jiri Olsa.
- Add missing liblk.a dependency for python/perf.so, fix from Jiri
Olsa.
- Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
- No need to do locking when adding hists in perf report, only 'top'
needs that, from Namhyung Kim.
- Fix alignment of symbol column in in the hists browser (top,
report) when -v is given, from NAmhyung Kim.
- Fix 'perf top' -E option behavior, from Namhyung Kim.
- Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
- Fix compile errors in bp_signal 'perf test', from Sukadev
Bhattiprolu.
... and more things"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
perf/x86: Fix shared register mutual exclusion enforcement
perf/x86/intel: Support full width counting
x86: Add NMI duration tracepoints
perf: Drop sample rate when sampling is too slow
x86: Warn when NMI handlers take large amounts of time
hw_breakpoint: Introduce "struct bp_cpuinfo"
hw_breakpoint: Simplify *register_wide_hw_breakpoint()
hw_breakpoint: Introduce cpumask_of_bp()
hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
perf/x86/intel: Add mem-loads/stores support for Haswell
perf/x86/intel: Support Haswell/v4 LBR format
perf/x86/intel: Move NMI clearing to end of PMI handler
perf/x86/intel: Add Haswell PEBS support
perf/x86/intel: Add simple Haswell PMU support
perf/x86/intel: Add Haswell PEBS record support
perf/x86/intel: Fix sparse warning
perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
perf/x86/amd: Add IOMMU Performance Counter resource management
...
|
|
|
|
a4883ef6af |
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core irq changes from Ingo Molnar: "The main changes: - generic-irqchip driver additions, cleanups and fixes - 3 new irqchip drivers: ARMv7-M NVIC, TB10x and Marvell Orion SoCs - irq_get_trigger_type() simplification and cross-arch cleanup - various cleanups, simplifications - documentation updates" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) softirq: Use _RET_IP_ genirq: Add the generic chip to the genirq docbook genirq: generic-chip: Export some irq_gc_ functions genirq: Fix can_request_irq() for IRQs without an action irqchip: exynos-combiner: Staticize combiner_init irqchip: Add support for ARMv7-M NVIC irqchip: Add TB10x interrupt controller driver irqdomain: Use irq_get_trigger_type() to get IRQ flags MIPS: octeon: Use irq_get_trigger_type() to get IRQ flags arm: orion: Use irq_get_trigger_type() to get IRQ flags mfd: stmpe: use irq_get_trigger_type() to get IRQ flags mfd: twl4030-irq: Use irq_get_trigger_type() to get IRQ flags gpio: mvebu: Use irq_get_trigger_type() to get IRQ flags genirq: Add irq_get_trigger_type() to get IRQ flags genirq: Irqchip: document gcflags arg of irq_alloc_domain_generic_chips genirq: Set irq thread to RT priority on creation irqchip: Add support for Marvell Orion SoCs genirq: Add kerneldoc for irq_disable. genirq: irqchip: Add mask to block out invalid irqs genirq: Generic chip: Add linear irq domain support ... |
|
|
|
ab3d681e9d |
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The major changes:
- Simplify RCU's grace-period and callback processing based on the new
numbering for callbacks.
- Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
single-CPU low-latency systems.
- SRCU-related changes and fixes.
- Miscellaneous fixes, including converting a few remaining printk()
calls to pr_*().
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
rcu: Shrink TINY_RCU by moving exit_rcu()
rcu: Remove TINY_PREEMPT_RCU tracing documentation
rcu: Consolidate rcutiny_plugin.h ifdefs
rcu: Remove rcu_preempt_note_context_switch()
rcu: Remove the CONFIG_TINY_RCU ifdefs in rcutiny.h
rcu: Remove check_cpu_stall_preempt()
rcu: Simplify RCU_TINY RCU callback invocation
rcu: Remove rcu_preempt_process_callbacks()
rcu: Remove rcu_preempt_remove_callbacks()
rcu: Remove rcu_preempt_check_callbacks()
rcu: Remove show_tiny_preempt_stats()
rcu: Remove TINY_PREEMPT_RCU
powerpc,kvm: fix imbalance srcu_read_[un]lock()
rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
rcu: Apply Dave Jones's NOCB Kconfig help feedback
rcu: Merge adjacent identical ifdefs
rcu: Drive quiescent-state-forcing delay from HZ
rcu: Remove "Experimental" flags
kthread: Add kworker kthreads to OS-jitter documentation
...
|
|
|
|
0c46d68d19 |
Merge branch 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WW mutex support from Ingo Molnar:
"This tree adds support for wound/wait style locks, which the graphics
guys would like to make use of in the TTM graphics subsystem.
Wound/wait mutexes are used when other multiple lock acquisitions of a
similar type can be done in an arbitrary order. The deadlock handling
used here is called wait/wound in the RDBMS literature: The older
tasks waits until it can acquire the contended lock. The younger
tasks needs to back off and drop all the locks it is currently
holding, ie the younger task is wounded.
See this LWN.net description of W/W mutexes:
https://lwn.net/Articles/548909/
The comments there outline specific usecases for this facility (which
have already been implemented for the DRM tree).
Also see Documentation/ww-mutex-design.txt for more details"
* 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking-selftests: Handle unexpected failures more strictly
mutex: Add more w/w tests to test EDEADLK path handling
mutex: Add more tests to lib/locking-selftest.c
mutex: Add w/w tests to lib/locking-selftest.c
mutex: Add w/w mutex slowpath debugging
mutex: Add support for wound/wait style locks
arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
|
|
|
|
3e42dee676 |
Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar: "Four miscellanous standalone fixes for futexes, rtmutexes and Kconfig.locks." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Use freezable blocking call futex: Take hugepages into account when generating futex_key rtmutex: Document rt_mutex_adjust_prio_chain() locking: Fix copy/paste errors of "ARCH_INLINE_*_UNLOCK_BH" |
|
|
|
fc76a258d4 |
Driver core patches for 3.11-rc1
Here's the big driver core merge for 3.11-rc1 Lots of little things, and larger firmware subsystem updates, all described in the shortlog. Nice thing here is that we finally get rid of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had been always on for a number of kernel releases, now it's just removed.) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlHRsGMACgkQMUfUDdst+ylIIACfW8lLxOPVK+iYG699TWEBAkp0 LFEAnjlpAMJ1JnoZCuWDZObNCev93zGB =020+ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the big driver core merge for 3.11-rc1 Lots of little things, and larger firmware subsystem updates, all described in the shortlog. Nice thing here is that we finally get rid of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had been always on for a number of kernel releases, now it's just removed)" * tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits) driver core: device.h: fix doc compilation warnings firmware loader: fix another compile warning with PM_SLEEP unset build some drivers only when compile-testing firmware loader: fix compile warning with PM_SLEEP set kobject: sanitize argument for format string sysfs_notify is only possible on file attributes firmware loader: simplify holding module for request_firmware firmware loader: don't export cache_firmware and uncache_firmware drivers/base: Use attribute groups to create sysfs memory files firmware loader: fix compile warning firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER Documentation: Updated broken link in HOWTO Finally eradicate CONFIG_HOTPLUG driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware Documentation: Tidy up some drivers/base/core.c kerneldoc content. platform_device: use a macro instead of platform_driver_register firmware: move EXPORT_SYMBOL annotations firmware: Avoid deadlock of usermodehelper lock at shutdown dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly ... |
|
|
|
bcd7351e83 |
FS-Cache patches 2013-07-02
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIVAwUAUdLdUxOxKuMESys7AQK1kQ//W7fgFXCG+5XVk4ECHGN5tqRn4tU69DY0
9nYU2/y1wbqV5cTO36XTcFPQK1qbW2ZdyvEZ2CF8OfwtQpLmcALGtpBIgJwYs+4H
DMkgO06zdk4caxc0C4JBIGs+MDeLNk2SQObqblGl1BAQKQ5cqsCLsIZ/rxln999m
ufuobfns1YvuHkzMtswUDmm3zWMpwqqPAbbl+fTwPU683a/AleckG2ACyFvKZAxA
OyI8kJR4e33a3/BGo/5OFb3qI1+Z25EOWdvdnM+r4hdKJZF9ZySlyc640GZHAO2J
wKj5lYp1nBpyNPvYvly174s2MxPju1CRHb7gxcV4LX3vtEY4/MCg7m6P46EUfC6R
C3V7PMMCjZXEQ01MKEmGig47EJKIiecCQUZupJnP7HFKPzeJR9mQZFd68WqzswAM
w9hcCw9hQ9y/kTDVrTVCHs0Q9iTxShfrJyfRJnQ1VcoT+1dieruTa9am9OBKiEw6
CQrPjq9RZZfsZHYr6RlGZHGJyzjrTzrf6EhxwmgaCxWycpvCuV7z76YgAVZI7V4r
qnJmH8dXWdoSA7nZ6sgsb5TRCLT9wu1nNId0DMpAGB1cDGga/55AZtqxdoJLnlkj
y/4wQavIrkfHHuS8c3gzVXPtYmM19CHgcKRFydXD0uGobzfxwYKTKMH+Gviu1NnH
/pGNNY2vVGI=
=Wjhu
-----END PGP SIGNATURE-----
Merge tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull FS-Cache updates from David Howells:
"This contains a number of fixes for various FS-Cache issues plus some
cleanups. The commits are, in order:
1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing
the bit-wait table (enhancement for #8).
2) Don't put spin_lock() in a while-condition as spin_lock() may have
a do {} while(0) wrapper (cleanup).
3) Symbolically name i_mutex lock classes rather than using numbers
in CacheFiles (cleanup).
4) Don't sleep in page release if __GFP_FS is not set (deadlock vs
ext4).
5) Uninline fscache_object_init() (cleanup for #7).
6) Wrap checks on object state (cleanup for #7).
7) Simplify the object state machine by separating work states from
wait states.
8) Simplify cookie retention by objects (NULL pointer deref fix).
9) Remove unused list_to_page() macro (cleanup).
10) Make the remaining-pages counter in the retrieval op atomic
(assertion failure fix).
11) Don't use spin_is_locked() in assertions (assertion failure fix)"
* tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
FS-Cache: Don't use spin_is_locked() in assertions
FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
cachefiles: remove unused macro list_to_page()
FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
FS-Cache: Fix object state machine to have separate work and wait states
FS-Cache: Wrap checks on object state
FS-Cache: Uninline fscache_object_init()
FS-Cache: Don't sleep in page release if __GFP_FS is not set
CacheFiles: name i_mutex lock class explicitly
fs/fscache: remove spin_lock() from the condition in while()
Add wait_on_atomic_t() and wake_up_atomic_t()
|
|
|
|
07bd117290 |
tick: Sanitize broadcast control logic
The recent implementation of a generic dummy timer resulted in a
different registration order of per cpu local timers which made the
broadcast control logic go belly up.
If the dummy timer is the first clock event device which is registered
for a CPU, then it is installed, the broadcast timer is initialized
and the CPU is marked as broadcast target.
If a real clock event device is installed after that, we can fail to
take the CPU out of the broadcast mask. In the worst case we end up
with two periodic timer events firing for the same CPU. One from the
per cpu hardware device and one from the broadcast.
Now the problem is that we have no way to distinguish whether the
system is in a state which makes broadcasting necessary or the
broadcast bit was set due to the nonfunctional dummy timer
installment.
To solve this we need to keep track of the system state seperately and
provide a more detailed decision logic whether we keep the CPU in
broadcast mode or not.
The old decision logic only clears the broadcast mode, if the newly
installed clock event device is not affected by power states.
The new logic clears the broadcast mode if one of the following is
true:
- The new device is not affected by power states.
- The system is not in a power state affected mode
- The system has switched to oneshot mode. The oneshot broadcast is
controlled from the deep idle state. The CPU is not in idle at
this point, so it's safe to remove it from the mask.
If we clear the broadcast bit for the CPU when a new device is
installed, we also shutdown the broadcast device when this was the
last CPU in the broadcast mask.
If the broadcast bit is kept, then we leave the new device in shutdown
state and rely on the broadcast to deliver the timer interrupts via
the broadcast ipis.
Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: John Stultz <john.stultz@linaro.org>,
Cc: Mark Rutland <mark.rutland@arm.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
1f73a9806b |
tick: Prevent uncontrolled switch to oneshot mode
When the system switches from periodic to oneshot mode, the broadcast logic causes a possibility that a CPU which has not yet switched to oneshot mode puts its own clock event device into oneshot mode without updating the state and the timer handler. CPU0 CPU1 per cpu tickdev is in periodic mode and switched to broadcast Switch to oneshot mode tick_broadcast_switch_to_oneshot() cpumask_copy(tick_oneshot_broacast_mask, tick_broadcast_mask); broadcast device mode = oneshot Timer interrupt irq_enter() tick_check_oneshot_broadcast() dev->set_mode(ONESHOT); tick_handle_periodic() if (dev->mode == ONESHOT) dev->next_event += period; FAIL. We fail, because dev->next_event contains KTIME_MAX, if the device was in periodic mode before the uncontrolled switch to oneshot happened. We must copy the broadcast bits over to the oneshot mask, because otherwise a CPU which relies on the broadcast would not been woken up anymore after the broadcast device switched to oneshot mode. So we need to verify in tick_check_oneshot_broadcast() whether the CPU has already switched to oneshot mode. If not, leave the device untouched and let the CPU switch controlled into oneshot mode. This is a long standing bug, which was never noticed, because the main user of the broadcast x86 cannot run into that scenario, AFAICT. The nonarchitected timer mess of ARM creates a gazillion of differently broken abominations which trigger the shortcomings of that broadcast code, which better had never been necessary in the first place. Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org>, Cc: Mark Rutland <mark.rutland@arm.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |