mirror of https://github.com/torvalds/linux.git
18811 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
d0cd84817c |
dmaengine-3.17
1/ Step down as dmaengine maintainer see commit |
|
|
|
bdf428feb2 |
Nothing major: support for compressing modules, and auto-tainting params.
Cheers,
Rusty.
PS. My virtio-next tree is empty: DaveM took the patches I had. There might
be a virtio-rng starvation fix, but so far it's a bit voodoo so I will
get to that in the next two days or it will wait.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUGFrvAAoJENkgDmzRrbjxOJYQALaZbTumrtX3Mo/FAtzn8d5N
8gxcqk1Mhz4lR1vPWy/YN/H2f23qb/saqLxPar8Wgou3h7N8EqSdwDqJSuvEqhG0
iEXUsNLC7BOsDkLYhdjTfZoW/lsVU/EH4bkZMSxAZI9V64phXhDYfPb5SQgJTECr
Ue6IK4ijW6zdWLstGfg/ixrIeGDUSnyiThF9O2mYVaB1D0QkLDIAZxbjZJgfFfut
PwO33/sEV4pceTpkmxFKl/OiS+obi/VbDixjSCcO+jaBd1pVxH9fhhKREStOhN4z
88z5ADR71RH6so9TQTwIIcgb2Hon5d+3RVMB6CxuvKs9NmHSXDiQyZvG9J/jiSdm
KrPKSiVwGGwJSwxXTm8CDaz6Oj0ibDXBIzv/vYI22sR7u8PmRQFvL3O1VrW+KDnE
yoG75S9DHzSQ1183xFFFTt4FBRm/4XKyVs+F6YqYkchLigrUfQMCGb1cmZyE5y7K
bgNyonu0m/ItoQmekoDgYqvSjwdguaJ35XCW55GrKJ84JDHBaw3SpPdEfjAS8FsH
aT5o2oernvwRG6gsX9858RvB/uo1UKwHv1waDfV4cqNjMm5Ko+Yr6OIdQvBQiq07
cFkVmkrMtEyX19QyIGW3QSbFL1lr3X5cC5glzEeKY941yZbTluSsNuMlMPT1+IMx
NOUbh0aG8B8ZaMZPFNLi
=QzCn
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"Nothing major: support for compressing modules, and auto-tainting
params.
PS. My virtio-next tree is empty: DaveM took the patches I had. There
might be a virtio-rng starvation fix, but so far it's a bit voodoo
so I will get to that in the next two days or it will wait"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
moduleparam: Resolve missing-field-initializer warning
kbuild: handle module compression while running 'make modules_install'.
modinst: wrap long lines in order to enhance cmd_modules_install
modsign: lookup lines ending in .ko in .mod files
modpost: simplify file name generation of *.mod.c files
modpost: reduce visibility of symbols and constify r/o arrays
param: check for tainting before calling set op.
drm/i915: taint the kernel if unsafe module parameters are set
module: add module_param_unsafe and module_param_named_unsafe
module: make it possible to have unsafe, tainting module params
module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
|
|
|
|
74da38631a |
Tinification for 3.18
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G GDTZo9iIRNSfm/M4tJ+n =2VyW -----END PGP SIGNATURE----- Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux Pull "tinification" patches from Josh Triplett. Work on making smaller kernels. * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_ mm: Support compiling out madvise and fadvise x86: Support compiling out human-friendly processor feature names x86: Drop support for /proc files when !CONFIG_PROC_FS x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE x86, boot: Use the usual -y -n mechanism for objects in vmlinux x86: Add "make tinyconfig" to configure the tiniest possible kernel x86, platform, kconfig: move kvmconfig functionality to a helper |
|
|
|
039001972a |
While testing some new changes for 3.18, I kept hitting a bug every so
often in the ring buffer. At first I thought it had to do with some of the changes I was working on, but then testing something else I realized that the bug was in 3.17 itself. I ran several bisects as the bug was not very reproducible, and finally came up with the commit that I could reproduce easily within a few minutes, and without the change I could run the tests over an hour without issue. The change fit the bug and I figured out a fix. That bad commit was: Commit |
|
|
|
6c72e3501d |
perf: fix perf bug in fork()
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
24607f114f |
ring-buffer: Fix infinite spin in reading buffer
Commit |
|
|
|
7bced39751 |
net_dma: simple removal
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.
This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.
Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():
https://lkml.org/lkml/2014/9/3/177
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
|
|
6111da3432 |
Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This is quite late but these need to be backported anyway.
This is the fix for a long-standing cpuset bug which existed from
2009. cpuset makes use of PF_SPREAD_{PAGE|SLAB} flags to modify the
task's memory allocation behavior according to the settings of the
cpuset it belongs to; unfortunately, when those flags have to be
changed, cpuset did so directly even whlie the target task is running,
which is obviously racy as task->flags may be modified by the task
itself at any time. This obscure bug manifested as corrupt
PF_USED_MATH flag leading to a weird crash.
The bug is fixed by moving the flag to task->atomic_flags. The first
two are prepatory ones to help defining atomic_flags accessors and the
third one is the actual fix"
* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
sched: add macros to define bitops for task atomic flags
sched: fix confusing PFA_NO_NEW_PRIVS constant
|
|
|
|
f4cb707e7a |
ACPI and power management fixes for 3.17-rc7
- Revert of a recent hibernation core commit that introduced
a NULL pointer dereference during resume for at least one user
(Rafael J Wysocki).
- Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
asynchronous PM callback execution for LPSS devices during system
suspend/resume (introduced in 3.16) which turns out to break
ordering expectations on some systems. From Fu Zhonghui.
- cpufreq core fix related to the handling of sysfs nodes during
system suspend/resume that has been broken for intel_pstate
since 3.15 from Lan Tianyu.
- Restore the generation of "online" uevents for ACPI container
devices that was removed in 3.14, but some user space utilities
turn out to need them (Rafael J Wysocki).
- The cpufreq core fails to release a lock in an error code path
after changes made in 3.14. Fix from Prarit Bhargava.
- ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
operation regions (which means AML using GPIOs) work correctly
in all cases from Bob Moore and Srinivas Pandruvada.
- Fix for a wrong sign of the ACPI core's create_modalias() return
value in case of an error from Mika Westerberg.
- ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUJJGgAAoJEILEb/54YlRxt3kP/19OjVjGK/lFKJk4LCmQ77k5
6DDF7/clNJmYBkKBXGdyqqRVdDUXjRuHS1Yd78zWMmwdLtdOcyI+wBjG1w0mMU7o
vAYvXkIks9fCeKBRHSlqdtQROFf3+bxothKD8JGTONA5z4Fih40fqsnuSW8G7uJs
iTEQQK7L2uPJ+w1OnltwN6eNgzN5KqfxgxI+L6DhEMRjWXRHuhfRZorVIjvz+ALV
Fjm8shhjnhQKzS2zuv5PZ5gGM7zZBH7hy7kd4aDYsbppOLAB2pMOwVs0sgC1Xcbv
teyWkyzmhix2Z1bX9wwia5FfMgbnY2leejJN7mukKzHz8CQ1vxS98Sji2uviIAej
Ctp6GKjuemGvjryjbkstD6r3KYS8CuWAL++YwlamqSa0eWBuM+aD9YqGj4i6ntbU
8BFT5KXauOIsA5U51zC8wNUDHoTgBcvoN99zNIM1jIF81M7wuQrXUzJLXBStuSlR
/bDpExwxHt7I6MeUfRTjg37ApVNRAiStw32+DfsKAj4HLsqTkGs1879Paxf30T0f
Z2SlYr5Jeusu5u9DNhk7MG21A+m46R0jjLd1OKBbf2mrtfQfdKCo6szGR7vjEMZC
aGIlwtIA4iS4MN3UAyqOW3SxIPT2SxqPXzG/z27hRN5MUsGNWiClzcUsaaHoHmpp
GlbY/BvDYfur4NBeCSli
=SzQq
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI hotplug, cpufreq, hibernation, ACPI
LPSS driver), fixes for stuff that never worked correctly (ACPI GPIO
support in some cases and a wrong sign of an error code in the ACPI
core in one place), and one blacklist item for ACPI backlight
handling.
Specifics:
- Revert of a recent hibernation core commit that introduced a NULL
pointer dereference during resume for at least one user (Rafael J
Wysocki).
- Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
asynchronous PM callback execution for LPSS devices during system
suspend/resume (introduced in 3.16) which turns out to break
ordering expectations on some systems. From Fu Zhonghui.
- cpufreq core fix related to the handling of sysfs nodes during
system suspend/resume that has been broken for intel_pstate since
3.15 from Lan Tianyu.
- Restore the generation of "online" uevents for ACPI container
devices that was removed in 3.14, but some user space utilities
turn out to need them (Rafael J Wysocki).
- The cpufreq core fails to release a lock in an error code path
after changes made in 3.14. Fix from Prarit Bhargava.
- ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
operation regions (which means AML using GPIOs) work correctly in
all cases from Bob Moore and Srinivas Pandruvada.
- Fix for a wrong sign of the ACPI core's create_modalias() return
value in case of an error from Mika Westerberg.
- ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu"
* tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
gpio / ACPI: Use pin index and bit length
ACPICA: Update to GPIO region handler interface.
ACPI / platform / LPSS: disable async suspend/resume of LPSS devices
cpufreq: release policy->rwsem on error
cpufreq: fix cpufreq suspend/resume for intel_pstate
ACPI / scan: Correct error return value of create_modalias()
ACPI / video: disable native backlight for ThinkPad X201s
ACPI / hotplug: Generate online uevents for ACPI containers
|
|
|
|
2ad654bc5e |
cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.
Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.
Here's the full report:
https://lkml.org/lkml/2014/9/19/230
To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes:
|
|
|
|
5c4dd348af |
Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
Revert commit |
|
|
|
324c7b62d0 |
Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo: "One late fix for cgroup. I was waiting for another set of fixes for a long-standing obscure cpuset bug but am not sure whether they'll be ready before v3.17 release. This one is a simple fix for a mutex unlock balance bug in an allocation failure path in pidlist_array_load(). The bug was introduced in v3.14 and the fix is tagged for -stable" * 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix unbalanced locking |
|
|
|
3cf6b0151b | Merge branches 'tiny/bloat-o-meter-no-SyS', 'tiny/more-procless', 'tiny/no-advice', 'tiny/tinyconfig' and 'tiny/x86-boot-compressed-use-yn' into tiny/next | |
|
|
598a0c7d09 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar: "Two kernel side fixes: a kprobes fix and a perf_remove_from_context() fix (which does not yet fix the migration bug which is WIP)" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix a race condition in perf_remove_from_context() kprobes/x86: Free 'optinsn' cache when range check fails |
|
|
|
eb4aec84d6 |
cgroup: fix unbalanced locking
cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls
pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex.
It is wrong that we release the mutex in the failure path in
pidlist_array_load(), because cgroup_pidlist_stop() will be called
no matter if cgroup_pidlist_start() returns errno or not.
Fixes:
|
|
|
|
1536340e7c |
Merge branches 'locking-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex and timer fixes from Thomas Gleixner:
"A oneliner bugfix for the jinxed futex code:
- Drop hash bucket lock in the error exit path. I really could slap
myself for intruducing that bug while fixing all the other horror
in that code three month ago ...
and the timer department is not too proud about the following fixes:
- Deal with a long standing rounding bug in the timeval to jiffies
conversion. It's a real issue and this fix fell through the cracks
for quite some time.
- Another round of alarmtimer fixes. Finally this code gets used
more widely and the subtle issues hidden for quite some time are
noticed and fixed. Nothing really exciting, just the itty bitty
details which bite the serious users here and there"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Unlock hb->lock in futex_wait_requeue_pi() error path
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Lock k_itimer during timer callback
alarmtimer: Do not signal SIGEV_NONE timers
alarmtimer: Return relative times in timer_gettime
jiffies: Fix timeval conversion to jiffies
|
|
|
|
474e941bed |
alarmtimer: Lock k_itimer during timer callback
Locks the k_itimer's it_lock member when handling the alarm timer's expiry callback. The regular posix timers defined in posix-timers.c have this lock held during timout processing because their callbacks are routed through posix_timer_fn(). The alarm timers follow a different path, so they ought to grab the lock somewhere else. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> |
|
|
|
265b81d23a |
alarmtimer: Do not signal SIGEV_NONE timers
Avoids sending a signal to alarm timers created with sigev_notify set to SIGEV_NONE by checking for that special case in the timeout callback. The regular posix timers avoid sending signals to SIGEV_NONE timers by not scheduling any callbacks for them in the first place. Although it would be possible to do something similar for alarm timers, it's simpler to handle this as a special case in the timeout. Prior to this patch, the alarm timer would ignore the sigev_notify value and try to deliver signals to the process anyway. Even worse, the sanity check for the value of sigev_signo is skipped when SIGEV_NONE was specified, so the signal number could be bogus. If sigev_signo was an unitialized value (as it often would be if SIGEV_NONE is used), then it's hard to predict which signal will be sent. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> |
|
|
|
e86fea7649 |
alarmtimer: Return relative times in timer_gettime
Returns the time remaining for an alarm timer, rather than the time at which it is scheduled to expire. If the timer has already expired or it is not currently scheduled, the it_value's members are set to zero. This new behavior matches that of the other posix-timers and the POSIX specifications. This is a change in user-visible behavior, and may break existing applications. Hopefully, few users rely on the old incorrect behavior. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: Richard Larocque <rlarocque@google.com> [jstultz: minor style tweak] Signed-off-by: John Stultz <john.stultz@linaro.org> |
|
|
|
d78c9300c5 |
jiffies: Fix timeval conversion to jiffies
timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:
setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);
would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.) Doing
this repeatedly would cause unbounded growth in val. So fix the math.
Here's what was wrong with the conversion: we essentially computed
(eliding seconds)
jiffies = usec * (NSEC_PER_USEC/TICK_NSEC)
by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:
jiffies = (usec * x) >> USEC_JIFFIE_SC
and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)
In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.
We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies. This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.
Tested: the following program:
int main() {
struct itimerval zero = {{0, 0}, {0, 0}};
/* Initially set to 10 ms. */
struct itimerval initial = zero;
initial.it_interval.tv_usec = 10000;
setitimer(ITIMER_PROF, &initial, NULL);
/* Save and restore several times. */
for (size_t i = 0; i < 10; ++i) {
struct itimerval prev;
setitimer(ITIMER_PROF, &zero, &prev);
/* on old kernels, this goes up by TICK_USEC every iteration */
printf("previous value: %ld %ld %ld %ld\n",
prev.it_interval.tv_sec, prev.it_interval.tv_usec,
prev.it_value.tv_sec, prev.it_value.tv_usec);
setitimer(ITIMER_PROF, &prev, NULL);
}
return 0;
}
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Reported-by: Aaron Jacobs <jacobsa@google.com>
Signed-off-by: Andrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <john.stultz@linaro.org>
|
|
|
|
13c42c2f43 |
futex: Unlock hb->lock in futex_wait_requeue_pi() error path
futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:
if (match_futex(&q.key, &key2)) {
ret = -EINVAL;
goto out_put_keys;
}
which releases the keys but does not release hb->lock.
So we happily return to user space with hb->lock held and therefor
preemption disabled.
Unlock hb->lock before taking the exit route.
Reported-by: Dave "Trinity" Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
acbbe6fbb2 |
kcmp: fix standard comparison bug
The C operator <= defines a perfectly fine total ordering on the set of
values representable in a long. However, unlike its namesake in the
integers, it is not translation invariant, meaning that we do not have
"b <= c" iff "a+b <= a+c" for all a,b,c.
This means that it is always wrong to try to boil down the relationship
between two longs to a question about the sign of their difference,
because the resulting relation [a LEQ b iff a-b <= 0] is neither
anti-symmetric or transitive. The former is due to -LONG_MIN==LONG_MIN
(take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
b). The latter can either be seen observing that x LEQ x+1 for all x,
implying x LEQ x+1 LEQ x+2 ... LEQ x-1 LEQ x; or more directly with the
simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
0.
Note that it makes absolutely no difference that a transmogrying bijection
has been applied before the comparison is done. In fact, had the
obfuscation not been done, one could probably not observe the bug
(assuming all values being compared always lie in one half of the address
space, the mathematical value of a-b is always representable in a long).
As it stands, one can easily obtain three file descriptors exhibiting the
non-transitivity of kcmp().
Side note 1: I can't see that ensuring the MSB of the multiplier is
set serves any purpose other than obfuscating the obfuscating code.
Side note 2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <sys/syscall.h>
enum kcmp_type {
KCMP_FILE,
KCMP_VM,
KCMP_FILES,
KCMP_FS,
KCMP_SIGHAND,
KCMP_IO,
KCMP_SYSVSEM,
KCMP_TYPES,
};
pid_t pid;
int kcmp(pid_t pid1, pid_t pid2, int type,
unsigned long idx1, unsigned long idx2)
{
return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}
int cmp_fd(int fd1, int fd2)
{
int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2);
if (c < 0) {
perror("kcmp");
exit(1);
}
assert(0 <= c && c < 3);
return c;
}
int cmp_fdp(const void *a, const void *b)
{
static const int normalize[] = {0, -1, 1};
return normalize[cmp_fd(*(int*)a, *(int*)b)];
}
#define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */
int main(int argc, char *argv[])
{
int r, s, count = 0;
int REL[3] = {0,0,0};
int fd[MAX];
pid = getpid();
while (count < MAX) {
r = open("/dev/null", O_RDONLY);
if (r < 0)
break;
fd[count++] = r;
}
printf("opened %d file descriptors\n", count);
for (r = 0; r < count; ++r) {
for (s = r+1; s < count; ++s) {
REL[cmp_fd(fd[r], fd[s])]++;
}
}
printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
qsort(fd, count, sizeof(fd[0]), cmp_fdp);
memset(REL, 0, sizeof(REL));
for (r = 0; r < count; ++r) {
for (s = r+1; s < count; ++s) {
REL[cmp_fd(fd[r], fd[s])]++;
}
}
printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
return (REL[0] + REL[2] != 0);
}
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
"Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
000a7d66ec |
kernel/printk/printk.c: fix faulty logic in the case of recursive printk
We shouldn't set text_len in the code path that detects printk recursion
because text_len corresponds to the length of the string inside textbuf.
A few lines down from the line
text_len = strlen(recursion_msg);
is the line
text_len += vscnprintf(text + text_len, ...);
So if printk detects recursion, it sets text_len to 29 (the length of
recursion_msg) and logs an error. Then the message supplied by the
caller of printk is stored inside textbuf but offset by 29 bytes. This
means that the output of the recursive call to printk will contain 29
bytes of garbage in front of it.
This defect is caused by commit
|
|
|
|
3577af70a2 |
perf: Fix a race condition in perf_remove_from_context()
We saw a kernel soft lockup in perf_remove_from_context(), it looks like the `perf` process, when exiting, could not go out of the retry loop. Meanwhile, the target process was forking a child. So either the target process should execute the smp function call to deactive the event (if it was running) or it should do a context switch which deactives the event. It seems we optimize out a context switch in perf_event_context_sched_out(), and what's more important, we still test an obsolete task pointer when retrying, so no one actually would deactive that event in this situation. Fix it directly by reloading the task pointer in perf_remove_from_context(). This should cure the above soft lockup. Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
|
|
d030671f3f |
Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo: "This pull request includes Alban's patch to disallow '\n' in cgroup names. Two other patches from Li to fix a possible oops when cgroup destruction races against other file operations and one from Vivek to fix a unified hierarchy devel behavior" * 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: check cgroup liveliness before unbreaking kernfs cgroup: delay the clearing of cgrp->kn->priv cgroup: Display legacy cgroup files on default hierarchy cgroup: reject cgroup names with '\n' |
|
|
|
6fef37c9a7 |
ACPI and power management fixes for 3.17-rc4
- Fix for recently broken test_suspend= command line argument
(Rafael J Wysocki).
- Fixes for regressions related to the ACPI video driver caused
by switching the default to native backlight handling in 3.16
from Hans de Goede.
- Fix for a sysfs attribute of ACPI device objects that returns
stale values sometimes due to the fact that they are cached
instead of executing the appropriate method (_SUN) every time
(broken in 3.14). From Yasuaki Ishimatsu.
- Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock
in the ACPI processor driver from Jiri Kosina.
- Runtime output validation for the ACPI _DSD device configuration
object missing from the support for it that has been introduced
recently. From Mika Westerberg.
- Fix for an unuseful and misleading RAPL (Running Average Power
Limit) domain detection message in the RAPL driver from Jacob Pan.
- New Intel Haswell CPU ID for the RAPL driver from Jason Baron.
- New Clevo W350etq blacklist entry for the ACPI EC driver
from Lan Tianyu.
- Cleanup for the intel_pstate driver and the core generic PM
domains code from Gabriele Mazzotta and Geert Uytterhoeven.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUCcZNAAoJEILEb/54YlRxhAEP/1O6gUMzbEs1LNuMoUSP/Bcx
L+sAImXBsUsvEhEVSceXrM3Gr/TpTP7t4m+O05PC8QwpCEAAB5z6NXRK3uckwmR3
//jZKm5D5eXny4QkTaZl1yUmxdoX5DlwkPkhlNS6DxBn/cq+wvPxs0crGw+0arpi
Sylj8GFbVeibhD1Wz0wor95BRg+KcbTNy5jmECs5fSWmitMC62fYXpwybbxHg8Yt
4FIHiAZSsSDT+MFPnH68pwKN0D3HDVmK0FBzvexjiHQvDRh6QFUmjSCIbiV7lDj8
bZk84xmoMtiA4eIFiFk6MTx8BibumrbefG6TT8rFH7kCOfuHbxIOzslVVmYbSpvK
ldyndGueC4AIBRREJodt6jZ3j7CQeVmtxN/CL9PvA31p6Fz0R8vMgjPKNhNN0YWj
sILY2aHWACGxefCq2Jw4osvKzMucBsC/I8C14ErhKyLf1mH/AAiavefMvpIjLLKn
OOPB6XxnqBH8iadSbVpX2rgHvaMExzB9vDZPKK67CS04opTdqhS0VQR13dYw8EOk
KGuVzF18bQXHjm+FzeaYqfi24WkpVh8kHuXJ6msTnTGLMWdJkql41pNtkpw6s98m
oh92vI/CWKChC2jlsIOgdbTom5xbaiv8QLq0z+A22FNw3h6M3X5nIkJoIOUF0xTb
wXnTBZCQPRfUsK0KdbC3
=EzJF
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI sysfs, ACPI video, suspend test),
ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD
output, a fix and a new CPU ID for the RAPL driver, new blacklist
entry for the ACPI EC driver and a couple of trivial cleanups
(intel_pstate and generic PM domains).
Specifics:
- Fix for recently broken test_suspend= command line argument (Rafael
Wysocki).
- Fixes for regressions related to the ACPI video driver caused by
switching the default to native backlight handling in 3.16 from
Hans de Goede.
- Fix for a sysfs attribute of ACPI device objects that returns stale
values sometimes due to the fact that they are cached instead of
executing the appropriate method (_SUN) every time (broken in
3.14). From Yasuaki Ishimatsu.
- Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the
ACPI processor driver from Jiri Kosina.
- Runtime output validation for the ACPI _DSD device configuration
object missing from the support for it that has been introduced
recently. From Mika Westerberg.
- Fix for an unuseful and misleading RAPL (Running Average Power
Limit) domain detection message in the RAPL driver from Jacob Pan.
- New Intel Haswell CPU ID for the RAPL driver from Jason Baron.
- New Clevo W350etq blacklist entry for the ACPI EC driver from Lan
Tianyu.
- Cleanup for the intel_pstate driver and the core generic PM domains
code from Gabriele Mazzotta and Geert Uytterhoeven"
* tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
ACPI / scan: not cache _SUN value in struct acpi_device_pnp
cpufreq: intel_pstate: Remove unneeded variable
powercap / RAPL: change domain detection message
powercap / RAPL: add support for CPU model 0x3f
PM / domains: Make generic_pm_domain.name const
PM / sleep: Fix test_suspend= command line option
ACPI / EC: Add msi quirk for Clevo W350etq
ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC
ACPI / video: Add a disable_native_backlight quirk
ACPI / video: Fix use_native_backlight selection logic
ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
|
|
|
|
81368f8bb8 |
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fix from Ingo Molnar: "A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y && (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make nocb leader kthreads process pending callbacks after spawning |
|
|
|
ebc54f278f |
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Three fixlets from the timer departement:
- Update the timekeeper before updating vsyscall and pvclock. This
fixes the kvm-clock regression reported by Chris and Paolo.
- Use the proper irq work interface from NMI. This fixes the
regression reported by Catalin and Dave.
- Clarify the compat_nanosleep error handling mechanism to avoid
future confusion"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Update timekeeper before updating vsyscall and pvclock
compat: nanosleep: Clarify error handling
nohz: Restore NMI safe local irq work for local nohz kick
|
|
|
|
9bf2419fa7 |
timekeeping: Update timekeeper before updating vsyscall and pvclock
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.
The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.
commit
|
|
|
|
849151dd54 |
compat: nanosleep: Clarify error handling
The error handling in compat_sys_nanosleep() is correct, but completely non obvious. Document it and restrict it to the -ERESTART_RESTARTBLOCK return value for clarity. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
|
|
40bea03959 |
nohz: Restore NMI safe local irq work for local nohz kick
The local nohz kick is currently used by perf which needs it to be
NMI-safe. Recent commit though (
|
|
|
|
aa32362f01 |
cgroup: check cgroup liveliness before unbreaking kernfs
When cgroup_kn_lock_live() is called through some kernfs operation and another thread is calling cgroup_rmdir(), we'll trigger the warning in cgroup_get(). ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0() ... Call Trace: [<c16ee73d>] dump_stack+0x41/0x52 [<c10468ef>] warn_slowpath_common+0x7f/0xa0 [<c104692d>] warn_slowpath_null+0x1d/0x20 [<c10bb999>] cgroup_get+0x89/0xa0 [<c10bbe58>] cgroup_kn_lock_live+0x28/0x70 [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230 [<c10be5b2>] cgroup_tasks_write+0x12/0x20 [<c10bb7b0>] cgroup_file_write+0x40/0x130 [<c11aee71>] kernfs_fop_write+0xd1/0x160 [<c1148e58>] vfs_write+0x98/0x1e0 [<c114934d>] SyS_write+0x4d/0xa0 [<c16f656b>] sysenter_do_call+0x12/0x12 ---[ end trace 6f2e0c38c2108a74 ]--- Fix this by calling css_tryget() instead of cgroup_get(). v2: - move cgroup_tryget() right below cgroup_get() definition. (Tejun) Cc: <stable@vger.kernel.org> # 3.15+ Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
|
|
|
a4189487da |
cgroup: delay the clearing of cgrp->kn->priv
Run these two scripts concurrently:
for ((; ;))
{
mkdir /cgroup/sub
rmdir /cgroup/sub
}
for ((; ;))
{
echo $$ > /cgroup/sub/cgroup.procs
echo $$ > /cgroup/cgroup.procs
}
A kernel bug will be triggered:
BUG: unable to handle kernel NULL pointer dereference at 00000038
IP: [<c10bbd69>] cgroup_put+0x9/0x80
...
Call Trace:
[<c10bbe19>] cgroup_kn_unlock+0x39/0x50
[<c10bbe91>] cgroup_kn_lock_live+0x61/0x70
[<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230
[<c10be5b2>] cgroup_tasks_write+0x12/0x20
[<c10bb7b0>] cgroup_file_write+0x40/0x130
[<c11aee71>] kernfs_fop_write+0xd1/0x160
[<c1148e58>] vfs_write+0x98/0x1e0
[<c114934d>] SyS_write+0x4d/0xa0
[<c16f656b>] sysenter_do_call+0x12/0x12
We clear cgrp->kn->priv in the end of cgroup_rmdir(), but another
concurrent thread can access kn->priv after the clearing.
We should move the clearing to css_release_work_fn(). At that time
no one is holding reference to the cgroup and no one can gain a new
reference to access it.
v2:
- move RCU_INIT_POINTER() into the else block. (Tejun)
- remove the cgroup_parent() check. (Tejun)
- update the comment in css_tryget_online_from_dir().
Cc: <stable@vger.kernel.org> # 3.15+
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
|
|
651bc1a474 |
Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull an RCU fix from Paul E. McKenney:
"This series contains a single commit fixing an initialization bug
reported by Amit Shah and fixed by Pranith Kumar (and tested by Amit).
This bug results in a boot-time hang in callback-offloaded configurations
where callbacks were posted before the offloading ('rcuo') kthreads
were created."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
62109b4317 |
PM / sleep: Fix test_suspend= command line option
After commit |
|
|
|
7505ceaf86 |
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq handling fixlet from Thomas Gleixner: "Just an export for an interrupt flow handler which is now used in gpio modules" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq: Export handle_fasteoi_irq |
|
|
|
74ca317c26 |
kexec: create a new config option CONFIG_KEXEC_FILE for new syscall
Currently new system call kexec_file_load() and all the associated code compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory code which currently uses gcc option -mcmodel=large. This option seems to be available only gcc 4.4 onwards. Hiding new functionality behind a new config option will not break existing users of old gcc. Those who wish to enable new functionality will require new gcc. Having said that, I am trying to figure out how can I move away from using -mcmodel=large but that can take a while. I think there are other advantages of introducing this new config option. As this option will be enabled only on x86_64, other arches don't have to compile generic kexec code which will never be used. This new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had to do this for CONFIG_KEXEC. Now with introduction of new config option, we can remove crypto dependency from other arches. Now CONFIG_KEXEC_FILE is available only on x86_64. So whereever I had CONFIG_X86_64 defined, I got rid of that. For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to "depends on CRYPTO=y". This should be safer as "select" is not recursive. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
|
|
800df627e2 |
resource: fix the case of null pointer access
Richard and Daniel reported that UML is broken due to changes to
resource traversal functions. Problem is that iomem_resource.child can
be null and new code does not consider that possibility. Old code used
a for loop and that loop will not even execute if p was null.
Revert back to for() loop logic and bail out if p is null.
I also moved sibling_only check out of resource_lock. There is no
reason to keep it inside the lock.
Following is backtrace of the UML crash.
RIP: 0033:[<0000000060039b9f>]
RSP: 0000000081459da0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000219b3fff RCX: 000000006010d1d9
RDX: 0000000000000001 RSI: 00000000602dfb94 RDI: 0000000081459df8
RBP: 0000000081459de0 R08: 00000000601b59f4 R09: ffffffff0000ff00
R10: ffffffff0000ff00 R11: 0000000081459e88 R12: 0000000081459df8
R13: 00000000219b3fff R14: 00000000602dfb94 R15: 0000000000000000
Kernel panic - not syncing: Segfault with no mm
CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-10454-g58d08e3 #13
Stack:
00000000 000080d0 81459df0 219b3fff
81459e70 6010d1d9 ffffffff 6033e010
81459e50 6003a269 81459e30 00000000
Call Trace:
[<6010d1d9>] ? kclist_add_private+0x0/0xe7
[<6003a269>] walk_system_ram_range+0x61/0xb7
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<6010d574>] kcore_update_ram+0x4c/0x168
[<6010d72e>] ? kclist_add+0x0/0x2e
[<6000e943>] proc_kcore_init+0xea/0xf1
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<6000e859>] ? proc_kcore_init+0x0/0xf1
[<600189f0>] do_one_initcall+0x13c/0x204
[<6004ca46>] ? parse_args+0x1df/0x2e0
[<6004c82d>] ? parameq+0x0/0x3a
[<601b5990>] ? strcpy+0x0/0x18
[<60001e1a>] kernel_init_freeable+0x240/0x31e
[<6026f1c0>] kernel_init+0x12/0x148
[<60019fad>] new_thread_handler+0x81/0xa3
Fixes
|
|
|
|
11ed7f934c |
rcu: Make nocb leader kthreads process pending callbacks after spawning
The nocb callbacks generated before the nocb kthreads are spawned are
enqueued in the nocb queue for later processing. Commit
|
|
|
|
c0fe5dcb91 |
Josef Bacik found a bug in the ring_buffer_poll_wait() where the
condition variable (waiters_pending) was set before being added to the poll queue via poll_wait(). This allowed for a small race window to happen where an event could come in, check the condition variable see it set to true, clear it, and then wake all the waiters. But because the waiter set the variable before adding itself to the queue, the waker could have cleared the variable after it was set and then miss waking it up as it wasn't added to the queue yet. Discussing this bug, we realized that a memory barrier needed to be added too, for the rare case that something polls for a single trace event to happen (and just one, no more to come in), and miss the wakeup due to memory ordering. Ideally, a memory barrier needs to be added on the writer side too, but as that will kill tracing performance and this is for a situation that tracing wasn't even designed for (who traces one instance of an event, use a printk instead!), this isn't worth adding the barrier. But we can in the future add the barrier for when the buffer goes from empty to the first event, as that would cover this case. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJT/IgUAAoJEKQekfcNnQGumJgIALG63LVR4ZS+bjHGTxyovCiL EE6Aic7/hHB8ajn/QZJAaVbaxf0woIyPaU6NP5p17rgF44gTtaEzg3hDqOXkBXvh aMLTEz2Xm6nu1VQ5vk//9qqplE+WMXWS6YjjnxErRja90cBZblFy9h9LzwwexLkm oXmhjVF1ke5AKFiXQ+Dj9LJse80MvSEEFk1eeUR7oNqK/4rwzKmkefkUnk2NbST4 cFkOAbTfZnMXlhUhB2EY2Ptprty3scrA7bpe00ClzFmoQ9MxDVlLJBN9aEjaTnxM zKiXsxy/eJ+0IPSOSEajh3IJb96sbqZnt++28vDhck3e6k3G4CQwbuktPdQXUo8= =jful -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace buffer epoll hang fix from Steven Rostedt: "Josef Bacik found a bug in the ring_buffer_poll_wait() where the condition variable (waiters_pending) was set before being added to the poll queue via poll_wait(). This allowed for a small race window to happen where an event could come in, check the condition variable see it set to true, clear it, and then wake all the waiters. But because the waiter set the variable before adding itself to the queue, the waker could have cleared the variable after it was set and then miss waking it up as it wasn't added to the queue yet. Discussing this bug, we realized that a memory barrier needed to be added too, for the rare case that something polls for a single trace event to happen (and just one, no more to come in), and miss the wakeup due to memory ordering. Ideally, a memory barrier needs to be added on the writer side too, but as that will kill tracing performance and this is for a situation that tracing wasn't even designed for (who traces one instance of an event, use a printk instead!), this isn't worth adding the barrier. But we can in the future add the barrier for when the buffer goes from empty to the first event, as that would cover this case" * tag 'trace-fixes-v3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: trace: Fix epoll hang when we race with new entries |
|
|
|
7a486d3781 |
param: check for tainting before calling set op.
This means every set op doesn't need to call it, and it can move into params.c. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> |
|
|
|
91f9d330cc |
module: make it possible to have unsafe, tainting module params
Add flags field to struct kernel_params, and add the first flag: unsafe parameter. Modifying a kernel parameter with the unsafe flag set, either via the kernel command line or sysfs, will issue a warning and taint the kernel. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Jon Mason <jon.mason@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> |
|
|
|
6a4c264313 |
module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
Make it clear this is about kernel_param_ops, not kernel_param (which will soon have a flags field of its own). No functional changes. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Jon Mason <jon.mason@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> |
|
|
|
4ce97dbf50 |
trace: Fix epoll hang when we race with new entries
Epoll on trace_pipe can sometimes hang in a weird case. If the ring buffer is empty when we set waiters_pending but an event shows up exactly at that moment we can miss being woken up by the ring buffers irq work. Since ring_buffer_empty() is inherently racey we will sometimes think that the buffer is not empty. So we don't get woken up and we don't think there are any events even though there were some ready when we added the watch, which makes us hang. This patch fixes this by making sure that we are actually on the wait list before we set waiters_pending, and add a memory barrier to make sure ring_buffer_empty() is going to be correct. Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com Cc: stable@vger.kernel.org # 3.10+ Cc: Martin Lau <kafai@fb.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
|
|
|
01e9982ab3 |
The rewrite of the ftrace code that makes it possible to allow for
separate trampolines had a design flaw with the interaction between the function and function_graph tracers. The main flaw was the simplification of the use of multiple tracers having the same filter (like function and function_graph, that use the set_ftrace_filter file to filter their code). The design assumed that the two tracers could never run simultaneously as only one tracer can be used at a time. The problem with this assumption was that the function profiler could be implemented on top of the function graph tracer, and the function profiler could run at the same time as the function tracer. This caused the assumption to be broken and when ftrace detected this failed assumpiton it would spit out a nasty warning and shut itself down. Instead of using a single ftrace_ops that switches between the function and function_graph callbacks, the two tracers can again use their own ftrace_ops. But instead of having a complex hierarchy of ftrace_ops, the filter fields are placed in its own structure and the ftrace_ops can carefully use the same filter. This change took a bit to be able to allow for this and currently only the global_ops can share the same filter, but this new design can easily be modified to allow for any ftrace_ops to share its filter with another ftrace_ops. The first four patches deal with the change of allowing the ftrace_ops to share the filter (and this needs to go to 3.16 as well). The fifth patch fixes a bug that was also caused by the new changes but only for archs other than x86, and only if those archs implement a direct call to the function_graph tracer which they do not do yet but will in the future. It does not need to go to stable, but needs to be fixed before the other archs update their code to allow direct calls to the function_graph trampoline. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJT+hqSAAoJEKQekfcNnQGulvcH/0O4NMXX4HH1dQlYgKEaSYxE Nh8WdiewopF5iaeNvo+8Nzdq8D2k3KgMOqSlzJ4JVmzd7gjOBSGeKDfqFwR+IbTk 9LcaJJCI3oG3MEf6m7gZMdjKPKyxkeYHDtG7kRHo8z94eliV9pKC6fUnEWayQO3o Kv6IBupdkF8ICAiKRae5Uo0c9wjZ9YP0bZS7fxI2hJw3h/NMFnhnhUL03URIx8e3 dqgpweYg+P3KPfp2Jz6safdJqLTPK9rqqhkZhylbDl7o78xEzRN7wCyB6Nak00xz swRgsW6vFP7ci/YSNx+B6HCIf7NTm3WLDrrIhitNHcJUZwUMU3CRO9IJHGsTuEE= =J5lZ -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull fix for ftrace function tracer/profiler conflict from Steven Rostedt: "The rewrite of the ftrace code that makes it possible to allow for separate trampolines had a design flaw with the interaction between the function and function_graph tracers. The main flaw was the simplification of the use of multiple tracers having the same filter (like function and function_graph, that use the set_ftrace_filter file to filter their code). The design assumed that the two tracers could never run simultaneously as only one tracer can be used at a time. The problem with this assumption was that the function profiler could be implemented on top of the function graph tracer, and the function profiler could run at the same time as the function tracer. This caused the assumption to be broken and when ftrace detected this failed assumpiton it would spit out a nasty warning and shut itself down. Instead of using a single ftrace_ops that switches between the function and function_graph callbacks, the two tracers can again use their own ftrace_ops. But instead of having a complex hierarchy of ftrace_ops, the filter fields are placed in its own structure and the ftrace_ops can carefully use the same filter. This change took a bit to be able to allow for this and currently only the global_ops can share the same filter, but this new design can easily be modified to allow for any ftrace_ops to share its filter with another ftrace_ops. The first four patches deal with the change of allowing the ftrace_ops to share the filter (and this needs to go to 3.16 as well). The fifth patch fixes a bug that was also caused by the new changes but only for archs other than x86, and only if those archs implement a direct call to the function_graph tracer which they do not do yet but will in the future. It does not need to go to stable, but needs to be fixed before the other archs update their code to allow direct calls to the function_graph trampoline" * tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Use current addr when converting to nop in __ftrace_replace_code() ftrace: Fix function_profiler and function tracer together ftrace: Fix up trampoline accounting with looping on hash ops ftrace: Update all ftrace_ops for a ftrace_hash_ops update ftrace: Allow ftrace_ops to use the hashes from other ops |
|
|
|
7cad45eea3 |
irq: Export handle_fasteoi_irq
Export handle_fasteoi_irq to be able to use it in e.g. the Zynq gpio driver
since commit
|
|
|
|
44744bb344 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar: "A kprobes and a perf compat ioctl fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Handle compat ioctl kprobes: Skip kretprobe hit in NMI context to avoid deadlock |
|
|
|
39b5552cd5 |
ftrace: Use current addr when converting to nop in __ftrace_replace_code()
In __ftrace_replace_code(), when converting the call to a nop in a function
it needs to compare against the "curr" (current) value of the ftrace ops, and
not the "new" one. It currently does not affect x86 which is the only arch
to do the trampolines with function graph tracer, but when other archs that do
depend on this code implement the function graph trampoline, it can crash.
Here's an example when ARM uses the trampolines (in the future):
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
Modules linked in: omap_rng rng_core ipv6
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
[<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
[<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
[<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
[<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
[<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
[<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
[<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
[<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
[<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
[<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
[<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
[<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
[<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
---[ end trace dc9ce72c5b617d8f ]---
[ 65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[ 65.054070] actual: 85:1b:00:eb
Fixes:
|
|
|
|
5f151b2401 |
ftrace: Fix function_profiler and function tracer together
The latest rewrite of ftrace removed the separate ftrace_ops of the function tracer and the function graph tracer and had them share the same ftrace_ops. This simplified the accounting by removing the multiple layers of functions called, where the global_ops func would call a special list that would iterate over the other ops that were registered within it (like function and function graph), which itself was registered to the ftrace ops list of all functions currently active. If that sounds confusing, the code that implemented it was also confusing and its removal is a good thing. The problem with this change was that it assumed that the function and function graph tracer can never be used at the same time. This is mostly true, but there is an exception. That is when the function profiler uses the function graph tracer to profile. The function profiler can be activated the same time as the function tracer, and this breaks the assumption and the result is that ftrace will crash (it detects the error and shuts itself down, it does not cause a kernel oops). To solve this issue, a previous change allowed the hash tables for the functions traced by a ftrace_ops to be a pointer and let multiple ftrace_ops share the same hash. This allows the function and function_graph tracer to have separate ftrace_ops, but still share the hash, which is what is done. Now the function and function graph tracers have separate ftrace_ops again, and the function tracer can be run while the function_profile is active. Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
|
|
|
bce0b6c51a |
ftrace: Fix up trampoline accounting with looping on hash ops
Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec the rec->flags by more than once (one per those that share the ftrace_hash). This means that the tramp_hash may not have a hash item when it was added. For example, if two ftrace_ops share a hash for a ftrace record, and the first ops has a trampoline, when it adds itself it will set the rec->flags TRAMP flag and increments its nr_trampolines counter. When the second ops is added, it must clear that tramp flag but also decrement the other ops that shares its hash. As the update to the function callbacks has not yet been performed, the other ops will not have the tramp hash set yet and it can not be used to know to decrement its nr_trampolines. Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is held, a ops with a trampoline to a record during an update of another ops that shares the record will have its func_hash pointing to it. Since a trampoline can only be set for a record if only one ops is attached to it, we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag is set) and then find the ops that has this record in its hashes. Also added some output to help debug when things go wrong. Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out) Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |