We need to export ppc64_firmware_features for modules. Before we do that
I think we should probably rename it to powerpc_firmware_features.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we build for the MPC8540 ADS produce a uImage by default.
Updated the defconfig to reflect this as well.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Towards the goal of having arch/powerpc not build anything over in arch/ppc
move math-emu over. Also, killed some references to arch/ppc/ in the
arch/powerpc Makefile which should belong in drivers/ when the particular
sub-arch's move over to arch/powerpc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Export validate_sp so we can use it in the oprofile calltrace code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Match, Linus's fix to arch/powerpc in arch/ppc. strcasecmp takes a size_t,
not an int, as its third argument.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
32-bit CHRP machines are now supported only in arch/powerpc, as are
all 64-bit PowerPC processors. This means that we don't use
Open Firmware on any platform in arch/ppc any more.
This makes PReP support a single-platform option like every other
platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM
is gone from arch/ppc. CONFIG_PPC_PREP is the option that selects
PReP support and is generally what has replaced
CONFIG_PPC_MULTIPLATFORM within arch/ppc.
_machine is all but dead now, being #defined to 0.
Updated Makefiles, comments and Kconfig options generally to reflect
these changes.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a mistake I made when editing these functions - when I
took out the interrupt disabling code (because interrupts are now
disabled by the caller) I left the register that is used for the MSR
value to be used during doze/nap uninitialized. This fixes it.
Also updated some of the comments in idle_power4.S and removed some
code that was copied over from idle_6xx.S but is no longer relevant
(we don't ever clear the CPU_FTR_CAN_NAP bit at runtime for POWER4).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Eliminate an unnecessary -- and flawed -- use of the expensive
num_online_cpus().
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Andi's previous fix to initialise powernow_data on all siblings
will not work properly with CPU Hotplug.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
It was reported from a field customer that global spin lock ptcg_lock
is giving a lot of grief on munmap performance running on a large numa
machine. What appears to be a problem coming from flush_tlb_range(),
which currently unconditionally calls platform_global_tlb_purge().
For some of the numa machines in existence today, this function is
mapped into ia64_global_tlb_purge(), which holds ptcg_lock spin lock
while executing ptc.ga instruction.
Here is a patch that attempt to avoid global tlb purge whenever
possible. It will use local tlb purge as much as possible. Though the
conditions to use local tlb purge is pretty restrictive. One of the
side effect of having flush tlb range instruction on ia64 is that
kernel don't get a chance to clear out cpu_vm_mask. On ia64, this mask
is sticky and it will accumulate if process bounces around. Thus
diminishing the possible use of ptc.l. Thoughts?
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Function lazy_mmu_prot_update is also used on huge pages when it is called
by set_huge_ptep_writable, but it isn't aware of huge pages.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
DDC reading via the Video BIOS may take several tens of seconds with some
combination of display cards and monitors.
Make this option configurable. It defaults to `y' to minimise disruption.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an RTC subsystem driver for the ARM SA1100/PXA2XX processor RTC.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix some namespace conflicts between the RTC subsystem and the ARM Integrator
time functions.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes from the ARM subsytem some of the rtc-related functions
that have been included in the RTC subsystem. It also fixes some naming
collisions.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and
wire up the new syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire
up the new syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just about every architecture defines some macros to do operations on pfns.
They're all virtually identical. This patch consolidates all of them.
One minor glitch is that at least i386 uses them in a very skeletal header
file. To keep away from #include dependency hell, I stuck the new
definitions in a new, isolated header.
Of all of the implementations, sh64 is the only one that varied by a bit.
It used some masks to ensure that any sign-extension got ripped away before
the arithmetic is done. This has been posted to that sh64 maintainers and
the development list.
Compiles on x86, x86_64, ia64 and ppc64.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because pgdat_list was linked to pgdat_list in *reverse* order, (By default)
some of arch has to sort it by themselves.
for_each_pgdat has gone..for_each_online_pgdat() uses node_online_map, which
doesn't need to be sorted.
This patch removes codes for sorting pgdat.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't use cpuid.2 to determine cache info if cpuid.4 is supported. The
exception is P4 trace cache. We always use cpuid.2 to get trace cache
under P4.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new sched domain for representing multi-core with shared caches
between cores. Consider a dual package system, each package containing two
cores and with last level cache shared between cores with in a package. If
there are two runnable processes, with this appended patch those two
processes will be scheduled on different packages.
On such systems, with this patch we have observed 8% perf improvement with
specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
users).
This new domain will come into play only on multi-core systems with shared
caches. On other systems, this sched domain will be removed by domain
degeneration code. This new domain can be also used for implementing power
savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
I will post another patch for power savings policy soon)
Most of the arch/* file changes are for cpu_coregroup_map() implementation.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a race in the starting of write_sigio_thread. Previously, some of
the data needed by the thread was initialized after the clone. If the thread
ran immediately, it would see the uninitialized data, including an empty
pollfds, which would cause it to hang.
We move the data initialization to before the clone, and adjust the error
paths and cleanup accordingly.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Behavior when booting two UMLs with the same umid was broken. The second one
would steal the umid. This fixes that, making the second UML take a random
umid instead.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a process segfault where a signal was being delivered such that a
new stack page needed to be allocated to hold the signal frame. This was
tripping some logic in the page fault handler which wouldn't allocate the page
if the faulting address was more that 32 bytes lower than the current stack
pointer. Since a signal frame is greater than 32 bytes, this exercised that
case.
It's fixed by updating the SP in the pt_regs before starting to copy the
signal frame. Since those are the registers that will be copied on to the
stack, we have to be careful to put the original SP, not the new one which
points to the signal frame, on the stack.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a 'c' option to the ubd switch which turns off host file locking so
that the device can be shared, as with a cluster. There's also some
whitespace cleanup while I was in this file.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rearranges the OS declarations by moving some declarations into os.h.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The serial UML OS-abstraction layer patch (um/kernel dir).
This moves all systemcalls from tty_log.c file under os-Linux dir
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For security reasons, UML in is_syscall() needs to have access to code in
vsyscall-page. The current implementation grants this access by explicitly
allowing access to vsyscall in access_ok_skas(). With this change,
copy_from_user() may be used to read the code. Ptrace access to vsyscall-page
for debugging already was implemented in get_user_pages() by mainline. In
i386, copy_from_user can't access vsyscall-page, but returns EFAULT.
To make UML behave as i386 does, I changed is_syscall to use
access_process_vm(current) to read the code from vsyscall-page. This doesn't
hurt security, but simplifies the code and prepares implementation of
stub-vmas.
Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The serial UML OS-abstraction layer patch (um/kernel dir).
This moves sigio_user.c to os-Linux dir
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The serial UML OS-abstraction layer patch (um/kernel dir).
This moves all startup code from sigio_user.c file under os-Linux dir
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The serial UML OS-abstraction layer patch (um/kernel dir).
This joins irq_user.c and irq.c files.
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The serial UML OS-abstraction layer patch (um/kernel dir).
This moves all systemcalls from irq_user.c file under os-Linux dir
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some printf formats are incorrect for large memory sizes.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a conflict between a header and what gcc "knows" the declaration'
to be.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current timer_pm.c reads I/O port triple times, in order to avoid the bug
of chipset. But I/O port is slow.
2.6.16 (pmtmr)
Simple gettimeofday: 3.6532 microseconds
2.6.16+patch (pmtmr)
Simple gettimeofday: 1.4582 microseconds
[if chip is buggy, probably it will be 7us or more in 4.2% of probability.]
This patch adds blacklist of buggy chip, and if chip is not buggy, this
uses fast normal version instead of slow workaround version.
If chip is buggy, warnings "pmtmr is slow". But sounds like there is gray
zone. I found the PIIX4 errata, but I couldn't find the ICH4 errata. But
some motherboard seems to have problem.
So, if we found a ICH4, generate warnings, and use a workaround version.
If user's ICH4 is good, the user can specify the "pmtmr_good" boot
parameter to use fast version.
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ifeq ($CONFIG_PREEMPT,y) -> ifeq ($(CONFIG_PREEMPT),y)
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Erik Mouw
The LART website moved to http://www.lartmaker.nl/. This patch
updates the URL in ARM specific files.
Signed-off-by: Erik Mouw <erik@bitwizard.nl>
Acked-by: Jan-Derk Bakker <jdb@lartmaker.nl>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The high page vector (0xFFFF0000) does not supported in nommu mode.
This patch allows the vectors to be 0x00000000 or the begining of DRAM
in nommu mode.
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds Kconfig-nommu for noMMU specific configurations
and MMUEXT variable into Makefile.
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds nommu version start-up code head-nommu.S.
The common part of the start-up codes is moved to head-common.S.
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On CHRP machines we are supposed to call into firmware (RTAS)
periodically, to give it a chance to check for errors and other
events. Under ppc we had some special code in timer_interrupt
to do this, but that didn't get transferred over to arch/powerpc.
Instead, we use an array of timer_list structs, one per CPU,
and use add_timer_on to make sure each one gets called on the
appropriate CPU.
With this we can remove the heartbeat_* elements of the ppc_md
struct.
Signed-off-by: Paul Mackerras <paulus@samba.org>
__down, __down_interruptible and __up are defined and exported in
arch/powerpc/kernel/semaphore.c, and used from there for ARCH=ppc,
so there is no need to export them in arch/ppc/kernel/ppc_ksyms.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds MPU support in boot/compressed/head.S.
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All of the things needed for 32-bit ARCH=powerpc builds have now
moved to arch/powerpc/kernel, so we don't need to go down into
arch/ppc/kernel any more, and we can remove the CONFIG_PPC_MERGE
conditional from arch/ppc/kernel/Makefile.
There were two files still referenced in the merge section of
arch/ppc/kernel/Makefile: ppc-stub.o, depending on CONFIG_KGDB,
and dma-mapping.o, depending on CONFIG_NOT_COHERENT_CACHE. None
of the platforms currently in ARCH=powerpc have caches that
aren't coherent with DMA, but when we do get one we'll move
dma-mapping.c over. As for CONFIG_KGDB, none of the Kconfig
files in the tree define it, so I'll let it languish for now.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The worst part about this bug is what it would cause
a hugepage TSB to be allocated for every address space
since "0 >= 0".
Signed-off-by: David S. Miller <davem@davemloft.net>
... and rename it to module_32.c since it is the 32-bit version.
The 32-bit and 64-bit ABIs are sufficiently different that having
a merged version isn't really practical.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also renamed temp.c to tau_6xx.c (for thermal assist unit) and updated
the Kconfig option description and help text for CONFIG_TAU.
Signed-off-by: Paul Mackerras <paulus@samba.org>
No functional changes, but call it l2cr_6xx.S since it is specific
to 6xx-family (including G3/750 and G4/74xx) processors.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since pSeries only wants to do something different in the idle loop when
there is no work to do, we can simplify the code by implementing
ppc_md.power_save functions instead of complete idle loops. There are
two versions: one for shared-processor partitions and one for dedicated-
processor partitions.
With this we also do a cede_processor() call on dedicated processor
partitions if the poll_pending() call indicates that the hypervisor
has work it wants to do.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
loops. It brings over the concept of having a ppc_md.power_save
function from 32-bit to ARCH=powerpc, which lets us get rid of
native_idle(). With this we will also be able to simplify the idle
handling for pSeries and cell.
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc32: Reorganize and complete MPC52xx initial cpu setup
This patch splits up the CPU setup into a generic part and a
platform specific part. We also add a few missing init at the
same time.
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc32: Adds support for the LITE5200B dev board
This LITE5200B devboard is the new development board for the
Freescale MPC5200 processor. It has two PCI slots and so a
different PCI IRQ routing.
Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc32: Adds support for the PCI hostbridge in MPC5200B
Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We only ever execute the loop once, so let's move it to a function
making it more readable. Cleanup patch, no functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We were printing node ids in hex in one spot. Lets be consistent and
always print them in decimal.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently have a hack to flip the boot cpu and its secondary thread
to logical cpuid 0 and 1. This means the logical - physical mapping will
differ depending on which cpu is boot cpu. This is most apparent on
kexec, where we might kexec on any cpu and therefore change the mapping
from boot to boot.
The patch below does a first pass early on to work out the logical cpuid
of the boot thread. We then fix up some paca structures to match.
Ive also removed the boot_cpuid_phys variable for ppc64, to be
consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If an SPE attempts a DMA put to a local store after already doing
a get, the kernel must update the HW PTE to allow the write access.
This case was not being handled correctly.
From: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I'm not sure where the information came from, but I assumed
that doing cache-inhibited mappings for mmio regions was
sufficient.
It seems we also need the guarded bit set, like everyone
else, which is the default for ioremap.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noticed by Milton Miller, setting the initial affinity in
spider-pic can go wrong if the target node field was not orinally
empty.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the dynamic PCI probe function for pSeries to use
ppc_md.pci_probe_mode() when appropriate.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Do not call prom exit prom_panic. It clears the screen and the exit
message is lost.
On some (or all?) pmacs it causes another crash when OF tries to print
the date and time in its banner.
Set of_platform earlier to catch more prom_panic() calls.
Signed-off-by: Olaf Hering <olh@suse.de>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The return statement is to prevent `warning: 'nid' might be used uninitialized
in this function'.
Cc: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The address of variable val in prom_init_stdout is passed to prom_getprop.
prom_getprop casts the pointer to u32 and passes it to call_prom in the hope
that OpenFirmware stores something there.
But the pointer is truncated in the lower bits and the expected value is
stored somewhere else.
In my testing I had a stackpointer of 0x0023e6b4. val was at offset 120,
wich has address 0x0023e72c. But the value passed to OF was 0x0023e728.
c00000000040b710: 3b 01 00 78 addi r24,r1,120
...
c00000000040b754: 57 08 00 38 rlwinm r8,r24,0,0,28
...
c00000000040b784: 80 01 00 78 lwz r0,120(r1)
...
c00000000040b798: 90 1b 00 0c stw r0,12(r27)
...
The stackpointer came from 32bit code.
The chain was yaboot -> zImage -> vmlinux
PowerMac OpenFirmware does appearently not handle the ELF sections
correctly. If yaboot was compiled in
/usr/src/packages/BUILD/lilo-10.1.1/yaboot, then the stackpointer is
unaligned. But the stackpointer is correct if yaboot is compiled in
/tmp/yaboot.
This bug triggered since 2.6.15, now prom_getprop is an inline
function. gcc clears the lower bits, instead of just clearing the
upper 32 bits.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
the mfc member of a new context was not initialized to zero,
which potentially leads to wild memory accesses.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.
This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.
The signal1 & signal2 files may be mmap()'d by regular user
processes. The cntl and mfc file, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which can not be guaranteed with user mmaps,
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.
This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.
I'm still undecided wether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.
This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
powerpc currently declares some of its own system calls
in <asm/unistd.h>, but not all of them. That place also
contains remainders of the now almost unused kernel syscall
hack.
- Add a new <asm/syscalls.h> with clean declarations
- Include that file from every source that implements one
of these
- Get rid of old declarations in <asm/unistd.h>
This patch is required as a base for implementing system
calls from an SPU, but also makes sense as a general
cleanup.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Apparently we have found a bug in the CPU that causes
external interrupts to sometimes get disabled indefinitely.
This adds a workaround for the problem.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current interrupt controller setup on Cell is done
in a rather ad-hoc way with device tree properties
that are not standardized at all.
In an attempt to do something that follows the OF standard
(or at least the IBM extensions to it) more closely,
we have now come up with this patch. It still provides
a fallback to the old behaviour when we find older firmware,
that hack can not be removed until the existing customer
installations have upgraded.
Cc: hpenner@de.ibm.com
Cc: stk@de.ibm.com
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Cc: benh@kernel.crashing.org
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The default configuration in mainline got a little out of
sync with what we use internally.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A small bug crept in the iommu driver when we made it more
generic. This patch is needed for boards that have a dma
window that does not start at bus address zero.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The amba-pl010 hardware does not provide RTS and DTR control lines; it
is expected that these will be implemented using GPIO. Allow platforms
to supply a function to implement manipulation of modem control lines.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Richard Purdie
Add an EXPORT_SYMBOL for the Akita IO Expander Device.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As announced in feature-removal-schedule.txt.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The find_*_bit() routines are defined to work on a pointer to unsigned long.
But partial_page.bitmap is unsigned int and it is passed to find_*_bit() in
arch/ia64/ia32/sys_ia32.c. So the compiler will print warnings.
This patch changes to unsigned long instead.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use config options instead of gcc builtin definition to tell the use of
instruction set extensions (CIX and FIX).
This is introduced to tell the kbuild system the use of opmized hweight*()
routines on alpha architecture.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Build fix for user mode linux.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix warning messages triggered by bitops code consolidation patches.
cxn_bitmap is the array of unsigned long. '&' is unnesesary for the argument
of *_bit() routins.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
I could not test this patch for sparc64.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Anil S Keshavamurthy<anil.s.keshavamurthy@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently kprobe handler traps only happen in kernel space, so function
kprobe_exceptions_notify should skip traps which happen in user space.
This patch modifies this, and it is based on 2.6.16-rc4.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: <hiramatu@sdl.hitachi.co.jp>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When kretprobe probes the schedule() function, if the probed process exits
then schedule() will never return, so some kretprobe instances will never
be recycled.
In this patch the parent process will recycle retprobe instances of the
probed function and there will be no memory leak of kretprobe instances.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In normal operation, kretprobe makes a target function return to trampoline
code. A kprobe (called trampoline_probe) has been inserted in the trampoline
code. When the kernel hits this kprobe, it calls kretprobe's handler and it
returns to the original return address.
Kretprobe-booster removes the trampoline_probe. It allows the trampoline code
to call kretprobe's handler directly instead of invoking kprobe. The
trampoline code returns to the original return address.
(changelog from Chuck Ebbert <76306.1226@compuserve.com> - thanks ;))
Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current kprobe copies the original instruction at the probe point and replaces
it with a breakpoint instruction (int3). When the kernel hits the probe
point, kprobe handler is invoked. And the copied instruction is single-step
executed on the copied buffer (not on the original address) by kprobe. After
that, the kprobe checks registers and modify it (if need) as if the
instructions was executed on the original address.
My proposal is based on the fact there are many instructions which do NOT
require the register modification after the single-step execution. When the
copied instruction is a kind of them, kprobe just jumps back to the next
instruction after single-step execution. If so, why don't we execute those
instructions directly?
With kprobe-booster patch, kprobes will execute a copied instruction directly
and (if need) jump back to original code. This direct execution is executed
when the kprobe don't have both post_handler and break_handler, and the copied
instruction can be executed directly.
I sorted instructions which can be executed directly or not;
- Call instructions are NG(can not be executed directly).
We should correct the return address pushed into top of stack.
- Indirect instructions except for absolute indirect-jumps
are NG. Those instructions changes EIP randomly. We should
check EIP and correct it.
- Instructions that change EIP beyond the range of the
instruction buffer are NG.
- Instructions that change EIP to tail 5 bytes of the
instruction buffer (it is the size of a jump instruction).
We must write a jump instruction which backs to original
kernel code in the instruction buffer.
- Break point instruction is NG. We should not touch EIP and
pass to other handlers.
- Absolute direct/indirect jumps are OK.- Conditional Jumps are NG.
- Halt and software-interruptions are NG. Because it will stay on
the instruction buffer of kprobes.
- Prefixes are NG.
- Unknown/reserved opcode is NG.
- Other 1 byte instructions are OK. But those instructions need a
jump back code.
- 2 bytes instructions are mapped sparsely. So, in this release,
this patch don't boost those instructions.
>From Intel's IA-32 opcode map described in IA-32 Intel Architecture Software
Developer's Manual Vol.2 B, I determined that following opcodes are not
boostable.
- 0FH (2byte escape)
- 70H - 7FH (Jump on condition)
- 9AH (Call) and 9CH (Pushf)
- C0H-C1H (Grp 2: includes reserved opcode)
- C6H-C7H (Grp11: includes reserved opcode)
- CCH-CEH (Software-interrupt)
- D0H-D3H (Grp2: includes reserved opcode)
- D6H (Reserved)
- D8H-DFH (Coprocessor)
- E0H-E3H (loop/conditional jump)
- E8H (Call)
- F0H-F3H (Prefixes and reserved)
- F4H (Halt)
- F6H-F7H (Grp3: includes reserved opcode)
- FEH-FFH(Grp4,5: includes reserved opcode)
Kprobe-booster checks whether target instruction can be boosted (can be
executed directly) at arch_copy_kprobe() function. If the target instruction
can be boosted, it clears "boostable" flag. If not, it sets "boostable" flag
-1. This is disabled status. In resume_execution() function, If "boostable"
flag is cleared, kprobe-booster measures the size of the target instruction
and sets "boostable" flag 1.
In kprobe_handler(), kprobe checks the "boostable" flag. If the flag is 1, it
resets current kprobe and executes instruction buffer directly instead of
single stepping.
When unregistering a boosted kprobe, it calls synchronize_sched()
after "int3" is removed. So we can ensure followings after
the synchronize_sched() called.
- interrupt handlers are finished on all CPUs.
- instruction buffer is not executed on all CPUs.
And we can release the boosted kprobe safely.
And also, on preemptible kernel, the booster is not enabled where the kernel
preemption is enabled. So, there are no preempted threads on the instruction
buffer.
The description of kretprobe-booster:
====================================
In the normal operation, kretprobe make a target function return to trampoline
code. And a kprobe (called trampoline_probe) have been inserted at the
trampoline code. When the kernel hits this kprobe, it calls kretprobe's
handler and it returns to original return address.
Kretprobe-booster patch removes the trampoline_probe. It allows the
trampoline code to call kretprobe's handler directly instead of invoking
kprobe. And tranpoline code returns to original return address.
This new trampoline code stores and restores registers, so the kretprobe
handler is still able to access those registers.
Current kprobe has about 1.3 usec/probe(*) overhead, and kprobe-booster patch
reduces it to 0.6 usec/probe(*). Also current kretprobe has about 2.0
usec/probe(*) overhead. Kprobe-booster patch reduces it to 1.3 usec/probe(*),
and the combination of both kprobe-booster patch and kretprobe-booster patch
reduces it to 0.9 usec/probe(*).
I expect the combination of both patches can reduce half of a probing
overhead.
Performance numbers strongly depend on the processor model.
Andrew Morton wrote:
> These preempt tricks look rather nasty. Can you please describe what the
> problem is, precisely? And how this code avoids it? Perhaps we can find
> something cleaner.
The problem is how to remove the copied instructions of the
kprobe *safely* on the preemptable kernel (CONFIG_PREEMPT=y).
Kprobes basically executes the following actions;
(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)copied instructioin(single step)
(5)kprobe_posthandler()
(6)preempt_enable()
(7)return to the original code
During the execution of copied instruction, preemption is
disabled (from step (2) to (6)).
When unregistering the probes, Kprobe waits for RCU
quiescent state by using synchronize_sched() after removing
int3 instruction.
Thus we can ensure the copied instruction is not executed.
On the other hand, kprobe-booster executes the following actions;
(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)preempt_enable() <-- this one is added by my patch
(5)copied instruction(direct execution)
(6)jmp back to the original code
The problem is that we have no way to prevent preemption on
step (5) or (6). We cannot call preempt_disable() after step (6),
because there are no rooms to do that. Thus, some other
processes may be preempted at step(5) or (6) on preemptable kernel.
And I couldn't find the easy way to ensure that other processes'
stack do *not* have the address of them. (I thought some way
to do that, but those are very costly.)
So currently, I simply boost the kprobe only when the probe
point is already preemption disabled.
> Also, the patch adds a preempt_enable() but I don't see a corresponding
> preempt_disable(). Am I missing something?
It is corresponding to the preempt_disable() in the top of
kprobe_handler().
I copied the code of kprobe_handler() here:
static int __kprobes kprobe_handler(struct pt_regs *regs)
{
struct kprobe *p;
int ret = 0;
kprobe_opcode_t *addr = NULL;
unsigned long *lp;
struct kprobe_ctlblk *kcb;
/*
* We don't want to be preempted for the entire
* duration of kprobe processing
*/
preempt_disable(); <-- HERE
kcb = get_kprobe_ctlblk();
Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up kprobe's resume_execute() for i386 arch.
Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Coverity found an over-run @ line 364 of efi.c
This is due to the loop checking the size correctly, then adding a '\0'
after possibly hitting the end of the array.
Ensure the loop exits with one space left in the array.
Signed-off-by: Darren Jenkins <darrenrjenkins@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here's a patch that fixes EFI boot for x86 on 2.6.16-rc5-mm3. The
off-by-one is admittedly my fault, but the other two fix up the rest.
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Almost all users of the table addresses from the EFI system table want
physical addresses. So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.
This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.
This is essentially the same as an earlier patch by Matt Tolentino:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.
Matt's original patch was backed out because it caused MCAs on HP sx1000
systems. That problem is resolved by the ioremap() attribute checking added
for ia64.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dmi_scan_machine() tries to ioremap 0x10000 (64K) bytes, even though it only
looks at the first 32 bytes or so. If the SMBIOS table is near the end of a
memory region, the ioremap() may fail when it shouldn't.
This is in the efi_enabled path, so it really only affects ia64 at the moment.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check the EFI memory map so we can use the correct memory attributes for
ioremap(). Previously, we always used uncacheable access, which blows up on
some machines for regular system memory.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the size, not a pointer to the size, to efi_mem_attribute_range().
This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity. Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable DMI table parsing on ia64.
Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64. dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.
This patch adds equivalent changes for ia64 as to what is in the x86_64
tree. In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000. On non-EFI systems, it continues the
brute-force search.
My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table. Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently /proc/iomem exports physical memory also apart from io device
memory. But on i386, it truncates any memory more than 4GB. This leads to
problems for kexec/kdump.
Kexec reads /proc/iomem to determine the system memory layout and prepares a
memory map based on that and passes it to the kernel being kexeced. Given the
fact that memory more than 4GB has been truncated, new kernel never gets to
see and use that memory.
Kdump also reads /proc/iomem to determine the physical memory layout of the
system and encodes this informaiton in ELF headers. After a crash new kernel
parses these ELF headers being used by previous kernel and vmcore is prepared
accordingly. As memory more than 4GB has been truncated, kdump never sees
that memory and never prepares ELF headers for it. Hence vmcore is truncated
and limited to 4GB even if there is more physical memory in the system.
This patch exports memory more than 4GB through /proc/iomem on i386.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the trap number causing the call to notify_die() to the die
notification handler chain in a number of instances. Also, honor the
return value from the handler chain invocation in die() as, through a
debugger, the fault may have been fixed.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-By: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a "make isoimage" to i386 and x86-64, which allows the automatic
creation of a bootable CD image. It also adds an option FDINITRD= to
include an initrd of the user's choice in generated floppy- or CD boot
images. Finally, some minor cleanups of the image generation code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3030/2: fix permission check in the obscur cmpxchg syscall
[ARM] nommu: rename compressed/head.S symbols to a new style
[ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG
[ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h
[ARM] Move read of processor ID out of lookup_processor_type()
[ARM] Fix typo in tlbflush.h
[ARM] noMMU: removes TLB codes in nommu mode
[ARM] noMMU: block sys_fork in nommu mode
[ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled
[ARM] 3398/1: Fix the VFP registers loading/storing base address
[ARM] 3397/1: AT91RM9200 Header update
[ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie)
[ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu
include/linux/clk.h is betraying its ARM origins
[ARM] Move enable_irq and disable_irq to assembler.h
[ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1
Patch from Lennert Buytenhek
Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial
ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12].
(The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
Quoting RMK:
|pte_write() just says that the page _may_ be writable. It doesn't say
|that the MMU is programmed to allow writes. If pte_dirty() doesn't
|return true, that means that the page is _not_ writable from userspace.
|If you write to it from kernel mode (without using put_user) you'll
|bypass the MMU read-only protection and may end up writing to a page
|owned by two separate processes.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Malcolm Parsons
Printking a backtrace requires printk, so disable backtrace code
when printk is disabled.
Without this patch, a kernel with CONFIG_PRINTK disabled does not link:
arch/arm/lib/lib.a(backtrace.o): In function `c_backtrace':
arch/arm/lib/backtrace.S:(.text+0x108): undefined reference to `printk'
arch/arm/lib/backtrace.S:(.text+0x11c): undefined reference to `printk'
arch/arm/lib/lib.a(backtrace.o):(.fixup+0x8): undefined reference to `printk'
Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
The current VFP code corrupts the VFP registers (including the control
ones) if more than one floating point application is executed at the same
time. This patch fixes the updating of the load/store base addresses for
the VFP registers.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Pavel Machek
This adds support for battery reading on collie. Collie slowly charges
battery even with charging disabled, so I did not yet enable fast
charge.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The recent addition of boot_cpu_init() implements the initialisation
of the online, present and possible cpu maps for the boot CPU, so
there is no reason to duplicate this in the architecture
smp_prepare_boot_cpu() hook.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
tcsh is not happy with the -9999 error code.
Suggested by Ernie Petrides
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I got an oops on a dual core system because the lost tick handler
called cpufreq_get() on core 1 and powernow tried to follow
a NULL powernow_data[] pointer there.
Initialize powernow_data for all cores of a CPU.
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to restrict to power of two here.
TBD needs more double checking
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pfn_to_page() and others need to access both memnode_shift and the very
first bytes of memnodemap[]. If we force memnode_shift to be just before the
memnodemap array, we can reduce the memory footprint to one cache line
instead of two for most setups. This patch introduce a 'memnode' structure
where shift and map[] are carefully placed.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
register_die_notifier is exported twice, once in traps.c and once in
x8664_ksyms.c. This results in a warning on build.
Signed-off-by: Kevin Winchester <kwin@ns.sympatico.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/x86_64/kernel/aperture.c: The search for the AGP bridge has been
extended to search for all the 256 buses instead of the first 32. This
is required since on a some systems, the bridge may be located on a bus
much farther than the first 32. By searching all 256 buses, we guarantee
that the search succeeds on such systems.
arch/x86_64/kernel/pci-gart.c: The search for the Northbridge is not
limited to just bus 0 anymore. This is required because on certain
systems, we may not find one on bus 0.
Signed-off-by: Navin Boppuri <navin.boppuri@newisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have the GART_IOMMU help text specify that this is the hardware IOMMU in
amd64 processors. This will be significant if/when other IOMMUs are
added to the x86-64 architecture. :-)
Also, note that the previous help text stated that IOMMU was needed for
>3GB memory instead of >4GB. This is fixed in the newer version.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It was a failed experiment - all benchmarks done with it on both AMD
and Intel showed it was a loss. That was probably because the store
buffers of the CPUs for write combining traffic weren't large enough.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_bootmem_node expects a physical address to be passed in, but
__alloc_bootmem_node returns a virtual one. That address needs to be
translated to physical.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o check_timer() routine fails while second kernel is booting after a crash
on an opetron box. Problem happens because timer vector (0x31) seems to be
locked.
o After a system crash, it is not safe to service interrupts any more, hence
interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
sends these interrupts to the CPU during early boot of second kernel. Other
pending interrupts are discarded saying unexpected trap but timer interrupt
is serviced and CPU does not issue an LAPIC EOI because it think this
interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
locking as LAPIC does not clear respective ISR and keeps on waiting for
EOI.
o This patch issues extra EOI for the pending interrupts who have ISR set.
o Though today only timer seems to be the special case because in early
boot it thinks interrupts are coming from i8259 and uses
mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
probably doing it in generic manner for all vectors makes sense.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpu_vm_mask is of type cpumask_t, so use the proper bitops.
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This means i386 processes compiled with a recent compiler will get non
executable heap by default now. This is the same default as a 32bit PAE
kernel would use on a NX enabled CPU.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because 256 causes overflows in some code that stores them in 8 bit
fields and the x86 APIC architecture cannot handle more than 255
anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When x86_64 timer init messages were changed to use apic verbosity
levels, two messages were missed and one got the wrong level. This
causes the last word of a suppressed message to print on a line by
itself. Fix that so either the entire message prints or none of it
does.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only log data in microcode driver when something is changed Otherwise it
was far too noisy on large systems.
Also remove the printk when it is unloaded.
Cc: tigran@veritas.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch puts the infrastructure in place to allow for a reordering of
functions based inside the vmlinux. The general idea is that it is possible
to put all "common" functions into the first 2Mb of the code, so that they
are covered by one TLB entry. This as opposed to the current situation where
a typical vmlinux covers about 3.5Mb (on x86-64) and thus 2 TLB entries.
This is done by enabling the -ffunction-sections flag in gcc, which puts
each function in its own ELF section, so that the linker can then order them
in a way defined by the linker script.
As per previous discussions, Linus said he wanted a "static" list for this,
eg a list provided by the kernel tarbal, so that most people have the same
ordering at least. A script is provided to create this list based on
readprofile(1) output. The included list is provisional, and entirely biased
on my own testbox and me running a few kernel compiles and some other
things.
I think that to get to a better list we need to invite people to submit
their own profiles, and somehow add those all up and base the final list on
that. I'm willing to do that effort if this is ends up being the prefered
approach. Such an effort probably needs to be repeated like once a year or
so to adopt to the changing nature of the kernel.
Made it a CONFIG with default n because it increases link times
dramatically.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I tested it on a couple of chipsets and it worked everywhere so it
should be ok as default for now.
So far I haven't done the great purge of the useless old check_timer
code yet though.
Can be overwritten with enable_8254_timer in the worst case
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a fallback logic, so it's better to not use the OOM killer
in the allocations.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ACPIv2 has an official but optional way to get a date >2100. Use it.
But all the platforms I tested didn't seem to support it. But anyways
the x86-64 kernel should be ready for the 22nd century now. Actually i
shouldn't care about this because I will be dead by then @)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch puts the code from head.S in a special .bootstrap.text
section.
I'm working on a patch to reorder the functions in the kernel (I'll post
that later), but for x86-64 at least the kernel bootstrap requires that
the head.S functions are on the very first page/pages of the kernel
text. This is understandable since the bootstrap is complex enough
already and not a problem at all, it just means they aren't allowed to
be reordered. This patch puts these special functions into a separate
section to document this, and to guarantee this in the light of possibly
reordering the rest later.
(So this patch doesn't fix a bug per se, but makes things more robust by
making the order of these functions explicit)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are more and more cases where we need to know DMI information
early to work around bugs. i386 already had early DMI scanning, but
x86-64 didn't. Implement this now.
This required some cleanup in the i386 code.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move the core parser into dmi_scan.c. It can be useful for other
subsystems too.
- Differentiate between field doesn't exist and field is 0 or
unparseable. The first case is likely an old BIOS with broken ACPI,
the later is likely a slightly buggy BIOS where someone forget to
edit the date. Don't blacklist in the later case.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As suggested by Andi (and Alan), move the default kernel location
from 1Mb to 2Mb, to align to the start of a TLB entry.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In a micro-benchmark that stresses the pagefault path, the down_read_trylock
on the mmap_sem showed up quite high on the profile. Turns out this lock is
bouncing between cpus quite a bit and thus is cache-cold a lot. This patch
prefetches the lock (for write) as early as possible (and before some other
somewhat expensive operations). With this patch, the down_read_trylock
basically fell out of the top of profile.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
phys_proc_id[] on AMD boxes is right now populated with the initial
apic id, obtained by the cpuid instruction. But, the initial apic id
need not be the local apic id on clustered APIC systems (see comment at
x86_64/kernel/genapic_cluster.c, line 110). On vSMPowered with AMD
CPUs the cpu_to_node will turn out to be incorrect (as apicid_to_node[] is
indexed by the initial apic id rather than the local apic id).
On vSMPowered boxes with Intel CPUs this is working correctly as
phys_proc_id[] is initialized correctly in detect_ht().
This fixes AMD boot path according to specification, to use the correct
routines for local apic id and socket ids. We use
hard_smp_processor_id() to read the local apic id, and phys_pkg_id() to
determine socket id for phys_proc_id[]
Patch tested on Tyan multicore boxes as well as vSMPowered boxes.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- adjust limits of GDT/IDT pseudo-descriptors (some were off by one)
- move empty_zero_page into .bss.page_aligned
- move cpu_gdt_table into .data.page_aligned
- move idt_table into .bss
- align inital_code and init_rsp
- eliminate pointless (re-)declaration of idt_table in traps.c
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It needs num_physpages, so initialize it early. It's later overwritten
again.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached is a small code style cleanup patch that resulted from my
skimming through the arch/x86_64/kernel/traps.c code to figure out what
went haywire.
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
gcc should handle this anyways, and it causes problems when
sprintf is turned into strcpy by gcc behind our backs and
the C fallback version of strcpy is actually defining __builtin_strcpy
Then drop -ffreestanding from the main Makefile because it isn't
needed anymore and implies -fno-builtin, which is wrong now.
(it was only added for x86-64, so dropping it should be safe)
Noticed by Roman Zippel
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We've always had the problem that arguments only did a prefix match,
which resulted e.g. in noapic and noapictimer getting confused.
Fix the early argument parsing code to always check that arguments are
whole words (except for those that take additional arguments of course)
I factored out the checking code for that while also makes the code
easier to maintain.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While the modular aspect of the respective i386 patch doesn't apply to
x86-64 (as the top level page directory entry is shared between modules
and the base kernel), handlers registered with register_die_notifier()
are still under similar constraints for touching ioremap()ed or
vmalloc()ed memory. The likelihood of this problem becoming visible is
of course significantly lower, as the assigned virtual addresses would
have to cross a 2**39 byte boundary. This is because the callback gets
invoked
(a) in the page fault path before the top level page table propagation
gets carried out (hence a fault to propagate the top level page table
entry/entries mapping to module's code/data would nest infinitly) and
(b) in the NMI path, where nested faults must absolutely not happen,
since otherwise the IRET from the nested fault re-enables NMIs,
potentially resulting in nested NMI occurences.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The code waits for the GART to clear the TLB flush bit. Use cpu_relax
in this time to allow hypervisors to yield the CPU in this time.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The PM timer path through main_timer_handler doesn't need
the delay variable because it figures it out in a different way.
Don't try to read it from the PIT. With stopped PIT timer
it is even useless.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor cleanup to lend better for physical CPU hotplug.
Earlier way of using num_processors as index doesnt
fit if CPUs come and go. This makes the code little bit better
to read, and helps physical hotplug use the same functions as boot.
Reserving CPU0 for BSP is too late to be done in smp_prepare_boot_cpu().
Since logical assignments from MADT is already done via
setup_arch()->acpi_boot_init()->parse lapic
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Touching of the floating point state in a kernel debugger must be
NMI-safe, specifically math_state_restore() must be able to deal with
being called out of an NMI context. In order to do that reliably, the
context switch code must take care to not leave a window open where
the current task's TS_USEDFPU flag and CR0.TS could get out of sync.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For consistency and to have only a single place of definition, replace
set_debug() uses with set_debugreg(), and eliminate the definition of
thj former.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While AMD formally permits multi-byte execution breakpoints, Intel
disallows 8-byte as much as 2- or 4-byte ones.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix one place where the previous change of cpu_pda from being an array
to being a macro was not properly carried out.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It conflicts with the struct node in node.h
Actually the x86-64 version was there first, but ..
Suggested by Jan Beulich
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The upcomming gcc 4.2 got a new option -mtune=generic to tune
code for both common AMD and Intel CPUs. Use this option
when available for generic kernels.
On x86-64 it is used with CONFIG_GENERIC_CPU. On i386 it is
enabled with CONFIG_X86_GENERIC. It won't affect the base
line CPU support in any ways and also not the minimum supported CPU.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory >39bits has a different PUD.
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] New IA64 core/thread detection patch
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Tollhouse HP: IA64 arch changes
[IA64] cleanup dig_irq_init
[IA64] MCA recovery: kernel context recovery table
IA64: Use early_parm to handle mvec_name and nomca
[IA64] move patchlist and machvec into init section
[IA64] add init declaration - nolwsys
[IA64] add init declaration - gate page functions
[IA64] add init declaration to memory initialization functions
[IA64] add init declaration to cpu initialization functions
[IA64] add __init declaration to mca functions
[IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT
[IA64] sn_check_intr: use ia64_get_irr()
[IA64] fix ia64 is_hugepage_only_range
* master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild: (46 commits)
kbuild: remove obsoleted scripts/reference_* files
kbuild: fix make help & make *pkg
kconfig: fix time ordering of writes to .kconfig.d and include/linux/autoconf.h
Kconfig: remove the CONFIG_CC_ALIGN_* options
kbuild: add -fverbose-asm to i386 Makefile
kbuild: clean-up genksyms
kbuild: Lindent genksyms.c
kbuild: fix genksyms build error
kbuild: in makefile.txt note that Makefile is preferred name for kbuild files
kbuild: replace PHONY with FORCE
kbuild: Fix bug in crc symbol generating of kernel and modules
kbuild: change kbuild to not rely on incorrect GNU make behavior
kbuild: when warning symbols exported twice now tell user this is the problem
kbuild: fix make dir/file.xx when asm symlink is missing
kbuild: in the section mismatch check try harder to find symbols
kbuild: fix section mismatch check for unwind on IA64
kbuild: kill false positives from section mismatch warnings for powerpc
kbuild: kill trailing whitespace in modpost & friends
kbuild: small update of allnoconfig description
kbuild: make namespace.pl CROSS_COMPILE happy
...
Trivial conflict in arch/ppc/boot/Makefile manually fixed up
Fix bug identified by Linus Torvalds <torvalds@osdl.org>: the `out'
instruction depends upon the state of memory_data[], so we need to tell gcc
that before executing it. (The opcode, not gcc).
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=5553
Thanks to Antonio Ospite <ospite@studenti.unina.it> for testing.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Moved check for online cpu out of smp_prepare_cpu()
- Moved default declaration of smp_prepare_cpu() to kernel/cpu.c
- Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since
its called from cpu_up() as well now.
- Removed clearing from cpu_present_map during cpu_offline as it breaks
using cpu_up() directly during a subsequent online operation.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds missing bits of collie (sharp sl-5500) PCMCIA support and
MFD support.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Kconfig text for CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC have always
seemed a bit confusing. Change them to:
CONFIG_DEBUG_SLAB: "Debug slab memory allocations"
CONFIG_DEBUG_PAGEALLOC: "Debug page memory allocations"
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some BIOSes do not always set CF on error before return from int13. The
patch adds additional check for status being zero (AH == 0).
Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MODULE_PARM was actually breaking: recent gcc version optimize them out as
unused. It's time to replace the last users, which are generally in the
most unloved drivers anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
DEBUG_PAGEALLOC is not compatible with hugetlb page support. That debug
option turns off PSE. Once it is turned off in CR4, the cpu will ignore
pse bit in the pmd and causing infinite page-not- present faults.
So disable DEBUG_PAGEALLOC if the user selected hugetlbfs.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make CONFIG_HOTPLUG_CPU depend on !X86_PC, so we need to turn on either
CONFIG_GENERICARCH, CONFIG_BIGSMP or any other subarch except X86_PC when
CONFIG_HOTPLUG_CPU=y
With 2.6.15+ kernels when CONFIG_HOTPLUG_CPU is turned on we switch to
bigsmp mode for sending IPI's and ioapic configurations that caused the
following error message.
>> More than 8 CPUs detected and CONFIG_X86_PC cannot handle it.
>> Use CONFIG_X86_GENERICARCH or CONFIG_X86_BIGSMP.
Originally bigsmp was added just to handle >8 cpus, but now with hotplug
cpu support we need to use bigsmp mode (why? see below), that cause the
above error message even if there were less than 8 cpus in the system.
The message is bogus, but we are cannot use logical flat mode due to issues
with broadcast IPI can confuse a CPU just comming up. We use flat physical
mode just like x86_64 case. More details on why bigsmp now uses flat
physical mode (vs. cluster mode) in following link.
http://marc.theaimsgroup.com/?l=linux-kernel&m=113261865814107&w=2
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
alarm() calls the kernel with an unsigend int timeout in seconds. The
value is stored in the tv_sec field of a struct timeval to setup the
itimer. The tv_sec field of struct timeval is of type long, which causes
the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.
Before the hrtimer merge (pre 2.6.16) such a negative value was converted
to the maximum jiffies timeout by the timeval_to_jiffies conversion. It's
not clear whether this was intended or just happened to be done by the
timeval_to_jiffies code.
hrtimers expect a timeval in canonical form and treat a negative timeout as
already expired. This breaks the legitimate usage of alarm() with a
timeout value > INT_MAX seconds.
For 32 bit machines it is therefor necessary to limit the internal seconds
value to avoid API breakage. Instead of doing this in all implementations
of sys_alarm the duplicated sys_alarm code is moved into a common function
in itimer.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached patch fixes invalid pointer arithmetic in DMI code to make onboard
device discovery working again.
akpm: bug has been present since dmi_find_device() was added in 2.6.14.
Affects ipmi only (I think) - the symptoms weren't described.
akpm: changed to use pointer arithmetic rather than open-coded sizeof.
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Cc: Corey Minyard <minyard@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add
proc_number=-1 to get core/thread mapping info on the running processer.
Based on this change, we had better to update existing core/thread
detection in IA64 kernel correspondingly. The attached patch implements
this change. It simplifies detection code and eliminates potential race
condition. It also runs a bit faster and has better scalability especially
when cores and threads number grows up in one package.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Node number are kept in the cpu_to_node_map which is
currently defined as u8. Change to u16 to accomodate
larger node numbers.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support in IA64 acpi for platforms that support more than
256 nodes. Currently, ACPI is limited to 256 nodes because the
proximity domain number is 8-bits.
Long term, we expect to use ACPI3.0 to support >256 nodes.
This patch is an interim solution that works with platforms
that pass the high order bits of the proximity domain in
"reserved" fields of the ACPI tables. This code is enabled
ONLY on SN platforms.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a configuration option to allow the maximum
number of nodes to be configurable for GENERIC or SN
kernels.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/sn and include/asm-ia64/sn changes required to support Tollhouse
system PCI hotplug, fixes the ia64_sn_sysctl_ioboard_get call, and introduces
the PRF_HOTPLUG_SUPPORT feature bit.
Signed-off-by: Prarit Bhargava <prarit@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
dig_irq_init is equivalent to machvec_noop, no need to define
another empty function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Memory errors encountered by user applications may surface
when the CPU is running in kernel context. The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilage mode 0). This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.
An example is a user process lauching a load from memory
and the data in memory had bad ECC. Before the bad data
gets to the CPU register, and interrupt comes in. The
code jumps to the IVT interrupt entry point and begins
execution in kernel context. The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA. The MCA surfaces in
kernel context, even though the load was initiated from
user context.
As suggested by David and Tony, this patch uses an exception
table like approach, puting the tagged recovery addresses in
a searchable table. One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registers. A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro being the ending
address. This limits clutter in the code.
This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers. Other spots
can be added as needed by adding a single macro.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
We need set_page_dirty() to return true if it actually transitioned the page
from a clean to dirty state. This wasn't right in a couple of places. Do a
kernel-wide audit, fix things up.
This leaves open the possibility of returning a negative errno from
set_page_dirty() sometime in the future. But we don't do that at present.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all trailing tabs and spaces. No other changes.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert all kmalloc + memset sequences in arch/s390 to kzalloc usage.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Include connector config in the s390 arch Kconfig to get support for
connectors.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Retry starting of new cpu if sigp restart returns condition code 2 (busy).
Signed-off-by: Michael Ryan <ryan@funsoft.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set permissoin of /proc/sys/vm/cmm_* files to 0644.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use common code parser for early parameters instead of our own.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While working on these patch set, I found several possible cleanup on x86-64
and ia64.
akpm: I stole this from Andi's queue.
Not only does it clean up bitops. It also unrelatedly changes the prototype
of pci_mmcfg_init() and removes its arch_initcall(). It seems that the wrong
two patches got joined together, but this is the one which has been tested.
This patch fixes the current x86_64 build error (the pci_mmcfg_init()
declaration in arch/i386/pci/pci.h disagrees with the definition in
arch/x86_64/pci/mmconfig.c)
This also means that x86_64's pci_mmcfg_init() gets called in the same (new)
manner as x86's: from arch/i386/pci/init.c:pci_access_init(), rather than via
initcall.
The bitops cleanups came along for free.
All this worked OK in -mm testing (since 2.6.16-rc4-mm1) because x86_64 was
tested with both patches applied.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I screwed up this conversion - we should be iterating across online CPUs, not
possible ones.
Spotted by Joe Perches <joe@perches.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames symbols to a new style to prepare mpu support
code merging. e.g. __armv4_cache_on --> __armv4_mmu_cache_on
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Todo items:
- IRQ_INPROGRESS flag - use sparc64 irq buckets, or generic irq_desc?
- sun4d
- re-indent large chunks of sun4m_smp.c
- some places assume sequential cpu numbering (i.e. 0,1 instead of 0,2)
Last I checked (with 2.6.14), random programs segfault with dual
HyperSPARC. And with SuperSPARC II's, it seems stable but will
eventually die from a write lock error (wrong lock owner or something).
I haven't tried the HyperSPARC + highmem combination recently, so that
may still be a problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
"In some cases, especially on modern laptops with a lot of PCI and cardbus
bridges, we're unable to assign correct secondary/subordinate bus numbers
to all cardbus bridges due to BIOS limitations unless we are using
"pci=assign-busses" boot option." -- Ivan Kokshaysky (from a patch comment)
Without it, Cardbus cards inserted are never seen by PCI because the parent
PCI-PCI Bridge of the Cardbus bridge will not pass and translate Type 1 PCI
configuration cycles correctly and the system will fail to find and
initialise the PCI devices in the system.
Reference: PCI-PCI Bridges: PCI Configuration Cycles and PCI Bus Numbering:
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node72.html
The reason for this is that:
``All PCI busses located behind a PCI-PCI bridge must reside between the
secondary bus number and the subordinate bus number (inclusive).''
"pci=assign-busses" makes pcibios_assign_all_busses return 1 and this
turns on PCI renumbering during PCI probing.
Alan suggested to use DMI automatically set assign-busses on problem systems.
The only question for me was where to put it. I put it directly before
scanning PCI bus into pcibios_scan_root() because it's called from legacy,
acpi and numa and so it can be one place for all systems and configurations
which may need it.
AMD64 Laptops are also affected and fixed by assign-busses, and the code is
also incuded from arch/x86_64/pci/ that place will also work for x86_64
kernels, I only ifdef'-ed the x86-only Laptop in this example.
Affected and known or assumed to be fixed with it are (found by googling):
* ASUS Z71V and L3s
* Samsung X20
* Compaq R3140us and all Compaq R3000 series laptops with TI1620 Controller,
also Compaq R4000 series (from a kernel.org bugreport)
* HP zv5000z (AMD64 3700+, known that fixup_parent_subordinate_busnr fixes it)
* HP zv5200z
* IBM ThinkPad 240
* An IBM ThinkPad (1.8 GHz Pentium M) debugged by Pavel Machek
gives the correspondig message which detects the possible problem.
* MSI S260 / Medion SIM 2100 MD 95600
The patch also expands the "try pci=assign-busses" warning so testers will
help us to update the DMI table.
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On Tue, 21 Feb 2006, Ivan Kokshaysky wrote:
> There are two bogus entries in the BIOS memory map table which are
> conflicting with a prefetchable memory range of the AGP bridge:
>
> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>
> 0000:00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP) (prog-if 00 [Normal decode])
> Flags: bus master, fast devsel, latency 0
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: 0000c000-0000cfff
> Memory behind bridge: e7e00000-e7efffff
> Prefetchable memory behind bridge: fec00000-ffcfffff
> ^^^^^^^^^^^^^^^^^
Yes. However, it's pretty clear that the e820 entries are there for a
reason. Probably they are a hack by the BIOS maintainers to keep Windows
from stomping/moving that region, exactly because they want to keep the
bridge where it is (or, it's actually for the BIOS itself - the BIOS
tables are a horrid mess, and BIOS engineers are pretty hacky people:
they'll add random entries to make their own broken algorithms do the
"right thing").
> Starting from 2.6.13, kernel tries to resolve that sort of conflicts,
> so that prefetch window of the bridge and the framebuffer memory behind
> it get moved to 0x10000000.
I think we could (and probably should) solve this another way: consider
the ACPI "reserved regions" from the e820 map exactly the same way that we
do other ACPI hints - they should restrict _new_ allocations, but not
impact stuff we figure out on our own.
Basically, right now we assign _unassigned_ resources at "fs_initcall"
time. If we were to add in the e820 "reserved region" stuff before that
(but after we've done PCI discovery), we'd probably do the right thing.
Right now we do the e820 reserved regions very early indeed: we call
"register_memory()" from setup_arch(). We could move at least part of it
(the part that registers the resources) down a bit.
Here's a test-patch. I'm not saying we should absolutely do this, but it
might be interesting to try...
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: <bjk@luxsci.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I moved it to a separate function which is safer.
This avoids problems with the linker reordering them and the
less useful PCI config space access methods taking priority
over the better ones.
Fixes some problems with broken MMCONFIG
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I'm not sure of the worthiness of this idea, so please consider it an RFC.
Its key merits are:
* Reuse existing infrastructure
* Greatly tightens up the parsing of nomca
* Greatly simplifies the parsing of machvec
Addition cleanup (moving setup_mvec() to machvec.c) by Ken Chen.
Signed-Off-By: Horms <horms@verge.net.au>
Signed-Off-By: Tony Luck <tony.luck@intel.com>
5d25ac038a broke VFP builds due to
enable_irq not being defined as an assembly macro. Move it to
assembler.h so everyone can use it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes all occurances of _INLINE_ in the kernel.
With the exception of tty_flip.h, I've simply removed the inline's since
gcc should know best which functions to be inlined.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate all kernel bug printouts to begin with the "BUG: " string.
Makes it easier to find them in large bootup logs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the standard BCD macros instead of redefining them.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch from Pavel moves userland freeze signals handling into more logical
place. It now hits even with mysqld running.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't create "online" control file for BSP (i386/x86_64) since its
not removable.
We originally added this to support ppc64 if the kernel has support but
BIOS indicated no offline support, we just didnt create online files for
them.
We used the same method in ia64 as well, if we have a cpu taking platform
interrupts but cannot be removed if those interrupts cannot be re-targeted
to another cpu.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
You must always ensure to fulfill the dependencies of what you are
select'ing.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Checking APIC version instead of CPU family to determine XAPIC. Family 6
CPU could have xapic as well.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 has a small bug in the stack dump code where it prints an extra log
level code. Remove that and fix the alignment of normal stack dump
printout. Also remove some unnecessary printk() calls.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With cpu_gdt_descr having been converted to per-CPU data, the old object
(in head.S) no longer needs to reserve space for each CPU's instance. With
cpu_gdt_table not being used for CPU 0 anymore, it doesn't seem to need
page alignment (or if in fact there is a need for it to retain that
alignment, the whole object should go into .data.page_align).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Document a limitation of vsyscall-sysenter, since patches to fix it have
been rejected.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>commit 76381fee7e
>Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
>Date: Thu Jun 23 00:08:46 2005 -0700
>
> [PATCH] xen: x86_64: use more usermode macro
>
> Make use of the user_mode macro where it's possible. This is useful for Xen
> because it will need only to redefine only the macro to a hypervisor call.
I am of the opinion that the above changeset is incomplete, i.e. it missed
converting some previous uses of user_mode to user_mode_vm. While most of
them could be considered just cosmetical, at least the one in die_nmi
doesn't appear to be.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Registering a callback handler through register_die_notifier() is obviously
primarily intended for use by modules. However, the way these currently
get called it is basically impossible for them to actually be used by
modules, as there is, on non-PAE configurationes, a good chance (the larger
the module, the better) for the system to crash as a result.
This is because the callback gets invoked
(a) in the page fault path before the top level page table propagation
gets carried out (hence a fault to propagate the top level page table
entry/entries mapping to module's code/data would nest infinitly) and
(b) in the NMI path, where nested faults must absolutely not happen,
since otherwise the IRET from the nested fault re-enables NMIs,
potentially resulting in nested NMI occurences.
Besides the modular aspect, similar problems would even arise for in-
kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
memory inside their handlers.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use boot info to start early_printk() at the current row on VGA console, as
left by the boot loader.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Stas Sergeev <stsp@aknet.ru>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The history is that -mm kernels do not work for me for a few months
already. The things started from crashing somewhere after starting init,
and for the last month - no boot at all, just "Uncompressing... OK,
booting kernel", and silence. Early console didn't work too. With the
latest releases this degraded into an infinite stream of the "Unknown
interrupt or fault" messages. So today my patience ran out and I started
to think how can I collect at least some info for the bug-report. Attached
is the patch that allows to gather some valueable debug info on the problem
by making an early console more useable. I can't properly test the patch,
as the kernel still doesn't boot, so I'll explain it in details in a hope
someone else can justify the intrusive changes.
arch_hooks.h: added prototypes for setup_early_printk() and early_printk().
setup.c: killed wrong setup_early_printk() prototype. Moved
setup_early_printk() a bit earlier, as it was not "early enough" to cover
the bug I was fighting with.
early_printk.c: made it to start printing from the bottom of the screen,
otherwise the messages interfere with the ones of the boot-loader, so you
can't read them.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: Andi Kleen <ak@muc.de>
Cc: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow signal handlers to set the RF bit in EFLAGS. This lets a simple
debugger using SIGTRAP skip one instruction after returning from a signal.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no good reason for allowing ptrace to set the NT bit in EFLAGS, so
mask it off.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Merge a few printk calls in i386 traps.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ES7000 platform code clean up for compilation errors and a warning.
Ifdef'd the ACPI related parts in the ES7000 platform code. They were
causing compile errors in certain configuration (without ACPI defined). I
think this approach would be best (as opposed to Kconfig changes) since it
only touches the subarch...
Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When vendor-specific i386 initialization code is unavailable the kernel
falls back to a default CPU model name. Make that model name reflect the
CPU family instead of an internal vendor index.
Tested on Pentium II (family 6 model 5).
/proc/cpuinfo before:
model name : ff/05
after:
model name : 06/05
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow the x86 "sep" feature to be disabled at bootup. This forces use of the
int80 vsyscall. Mainly for testing or benchmarking the int80 vsyscall code.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several places in arch/i386/kernel/cpu and kernel/cpu were using __devinit
when they should have been __cpuinit. Fixing that saves ~4K when
CONFIG_HOTPLUG && !CONFIG_HOTPLUG_CPU.
Noticed by Andrew Morton.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement SMP alternatives, i.e. switching at runtime between different
code versions for UP and SMP. The code can patch both SMP->UP and UP->SMP.
The UP->SMP case is useful for CPU hotplug.
With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and
when the number of CPUs goes down to 1, and switches to SMP when the number
of CPUs goes up to 2.
Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is
patched once at boot time (if needed) and the tables are released
afterwards.
The changes in detail:
* The current alternatives bits are moved to a separate file,
the SMP alternatives code is added there.
* The patch adds some new elf sections to the kernel:
.smp_altinstructions
like .altinstructions, also contains a list
of alt_instr structs.
.smp_altinstr_replacement
like .altinstr_replacement, but also has some space to
save original instruction before replaving it.
.smp_locks
list of pointers to lock prefixes which can be nop'ed
out on UP.
The first two are used to replace more complex instruction
sequences such as spinlocks and semaphores. It would be possible
to deal with the lock prefixes with that as well, but by handling
them as special case the table sizes become much smaller.
* The sections are page-aligned and padded up to page size, so they
can be free if they are not needed.
* Splitted the code to release init pages to a separate function and
use it to release the elf sections if they are unused.
Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Print stack backtraces in multiple columns, saving screen space. Number of
columns is configurable and defaults to one so behavior is
backwards-compatible.
Also removes the brackets around addresses when printing more
that one entry per line so they print as:
<address>
instead of:
[<address>]
This helps multiple entries fit better on one line.
Original idea by Dave Jones, taken from x86_64.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make CONFIG_REGPARM enabled by default. It's a noticable win both for size
and for performance, and gcc[34] handles it correctly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
REGPARM has already gotten much testing, what about removing the
dependency on EXPERIMENTAL?
Additionally, this patch does:
- remove the useless "default n"
- remove note regarding binary only modules (nowadays, there are even
some binary only modules compiled with REGPARM=y available)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Merge clash will have broken sparc64. Synch up its online_page
implementation with powerpc, which was identical until the
set_page_count removal.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Lennert Buytenhek
This patch changes iop3xx and omap2 and to use PLAT8250_DEV_PLATFORM{,1}
as platform device id instead of just hardcoding 0/1 directly.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the assumption that driver_register() returns the number of devices
bound to the driver. In fact, it returns zero for success or a negative
error value.
Nobody uses the return value of of_register_driver() anyway.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch/powerpc/mm/mem.c: In function `add_memory':
arch/powerpc/mm/mem.c:128: warning: assignment makes integer from pointer without a cast
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ia64_mv is initialized based on platform detected or specified.
However, there is one instantiation of each platform type. We
don't expect to switch platform vector during run time. Move
those platform specific type into init section since a copy is
made into global ia64_mv at initialization.
Also move instruction patch list into init section as well.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to bunch of patch functions and gate
page setup function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to variables/functions used for memory
initialization. I don't think they would clash with memory
hotplug. If they do, please yell.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to cpu initialization functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Mark init related variable and functions with appropriate
__init* declaration to mca functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to the ACPI spec, the OSPM must ignore the contents of the
Processor Local APIC/SAPIC Affinity Structure in System Resource
Affinity Table (SRAT), if its enable flag is cleared. However, ia64
linux refers all of the Processor Local APIC/SAPIC Affinity Structures
in SRAT regardless of the enable flag. This is obviously against the
ACPI spec. This patch fixes this bug.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use the recently-added ia64_get_irr() rather than duplicating the code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
fix is_hugepage_only_range() definition to be "overlaps"
instead of "within architectural restricted hugetlb address
range". Simplify the ia64 specific code that used to use
is_hugepage_only_range() to just check which region the
address is in.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch from Erik Hovland
I found a typo and what seems to be a run-on sentence in
arch/arm/common/dmabounce.c
This patch corrects both.
Signed-off-by: Erik Hovland <erik@hovland.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Andrew Victor
This patch includes a few changes to the clock support on the
AT91RM9200.
1. Added definitions for Ethernet, MMC, TWI, USARTs, and SPI peripheral
clocks.
2. Replaced some hard-coded hex values with the text definitions in
at91rm9200_sys.h.
3. If the USB96M bit is set for PLLB, then the rate of PLLB is not
affected but only the USB Host/Device clocks which are derived from it.
Issue reported by Sergei Sharonov.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Andrew Victor
If the timer interrupt is ever significantly delayed (or after the
system was suspended), the system could spin incrementing the time for
too long.
The fix is to replace the "do {} while" with a "while {}".
Orignal patch by Savin Zlobec and Peter Menzebach.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Unify the five existing ixp2000 defconfigs into one defconfig.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
ixp2000 used to initially mark GPIO interrupts as invalid, and not
mark them valid until set_irq_type() was called, but this doesn't
work if you want to use request_irq() with the SA_TRIGGER_* flags.
So, just mark the GPIO interrupts valid from the beginning. We
configure GPIOs as inputs when set_irq_type() is called anyway, so
this shouldn't be a problem.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Add a secondary TSB for hugepage mappings.
[SPARC]: Respect vm_page_prot in io_remap_page_range().
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the inlining of the new vs old mmap system call common code. This
reduces the size of the resulting vmlinux for defconfig as follows:
mb@pc1:~/develop/git/linux-2.6$ size vmlinux.mmap*
text data bss dec hex filename
3303749 521524 186564 4011837 3d373d vmlinux.mmapinline
3303557 521524 186564 4011645 3d367d vmlinux.mmapnoinline
The new sys_mmap2() has also one function call overhead removed, now.
(probably it was already optimized to a jmp before, but anyway...)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of places set_page_count(page, 1) that don't need to.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use page->lru.next to implement the singly linked list of pages rather than
the struct deferred_page which needs to be allocated and freed for each
page.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stop using __put_page and page_count in i386 pageattr.c
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled. So we should run the function with local interrupts
disabled on this CPU, too.
And do the same for UP, so the callback is run in the same environment on both
UP and SMP. (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).
Also uninlines on_each_cpu(). softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.
Oh, and fix up that comment over (under?) x86's smp_call_function(). It
drives me nuts.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have an explicit mm call to split higher order pages into individual pages.
Should help to avoid bugs and be more explicit about the code's intention.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Don't return uninitialised stack values in case of allocation failure
- Don't bother clearing PageCompound because __GFP_COMP wasn't specified
Increment over the pte page rather than one pte entry in
pte_alloc_one_kernel
- Actually increment the page pointer in pte_alloc_one
- Compile fixes, typos.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam's tree includes a new check, which found that we're exporting strpbrk()
multiple times.
It seems that the convention is that this is exported from the arch files, so
reove the lib/string.c export.
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This variable is rarely written to. Mark the variable accordingly.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/efi.c: In function `efi_call_phys_epilog': arch/i386/kernel/efi.c:118: warning: assignment makes integer from pointer without a cast
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only issue a "nobody cared" warning after 99900 spurious interrupts.
This avoids the occasional spurious interrupt causing warnings, as
per x86.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In mm_init_ppc64() we calculate the location of the "IO hole", but then
no one ever looks at the value. So don't bother.
That's actually all mm_init_ppc64() does, so get rid of it too.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the command line args to the device tree as /chosen/bootargs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add /system-id, /model and /compatible to the iSeries device tree.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add strne2a() which converts a string from EBCDIC to ASCII.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make mf_get_rtc(), mf_get_boot_rtc() and mf_set_rtc() static, cause they can
be. We need to move mf_set_rtc() to avoid a forward declaration.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These routines just call through to the mf routines, so point ppc_md straight
at the mf routines. We need to pass the cmd through to mf_reboot to make it
work, but that seems reasonable.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some cleanups in the iSeries code.
- Make mf_display_progress() check mf_initialized rather than the caller.
- Set mf_initialized in mf_init() rather than in setup.c
- Then move mf_initialized into mf.c, the only place it's used.
- Move the mf related logic from iSeries_progress() to mf_display_progress()
- Use a #define to size the pending_event_prealloc array
- Use that define in the initialsation loop rather than sizeof jiggery pokery
- Remove stupid comment(s)
- Mark stuff static and/or __init
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It has been decreed that platform numbers are evil, so as a step in that
direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit.
Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might
have to clean that up if we need to be more specific about what LPAR actually
means. But that's another patch ...
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When iommu_init_early_pSeries() was added, ages ago, we forgot to remove
the code that checks /chosen/linux,iommu-off in pSeries_init_early(). We
do it now in iommu_init_early_pSeries().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
htab_bolt_mapping() takes a vstart and pstart parameter, but all but one of
its callers actually pass it vstart and vstart. Luckily before it passes
paddr (calculated from paddr) to the hpte_insert routines it calls
virt_to_abs() (aka. __pa()) on the address, so there isn't actually a bug.
map_io_page() however does pass pstart properly, so currently it's broken
AFAICT because we're calling __pa(paddr) which will get us something very
large. Presumably no one's calling map_io_page() in the right context.
Anyway, change htab_bolt_mapping() callers to properly pass pstart, and then
use it properly in htab_bolt_mapping(), ie. don't call __pa() on it again.
Booted on p5 LPAR, iSeries and Power3.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We can plug the boot cpu into its node independently of whether numa
topology is detected. And numa_setup_cpu does the right thing for all
cases now, so remove special-casing for non-numa from the cpu hotplug
callback.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc numa code unconditionally onlines all nodes from 0 to the
highest node id found, regardless of whether cpus or memory are
present in the nodes. This wastes 8K per node and complicates some
cpu and memory hotplug situations, such as adding a resource that
doesn't map to one of the nodes discovered at boot.
Set nodes online as resources are scanned. Fall back to node 0 only
when we're sure this isn't a NUMA machine.
Instead of defaulting to node 0 for cases of hot-adding a resource
which doesn't belong to any initialized node, assign it to the first
online node.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Code to handle Power4's invalid node id (0xffff) is duplicated for cpu
and memory. Better to handle this case in one place --
of_node_to_nid. Overall behavior should be unchanged.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since we effectively treat the domain ids given to us by firmare as
logical node ids, make this explicit (basically s/numa_domain/nid/).
No functional changes, only variable and function names are modified.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
map_cpu_to_node does not need to be inline, it is never called in a
hot path.
map_cpu_to_node, numa_setup_cpu, and find_cpu_node can be marked
__cpuinit, as they are never used after boot if CONFIG_HOTPLUG_CPU=n.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add debug statement for map_cpu_to_node; it's useful for cpu hotplug.
Clarify debug statement about not finding the numa reference points
property.
Don't print a meaningless associativity depth (-1) on non-numa systems.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At boot, the numa code is assigning boot_cpuid to node 0
unconditionally. Basically, numa_setup_cpu is being stupid about it,
but this is the minimal fix -- just call numa_setup_cpu(boot_cpuid)
later, after all nodes have been set online.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64-SGI] SN2-XP reduce kmalloc wrapper inlining
[IA64] MCA: remove obsolete ifdef
[IA64] MCA: update MCA comm field for user space tasks
[IA64] MCA: print messages in MCA handler
[IA64-SGI] - Eliminate SN pio_phys_xxx macros. Move to assembly
[IA64] use icc defined constant
[IA64] add __builtin_trap definition for icc build
[IA64] clean up asm/intel_intrin.h
[IA64] map ia64_hint definition to intel compiler intrinsic
[IA64] hooks to wait for mmio writes to drain when migrating processes
[IA64-SGI] driver bugfixes and hardware workarounds for CE1.0 asic
[IA64-SGI] Handle SC env. powerdown events
[IA64] Delete MCA/INIT sigdelayed code
[IA64-SGI] sem2mutex ioc4.c
[IA64] implement ia64 specific mutex primitives
[IA64] Fix UP build with BSP removal support.
[IA64] support for cpu0 removal
We need this to be zero initialised. Since this is an array, use kcalloc
rather than kzalloc or kmalloc.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add Simtec Osiris to the default build, and enable the
USB-OHCI section by default.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Fix the build of arch/arm/mach-s3c2410/mach-osiris.c
and fix the warnings from sparse.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Add GPIO interrupt support for the first 16 GPIO lines (port A
and B.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add USB bus clock definition for 48MHz fed to OHCI and gadget cores
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add set_rate methods for the extra clocks on the S3C2440
and add the camera UPLL clock source
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add support for clk_set_rate and clk_round_rate to the
s3c2410 clock implementation
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Move the uengine loader from arch/arm/mach-ixp2000 to arch/arm/common
so that ixp23xx can use it too.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Add ep93xx defconfig.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Add support for setting the direction of and getting/setting the
value of the 64 GPIO lines.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
This patch adds support for the Cirrus ep93xx series of CPUs. The
ep93xx is an ARM920T based CPU with two VICs, PL010 based UARTs,
IrDA, MaverickCrunch floating point coprocessor, between 24 and 64
GPIOs, ethernet, OHCI USB and, depending on the model, pcmcia, raster
engine, graphics accelerator, IDE controller and a bunch of other
stuff.
This patch adds the core ep93xx support code, and support for the
Glomation GESBC-9312-sx and the Technologic Systems TS-72xx SBCs.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Alessandro Zummo
ixp4xx_config_irq did not configure the gpio line
as an input.
As an added bonus, the irq2gpio array has been converted
from int to char.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Peter Teichmann
Currently, if the kernels HZ value is greater than 100, delays with the udelay function are too short. This can cause trouble for instance with the zd1201 usb wlan driver.
This patch suggests a solution that keeps the overhead small and maintains (hopefully) sufficient resolution.
Signed-off-by: Peter Teichmann
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Deepak Saxena
This patch adds support for Intel's IXDP28x5 platform. This
is just and IXDP2801 with a new CPU rev but the bootloader
has been updated to reflect a new machine ID so we just build
support for it by default when we build IXDP2801.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add enable and set_parent calls for the dclk
and clkout clocks.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add clk_set_parent() call to clock code
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Move the UPLL clock registration to the central
clock file, and add an enable method
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Add selection for timer code for the Simtec Osiris
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Support for Simtec IM2440D20 CPU modules (Osiris)
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
This field is redundent since it must be equal to PHYS_OFFSET anyway.
There is no reference to it anymore so remove it at last.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Some SoCs have multiple VIC devices. Adapt the generic vic code
to allow multiple implementations to be handled.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In all current use cases, "chipdata" is used to store an iomem address.
Mark it with __iomem, and rename it to 'base'. Leave the accessor macros
alone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Allow the individual coprocessor handlers to decide when to enable
interrupts, rather than unconditionally enabling them.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
asm/hardware.h is not required for the majority of processor support
files, ioremap support, mm initialisation, acorn IO support, nor
the debug code (which picks up its machine specific includes via
debug-macros.S)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Unfortunately, OMAP platforms without the 32K timer left HZ set to
an empty value. Fix this by making the dependency on OMAP_32K_TIMER
rather than OMAP_32K_TIMER_HZ.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
asm/arch/irq.h used to be included from asm/irq.h, but was removed
from the ARM kernel a long time ago. Consequently, the contents
of asm/arch/irq.h (which mostly contain a definition for fixup_irq())
have not been used. Hence, remove asm/arch/irq.h.
Some machine support files incorrectly included this file, making
little or no use of the contents. Move the contents to a local
include file, and remove those include statements as well.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move the HZ definition into Kconfig, and set appropriate defaults
for platforms. Remove mostly empty asm/arch/param.h include file.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move the hardware PMD and PTE page table definitions from pgtable.h
into pgtable-hwdef.h, and include pgtable-hwdef.h as necessary.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Read the processor ID at boot, and save it in "processor_id" as we
did before. Later, when we re-parse the CPU type in the setup.c code,
re-use the value stored in "processor_id".
This allows a cleaner work-around for noMMU devices without CP#15.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The sys_fork is not supported in nommu mode. The other syscalls
that is not supported in nommu mode are to be defined as cond_signal
in kernel/sys_ni.c.
Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[CRYPTO] aes: Fixed array boundary violation
[CRYPTO] tcrypt: Fix key alignment
[CRYPTO] all: Add missing cra_alignmask
[CRYPTO] all: Use kzalloc where possible
[CRYPTO] api: Align tfm context as wide as possible
[CRYPTO] twofish: Use rol32/ror32 where appropriate
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (23 commits)
[PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
[PATCH] sysfs: don't export dir symbols
[PATCH] get_cpu_sysdev() signedness fix
[PATCH] kobject_add_dir
[PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
[PATCH] sysfs: fix problem with duplicate sysfs directories and files
[PATCH] Kobject: kobject.h: fix a typo
[PATCH] Kobject: provide better warning messages when people do stupid things
[PATCH] Driver core: add macros notice(), dev_notice()
[PATCH] firmware: fix BUG: in fw_realloc_buffer
[PATCH] sysfs: kzalloc conversion
[PATCH] fix module sysfs files reference counting
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to USB subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to RCU subsystem
[PATCH] add EXPORT_SYMBOL_GPL_FUTURE()
[PATCH] Clean up module.c symbol searching logic
[PATCH] kobj_map semaphore to mutex conversion
[PATCH] kref: avoid an atomic operation in kref_put()
[PATCH] handle errors returned by platform_get_irq*()
[PATCH] driver core: platform_get_irq*(): return -ENXIO on error
...
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
README: bzip2 is not new
Documentation/Changes: remove outdated translation references
remove dead Radeon URL
SCSI_AACRAID: add a help text
update the i386 defconfig
MAINTAINERS: remove the LANMEDIA entry
Move ip2.c and ip2main.c to drivers/char/ip2/ where the other files
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Update defconfigs.
[MIPS] Separate CPU entries in /proc/cpuinfo with a blank line.
[MIPS] sys_mmap2 offset argument should always be shifted 12, not PAGE_SHIFT.
[MIPS] TX49XX has prefetch.
[MIPS] Kill tlb-andes.c.
[MIPS] War on whitespace: cleanup initial spaces followed by tabs.
[MIPS] Makefile crapectomy.
[MIPS] Reformat __xchg().
[MIPS] Mention Broadcom part number for BigSur board
[MIPS] Remove CONFIG_BUILD_ELF64.
[MIPS] Further sparsification for 32-bit compat code.
[MIPS] fix wrong __user usage in _sysn32_rt_sigsuspend
[MIPS] Signal cleanup
[MIPS] Reformat all of signal32.c with tabs instead of space for consistency
[MIPS] Delete unused sys32_waitpid.
[MIPS] Make I/O helpers more customizable
[MIPS] Symmetric Uniprocessor support for Qemu.
[MIPS] sc-rm7k.c cleanup
[MIPS] MIPS64 R2 optimizations for 64-bit endianess swapping.
[MIPS] Add early console for Cobalt.
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] Fix cosmetic typo in asm/irq.h
[ARM] 3367/1: CLCD mode no longer supported on the RealView boards
[ARM] 3366/1: Allow the 16bpp mode configuration in the CLCD control register
Put in a blank line between CPU entries in /proc/cpuinfo, just like
most other architectures (i386, ia64, x86_64) do.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
This patch adjusts the offset argument passed into sys_mmap2 to be
always shifted 12, even when the native page size isn't 4K. This is
what all existing userspace libraries expect.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
The TX49XX has the prefetch instruction. It supports only Pref_Load
(hint 0). Actually changes in this patch except for Kconfig are not
have any effects, I added these changes to prevent misuse of unsupported
hints.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Basically identical to c-r4k.c, so maintaining one is really enough.
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Dump all the ridiculously complicated stuff that was needed support
compilers older and newer than 3.0.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Mention the Broadcom part number for the BigSur board (BCM91480B)
in Kconfig, just like it's done for other Broadcom boards.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This option is no longer usable with supported compilers. It will be
replaced by usage of -msym32 in a separate patch.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Move function prototypes to asm/signal.h to detect trivial errors and
add some __user tags to get rid of sparse warnings. Generated code
should not be changed.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
SMP bits needed to builds and run an SMP kernel. While only a single
processor is supported ATM it's still useful for some SMP debugging using
Qemu.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The AES setkey routine writes 64 bytes to the E_KEY area even though
there are only 60 bytes there. It is in fact safe since E_KEY is
immediately follwed by D_KEY which is initialised afterwards. However,
doing this may trigger undefined behaviour and makes Coverity unhappy.
So by combining E_KEY and D_KEY into one array we sidestep this issue
altogether.
This problem was reported by Adrian Bunk.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ALCHEMY: Add OHCI support for AU1200
Updated by moving the OHCI support out of the EHCI patch.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ALCHEMY: Add EHCI support for AU1200
Updated by removing the OHCI support
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the ability to mark symbols that will be changed in the
future, so that kernel modules that don't include MODULE_LICENSE("GPL")
and use the symbols, will be flagged and printed out to the system log.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
platform_get_irq*() now returns on -ENXIO when the resource cannot be
found. Ensure all users of platform_get_irq*() handle this error
appropriately.
Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: (230 commits)
[SPARC64]: Update defconfig.
[SPARC64]: Fix 2 bugs in huge page support.
[SPARC64]: CONFIG_BLK_DEV_RAM fix
[SPARC64]: Optimized TSB table initialization.
[SPARC64]: Allow CONFIG_MEMORY_HOTPLUG to build.
[SPARC64]: Use SLAB caches for TSB tables.
[SPARC64]: Don't kill the page allocator when growing a TSB.
[SPARC64]: Randomize mm->mmap_base when PF_RANDOMIZE is set.
[SPARC64]: Increase top of 32-bit process stack.
[SPARC64]: Top-down address space allocation for 32-bit tasks.
[SPARC64] bbc_i2c: Fix cpu check and add missing module license.
[SPARC64]: Fix and re-enable dynamic TSB sizing.
[SUNSU]: Fix missing spinlock initialization.
[TG3]: Do not try to access NIC_SRAM_DATA_SIG on Sun parts.
[SPARC64]: First cut at VIS simulator for Niagara.
[SPARC64]: Fix system type in /proc/cpuinfo and remove bogus OBP check.
[SPARC64]: Add SMT scheduling support for Niagara.
[SPARC64]: Fix 32-bit truncation which broke sparsemem.
[SPARC64]: Move over to sparsemem.
[SPARC64]: Fix new context version SMP handling.
...
The i386 defconfig wasn't updated for ages.
Instead of running "make oldconfig" on the old defconfig and trying to
give reasonable answers at all new options, this patch replaces it with
the one I'm using in 2.6.16-rc1.
This way, it's a .config that is confirmed to work on at least one
computer in the world. ;-)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Original 2.6.9 patch and explanation from somewhere within HP via
bugzilla...
ia64 stores a success/failure code in r10, and the return value (normal
return, or *positive* errno) in r8. The patch also sets the exit code to
negative errno if it's a failure result for consistency with other
architectures.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
hi,
The motivation behind the patch below was to address messages in
/var/log/messages such as:
Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing
multiple contexts (1)
Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing
multiple contexts (2)
I can reproduce by running 'get-edid' from:
http://john.fremlin.de/programs/linux/read-edid/.
These messages come about in the log b/c the vm86 calls do not exit via
the normal system call exit paths and thus do not call
'audit_syscall_exit'. The next system call will then free the context for
itself and for the vm86 context, thus generating the above messages. This
patch addresses the issue by simply adding a call to 'audit_syscall_exit'
from the vm86 code.
Besides fixing the above error messages the patch also now allows vm86
system calls to become auditable. This is useful since strace does not
appear to properly record the return values from sys_vm86.
I think this patch is also a step in the right direction in terms of
cleaning up some core auditing code. If we can correct any other paths
that do not properly call the audit exit and entries points, then we can
also eliminate the notion of context chaining.
I've tested this patch by verifying that the log messages no longer
appear, and that the audit records for sys_vm86 appear to be correct.
Also, 'read_edid' produces itentical output.
thanks,
-Jason
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1) huge_pte_offset() did not check the page table hierarchy
elements as being empty correctly, resulting in an OOPS
2) Need platform specific hugetlb_get_unmapped_area() to handle
the top-down vs. bottom-up address space allocation strategies.
Signed-off-by: David S. Miller <davem@davemloft.net>
init/do_mounts_rd.c depends upon CONFIG_BLK_DEV_RAM, not CONFIG_BLK_DEV_INITRD.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only need to write an invalid tag every 16 bytes,
so taking advantage of this can save many instructions
compared to the simple memset() call we make now.
A prefetching implementation is implemented for sun4u
and a block-init store version if implemented for Niagara.
The next trick is to be able to perform an init and
a copy_tsb() in parallel when growing a TSB table.
Signed-off-by: David S. Miller <davem@davemloft.net>
online_page() is straightforward, and then add a dummy
remove_memory() that returns -EINVAL just like i386.
There is no point in implementing remove_memory() since
__remove_pages() has no implementation either.
Signed-off-by: David S. Miller <davem@davemloft.net>
Try only lightly on > 1 order allocations.
If a grow fails, we are under memory pressure, so do not try
to grow the TSB for this address space any more.
If a > 0 order TSB allocation fails on a new fork, retry using
a 0 order allocation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Put it one page below the top of the 32-bit address space.
This gives us ~16MB more address space to work with.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently allocations are very constrained for 32-bit processes.
It grows down-up from 0x70000000 to 0xf0000000 which gives about
2GB of stack + dynamic mmap() space.
So support the top-down method, and we need to override the
generic helper function in order to deal with D-cache coloring.
With these changes I was able to squeeze out a mmap() just over
3.6GB in size in a 32-bit process.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is good for up to %50 performance improvement of some test cases.
The problem has been the race conditions, and hopefully I've plugged
them all up here.
1) There was a serious race in switch_mm() wrt. lazy TLB
switching to and from kernel threads.
We could erroneously skip a tsb_context_switch() and thus
use a stale TSB across a TSB grow event.
There is a big comment now in that function describing
exactly how it can happen.
2) All code paths that do something with the TSB need to be
guarded with the mm->context.lock spinlock. This makes
page table flushing paths properly synchronize with both
TSB growing and TLB context changes.
3) TSB growing events are moved to the end of successful fault
processing. Previously it was in update_mmu_cache() but
that is deadlock prone. At the end of do_sparc64_fault()
we hold no spinlocks that could deadlock the TSB grow
sequence. We also have dropped the address space semaphore.
While we're here, add prefetching to the copy_tsb() routine
and put it in assembler into the tsb.S file. This piece of
code is quite time critical.
There are some small negative side effects to this code which
can be improved upon. In particular we grab the mm->context.lock
even for the tsb insert done by update_mmu_cache() now and that's
a bit excessive. We can get rid of that locking, and the same
lock taking in flush_tsb_user(), by disabling PSTATE_IE around
the whole operation including the capturing of the tsb pointer
and tsb_nentries value. That would work because anyone growing
the TSB won't free up the old TSB until all cpus respond to the
TSB change cross call.
I'm not quite so confident in that optimization to put it in
right now, but eventually we might be able to and the description
is here for reference.
This code seems very solid now. It passes several parallel GCC
bootstrap builds, and our favorite "nut cruncher" stress test which is
a full "make -j8192" build of a "make allmodconfig" kernel. That puts
about 256 processes on each cpu's run queue, makes lots of process cpu
migrations occur, causes lots of page table and TLB flushing activity,
incurs many context version number changes, and it swaps the machine
real far out to disk even though there is 16GB of ram on this test
system. :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Report 'sun4v' when appropriate in /proc/cpuinfo
Remove all the verifications of the OBP version string. Just
make sure it's there, and report it raw in the bootup logs and
via /proc/cpuinfo.
Signed-off-by: David S. Miller <davem@davemloft.net>
The mapping is a simple "(cpuid >> 2) == core" for now.
Later we'll add more sophisticated code that will walk
the sun4v machine description and figure this out from
there.
We should also add core mappings for jaguar and panther
processors.
Signed-off-by: David S. Miller <davem@davemloft.net>
The page->flags manipulations done by the D-cache dirty
state tracking was broken because the constants were not
marked with "UL" to make them 64-bit, which means we were
clobbering the upper 32-bits of page->flags all the time.
This doesn't jive well with sparsemem which stores the
section and indexing information in the top 32-bits of
page->flags.
This is yet another sparc64 bug which has been with us
forever.
While we're here, tidy up some things in bootmem_init()
and paginig_init():
1) Pass min_low_pfn to init_bootmem_node(), it's identical
to (phys_base >> PAGE_SHIFT) but we should use consistent
with the variable names we print in CONFIG_BOOTMEM_DEBUG
2) max_mapnr, although no longer used, was being set
inaccurately, we shouldn't subtract pfn_base any more.
3) All the games with phys_base in the zones_*[] arrays
we pass to free_area_init_node() are no longer necessary.
Thanks to Josh Grebe and Fabbione for the bug reports
and testing. Fix also verified locally on an SB2500
which had a memory layout that triggered the same problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
This has been pending for a long time, and the fact
that we waste a ton of ram on some configurations
kind of pushed things over the edge.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't piggy back the SMP receive signal code to do the
context version change handling.
Instead allocate another fixed PIL number for this
asynchronous cross-call. We can't use smp_call_function()
because this thing is invoked with interrupts disabled
and a few spinlocks held.
Also, fix smp_call_function_mask() to count "cpus" correctly.
There is no guarentee that the local cpu is in the mask
yet that is exactly what this code was assuming.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Always spin_lock_init() in init_context(). The caller essentially
clears it out, or copies the mm info from the parent. In both
cases we need to explicitly initialize the spinlock.
2) Always do explicit IRQ disabling while taking mm->context.lock
and ctx_alloc_lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch converts arch/sparc64 to kzalloc usage.
Crosscompile tested with allyesconfig.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we were aligned, but didn't have at least 256MB left
to process, we would loop forever.
Thanks to fabbione for the report and testing the fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't try to avoid putting non-base page sized entries
into the user TSB. It actually costs us more to check
this than it helps.
Eventually we'll have a multiple TSB scheme for user
processes. Once a process starts using larger pages,
we'll allocate and use such a TSB.
Signed-off-by: David S. Miller <davem@davemloft.net>
This cpu mondo sending interface isn't all that easy to
use correctly...
We were clearing out the wrong bits from the "mask" after getting
something other than EOK from the hypervisor.
It turns out the hypervisor can just be resent the same cpu_list[]
array, with the 0xffff "done" entries still in there, and it will do
the right thing.
So don't update or try to rebuild the cpu_list[] array to condense it.
This requires the "forward_progress" check to be done slightly
differently, but this new scheme is less bug prone than what we were
doing before.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were clobbering a base register before we were done
using it. Fix a comment typo while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>
The UltraSPARC T1 manual recommends this because the chip
could instruction prefetch into the VA hole, and this would
also make decoding certain kinds of memory access traps
more difficult (because the chip sign extends certain pieces
of trap state).
Signed-off-by: David S. Miller <davem@davemloft.net>
First of all, use the known _PAGE_EXEC_{4U,4V} value instead
of loading _PAGE_EXEC from memory. We either know which one
to use by context, or we can code patch the test.
Next, we need to check executability of a PTE in the generic
TSB miss handler.
Signed-off-by: David S. Miller <davem@davemloft.net>
There were several bugs in the SUN4V cpu mondo dispatch code.
In fact, if we ever got a EWOULDBLOCK or other error from
the hypervisor call, we'd potentially send a cpu mondo multiple
times to the same cpu and even worse we could loop until the
timeout resending the same mondo over and over to such cpus.
So let's bulletproof this thing as follows:
1) Implement cpu_mondo_send() and cpu_state() hypervisor calls
in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h
2) Don't build and update the cpulist using inline functions, this
was causing the cpu mask to not get updated in the caller.
3) Disable interrupts during the entire mondo send, otherwise our
cpu list and/or mondo block could get overwritten if we take
an interrupt and do a cpu mondo send on the current cpu.
4) Check for all possible error return types from the cpu_mondo_send()
hypervisor call. In particular:
HV_EOK) Our work is done, all cpus have received the mondo.
HV_CPUERROR) One or more of the cpus in the cpu list we passed
to the hypervisor are in error state. Use cpu_state()
calls over the entries in the cpu list to see which
ones. Record them in "error_mask" and report this
after we are done sending the mondo to cpus which are
not in error state.
HV_EWOULDBLOCK) We need to keep trying.
Any other error we consider fatal, we report the event and exit
immediately.
5) We only timeout if forward progress is not made. Forward progress
is defined as having at least one cpu get the mondo successfully
in a given cpu_mondo_send() call. Otherwise we bump a counter
and delay a little. If the counter hits a limit, we signal an
error and report the event.
Also, smp_call_function_mask() error handling reports the number
of cpus incorrectly.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) We must flush the TLB, duh.
2) Even if the sw context was seen to be valid, the local cpu's
hw context can be out of date, so reload it unconditionally.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check TLB flush hypervisor calls for errors and report them.
Pass HV_MMU_ALL always for now, we can add back the optimization
to avoid the I-TLB flush later.
Always explicitly page align the virtual address arguments.
Signed-off-by: David S. Miller <davem@davemloft.net>
get_new_mmu_context() can be invoked from interrupt context
now for the new SMP version wrap handling.
So disable interrupt while taking ctx_alloc_lock in destroy_context()
so we don't deadlock.
Signed-off-by: David S. Miller <davem@davemloft.net>
The context allocation scheme we use depends upon there being a 1<-->1
mapping from cpu to physical TLB for correctness. Chips like Niagara
break this assumption.
So what we do is notify all cpus with a cross call when the context
version number changes, and if necessary this makes them allocate
a valid context for the address space they are running at the time.
Stress tested with make -j1024, make -j2048, and make -j4096 kernel
builds on a 32-strand, 8 core, T2000 with 16GB of ram.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise with too much stuff enabled in the kernel config
we can end up with an unaligned trap table.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we take a window fault, on SUN4V set %gl to zero before we
turn PSTATE_IE back on in %pstate. Otherwise if we take an
interrupt we'll end up with corrupt register state.
Signed-off-by: David S. Miller <davem@davemloft.net>
It can map all of the linear kernel mappings with zero TSB hash
conflicts for systems with 16GB or less ram. In such cases, on
SUN4V, once we load up this TSB the first time with all the
mappings, we never take a linear kernel mapping TLB miss ever
again, the hypervisor handles them all.
Signed-off-by: David S. Miller <davem@davemloft.net>
We use a bitmap, one bit for every 256MB of memory. If the
bit is set we can use a 256MB PTE for linear mappings, else
we have to use a 4MB PTE.
SUN4V support is there, and we can very easily add support
for Panther cpu 256MB PTEs in the future.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to turn off the "polling nrflag" bit when we sleep
the cpu like this, so that we'll get a cross-cpu interrupt
to wake the processor up from the yield.
We also have to disable PSTATE_IE in %pstate around the yield
call and recheck need_resched() in order to avoid any races.
Signed-off-by: David S. Miller <davem@davemloft.net>
Set, but never used.
We used to use this for dynamic IRQ retargetting, but that
code died a long time ago.
Signed-off-by: David S. Miller <davem@davemloft.net>
They were getting truncated to 32-bit and this is very bad
when your MMU fault status area is in physical memory above
4GB on SUN4V.
Signed-off-by: David S. Miller <davem@davemloft.net>
The math-emu code only expects unfinished fpop traps when
emulating FPU sqrt instructions on pre-Niagara chips.
On Niagara we can get unimplemented fpop, so handle that.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's extremely noisy and causes much grief on slow
consoles with large numbers of cpus.
We'll have to provide this some saner way in order
to re-enable this.
Signed-off-by: David S. Miller <davem@davemloft.net>
We're about to seriously die in these cases so it is important
that the messages make it to the console.
Signed-off-by: David S. Miller <davem@davemloft.net>
Another case where we have to force ourselves into global register
level one. Also make sure the arguments passed to sun4v_do_mna() are
correct.
This area actually needs some more work, for example spill fixup is
not necessarily going to do the right thing for this case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like kvmap_dtlb_longpath we have to force the
global register level to one in order to mimick the
PSTATE_MG --> PSTATE_AG trasition done on SUN4U.
Signed-off-by: David S. Miller <davem@davemloft.net>
The SUN4V convention with non-shared TSBs is that the context
bit of the TAG is clear. So we have to choose an "invalid"
bit and initialize new TSBs appropriately. Otherwise a zero
TAG looks "valid".
Make sure, for the window fixup cases, that we use the right
global registers and that we don't potentially trample on
the live global registers in etrap/rtrap handling (%g2 and
%g6) and that we put the missing virtual address properly
in %g5.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Add error return checking for TLB load hypervisor
calls.
2) Don't fallthru to dtlb tsb miss handler from itlb tsb
miss handler, oops.
3) On window fixups, propagate fault information to fixup
handler correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
This gives more consistent bogomips and delay() semantics,
especially on sun4v. It gives weird looking values though...
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to use the real hardware processor ID when
targetting interrupts, not the "define to 0" thing
the uniprocessor build gives us.
Also, fill in the Node-ID and Agent-ID fields properly
on sun4u/Safari.
Signed-off-by: David S. Miller <davem@davemloft.net>
If the top-level cnode had multi entries in it's "reg"
property, we'd fail. The buffer wasn't large enough in
such cases.
Signed-off-by: David S. Miller <davem@davemloft.net>
The sibling cpu bringup is extremely fragile. We can only
perform the most basic calls until we take over the trap
table from the firmware/hypervisor on the new cpu.
This means no accesses to %g4, %g5, %g6 since those can't be
TLB translated without our trap handlers.
In order to achieve this:
1) Change sun4v_init_mondo_queues() so that it can operate in
several modes.
It can allocate the queues, or install them in the current
processor, or both.
The boot cpu does both in it's call early on.
Later, the boot cpu allocates the sibling cpu queue, starts
the sibling cpu, then the sibling cpu loads them in.
2) init_cur_cpu_trap() is changed to take the current_thread_info()
as an argument instead of reading %g6 directly on the current
cpu.
3) Create a trampoline stack for the sibling cpus. We do our basic
kernel calls using this stack, which is locked into the kernel
image, then go to our proper thread stack after taking over the
trap table.
4) While we are in this delicate startup state, we put 0xdeadbeef
into %g4/%g5/%g6 in order to catch accidental accesses.
5) On the final prom_set_trap_table*() call, we put &init_thread_union
into %g6. This is a hack to make prom_world(0) work. All that
wants to do is restore the %asi register using
get_thread_current_ds().
Longer term we should just do the OBP calls to set the trap table by
hand just like we do for everything else. This would avoid that silly
prom_world(0) issue, then we can remove the init_thread_union hack.
Signed-off-by: David S. Miller <davem@davemloft.net>
For 32 cpus and a slow console, it just wedges the
machine especially with DETECT_SOFTLOCKUP enabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole algorithm was wrong. What we need to do is:
1) Walk each PCI bus above this device on the path to the
PCI controller nexus, and for each:
a) If interrupt-map exists, apply it, record IRQ controller node
b) Else, swivel interrupt number using PCI_SLOT(), use PCI bus
parent OBP node as controller node
c) Walk up to "controller node" until we hit the first PCI bus
in this domain, or "controller node" is the PCI controller
OBP node
2) If we walked to PCI controller OBP node, we're done.
3) Else, apply PCI controller interrupt-map to interrupt.
There is some stuff that needs to be checked out for ebus and
isa, but the PCI part is good to go.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set the global register set _AND_ disable
PSTATE_IE in %pstate. The original patch sequence was
leaving PSTATE_IE enabled when returning to kernel mode,
oops.
This fixes the random register corruption being seen
on SUN4V.
Signed-off-by: David S. Miller <davem@davemloft.net>
Forgot to multiply by 8 * 1024, oops. Correct the size constant when
the virtual-dma arena is 2GB in size, it should bet 256 not 128.
Finally, log some info about the TSB at probe time.
Signed-off-by: David S. Miller <davem@davemloft.net>
For SUN4V, we were clobbering %o5 to do the hypervisor call.
This clobbers the saved %pstate value and we end up writing
garbage into that register as a result. Oops.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use prom_startcpu_cpuid() on SUN4V instead of prom_startcpu().
We should really test for "SUNW,start-cpu-by-cpuid" presence
and use it if present even on SUN4U.
Signed-off-by: David S. Miller <davem@davemloft.net>
When crawling up the PCI bus chain, stop at the first node
that has an interrupt-map property before we hit the root.
Also, if we use a bus interrupt-{map,mask} do not forget to
update the 'intmask' pointer as we do for the 'intmap' pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
On SUN4V, force IRQ state to idle in enable_irq(). However,
I'm still not sure this is %100 correct.
Call add_interrupt_randomness() on SUN4V too.
Signed-off-by: David S. Miller <davem@davemloft.net>
On the PBM's first bus number, only allow device 0, function 0, to be
poked at with PCI config space accesses.
For some reason, this single device responds to all device numbers.
Also, reduce the verbiage of the debugging log printk's for PCI cfg
space accesses in the SUN4V PCI controller driver, so that it doesn't
overwhelm the slow SUN4V hypervisor console.
Signed-off-by: David S. Miller <davem@davemloft.net>
We should dynamically allocate the per-cpu pglist not use
an in-kernel-image datum, since __pa() does not work on
such addresses.
Also, consistently use "u32" for devhandle.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add udelay to polling console write loop, and increment
the loop limit.
Name the device "ttyHV" and pass that to add_preferred_console()
when we're using hypervisor console.
Kill sunhv_console_setup(), it's empty.
Handle the case where we don't want to use hypervisor console.
(ie. we have a head attached to a sun4v machine)
Signed-off-by: David S. Miller <davem@davemloft.net>
Get bus range from child of PCI controller root nexus.
This is actually a hack, but the PCI-E bridge sitting
at the top of the PCI tree responds to PCI config cycles
for every device number, so best to just ignore it for now.
Preliminary PCI irq routing, needs lots of work.
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear top 8-bits of physical addresses in "ranges" property.
This gives the actual physical address.
Detect PBM-A vs. PBM-B by checking bit 0x40 of the devhandle.
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI cfg space is accessed transparently through the Hypervisor and not
through direct cpu PIO operations.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to use bootmem during init_IRQ and page alloc
for sibling cpu calls.
Also, fix incorrect hypervisor call return value
checks in the hypervisor SMP cpu mondo send code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Yes, you heard it right, they changed the PTE layout for
SUN4V. Ho hum...
This is the simple and inefficient way to support this.
It'll get optimized, don't worry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Code patching did not sign extend negative branch
offsets correctly.
Kernel TLB miss path needs patching and %g4 register
preservation in order to handle SUN4V correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
prom_sun4v_name should be "sun4v" not "SUNW,sun4v"
Also, this is too early to make use of the
.sun4v_Xinsn_patch code patching, so just check
things manually.
This gets us at least to prom_init() on Niagara.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to restore the %asi register properly.
For the kernel this means get_fs(), for user this
means ASI_PNF.
Also, NGcopy_to_user.S was including U3memcpy.S instead
of NGmemcpy.S, oops :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
There was also a bug in sun4v_itlb_miss, it loaded the
MMU Fault Status base into %g3 instead of %g2.
This pointed out a fast path for TSB miss processing,
since we have %g2 with the MMU Fault Status base, we
can use that to quickly load up the PGD phys address.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is where the virtual address of the fault status
area belongs.
To set it up we don't make a hypervisor call, instead
we call OBP's SUNW,set-trap-table with the real address
of the fault status area as the second argument. And
right before that call we write the virtual address into
ASI_SCRATCHPAD vaddr 0x0.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add assembler file for PCI hypervisor calls.
Setup basic skeleton of SUN4V PCI controller driver.
Add 32-bit devhandle to PBM struct, as this is needed for
hypervisor calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
Abstract out IOMMU operations so that we can have a different
set of calls on sun4v, which needs to do things through
hypervisor calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we register a TSB with the hypervisor, so that it or hardware can
handle TLB misses and do the TSB walk for us, the hypervisor traps
down to these trap when it incurs a TSB miss.
Processing is simple, we load the missing virtual address and context,
and do a full page table walk.
Signed-off-by: David S. Miller <davem@davemloft.net>
We look for "SUNW,sun4v" in the 'compatible' property
of the root OBP device tree node.
Protect every %ver register access, to make sure it is
not touched on sun4v, as %ver is hyperprivileged there.
Lock kernel TLB entries using hypervisor calls instead of
calls into OBP.
Signed-off-by: David S. Miller <davem@davemloft.net>
Technically the hypervisor call supports sending in a list
of all cpus to get the cross-call, but I only pass in one
cpu at a time for now.
The multi-cpu support is there, just ifdef'd out so it's easy to
enable or delete it later.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sun4v has 4 interrupt queues: cpu, device, resumable errors,
and non-resumable errors. A set of head/tail offset pointers
help maintain a work queue in physical memory. The entries
are 64-bytes in size.
Each queue is allocated then registered with the hypervisor
as we bring cpus up.
The two error queues each get a kernel side buffer that we
use to quickly empty the main interrupt queue before we
call up to C code to log the event and possibly take evasive
action.
Signed-off-by: David S. Miller <davem@davemloft.net>
Happily we have no D-cache aliasing issues on these
chips, so the implementation is very straightforward.
Add a stub in bootup which will be where the patching
calls will be made for niagara/sun4v/hypervisor.
Signed-off-by: David S. Miller <davem@davemloft.net>
Things are a little tricky because, unlike sun4u, we have
to:
1) do a hypervisor trap to do the TLB load.
2) do the TSB lookup calculations by hand
Signed-off-by: David S. Miller <davem@davemloft.net>
If we're just switching between different alternate global
sets, nop it out on sun4v. Also, get rid of all of the
alternate global save/restore in the OBP CIF trampoline code.
Signed-off-by: David S. Miller <davem@davemloft.net>
They are totally unnecessary because:
1) Interrupts are already disabled when switch_to()
runs.
2) We don't use hard-coded alternate globals any longer.
This found a case in rtrap, which still assumed alternate
global %g6 was current_thread_info(), and that is fixed
by this changeset as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
As we save trap state onto the stack, the store buffer fills up
mid-way through and we stall for several cycles as the store buffer
trickles out to the L2 cache. Meanwhile we can do some privileged
register reads and other calculations, essentially for free.
Signed-off-by: David S. Miller <davem@davemloft.net>
And more consistently check cheetah{,_plus} instead
of assuming anything not spitfire is cheetah{,_plus}.
Signed-off-by: David S. Miller <davem@davemloft.net>
When saving and restoing trap state, do the window spill/fill
handling inline so that we never trap deeper than 2 trap levels.
This is important for chips like Niagara.
The window fixup code is massively simplified, and many more
improvements are now possible.
Signed-off-by: David S. Miller <davem@davemloft.net>
On uniprocessor, it's always zero for optimize that.
On SMP, the jmpl to the stub kills the return address stack in the cpu
branch prediction logic, so expand the code sequence inline and use a
code patching section to fix things up. This also always better and
explicit register selection, which will be taken advantage of in a
future changeset.
The hard_smp_processor_id() function is big, so do not inline it.
Fix up tests for Jalapeno to also test for Serrano chips too. These
tests want "jbus Ultra-IIIi" cases to match, so that is what we should
test for.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several tricky races involved with growing the TSB. So just
use base-size TSBs for user contexts and we can revisit enabling this
later.
One part of the SMP problems is that tsb_context_switch() can see
partially updated TSB configuration state if tsb_grow() is running in
parallel. That's easily solved with a seqlock taken as a writer by
tsb_grow() and taken as a reader to capture all the TSB config state
in tsb_context_switch().
Then there is flush_tsb_user() running in parallel with a tsb_grow().
In theory we could take the seqlock as a reader there too, and just
resample the TSB pointer and reflush but that looks really ugly.
Lastly, I believe there is a case with threads that results in a TSB
entry lock bit being set spuriously which will cause the next access
to that TSB entry to wedge the cpu (since the TSB entry lock bit will
never clear). It's either copy_tsb() or some bug elsewhere in the TSB
assembly.
Signed-off-by: David S. Miller <davem@davemloft.net>
The are distrupting, which by the sparc v9 definition means they
can only occur when interrupts are enabled in the %pstate register.
This never occurs in any of the trap handling code running at
trap levels > 0.
So just mark it as an unexpected trap.
This allows us to kill off the cee_stuff member of struct thread_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we don't need to lock the TSB into the TLB.
The trick is that every TSB load/store is registered into
a special instruction patch section. The default uses
virtual addresses, and the patch instructions use physical
address load/stores.
We can't do this on all chips because only cheetah+ and later
have the physical variant of the atomic quad load.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are returning back to kernel mode, %g4 could be live
(for example, in the case where we window spill in the etrap
code). So do not change it's value if going back to kernel.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we use %g5 itself as a temporary, it can get clobbered
if we take an interrupt mid-stream and thus cause end up with
the final %g5 value too early as a result of rtrap processing.
Set %g5 at the very end, atomically, to avoid this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
%g6 is not necessarily set to current_thread_info()
at sparc64_realfault_common. So store the fault
code and address after we invoke etrap and %g6 is
properly set up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just flip the bit off of whatever it's currently set to.
PSTATE_IE is guarenteed to be enabled when we get here.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is totally unnecessary complexity. After we take over
the trap table, we handle all PROM tlb misses fully.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the trap code was still assuming that alternate
global %g6 was hard coded with current_thread_info().
Let's just consistently flush at KERNBASE when we need
a pipeline synchronization. That's locked into the TLB
and will always work.
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSB_LOCK_BIT define is actually a special
value shifted down by 32-bits for the assembler
code macros.
In C code, this isn't what we want.
Signed-off-by: David S. Miller <davem@davemloft.net>
As the RSS grows, grow the TSB in order to reduce the likelyhood
of hash collisions and thus poor hit rates in the TSB.
This definitely needs some serious tuning.
Signed-off-by: David S. Miller <davem@davemloft.net>
This also cleans up tsb_context_switch(). The assembler
routine is now __tsb_context_switch() and the former is
an inline function that picks out the bits from the mm_struct
and passes it into the assembler code as arguments.
setup_tsb_parms() computes the locked TLB entry to map the
TSB. Later when we support using the physical address quad
load instructions of Cheetah+ and later, we'll simply use
the physical address for the TSB register value and set
the map virtual and PTE both to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move {init_new,destroy}_context() out of line.
Do not put huge pages into the TSB, only base page size translations.
There are some clever things we could do here, but for now let's be
correct instead of fancy.
Signed-off-by: David S. Miller <davem@davemloft.net>
UltraSPARC has special sets of global registers which are switched to
for certain trap types. There is one set for MMU related traps, one
set of Interrupt Vector processing, and another set (called the
Alternate globals) for all other trap types.
For what seems like forever we've hard coded the values in some of
these trap registers. Some examples include:
1) Interrupt Vector global %g6 holds current processors interrupt
work struct where received interrupts are managed for IRQ handler
dispatch.
2) MMU global %g7 holds the base of the page tables of the currently
active address space.
3) Alternate global %g6 held the current_thread_info() value.
Such hardcoding has resulted in some serious issues in many areas.
There are some code sequences where having another register available
would help clean up the implementation. Taking traps such as
cross-calls from the OBP firmware requires some trick code sequences
wherein we have to save away and restore all of the special sets of
global registers when we enter/exit OBP.
We were also using the IMMU TSB register on SMP to hold the per-cpu
area base address, which doesn't work any longer now that we actually
use the TSB facility of the cpu.
The implementation is pretty straight forward. One tricky bit is
getting the current processor ID as that is different on different cpu
variants. We use a stub with a fancy calling convention which we
patch at boot time. The calling convention is that the stub is
branched to and the (PC - 4) to return to is in register %g1. The cpu
number is left in %g6. This stub can be invoked by using the
__GET_CPUID macro.
We use an array of per-cpu trap state to store the current thread and
physical address of the current address space's page tables. The
TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this
table, it uses __GET_CPUID and also clobbers %g1.
TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
the current processor's IRQ software state into %g6. It also uses
__GET_CPUID and clobbers %g1.
Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
current address space's page tables into %g7, it clobbers %g1 and uses
__GET_CPUID.
Many refinements are possible, as well as some tuning, with this stuff
in place.
Signed-off-by: David S. Miller <davem@davemloft.net>
Taking a nod from the powerpc port.
With the per-cpu caching of both the page allocator and SLAB, the
pgtable quicklist scheme becomes relatively silly and primitive.
Signed-off-by: David S. Miller <davem@davemloft.net>
We now use the TSB hardware assist features of the UltraSPARC
MMUs.
SMP is currently knowingly broken, we need to find another place
to store the per-cpu base pointers. We hid them away in the TSB
base register, and that obviously will not work any more :-)
Another known broken case is non-8KB base page size.
Also noticed that flush_tlb_all() is not referenced anywhere, only
the internal __flush_tlb_all() (local cpu only) is used by the
sparc64 port, so we can get rid of flush_tlb_all().
The kernel gets it's own 8KB TSB (swapper_tsb) and each address space
gets it's own private 8K TSB. Later we can add code to dynamically
increase the size of per-process TSB as the RSS grows. An 8KB TSB is
good enough for up to about a 4MB RSS, after which the TSB starts to
incur many capacity and conflict misses.
We even accumulate OBP translations into the kernel TSB.
Another area for refinement is large page size support. We could use
a secondary address space TSB to handle those.
Signed-off-by: David S. Miller <davem@davemloft.net>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "[SPARC64]: Get rid of fast IRQ feature"
moved the the code from arch/sparc64/kernel/entry.S:
lduba [%g7] ASI_PHYS_BYPASS_EC_E, %g5
or %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
andn %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
to arch/sparc64/kernel/irq.c:
val = readb(auxio_register);
val |= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
val &= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
This looks like it it missing a bitwise not, which is reintroduced
by this patch.
Due to lack of a floppy device, I could not test it, but it looks
evident.
Signed-off-by: Bernhard R Link <brlink@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.
If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.
Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1Mhz with a full 23bit counter. This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
The timers need to be loaded with 1 less than the desired interval not
the interval itself.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:
* do_timer() expects the arch-specific handler to take the lock as it
modifies jiffies[_64] and xtime.
* writing timerhi/lo in timer_interrupt() will mess up
fixed_rate_gettimeoffset() which reads timerhi/lo.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If dcache_size != icache_size or dcache_size != scache_size, or
set-associative cache, icache/scache does not flushed properly. Make
blast_?cache_page_indexed() masks its index value correctly. Also,
use physical address for physically indexed pcache/scache.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.
On x86, CPU0 is never offlined. The subsequent attempt to online CPU0
doesn't take that into account. It actually tries to bootup the already
booted CPU. Following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.
Check if cpu is already online.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
My patch (d7a5b2ffa1) to always panic if
lmb_alloc() fails is broken because it checks alloc < 0, but should be
checking alloc == 0.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
While adding USB support to an MV64360 based board this week, I
discovered that all MV64x60 boards in the kernel have platform_notify
functions marked with __init. This causes an oops if a device is added
after boot.
The patch below removes the __init markers. I do not have all these
boards to test on, but the change seems very unlikely to break anything
else.
Signed-off-by: Adrian Cox <adrian@humboldt.co.uk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Dump a stream of rawbytes with a new 'dr' command.
Produces less output and it is simpler to feed the output to scripts.
Also, dr has no dumpsize limits.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This converts arch/ppc to kzalloc usage.
Crosscompile tested with allyesconfig.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
powerpc: update defconfigs
[PATCH] powerpc: properly configure DDR/P5IOC children devs
[PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
[PATCH] powerpc: RTC memory corruption
[PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption
[PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option
[PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105
[PATCH] powerpc: correct cacheflush loop in zImage
powerpc: Fix problem with time going backwards
powerpc: Disallow lparcfg being a module
Patch from Catalin Marinas
Chosing of the CLCD RGB mode is no longer possible via the SYS_CLCD
register on the RealView boards. Instead, this configuration is done in the
CLCD primecell control register directly.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
remove warnings when building a 64bit kernel.
smp_call_function triggers also with 32bit kernel.
WARNING: vmlinux: duplicate symbol 'smp_call_function' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:164:EXPORT_SYMBOL(smp_call_function);
arch/powerpc/kernel/smp.c:300:EXPORT_SYMBOL(smp_call_function);
WARNING: vmlinux: duplicate symbol 'ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:113:EXPORT_SYMBOL(ioremap);
arch/powerpc/mm/pgtable_64.c:321:EXPORT_SYMBOL(ioremap);
WARNING: vmlinux: duplicate symbol '__ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:117:EXPORT_SYMBOL(__ioremap);
arch/powerpc/mm/pgtable_64.c:322:EXPORT_SYMBOL(__ioremap);
WARNING: vmlinux: duplicate symbol 'iounmap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:118:EXPORT_SYMBOL(iounmap);
arch/powerpc/mm/pgtable_64.c:323:EXPORT_SYMBOL(iounmap);
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We should be memset'ing the data we are pointing to, not the pointer
itself. This is in an error path so we probably don't hit it much.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes incorrect setting of powersave_nap to 1 on all
PowerMacs, potentially causing memory corruption on some models. This
bug was introuced by me during the 32/64 bits merge.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The wording of the CRASH_DUMP Kconfig option is not very clear. It gives you a
kernel that can be used _as_ the kdump kernel, not a kernel that can boot into
a kdump kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable the onboard IDE driver for p610, p615 and p630.
They have the CD connected to this card. All other RS/6000 systems with this
controller have no connectors and dont need this option.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Correct the loop for cacheflush. No idea where I copied the code from,
but the original does not work correct. Maybe the flush is not needed.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The recent changes to keep gettimeofday in sync with xtime had the side
effect that it was occasionally possible for the time reported by
gettimeofday to go back by a microsecond. There were two reasons:
(1) when we recalculated the offsets used by gettimeofday every 2^31
timebase ticks, we lost an accumulated fractional microsecond, and
(2) because the update is done some time after the notional start of
jiffy, if ntp is slowing the clock, it is possible to see time go backwards
when the timebase factor gets reduced.
This fixes it by (a) slowing the gettimeofday clock by about 1us in
2^31 timebase ticks (a factor of less than 1 in 3.7 million), and (b)
adjusting the timebase offsets in the rare case that the gettimeofday
result could possibly go backwards (i.e. when ntp is slowing the clock
and the timer interrupt is late). In this case the adjustment will
reduce to zero eventually because of (a).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Patch from Ben Dooks
arch/arm/kernel/setup.c declares mem_fclk_21285 when
this is already declared in include/asm-arm/system.h
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
arch/arm/kernel/compat.c exports two functions,
convert_to_tag_list and squash_mem_tags which
are not defined in any header files, and not
used outside arch/arm/kernel.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Ben Dooks
Fix the following warnings from sparse:
arch/arm/kernel/process.c:86:6: warning: symbol 'default_idle' was not declared. Should it be static?
arch/arm/kernel/process.c:378:5: warning: symbol 'dump_fpu' was not declared. Should it be static?
Include <linux/elfcore.h> for dump_fpu() decleration, and
make default_idle() static as it is not used outside the file.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Albrecht Dreß
Add DMA resources to s3c2410 spi platform devices - dma_(alloc|free)_coherent should now work as expected.
Signed-off-by: Albrecht Dreß <albrecht.dress@lios-tech.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Pavel Machek
Enable frontlight during collie bootup, so that display is actually
readable in anything other than bright sunlight.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
lapic_shutdown() re-enables interrupts which is un-desirable for panic
case, so use local_irq_save() and local_irq_restore() to keep the irqs
disabled for kexec on panic case, and close a possible race window while
kdump shutdown as shown in this stack trace
-- BUG: spinlock lockup on CPU#1, bash/4396, c52781a0
[<c01c1870>] _raw_spin_lock+0xb7/0xd2
[<c029e148>] _spin_lock+0x6/0x8
[<c011b33f>] scheduler_tick+0xe7/0x328
[<c0128a7c>] update_process_times+0x51/0x5d
[<c0114592>] smp_apic_timer_interrupt+0x4f/0x58
[<c01141ff>] lapic_shutdown+0x76/0x7e
[<c0104d7c>] apic_timer_interrupt+0x1c/0x30
[<c01141ff>] lapic_shutdown+0x76/0x7e
[<c0116659>] machine_crash_shutdown+0x83/0xaa
[<c013cc36>] crash_kexec+0xc1/0xe3
[<c029e148>] _spin_lock+0x6/0x8
[<c013cc22>] crash_kexec+0xad/0xe3
[<c0215280>] __handle_sysrq+0x84/0xfd
[<c018d937>] write_sysrq_trigger+0x2c/0x35
[<c015e47b>] vfs_write+0xa2/0x13b
[<c015ea73>] sys_write+0x3b/0x64
[<c0103c69>] syscall_call+0x7/0xb
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit c33d4568ac.
Andrew Clayton and Hugh Dickins report that it's broken for them and
causes strange page table and slab corruption, and spontaneous reboots.
Let's get it right next time.
Cc: Andrew Clayton <andrew@rootshell.co.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current pcspkr code combines the device and driver registration.
This patch splits these, putting the device registration in the arch
specific code.
PowerPC and MIPS only have the pcspkr present sometimes.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The lparcfg code needs several things which are pretty arcane internal
details and which we don't want to export, which means that lparcfg
doesn't work when built as a module. This makes it a bool instead of
a tristate in the Kconfig so that users can't try to build it as a
module.
Signed-off-by: Paul Mackerras <paulus@samba.org>
EM64T CPUs have somewhat weird error reporting for non canonical RIPs in
SYSRET.
We can't handle any exceptions there because the exception handler would
end up running on the user stack which is unsafe.
To avoid problems any code that might end up with a user touched pt_regs
should return using int_ret_from_syscall. int_ret_from_syscall ends up
using IRET, which allows safe exceptions.
Cc: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] iwmmxt thread state alignment
[ARM] 3350/1: Enable 1-wire on ARM
[ARM] 3356/1: Workaround for the ARM1136 I-cache invalidation problem
[ARM] 3355/1: NSLU2: remove propmt depends
[ARM] 3354/1: NAS100d: fix power led handling
[ARM] Fix muldi3.S
This patch removes the reliance of iwmmxt on hand coded alignments.
Since thread_info is always 8K aligned, specifying that fpstate is
8-byte aligned achieves the same effect without needing to resort
to hand coded alignments.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This seems to work for a short period of time, but when
used in conjunction with a userspace governor that changes
the frequency regularly, it's only a matter of time before
everything just locks up.
Signed-off-by: Dave Jones <davej@redhat.com>
Patch from Alessandro Zummo
This patches add the 1-wire drivers
to the ARM Kconfig.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
ARM1136 erratum 371025 (category 2) specifies that, under rare
conditions, an invalidate I-cache by MVA (line or range) operation can
fail to invalidate a cache line. The recommended workaround is to
either invalidate the entire I-cache or invalidate the range by
set/way rather than MVA.
Note that for a 16K cache size, invalidating a 4K page by set/way is
equivalent to invalidating the entire I-cache.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix the code to disable freqs less than 2GHz in N60 errata.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Currently the code tries up to spin_retry times to grab a lock using the cs
instruction. The cs instruction has exclusive access to a memory region
and therefore invalidates the appropiate cache line of all other cpus. If
there is contention on a lock this leads to cache line trashing. This can
be avoided if we first check wether a cs instruction is likely to succeed
before the instruction gets actually executed.
Signed-off-by: Christian Ehrhardt <ehrhardt@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/usr/src/ctest/git/kernel/mm/rmap.c: In function `page_referenced_one':
/usr/src/ctest/git/kernel/mm/rmap.c:354: warning: implicit declaration of function `rwsem_is_locked'
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a lockup which was introduced during the conversion to the generic IRQ
framework.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At times gcc will place bits of __exit functions into .rodata. If
compiled into the kernle itself we used to discard .exit.text - but
not the bits left in .rodata. While harmless this did at times result
in a large number of warnings. So until gcc fixes this, discard
.exit.text at runtime.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In case a particular system doesn't support highmem the runtime checks
will ensure nothing bad is going to happen.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
ATI chipsets tend to generate double timer interrupts for the local APIC
timer when both the 8254 and the IO-APIC timer pins are enabled. This is
because they route it to both and the result is anded together and the CPU
ends up processing it twice.
This patch changes check_timer to disable the 8254 routing for interrupt 0.
I think it would be safe on all chipsets actually (i tested it on a couple
and it worked everywhere) and Windows seems to do it in a similar way, but
to be conservative this patch only enables this mode on ATI (and adds
options to enable/disable too)
Ported over from a similar x86-64 change.
I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
tweaked it a bit to work even without ACPI.
Inspired by a patch from Chuck Ebbert, but redone.
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A pte may be zapped by the swapper, exiting process, unmapping or page
migration while the accessed or dirty bit handers are about to run. In that
case the accessed bit or dirty is set on an zeroed pte which leads the VM to
conclude that this is a swap pte. This may lead to
- Messages from the vm like
swap_free: Bad swap file entry 4000000000000000
- Processes being aborted
swap_dup: Bad swap file entry 4000000000000000
VM: killing process ....
Page migration is particular suitable for the creation of this race since
it needs to remove and restore page table entries.
The fix here is to check for the present bit and simply not update
the pte if the page is not present anymore. If the page is not present
then the fault handler should run next which will take care of the problem
by bringing the page back and then mark the page dirty or move it onto the
active list.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch from Alessandro Zummo
The patch that would have made the NSLU2
kernel non compatible with other ixp4xx machs
never entered the kernel. So it is actually
safe to remove the prompt dependencies.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Alessandro Zummo
Disable GPIO clocks to allow
the power led to work properly.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
While testing kexec and kdump we hit problems where the new kernel would
freeze or instantly reboot. The easiest way to trigger it was to kexec a
kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7
instead would work fine.
The patch fixes a few problems with the kexec inline asm.
Signed-off-by: Chris Mason <mason@suse.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The x86_model calculation also applies for family 6. early_cpu_detect
does the right thing, but generic_identify misses.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
strnlen_user is supposed to return then length count + 1 if no terminating \0
is found, and it should return 0 on exception. Found by David Howells
<dhowells@redhat.com>.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix i386 nmi_watchdog that does not meet watchdog timeout condition. It
does not hit die_nmi when it should be triggered, because the current
nmi_watchdog_tick in arch/i386/kernel/nmi.c never count up alert_counter
like this:
void nmi_watchdog_tick (struct pt_regs * regs) {
if (last_irq_sums[cpu] == sum) {
alert_counter[cpu]++; <- count up alert_counter, but
if (alert_counter[cpu] == 5*nmi_hz)
die_nmi(regs, "NMI Watchdog detected LOCKUP");
alert_counter[cpu] = 0; <- reset alert_counter
This patch changes it back to the previous and working version.
This was found and originally written by Kohta NAKASHIMA.
(akpm: also uninline write_watchdog_counter(), saving 184 byets)
Signed-off-by: GOTO Masanori <gotom@sanori.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] mca recovery return value when no bus check
[IA64] SGI SN drivers: don't report !sn2 hardware as an error
[IA64] don't report !sn2 or !summit hardware as an error
[IA64] gensparse_defconfig: turn on PNPACPI
[IA64] Increase severity of MCA recovery messages
When shifting the low-parts of signed numbers, a logical shift
should be used to avoid sign-extending a bit which isn't a sign
bit.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:
* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
path, so it was only doing anything with it once it saw some other
bit being set. In other words, the noerror behaviour would apply to
the next system call where we had to reschedule or deliver a signal,
which is not necessarily the current system call.
* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
path when the _TIF_SINGLESTEP bit was set.
* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
_TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
by system calls. I took it out of _TIF_USER_WORK_MASK.
* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
to be restored (unless perhaps a signal was delivered or the syscall
was traced or single-stepped). Thus the non-volatile registers
weren't restored on exit from a signal handler. We probably got
away with it mostly because signal handlers written in C wouldn't
alter the non-volatile registers.
* On 32-bit I simplified the code and made it more like 64-bit by
making the syscall exit path jump to ret_from_except to handle
preemption and signal delivery.
* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
set - but I think because of that 32-bit was actually restoring the
non-volatile registers on exit from a signal handler.
* I changed the order of enabling interrupts and saving the
non-volatile registers before calling do_syscall_trace_leave; now we
enable interrupts first.
Signed-off-by: Paul Mackerras <paulus@samba.org>
When there is no bus check, the return code should be failure, not success.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
This stuff is all in the generic ia64 kernel, and the new initcall error
reporting complains about them.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Turn on CONFIG_PNPACPI. I recently removed 8250_acpi.c. All devices
previously claimed by 8250_acpi.c should now be claimed by 8250_pnp.c.
This depends on having CONFIG_PNPACPI so ACPI devices show up as PNP
devices.
All other ia64 defconfigs either have CONFIG_PNPACPI already, or
don't have 8250 support turned on at all.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA recovery messages are currently KERN_DEBUG,
so they don't show up in /var/log/messages (by default).
Increase the severity to KERN_ERR, for the initial
message (and also add the physical address to this
message). Leave the successful isolation message as
KERN_DEBUG, but increase the severity when isolation
fails to KERN_CRIT.
[Russ' patch made these all KERN_CRIT]
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Patch from Alessandro Zummo
nas100d_power_exit(void) gets some protection
to avoid freeing an irq when it is not appropriate to do so.
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
Chapter B2.7.3 in the latest ARM ARM (with v6 information) states that
the completion of a TLB maintenance operation is only guaranteed by
the execution of a DSB (Data Syncronization Barrier, formerly Data
Write Barrier or Drain Write Buffer).
Note that a DSB is only needed in the flush_tlb_kernel_* functions
since the completion is guaranteed by a mode change (i.e. switching
back to user mode) for the flush_tlb_user_* functions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch makes the kernel bootable again on ia32 EFI systems.
Signed-off-by: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the missing pm_power_off's for the h8300, v850 and xtensa
architectures.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Also from Thomas Gleixner <tglx@linutronix.de>
Function next_timer_interrupt() got broken with a recent patch
6ba1b91213 as sys_nanosleep() was moved to
hrtimer. This broke things as next_timer_interrupt() did not check hrtimer
tree for next event.
Function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
VST) implementations, as the system can be in idle when next hrtimer event
was supposed to happen. At least ARM and S390 currently use
next_timer_interrupt().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 timer_resume is updating jiffies, not jiffies_64. It looks there is a
potential overflow problem. And jiffies_64 and wall_jiffies should be
protected by xtime_lock.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The kbuild system takes advantage of an incorrect behavior in GNU make.
Once this behavior is fixed, all files in the kernel rebuild every time,
even if nothing has changed. This patch ensures kbuild works with both
the incorrect and correct behaviors of GNU make.
For more details on the incorrect behavior, see:
http://lists.gnu.org/archive/html/bug-make/2006-03/msg00003.html
Changes in this patch:
- Keep all targets that are to be marked .PHONY in a variable, PHONY.
- Add .PHONY: $(PHONY) to mark them properly.
- Remove any $(PHONY) files from the $? list when determining whether
targets are up-to-date or not.
Signed-off-by: Paul Smith <psmith@gnu.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
powernow-k8: Let cpufreq driver handle affected CPUs
Let the cpufreq driver manage AMD Dual-Core CPUs being tied together.
Since cpufreq driver's affected CPUs data, cpufreq_policy->cpus, already
knows about which cores are tied together, powernow driver does not have
keep its internal data for every core. (even a pointer.. it will never
be called on) Telling cpufreq driver about cpu_core_map at init time is
sufficient.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
We must use the "a" (allocate) attribute every time we
emit an entry into the __ex_table section.
For consistency, use "a" instead of #alloc which is some
Solaris compat cruft GNU as provides on Sparc.
Signed-off-by: David S. Miller <davem@davemloft.net>
yaboot is scrogged and calls us with an invalid stack alignment,
it seems.
Thanks to David Woodhouse to pointing me to the problem.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Thu, 2006-03-02 at 19:55 +0100, Olaf Hering wrote:
> My iBook1 has 2 memory regions in reg. Depending on how I boot it
> (vmlinux+initrd) or zImage.initrd, it will not boot with current Linus
> tree.
> rmo_top should be 160MB instead of 32MB.
On logically-partitioned machines the first element of the reg
property in the memory node is defined to be the "RMO" region,
i.e. the memory that the processor can access in real mode. On other
machines the first element has no special meaning, so only take it to
be the RMO region on LPAR machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes userland aware of the icache snoop capability of the
POWER5 (and possibly others in the future) and of SMT capabilities.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The variable `timebase' used to transfer the current timebase value
from one cpu to the other in smp_core99_give/take_timebase was only
an unsigned long, i.e. 32 bits on 32-bit machines. It needs to be
64 bits. This makes it a u64, and fixes the issue reported by Kyle
Moffett, that the two cpus see wildly different values for the time
of day.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is along the lines suggested by Chris Lumens but goes further
in that it leaves the DEBUG symbol undefined, making the DBG macro
empty.
Signed-off-by: Paul Mackerras <paulus@samba.org>
On 32-bit, the exception prolog for the program check exception doesn't
enable interrupts early on. If it is an illegal instruction exception,
we read the instruction in order to emulate certain instructions, and
the get_user of the instruction triggers a WARN_ON since interrupts
are still disabled. This adds a local_irq_enable() to enable
interrupts before reading the instruction.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 'tick_usec' is USER_HZ period in usec. do_gettimeofday() should
use kernel HZ value.
Here is a patch for MIPS. It seems m32r, m68k and sparc have same
problem though their HZ and USER_HZ are same for now.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch adds mm->task_size to keep track of the task size of a given mm
and uses that to fix the powerpc vdso so that it uses the mm task size to
decide what pages to fault in instead of the current thread flags (which
broke when ptracing).
(akpm: I expect that mm_struct.task_size will become the way in which we
finally sort out the confusion between 32-bit processes and 32-bit mm's. It
may need tweaks, but at this stage this patch is powerpc-only.)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This driver loops over 'num_online_cpus', but it doesn't account for holes
in the online map created by offlined cpus, and assumes that the cpu
numbers stay linear.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow sysadmin to disable all warnings about userland apps
making unaligned accesses by using:
# echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
Rather than having to use prctl on a process by process basis.
Default behaivour leaves the warnings enabled.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When a CPU has no scache, the scache flushing functions currently
aren't getting initialized and the NULL pointer is eventually called
as a function. Initialize the scache flushing functions as a noop
when there's no scache.
Initial patch by me and most of the debugging done by Martin Michlmayr.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
beautify coding style for zeroing end of fsyscall_table entries.
Remove misleading __NR_syscall_last and add more comments.
Drop (now unneeded) "guard against failure to increase NR_syscalls"
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Before the merge I updated create_pte_mapping() to work for iSeries, by
calling iSeries_hpte_bolt_or_insert. (4c55130b2a)
Later we changed iSeries_hpte_insert to cope with the bolting case, and called
that instead from create_pte_mapping() (which was renamed to htab_bolt_mapping)
(3c726f8dee).
Unfortunately that change introduced a subtle bug, where we pass an absolute
address to iSeries_hpte_insert() where it expects a physical address. This
leads to us calling phys_to_abs() twice on the physical address, which is
seriously bogus.
This only causes a problem if the absolute address from the first translation
can be looked up again in the chunk_map, which depends on the size and layout
of memory. I've seen it fail on one box, but not others.
The minimal fix is to pass the physical address to iSeries_hpte_insert(). For
2.6.17 we should make phys_to_abs() BUG if we try to double-translate an
address.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A bug in the assembly code of the vdso can cause gettimeofday() to hang
or to return incorrect results. The wrong register was used to test for
pending updates of the calibration variables and to create a dependency
for subsequent loads. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some hotplug driver functions were migrated to the kernel for use by EEH
in commit 2bf6a8fa21.
Previously, the PCI Hotplug module had been changed to use the new
OFDT-based PCI probe when appropriate:
5fa80fcdca
When rpaphp_pci_config_slot() was moved from the rpaphp driver to the
new kernel function pcibios_add_pci_devices(), the OFDT-based probe
stuff was dropped. This patch restores it.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit 13a229abc2.
Quoth Andi:
"After some consideration and feedback from various people it turns
out this wasn't that good an idea. It has some problems and needs
more work. Since it was only an optimization anyways it's best to
just back it out again for now."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/ia64/kernel/unaligned.c erroneously marked die_if_kernel()
with a "noreturn" attribute ... which is silly (it returns whenever
the argument regs say that the fault happened in user mode, as one
might expect given the "if_kernel" part of its name!). Thanks to
Alan and Gareth for pointing this out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Christoph Hellwig <hch@infradead.org> pointed that there are no
in-tree uses of this. So revert 9c65cb9be6
Signed-off-by: Tony Luck <tony.luck@intel.com>
pcibios_setup() should return NULL if it handled a parameter. Since ia64
handles no parameters, it should return the string that was passed in,
not NULL. This brings ia64 into line with all other architectures that
handle no parameters.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Take advantage of kzalloc() as well as reduce the size of code generated
for the error returns in xpc_setup_infrastructure().
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
unaligned_access does fetch cr.ipsr, then calls
dispatch_unaligned_handler, but dispatch_unaligned_handler fetches
cr.ipsr again, so delete the first one.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not just cleanup but also fixes O32 readdir(2) emulation.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
A recent change requires cpu_possible_map to be initialized before
smp_sched_init() but most MIPS platforms were initializing their
processors in the prom_prepare_cpus callback of smp_prepare_cpus. The
simple fix of calling prom_prepare_cpus from one of the earlier SMP
initialization hooks doesn't work well either since IPIs may require
init_IRQ() to have completed, so bit the bullet and split
prom_prepare_cpus into two initialization functions, plat_smp_setup
which is called early from setup_arch and plat_prepare_cpus called where
prom_prepare_cpus used to be called.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The inline cputime_to_foo and foo_to_cputime conversion functions in
include/asm-powerpc/cputime.h refer to 5 variables, which need to be
exported if those functions are to be usable from modules.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 9ec4b1f356 made kprobes not compile
without module support, so just make that clear in the Kconfig file.
Also, since it's marked EXPERIMENTAL, make that dependency explicit too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The change to kernel/sched.c's init code to use for_each_cpu()
requires that the cpu_possible_map be setup much earlier.
Set it up via setup_arch(), constrained to NR_CPUS, and later
constrain it to max_cpus in smp_prepare_cpus().
This fixes SMP booting on sparc64.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9c869edac5 broke voyager again
rather subtly because it already had its own topology exporting
functions, so now each CPU gets registered twice.
I think we can actually use the generic ones, so I don't propose
reverting it. The attached should eliminate the voyager topology
functions in favour of the generic ones.
I also added a define to ensure voyager is never hotplug CPU (we don't
have the support in the SMP harness).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commit e2c0388866 added
setup_additional_cpus to setup.c but this is only defined if
CONFIG_HOTPLUG_CPU is set. This patch changes the #ifdef to reflect that.
Signed-off-by: Brian Magnuson <magnuson@rcn.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous experiment for using apicmaintimer on ATI systems didn't
work out very well. In particular laptops with C2/C3 support often
don't let it tick during idle, which makes it useless. There were also
some other bugs that made the apicmaintimer often not used at all.
I tried some other experiments - running timer over RTC and some other
things but they didn't really work well neither.
I rechecked the specs now and it turns out this simple change is
actually enough to avoid the double ticks on the ATI systems. We just
turn off IRQ 0 in the 8254 and only route it directly using the IO-APIC.
I tested it on a few ATI systems and it worked there. In fact it worked
on all chipsets (NVidia, Intel, AMD, ATI) I tried it on.
According to the ACPI spec routing should always work through the
IO-APIC so I think it's the correct thing to do anyways (and most of the
old gunk in check_timer should be thrown away for x86-64).
But for 2.6.16 it's best to do a fairly minimal change:
- Use the known to be working everywhere-but-ATI IRQ0 both over 8254
and IO-APIC setup everywhere
- Except on ATI disable IRQ0 in the 8254
- Remove the code to select apicmaintimer on ATI chipsets
- Add some boot options to allow to override this (just paranoia)
In 2.6.17 I hope to switch the default over to this for everybody.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SMP time selection originally ran after all CPUs were brought up because
it needed to know the number of CPUs to decide if it needs an MP safe
timer or not.
This is not needed anymore because we know present CPUs early.
This fixes a couple of problems:
- apicmaintimer didn't always work because it relied on state that was
set up time_init_gtod too late.
- The output for the used timer in early kernel log was misleading
because time_init_gtod could actually change it later. Now always
print the final timer choice
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It didn't set up the CPU possible map early enough, so the
option didn't actually work.
Noticed by Heiko Carstens
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[description from AK]
Old check for the IO-APIC watchdog during the timer check was wrong -
it obviously should only drop into this if the IO-APIC watchdog is used.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes x86-64 use the common X86_PM_TIMER Kconfig entry in drivers/acpi
And since PM timer is needed for correct timing on a lot of systems
now (e.g. AMD dual cores) and we often get bug reports from people
who forgot to set it make it depend on CONFIG_EMBEDDED. x86-64 had
this change before and it's a good thing.
I also fixed the description slightly to make this more clear.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[description from AK]
This fixes booting in APIC mode on some ACER laptops. x86-64
did a similar change some time ago.
See http://bugzilla.kernel.org/show_bug.cgi?id=4700 for details
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Big Unisys systems have multiple clusters too, but they have an
synchronized TSC.
I'm using the SMBIOS to check for vendor == IBM.
Cc: Chris McDermott <lcm@us.ibm.com>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In previous versions of pci-gart.c, no_iommu was used to determine if IOMMU was
disabled in the GART DMA mapping functions. This changed in 2.6.16 and now
gart_xxx() functions are only called if gart is enabled. Therefore, uses of
no_iommu in the GART code are no longer necessary and can be removed.
Also, it removes double deceleration of no_iommu and force_iommu in pci.h and
proto.h, by removing the deceleration in pci.h.
Lastly, end_pfn off by one error.
Tested (along with patch 1/2) on dual opteron with gart enabled, iommu=soft,
and iommu=off.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 9c869edac5 moved the i386 topology.c
file. That change broke x86-64 compiles, as it uses the same file.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mem= command line option was being ignored in arch/powerpc if we were not
a CONFIG_MULTIPLATFORM (which is handled via prom_init stub). The initial
command line extraction and parsing needed to be moved earlier in the boot
process and have code to actual parse mem= and do something about it.
Also, fixed a compile warning in the file.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When compiling a non-default subarch, topology.c is missing from the kernel
build. This causes builds with CONFIG_HOTPLUG_CPU to fail. In addition,
on Intel processors with cpuid level > 4, it causes intel_cacheinfo.c to
reference uninitialized data that should have been set up by the initcall
in topology.c which calls register_cpu. This causes a kernel panic on boot
on newer Intel processors. Moving topology.c to arch/i386/kernel fixes
both of these problems.
Thanks to Dan Hecht for finding and fixing this problem.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Dan Hecht <dhect@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm currently at the POSIX meeting and one thing covered was the
incompatibility of Linux's link() with the POSIX definition. The name.
Linux does not follow symlinks, POSIX requires it does.
Even if somebody thinks this is a good default behavior we cannot change this
because it would break the ABI. But the fact remains that some application
might want this behavior.
We have one chance to help implementing this without breaking the behavior.
For this we could use the new linkat interface which would need a new
flags parameter. If the new parameter is AT_SYMLINK_FOLLOW the new
behavior could be invoked.
I do not want to introduce such a patch now. But we could add the
parameter now, just don't use it. The patch below would do this. Can we
get this late patch applied before the release more or less fixes the
syscall API?
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent GDT changes broke the SMP boot sequence if the booting CPU is
numbered anything other than zero. There's also a subtle source of error
in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
for booting CPUs in head.S). This patch fixes both problems by making GDT
descriptors themselves allocated from a per_cpu area and switching to them
in cpu_init(), which now means that cpu_gdt_table is exclusively used for
booting CPUs again.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Right at the moment (thanks to a patch from Andrew), cpu_possible_map on
voyager is CPU_MASK_NONE, which means the machine always thinks it has no
CPUs. Fix that by doing an early initialisation of the cpu_possible_map
from the cpu_phys_present_map.
(akpm: we aim to please)
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It looks like I can't get away without exporting topology functions from
voyager any longer, so add them to the voyager subarchitecture.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a problem seen on i686 machine with NX support where the instruction
could not be single stepped because of NX bit set on the memory pages
allocated by kprobes module. This patch provides allocation of instruction
solt so that the processor can execute the instruction from that location
similar to x86_64 architecture. Thanks to Bibo and Masami for testing this
patch.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve (especially for coherence) some prototypes, and return code of
init_cow_file in error case - for a short write return -EINVAL, otherwise
return the error we got!
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do precise error handling: print precise error messages, distinguishing short
reads and read errors. This functions fails frequently enough for me so I
bothered doing this fix.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix an fd leak and a return of -1 instead of -errno in the error path - this
showed up in intensive testing of HPPFS, the os_connect_socket user.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use __attribute_used__ instead of __attribute__ ((unused)). This will help
with GCC > 3.2.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To avoid conflicts, in kernel files errno is expanded to kernel_errno, to
distinguish it from glibc errno. In this case, the code wants to use the libc
errno but the kernel one is used; in the other usage, we return errno in place
of -errno in case of an error.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve some error messages in the COW driver, and say V3, not V2, when
talking about V3 format. Also resync with our userspace code utility a bit
more.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix and update for gcc-4.0.
- arch/m32r/kernel/signal.c:
Change type of the 8th parameter of sys_rt_sigsuspend() from
'struct pt_regs' to 'struct pt_regs *'.
This functions make use of the 'regs' parameter to return status value,
but gcc-4.0 optimizes and removes it as a dead code.
Functions, sys_sigaltstack() and sys_rt_sigreturn(), have also modified.
- arch/m32r/lib/usercopy.c, include/asm-m32r/uaccess.h:
Add early-clobber constraints('&') to output values of asm statements;
these constraints seems to be required for gcc-4.0 register assignment.
Signed-off-by: Hayato Fujiwara <fujiwara@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add -O2 option to AFLAGS to enable asm code optimization for m32r.
On m32r gas, "-m32r2 -O" option enables assembler's parallel code
generation optimization for M32R2 ISA as a default. So, "-no-parallel"
option is required explicitly for a cpu core with single instuction
issuing, for example, VDEC2.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Andrew Victor
disable_irq() lazily disables the interrupt, so the IRQ is only disabled
once the interrupt occurs again. The GPIO interrupt handler therefore
must first check disable_depth to see if the IRQ needs to be disabled.
Orignal patch by Bill Gatliff.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This implements accurate task and cpu time accounting for 64-bit
powerpc kernels. Instead of accounting a whole jiffy of time to a
task on a timer interrupt because that task happened to be running at
the time, we now account time in units of timebase ticks according to
the actual time spent by the task in user mode and kernel mode. We
also count the time spent processing hardware and software interrupts
accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If
that is not set, we do tick-based approximate accounting as before.
To get this accurate information, we read either the PURR (processor
utilization of resources register) on POWER5 machines, or the timebase
on other machines on
* each entry to the kernel from usermode
* each exit to usermode
* transitions between process context, hard irq context and soft irq
context in kernel mode
* context switches.
On POWER5 systems with shared-processor logical partitioning we also
read both the PURR and the timebase at each timer interrupt and
context switch in order to determine how much time has been taken by
the hypervisor to run other partitions ("steal" time). Unfortunately,
since we need values of the PURR on both threads at the same time to
accurately calculate the steal time, and since we can only calculate
steal time on a per-core basis, the apportioning of the steal time
between idle time (time which we ceded to the hypervisor in the idle
loop) and actual stolen time is somewhat approximate at the moment.
This is all based quite heavily on what s390 does, and it uses the
generic interfaces that were added by the s390 developers,
i.e. account_system_time(), account_user_time(), etc.
This patch doesn't add any new interfaces between the kernel and
userspace, and doesn't change the units in which time is reported to
userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
times(), etc. Internally the various task and cpu times are stored in
timebase units, but they are converted to USER_HZ units (1/100th of a
second) when reported to userspace. Some precision is therefore lost
but there should not be any accumulating error, since the internal
accumulation is at full precision.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Maple firmware does not need PCI resource allocation, and in fact, it
can cause problems in some strange cases.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Do disable, not enable, the HT APIC IRQ in the function that is
supposed to.
Enable the MPIC IRQ before enabling the downstream APIC IRQ, avoids
potentially losing an interrupt.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update defconfigs for g5, pseries and generic ppc64. Default choices
for everything, with the following exceptions:
* Enable WINDFARM_PM112 on g5 and ppc64.
* Increase CONFIG_NR_CPUS to 4 in g5_defconfig
* CONFIG_TIGON3=y instead of =m in g5_defconfig
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
HMT support is currently broken and needs to be reworked to play nicely
with the SMT scheduler. Remove the bit rotten bits for the time being.
I also updated an incorrect comment, we enter __secondary_hold with the
physical cpu id in r3.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The runlatch SPR can take a lot of time to write. My original runlatch
code would set it on every exception entry even though most of the time
this was not required. It would also continually set it in the idle
loop, which is an issue on an SMT capable processor.
Now we cache the runlatch value in a threadinfo bit, and only check for
it in decrementer and hardware interrupt exceptions as well as the idle
loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
native_hpte_clear has a spinlock recursion problem with the native_tlbie_lock
being called twice, once in native_hpte_clear() and once within tlbie().
Fix the problem by changing the call to tlbie() in native_hpte_clear() to
__tlbie(). It still supports only 4k pages for now.
Signed-off-by: R Sharada <sharada@in.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Disable OProfile in Kconfig for iSeries to prevent hangs. OProfile
was not originally intended to work with legacy iSeries.
Signed-off-by: Kelly Daly <kelly@au.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
altivec_unavailable_exception is called without setting r3... it looks like
the r3 that actually gets passed in as struct pt_regs *regs is the
undisturbed value of r3 at the time the altivec instruction was encountered.
The user actually gets to choose the pt_regs printed in the Oops!
This fixes the oops by passing the correct pt_regs pointer to
altivec_unavailable_exception.
Signed-off-by: Alan Curry <pacman@TheWorld.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The panic CPU is waiting forever due to some large timeout value if some
CPU is not responding to an IPI.
This patch fixes the problem - the maximum waiting period will be
10 seconds and then the kdump boot will go ahead.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix up xmon compilation after the last change.
Remove lots of dead code, all the pmac and chrp support is in arch/powerpc
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For kexec we need to know the size of the MMU hash table.
Currently we calculate the size once in the htab code, and then twice more in
the kexec code, once using htab_hash_mask and once using ppc64_pft_size.
On some machines the ppc64_pft_size calculation is broken because
ppc64_pft_size is not set.
So we need to fix the second calculation, but better still we should just
calculate the size once and use it everywhere else.
Tested on Power5 LPAR, Power4 non-LPAR and Power3.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When I changed the hvlpevent_queue code to use a spinlock instead of a
custom atomic (719d1cd867) I didn't
initialise the lock anywhere, oops.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Patch from Mårten Wikström
This patch fixes a bug in ixp4xx_set_irq_type() which leads to
GPIO being incorrectly set to both edge triggered for raising
as well as falling edge interrupt types. See the previous
discussion on patch 3312/1.
Signed-off-by: Mårten Wikström <marten.wikstrom@passito.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Andrew Victor
This patch adds the at91_set_multi_drive() function to enable/disable
the multi-drive (open collector) pin capability on the AT91RM9200
processor.
This is necessary to fix the UDC (USB Gadget) driver for the AT91RM9200
board as it will not allow the board reset line to be pulled low if the
pullup is not driven as an open collector output as the boards are wired
to the USB connector on both the DK/EK.
This version of the patch updates it to 2.6.16-rc4.
Orignal patch by Jeff Warren.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Uli Luckas
This is a bugfix.
The comment in arch/arm/common/rtctime.c explains it:
* FIXME: for now, we just copy the alarm time because we're lazy (and
* is therefore buggy - setting a 10am alarm at 8pm will not result in
* the alarm triggering.)
This patch adds one day to the alarm iff the alarm wrapped beyond midnight and therefore appears to be in the past.
Signed-off-by: Uli Luckas <u.luckas@road-gmbh.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Alessandro Zummo
This patch adds support for the beeper
embedded in the NSLU2 to the machine setup code.
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Alessandro Zummo
The power button exit routine for the Linksys NSLU2 was not protected by
a machine_is_nslu2(). This patch fixes it.
Signed-off-by: Rod Whitby <rod@whitby.id.au>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
The initial code did not configure the inbound memory windows for direct
master access to the SDRAM. This patch creates a 1:1 mapping between the
Versatile/PB PCI memory windows and its SDRAM. Note that an updated FPGA
image is needed for Versatile/PB since the original windows were 1MB and
not able to cover the whole SDRAM (now extended to 256MB). The patch also
fixes the PCI IRQ mapping for slot #2.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Although you could ask the kernel for panic-on-oops, it remained
non-functional because the architecture specific code fragment had
not been implemented. Add it, so it works as advertised.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add PCI support for setting PCI from flat device tree on 85xx specifically for
MPC8540 ADS.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
According to the 'MPC8349E MDS Processor Board User Manual Rev. 1.6'
the size of the BCSR mapping is 32kb.
Signed-off-by: Horst Kronstorfer <hkronsto@frequentis.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The last argument of div_long_long_rem() must be long.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
A recent patch introduced cpu topology in sysfs. When you run a kernel
with SMP and sysfs enabled, you now get an Oops on boot. The following
patch fixes that by adding topology_init to arch/mips/kernel/smp.c. The
code is copied from arch/s390/kernel/smp.c.
Signed-off-by: Rojhalat Ibrahim <imr@rtschenk.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Fix the following compiler warnings:
CC arch/mips/sibyte/bcm1480/irq.o
arch/mips/sibyte/bcm1480/irq.c: In function ‘bcm1480_set_affinity’:
arch/mips/sibyte/bcm1480/irq.c:168: warning: ISO C90 forbids mixed declarations and code
arch/mips/sibyte/bcm1480/irq.c: In function ‘ack_bcm1480_irq’:
arch/mips/sibyte/bcm1480/irq.c:230: warning: ISO C90 forbids mixed declarations and code
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Originally found through an oops in the Gentoo N32 userland build; patch
based on original patch by Daniel Jacobwitz.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
do_signal has been changed to return void since the "return value is
ignored everywhere". Convert do_signal32 accordingly.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>