Users that use kernel log filtering (e.g. via syslogd or a proprietry method)
wouldn't like to see warning prints that are not really warnings.
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_vm_area always returns an area with an adjacent guard page. That guard
page is included in vm_struct.size. iounmap uses vm_struct.size to
determine how much address space needs to have change_page_attr applied to
it, which will BUG if applied to the guard page.
This patch adds a helper function - get_vm_area_size() in linux/vmalloc.h -
to return the actual size of a vm area, and uses it to make iounmap do the
right thing. There are probably other places which should be using
get_vm_area_size().
Thanks to Dave Young <hidave.darkstar@gmail.com> for debugging the
problem.
[ Andi, it wasn't clear to me whether x86_64 needs the same fix. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
setup_pit_timer is declared in asm-i386/timer.h. Move it to the pit header
file, so it can be used by x86_64 as well.
Move also the PIT constants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I fixed this in x86_64. Looks like the kind of thing that will break voyager
on i386.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the volatile in apic. We have a cpu_relax() in the wait loop. Fix a
coding style issue while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When NUMA emulation succeeds, acpi_numa needs to be set to -1 so that
srat_disabled() will always return true. We won't be calling
acpi_scan_nodes() or registering the true nodes we've found.
[hugh@veritas.com: Fix x86_64 CONFIG_NUMA_EMU build: acpi_numa needs CONFIG_ACPI_NUMA]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e820_hole_size() now uses the newly extracted helper function,
e820_find_active_region(), to determine the size of usable RAM in a range of
PFN's.
This was previously broken because of two reasons:
- The start and end PFN's of each e820 entry were not properly rounded
prior to excluding those entries in the range, and
- Entries smaller than a page were not properly excluded from being
accumulated.
This resulted in emulated nodes being incorrectly mapped to ranges that
were completely reserved and not candidates for being registered as
active ranges.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G
RAM installed. when using kexec to load second kernel. In the second
kernel, when mem is allocated for GART, it will do the memset for clear, it
will cause restart, because some device still used that for dma. solution
will be:
in second kernel: disable that at first before we try to allocate mem for
it. or in the first kernel: do disable that before shutdown.
Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
Andi also point out need to consider to AGP enable but mem less 4G case
too.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function is called via dma_ops->.., so change it to static
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the pcspkr private PIT lock by the global PIT lock to serialize the
PIT access all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During a VM oom condition, kill all threads in the process group.
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory condition.
Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or otherwise handled, and makes it very obvious that
something has gone wrong.
This change allows the entire process group to be taken down, rather than just
the one thread.
Signed-off-by: Will <will_schmidt@vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some interrupt entry points are currently defined in i8259.c They probably
belong in a header. Right now, their only user is init_IRQ, justifying
their declaration in-file. But when virtualization comes in, we may be
interested in using that functions in late initializations.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are seeing corruption of the decompressed kernel. It is suspected that
this is platform specific as it has yet to be seen on any other x86. Move
the kernel to the 16MB boundary.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove unneeded test of task != NULL from
arch/i386/kernel/traps.c::dump_trace()
At the start of the function we have this test:
if (!task)
task = current;
so further down there's no need to test 'task'.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAE is useful for more than supporting more than 4GB RAM. It supports
expanded swapspace and NX executable protections. Some users may want NX
or expanded swapspace support without the overhead or instability of
highmem. For these reasons, the following patch divorces CONFIG_X86_PAE
from CONFIG_HIGHMEM64G.
Cc: Mark Lord <lkml@rtr.ca>
Signed-off-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some systems have a HPET which is not incrementing, which leads to a
complete hang. Detect it during HPET setup.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following patch enables reboot through BIOS on the Dell Optiplex 745
Small Form Factor base, on which reboot hangs. The larger form factor does
not require this, hence the match on DMI_BOARD_NAME.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some systems the ACPI NVS area is located in the first 1 MB of RAM and
it is overwritten by the i386 code during the restore after hibernation.
This confuses the ACPI platform firmware that doesn't update the AC adapter
status appropriately as a result
(http://bugzilla.kernel.org/show_bug.cgi?id=7995).
The solution is to register the reserved memory in the first 1 MB as
'nosave', so that swsusp doesn't touch it during the restore. Also, this
has been done on x86_64 for a long time now, so this patch makes the i386
restore code behave like the x86_64 one.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following warning:
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3818): Section mismatch: reference to .exit.text:cache_remove_dev (between 'cacheinfo_cpu_callback' and 'cache_sysfs_init')
It points out that a function marked __cpuexit is calling a function marked
__cpuinit => oops.
The call happens only in an error-condition which may explain why we have
not seen it before.
The offending function was not used anywhere else - so marked it __cpuexit.
Note: This warning triggers only with a local copy of modpost
but that version will soon be pushed out.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pgd_{c,d}tor() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the bitmap changes we can get rid of the unlocked versions of
calgary_unmap_sg and iommu_free. Fold __calgary_unmap_sg and
__iommu_free into their calgary_unmap_sg and iommu_free, respectively.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
there function are called via dma_ops->.., so change them to static
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the IOMMU table's lock protects both the bitmap and access
to the hardware's TCE table. Access to the TCE table is synchronized
through the bitmap; therefore, only hold the lock while modifying the
bitmap. This gives a yummy 10-15% reduction in CPU utilization for
netperf on a large SMP machine.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No actual code was harmed in the production of this patch.
Thanks to Andrew Morton <akpm@linux-foundation.org> for telling me
about checkpatch.pl.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup unneeded macros used for register space address calculation.
Now we are using the EBDA to find the space address.
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This works around a bug where DMAs that have the same addresses as
some MEM regions do not go through. Not clear yet if this is due to a
mis-configuration or something deeper.
[akpm@linux-foundation.org: coding style fixlet]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide seperate versions for Calgary and CalIOC2
Also print out the PCIe Root Complex Status on CalIOC2 errors
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CalIOC2 is a PCI-e implementation of the Calgary logic. Most of the
programming details are the same, but some differ, e.g., TCE cache
flush. This patch introduces CalIOC2 support - detection and various
support routines. It's not expected to work yet (but will with
follow-on patches).
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... in preparation for doing it differently for CalIOC2.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calgary and CalIOC2 share most of the same logic. Introduce struct
cal_chipset_ops for quirks and tce flush logic which are
[akpm@linux-foundation.org: make calgary_chip_ops static]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... will be used by CalIOC2 later
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Rise CPUs were only very short-lived, and there are no reports of
anyone both owning one and running Linux on it.
Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg
available online.
If it turns out that against all expectations there are actually users
reverting this patch would be easy.
This patch will make the kernel images smaller by a few bytes for all
i386 users.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On x86_64 kernel, level triggered irq migration gets initiated in the
context of that interrupt(after executing the irq handler) and following
steps are followed to do the irq migration.
1. mask IOAPIC RTE entry; // write to IOAPIC RTE
2. EOI; // processor EOI write
3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and
// and interrupt vector due to per cpu vector
// allocation.
4. unmask IOAPIC RTE entry; // write to IOAPIC RTE
Because of the per cpu vector allocation in x86_64 kernels, when the irq
migrates to a different cpu, new vector(corresponding to the new cpu) will
get allocated.
An EOI write to local APIC has a side effect of generating an EOI write for
level trigger interrupts (normally this is a broadcast to all IOAPICs).
The EOI broadcast generated as a side effect of EOI write to processor may
be delayed while the other IOAPIC writes (step 3 and 4) can go through.
Normally, the EOI generated by local APIC for level trigger interrupt
contains vector number. The IOAPIC will take this vector number and search
the IOAPIC RTE entries for an entry with matching vector number and clear
the remote IRR bit (indicate EOI). However, if the vector number is
changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI
is received later. This will cause the remote IRR to get stuck causing the
interrupt hang (no more interrupt from this RTE).
Current x86_64 kernel assumes that remote IRR bit is cleared by the time
IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR
bit and if it still set, delay the irq migration to the next interrupt
arrival event(hopefully, next time remote IRR bit will get cleared before
the IOAPIC RTE is reprogrammed).
Initial analysis and patch from Nanhai.
Clean up patch from Suresh.
Rewritten to be less intrusive, and to contain a big fat comment by Eric.
[akpm@linux-foundation.org: fix comments]
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nanhai Zou <nanhai.zou@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Keith Packard <keith.packard@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This helps to reduce the frequency at which the CPU must be taken out of a
lower-power state.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Tim Hockin <thockin@hockin.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (as921) adds code to the show_regs() routine in i386 and x86_64
to print the contents of the debug registers along with all the others.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following section mismatch warnings were reported by Andrey Borzenkov:
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:amd_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967a) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:cyrix_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967f) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:centaur_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x9684) and 'mtrr_attrib_to_str'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa735) and 'generic_get_mtrr'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa749) and 'generic_get_mtrr'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa770) and 'generic_get_mtrr'
It was tracked down to a few functions missing __init tag.
Compile tested only.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Background:
The MCE handler has several paths that it can take, depending on various
conditions of the MCE status and the value of the 'tolerant' knob. The
exact semantics are not well defined and the code is a bit twisty.
Description:
This patch makes the MCE handler's behavior more clear by documenting the
behavior for various 'tolerant' levels. It also fixes or enhances
several small things in the handler. Specifically:
* If RIPV is set it is not safe to restart, so set the 'no way out'
flag rather than the 'kill it' flag.
* Don't panic() on correctable MCEs.
* If the _OVER bit is set *and* the _UC bit is set (meaning possibly
dropped uncorrected errors), set the 'no way out' flag.
* Use EIPV for testing whether an app can be killed (SIGBUS) rather
than RIPV. According to docs, EIPV indicates that the error is
related to the IP, while RIPV simply means the IP is valid to
restart from.
* Don't clear the MCi_STATUS registers until after the panic() path.
This leaves the status bits set after the panic() so clever BIOSes
can find them (and dumb BIOSes can do nothing).
This patch also calls nonseekable_open() in mce_open (as suggested by akpm).
Result:
Tolerant levels behave almost identically to how they always have, but
not it's well defined. There's a slightly higher chance of panic()ing
when multiple errors happen (a good thing, IMHO). If you take an MBE and
panic(), the error status bits are not cleared.
Alternatives:
None.
Testing:
I used software to inject correctable and uncorrectable errors. With
tolerant = 3, the system usually survives. With tolerant = 2, the system
usually panic()s (PCC) but not always. With tolerant = 1, the system
always panic()s. When the system panic()s, the BIOS is able to detect
that the cause of death was an MC4. I was not able to reproduce the
case of a non-PCC error in userspace, with EIPV, with (tolerant < 3).
That will be rare at best.
Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Background:
/dev/mcelog is typically polled manually. This is less than optimal for
situations where accurate accounting of MCEs is important. Calling
poll() on /dev/mcelog does not work.
Description:
This patch adds support for poll() to /dev/mcelog. This results in
immediate wakeup of user apps whenever the poller finds MCEs. Because
the exception handler can not take any locks, it can not call the wakeup
itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
caught at the next return from interrupt or exit from idle, calling the
mce_user_notify() routine. This patch also disables the "fake panic"
path of the mce_panic(), because it results in printk()s in the exception
handler and crashy systems.
This patch also does some small cleanup for essentially unused variables,
and moves the user notification into the body of the poller, so it is
only called once per poll, rather than once per CPU.
Result:
Applications can now poll() on /dev/mcelog. When an error is logged
(whether through the poller or through an exception) the applications are
woken up promptly. This should not affect any previous behaviors. If no
MCEs are being logged, there is no overhead.
Alternatives:
I considered simply supporting poll() through the poller and not using
TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error
happening and the user application being notified is *the*most* critical
window for us. Many uncorrectable errors can be logged to the network if
given a chance.
I also considered doing the MCE poll directly from the idle notifier, but
decided that was overkill.
Testing:
I used an error-injecting DIMM to create lots of correctable DRAM errors
and verified that my user app is woken up in sync with the polling interval.
I also used the northbridge to inject uncorrectable ECC errors, and
verified (printk() to the rescue) that the notify routine is called and the
user app does wake up. I built with PREEMPT on and off, and verified
that my machine survives MCEs.
[wli@holomorphy.com: build fix]
Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Background:
/dev/mcelog is a clear-on-read interface. It is currently possible for
multiple users to open and read() the device. Users are protected from
each other during any one read, but not across reads.
Description:
This patch adds support for O_EXCL to /dev/mcelog. If a user opens the
device with O_EXCL, no other user may open the device (EBUSY). Likewise,
any user that tries to open the device with O_EXCL while another user has
the device will fail (EBUSY).
Result:
Applications can get exclusive access to /dev/mcelog. Applications that
do not care will be unchanged.
Alternatives:
A simpler choice would be to only allow one open() at all, regardless of
O_EXCL.
Testing:
I wrote an application that opens /dev/mcelog with O_EXCL and observed
that any other app that tried to open /dev/mcelog would fail until the
exclusive app had closed the device.
Caveats:
None.
Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Insert the unclaimed MMCONFIG resources into the resource tree without the
IORESOURCE_BUSY flag during late initialization. This allows the MMCONFIG
regions to be visible in the iomem resource tree without interfering with
other system resources that were discovered during PCI initialization.
[akpm@linux-foundation.org: nanofixes]
Signed-off-by: Aaron Durbin <adurbin@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are in the emulated NUMA case, we need to make sure that all existing
apicid_to_node mappings that point to real node ID's now point to the
equivalent fake node ID's.
If we simply iterate over all apicid_to_node[] members for each node, we risk
remapping an entry if it shares a node ID with a real node. Since apicid's
may not be consecutive, we're forced to create an automatic array of
apicid_to_node mappings and then copy it over once we have finished remapping
fake to real nodes.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.
When NUMA emulation has successfully setup fake nodes on the system, a new
function, acpi_fake_nodes() is called. This function determines the proximity
domain (_PXM) for each true node found on the system. It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address. The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.
If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM. Otherwise, we use REMOTE_DISTANCE.
PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The logic in e820_find_active_regions() for determining the true active
regions for an e820 entry given a range of PFN's is needed for
e820_hole_size() as well.
e820_hole_size() is called from the NUMA emulation code to determine the
reserved area within an address range on a per-node basis. Its logic should
duplicate that of finding active regions in an e820 entry because these are
the only true ranges we may register anyway.
[akpm@linux-foundation.org: cleanup]
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds caching of pgds and puds, pmds, pte. That way we can avoid costly
zeroing and initialization of special mappings in the pgd.
A second quicklist is useful to separate out PGD handling. We can carry the
initialized pgds over to the next process needing them.
Also clean up the pgd_list handling to use regular list macros. There is no
need anymore to avoid the lru field.
Move the add/removal of the pgds to the pgdlist into the constructor /
destructor. That way the implementation is congruent with i386.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the prototypes for its
global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Constrain __supported_pte_mask and NX handling to just the PAE kernel.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hence remove its handling in the opposite case.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. and adjust documentation to properly reflect options that are
x86-64 specific.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Consolidate the three 32-bit system call entry points so that they all
treat registers in similar ways.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove unused code and variables and do some codingstyle / whitespace
cleanups while at it.
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xtime can be initialized including the cmos update from the generic
timekeeping code. Remove the arch specific implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the generic cmos update function in kernel/time/ntp.c
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current SMI detection logic in read_hpet_tsc() makes sure,
that when a SMI happens between the read of the HPET counter and
the read of the TSC, this wrong value is used for TSC calibration.
This is not the intention of the function. The comparison must ensure,
that we do _NOT_ use such a value.
Fix the check to use calibration values where delta of the two TSC reads
is smaller than a reasonable threshold.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Intel PerfMon NMI watchdog reserves the first performance counter,
but uses the second one. Make it correctly reserve the second one.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not fully softirq safe anyways.
Can't do a WARN_ON unfortunately because it could trigger in the
panic case.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With that an L3 cache is correctly reported in the cache information in /sys
With fixes from Andreas Herrmann and Dean Gaudet and Joachim Deguara
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This implements new vDSO for x86-64. The concept is similar
to the existing vDSOs on i386 and PPC. x86-64 has had static
vsyscalls before, but these are not flexible enough anymore.
A vDSO is a ELF shared library supplied by the kernel that is mapped into
user address space. The vDSO mapping is randomized for each process
for security reasons.
Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.
The vdso can be disabled with vdso=0
It currently includes a new gettimeofday implemention and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall. clock_gettime is significantly faster
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
The new calls are generally faster than the old vsyscall.
Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range previously causes
overhead e.g. for Xen because it has to create special page tables for it)
Weak points:
- glibc support still to be written
The VM interface is partly based on Ingo Molnar's i386 version.
Includes compile fix from Joachim Deguara
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler generally generates reasonable inline code for the simple
cases and for the rest it's better for code size for them to be out of line.
Also there they can be potentially optimized more in the future.
In fact they probably should be in a .S file because they're all pure
assembly, but that's for another day.
Also some code style cleanup on them while I was on it (this seems
to be the last untouched really early Linux code)
This saves ~12k text for a defconfig kernel with gcc 4.1.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In acpi_scan_nodes(), we immediately return -1 if acpi_numa <= 0, meaning
we haven't detected any underlying ACPI topology or we have explicitly
disabled its use from the command-line with numa=noacpi.
acpi_table_print_srat_entry() and acpi_table_parse_srat() are only
referenced within drivers/acpi/numa.c, so we can mark them as static and
remove their prototypes from the header file.
Likewise, pxm_to_node_map[] and node_to_pxm_map[] are only used within
drivers/acpi/numa.c, so we mark them as static and remove their externs
from the header file.
The automatic 'result' variable is unused in acpi_numa_init(), so it's
removed.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use LOCAL_DISTANCE and REMOTE_DISTANCE in x86_64 ACPI code
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linux 64bit only uses the IO-APIC ID as an internal cookie. In the future
there could be some cases where the IO-APIC IDs are not unique because
they share an 8 bit space with CPUs and if there are enough CPUs
it is difficult to get them that. But Linux needs the io apic ID
internally for its data structures. Assign unique IO APIC ids on
table parsing.
TBD do for 32bit too
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a bug introduced with the CLFLUSH changes: we must always flush pages
changed in cpa(), not just when they are reverted.
Reenable CLFLUSH usage with that now (it was temporarily disabled
for .22)
Add some BUG_ONs
Contains fixes from Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reading from the signal{1,2} files requires a spu_acquire_saved, so make these
files write-only for contexts created with SPU_CREATE_NOSCHED.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PS3 bootwrapper files use instructions only available on 64-bit CPUs.
Add the code generation directive '.machine "ppc64"' for toolchains
configured for 32-bit CPUs.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a FLASH ROM Storage Driver for the PS3:
- Implemented as a misc character device driver
- Uses a fixed 256 KiB buffer allocated from boot memory as the hypervisor
requires the writing of aligned 256 KiB blocks
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a BD/DVD/CD-ROM Storage Driver for the PS3:
- Implemented as a SCSI device driver
- Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
doesn't support scatter-gather
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a Disk Storage Driver for the PS3:
- Implemented as a block device driver with a dynamic major
- Disk names (and partitions) are of the format ps3d%c(%u)
- Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
doesn't support scatter-gather
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
allnoconfig results in this:
CC arch/powerpc/mm/tlb_32.o
In file included from include/asm/tlb.h:60,
from arch/powerpc/mm/tlb_32.c:30:
include/asm-generic/tlb.h: In function 'tlb_flush_mmu':
include/asm-generic/tlb.h:76: error: implicit declaration of function 'release_pages'
include/asm-generic/tlb.h: In function 'tlb_remove_page':
include/asm-generic/tlb.h:105: error: implicit declaration of function 'page_cache_release'
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: arch/i386/kernel/built-in.o(.text+0xb6a7): Section mismatch: reference to .init.text:find_num_cache_leaves (between 'init_intel_cacheinfo' and 'cache_shared_cpu_map_setup')
It could be __init_refok, but gcc >= 4.0 anyway inlines it into the
__cpuinit init_intel_cacheinfo(), and IMHO it's too small for "noinline
__init".
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch is necessary on one of my boxen, where programming the stop
sequence twice leads to PIT malfunction.
Sigh !
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i386 and sparc64 have the identical code to update the cmos clock. Move it
into kernel/time/ntp.c as there are other architectures coming along with the
same requirements.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add some more debug information to the hrtimer and clock events code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to make sure, that the clockevent devices are resumed, before
the tick is resumed. The current resume logic does not guarantee this.
Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
event devices before resuming the tick / oneshot functionality.
Fixup the existing users.
Thanks to Nigel Cunningham for tracking down a long standing thinko,
which affected the jinxed VAIO.
[akpm@linux-foundation.org: xen build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following warning:
WARNING: vmlinux.o(.text+0x35264): Section mismatch: reference to .init.text:__alloc_bootmem (between 'mdesc_bootmem_alloc' and 'mdesc_bootmem_free')
Rename mdesc_mem_ops to *_ops so modpost ignores __init references
and declare mdesc_bootmem_alloc __init since it is only used
during __init context.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
WARNING: vmlinux.o(.text+0x3cf50): Section mismatch: reference to .init.text:page_in_phys_avail (between 'pci_sun4v_pbm_init' and 'sun4v_pci_init')
pci_sun4v_pbm_init and sun4v_pci_init was only used under __init
context so declare them _init.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing sparc64 mini_rtc driver can handle CMOS based
rtcs trivially with just a few lines of code and the simplifies
things tremendously.
Tested on SB1500.
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_handle and dev_ino fields don't match up exactly to
the traditional IMAP_IGN and IMAP_INO masks.
So store them away in a table and look them up directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
When booting up a control node it's quite common to
not be able to register several service types.
And likewise on guests at least one or two are going
to not be there.
Signed-off-by: David S. Miller <davem@davemloft.net>
The best scheme to get uniqueness seems to be:
FOO -- If node lacks "id" property
FOO-$(ID) -- If node has "id" but parent lacks "cfg-handle"
FOO-$(ID)-$(CFG_HANDLE) -- If node has both
Signed-off-by: David S. Miller <davem@davemloft.net>
The current scheme works on static interpretation of text names, which
is wrong.
The output-device setting, for example, must be resolved via an alias
or similar to a full path name to the console device.
Paths also contain an optional set of 'options', which starts with a
colon at the end of the path. The option area is used to specify
which of two serial ports ('a' or 'b') the path refers to when a
device node drives multiple ports. 'a' is assumed if the option
specification is missing.
This was caught by the UltraSPARC-T1 simulator. The 'output-device'
property was set to 'ttya' and we didn't pick upon the fact that this
is an OBP alias set to '/virtual-devices/console'. Instead we saw it
as the first serial console device, instead of the hypervisor console.
The infrastructure is now there to take advantage of this to resolve
the console correctly even in multi-head situations in fbcon too.
Thanks to Greg Onufer for the bug report.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.23' of master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6: (37 commits)
[CELL] spufs: rework list management and associated locking
[CELL] oprofile: add support to OProfile for profiling CELL BE SPUs
[CELL] oprofile: enable SPU switch notification to detect currently active SPU tasks
[CELL] spu_base: locking cleanup
[CELL] cell: indexing of SPUs based on firmware vicinity properties
[CELL] spufs: integration of SPE affinity with the scheduller
[CELL] cell: add placement computation for scheduling of affinity contexts
[CELL] spufs: extension of spu_create to support affinity definition
[CELL] cell: add hardcoded spu vicinity information for QS20
[CELL] cell: add vicinity information on spus
[CELL] cell: add per BE structure with info about its SPUs
[CELL] spufs: use find_first_bit() instead of sched_find_first_bit()
[CELL] spufs: remove unused file argument from spufs_run_spu()
[CELL] spufs: change decrementer restore timing
[CELL] spufs: dont halt decrementer at restore step 47
[CELL] spufs: limit saving MFC_CNTL bits
[CELL] spufs: fix read and write for decr_status file
[CELL] spufs: fix decr_status meanings
[CELL] spufs: remove needless context save/restore code
[CELL] spufs: fix array size of channel index
...
Currently, Linux doesn't generate correct page tables for ARMv6 and
later cores if the cache policy is different from the default one (it
may lead to strongly ordered or shared device mappings). This patch
disallows cache policies other than writeback and the
CPU_[ID]CACHE_DISABLE options only affect the CP15 system control
register rather than the page tables.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds the necessary ifdef's to the proc-v7.S code and
defines the v7wbi_tlb_fns macro in pgtable-nommu.h
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The auxiliary control and the L2 auxiliary control registers are
Cortex-A8 specific. They need to be removed from the generic ARMv7
support code.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These patches add full SSP/MCU support for the HP Jornada 720
machine. Its needed to handle keyboard, touchscreen, battery
and backlight/lcd.
The main driver exports functions and the header file exports
the command values. When talking to the MCU the general procedure
is to start MCU, send command (using ssp_inout(command)), the
proper reply is always TXDUMMY. After receiving TXDUMMY you can
send the value you wish pushed (for example brightness level).
End with ssp_end() so the spinlock gets unlocked.
Drivers using this havent been implemented yet, but will shortly.
Signed-off-by: Kristoffer Ericson <Kristoffer.ericson@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With this patch, Kconfig only selects CPU_HAS_ASID for the MMU
case. It also corrects the typo in the v6wbi_tlb_fns definition in
pgtable-nommu.h.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The __cpu_{clear|copy}_user_page functions are not defined for the
MMU-less case and therefore should not be exported.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If not MMU and not v6K, access to the TLS register has to be
emulated. MMU-less systems do not provide a high page for kuser
helpers.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The background operations of the L2x0 cache controllers are aborted if
another operation is issued on the same or different core. This patch
protects the maintenance operation issuing/polling with a spinlock.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The 'i' and 'zi' targets short-circuit the dependencies for
'install' and 'zinstall' targets; these are useful for
installing the kernel on platforms which have make but no
compiler installed.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is to avoid a compiler warning for overriding the built-in "putc"
function.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This sorts out the various lists and related locks in the spu code.
In detail:
- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free function are gone as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory
was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code.
Exports spu_set_profile_private_kref and spu_get_profile_private_kref which
are used by OProfile to store private profile information in spufs data
structures.
Also incorporated several fixes from other patches (rrn). Check pointer
returned from kzalloc. Eliminated unnecessary cast. Better error
handling and cleanup in the related area. 64-bit unsigned long parameter
was being demoted to 32-bit unsigned int and eventually promoted back to
unsigned long.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch adds to the capability of spu_switch_event_register so that
the caller is also notified of currently active SPU tasks.
Exports spu_switch_event_register and spu_switch_event_unregister so
that OProfile can get access to the notifications provided.
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Sort out the locking mess in spu_base and document the current rules.
As an added benefit spu_alloc* and spu_free don't block anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch links spus according to their physical position using
information provided by the firmware through a special vicinity
device-tree property. This property is present in current version
of Malta firmware.
Example of vicinity properties for a node in Malta:
Node: Vicinity property contains phandles of:
spe@0 [ spe@100000 , mic-tm@50a000 ]
spe@100000 [ spe@0 , spe@200000 ]
spe@200000 [ spe@100000 , spe@300000 ]
spe@300000 [ spe@200000 , bif0@512000 ]
spe@80000 [ spe@180000 , mic-tm@50a000 ]
spe@180000 [ spe@80000 , spe@280000 ]
spe@280000 [ spe@180000 , spe@380000 ]
spe@380000 [ spe@280000 , bif0@512000 ]
Only spe@* have a vicinity property (e.g., bif0@512000 and
mic-tm@50a000 do not have it).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch makes the scheduller honor affinity information for each
context being scheduled. If the context has no affinity information,
behaviour is unchanged. If there are affinity information, context is
schedulled to be run on the exact spu recommended by the affinity
placement algorithm.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch provides the spu affinity placement logic for the spufs scheduler.
Each time a gang is going to be scheduled, the placement of a reference
context is defined. The placement of all other contexts with affinity from
the gang is defined based on this reference context location and on a
precomputed displacement offset.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for additional flags at spu_create, which relate
to the establishment of affinity between contexts and contexts to memory.
A fourth, optional, parameter is supported. This parameter represent
a affinity neighbor of the context being created, and is used when defining
SPU-SPU affinity.
Affinity is represented as a doubly linked list of spu_contexts.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch allows the use of spu affinity on QS20, whose
original FW does not provide affinity information.
This is done through two hardcoded arrays, and by reading the reg
property from each spu.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds affinity data to each spu instance.
A doubly linked list is created, meant to connect the spus
in the physical order they are placed in the BE. SPUs
near to memory should be marked as having memory affinity.
Adjustments of the fields acording to FW properties is done
in separate patches, one for CPBW, one for Malta (patch for
Malta under testing).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namelly:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non scheduleable) spus.
SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of scheduleable spus (n_spus - non_sched_spus)
However having this more general structure can be useful for other
functionalities, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spu_sched->bitmap has MAX_PRIO(=140) width in bits.However, since
ff80a77f20, sched_find_first_bit()
only supports 100-bit bitmaps.
Thus, spu_sched->bitmap should be treated by generic find_first_bit().
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Sebastian Siewior <cbe-oss-dev@ml.breakpoint.cc>
The 'file' argument is unused in spufs_run_spu(). This change removes
it.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The SPU decrementer should be restored after the LSCSA DMA has
completed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
No need to halt the SPE decrementer at context restore step 47, it will
be done in step 7.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
At save step 8, the mfc control register in the CSA should be written
_only_ with Sc and Sm bits (at least MFC_CNTL[Dh] should be set to 0)
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is valid only in the sequence of context
restore. Thus, it's nonsense to read and/or write it through spufs.
This patch changes decr_status node to access MFC_CNTL[Ds] in the CSA.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is confusedly used as two meanings:
* SPU decrementer was running
* SPU decrementer was wrapped as a result of adjust
and the code to set decr_status is missing.
This patch fixes these problems by using the decr_status argument as a
set of flags. This requires a rebuild of the shipped spu_restore code.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The following steps are not needed in the SPE context save/restore
paths:
Save Step 12: save_mfc_decr()
save suspend_time to CSA (It will be done by step 14)
save ch 7 (decrementer value will be saved in LSCSA by spe-side step 10)
Restore Step 59: restore_ch_part1()
restore ch 1 (it will be done by spe-side step 15)
This change removes the unnecessary steps.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Based on a fix from Masato Noguchi <Masato.Noguchi@jp.sony.com>.
Remove the (incorrect) array size declarations in the spufs channel
arrays, and use ARRAY_SIZE rather than hardcoded values.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Currently a process is removed from the physical spu when spu_acquire_saved
is saved but never put back. This patch adds a new spu_release_saved
that is to be paired with spu_acquire_saved and put the process back if
it has been in RUNNABLE state before.
Niether Jeremy not be are entirely happy about this exact patch because
it adds another spu_activate call outside of the owner thread, but I
feel this is the best short-term fix we can come up with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch exports per-context statistics in spufs as long as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should, as we end up integrating spus and ctx statistics
accounting much more than it was on the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
In 6cbf93960e64f313f6e247cbca7afaa50e3ee2c we added a WARN_ON for
calling spu_deactivate on contexts created with the SPU_CREATE_NOSCHED
flag. However, all NOSCHED contexts will need to be deactivated when
the context is destroyed, so this gives a spurious warning when any
NOSCHED context is closed.
This change removes the WARN_ON.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Reading from the signal{1,2} files requires a spu_acquire_saved, so
make these files write-only for contexts created with
SPU_CREATE_NOSCHED.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The current SPU context saving procedure in SPUFS unexpectedly
restarts MFC when halting decrementer, because MFC_CNTL[Dh] is set
without MFC_CNTL[Sm]. This bug causes, for example, saving broken DMA
queues. Here is a patch to fix the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
WARNING: arch/powerpc/platforms/cell/spufs/spufs.o(.init.text+0x158): Section
mismatch: reference to .exit.text:.spu_sched_exit (between '.init_module' and
'.spu_sched_init')
was introduced by c99c1994a2
This patch removes the warning.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for the setup and decoding of MSIs
on Axon-based Cell systems, using the MSIC mechanism.
This involves setting up an area of BE memory which the Axon
then uses as a FIFO for MSI messages. When one or more MSIs
are decoded by the MSIC we receive an interrupt on the MPIC,
and the MSI messages are written into the FIFO. At the moment
we use a 64KB FIFO, one per MSIC/BE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for investigating spus information after a
kernel crash event, through kdump vmcore file.
Implementation is based on xmon code, but the new functionality was
kept independent from xmon.
Signed-off-by: Lucio Jose Herculano Correia <luciojhc@br.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The Axon bridge chip used on new Cell/B.E. based blade servers
comes with a DDR2 memory controller that can be used to
attach cheap memory modules, as opposed to the high-speed
XDR memory that is used by the CPU itself.
Since the memory controller does not participate in the
cache coherency protocol, we can not use the memory direcly
for Linux applications, but by providing a block device
it can be used for swap space, temporary file storage and
through the use of the direct_access block device operation
for mapping into user addresses, when it is mounted with
an appropriate file system.
Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The platforms missing the "cpus" property in the "be" node are mono-Cell
platforms such as CAB or Getaway.
Therefore it is possible to assume that if there is no "cpus" properties
under the "be" node then we can safely return the "device node" without
more checking. This is a bit hacky but ... it allows it to work on
these platforms.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Acked-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Previous patch changed based on Christian Krafft's comment.
On some legacy SLOF tree the generic code is unable to ioremap some Cell BE
registers. Therefore the "generic" functions are returning a NULL pointer,
triggering a crash on such platforms.
Let's handle this more gracefully.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Acked-by: Christian Kraff <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Previous patch changed based on Christian Krafft's comment.
On some legacy SLOF tree the generic code is unable to ioremap some Cell BE
registers. Therefore the "generic" functions are returning a NULL pointer,
triggering a crash on such platforms.
Let's handle this more gracefully.
Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net>
Acked-by: Christian Kraff <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch reorganizes the code of the driver into three files.
Two cbe_cpufreq_pmi.c and cbe_cpufreq_pervasive.c care about hardware.
cbe_cpufreq.c contains the logic.
There is no changed behaviour, except that the PMI related function
is now located in a seperate module cbe_cpufreq_pmi. This module
will be required by cbe_cpufreq, if CONFIG_CBE_CPUFREQ_PMI has been set.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Minor issues have been fixed:
* added a missing call to of_node_put()
* signedness of a function parameter
* added some line breaks
* changed global pmi_frequency_limit to a
per node pmi_slow_mode_limit array
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch fixes the initialization of the cbe_cpufreq driver.
The code that initializes the PMI related functions was called per cpu:
* registering cpufreq notifier block
* registering a pmi handler
This ends in a bug that the notifier block gets called in an endless loop.
The initialization code is being put to the
module init code path by this patch. This way it only gets called once.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch fixes the debug code that calculates the transition time when
changing the slow modes on a Cell BE cpu.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The pmi driver got simplified by removing support for multiple devices.
As there is no more than one pmi device per maschine, there is no need to
specify the device for listening and sending messages.
This way the caller (cbe_cpufreq) doesn't need to scan the device tree.
When registering the handler on a board without a pmi
interface, pmi.c will just return -ENODEV.
The patch that fixed the breakage of cell_defconfig has been
broken out of the earlier version of this patch. So this is
the version that applies cleanly on top of it.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Prevent people from directly including <asm/rwsem.h>.
[IA64] remove time interpolator
[IA64] Convert to generic timekeeping/clocksource
[IA64] refresh some config files for 64K pagesize
[IA64] Delete iosapic_free_rte()
[IA64] fallocate system call
[IA64] Enable percpu vector domain for IA64_DIG
[IA64] Enable percpu vector domain for IA64_GENERIC
[IA64] Support irq migration across domain
[IA64] Add support for vector domain
[IA64] Add mapping table between irq and vector
[IA64] Check if irq is sharable
[IA64] Fix invalid irq vector assumption for iosapic
[IA64] Use dynamic irq for iosapic interrupts
[IA64] Use per iosapic lock for indirect iosapic register access
[IA64] Cleanup lock order in iosapic_register_intr
[IA64] Remove duplicated members in iosapic_rte_info
[IA64] Remove block structure for locking in iosapic.c
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The commit d815461c7a in linus tree
converts the rtc-rs5c372 driver to a "new style" i2c driver.
Like commit c00593f6f8, this patch
register the rtc i2c device for the em7210 board.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. split pxa_cpu_suspend to pxa25x_cpu_suspend and pxa27x_cpu_suspend
and make pxa25x_cpu_pm_enter() and pxa27x_cpu_pm_enter() to invoke
the corresponding _suspend functions, thus remove all those ugly
#ifdef .. #endif out of sleep.S
2. move the declarations of those suspend functions to pm.h
note: this is not a clean enough solution until all the pxa25x and
pxa27x specific part is further removed out of sleep.S, sleep.S is
supposed to contain generic code only
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. introduce a structure pxa_cpu_pm_fns for pxa25x/pxa27x specific
operations as follows:
struct pxa_cpu_pm_fns {
int save_size;
void (*save)(unsigned long *);
void (*restore)(unsigned long *);
int (*valid)(suspend_state_t state);
void (*enter)(suspend_state_t state);
}
2. processor specific registers saving and restoring are performed
by calling the corresponding (*save) and (*restore)
3. pxa_cpu_pm_fns->save_size should be initialized to the required
size for processor specific registers saving, the allocated
memory address will be passed to (*save) and (*restore)
memory allocation happens early in pxa_pm_init(), and save_size
should be assigned prior to this (which is usually true, since
pxa_pm_init() happens in device_initcall()
4. there're some redundancies for those SLEEP_SAVE_XXX and related
macros, will be fixed later, one way possible is for the system
devices to handle the specific registers saving and restoring
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/ofcons:
Create drivers/of/platform.c
Create linux/of_platorm.h
[SPARC/64] Rename some functions like PowerPC
Begin consolidation of of_device.h
Begin to consolidate of_device.c
Consolidate of_find_node_by routines
Consolidate of_get_next_child
Consolidate of_get_parent
Consolidate of_find_property
Consolidate of_device_is_compatible
Start split out of common open firmware code
Split out common parts of prom.h
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
sh: intc - add support for SH7750 and its variants
sh: Move entry point code to .text.head.
sh: heartbeat: Shut up resource size warning.
sh: update r2d defconfig and fix SH7751R pci compliation
sh: Many symbol exports for nommu allmodconfig.
sh: zero terminate 8250 platform data for r2d board
sh: cpufreq: Fix up the build for SH-2.
sh: Make on-chip DMA channel selection explicit.
sh: Fix up CPU dependencies for on-chip DMAC.
sh: cpufreq: clock framework support.
sh: Support rate rounding for SH7722 FRQCR clocks.
sh: Implement clk_round_rate() in the clock framework.
sh: Fix up PCI section mismatch warnings.
sh: Wire up fallocate() syscall.
sh: intc - add support for 7780
sh: intc - improve group support
sh: Fix up SH-3 and SH-4 driver dependencies.
sh: push-switch: Correct license string.
sh: cpufreq: Fix driver dependencies and flag as broken.
sh: IPR/INTC2 IRQ setup consolidation.
...
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6:
sh64: Flag sh64_get_page() as __init_refok.
sh64: Move entry point code to .text.head.
sh64: Fix up PCI section mismatch warnings.
sh64: Update cayman defconfig.
sh64: Wire up fallocate() syscall.
Update arm to use bitwise types for its VM_FAULT_ constants.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reformat show_cpuinfo() to be consistent with normal coding style
(and rest of this file).
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the prototypes for
its global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the nice ideas behind paravirt is that CONFIG_XEN=y can be included
in a standard configuration and be no worse for native booting than as a
Xen guest. The glibc feature that supports the vDSO "nosegneg" note is
designed specifically to make this easy. You just have to flip one bit at
boot time. This patch makes Xen flip the bit, so a CONFIG_XEN=y kernel on
bare hardware does not make glibc use the less-optimized library builds.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix two year old bug in early bootup asm.
[SPARC64]: Update defconfig.
[SPARC64]: Fix log message type in vio_create_one().
[SPARC64]: Tweak assertions in sun4v_build_virq().
[SPARC64]: Tweak kernel log messages in power_probe().
[SPARC64]: Fix handling of multiple vdc-port nodes.
[SPARC64]: Fix device type matching in VIO's devspec_show().
[SPARC64]: Fix MODULE_DEVICE_TABLE() specification in VDC and VNET.
[SPARC]: Add sys_fallocate() entries.
[SPARC64]: Use orderly_poweroff().
anything that wants working dma-mapping won't work
parport_pc won't work on m68k unless we have ISA
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i.e. tell modpost that entry point code (that has to be outside
of .init.text for external reasons) is OK to refer to .init.*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch converts the cpu specific 7750 setup code to use the
new intc controller. Many new vectors are added and multiple
processor variants including 7091, 7750, 7750s, 7750r, 7751 and
7751r should all have the correct vectors hooked up.
IRLM interrupts can be enabled using ipr_irq_enable_irlm() which
now is marked as __init.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The HRM states that is must be done this way ...
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
I changed the naming to be more obvious---unfortunately the HRM
doesn't specify these.
Moreover the numbering is changed to be zero indexed as this is more
natural.
Adjust all callers.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
make ns9xxx_ack_irq_functions static and add one include to get declarations
for ns9xxx_map_io and ns9xxx_init_machine.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
I must have written the patch introducing support for this machine
deep in the night...
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. for common devices across all the pxa variants, the names
are changed to be:
"pxa_device_xxx"
2. for pxa25x or pxa27x specific devices, the names are
changed to be:
"pxa25x_device_xxx", or
"pxa27x_device_xxx"
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
I believe that the following patch is necessary to properly configure
GPIO line configuration for IRQ's which are mapped to a GPIO line >= 8
(without this patch the wrong GPIO is configured as an input.)
Signed-off-by: Tim Harvey <tim_harvey@yahoo.com>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
sh64_get_page() wraps in to regular allocators as well as the
bootmem allocator for fetching pages, it carefully checks to
see which one it can use depending on the system state, so
the access is safe.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Convert the AT91RM9200 platform-setup code to use the new atmel_spi
driver (and manually-driven chip-selects), instead of the legacy
AT91-only SPI stack.
The AT91SAM9 processors are already using the atmel_spi driver.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Follow Al Viro's m68k change from l-k:
i.e. tell modpost that entry point code (that has to be outside
of .init.text for external reasons) is OK to refer to .init.*
Shuts up some section mismatch warnings from modpost.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
In order for this driver to be shared across the iop architectures the
iop3xx and iop13xx header files are modified to present a common interface
for the iop_wdt driver.
Details:
* iop13xx supports disabling the timer while iop3xx does not. This requires
a few 'compatibility' definitions in include/asm-arm/hardware/iop3xx.h to
preclude adding #ifdef CONFIG_ARCH_IOP13XX blocks to the driver code.
* The heartbeat interval is derived from the internal bus clock rate, so this
this patch also exports the tick rate to the iop_wdt driver.
Cc: Curt Bruns <curt.e.bruns@intel.com>
Cc: Peter Milne <peter.milne@d-tacq.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds the basic support for the em7210 board. It is similar to
the iq31244 board and can be found on Intel "Baxter Creek" ss4000e nas.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch updates the r2d board support in a few ways:
- CPU_SUBTYPE_SH7751R is selected in the defconfig to play well
with the r2d board Kconfig entry. Without this the defconfig
results in no board enabled.
- Enable EARLY_PRINTK.
- Enable SH_STANDARD_BIOS
- this works well for early printk on the r2d board.
- Add "earlyprink=bios" to the cmdline for early serial port
output by default.
- CONFIG_SUBTYPE_SH7751R support is added to the sh-specific
pci makefile.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
allmodconfig generates a lot of interesting code, a lot of the
generated symbols we've never exported before, so this fixes
those up. Verified with both GCC3 and GCC4 toolchains.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
struct plat_serial8250_port should contain a terminating zero entry
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently this has a prompt to allow users to change it. There's
no reason to do this, and it has caused breakage and confusion
in the past, so remove it entirely.
We'll get rid of this when the whole driver is tidied for
the driver model.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We try to fetch the CIF entry pointer from %o4, but that
can get clobbered by the early OBP calls. It is saved
in %l7 already, so actually this "mov %o4, %l7" can just
be completely removed with no other changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets the SH cpufreq working again. We follow the changes
in the AVR32 implementation for wrapping in to the clock framework.
CPUs that wish to use this are required to define rate rounding
primitives in order to satisfy clk_round_rate().
This works well enough for the common case, though we should
look at unifying this driver across all of the platforms that
implement clock framework support in one capacity or another.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Now that the round_rate() op is supported, hook it up on SH7722
for the FRQCR (CPU, PCLK, etc.) clocks.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is an optional component of the clock framework. However,
as we're going to be using this in the cpufreq drivers, add
support for it to the framework.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The "id" property in vdc-port nodes are not unique, they
are all zero. Therefore assign ID's using the parent's
"cfg-handle" property which will be unique.
Signed-off-by: David S. Miller <davem@davemloft.net>
with the recent renames, we forgot to update the matches for
devspec. This is required to keep udev working and autoload modules.
Signed-off-by: David S. Miller <davem@davemloft.net>
and populate it with the common parts from PowerPC and Sparc[64].
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This is to make the of merge easier. Also rename of_bus_type.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
This moves all the common parts for the Sparc, Sparc64 and PowerPC
of_device.c files into drivers/of/device.c.
Apart from the simple move, Sparc gains of_match_node() and a call to
of_node_put in of_release_dev(). PowerPC gains better recovery if
device_create_file() fails in of_device_register().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This consolidates the routines of_find_node_by_path, of_find_node_by_name,
of_find_node_by_type and of_find_compatible_device. Again, the comparison
of strings are done differently by Sparc and PowerPC and also these add
read_locks around the iterations.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This adds a read_lock around the child/next accesses on Sparc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This requires creating dummy of_node_{get,put} routines for sparc and
sparc64. It also adds a read_lock around the parent accesses.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only change here is that a readlock is taken while the property list
is being traversed on Sparc where it was not taken previously.
Also, Sparc uses strcasecmp to compare property names while PowerPC
uses strcmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only difference here is that Sparc uses strncmp to match compatibility
names while PowerPC uses strncasecmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This creates drivers/of/base.c (depending on CONFIG_OF) and puts
the first trivially common bits from the prom.c files into it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This patch converts the cpu specific 7780 setup code to use the
new intc controller. Many new vectors are added and also support for
external interrupt sense configuration. So with this patch it is now
possible to configure external interrupt pins as edge or level
triggered using set_irq_type().
No external interrupts are registered by default.
Use plat_irq_setup_pins() to select between IRQ or IRL mode.
This patch also fixes the Alarm IRQ for the RTC.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch improves intc group support, ie it makes it possible to
group interrupts together and mask / unmask the entire group. This
also works with priorities, so setting a priority for an entire group
is also possible. This patch is needed to properly support certain
processors such as the 7780.
Fixes for NULL pointers in DECLARE_INTC_DESC() are also included.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This was accidentally set as "GPLv2", whereas the kernel expects v2
to be written "GPL v2", this caused complaints regarding the use
of the platform device APIs when built as a module.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is only supported on SH-4, so don't expose it for the other
CPUs. Additionally, it's suffered some bitrot, so add a BROKEN
dependency as well until we fix it up.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch unifies the cpu specific interrupt setup functions for
interrupt controller blocks such as ipr, intc2 and intc. There is no
point in having separate functions for each interrupt controller, so
let's clean this up.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch cleans up solution engine 7722 specific interrupt code.
The main purpose is to replace the mux function with use of
set_irq_chained_handler() and replace hard coded register poking
code with set_irq_type(). The board specific interrupts are also
moved to start from SE7722_FPGA_IRQ_BASE.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch converts the cpu specific 7722 setup code to use the
new intc controller. Many new vectors are added and also support
for external interrupt sense configuration. So with this patch
it is now possible to configure external interrupt pins as edge
or level triggered using set_irq_type().
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is the second version of the shared interrupt controller patch
for the sh architecture, fixing up handling of intc_reg_fns[].
The three main advantages with this controller over the existing
ones are:
- Both priority (ipr) and bitmap (intc2) registers are
supported
- External pin sense configuration is supported, ie edge
vs level triggered
- CPU/Board specific code maps 1:1 with datasheet for
easy verification
This controller can easily coexist with the current IPR and INTC2
controllers, but the idea is that CPUs/Boards should be moved over
to this controller over time so we have a single code base to
maintain.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch contains two serial port related fixes for sh7722:
- Make sure the irqs for the first serial port is correct
- Add the second and third serial port to the platform data
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Kill off the hd64461 io.c, as all of the hd64461 users are now
using the generic I/O routines.
[ hd64461/ moved to hd64461.c by Paul ]
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This wires up the platform devices for the USB expansion boards for
the Highlander boards.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Change in 'kbuild: do section mismatch check on full vmlinux'
should've been replicated in arch/um/Makefile.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Update arch/ia64/defconfig: select 64K pagesize
Same for arch/ia64/configs/tiger_defconfig + CONFIG_COMPAT=n
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of
the cmpxchg64b feature and enables compilation of the set_64bit() family.
Since the option is dependent on PAE, and since KVM depends on set_64bit(),
this effectively disables KVM on i386 nopae.
Simplify by removing the config option altogether: the boot check is made
dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed
without constraints. It is up to users to check for the feature flag (KVM
does not as virtualiation extensions imply its existence).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>