Allow stack growth so the 'enter' instruction works. Also
fixes problem in compat_sys_kexec_load() which could allocate
more than 128 bytes using compat_alloc_user_space().
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Use tail call from clear_user to __clear_user to save some code size
- Use standard memcpy for forward memmove
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only exports for assembler files are left in x8664_ksyms.c
Originally inspired by a patch from Al Viro
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
x86_64 and i386 behave inconsistently when sending an IPI on vector 2
(NMI_VECTOR). Make both behave the same, so IPI 2 is sent as NMI.
The crash code was abusing send_IPI_allbutself() by passing a code
instead of a vector, it only worked because crash knew about the
internal code of send_IPI_allbutself(). Change crash to use NMI_VECTOR
instead, and remove the comment about how crash was abusing the function.
This patch is a pre-requisite for fixing the problem where sending an
IPI as NMI would reboot some Dell Xeon systems. I cannot fix that
problem while crash continus to abuse send_IPI_allbutself().
It also removes the inconsistency between i386 and x86_64 for
NMI_VECTOR. That will simplify all the RAS code that needs to bring
all the cpus to a clean stop, even when one or more cpus are spinning
disabled.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turned out that the following change is needed when the speaker is
compiled as a module.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the no longer used sys32_ni_syscall()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently CONFIG_REORDER uses -ffunction-sections for all code;
however, creating a separate section for each function is not useful
for modules and just adds bloat. Moving this option from CFLAGS to
CFLAGS_KERNEL shrinks module object files (e.g., the module tree for a
kernel built with most drivers as modules shrinked from 54M to 46M),
and decreases the number of sysfs files in /sys/module/*/sections/
directories.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Defaulting to a value not evenly divisible by four makes little sense,
as four values are displayed per line (and hence the rest of the line
would otherwise be wasted).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With (significantly) more than 10 CPUs online, the column headings
drifted off the positions of the column contents with growing CPU
numbers.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The APIC ID returned by hard_smp_processor_id can be beyond
NR_CPUS and then overflow the x86_cpu_to_apic[] array.
Add a check for overflow. If it happens then the slow loop below
will catch.
Bug pointed out by Doug Thompson
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Getting phys_proc_id and cpu_core_id information to be printed at boot
time for AMD processors. Also matching the Node related boot time
information that gets printed for Intel and AMD processors for NUMA
configurations.
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.
Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No red zone possible/needed on the alternative stack.
It caused confusion.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- fix an off-by-one error in phys_pmd_init()
- prevent phys_pmd_init() from removing mappings established earlier
- fix the direct mapping early printk to in fact show the end of the range
- remove an apparently orphan comment
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up mce_amd.c for readability and remove code no
longer needed.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for mce threshold registers found in future
AMD family 0x10 processors. Backwards compatible with
family 0xF hardware.
AK: fixed build on !SMP
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for extended APIC LVT found in future AMD processors.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After writing the CFG register, the first value written to the T0_CMP
register is the value at which next interrupt should be triggered, every
value after that sets the period of the interrupt. For that reason, the code
needs to write the value twice - to set both the phase and period.
[AK: I had already figured it out by myself, but it's still useful
to have a comment for this.]
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes use of the newly added conversion constants
in time.h to x86-64 time.c. The code gets significantly easier
to understand.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove #ifdefed code to manually enable HPET on AMD8111, where the
BIOS doesn't have ACPI HPET tables and doesn't enable it for us.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the X86_FEATURE_RDTSCP #define, so that kernel code can
check for the feature easily and also fixes the location of the "rdtscp"
string in the cpuinfo tables.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename oem_force_hpet_timer to apic_is_clustered_box, to give the
function a better fitting name - it really isn't at all about HPET.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Most of the fields of cpuinfo are defined in cpuinfo_x86 structure.
This patch moves the phys_proc_id and cpu_core_id for each processor to
cpuinfo_x86 structure as well.
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch hooks Calgary into the build, the x86-64 IOMMU
initialization paths, and introduces the Calgary specific bits. The
implementation draws inspiration from both PPC (which has support for
the same chip but requires firmware support which we don't have on
x86-64) and gart. Calgary is different from gart in that it support a
translation table per PHB, as opposed to the single gart aperture.
Changes from previous version:
* Addition of boot-time disablement for bus-level translation/isolation
(e.g, enable userspace DMA for things like X)
* Usage of newer IOMMU abstraction functions
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch creates a new interface for IOMMUs by adding a centralized
location for IOMMU allocation (for translation tables/apertures) and
IOMMU initialization. In creating these, code was moved around for
abstraction, uniformity, and consiceness.
Take note of the move of the iommu_setup bootarg parsing code to
__setup. This is enabled by moving back the location of the aperture
allocation/detection to mem init (which while ugly, was already the
location of the swiotlb_init).
While a slight departure from the previous patch, I belive this provides
the true intention of the previous versions of the patch which changed
this code. It also makes the addition of the upcoming calgary code much
cleaner than previous patches.
[AK: Removed one broken change. iommu_setup still has to be called
early]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
swiotlb relies on the gart specific iommu_aperture variable to know if
we discovered a hardware IOMMU before swiotlb initialization. Introduce
iommu_detected to do the same thing, but in a HW IOMMU neutral manner,
in preparation for adding the Calgary HW IOMMU.
Signed-Off-By: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-Off-By: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pud_offset_k() equivalent to pud_offset() now. Pointed out by Jan Beulich
Similar for __pud_offset_ok, which needs a small change in the callers.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up arch/{i386,x86_64}/boot/compressed/misc.c a bit to reduce their
differences. Should have zero effect on code generation.
Signed-off-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If no unwinding is possible at all for a certain exception instance,
fall back to the old style call trace instead of not showing any trace
at all.
Also, allow setting the stack trace mode at the command line.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adjust the CFA offset for 64- and 32-bit syscall entries so that the five
slots pre-subtracted from the stack pointer do not appear to reside outside
of the current frame.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the switching to/from the IRQ stack so that unwind annotations can
be added for it without requiring CFA expressions.
AK: I cleaned it up a bit, making it unconditional and removing the
obsolete DEBUG_INFO full frame code.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the x86_64-specific pieces to enable reliable stack traces. The
only restriction with this is that it currently cannot unwind across the
interrupt->normal stack boundary, as that transition is lacking proper
annotation.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In x86_64 architecture, if device driver with msi function
gets 0xee vector by assign_irq_vector() function, system will
crash if this interrupt happens. It is because 0xee interrupt
entry is empty. This patch modifies this. This patch is based
on 2.6.17-rc6.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Rename the GART_IOMMU option to IOMMU to make clear it's not
just for AMD
- Rewrite the help text to better emphatise this fact
- Make it an embedded option because too many people get it wrong.
To my astonishment I discovered the aacraid driver tests this
symbol directly. This looks quite broken to me - it's an internal
implementation detail of the PCI DMA API. Can the maintainer
please clarify what this test was intended to do?
Cc: linux-scsi@vger.kernel.org
Cc: alan@redhat.com
Cc: markh@osdl.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously it would only work in the first 32bit system call, not during
early process setup.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/asm-x86_64/gart-mapping.h is only ever used in
arch/x86_64/kernel/setup.c and none of its contents are referenced.
Looks to be leftover cruft not removed in the dma_ops patch.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It was originally added for 2.4 oprofile, but 2.6 oprofile doesn't
need that anymore. Shouldn't be any use in tree anymore and it doesn't
make much sense to export the ia32 syscalls when the main syscalls
are not exported.
I think Adrian Bunk asked for removing it several times.
Also included hunk from Adrian to remove the .globl ia32_sys_call_table
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Early development of x86-64 Linux was in CVS, but that hasn't been
the case for a long time now. Remove the obsolete $Id$s.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since END()/ENDPROC() are now available, add respective annotations to
x86_64's entry.S. This should help debugging activities.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix
initcall at 0xffffffff806c5b89: pci_iommu_init+0x0/0x53c(): returned with error code -1
Return -ENODEV instead when the IOMMU is not used.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since assign_irq_vector() can be called at runtime, its access of static
variables should be protected by a lock.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Factor out the duplicated access/cache code into a single file
* Shared between i386/x86-64.
- Share flush code between AGP and IOMMU
* Fix a bug: AGP didn't wait for end of flush before
- Drop 8 northbridges limit and allocate dynamically
- Add lock to serialize AGP and IOMMU GART flushes
- Add PCI ID for next AMD northbridge
- Random related cleanups
The old K8 NUMA discovery code is unchanged. New systems
should all use SRAT for this.
Cc: "Navin Boppuri" <navin.boppuri@newisys.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A trivial change to have gart_unmap_sg call gart_unmap_single directly,
instead of bouncing through the dma_unmap_single wrapper in
dma-mapping.h.
This change required moving the gart_unmap_single above gart_unmap_sg,
and under gart_map_single (which seems a more logical place that its
current location IMHO).
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously we would just silently provide 64 bit services
for this to 32bit processes.
I also added all the other cases explicitely to the ptrace
compat wrapper to make sure this doesn't happen again.
And removed one bogus check in the wrapper.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow search for a contiguous block of iommu space to cross the next_bit
marker if we have already committed ourselves to flushing the gart.
There shouldn't be any reason why we'd restrict the search.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
enable large bzImages on x86_64. (fix is from x86's build.c) Using this
patch i have successfully built and booted an allyesconfig 13MB+ bzImage
on x86_64 too:
$ size64 vmlinux
text data bss dec hex filename
23444831 8202642 3439360 35086833 21761f1 vmlinux
-rw-rw-r-- 1 mingo mingo 13121740 Apr 19 09:32 arch/x86_64/boot/bzImage
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes are largely identical to the i386 version:
* alternative #define are moved to the new alternative.h file.
* one new elf section with pointers to the lock prefixes which can be
nop'ed out for non-smp.
* two new elf sections simliar to the "classic" alternatives to
replace SMP code with simpler UP code.
* fixup headers to use alternative.h instead of defining their own
LOCK / LOCK_PREFIX macros.
The patch reuses the i386 version of the alternatives code to avoid code
duplication. The code in alternatives.c was shuffled around a bit to
reduce the number of #ifdefs needed. It also got some tweaks needed for
x86_64 (vsyscall page handling) and new features (noreplacement option
which was x86_64 only up to now). Debug printk's are changed from
compile-time to runtime.
Loosely based on a early version from Bastian Blank <waldi@debian.org>
Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Intel systems report the cache level data from CPUID 4 in sysfs.
Add a CPUID 4 emulation for AMD CPUs to report the same
information for them. This allows programs to read this
information in a uniform way.
The AMD way to report this is less flexible so some assumptions
are hardcoded (e.g. no L3)
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Previously the apicid<->coreid split was computed based on the max
number of cores. Now use a new CPUID AMD defined for that. On most
systems right now it should be 0 and the old method will be used.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
vSMPowered systems use apic_cluster too. Forcing apic_physflat works
on these systems too, but only if we change phys_pkg_id to use
hard_smp_prcoessor_id() instead of cpuid_ebx. I am guessing other
multichassi cluster systems would need this too.
Signed-off-by: ravikiran thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable some hwmon drivers as modules and tulip and stack unwinding
Kernel image should be somewhat bigger now because of the unwind
information being included, but you'll get exact backtraces for that.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- make firmware edid independent from framebuffer (No need to choose
framebuffer just to disable this option
- enable this option in X86_64
- check if VBE/DDC function is implemented before calling actual function
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently in the do_page_fault() code path, we call notify_die(DIE_PAGE_FAULT,
...) to notify the page fault. Since notify_die() is highly overloaded, this
page fault notification is currently being sent to all the components
registered with register_die_notification() which uses the same die_chain to
loop for all the registered components which is unnecessary.
In order to optimize the do_page_fault() code path, this critical page fault
notification is now moved to different call chain and the test results showed
great improvements.
And the kprobes which is interested in this notifications, now registers onto
this new call chain only when it need to, i.e Kprobes now registers for page
fault notification only when their are an active probes and unregisters from
this page fault notification when no probes are active.
I have incorporated all the feedback given by Ananth and Keith and everyone,
and thanks for all the review feedback.
This patch:
Overloading of page fault notification with the notify_die() has performance
issues(since the only interested components for page fault is kprobes and/or
kdb) and hence this patch introduces the new notifier call chain exclusively
for page fault notifications their by avoiding notifying unnecessary
components in the do_page_fault() code path.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- written on init only, accessed for every timer read --> __read_mostly
- fix broken sentence
Signed-off-by: Andreas Mohr <andi@lisas.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In a testament to the utter simplicity and logic of the English
language ;-), I found a single correct use - in kernel/panic.c - and
10-15 incorrect ones.
Signed-Off-By: Lee Revell <rlrevell@joe-job.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The wrapper routines are required when asmlinkage differs from the usual
calling convention. So we need to have them. However, by rearranging
the parameters, they will get optimised away to a single jump for most
people.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Up until now algorithms have been happy to get a context pointer since
they know everything that's in the tfm already (e.g., alignment, block
size).
However, once we have parameterised algorithms, such information will
be specific to each tfm. So the algorithm API needs to be changed to
pass the tfm structure instead of the context pointer.
This patch is basically a text substitution. The only tricky bit is
the assembly routines that need to get the context pointer offset
through asm-offsets.h.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commits
3e3318dee0 [PATCH] swsusp: x86_64 mark special saveable/unsaveable pages
b6370d96e0 [PATCH] swsusp: i386 mark special saveable/unsaveable pages
ce4ab0012b [PATCH] swsusp: add architecture special saveable pages support
because not only do they apparently cause page faults on x86, the
infrastructure doesn't compile on powerpc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch will fix a boot memory reservation bug that trashes memory on
the ES7000 when loading the kdump crash kernel.
The code in arch/x86_64/kernel/setup.c to reserve boot memory for the crash
kernel uses the non-numa aware "reserve_bootmem" function instead of the
NUMA aware "reserve_bootmem_generic". I checked to make sure that no other
function was using "reserve_bootmem" and found none, except the ones that
had NUMA ifdef'ed out.
I have tested this patch only on an ES7000 with NUMA on and off (numa=off)
in a single (non-NUMA) and multi-cell (NUMA) configurations.
Signed-off-by: Amul Shah <amul.shah@unisys.com>
Looks-good-to: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
flush_tlb_all uses on_each_cpu, which will disable/enable interrupt.
In suspend/resume time, this will make interrupt wrongly enabled.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pages (Reserved/ACPI NVS/ACPI Data) below end_pfn will be saved/restored by S4
currently. We should mark 'Reserved' pages not saveable.
Pages (Reserved/ACPI NVS/ACPI Data) above end_pfn will not be saved/restored
by S4 currently. We should save the 'ACPI NVS/ACPI Data' pages.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_move_pages() support for 32bit (i386 plus x86_64 compat layer)
Add support for move_pages() on i386 and also add the compat functions
necessary to run 32 bit binaries on x86_64.
Add compat_sys_move_pages to the x86_64 32bit binary layer. Note that it is
not up to date so I added the missing pieces. Not sure if this is done the
right way.
[akpm@osdl.org: compile fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (27 commits)
[PATCH] PCI: nVidia quirk to make AER PCI-E extended capability visible
[PATCH] PCI: fix issues with extended conf space when MMCONFIG disabled because of e820
[PATCH] PCI: Bus Parity Status sysfs interface
[PATCH] PCI: fix memory leak in MMCONFIG error path
[PATCH] PCI: fix error with pci_get_device() call in the mpc85xx driver
[PATCH] PCI: MSI-K8T-Neo2-Fir: run only where needed
[PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev
[PATCH] PCI: clean up pci documentation to be more specific
[PATCH] PCI: remove unneeded msi code
[PATCH] PCI: don't move ioapics below PCI bridge
[PATCH] PCI: cleanup unused variable about msi driver
[PATCH] PCI: disable msi mode in pci_disable_device
[PATCH] PCI: Allow MSI to work on kexec kernel
[PATCH] PCI: AMD 8131 MSI quirk called too late, bus_flags not inherited ?
[PATCH] PCI: Move various PCI IDs to header file
[PATCH] PCI Bus Parity Status-broken hardware attribute, EDAC foundation
[PATCH] PCI: i386/x86_84: disable PCI resource decode on device disable
[PATCH] PCI ACPI: Rename the functions to avoid multiple instances.
[PATCH] PCI: don't enable device if already enabled
[PATCH] PCI: Add a "enable" sysfs attribute to the pci devices to allow userspace (Xorg) to enable devices without doing foul direct access
...
The AGP default doesn't work well with other selects, so use a select for
GART_IOMMU as well. Remove a redundant default for SWIOTLB as well.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On 15 Jun 2006 03:45:10 +0200, Andi Kleen wrote:
> Anyways I would say that if the BIOS can't get MCFG right then
> it's likely not been validated on that board and shouldn't be used.
According to Petr Vandrovec:
... "What is important (and checked) is address of MMCONFIG reported by MCFG
table... Unfortunately code does not bother with printing that address :-(
"Another problem is that code has hardcoded that MMCONFIG area is 256MB large.
Unfortunately for the code PCI specification allows any power of two between 2MB
and 256MB if vendor knows that such amount of busses (from 2 to 128) will be
sufficient for system. With notebook it is quite possible that not full 8 bits
are implemented for MMCONFIG bus number."
So here is a patch. Unfortunately my system still fails the test because
it doesn't reserve any part of the MMCONFIG area, but this may fix others.
Booted on x86_64, only compiled on i386. x86_64 still remaps the max area
(256MB) even though only 2MB is checked... but 2.6.16 had no check at all
so it is still better.
PCI: reduce size of x86 MMCONFIG reserved area check
1. Print the address of the MMCONFIG area when the test for that area
being reserved fails.
2. Only check if the first 2MB is reserved, as that is the minimum.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
From: "Andy Currid" <ACurrid@nvidia.com>
This patch fixes a kernel panic during boot that occurs on NVIDIA platforms
that have HPET enabled.
When HPET is enabled, the standard timer IRQ is routed to IOAPIC pin 2 and is
advertised as such in the ACPI APIC table - but an earlier workaround in the
kernel was ignoring this override. The fix is to honor timer IRQ overrides
from ACPI when HPET is detected on an NVIDIA platform.
Signed-off-by: Andy Currid <acurrid@nvidia.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Yu, Luming" <luming.yu@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
int_ret_from_syscall already does syscall exit tracing, so
no need to do it again in the caller.
This caused problems for UML and some other special programs doing
syscall interception.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Robert Hentosh <robert_hentosh@dell.com>
Actually, we just stumbled on a different bug found in find_e820_area() in
e820.c. The following code does not handle the edge condition correctly:
while (bad_addr(&addr, size) && addr+size < ei->addr + ei->size)
;
last = addr + size;
if ( last > ei->addr + ei->size )
continue;
The second statement in the while loop needs to be a <= b so that it is the
logical negavite of the if (a > b) outside it. It needs to read:
while (bad_addr(&addr, size) && addr+size <= ei->addr + ei->size)
;
In the case that failed bad_addr was returning an address that is exactly size
bellow the end of the e820 range.
AK: Again together with the earlier avoid edma fix this fixes
boot on a Dell PE6850/16GB
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Daniel Yeisley <dan.yeisley@unisys.com>
It is possible to boot a Unisys ES7000 with CPUs from multiple cells, and not
also include the memory from those cells. This can create a scenario where
node 0 has cpus, but no associated memory. The system will boot fine in a
configuration where node 0 has memory, but nodes 2 and 3 do not.
[AK: I rechecked the code and generic code seems to indeed handle that already.
Dan's original patch had a change for mm/slab.c that seems to be already in now.]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Jan Beulich" <jbeulich@novell.com>
The PM timer code updates vxtime.last_tsc, but this update was done
incorrectly in two ways:
- offset_delay being in microseconds requires multiplying with cpu_mhz
rather than cpu_khz
- the multiplication of offset_delay and cpu_khz (both being 32-bit
values) on most current CPUs would overflow (observed value of the
delay was approximately 4000us, yielding an overflow for frequencies
starting a little above 1GHz)
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Complaining about the IOMMU not compiled in doesn't make sense
here because it is clearly compiled in.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ia32_setup_arg_pages would ignore the passed in random stack top
and use its own static value.
Now it uses the 8bit of randomness native i386 would use too.
This indirectly fixes mmap randomization for 32bit processes too,
which depends on the stack randomization.
Should also give slightly better virtual cache colouring and
possibly better performance with HyperThreading.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem:
If we put a probe onto a callq instruction and the probe is executed,
kernel panic of Bad RIP value occurs.
Root cause:
If resume_execution() found 0xff at first byte of p->ainsn.insn, it must
check the _second_ byte. But current resume_execution check _first_ byte
again.
I changed it checks second byte of p->ainsn.insn.
Kprobes on i386 don't have this problem, because the implementation is a
little bit different from x86_64.
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Satoshi Oshima <soshima@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extends an earlier patch from John Blackwood to more exception handlers
that also run on the exception stacks.
Expand the use of preempt_conditional_{sti,cli} to all cases where
interrupts are to be re-enabled during exception handling while running
on an IST stack.
Based on original patch from Jan Beulich.
Cc: John Blackwood <john.blackwood@ccur.com>
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes some boot failures on Dell and Unisys systems with memory
hotadd added.
- Set hotadd_percent to 0 by default. This means anybody using hotadd
memory needs to specify the value on the command line. That's
because there are lots of Intel boxes which have a bogus hotplug area
in their SRAT and they would waste a lot of memory before.
- Fix calculation of how much memory to use when the hotplug area
exceeds hotadd_percent
- Fix fallback when the
- Fix fallback if memory hotadd is not compiled in.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This triggers for b44's 1GB DMA workaround which tries to map
first and then bounces.
The 32bit heuristic is reasonable because the IOMMU doesn't attempt
to handle < 32bit masks anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on analysis&patch from Robert Hentosch
Observed on a Dell PE6850 with 16GB
The problem occurs very early on, when the kernel allocates space for the
temporary memory map called bootmap. The bootmap overlaps the EBDA region.
EBDA region is not historically reserved in the e820 mapping. When the
bootmap is freed it marks the EBDA region as usable.
If you notice in setup.c there is already code to work around the EBDA
in reserve_ebda_region(), this check however occurs after the bootmap
is allocated and doesn't prevent the bootmap from using this range.
AK: I redid the original patch. Thanks also to Jan Beulich for
spotting some mistakes.
Cc: Robert_Hentosch@dell.com
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Playing with NMI watchdog on x86_64, I discovered that it didn't
do what I expected. It always panic-ed, even when it didn't
happen from interrupt context. This patch solves that
problem for me. Also, in this case, do_exit() will be called
with interrupts disabled, I believe. Would it be wise to also
call local_irq_enable() after nmi_exit()?
[Yes I added it -AK]
Currently, on x86_64, any NMI watchdog timeout will cause a panic
because the irq count will always be set to be in an interrupt
when do_exit() is called from die_nmi(). If we add nmi_exit() to
the die_nmi() call (since the nmi will never exit "normally")
it seems to solve this problem. The following small program
can be used to trigger the NMI watchdog to reproduce this:
main ()
{
iopl(3);
for (;;) asm("cli");
}
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed this when poking around in this area.
The oops_begin() function in x86_64 would only conditionally claim
the die_lock if the call is nested, but oops_end() would always
release the spinlock. This patch adds a nest count for the die lock
so that the release of the lock is only done on the final oops_end().
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>