this compiler warning:
arch/x86/kernel/ds.c: In function 'ds_request':
arch/x86/kernel/ds.c:368: warning: 'context' may be used uninitialized in this function
Shows that the code flow in ds_request() is buggy - it goes into
the unlock+release-context path even when the context is not allocated
yet.
First allocate the context, then do the other checks.
Also, take care with GFP allocations under the ds_lock spinlock.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix incorrectly marked unstable TSC clock
Patch (commit 0d12cdd "sched: improve sched_clock() performance") has
a regression on one of the test systems here.
With the patch, I see:
checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed
Whereas, without the patch syncs pass fine on all CPUs:
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Due to this, TSC is marked unstable, when it is not actually unstable.
This is because syncs in check_tsc_wrap() goes away due to this commit.
As per the discussion on this thread, correct way to fix this is to add
explicit syncs as below?
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
All blame goes to: color white,red "[^[:graph:]]+$"
in .nanorc ;).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix es7000 build
CC arch/x86/kernel/es7000_32.o
arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size
arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory
arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table
make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1
we applied one patch out of order...
| commit a73aaedd95
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Sun Sep 14 02:33:14 2008 -0700
|
| x86: check dsdt before find oem table for es7000, v2
|
| v2: use __acpi_unmap_table()
that patch need:
x86: use early_ioremap in __acpi_map_table
x86: always explicitly map acpi memory
acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap
acpi/x86: introduce __apci_map_table, v4
submitted to the ACPI tree but not upstream yet.
fix it until those patches applied, need to revert this one
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a problem where ds_request() returned an error without releasing the
ds lock.
Reported-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit e51af66308, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.
The better fix is commit cbda1ba898
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Fix pit memory leak if unable to allocate irq source id
KVM: ia64: fix vmm_spin_{un}lock for !CONFIG_SMP
KVM: VMX: Set IGMT bit in EPT entry
KVM: Require the PCI subsystem
x86: KVM guest: fix section mismatch warning in kvmclock.c
KVM: ia64: Use guest signal mask when blocking
KVM: MMU: increase per-vcpu rmap cache alloc size
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (47 commits)
ACPI: pci_link: remove acpi_irq_balance_set() interface
fujitsu-laptop: Add DMI callback for Lifebook S6420
ACPI: EC: Don't do transaction from GPE handler in poll mode.
ACPI: EC: lower interrupt storm treshold
ACPICA: Use spinlock for acpi_{en|dis}able_gpe
ACPI: EC: restart failed command
ACPI: EC: wait for last write gpe
ACPI: EC: make kernel messages more useful when GPE storm is detected
ACPI: EC: revert msleep patch
thinkpad_acpi: fingers off backlight if video.ko is serving this functionality
sony-laptop: fingers off backlight if video.ko is serving this functionality
msi-laptop: fingers off backlight if video.ko is serving this functionality
fujitsu-laptop: fingers off backlight if video.ko is serving this functionality
eeepc-laptop: fingers off backlight if video.ko is serving this functionality
compal: fingers off backlight if video.ko is serving this functionality
asus-acpi: fingers off backlight if video.ko is serving this functionality
Acer-WMI: fingers off backlight if video.ko is serving this functionality
ACPI video: if no ACPI backlight support, use vendor drivers
ACPI: video: Ignore devices that aren't present in hardware
Delete an unwanted return statement at evgpe.c
...
This removes the acpi_irq_balance_set() interface from the PCI
interrupt link driver.
x86 used acpi_irq_balance_set() to tell the PCI interrupt link
driver to configure links to minimize IRQ sharing. But the link
driver can easily figure out whether to turn on IRQ balancing
based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
that external interface.
It's better for the driver to figure this out at init-time. If
we set it externally via the x86 code, the interface reduces
modularity, and we depend on the fact that acpi_process_madt()
happens before we process the kernel command line.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
There is a potential issue that, when guest using pagetable without vmexit when
EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory,
which would be inconsistent with host side and would cause host MCE due to
inconsistent cache attribute.
The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default
memory type to protect host (notice that all memory mapped by KVM should be WB).
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch
in reference from the function kvm_setup_secondary_clock() to the
function .devinit.text:setup_secondary_APIC_clock()
The function kvm_setup_secondary_clock() references
the function __devinit setup_secondary_APIC_clock().
This is often because kvm_setup_secondary_clock lacks a __devinit
annotation or the annotation of setup_secondary_APIC_clock is wrong.
Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context
nohz: disable tick_nohz_kick_tick() for now
irq: call __irq_enter() before calling the tick_idle_check
x86: HPET: enter hpet_interrupt_handler with interrupts disabled
x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
x86: HPET: convert WARN_ON to WARN_ON_ONCE
The page fault path can use two rmap_desc structures, if:
- walk_addr's dirty pte update allocates one rmap_desc.
- mmu_lock is dropped, sptes are zapped resulting in rmap_desc being
freed.
- fetch->mmu_set_spte allocates another rmap_desc.
Increase to 4 for safety.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: build/boot fix for x86/Voyager
This change:
| commit 3d44223327
| Author: Jens Axboe <jens.axboe@oracle.com>
| Date: Thu Jun 26 11:21:34 2008 +0200
|
| Add generic helpers for arch IPI function calls
didn't wire up the voyager smp call function correctly, so do that
here. Also make CONFIG_USE_GENERIC_SMP_HELPERS a def_bool y again,
since we now use the generic helpers for every x86 architecture.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While investigating the failure of hibernation on 32-bit x86 with
CONFIG_NUMA set, as described in this message
http://marc.info/?l=linux-kernel&m=122634118116226&w=4
I asked some people for help and I was told that it wasn't really
worth the effort, because CONFIG_NUMA was generally broken on 32-bit
x86 systems and it shouldn't be used in such configs. For this
reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on
x86-32.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Pavel Machek <pavel@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some functions that may be called from this handler require that
interrupts are disabled. Also, combining IRQF_DISABLED and
IRQF_SHARED does not reliably disable interrupts in a handler, so
remove IRQF_SHARED from the irq flags (this irq is not shared anyway).
Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Cc: "Will Newton" <will.newton@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In hpet_next_event() we check that the value we just wrote to
HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that
the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if
timer is anything other than timer 0, is likely to fail.
Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It is possible to flood the console with call traces if the WARN_ON
condition is true because of the frequency with which this function is
called.
Signed-off-by: Matt Fleming <mjf@gentoo.org>
Cc: mingo@elte.hu
Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: cleanup
Arches that have their own irq_regs definition are expected to
define ARCH_HAS_OWN_IRQ_REGS or else a generic (unused) set
will also be defined in lib/irq_regs.c
Sparse noticed the unused generic one had no prototype:
lib/irq_regs.c:15:1: warning: symbol 'per_cpu____irq_regs' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove unused variable
I forgot to remove the now unused "cycles_t cycles" parameter from
vget_cycles() - which triggers build warnings as tsc.h is included
in a number of files.
Remove it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a new file was accidentally added to include/asm-x86;
move it to the new arch/x86/include/asm location
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Impact: cleanup
Move rdtsc_barrier() use to vsyscall_64.c where it's relied on,
and point out its role in the context of its use.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: optimize sched_clock() a bit
sched: improve sched_clock() performance
sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
variant of __cycles_2_ns().
Most of the time sched_clock() is called with irqs disabled already.
The few places that call it with irqs enabled need to be updated.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
in scheduler-intense workloads native_read_tsc() overhead accounts for
20% of the system overhead:
659567 system_call 41222.9375
686796 schedule 435.7843
718382 __switch_to 665.1685
823875 switch_mm 4526.7857
1883122 native_read_tsc 55385.9412
9761990 total 2.8468
this is large part due to the rdtsc_barrier() that is done before
and after reading the TSC.
But sched_clock() is not a precise clock in the GTOD sense, using such
barriers is completely pointless. So remove the barriers and only use
them in vget_cycles().
This improves lat_ctx performance by about 5%.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, xen: fix use of pgd_page now that it really does return a page
Fix the counter overflow check for CPUs with counter width > 32
I had a similar change in a different patch that I didn't submit
and I didn't notice the problem earlier because it was always
tested together.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Xen requires that all mappings of pagetable pages are read-only, so
that they can't be updated illegally. As a result, if a page is being
turned into a pagetable page, we need to make sure all its mappings
are RO.
If the page had been used for ioremap or vmalloc, it may still have
left over mappings as a result of not having been lazily unmapped.
This change makes sure we explicitly mop them all up before pinning
the page.
Unlike aliases created by kmap, the there can be vmalloc aliases even
for non-high pages, so we must do the flush unconditionally.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "x86: default to reboot via ACPI"
x86: align DirectMap in /proc/meminfo
AMD IOMMU: fix lazy IO/TLB flushing in unmap path
x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
x86: mention ACPI in top-level Kconfig menu
x86: size NR_IRQS on 32-bit systems the same way as 64-bit
x86: don't allow nr_irqs > NR_IRQS
x86/docs: remove noirqbalance param docs
x86: don't use tsc_khz to calculate lpj if notsc is passed
x86, voyager: fix smp_intr_init() compile breakage
AMD IOMMU: fix detection of NP capable IOMMUs
Impact: fix 32-bit Xen guest boot crash
On 32-bit PAE, pud_page, for no good reason, didn't really return a
struct page *. Since Jan Beulich's fix "i386/PAE: fix pud_page()",
pud_page does return a struct page *.
Because PAE has 3 pagetable levels, the pud level is folded into the
pgd level, so pgd_page() is the same as pud_page(), and now returns
a struct page *. Update the xen/mmu.c code which uses pgd_page()
accordingly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit c7ffa6c262.
the assumptio of this change was that this would not break
any existing machine. Andrey Borzenkov reported troubles with
the ACPI reboot method: the system would hang on reboot, necessiating
a power cycle. Probably more systems are affected as well.
Also, there are patches queued up for v2.6.29 to disable virtualization
on emergency_restart() - which was the original motivation of
this change.
Reported-by: Andrey Borzenkov <arvidjaar@mail.ru>
Bisected-by: Andrey Borzenkov <arvidjaar@mail.ru>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: right-align /proc/meminfo consistent with other fields
When the split-LRU patches added Inactive(anon) and Inactive(file) lines
to /proc/meminfo, all counts were moved two columns rightwards to fit in.
Now move x86's DirectMap lines two columns rightwards to line up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lazy flushing needs to take care of the unmap path too which is not yet
implemented and leads to stale IO/TLB entries. This is fixed by this
patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Impact: fix rare x2apic hang
On x86, x2apic mode accesses for sending IPI's don't have serializing
semantics. If the IPI receivner refers(in lock-free fashion) to some
memory setup by the sender, the need for smp_mb() before sending the
IPI becomes critical in x2apic mode.
Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix warning message when PARAVIRT is set in config
Remove stale #ifdef components from our IRQ sizing logic.
x86/Voyager is the only holdout.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clarify menuconfig text
Mention ACPI in the top-level menu to give a clue as to where
it lives. This matches what ia64 does.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make NR_IRQS big enough for system with lots of apic/pins
If lots of IO_APIC's are there (or can be there), size the same way
as 64-bit, depending on MAX_IO_APICS and NR_CPUS.
This fixes the boot problem reported by Ben Hutchings on a 32-bit
server with 5 IO-APICs and 240 IO-APIC pins.
Signed-off-by: Yinghai <yinghai@kernel.org>
Tested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins
On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can
return a value larger than NR_IRQS. This will lead to probe_irq_on()
overrunning the irq_desc array.
I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a
Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects
5 IOAPICs and returns 240. Here are the log messages:
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24])
Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48])
Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72])
Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96])
Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119
Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Minor optimization.
Implement change_bit with immediate bit count as "lock xorb". This is
similar to "lock orb" and "lock andb" for set_bit and clear_bit
functions.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
balancing defaults on NUMA and SMP systems.
Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
why we would not want to have wakeup affinity across nodes as well.
(we already do this in the standard NUMA template.)
lat_ctx on a NUMA box is particularly happy about this change:
before:
| phoenix:~/l> ./lat_ctx -s 0 2
| "size=0k ovr=2.60
| 2 5.70
after:
| phoenix:~/l> ./lat_ctx -s 0 2
| "size=0k ovr=2.65
| 2 2.07
a 2.75x speedup.
pipe-test is similarly happy about it too:
| phoenix:~/sched-tests> ./pipe-test
| 18.26 usecs/loop.
| 14.70 usecs/loop.
| 14.38 usecs/loop.
| 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1
| 8.63 usecs/loop.
| 8.59 usecs/loop.
| 9.03 usecs/loop.
| 8.94 usecs/loop.
| 8.96 usecs/loop.
| 8.63 usecs/loop.
Also:
- disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
- enable SD_WAKE_BALANCE on SMP domains
Sysbench+postgresql improves all around the board, quite significantly:
.28-rc3-11474e2c .28-rc3-11474e2c-tune
-------------------------------------------------
1: 571 688 +17.08%
2: 1236 1206 -2.55%
4: 2381 2642 +9.89%
8: 4958 5164 +3.99%
16: 9580 9574 -0.07%
32: 7128 8118 +12.20%
64: 7342 8266 +11.18%
128: 7342 8064 +8.95%
256: 7519 7884 +4.62%
512: 7350 7731 +4.93%
-------------------------------------------------
SUM: 55412 59341 +6.62%
So it's a win both for the runup portion, the peak area and the tail.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix udelay when "notsc" boot parameter is passed
With notsc passed on commandline, tsc may not be used for
udelays, make sure that we do not use tsc_khz to calculate
the lpj value in such cases.
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
io mapping: clean up #ifdefs
io mapping: improve documentation
i915: use io-mapping interfaces instead of a variety of mapping kludges
resources: add io-mapping functions to dynamically map large device apertures
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps