There is no need to skip the instruction if the reason for a task switch
is a task gate in the IDT and access to it is caused by an external event.
The problem is currently solved only for VMX since there is no reliable
way to skip an instruction in SVM. We should emulate it instead.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will need it later in task_switch().
Code in handle_exception() is dead. is_external_interrupt(vect_info)
will always be false since idt_vectoring_info is zeroed in
vmx_complete_interrupts().
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
...with a more straightforward switch().
Also fix a bug where an NMI could be dropped on exit. This should never
happen in practice, though, since NMIs can only be injected, never triggered
internally by the guest like exceptions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Bit 12 is undefined in any of the following cases:
If the VM exit sets the valid bit in the IDT-vectoring information field.
If the VM exit is due to a double fault.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The feature test now happens too early, before vmcs_config has completed initialization.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
A pte that is shadowed when the guest EFER.NXE=1 is not valid when
EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the
shadow EFER always has NX enabled, this won't happen.
Fix by using a different shadow page table for different EFER.NXE bits. This
allows vcpus to run correctly with different values of EFER.NXE, and for
transitions on this bit to be handled correctly without requiring a full
flush.
Signed-off-by: Avi Kivity <avi@redhat.com>
Detect, indicate, and propagate page faults where reserved bits are set.
Take care to handle the different paging modes, each of which has different
sets of reserved bits.
[avi: fix pte reserved bits for efer.nxe=0]
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The original one is for the code before refactoring.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
EXIT_QUALIFICATION and GUEST_LINEAR_ADDRESS are natural width, not 64-bit.
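For illustration only (not the patch itself), natural-width fields want the
unsigned-long accessor:

        /* natural-width fields: 32-bit on i386, 64-bit on x86_64 */
        unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
        unsigned long gla = vmcs_readl(GUEST_LINEAR_ADDRESS);
        /* vmcs_read64() is only appropriate for true 64-bit fields */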
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking
whether the interrupt window is actually open.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently timer events are processed before entering guest mode. Move this
to the main vcpu event loop, since timer events should be processed even while
the vcpu is halted. A timer may cause an interrupt/NMI to be injected, and only
then will the vcpu be unhalted.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The prioritized bit vector manipulation functions are useful in both vmx and
svm.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We ignore writes to the performance counters and performance event
selector registers already. Kaspersky antivirus reads the eventsel
MSR causing it to crash with the current behaviour.
Return 0 as data when the eventsel registers are read to stop the
crash.
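A minimal sketch of the idea (the switch shape and variable are illustrative,
not the exact KVM MSR-read code):

        switch (msr) {
        case MSR_K7_EVNTSEL0:
        case MSR_K7_EVNTSEL1:
        case MSR_K7_EVNTSEL2:
        case MSR_K7_EVNTSEL3:
                /* performance counters are not virtualized: report 0 */
                data = 0;
                break;
        }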
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
free_mmu_pages() should only undo what alloc_mmu_pages() does.
Free mmu pages from the generic VM destruction function, kvm_destroy_vm().
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After discussion with Marcelo, we decided to rework the device assignment
framework together. The old problem is that the kernel logic is unnecessarily
complex, so Marcelo suggested splitting it up in a more elegant way:
1. Split host IRQ assignment and guest IRQ assignment, and let userspace
determine the combination. Also discard the msi2intx parameter; userspace can
specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx conversion.
2. Split assigning and deassigning an IRQ. Introduce two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
This patch also fixes the reversed _IOR vs _IOW in the definition (by
deprecating the old interface).
[avi: replace homemade bitcount() by hweight_long()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix these sparse warnings:
arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not declared. Should it be static?
arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
The new way does not require additional loop over vcpus to calculate
the one with lowest priority as one is chosen during delivery bitmap
construction.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is no reason to update the shadow pte here because the guest pte
is only changed to dirty state.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Hide the internals of vcpu awakening / injection from the in-kernel
emulated timers. This makes future changes in this logic easier and
decreases the distance to more generic timer handling.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We can infer elapsed time from hrtimer_expires_remaining.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Skip the test which checks if the PIT is properly routed when
using the IOAPIC, aimed at buggy hardware.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64)
with PREEMPT enabled.
We're getting syslog warnings like this many (but not all) times qemu
tells KVM to run the VCPU:
BUG: using smp_processor_id() in preemptible [00000000] code:
qemu-system-x86/28938
caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
Call Trace:
debug_smp_processor_id+0xf7/0x100
kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
? __wake_up+0x4e/0x70
? wake_futex+0x27/0x40
kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
enqueue_hrtimer+0x8a/0x110
_spin_unlock_irqrestore+0x27/0x50
vfs_ioctl+0x31/0xa0
do_vfs_ioctl+0x74/0x480
sys_futex+0xb4/0x140
sys_ioctl+0x99/0xa0
system_call_fastpath+0x16/0x1b
As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptable kernels.
Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().
[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
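The resulting pattern is roughly the following (sketch only; cpu_tsc_khz is
assumed here to be the per-CPU variable being read):

        unsigned long this_tsc_khz;

        preempt_disable();
        /* no migration while reading per-CPU state via smp_processor_id() */
        this_tsc_khz = __get_cpu_var(cpu_tsc_khz);
        preempt_enable();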
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch finally enables MSI-X.
What we need for MSI-X:
1. Intercept one page in the MMIO region of the device, so that we can get the
guest's desired MSI-X table and set up the real one. This is now done by the
guest and transferred to the kernel using the ioctls KVM_SET_MSIX_NR and
KVM_SET_MSIX_ENTRY.
2. Information for the incoming interrupt. One device can now have more than
one interrupt, and they are all handled by one workqueue structure, so we need
to identify them. The previous patch enabled gsi_msg_pending_bitmap to get this done.
3. A mapping from host IRQ to guest gsi, as well as from guest gsi to the real
MSI/MSI-X message address/data. We use the same entry number for host and guest
here, so it is easy to find the correlated guest gsi.
What we lack for now:
1. The PCI spec says nothing else can exist in the same page of the MMIO region
as the MSI-X table, except the pending bits. The patch ignores the pending bits
as a first step (so they are always 0 - no pending).
2. The PCI spec allows the MSI-X table to be changed dynamically. That means the
OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The
patch doesn't support this, and Linux doesn't work this way either.
3. The patch doesn't implement MSI-X mask-all or mask-single-entry. I would
implement the former in drivers/pci/msi.c later; and for single entry, userspace
should be responsible for handling it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It will also be convenient when we extend the number of vcpus KVM supports in the future.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is
increased.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Windows 2008 accesses this MSR often on context switch intensive workloads;
since we run in guest context with the guest MSR value loaded (so swapgs can
work correctly), we can simply disable interception of rdmsr/wrmsr for this
MSR.
A complication occurs since in legacy mode, we run with the host MSR value
loaded. In this case we enable interception. This means we need two MSR
bitmaps, one for legacy mode and one for long mode.
Signed-off-by: Avi Kivity <avi@redhat.com>
Correct some event and UMASK values according to Intel SDM,
in the Nehalem and Atom tables.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090609131553.GA12489@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Booting a 32-bit kernel on Magny-Cours results in the following panic:
...
Using APIC driver default
...
Overriding APIC driver with bigsmp
...
Getting VERSION: 80050010
Getting VERSION: 80050010
Getting ID: 10000000
Getting ID: ef000000
Getting LVT0: 700
Getting LVT1: 10000
Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0)
Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2
Call Trace:
[<c05194da>] ? panic+0x38/0xd3
[<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f
[<c073b19d>] ? kernel_init+0x3e/0x141
[<c073b15f>] ? kernel_init+0x0/0x141
[<c020325f>] ? kernel_thread_helper+0x7/0x10
The reason is that default_get_apic_id() handled the extension of the
local APIC ID field only in the XAPIC case.
Thus for this AMD CPU, default_get_apic_id() returns 0 while
bigsmp_get_apic_id() returns 16, which leads to the kernel panic above.
This patch introduces a Linux-specific feature flag to indicate
support for the extended APIC ID (8 bits instead of 4 bits wide) and
sets the flag on AMD CPUs if applicable.
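For illustration, the patched ID extraction might look roughly like this (a
sketch modeled on default_get_apic_id(); the feature-flag name is an
assumption, not quoted from the patch):

        static inline unsigned int read_apic_id_8bit(unsigned long x)
        {
                unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));

                /* XAPIC (version >= 0x14) or the new AMD flag: full 8-bit ID */
                if (APIC_XAPIC(ver) || boot_cpu_has(X86_FEATURE_EXTD_APICID))
                        return (x >> 24) & 0xFF;

                return (x >> 24) & 0x0F;        /* legacy: only 4 bits */
        }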
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608135509.GA12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
That prefix is already included in the DUMP_printk macro. So there is no
need to repeat it in the format string.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This fixes a bug with a device that could not be assigned to a KVM guest
because it is still assigned to a dma_ops protection domain.
[chrisw: simply remove WARN_ON(), will always fire since dev->driver
will be pci-stub]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Handling this event causes device assignment in KVM to fail because the
device gets re-attached as soon as the pci-stub registers as the driver
for the device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Fill in amd_hw_cache_event_id[] with the AMD CPU specific events,
for family 0x0f, 0x10 and 0x11.
There's apparently no distinction between load and store events, so
we only fill in the load events.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Using gcc 3.3.5, a "make allmodconfig" + "CONFIG_KVM=n" build
triggers this error:
arch/x86/mm/built-in.o(.init.text+0x43f7): In function `__change_page_attr':
arch/x86/mm/pageattr.c:114: undefined reference to `__udivdi3'
make: *** [.tmp_vmlinux1] Error 1
The culprit turned out to be a division in arch/x86/mm/memtest.c
For more info see this thread:
http://marc.info/?l=linux-kernel&m=124416232620683
The patch entirely removes the division that caused the build
error.
[ Impact: build fix with certain GCC versions ]
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: xiyou.wangcong@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <20090608170939.GB12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix bug in the SGI UV macros that support systems with multiple
coherency domains. The macros used for referencing global MMR
(chipset registers) are failing to correctly "or" the NASID
(node identifier) bits that reside above M+N. These high bits
are supplied automatically by the chipset for memory accesses
coming from the processor socket.
However, the bits must be present for references to the special
global MMR space used to map chipset registers. (See uv_hub.h
for more details ...)
The bug results in references to invalid/incorrect nodes.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608154405.GA16395@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Standardize and tidy up all the messages we print during
perfcounter initialization.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fill in core2_hw_cache_event_id[] with the Atom model specific events.
The events can be used in all the tools via the -e (--event) parameter,
for example "-e l1-misses" or "-e l2-accesses" or "-e l2-write-misses".
( Note: these are straight from the Intel manuals - not tested yet.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fill in core2_hw_cache_event_id[] with the Core2 model specific events.
The events can be used in all the tools via the -e (--event) parameter,
for example "-e l1-misses" or "-e l2-accesses" or "-e l2-write-misses".
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
vfree() does its own 'NULL' check, so no need for check before
calling it.
In v2, remove the stray newline.
[ Impact: cleanup ]
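The shape of the cleanup, for illustration (buf is a hypothetical pointer):

        /* before: redundant NULL check */
        if (buf)
                vfree(buf);

        /* after: vfree(NULL) is a no-op, so call it unconditionally */
        vfree(buf);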
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
LKML-Reference: <1244385036.3402.11.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().
This is only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
are enabled.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar reported that read_apic is buggy nowadays:
[ 0.000000] Using APIC driver default
[ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic"
[ 0.000000] APIC: disable apic facility
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b()
[ 0.000000] Hardware name: HP OmniBook PC
Indeed, we still rely on the apic->read operation for SMP-compiled
kernels. Instead of disfiguring the SMP code with #ifdefs, we allow
apic->read to be called. To capture any unexpected results we check,
via WARN_ON_ONCE, that apic->read is called for a sane reason - but(!)
instead of an OR we should use an AND logical operation (thanks to
Yinghai for spotting the root of the problem).
Along with that, we could have a bad MP table, and we fix it in such a
way that SMP is not started and there are no complaints about a BIOS
bug if the APIC was just disabled via the command line.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090607124840.GD4547@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so
the same quirk is needed.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve Conklin <steve.conklin@canonical.com>
Cc: Leann Ogasawara <leann.ogasawara@canonical.com>
Cc: <stable@kernel.org>
LKML-Reference: <200906051202.38311.jdelvare@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove model information, encoding/decoding and reduce bookkeeping.
This, besides removing a lot of code and cleaning it up, also
enables these features on many more CPUs than were enumerated before.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1244224637.8212.6.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: This branch was on an -rc5 base so pull almost-2.6.30
to resync with the latest upstream fixes and make sure
the combination works fine.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/pci: fix mmconfig detection with 32bit near 4g
PCI: use fixed-up device class when configuring device
Extend generic event enumeration with the PERF_TYPE_HW_CACHE
method.
This is a 3-dimensional space:
{ L1-D, L1-I, L2, ITLB, DTLB, BPU } x
{ load, store, prefetch } x
{ accesses, misses }
User-space passes in the 3 coordinates and the kernel provides
a counter. (if the hardware supports that type and if the
combination makes sense.)
Combinations that make no sense produce a -EINVAL.
Combinations that are not supported by the hardware produce -ENOTSUP.
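Conceptually, the three coordinates are packed into a single config value
along these lines (enum names and bit positions here are assumptions, shown
only to illustrate the encoding):

        /* one possible encoding of { unit, op, result } into one config: */
        __u64 config = PERF_COUNT_HW_CACHE_L1D                  /* which cache  */
                     | (PERF_COUNT_HW_CACHE_OP_READ     <<  8)  /* which access */
                     | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16); /* hit or miss  */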
Extend the tools to deal with this, and rewrite the event symbol
parsing code with various popular aliases for the units and
access methods above. So 'l1-cache-miss' and 'l1d-read-ops' are
both valid aliases.
( x86 is supported for now, with the Nehalem event table filled in,
and with Core2 and Atom having placeholder tables. )
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Counter type is a frequently used value and we do a lot of
bit juggling by encoding and decoding it from attr->config.
Clean this up by creating a separate attr->type field.
Also clean up the various similarly complex user-space bits
all around counter attribute management.
The net improvement is significant, and it will be easier
to add a new major type (which is what triggered this cleanup).
(This changes the ABI, all tools are adapted.)
(PowerPC build-tested.)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current code to set up the GART as an IOMMU enables GART
translations before it removes the aperture from the kernel memory
map, sets the GART PTEs to UC, sets up the guard and scratch
pages, or does a wbinvd(). This leaves the possibility of cache
aliasing open and can cause system crashes.
Re-order the code so as to enable the GART translations only
after all safeguards are in place and the tlb has been flushed.
AMD has tested this patch on both Istanbul systems and 1st
generation Opteron systems with AGP enabled and seen no adverse
effects. Istanbul systems with HT Assist enabled sometimes
see MCE errors due to cache artifacts with the unmodified
code.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Cc: <stable@kernel.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: akpm@linux-foundation.org
Cc: jbarnes@virtuousgeek.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit:
0e64a0c982
[CPUFREQ] checkpatch cleanups for powernow-k8
Fix based on an original patch from Naga Chumbalkar.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
In order to make arch_vma_name() work from inside
install_special_mapping() we need to set the context.vdso
before calling it.
( This is needed for performance counters to be able to track
this special executable area. )
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We don't set up the canary; let's disable stack protector on boot.c so
we can get into lguest_init, then set it up. As a side effect,
switch_to_new_gdt() sets up %fs for us properly too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pascal reported and bisected a commit:
| x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
which broke one system.
ACPI: Using IOAPIC for interrupt routing
PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 255
PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
PCI: Using MMCONFIG for extended config space
it didn't have
PCI: updated MCFG configuration 0: base f0000000 segment 0 buses 0 - 63
anymore, and tried to use 0xf0000000 - 0xffffffff for mmconfig.
For 32-bit, mcfg_res->end could be 32-bit only (if 64-bit resources aren't used),
so use end - 1 to pass the value in mcfg->end to avoid overflow.
We don't need to worry about the e820 path; those values are always 64-bit.
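A toy illustration of the wraparound (not the PCI code itself):

        u32 base = 0xf0000000;
        u32 size = 256 << 20;                   /* 256 buses x 1 MB          */
        u32 end_exclusive = base + size;        /* wraps to 0 in 32 bits     */
        u32 end_inclusive = base + size - 1;    /* 0xffffffff, representable */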
Reported-by: Pascal Terjan <pterjan@mandriva.com>
Bisected-by: Pascal Terjan <pterjan@mandriva.com>
Tested-by: Pascal Terjan <pterjan@mandriva.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Throttling logic is broken and we can lock up with too small
hw sampling intervals.
Make the throttling code more robust: disable counters even
if we already disabled them.
( Also clean up whitespace damage I noticed while reading
various pieces of code related to throttling. )
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The UV tlb shootdown code has a serious initialization error.
An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.
This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.
Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.
Tested on the UV simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In alloc_coherent there is an omitted unlock on the path where mapping
fails. Add the unlock.
[ Impact: fix lock imbalance in alloc_coherent ]
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Remove the IRQ (non-NMI) handling bits as NMI will be used always.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
git status complains of untracked (generated) files in arch/x86/boot..
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# ../../arch/x86/boot/compressed/mkpiggy
# ../../arch/x86/boot/compressed/piggy.S
# ../../arch/x86/boot/compressed/vmlinux.lds
# ../../arch/x86/boot/voffset.h
# ../../arch/x86/boot/zoffset.h
..so adjust .gitignore files accordingly.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The structure isn't hw only and when I read event, I think about those
things that fall out the other end. Rename the thing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do as Power already does, emulate sample periods up to 2^63-1 by
composing them of smaller values limited by hardware capabilities.
Only once we wrap the software period do we generate an overflow
event.
Just 10 lines of new code.
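The core of the idea, as a sketch (names are hypothetical; this would run in
the overflow handler after the events counted by the hardware have been
subtracted from period_left):

        static u64 next_hw_period(s64 *period_left, u64 sample_period,
                                  u64 hw_max, int *overflow)
        {
                *overflow = 0;
                if (*period_left <= 0) {        /* software period elapsed */
                        *period_left += sample_period;
                        *overflow = 1;          /* report this one          */
                }
                /* program only what the hardware can count in one go */
                return min_t(u64, (u64)*period_left, hw_max);
        }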
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
IRQ (non-NMI) sampling is not used anymore - remove the last few bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A few renames:
s/irq_period/sample_period/
s/irq_freq/sample_freq/
s/PERF_RECORD_/PERF_SAMPLE_/
s/record_type/sample_type/
And change both the new sample_type and read_format to u64.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the fact that the IOAPIC version number in the x86_64 code path always
gets assigned to 0, instead of the correct value.
Before the patch: (from "dmesg" output):
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23 <---
After the patch:
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 <---
History:
io_apic_get_version() was compiled out of the x86_64 code path in the commit
f2c2cca3ac:
Author: Andi Kleen <ak@suse.de>
Date: Tue Sep 26 10:52:37 2006 +0200
[PATCH] Remove APIC version/cpu capability mpparse checking/printing
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out. Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.
Remove all this code because it's not needed.
Cc: len.brown@intel.com
At the time, the IOAPIC version number was deliberately not printed
in the x86_64 code path. However, after the x86 and x86_64 files were
merged, the net result is that the IOAPIC version is printed incorrectly
in the x86_64 code path.
The patch below provides a fix. I have tested it with acpi, and with
acpi=off, and did not see any problems.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: irq/numa didn't build because this commit:
2759c32: x86: don't call read_apic_id if !cpu_has_apic
Had a dependency on x86/cpufeature changes. Pull in that
(small) branch to fix the dependency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/mips/sibyte/bcm1480/irq.c
arch/mips/sibyte/sb1250/irq.c
Merge reason: we gathered a few conflicts plus update to latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
based - to pick up the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
On x86 and x86-64, it is possible that page tables are shared between
shared mappings backed by hugetlbfs. As part of this,
page_table_shareable() checks a pair of vma->vm_flags and they must match
if they are to be shared. All VMA flags are taken into account, including
VM_LOCKED.
The problem is that VM_LOCKED is cleared on fork(). When a process with a
shared memory segment forks() to exec() a helper, there will be shared
VMAs with different flags. The impact is that the shared segment is
sometimes considered shareable and other times not, depending on what
process is checking.
What happens is that the segment page tables are being shared but the
count is inaccurate depending on the ordering of events. As the page
tables are freed with put_page(), bad pmd's are found when some of the
children exit. The hugepage counters also get corrupted and the Total and
Free count will no longer match even when all the hugepage-backed regions
are freed. This requires a reboot of the machine to "fix".
This patch addresses the problem by comparing all flags except VM_LOCKED
when deciding if page tables should be shared or not for hugetlbfs-backed
mappings.
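The comparison change amounts to masking the flag out on both sides, e.g.
(variable names illustrative):

        unsigned long vm_flags  = vma->vm_flags  & ~VM_LOCKED;
        unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED;

        if (vm_flags != svm_flags)
                return 0;       /* page tables not shareable */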
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <starlight@binnacle.cx>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Always use NMI for the performance-monitoring interrupt, as there could be
racy situations if we switch between irq and nmi mode frequently.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
LKML-Reference: <20090529052835.GA13657@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This will test the automatic aperture enlargement code. This is
important because only very few devices will ever trigger this code
path. So force it under CONFIG_IOMMU_STRESS.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Disabling the round-robin allocator results in reusing the same
dma-addresses again very quickly. This is a good test of whether the iotlb
flushing is working correctly.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch makes sure no reserved addresses are allocated in a dma_ops
domain when the aperture is increased dynamically.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Simplify the code a little bit by using the same unit for all address
space related state in the dma_ops domain structure.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch changes the AMD IOMMU address allocator to allow up to 32
aperture ranges per dma_ops domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The code will be required when the aperture size increases dynamically
in the extended address allocator.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch makes sure that no function required for suspend/resume of
AMD IOMMU driver is thrown away after boot.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Current hardware uses MSI instead of MSI-X, so this code is not necessary
and cannot be tested. The best thing is to drop this code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch restructures the AMD IOMMU initialization code to initialize
all hardware registers with one single function call.
This is helpful for suspend/resume support.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces the for_each_iommu and for_each_iommu_safe macros
to simplify the developer's life when having to iterate over all AMD
IOMMUs in the system.
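A plausible shape for these helpers (assuming the driver keeps its IOMMUs on
a global amd_iommu_list):

        #define for_each_iommu(iommu) \
                list_for_each_entry((iommu), &amd_iommu_list, list)

        #define for_each_iommu_safe(iommu, next) \
                list_for_each_entry_safe((iommu), (next), &amd_iommu_list, list)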
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Some drivers may use the dma api during ->remove which will
cause a protection domain to get reattached to a device. Delay the
detach until after the driver is completely unbound.
[ joro: added a little merge helper ]
[ Impact: fix too early device<->domain removal ]
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The bug never triggered. But it should be fixed to protect against
broken ACPI tables in the future.
[ Impact: protect against broken ivrs acpi table ]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The devid parameter to set_dev_entry_from_acpi is the requester ID
rather than the device ID since it is used to index the IOMMU device
table. The handling of IVHD_DEV_ALIAS used to pass the device ID.
This patch fixes it to pass the requester ID.
[ Impact: fix setting the wrong req-id in acpi-table parsing ]
Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The variable amd_iommu_last_bdf holds the maximum bdf of any device
controlled by an IOMMU, so the number of device entries needed is
amd_iommu_last_bdf+1. The function tbl_size used amd_iommu_last_bdf
instead. This would be a problem if the last device were a large
enough power of 2.
[ Impact: fix amd_iommu_last_bdf off-by-one error ]
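A simplified sketch of the corrected sizing (mirrors the description above,
not necessarily the exact driver code):

        static inline unsigned long tbl_size(int entry_size)
        {
                /* device ids run 0..amd_iommu_last_bdf inclusive */
                unsigned shift = PAGE_SHIFT +
                        get_order(((int)amd_iommu_last_bdf + 1) * entry_size);

                return 1UL << shift;
        }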
Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This Kconfig option is intended to enable various code paths or
parameters in IOMMU implementations to stress test the code and/or the
hardware. This can also be done by disabling optimizations in the code
when this option is switched on.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Add information about device memory mapping requirements for the IOMMU
as described in the IVRS ACPI table to the kernel log if amd_iommu_dump
was specified on the kernel command line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add information about devices belonging to an IOMMU as described in the
IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the
kernel command line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add information about IOMMU devices described in the IVRS ACPI table to
the kernel log if amd_iommu_dump was specified on the kernel command
line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This kernel parameter will be useful to get some AMD IOMMU related
information in dmesg that is not necessary for the default user but may
be helpful in debug situations.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The *fence instructions were moved to vsyscall_64.c by commit
cb9e35dce9. But this breaks the
vDSO, because vread methods are also called from there.
Besides, the synchronization might be unnecessary for other
time sources than TSC.
[ Impact: fix potential time warp in VDSO ]
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
instead of declaring one variant as an inline function...
because the other case is a variable
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4A13B344.7030307@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: avoid back to back on_each_cpu in cpa_flush_array
x86, relocs: ignore R_386_NONE in kernel relocation entries
Cleanup cpa_flush_array() to avoid back to back on_each_cpu() calls.
[ Impact: optimizes fix 0af48f42df ]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h.
Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but
according to AMD family 11h BKDG this frequency is just a rounded value:
"CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid]
rounded to the nearest 100 Mhz."
As a consequence, powernow-k8 reports a wrong CPU frequency on some systems,
e.g. on Turion X2 Ultra:
powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82
processors (2 cpu cores) (version 2.20.00)
powernow-k8: 0 : pstate 0 (2200 MHz)
powernow-k8: 1 : pstate 1 (1100 MHz)
powernow-k8: 2 : pstate 2 (600 MHz)
But this is wrong, as the frequency for Pstate2 is 550 MHz. x86info reports
it correctly:
#x86info -a |grep Pstate
...
Pstate-0: fid=e, did=0, vid=24 (2200MHz)
Pstate-1: fid=e, did=1, vid=30 (1100MHz)
Pstate-2: fid=e, did=2, vid=3c (550MHz) (current)
The solution is to determine the frequency directly from the Pstate MSRs
instead of using the rounded values from the ACPI table.
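For family 11h (the CPU in the example above) the BKDG gives
CoreCOF = 100 MHz * (CpuFid + 08h) / (2^CpuDid); a sketch of the conversion
(bit positions are assumptions to be checked against the BKDG, and family 10h
uses a different CpuFid offset):

        static u32 fam11h_pstate_khz(u64 pstate_msr)
        {
                u32 fid = pstate_msr & 0x3f;            /* CpuFid, bits 5:0 */
                u32 did = (pstate_msr >> 6) & 0x7;      /* CpuDid, bits 8:6 */

                return (100000 * (fid + 0x08)) >> did;  /* kHz */
        }

        /* fid=0xe, did=2 from the x86info dump above:
           2200000 >> 2 = 550000 kHz = 550 MHz */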
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
- Make the message shorter and easier to grep for
- Use printk_once instead of WARN_ONCE (functionality of these was mixed)
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Some atom procs don't do freq scaling (such as the atom 330 on my own
littlefalls2 board). By adding the atom family here, we at least get
the benefit of passive cooling in a thermal emergency. Not sure how
to see that it's actually helping any, but the driver does bind and
claims it's functioning on my atom 330.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
We have a debug check that detects stuck NMIs and returns with
the PMU disabled in the global ctrl MSR - but I managed to trigger
a situation where this was not enough to deassert the NMI.
So clear/reset the full PMU and keep the disable count balanced when
exiting from here. This way the box produces a debug warning but
stays up and is more debuggable.
[ Impact: in case of PMU related bugs, recover more gracefully ]
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
My Nehalem box locks up in certain situations (with an
always-asserted NMI causing a lockup) if the PMU LVT
entry is programmed between NMI and IRQ mode with a
high frequency.
Standardize exclusively on NMIs instead.
[ Impact: fix lockup ]
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For relocatable 32bit kernels, boot/compressed/relocs.c processes
relocation entries in the kernel image and appends them to the kernel
image such that boot/compressed/head_32.S can relocate the kernel.
The kernel image is one statically linked object and only uses two
relocation types - R_386_PC32 and R_386_32; of the two, only the latter
needs massaging during kernel relocation and is thus handled by relocs.
R_386_PC32 is ignored and all other relocation types are considered
errors.
When the target of a relocation resides in a discarded section,
binutils doesn't throw away the relocation record but nullifies it by
changing it to R_386_NONE, which unfortunately makes relocs fail.
The problem was triggered by as-yet out-of-tree x86 stack unwind patches,
but given the binutils behavior, ignoring R_386_NONE is the right
thing to do.
The problem has been tracked down to binutils behavior by Jan Beulich.
[ Impact: fix build with certain binutils by ignoring R_386_NONE ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4A1B8150.40702@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'kvm-updates/2.6.30' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix PDPTR reloading on CR4 writes
KVM: Make paravirt tlb flush also reload the PAE PDPTRs
This reverts commit b68f1d2e7a.
It is causing problems (stuck/stuttering profiling) when mixed
NMI and non-NMI counters are used.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.703093461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a generic per-counter interrupt throttle.
This uses the perf_counter_overflow() quick disable to throttle a specific
counter when it's going too fast, provided a pmu->unthrottle() method is
implemented which can undo the quick disable.
Power needs to implement both the quick disable and the unthrottle method.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.703093461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the x86-specific interrupt throttle.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.616671838@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Expose the INV and EDGE bits of the PMU to raw configs.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.494709027@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The processor is documented to reload the PDPTRs while in PAE mode if any
of the CR4 bits PSE, PGE, or PAE change. Linux relies on this
behaviour when zapping the low mappings of PAE kernels during boot.
The code already handled changes to CR4.PAE; augment it to also notice changes
to PSE and PGE.
This triggered while booting an F11 PAE kernel; the futex initialization code
runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
ended up uninitialized, killing PI futexes and pulseaudio which uses them.
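The check boils down to watching all three bits, roughly (sketch only):

        static bool cr4_needs_pdptr_reload(unsigned long old_cr4,
                                           unsigned long new_cr4)
        {
                unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;

                /* any change to PAE, PSE or PGE while in PAE paging
                   must reload the PDPTRs from guest memory */
                return (old_cr4 ^ new_cr4) & pdptr_bits;
        }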
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
The paravirt tlb flush may be used not only to flush TLBs, but also
to reload the four page-directory-pointer-table entries, as it is used
as a replacement for reloading CR3. Change the code to do the entire
CR3 reloading dance instead of simply flushing the TLB.
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
The remap percpu allocator has a subtle bug when combined with page
attribute changing. The remap percpu allocator aliases PMD pages for the
first chunk, and as pageattr doesn't know about the alias it ends up
updating the page attributes of the original mapping only, thus leaving
the aliases in an inconsistent state, which might lead to subtle data
corruption. Please read the following thread for more information:
http://thread.gmane.org/gmane.linux.kernel/835783
The following is the proposed fix which teaches pageattr about percpu
aliases.
http://thread.gmane.org/gmane.linux.kernel/837157
However, the above changes are deemed too pervasive for upstream
inclusion for 2.6.30 release, so this patch essentially disables
the remap allocator for the time being.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4A1A0A27.4050301@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpa_flush_array() seems to prefer wbinvd() over clflush at the 4M threshold.
clflush needs to be done on only one CPU, as per the instruction definition.
wbinvd(), however, should be done on all CPUs.
[ Impact: fix missing flush which could cause data corruption ]
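A sketch of the broadcast pattern (illustrative; the surrounding cpa code is
not shown):

        static void do_wbinvd(void *unused)
        {
                wbinvd();       /* only flushes the executing CPU */
        }

        /* clflush is coherent, one CPU suffices; wbinvd must hit every CPU: */
        on_each_cpu(do_wbinvd, NULL, 1);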
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
wbinvd is supported on all CPUs 486 or later. But pageattr.c checks
x86_model >= 4 before wbinvd(), which looks like an oversight bug. It was
first introduced in one place by changeset
d7c8f21a8c and got copied over to a second
place in the same file later.
[ Impact: fix missing cache flush on early-model CPUs, potential data corruption ]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor
capabilities. Useful for debugging and working around xsave related issues.
[ Impact: make it possible to debug problems in the field ]
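One way such a parameter is typically wired up (a sketch; the handler name
is made up):

        static int __init x86_noxsave_setup(char *s)
        {
                setup_clear_cpu_cap(X86_FEATURE_XSAVE);
                return 1;
        }
        __setup("noxsave", x86_noxsave_setup);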
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove ACPI 3 E820 extended memory attributes support. At least one
vendor actively set all the flags to zero, but left ECX on return at
24. This bug may be present in other BIOSes.
The breakage functionally means the ACPI 3 flags are probably
completely useless, and that no OS any time soon is going to rely on
their existence. Therefore, drop support completely. We may want to
revisit this question in the future, if we find ourselves actually
needing the flags.
This reverts all or part of the following checkins:
cd670599b7c549e71d07
However, retain the part from the latter commit that copies e820 into
a temporary buffer; that is an unrelated BIOS workaround. Put in a
comment to explain that part.
See https://bugzilla.redhat.com/show_bug.cgi?id=499396 for some
additional information.
[ Impact: detect all memory on affected machines ]
Reported-by: Thomas J. Baker <tjb@unh.edu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Len Brown <len.brown@intel.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Kyle McMartin <kmcmartin@redhat.com>
Cc: Matt Domsch <matt_domsch@dell.com>