kernel_optimize_test

Author	SHA1	Message	Date
Peter Zijlstra	7c423e9885	sched: x86: Name old_perf in a unique way Silly percpu bits don't respect static.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission>	2009-09-16 11:21:07 +02:00
Thomas Gleixner	54e2603f1a	x86: platform: Fix section annotations init_IRQ() and x86_late_time_init() are missing __init annotations. The x86 platform ops variables are annotated, but the annotation needs to be put between the variable name and the "=" of the initializer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-09-16 10:21:10 +02:00
Feng Tang	3834f47291	SFI: remove unneeded includes Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-09-15 15:08:40 -04:00
Thomas Gleixner	8079ce34f2	sfi: Remove unused code Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-09-15 15:04:57 -04:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Linus Torvalds	227423904c	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Fix cacheflush address in change_page_attr_set_clr() mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set x86: Fix earlyprintk=dbgp for machines without NX x86, pat: Sanity check remap_pfn_range for RAM region x86, pat: Lookup the protection from memtype list on vm_insert_pfn() x86, pat: Add lookup_memtype to get the current memtype of a paddr x86, pat: Use page flags to track memtypes of RAM pages x86, pat: Generalize the use of page flag PG_uncached x86, pat: Add rbtree to do quick lookup in memtype tracking x86, pat: Add PAT reserve free to io_mapping* APIs x86, pat: New i/f for driver to request memtype for IO regions x86, pat: ioremap to follow same PAT restrictions as other PAT users x86, pat: Keep identity maps consistent with mmaps even when pat_disabled x86, mtrr: make mtrr_aps_delayed_init static bool x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled x86: Fix an incorrect argument of reserve_bootmem() x86: Fix system crash when loading with "reservetop" parameter	2009-09-15 09:19:38 -07:00
Linus Torvalds	1aaf2e5913	Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, intel_txt: clean up the impact on generic code, unbreak non-x86 x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE x86, intel_txt: Fix typos in Kconfig help x86, intel_txt: Factor out the code for S3 setup x86, intel_txt: tboot.c needs <asm/fixmap.h> intel_txt: Force IOMMU on for Intel TXT launch x86, intel_txt: Intel TXT Sx shutdown support x86, intel_txt: Intel TXT reboot/halt shutdown support x86, intel_txt: Intel TXT boot support	2009-09-15 09:19:20 -07:00
Linus Torvalds	66a4fe0cb8	Merge branch 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6 * 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6: agp/intel: remove restore in resume agp: fix uninorth build intel-agp: Set dma mask for i915 agp: kill phys_to_gart() and gart_to_phys() intel-agp: fix sglist allocation to avoid vmalloc() intel-agp: Move repeated sglist free into separate function agp: Switch agp_{un,}map_page() to take struct page * argument agp: tidy up handling of scratch pages w.r.t. DMA API intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU agp: Add generic support for graphics dma remapping agp: Switch mask_memory() method to take address argument again, not page	2009-09-15 09:18:07 -07:00
Peter Zijlstra	47fe38fcff	x86: sched: Provide arch implementations using aperf/mperf APERF/MPERF support for cpu_power. APERF/MPERF is arch defined to be a relative scale of work capacity per logical cpu, this is assumed to include SMT and Turbo mode. APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blimp when that happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blimp - the other existing user does too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:27 +02:00
Peter Zijlstra	5cbc19a983	x86: Add generic aperf/mperf code Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:26 +02:00
Peter Zijlstra	a8303aaf2b	x86: Move APERF/MPERF into a X86_FEATURE Move the APERFMPERF capacility into a X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:25 +02:00
Peter Zijlstra	b8a543ea5a	sched: Reduce forkexec_idx If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:23 +02:00
Mike Galbraith	0ec9fab3d1	sched: Improve latencies and throughput Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:16 +02:00
Peter Zijlstra	78e7ed53c9	sched: Tweak wake_idx When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:07 +02:00
Peter Zijlstra	c88d591089	sched: Merge select_task_rq_fair() and sched_balance_self() The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:05 +02:00
Linus Torvalds	f86054c245	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits) at_hdmac: Rework suspend_late()/resume_early() PM: Reset transition_started at dpm_resume_noirq PM: Update kerneldoc comments in drivers/base/power/main.c PM: Add convenience macro to make switching to dev_pm_ops less error-prone hp-wmi: Switch driver to dev_pm_ops floppy: Switch driver to dev_pm_ops PM: Trivial fixes PM / Hibernate / Memory hotplug: Always use for_each_populated_zone() PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2) PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2) PM/Hibernate: Rework shrinking of memory PM: Fix typo in label name s/Platofrm_finish/Platform_finish/ PM: Run-time PM platform device bus support PM: Introduce core framework for run-time PM of I/O devices (rev. 17) Driver Core: Make PM operations a const pointer PM: Remove platform device suspend_late()/resume_early() V2 USB: Rework musb suspend()/resume_early() I2C: Rework i2c-s3c2410 suspend_late()/resume() V2 I2C: Rework i2c-pxa suspend_late()/resume_early() DMA: Rework txx9dmac suspend_late()/resume_early() ... Fix trivial conflict in drivers/base/platform.c (due to same constification patch being merged in both sides, along with some other PM work in the PM branch)	2009-09-14 20:03:54 -07:00
Linus Torvalds	69def9f05d	Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits) MAINTAINERS: update KVM entry KVM: correct error-handling code KVM: fix compile warnings on s390 KVM: VMX: Check cpl before emulating debug register access KVM: fix misreporting of coalesced interrupts by kvm tracer KVM: x86: drop duplicate kvm_flush_remote_tlb calls KVM: VMX: call vmx_load_host_state() only if msr is cached KVM: VMX: Conditionally reload debug register 6 KVM: Use thread debug register storage instead of kvm specific data KVM guest: do not batch pte updates from interrupt context KVM: Fix coalesced interrupt reporting in IOAPIC KVM guest: fix bogus wallclock physical address calculation KVM: VMX: Fix cr8 exiting control clobbering by EPT KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp KVM: Document KVM_CAP_IRQCHIP KVM: Protect update_cr8_intercept() when running without an apic KVM: VMX: Fix EPT with WP bit change during paging KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors KVM: x86 emulator: Add adc and sbb missing decoder flags KVM: Add missing #include ...	2009-09-14 17:43:43 -07:00
Linus Torvalds	f65ac45e20	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: x86, mce: do not compile mcelog message on AMD EDAC, AMD: decode FR MCEs EDAC, AMD: decode load store MCEs EDAC, AMD: decode bus unit MCEs EDAC, AMD: decode instruction cache MCEs EDAC, AMD: decode data cache MCEs EDAC, AMD: carve out decoding of MCi_STATUS ErrorCode EDAC, AMD: carve out MCi_STATUS decoding x86, mce: pass mce info to EDAC for decoding amd64_edac: cleanup amd64_decode_bus_error amd64_edac: remove memory and GART TLB error decoders amd64_edac: cleanup/complete NB MCE decoding amd64_edac: cleanup amd64_process_error_info EDAC: beef up ErrorCodeExt error signatures EDAC: move MCE error descriptions to EDAC core	2009-09-14 17:38:38 -07:00
Andi Kleen	e34e77ce34	x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c when CONFIG_DEBUG_FS is disabled, introduced in commit `5be9ed251f`. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-14 12:01:04 -07:00
Rafael J. Wysocki	ac8d513a68	Merge branch 'master' into for-linus	2009-09-14 20:26:05 +02:00
Linus Torvalds	b8cb48aae1	Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: split __phys_addr out into separate file xen: use stronger barrier after unlocking lock xen: only enable interrupts while actually blocking for spinlock xen: make -fstack-protector work under Xen	2009-09-14 10:23:49 -07:00
Borislav Petkov	22223c9b41	x86, mce: do not compile mcelog message on AMD Now that decoding is done in-kernel, suppress mcelog message part. CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:41 +02:00
Borislav Petkov	549d042df2	x86, mce: pass mce info to EDAC for decoding Move NB decoder along with required defines to EDAC MCE core. Add registration routines for further decoding of the MCE info in the AMD64 EDAC module. CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:59:17 +02:00
Linus Torvalds	0cc6d77e55	Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, e820: Guard against array overflowed in __e820_add_region() x86, setup: remove obsolete pre-Kconfig CONFIG_VIDEO_ variables	2009-09-14 08:01:47 -07:00
Linus Torvalds	55e0715f61	Merge branch 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, percpu: Collect hot percpu variables into one cacheline x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED() x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses	2009-09-14 08:01:28 -07:00
Linus Torvalds	7dfd54a905	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, highmem_32.c: Clean up comment x86, pgtable.h: Clean up types x86: Clean up dump_pagetable()	2009-09-14 07:59:32 -07:00
Linus Torvalds	6512c0d625	Merge branch 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Simplify the Makefile in a minor way through use of cc-ifversion	2009-09-14 07:59:07 -07:00
Linus Torvalds	625037cc40	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64: move clts into batch cpu state updates when preloading fpu x86-64: move unlazy_fpu() into lazy cpu state part of context switch x86-32: make sure clts is batched during context switch x86: split out core __math_state_restore	2009-09-14 07:58:08 -07:00
Linus Torvalds	8fafa0a789	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Decrease the level of some NUMA messages to KERN_DEBUG	2009-09-14 07:57:50 -07:00
Linus Torvalds	c7208de304	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: Fix code patching for paravirt-alternatives on 486 x86, msr: change msr-reg.o to obj-y, and export its symbols x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus x86, sched: Workaround broken sched domain creation for AMD Magny-Cours x86, mcheck: Use correct cpumask for shared bank4 x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors x86: Fix CPU llc_shared_map information for AMD Magny-Cours x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h x86, msr: fix msr-reg.S compilation with gas 2.16.1 x86, msr: Export the register-setting MSR functions via /dev/*/msr x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs() x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT x86, msr: CFI annotations, cleanups for msr-reg.S x86, asm: Make _ASM_EXTABLE() usable from assembly code x86, asm: Add 32-bit versions of the combined CFI macros x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit x86, msr: Rewrite AMD rd/wrmsr variants x86, msr: Add rd/wrmsr interfaces with preset registers x86: add specific support for Intel Atom architecture ...	2009-09-14 07:57:32 -07:00
Linus Torvalds	15b0404272	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Make memtype_seq_ops const x86: uv: Clean up uv_ptc_init(), use proc_create() x86: Use printk_once() x86/cpu: Clean up various files a bit x86: Remove duplicated #include x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number x86: Further clean up of mtrr/generic.c x86: Clean up mtrr/main.c x86: Clean up mtrr/state.c x86: Clean up mtrr/mtrr.h x86: Clean up mtrr/if.c x86: Clean up mtrr/generic.c x86: Clean up mtrr/cyrix.c x86: Clean up mtrr/cleanup.c x86: Clean up mtrr/centaur.c x86: Clean up mtrr/amd.c: x86: ds.c fix invalid assignment	2009-09-14 07:56:43 -07:00
Linus Torvalds	9b74aec028	Merge branch 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: remove all now-duplicate header files x86: convert termios.h to the asm-generic version x86: convert almost generic headers to asm-generic version x86: convert trivial headers to asm-generic version x86: add copies of some headers to convert to asm-generic	2009-09-14 07:55:37 -07:00
Linus Torvalds	b581af5110	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/i386: Put aligned stack-canary in percpu shared_aligned section x86/i386: Make sure stack-protector segment base is cache aligned x86: Detect stack protector for i386 builds on x86_64 x86: allow "=rm" in native_save_fl() x86: properly annotate alternatives.c x86: Introduce GDT_ENTRY_INIT(), initialize bad_bios_desc statically x86, 32-bit: Use generic sys_pipe() x86: Introduce GDT_ENTRY_INIT(), fix APM x86: Introduce GDT_ENTRY_INIT() x86: Introduce set_desc_base() and set_desc_limit() x86: Remove unused patch_espfix_desc() x86: Use get_desc_base()	2009-09-14 07:53:49 -07:00
Linus Torvalds	ffaf854b01	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n x86, apic: Slim down stack usage in early_init_lapic_mapping() x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources() x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant x86: Fix x86_model test in es7000_apic_is_cluster() x86, apic: Move dmar_table_init() out of enable_IR() x86, ioapic: Panic on irq-pin binding only if needed x86/apic: Enable x2APIC without interrupt remapping under KVM x86, apic: Drop redundant bit assignment x86, ioapic: Throw BUG instead of NULL dereference x86, ioapic: Introduce for_each_irq_pin() helper x86: Remove superfluous NULL pointer check in destroy_irq() x86/ioapic.c: unify ioapic_retrigger_irq() x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop x86/ioapic.c: simplify add_pin_to_irq_node() x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop x86/ioapic.c: move lost comment to what seems like appropriate place x86/ioapic.c: remove redundant declaration of irq_pin_list ...	2009-09-14 07:51:20 -07:00
Jiri Olsa	4818d80942	tracing/function-graph: x86_64 stack allocation cleanup Only 24 bytes needs to be reserved on the stack for the function graph tracer on x86_64. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-12 22:13:43 -04:00
Linus Torvalds	86373435d2	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (25 commits) pata_rz1000: use printk_once ahci: kill @force_restart and refine CLO for ahci_kick_engine() pata_cs5535: add pci id for AMD based CS5535 controllers ahci: Add AMD SB900 SATA/IDE controller device IDs drivers/ata: use resource_size sata_fsl: Defer non-ncq commands when ncq commands active libata: add SATA PMP revision information for spec 1.2 libata: fix off-by-one error in ata_tf_read_block() ahci: Gigabyte GA-MA69VM-S2 can't do 64bit DMA ahci: make ahci_asus_m2a_vm_32bit_only() quirk more generic dmi: extend dmi_get_year() to dmi_get_date() dmi: fix date handling in dmi_get_year() libata: unbreak TPM filtering by reorganizing ata_scsi_pass_thru() sata_sis: convert to slave_link sata_sil24: always set protocol override for non-ATAPI data commands libata: Export AHCI capabilities libata: Delegate nonrot flag setting to SCSI [libata] Add pata_rdc driver for RDC ATA devices drivers/ata: Remove unnecessary semicolons libata: remove spindown skipping and warning ...	2009-09-11 16:38:33 -07:00
Linus Torvalds	483e3cd6a3	Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits) ring-buffer: only enable ring_buffer_swap_cpu when needed ring-buffer: check for swapped buffers in start of committing tracing: report error in trace if we fail to swap latency buffer tracing: add trace_array_printk for internal tracers to use tracing: pass around ring buffer instead of tracer tracing: make tracing_reset safe for external use tracing: use timestamp to determine start of latency traces tracing: Remove mentioning of legacy latency_trace file from documentation tracing/filters: Defer pred allocation, fix memory leak tracing: remove users of tracing_reset tracing: disable buffers and synchronize_sched before resetting tracing: disable update max tracer while reading trace tracing: print out start and stop in latency traces ring-buffer: disable all cpu buffers when one finds a problem ring-buffer: do not count discarded events ring-buffer: remove ring_buffer_event_discard ring-buffer: fix ring_buffer_read crossing pages ring-buffer: remove unnecessary cpu_relax ring-buffer: do not swap buffers during a commit ring-buffer: do not reset while in a commit ...	2009-09-11 13:24:03 -07:00
Linus Torvalds	774a694f8c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...	2009-09-11 13:23:18 -07:00
Linus Torvalds	4f0ac85416	Merge branch 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits) perf tools: Avoid unnecessary work in directory lookups perf stat: Clean up statistics calculations a bit more perf stat: More advanced variance computation perf stat: Use stddev_mean in stead of stddev perf stat: Remove the limit on repeat perf stat: Change noise calculation to use stddev x86, perf_counter, bts: Do not allow kernel BTS tracing for now x86, perf_counter, bts: Correct pointer-to-u64 casts x86, perf_counter, bts: Fail if BTS is not available perf_counter: Fix output-sharing error path perf trace: Fix read_string() perf trace: Print out in nanoseconds perf tools: Seek to the end of the header area perf trace: Fix parsing of perf.data perf trace: Sample timestamps as well perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access perf trace: Sample the CPU too perf tools: Work around strict aliasing related warnings perf tools: Clean up warnings list in the Makefile perf tools: Complete support for dynamic strings ...	2009-09-11 13:22:43 -07:00
Linus Torvalds	b9356c53ba	Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (55 commits) arch/x86/oprofile/op_model_amd.c: fix op_amd_handle_ibs() return type Revert "x86: oprofile/op_model_amd.c set return values for op_amd_handle_ibs()" x86/oprofile: Small coding style fixes x86/oprofile: Add counter reservation check for virtual counters x86/oprofile: Implement op_x86_virt_to_phys() oprofile: Adding switch counter to oprofile statistic variables x86/oprofile: Implement mux_clone() x86/oprofile: Enable multiplexing only if the model supports it x86/oprofile: Add function has_mux() to check multiplexing support x86/oprofile: Modify initialization of num_virt_counters x86/oprofile: Remove unused num_virt_controls from struct op_x86_model_spec x86/oprofile: Remove const qualifier from struct op_x86_model_spec x86/oprofile: Moving nmi_cpu_switch() in nmi_int.c x86/oprofile: Moving nmi_cpu_save/restore_mpx_registers() in nmi_int.c x86/oprofile: Moving nmi_setup_cpu_mux() in nmi_int.c x86/oprofile: Implement multiplexing setup/shutdown functions oprofile: Grouping multiplexing code in op_model_amd.c oprofile: Introduce op_x86_phys_to_virt() oprofile: Grouping multiplexing code in oprof.c oprofile: Remove oprofile_multiplexing_init() ...	2009-09-11 13:22:30 -07:00
Linus Torvalds	a66a50054e	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (59 commits) x86/gart: Do not select AGP for GART_IOMMU x86/amd-iommu: Initialize passthrough mode when requested x86/amd-iommu: Don't detach device from pt domain on driver unbind x86/amd-iommu: Make sure a device is assigned in passthrough mode x86/amd-iommu: Align locking between attach_device and detach_device x86/amd-iommu: Fix device table write order x86/amd-iommu: Add passthrough mode initialization functions x86/amd-iommu: Add core functions for pd allocation/freeing x86/dma: Mark iommu_pass_through as __read_mostly x86/amd-iommu: Change iommu_map_page to support multiple page sizes x86/amd-iommu: Support higher level PTEs in iommu_page_unmap x86/amd-iommu: Remove old page table handling macros x86/amd-iommu: Use 2-level page tables for dma_ops domains x86/amd-iommu: Remove bus_addr check in iommu_map_page x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX x86/amd-iommu: Change alloc_pte to support 64 bit address space x86/amd-iommu: Introduce increase_address_space function x86/amd-iommu: Flush domains if address space size was increased x86/amd-iommu: Introduce set_dte_entry function x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices ...	2009-09-11 13:16:37 -07:00
Linus Torvalds	989aa44a5f	Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debug lockups: Improve lockup detection, fix generic arch fallback debug lockups: Improve lockup detection	2009-09-11 13:15:55 -07:00
Eric Anholt	e517a5e970	agp/intel: Fix the pre-9xx chipset flush. Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla. The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets, but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd. This exports clflush_cache_range(), which was laying around and happened to basically match the code I was otherwise going to copy from the DRM. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org> Cc: stable@kernel.org	2009-09-11 11:39:23 -07:00
Linus Torvalds	332a339218	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (102 commits) crypto: sha-s390 - Fix warnings in import function crypto: vmac - New hash algorithm for intel_txt support crypto: api - Do not displace newly registered algorithms crypto: ansi_cprng - Fix module initialization crypto: xcbc - Fix alignment calculation of xcbc_tfm_ctx crypto: fips - Depend on ansi_cprng crypto: blkcipher - Do not use eseqiv on stream ciphers crypto: ctr - Use chainiv on raw counter mode Revert crypto: fips - Select CPRNG crypto: rng - Fix typo crypto: talitos - add support for 36 bit addressing crypto: talitos - align locks on cache lines crypto: talitos - simplify hmac data size calculation crypto: mv_cesa - Add support for Orion5X crypto engine crypto: cryptd - Add support to access underlaying shash crypto: gcm - Use GHASH digest algorithm crypto: ghash - Add GHASH digest algorithm for GCM crypto: authenc - Convert to ahash crypto: api - Fix aligned ctx helper crypto: hmac - Prehash ipad/opad ...	2009-09-11 09:38:37 -07:00
Linus Torvalds	1b195b170d	Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 * 'kmemleak' of git://linux-arm.org/linux-2.6: kmemleak: Improve the "Early log buffer exceeded" error message kmemleak: fix sparse warning for static declarations kmemleak: fix sparse warning over overshadowed flags kmemleak: move common painting code together kmemleak: add clear command support kmemleak: use bool for true/false questions kmemleak: Do no create the clean-up thread during kmemleak_disable() kmemleak: Scan all thread stacks kmemleak: Don't scan uninitialized memory when kmemcheck is enabled kmemleak: Ignore the aperture memory hole on x86_64 kmemleak: Printing of the objects hex dump kmemleak: Do not report alloc_bootmem blocks as leaks kmemleak: Save the stack trace for early allocations kmemleak: Mark the early log buffer as __initdata kmemleak: Dump object information on request kmemleak: Allow rescheduling during an object scanning	2009-09-11 09:16:22 -07:00
Michal Hocko	80938332d8	x86: Increase MIN_GAP to include randomized stack Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable Team <stable@kernel.org>	2009-09-10 17:00:12 -07:00
Ben Hutchings	5367b6887e	x86: Fix code patching for paravirt-alternatives on 486 As reported in <http://bugs.debian.org/511703> and <http://bugs.debian.org/515982>, kernels with paravirt-alternatives enabled crash in text_poke_early() on at least some 486-class processors. The problem is that text_poke_early() itself uses inline functions affected by paravirt-alternatives and so will modify instructions that have already been prefetched. Pentium and later processors will invalidate the prefetched instructions in this case, but 486-class processors do not. Change sync_core() to limit prefetching on 486-class (and 386-class) processors, and move the call to sync_core() above the call to the modifiable local_irq_restore(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> LKML-Reference: <1252547631.3423.134.camel@localhost> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-10 16:50:19 -07:00
James Morris	a3c8b97396	Merge branch 'next' into for-linus	2009-09-11 08:04:49 +10:00
Steven Rostedt	fc06b8520b	x86/tracing: comment need for atomic nop The dynamic function tracer relys on the macro P6_NOP5 always being an atomic NOP. If for some reason it is changed to be two operations (like a nop2 nop3) it can faults within the kernel when the function tracer modifies the code. This patch adds a comment to note that the P6_NOPs are expected to be atomic. This will hopefully prevent anyone from changing that. Reported-by: Mathieu Desnoyer <mathieu.desnoyers@polymtl.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-10 17:22:44 -04:00
Jeremy Fitzhardinge	78c86e5e56	x86: split __phys_addr out into separate file Split __phys_addr out into its own file so we can disable -fstack-protector in a fine-grained fashion. Also it doesn't have terribly much to do with the rest of ioremap.c. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-10 11:48:55 -07:00
Avi Kivity	0a79b00952	KVM: VMX: Check cpl before emulating debug register access Debug registers may only be accessed from cpl 0. Unfortunately, vmx will code to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 18:11:10 +03:00
Gleb Natapov	4da748960a	KVM: fix misreporting of coalesced interrupts by kvm tracer Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 18:11:09 +03:00
Marcelo Tosatti	e3904e6ed0	KVM: x86: drop duplicate kvm_flush_remote_tlb calls kvm_mmu_slot_remove_write_access already calls it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 18:11:08 +03:00
Gleb Natapov	542423b0dd	KVM: VMX: call vmx_load_host_state() only if msr is cached No need to call it before each kvm_(set\|get)_msr_common() Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 18:11:07 +03:00
Avi Kivity	e8a48342e9	KVM: VMX: Conditionally reload debug register 6 Only reload debug register 6 if we're running with the guest's debug registers. Saves around 150 cycles from the guest lightweight exit path. dr6 contains a couple of bits that are updated on #DB, so intercept that unconditionally and update those bits then. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 18:11:06 +03:00
Avi Kivity	3d53c27d05	KVM: Use thread debug register storage instead of kvm specific data Instead of saving the debug registers from the processor to a kvm data structure, rely in the debug registers stored in the thread structure. This allows us not to save dr6 and dr7. Reduces lightweight vmexit cost by 350 cycles, or 11 percent. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 18:11:04 +03:00
Marcelo Tosatti	6ba6617875	KVM guest: do not batch pte updates from interrupt context Commit `b8bcfe997e` made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously. https://bugzilla.redhat.com/show_bug.cgi?id=518022 Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 18:10:50 +03:00
Glauber Costa	a20316d2aa	KVM guest: fix bogus wallclock physical address calculation The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Cc: stable@kernel.org Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:58 +03:00
Gleb Natapov	5fff7d270b	KVM: VMX: Fix cr8 exiting control clobbering by EPT Don't call adjust_vmx_controls() two times for the same control. It restores options that were dropped earlier. This loses us the cr8 exit control, which causes a massive performance regression Windows x64. Cc: stable@kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:57 +03:00
Avi Kivity	60f24784a9	KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp We know no pages are protected, so we can short-circuit the whole thing (including fairly nasty guest memory accesses). Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:56 +03:00
Avi Kivity	88c808fd42	KVM: Protect update_cr8_intercept() when running without an apic update_cr8_intercept() can be triggered from userspace while there is no apic present. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:54 +03:00
Sheng Yang	95eb84a758	KVM: VMX: Fix EPT with WP bit change during paging QNX update WP bit when paging enabled, which is not covered yet. This one fix QNX boot with EPT. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:53 +03:00
Mikhail Ershov	d9048d3278	KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors Segment descriptors tables can be placed on two non-contiguous pages. This patch makes reading segment descriptors by linear address. Signed-off-by: Mikhail Ershov <Mike.Ershov@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:52 +03:00
Mohammed Gamal	7bdb588827	KVM: x86 emulator: Add adc and sbb missing decoder flags Add missing decoder flags for adc and sbb instructions (opcodes 0x14-0x15, 0x1c-0x1d) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:51 +03:00
Avi Kivity	fa6870c6b6	KVM: Add missing #include Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:49 +03:00
Avi Kivity	56e8231841	KVM: Rename x86_emulate.c to emulate.c We're in arch/x86, what could we possibly be emulating? Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:45 +03:00
Anthony Liguori	c0c7c04b87	KVM: When switching to a vm8086 task, load segments as 16-bit According to 16.2.5 in the SDM, eflags.vm in the tss is consulted before loading and new segments. If eflags.vm == 1, then the segments are treated as 16-bit segments. The LDTR and TR are not normally available in vm86 mode so if they happen to somehow get loaded, they need to be treated as 32-bit segments. This fixes an invalid vmentry failure in a custom OS that was happening after a task switch into vm8086 mode. Since the segments were being mistakenly treated as 32-bit, we loaded garbage state. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:44 +03:00
Avi Kivity	345dcaa8fd	KVM: VMX: Adjust rflags if in real mode emulation We set rflags.vm86 when virtualizing real mode to do through vm8086 mode; so we need to take it out again when reading rflags. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:43 +03:00
Avi Kivity	52c7847d12	KVM: SVM: Drop tlb flush workaround in npt It is no longer possible to reproduce the problem any more, so presumably it has been fixed. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:40 +03:00
Gleb Natapov	cb142eb743	KVM: Update cr8 intercept when APIC TPR is changed by userspace Since on vcpu entry we do it only if apic is enabled we should do it when TPR is changed while apic is disabled. This happens when windows resets HW without setting TPR to zero. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:39 +03:00
Joerg Roedel	4b6e4dca70	KVM: SVM: enable nested svm by default Nested SVM is (in my experience) stable enough to be enabled by default. So omit the requirement to pass a module parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:38 +03:00
Joerg Roedel	108768de55	KVM: SVM: check for nested VINTR flag in svm_interrupt_allowed Not checking for this flag breaks any nested hypervisor that does not set VINTR. So fix it with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:37 +03:00
Joerg Roedel	26666957a5	KVM: SVM: move nested_svm_intr main logic out of if-clause This patch removes one indentation level from nested_svm_intr and makes the logic more readable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:36 +03:00
Joerg Roedel	cda0ffdd86	KVM: SVM: remove unnecessary is_nested check from svm_cpu_run This check is not necessary. We have to sync the vcpu->arch.cr2 always back to the VMCB. This patch remove the is_nested check. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:34 +03:00
Joerg Roedel	410e4d573d	KVM: SVM: move special nested exit handling to separate function This patch moves the handling for special nested vmexits like #pf to a separate function. This makes the kvm_override parameter obsolete and makes the code more readable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:33 +03:00
Joerg Roedel	1f8da47805	KVM: SVM: handle errors in vmrun emulation path appropriatly If nested svm fails to load the msrpm the vmrun succeeds with the old msrpm which is not correct. This patch changes the logic to roll back to host mode in case the msrpm cannot be loaded. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:32 +03:00
Joerg Roedel	ea8e064fe2	KVM: SVM: remove nested_svm_do and helper functions This function is not longer required. So remove it. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:31 +03:00
Joerg Roedel	9738b2c97d	KVM: SVM: clean up nested vmrun path This patch removes the usage of nested_svm_do from the vmrun emulation path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:29 +03:00
Joerg Roedel	9966bf6872	KVM: SVM: clean up nestec vmload/vmsave paths This patch removes the usage of nested_svm_do from the vmload and vmsave emulation code paths. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:46:28 +03:00
Joerg Roedel	3d62d9aa98	KVM: SVM: clean up nested_svm_exit_handled_msr This patch changes nested svm to call nested_svm_exit_handled_msr directly and not through nested_svm_do. [alex: fix oops due to nested kmap_atomics] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 10:45:43 +03:00
Joerg Roedel	34f80cfad5	KVM: SVM: get rid of nested_svm_vmexit_real This patch is the starting point of removing nested_svm_do from the nested svm code. The nested_svm_do function basically maps two guest physical pages to host virtual addresses and calls a passed function on it. This function pointer code flow is hard to read and not the best technical solution here. As a side effect this patch indroduces the nested_svm_[un]map helper functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:26 +03:00
Joerg Roedel	0295ad7de8	KVM: SVM: simplify nested_svm_check_exception Makes the code of this function more readable by removing on indentation level for the core logic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:25 +03:00
Joerg Roedel	9c4e40b994	KVM: SVM: do nested vmexit in nested_svm_exit_handled If this function returns true a nested vmexit is required. Move that vmexit into the nested_svm_exit_handled function. This also simplifies the handling of nested #pf intercepts in this function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:25 +03:00
Joerg Roedel	4c2161aed5	KVM: SVM: consolidate nested_svm_exit_handled When caching guest intercepts there is no need anymore for the nested_svm_exit_handled_real function. So move its code into nested_svm_exit_handled. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	aad42c641c	KVM: SVM: cache nested intercepts When the nested intercepts are cached we don't need to call get_user_pages and/or map the nested vmcb on every nested #vmexit to check who will handle the intercept. Further this patch aligns the emulated svm behavior better to real hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	e6aa9abd73	KVM: SVM: move nested svm state into seperate struct This makes it more clear for which purpose these members in the vcpu_svm exist. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	a5c3832dfe	KVM: SVM: complete interrupts after handling nested exits The interrupt completion code must run after nested exits are handled because not injected interrupts or exceptions may be handled by the l1 guest first. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:24 +03:00
Joerg Roedel	0460a979b4	KVM: SVM: copy only necessary parts of the control area on vmrun/vmexit The vmcb control area contains more then 800 bytes of reserved fields which are unnecessarily copied. Fix this by introducing a copy function which only copies the relevant part and saves time. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Joerg Roedel	defbba5660	KVM: SVM: optimize nested vmrun Only copy the necessary parts of the vmcb save area on vmrun and save precious time. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Joerg Roedel	33740e4009	KVM: SVM: optimize nested #vmexit It is more efficient to copy only the relevant parts of the vmcb back to the nested vmcb when we emulate an vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Joerg Roedel	2af9194d1b	KVM: SVM: add helper functions for global interrupt flag This patch makes the code easier to read when it comes to setting, clearing and checking the status of the virtualized global interrupt flag for the VCPU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:23 +03:00
Avi Kivity	256cd2ef4f	x86: Export kmap_atomic_to_page() Needed by KVM. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:22 +03:00
Gleb Natapov	88ba63c265	KVM: Replace pic_lock()/pic_unlock() with direct call to spinlock functions They are not doing anything else now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:22 +03:00
Gleb Natapov	938396a234	KVM: Call ack notifiers from PIC when guest OS acks an IRQ. Currently they are called when irq vector is been delivered. Calling ack notifiers at this point is wrong. Device assignment ack notifier enables host interrupts, but guest not yet had a chance to clear interrupt condition in a device. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:22 +03:00
Gleb Natapov	956f97cf66	KVM: Call kvm_vcpu_kick() inside pic spinlock d5ecfdd25 moved it out because back than it was impossible to call it inside spinlock. This restriction no longer exists. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:21 +03:00
Roel Kluin	3a34a8810b	KVM: fix EFER read buffer overflow Check whether index is within bounds before grabbing the element. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:21 +03:00
Amit Shah	1f3ee616dd	KVM: ignore reads to perfctr msrs We ignore writes to the perfctr msrs. Ignore reads as well. Kaspersky antivirus crashes Windows guests if it can't read these MSRs. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:21 +03:00
Avi Kivity	eab4b8aa34	KVM: VMX: Optimize vmx_get_cpl() Instead of calling vmx_get_segment() (which reads a whole bunch of vmcs fields), read only the cs selector which contains the cpl. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:21 +03:00
Jan Kiszka	07708c4af1	KVM: x86: Disallow hypercalls for guest callers in rings > 0 So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall base in the future once required. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:20 +03:00
Marcelo Tosatti	b90c062c65	KVM: MMU: fix bogus alloc_mmu_pages assignment Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:20 +03:00
Izik Eidus	3b80fffe2b	KVM: MMU: make __kvm_mmu_free_some_pages handle empty list First check if the list is empty before attempting to look at list entries. Cc: stable@kernel.org Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:20 +03:00
Bartlomiej Zolnierkiewicz	95fb4eb698	KVM: remove superfluous NULL pointer check in kvm_inject_pit_timer_irqs() This takes care of the following entries from Dan's list: arch/x86/kvm/i8254.c +714 kvm_inject_pit_timer_irqs(6) warning: variable derefenced in initializer 'vcpu' arch/x86/kvm/i8254.c +714 kvm_inject_pit_timer_irqs(6) warning: variable derefenced before check 'vcpu' Reported-by: Dan Carpenter <error27@gmail.com> Cc: corbet@lwn.net Cc: eteo@redhat.com Cc: Julia Lawall <julia@diku.dk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Joerg Roedel	344f414fa0	KVM: report 1GB page support to userspace If userspace knows that the kernel part supports 1GB pages it can enable the corresponding cpuid bit so that guests actually use GB pages. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Joerg Roedel	04326caacf	KVM: MMU: enable gbpages by increasing nr of pagesizes Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Joerg Roedel	7e4e4056f7	KVM: MMU: shadow support for 1gb pages This patch adds support for shadow paging to the 1gb page table code in KVM. With this code the guest can use 1gb pages even if the host does not support them. [ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped with 4kb ptes on host level ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Joerg Roedel	e04da980c3	KVM: MMU: make page walker aware of mapping levels The page walker may be used with nested paging too when accessing mmio areas. Make it support the additional page-level too. [ Marcelo: fix reserved bit check for 1gb pte ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Joerg Roedel	852e3c19ac	KVM: MMU: make direct mapping paths aware of mapping levels Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Joerg Roedel	d25797b24c	KVM: MMU: rename is_largepage_backed to mapping_level With the new name and the corresponding backend changes this function can now support multiple hugepage sizes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Joerg Roedel	44ad9944f1	KVM: MMU: make rmap code aware of mapping levels This patch removes the largepage parameter from the rmap_add function. Together with rmap_remove this function now uses the role.level field to find determine if the page is a huge page. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Marcelo Tosatti	1444885a04	KVM: limit lapic periodic timer frequency Otherwise its possible to starve the host by programming lapic timer with a very high frequency. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:17 +03:00
Mikhail Ershov	5f0269f5d7	KVM: Align cr8 threshold when userspace changes cr8 Commit f0a3602c20 ("KVM: Move interrupt injection logic to x86.c") does not update the cr8 intercept if the lapic is disabled, so when userspace updates cr8, the cr8 threshold control is not updated and we are left with illegal control fields. Fix by explicitly resetting the cr8 threshold. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:17 +03:00
Jan Kiszka	7f582ab6d8	KVM: VMX: Avoid to return ENOTSUPP to userland Choose some allowed error values for the cases VMX returned ENOTSUPP so far as these values could be returned by the KVM_RUN IOCTL. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:17 +03:00
Gleb Natapov	84fde248fe	KVM: PIT: Unregister ack notifier callback when freeing Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:16 +03:00
Sheng Yang	b927a3cec0	KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl Now KVM allow guest to modify guest's physical address of EPT's identity mapping page. (change from v1, discard unnecessary check, change ioctl to accept parameter address rather than value) Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:16 +03:00
Akinobu Mita	b792c344df	KVM: x86: use kvm_get_gdt() and kvm_read_ldt() Use kvm_get_gdt() and kvm_read_ldt() to reduce inline assembly code. Cc: Avi Kivity <avi@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:15 +03:00
Akinobu Mita	46a359e715	KVM: x86: use get_desc_base() and get_desc_limit() Use get_desc_base() and get_desc_limit() to get the base address and limit in desc_struct. Cc: Avi Kivity <avi@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:15 +03:00
Marcelo Tosatti	6a1ac77110	KVM: MMU: fix missing locking in alloc_mmu_pages n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies this variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:14 +03:00
Sheng Yang	3662cb1cd6	KVM: Discard unnecessary kvm_mmu_flush_tlb() in kvm_mmu_load() set_cr3() should already cover the TLB flushing. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:14 +03:00
Gleb Natapov	4088bb3cae	KVM: silence lapic kernel messages that can be triggered by a guest Some Linux versions (f8) try to read EOI register that is write only. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:14 +03:00
Gleb Natapov	a1b37100d9	KVM: Reduce runnability interface with arch support code Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:13 +03:00
Gleb Natapov	0b71785dc0	KVM: Move kvm_cpu_get_interrupt() declaration to x86 code It is implemented only by x86. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:13 +03:00
Gleb Natapov	b59bb7bdf0	KVM: Move exception handling to the same place as other events Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:13 +03:00
Joerg Roedel	a205bc19f0	KVM: MMU: Fix MMU_DEBUG compile breakage Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:13 +03:00
Gregory Haskins	d34e6b175e	KVM: add ioeventfd support ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via ioeventfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 110000 iops, 9.09us rtt ioeventfd-mmio: 200100 iops, 5.00us rtt ioeventfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt ioeventfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. -------------------- Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:12 +03:00
Gregory Haskins	090b7aff27	KVM: make io_bus interface more robust Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes: 1) Add a return value to the registration function 2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller. 3) Add an unregister function that collapses holes in the array Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:12 +03:00
Beth Kon	e9f4275732	KVM: PIT support for HPET legacy mode When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt and the pit should not be. So in legacy mode, the pit timer is destroyed, but the state of the pit is maintained. So if kvm or the guest tries to modify the state of the pit, this modification is accepted, except that the timer isn't actually started. When we exit hpet_legacy_mode, the current state of the pit (which is up to date since we've been accepting modifications) is used to restart the pit timer. The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff in order to destroy the timer, but then restores the actual value, again maintaining "current" state of the pit for possible later reenablement. [avi: add some reserved storage in the ioctl; make SET_PIT2 IOW] [marcelo: fix memory corruption due to reserved storage] Signed-off-by: Beth Kon <eak@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:12 +03:00
Gleb Natapov	0d1de2d901	KVM: Always report x2apic as supported feature We emulate x2apic in software, so host support is not required. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:11 +03:00
Gleb Natapov	c7f0f24b1f	KVM: No need to kick cpu if not in a guest mode This will save a couple of IPIs. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:11 +03:00
Gleb Natapov	1000ff8d89	KVM: Add trace points in irqchip code Add tracepoint in msi/ioapic/pic set_irq() functions, in IPI sending and in the point where IRQ is placed into apic's IRR. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:11 +03:00
Andre Przywara	f7c6d14003	KVM: fix MMIO_CONF_BASE MSR access Some Windows versions check whether the BIOS has setup MMI/O for config space accesses on AMD Fam10h CPUs, we say "no" by returning 0 on reads and only allow disabling of MMI/O CfgSpace setup by igoring "0" writes. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:10 +03:00
Avi Kivity	f691fe1da7	KVM: Trace shadow page lifecycle Create, sync, unsync, zap. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:10 +03:00
Avi Kivity	0742017159	KVM: MMU: Trace guest pagetable walker Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:09 +03:00
Jan Kiszka	dc7e795e3d	Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs" This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba. To my understanding, it became obsolete with the advent of the more robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents the conceptually safe pattern 1. set sregs 2. register mem-slots 3. run vcpu by setting a sticky triple fault during step 1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:09 +03:00
Andre Przywara	6098ca939e	KVM: handle AMD microcode MSR Windows 7 tries to update the CPU's microcode on some processors, so we ignore the MSR write here. The patchlevel register is already handled (returning 0), because the MSR number is the same as Intel's. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:08 +03:00
Sheng Yang	756975bbfd	KVM: Fix apic_mmio_write return for unaligned write Some in-famous OS do unaligned writing for APIC MMIO, and the return value has been missed in recent change, then the OS hangs. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:08 +03:00
Gleb Natapov	0105d1a526	KVM: x2apic interface to lapic This patch implements MSR interface to local apic as defines by x2apic Intel specification. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:08 +03:00
Gleb Natapov	fc61b800f9	KVM: Add Directed EOI support to APIC emulation Directed EOI is specified by x2APIC, but is available even when lapic is in xAPIC mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Avi Kivity	cb24772140	KVM: Trace apic registers using their symbolic names Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Avi Kivity	aec51dc4f1	KVM: Trace mmio Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:07 +03:00
Andre Przywara	c323c0e5f0	KVM: Ignore PCI ECS I/O enablement Linux guests will try to enable access to the extended PCI config space via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?) don't use ECS, simply ignore write and read attempts. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:06 +03:00
Michael S. Tsirkin	bda9020e24	KVM: remove in_range from io devices This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write functions. in_range now becomes unused so it is removed from device ops in favor of read/write callbacks performing range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Michael S. Tsirkin	6c47469453	KVM: convert bus to slots_lock Use slots_lock to protect device list on the bus. slots_lock is already taken for read everywhere, so we only need to take it for write when registering devices. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Michael S. Tsirkin	108b56690f	KVM: switch pit creation to slots_lock switch pit creation to slots_lock. slots_lock is already taken for read everywhere, so we only need to take it for write when creating pit. This is in preparation to removing in_range and kvm->lock around it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:05 +03:00
Marcelo Tosatti	2023a29cbe	KVM: remove old KVMTRACE support code Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	ed85c06853	KVM: introduce module parameter for ignoring unknown MSRs accesses KVM will inject a #GP into the guest if that tries to access unhandled MSRs. This will crash many guests. Although it would be the correct way to actually handle these MSRs, we introduce a runtime switchable module param called "ignore_msrs" (defaults to 0). If this is Y, unknown MSR reads will return 0, while MSR writes are simply dropped. In both cases we print a message to dmesg to inform the user about that. You can change the behaviour at any time by saying: # echo 1 > /sys/modules/kvm/parameters/ignore_msrs Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	1fdbd48c24	KVM: ignore reads from AMDs C1E enabled MSR If the Linux kernel detects an C1E capable AMD processor (K8 RevF and higher), it will access a certain MSR on every attempt to go to halt. Explicitly handle this read and return 0 to let KVM run a Linux guest with the native AMD host CPU propagated to the guest. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Andre Przywara	8f1589d95e	KVM: ignore AMDs HWCR register access to set the FFDIS bit Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM does not handle the needed MSR, the injected #GP will panic the Linux kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux boot with an AMD K8 family guest CPU. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Marcelo Tosatti	894a9c5543	KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths Correct missing locking in a few places in x86's vm_ioctl handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Joerg Roedel	ec04b2604c	KVM: Prepare memslot data structures for multiple hugepage sizes [avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Andre Przywara	4668f05078	KVM: x86 emulator: Add sysexit emulation Handle #UD intercept of the sysexit instruction in 64bit mode returning to 32bit compat mode on an AMD host. Setup the segment descriptors for CS and SS and the EIP/ESP registers according to the manual. Signed-off-by: Christoph Egger <christoph.egger@amd.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:01 +03:00

1 2 3 4 5 ...

8903 Commits