kernel_optimize_test

Author	SHA1	Message	Date
Paolo Bonzini	0cd665bd20	KVM: x86: cleanup kvm_inject_emulated_page_fault To reconstruct the kvm_mmu to be used for page fault injection, we can simply use fault->nested_page_fault. This matches how fault->nested_page_fault is assigned in the first place by FNAME(walk_addr_generic). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-20 17:26:05 -04:00
Paolo Bonzini	5efac0741c	KVM: x86: introduce kvm_mmu_invalidate_gva Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva into a new function. This function also lets us specify the host PGD to invalidate and also the MMU, both of which will be useful in fixing and simplifying kvm_inject_emulated_page_fault. A nested guest's MMU however has g_context->invlpg == NULL. Instead of setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only entry point to mmu->invlpg and make a NULL invlpg pointer equivalent to nonpaging_invlpg, saving a retpoline. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-20 17:25:55 -04:00
Sean Christopherson	53b3d8e9d5	KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Export the page fault propagation helper so that VMX can use it to correctly emulate TLB invalidation on page faults in an upcoming patch. In the (hopefully) not-too-distant future, SGX virtualization will also want access to the helper for injecting page faults to the correct level (L1 vs. L2) when emulating ENCLS instructions. Rename the function to kvm_inject_emulated_page_fault() to clarify that it is (a) injecting a fault and (b) only for page faults. WARN if it's invoked with an exception other than PF_VECTOR. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:50 -04:00
Junaid Shahid	d6e3f8385d	KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Free all roots when emulating INVVPID for L1 and EPT is disabled, as outstanding changes to the page tables managed by L1 need to be recognized. Because L1 and L2 share an MMU when EPT is disabled, and because VPID is not tracked by the MMU role, all roots in the current MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or VM-Exit could do a fast CR3 switch (without a flush/sync) and consume stale SPTEs. Fixes: `5c614b3583` ("KVM: nVMX: nested VPID emulation") Signed-off-by: Junaid Shahid <junaids@google.com> [sean: ported to upstream KVM, reworded the comment and changelog] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:49 -04:00
Sean Christopherson	f8aa7e3958	KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Free all L2 (guest_mmu) roots when emulating INVEPT for L1. Outstanding changes to the EPT tables managed by L1 need to be recognized, and relying on KVM to always flush L2's EPTP context on nested VM-Enter is dangerous. Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote TLB flush if necessary, e.g. if L1 has never entered L2 then there is nothing to be done. Nuking all L2 roots is overkill for the single-context variant, but it's the safe and easy bet. A more precise zap mechanism will be added in the future. Add a TODO to call out that KVM only needs to invalidate affected contexts. Fixes: `14c07ad89f` ("x86/kvm/mmu: introduce guest_mmu") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:49 -04:00
Sean Christopherson	eed0030e4c	KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Signal VM-Fail for the single-context variant of INVEPT if the specified EPTP is invalid. Per the INEVPT pseudocode in Intel's SDM, it's subject to the standard EPT checks: If VM entry with the "enable EPT" VM execution control set to 1 would fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID); Fixes: `bfd0a56b90` ("nEPT: Nested INVEPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:48 -04:00
Sean Christopherson	e8eff28215	KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH. Remote TLB flushes require all contexts to be invalidated, not just the active contexts, e.g. all mappings in all contexts for a given HVA need to be invalidated on a mmu_notifier invalidation. Similarly, the instigator of the deferred TLB flush may be expecting all contexts to be flushed, e.g. vmx_vcpu_load_vmcs(). Without nested VMX, flushing only the current EPTP/VPID context isn't problematic because KVM uses a constant VPID for each vCPU, and mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP for L1. In the rare case where a different EPTP is created or reused, KVM (currently) unconditionally flushes the new EPTP context prior to entering the guest. With nested VMX, KVM conditionally uses a different VPID for L2, and unconditionally uses a different EPTP for L2. Because KVM doesn't _intentionally_ guarantee L2's EPTP/VPID context is flushed on nested VM-Enter, it'd be possible for a malicious L1 to attack the host and/or different VMs by exploiting the lack of flushing for L2. 1) Launch nested guest from malicious L1. 2) Nested VM-Enter to L2. 3) Access target GPA 'g'. CPU inserts TLB entry tagged with L2's ASID mapping 'g' to host PFN 'x'. 2) Nested VM-Exit to L1. 3) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing the page for PFN 'x'. 4) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and remaps the page to PFN 'y'. mmu_notifier sends invalidate command, KVM flushes TLB only for L1's ASID. 4) Host kernel reallocates PFN 'x' to some other task/guest. 5) Nested VM-Enter to L2. KVM does not invalidate L2's EPTP or VPID. 6) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its stale TLB entry. However, current KVM unconditionally flushes L1's EPTP/VPID context on nested VM-Exit. But, that behavior is mostly unintentional, KVM doesn't go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit, rather a TLB flush is guaranteed to occur prior to re-entering L1 due to __kvm_mmu_new_cr3() always being called with skip_tlb_flush=false. On nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT enabled) or in nested_vmx_load_cr3() (nested EPT disabled). On nested VM-Exit it occurs via nested_vmx_load_cr3(). This also fixes a bug where a deferred TLB flush in the context of L2, with EPT disabled, would flush L1's VPID instead of L2's VPID, as vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode(). Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Ben Gardon <bgardon@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Junaid Shahid <junaids@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: John Haxby <john.haxby@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Fixes: `efebf0aaec` ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:48 -04:00
Wainer dos Santos Moschetta	909e0abaac	selftests: kvm: Add testcase for creating max number of memslots This patch introduces test_add_max_memory_regions(), which checks that a VM can have added memory slots up to the limit defined in KVM_CAP_NR_MEMSLOTS. Then attempt to add one more slot to verify it fails as expected. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:47 -04:00
Sean Christopherson	5b4f758f45	KVM: selftests: Make set_memory_region_test common to all architectures Make set_memory_region_test available on all architectures by wrapping the bits that are x86-specific in ifdefs. A future testcase to create the maximum number of memslots will be architecture agnostic. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:47 -04:00
Sean Christopherson	8cc2dd637b	KVM: selftests: Add "zero" testcase to set_memory_region_test Add a testcase for running a guest with no memslots to the memory region test. The expected result on x86_64 is that the guest will trigger an internal KVM error due to the initial code fetch encountering a non-existent memslot and resulting in an emulation failure. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:45 -04:00
Wainer dos Santos Moschetta	4cd94d125d	selftests: kvm: Add vm_get_fd() in kvm_util Introduces the vm_get_fd() function in kvm_util which returns the VM file descriptor. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:44 -04:00
Sean Christopherson	8fb38f05ca	KVM: selftests: Add "delete" testcase to set_memory_region_test Add a testcase for deleting memslots while the guest is running. Like the "move" testcase, this is x86_64-only as it relies on MMIO happening when a non-existent memslot is encountered. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:44 -04:00
Sean Christopherson	8a0639fe92	KVM: sefltests: Add explicit synchronization to move mem region test Use sem_post() and sem_timedwait() to synchronize test stages between the vCPU thread and the main thread instead of using usleep() to wait for the vCPU thread and hoping for the best. Opportunistically refactor the code to make it suck less in general, and to prepare for adding more testcases. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:43 -04:00
Sean Christopherson	3e6b941267	KVM: selftests: Add GUEST_ASSERT variants to pass values to host Add variants of GUEST_ASSERT to pass values back to the host, e.g. to help debug/understand a failure when the the cause of the assert isn't necessarily binary. It'd probably be possible to auto-calculate the number of arguments and just have a single GUEST_ASSERT, but there are a limited number of variants and silently eating arguments could lead to subtle code bugs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:43 -04:00
Sean Christopherson	8c996e4dae	KVM: selftests: Add util to delete memory region Add a utility to delete a memory region, it will be used by x86's set_memory_region_test. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:42 -04:00
Sean Christopherson	4d9bba9007	KVM: selftests: Use kernel's list instead of homebrewed replacement Replace the KVM selftests' homebrewed linked lists for vCPUs and memory regions with the kernel's 'struct list_head'. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:42 -04:00
Sean Christopherson	238022ff5d	KVM: selftests: Take vcpu pointer instead of id in vm_vcpu_rm() The sole caller of vm_vcpu_rm() already has the vcpu pointer, take it directly instead of doing an extra lookup. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:41 -04:00
Eric Northup	43d05de2be	KVM: pass through CPUID(0x80000006) Return the host's L2 cache and TLB information for CPUID.0x80000006 instead of zeroing out the entry as part of KVM_GET_SUPPORTED_CPUID. This allows a userspace VMM to feed KVM_GET_SUPPORTED_CPUID's output directly into KVM_SET_CPUID2 (without breaking the guest). Signed-off-by: Eric Northup (Google) <digitaleric@gmail.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Message-Id: <20200415012320.236065-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:41 -04:00
Peter Shier	24647e0a39	KVM: x86: Return updated timer current count register from KVM_GET_LAPIC kvm_vcpu_ioctl_get_lapic (implements KVM_GET_LAPIC ioctl) does a bulk copy of the LAPIC registers but must take into account that the one-shot and periodic timer current count register is computed upon reads and is not present in register state. When restoring LAPIC state (e.g. after migration), restart timers from their their current count values at time of save. Note: When a one-shot timer expires, the code in arch/x86/kvm/lapic.c does not zero the value of the LAPIC initial count register (emulating HW behavior). If no other timer is run and pending prior to a subsequent KVM_GET_LAPIC call, the returned register set will include the expired one-shot initial count. On a subsequent KVM_SET_LAPIC call the code will see a non-zero initial count and start a new one-shot timer using the expired timer's count. This is a prior existing bug and will be addressed in a separate patch. Thanks to jmattson@google.com for this find. Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <20181010225653.238911-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:40 -04:00
Colin Ian King	788109c1cc	KVM: remove redundant assignment to variable r The variable r is being assigned with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Message-Id: <20200410113526.13822-1-colin.king@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:40 -04:00
Uros Bizjak	56a87e5d99	KVM: SVM: Fix __svm_vcpu_run declaration. The function returns no value. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: `199cd1d7b5` ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409114926.1407442-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:39 -04:00
Uros Bizjak	b61f62d408	KVM: SVM: Do not setup frame pointer in __svm_vcpu_run __svm_vcpu_run is a leaf function and does not need a frame pointer. %rbp is also destroyed a few instructions later when guest registers are loaded. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409120440.1427215-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:38 -04:00
Borislav Petkov	b2bce0a589	KVM: SVM: Fix build error due to missing release_pages() include Fix: arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’: arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\ did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration] 360 \| release_pages(pages, npinned); \| ^~~~~~~~~~~~~ \| reclaim_pages because svm.c includes pagemap.h but the carved out sev.c needs it too. Triggered by a randconfig build. Fixes: `eaf78265a4` ("KVM: SVM: Move SEV code to separate file") Signed-off-by: Borislav Petkov <bp@suse.de> Message-Id: <20200411160927.27954-1-bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:37 -04:00
Uros Bizjak	b4fd630812	KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD svm_vcpu_run does not change stack or frame pointer anymore. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414113612.104501-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:36 -04:00
Oliver Upton	69c0975525	kvm: nVMX: match comment with return type for nested_vmx_exit_reflected nested_vmx_exit_reflected() returns a bool, not int. As such, refer to the return values as true/false in the comment instead of 1/0. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200414221241.134103-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:35 -04:00
Oliver Upton	b045ae906b	kvm: nVMX: reflect MTF VM-exits if injected by L1 According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the VM-entry interruption-information field regardless of the 'monitor trap flag' VM-execution control. KVM appropriately copies the VM-entry interruption-information field from vmcs12 to vmcs02. However, if L1 has not set the 'monitor trap flag' VM-execution control, KVM fails to reflect the subsequent MTF VM-exit into L1. Fix this by consulting the VM-entry interruption-information field of vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect the exit, regardless of the 'monitor trap flag' VM-execution control. Fixes: `5f3d45e7f2` ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200414224746.240324-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:35 -04:00
Sean Christopherson	97daa028f3	KVM: s390: Return last valid slot if approx index is out-of-bounds Return the index of the last valid slot from gfn_to_memslot_approx() if its binary search loop yielded an out-of-bounds index. The index can be out-of-bounds if the specified gfn is less than the base of the lowest memslot (which is also the last valid memslot). Note, the sole caller, kvm_s390_get_cmma(), ensures used_slots is non-zero. Fixes: `afdad61615` ("KVM: s390: Fix storage attributes migration with memory slots") Cc: stable@vger.kernel.org # 4.19.x: 0774a964ef56: KVM: Fix out of range accesses to memslots Cc: stable@vger.kernel.org # 4.19.x Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200408064059.8957-3-sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-14 10:39:57 -04:00
Sean Christopherson	b6467ab142	KVM: Check validity of resolved slot when searching memslots Check that the resolved slot (somewhat confusingly named 'start') is a valid/allocated slot before doing the final comparison to see if the specified gfn resides in the associated slot. The resolved slot can be invalid if the binary search loop terminated because the search index was incremented beyond the number of used slots. This bug has existed since the binary search algorithm was introduced, but went unnoticed because KVM statically allocated memory for the max number of slots, i.e. the access would only be truly out-of-bounds if all possible slots were allocated and the specified gfn was less than the base of the lowest memslot. Commit `36947254e5` ("KVM: Dynamically size memslot array based on number of used slots") eliminated the "all possible slots allocated" condition and made the bug embarrasingly easy to hit. Fixes: `9c1a5d3878` ("kvm: optimize GFN to memslot lookup with large slots amount") Reported-by: syzbot+d889b59b2bb87d4047a2@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200408064059.8957-2-sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-14 10:39:56 -04:00
Uros Bizjak	fb56baae5e	KVM: VMX: Enable machine check support for 32bit targets There is no reason to limit the use of do_machine_check to 64bit targets. MCE handling works for both target familes. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Fixes: `a0861c02a9` ("KVM: Add VT-x machine check support") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414071414.45636-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-14 04:22:10 -04:00
Paolo Bonzini	f14eec0a32	KVM: SVM: move more vmentry code to assembly Manipulate IF around vmload/vmsave to remove the confusing usage of local_irq_enable where interrupts are actually disabled via GIF. And stuff the RSB immediately without waiting for a RET to avoid Spectre-v2 attacks. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-14 04:21:21 -04:00
Paolo Bonzini	9ef1530c0c	KVM: SVM: fix compilation with modular PSP and non-modular KVM Use svm_sev_enabled() in order to cull all calls to PSP code. Otherwise, compilation fails with undefined symbols if the PSP device driver is compiled as a module and KVM is not. Reported-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-14 04:21:15 -04:00
Vitaly Kuznetsov	dbef2808af	KVM: VMX: fix crash cleanup when KVM wasn't used If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: `31603d4fc2` ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-07 08:35:36 -04:00
Wanpeng Li	4064a4c6a1	KVM: X86: Filter out the broadcast dest for IPI fastpath Except destination shorthand, a destination value 0xffffffff is used to broadcast interrupts, let's also filter out this for single target IPI fastpath. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1585815626-28370-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-07 08:34:16 -04:00
Paolo Bonzini	1b0c58a34b	KVM: s390: Fixes for vsie (nested hypervisors) - Several fixes for corner cases of nesting. Still relevant as it might crash host or first level guest or temporarily leak memory. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJejGBFAAoJEBF7vIC1phx8aMkP/R5LNqloyrEE7ysMbrrLgF9T OZ9phn9R1+wVST1Ktf8XAJDronmENaiqMNHUfEaIzmxv8IERkcTrhGcUwFCdjL4c ozMBIS7lYN/6o3Gh/Bbslovh1GscpsCBZT2AgHPDo6KsJsBCJIlBI6hNzGgg/6LI tci92q6JeFdhJNrYYB4odwFpG5Kl3o6S9rLaBjVFCKh6p/wrOMj6WV4+prKBLcJq udJZFzzzkxGAvIUSFamLRXsKXCxze+nLW7ao1SN8t1zjqiB+Geksju6CJZ37xv5L CZ4bJpwy2qc6kFN1ym1rto9g5xG/hxf+aZc+j6T7v0pURxGhLVwTwN3qDi63Mt6s eZE0WyyQVEHt3dypJMMKh/BNcMJKTixv1aav2yjR7PBpwjUCNFXnqel3OpICgO1k VBD8t5oQjvxvVrX9ekEinKY8vfk3MbsmCaYMwL6wDiUylra1f11aHvwjN/ew3qm3 RyCdl01q0Pdb7kpcuXz1uvli6rZQMi7tIZouKRhHWmRxheZKVVrJwmgJqGdA9Ru0 yEs+iuARCcB/Kw1nTRZuQvV8VJpPx2kIIRtCp2IrnPJuMEn558qpsMR+GRkmsKxk mCzpQMSBr8NkKbxfDEkXQyDIPCDDGLuDSNHiYH5PcG05o03ld3U0COtbwjkvWk2r S9uWVz1+R4XuR2nsOyTo =oinE -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for vsie (nested hypervisors) - Several fixes for corner cases of nesting. Still relevant as it might crash host or first level guest or temporarily leak memory.	2020-04-07 08:31:44 -04:00
David Hildenbrand	1493e0f944	KVM: s390: vsie: Fix possible race when shadowing region 3 tables We have to properly retry again by returning -EINVAL immediately in case somebody else instantiated the table concurrently. We missed to add the goto in this function only. The code now matches the other, similar shadowing functions. We are overwriting an existing region 2 table entry. All allocated pages are added to the crst_list to be freed later, so they are not lost forever. However, when unshadowing the region 2 table, we wouldn't trigger unshadowing of the original shadowed region 3 table that we replaced. It would get unshadowed when the original region 3 table is modified. As it's not connected to the page table hierarchy anymore, it's not going to get used anymore. However, for a limited time, this page table will stick around, so it's in some sense a temporary memory leak. Identified by manual code inspection. I don't think this classifies as stable material. Fixes: `998f637cc4` ("s390/mm: avoid races on region/segment/page table shadowing") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2020-04-07 13:12:38 +02:00
David Hildenbrand	4d4cee96fb	KVM: s390: vsie: Fix delivery of addressing exceptions Whenever we get an -EFAULT, we failed to read in guest 2 physical address space. Such addressing exceptions are reported via a program intercept to the nested hypervisor. We faked the intercept, we have to return to guest 2. Instead, right now we would be returning -EFAULT from the intercept handler, eventually crashing the VM. the correct thing to do is to return 1 as rc == 1 is the internal representation of "we have to go back into g2". Addressing exceptions can only happen if the g2->g3 page tables reference invalid g2 addresses (say, either a table or the final page is not accessible - so something that basically never happens in sane environments. Identified by manual code inspection. Fixes: `a3508fbe9d` ("KVM: s390: vsie: initial support for nested virtualization") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-3-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2020-04-07 13:12:34 +02:00
David Hildenbrand	a1d032a495	KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks In case we have a region 1 the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the size are undefined the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64) resulting in no shift at all. That means that ALL addresses will be rejected. The can result in endless loops, e.g. when prefix cannot get mapped. Fixes: `4be130a084` ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2020-04-07 13:12:18 +02:00
Oliver Upton	5c8beb4746	KVM: nVMX: don't clear mtf_pending when nested events are blocked If nested events are blocked, don't clear the mtf_pending flag to avoid missing later delivery of the MTF VM-exit. Fixes: `5ef8acbdd6` ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200406201237.178725-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-07 04:21:41 -04:00
Uros Bizjak	da7e423209	KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter The exception trampoline in .fixup section is not needed, the exception handling code can jump directly to the label in the .text section. Changes since v1: - Fix commit message. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200406202108.74300-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-07 04:21:20 -04:00
Uros Bizjak	199cd1d7b5	KVM: SVM: Split svm_vcpu_run inline assembly to separate file The compiler (GCC) does not like the situation, where there is inline assembly block that clobbers all available machine registers in the middle of the function. This situation can be found in function svm_vcpu_run in file kvm/svm.c and results in many register spills and fills to/from stack frame. This patch fixes the issue with the same approach as was done for VMX some time ago. The big inline assembly is moved to a separate assembly .S file, taking into account all ABI requirements. There are two main benefits of the above approach: * elimination of several register spills and fills to/from stack frame, and consequently smaller function .text size. The binary size of svm_vcpu_run is lowered from 2019 to 1626 bytes. * more efficient access to a register save array. Currently, register save array is accessed as: 7b00: 48 8b 98 28 02 00 00 mov 0x228(%rax),%rbx 7b07: 48 8b 88 18 02 00 00 mov 0x218(%rax),%rcx 7b0e: 48 8b 90 20 02 00 00 mov 0x220(%rax),%rdx and passing ia pointer to a register array as an argument to a function one gets: 12: 48 8b 48 08 mov 0x8(%rax),%rcx 16: 48 8b 50 10 mov 0x10(%rax),%rdx 1a: 48 8b 58 18 mov 0x18(%rax),%rbx As a result, the total size, considering that the new function size is 229 bytes, gets lowered by 164 bytes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-03 10:53:57 -04:00
Joerg Roedel	eaf78265a4	KVM: SVM: Move SEV code to separate file Move the SEV specific parts of svm.c into the new sev.c file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-5-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-03 10:53:56 -04:00
Joerg Roedel	ef0f64960d	KVM: SVM: Move AVIC code to separate file Move the AVIC related functions from svm.c to the new avic.c file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-4-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-03 10:53:56 -04:00
Joerg Roedel	883b0a91f4	KVM: SVM: Move Nested SVM Implementation to nested.c Split out the code for the nested SVM implementation and move it to a separate file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-3-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-03 10:53:55 -04:00
Joerg Roedel	46a010dd68	kVM SVM: Move SVM related files to own sub-directory Move svm.c and pmu_amd.c into their own arch/x86/kvm/svm/ subdirectory. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-2-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-03 10:53:47 -04:00
Linus Torvalds	8c1b724ddb	ARM: * GICv4.1 support * 32bit host removal PPC: * secure (encrypted) using under the Protected Execution Framework ultravisor s390: * allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: * New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. * Initial work on making nested SVM event injection more similar to VMX, and less buggy. * Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". * A large refactoring of CPUID features, which now use an array that parallels the core x86_features. * Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. * New Tigerlake CPUID features. * More bugfixes, optimizations and cleanups. Generic: * selftests: cleanups, new MMU notifier stress test, steal-time test * CSV output for kvm_stat. KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed by MIPS maintainers. I had already prepared a fix, but the MIPS maintainers prefer to fix it in generic code rather than KVM so they are taking care of it. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU 3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2 nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw== =6fGt -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...	2020-04-02 15:13:15 -07:00
Linus Torvalds	f14a9532ee	A single fix addressing Sparse warnings. <asm/bitops.h> is changed non-trivially to avoid the warnings, but generated code is not supposed to be affected. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl6Fs/QRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iOLQ//R3i1QAIvDasdG9EMbSkJAZP2cliK8r3B Vc+3quYv8wjJcgj5LGPrlVeoV2X96jopZiN3YeWWJg0rA30ZyZMPiVkQWtYVUEtU VjGvX5RNw0ShcWjzbetcPXhyczCpJKFwFVv2fEVPwAvI3OyqGuL044aFQhgksra+ RE4n8eYWB9pastFeJGn1WPWdJOw40fOcC7YbAF3USo7e8aO/Wv3KJiZxahhGFnPt 5spBnZHSPbvZp9O8pgdYVJ09mExK2wBxk/GClQw/E4i7d/TLcHEzBIOAekS98H0F 9lNgCnFLVmEK5DA4TXMPhz+aYfEb5VFoBgz4wA4VOiwcPrTJKa0IukcG+oWXWPrB PRb8StNB3IHU0pqKPHRemyPNzl9d4DMm22NMfRBCVUrPrDlYkOb1tCANgcyHOyMf G/w2nbcNDgzi9m2L38gWCFIY5AP1AKW+0X8MdsvyESlTXIC6lsBFsjsLE69nbv7c dBYYxwEKb41bjXpWIxbdCEyW9kNZTSt5RZP+Md/2DGoeWLHba4iHmmXjhJKGF1F3 pf1yJZDVoaQkwX+mLgDyC9681UzDA0lRMrSBIhQOpw2OuCkvBRTifwvH0efbLjtN cXxeCZvK8O1Zmc/BTtdRRPWybItjtZmkfm2iVviFUxY566i/vAQdmQBST4Uqq4qv 2V9nnZVEJas= =Vx32 -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "A single fix addressing Sparse warnings. <asm/bitops.h> is changed non-trivially to avoid the warnings, but generated code is not supposed to be affected" * tag 'x86-urgent-2020-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix bitops.h warning with a moved cast	2020-04-02 14:52:12 -07:00
Linus Torvalds	7f218319ca	Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Just a couple of updates for linux-5.7: - A new Kconfig option to enable IMA architecture specific runtime policy rules needed for secure and/or trusted boot, as requested. - Some message cleanup (eg. pr_fmt, additional error messages)" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: add a new CONFIG for loading arch-specific policies integrity: Remove duplicate pr_fmt definitions IMA: Add log statements for failure conditions IMA: Update KBUILD_MODNAME for IMA files to ima	2020-04-02 14:49:46 -07:00
Linus Torvalds	6cad420cc6	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: "A large amount of MM, plenty more to come. Subsystems affected by this patch series: - tools - kthread - kbuild - scripts - ocfs2 - vfs - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap, sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy, hugetlbfs, hugetlb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits) include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS selftests/vm: fix map_hugetlb length used for testing read and write mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge() mm/hugetlb.c: clean code by removing unnecessary initialization hugetlb_cgroup: add hugetlb_cgroup reservation docs hugetlb_cgroup: add hugetlb_cgroup reservation tests hugetlb: support file_region coalescing again hugetlb_cgroup: support noreserve mappings hugetlb_cgroup: add accounting for shared mappings hugetlb: disable region_add file_region coalescing hugetlb_cgroup: add reservation accounting for private mappings mm/hugetlb_cgroup: fix hugetlb_cgroup migration hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations hugetlb_cgroup: add hugetlb_cgroup reservation counter hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization mm/memblock.c: remove redundant assignment to variable max_addr mm: mempolicy: require at least one nodeid for MPOL_PREFERRED mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk() ...	2020-04-02 13:55:34 -07:00
Linus Torvalds	7be97138e7	New code for 5.7: - Fix a hard to trigger race between iclog error checking and log shutdown. - Strengthen the AGF verifier. - Ratelimit some of the more spammy error messages. - Remove the icdinode uid/gid members and just use the ones in the vfs inode. - Hold ILOCK across insert/collapse range. - Clean up the extended attribute interfaces. - Clean up the attr flags mess. - Restore PF_MEMALLOC after exiting xfsaild thread to avoid triggering warnings in the process accounting code. - Remove the flexibly-sized array from struct xfs_agfl to eliminate compiler warnings about unaligned pointers and packed structures. - Various macro and typedef removals. - Stale metadata buffers if we decide they're corrupt outside of a verifier. - Check directory data/block/free block owners. - Fix a UAF when aborting inactivation of a corrupt xattr fork. - Teach online scrub to report failed directory and attr name lookups as a metadata corruption instead of a runtime error. - Avoid potential buffer overflows in sysfs files by using scnprintf. - Fix a regression in getdents lookups due to a mistake in pointer arithmetic. - Refactor btree cursor private data structures to use anonymous unions. - Cleanups in the log unmounting code. - Fix a potential mishandling of ENOMEM errors on multi-block directory buffer lookups. - Fix an incorrect test in the block allocation code. - Cleanups and name prefix shortening in the scrub code. - Introduce btree bulk loading code for online repair and scrub. - Fix a quotaoff log item leak (and hang) when the fs goes down midway through a quotaoff operation. - Remove di_version from the incore inode. - Refactor some of the log shutdown checking code. - Record the forcing of the log unmount records in the log force counters. - Fix a longstanding bug where quotacheck would purge the administrator's default quota grace interval and warning limits. - Reduce memory usage when scrubbing directory and xattr trees. - Don't let fsfreeze race with GETFSMAP or online scrub. - Handle bio_add_page failures more gracefully in xlog_write_iclog. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl58z3AACgkQ+H93GTRK tOukrRAAhJmowV5+Req5YMYawRjafkIbCDH3WlFy9AdpFFA6pXSfX6YCtIKwKfq8 +yRj/BFRGoMc6SouXo+J0i3YMS6yQZTjcmVWrQPVnj/+DGVjh+Y70gKExtz2CyjO ItGGxpRwOhpw49zVYmcH6Mrw8sBztHR0VsM0cq6YfJrkNcm0BsnAC+W6zQNaDG24 UO1ivehBOooVh0C8pv0smVcPtBL2N+RRyS3XRT5hGFozUJgLLGDqnHAl1d+KOrWp hPQhUlDw9luiHPBxWkxUuFDr79gjUi7kyHILNt7TIkByyRcTUO9jhS2VpZd4oXlj /J3i1AS+9lhP1yGVxw2RHQhKMvdYBQiLADSCpzkA1dMma99cFGyzMMA6rG0WRMJ4 erXxhAEoM4um3gxDka6+HJxySLOT8E22FesJbn6YIv4QSAkXDBPWz/9hPbjJuJQm 6Y/YkFOZLp3c+xJM0tpCWxWaWW7A+t2OMRIFISSsXesrySpalpbkVXkHwz3NwO6L 3SeTnLWqnADbjl2qsuyF0uYHqURygVz7g+r4X7AO5D1IRyCCkmtDOuwumxERiQ3p 3vZMQrWh+y3SgRiF8brDG5KTshhxcinKdHEYXrwq3XgaHZg4mtLI4XjOyZlJruoX MGWhZjga6+RGysH0RKjZbHaMr/f4m3X00SHa/5Ibcp6Q21TIx6M= =8iJB -----END PGP SIGNATURE----- Merge tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "There's a lot going on this cycle with cleanups in the log code, the btree code, and the xattr code. We're tightening of metadata validation and online fsck checking, and introducing a common btree rebuilding library so that we can refactor xfs_repair and introduce online repair in a future cycle. We also fixed a few visible bugs -- most notably there's one in getdents that we introduced in 5.6; and a fix for hangs when disabling quotas. This series has been running fstests & other QA in the background for over a week and looks good so far. I anticipate sending a second pull request next week. That batch will change how xfs interacts with memory reclaim; how the log batches and throttles log items; how hard writes near ENOSPC will try to squeeze more space out of the filesystem; and hopefully fix the last of the umount hangs after a catastrophic failure. That should ease a lot of problems when running at the limits, but for now I'm leaving that in for-next for another week to make sure we got all the subtleties right. Summary: - Fix a hard to trigger race between iclog error checking and log shutdown. - Strengthen the AGF verifier. - Ratelimit some of the more spammy error messages. - Remove the icdinode uid/gid members and just use the ones in the vfs inode. - Hold ILOCK across insert/collapse range. - Clean up the extended attribute interfaces. - Clean up the attr flags mess. - Restore PF_MEMALLOC after exiting xfsaild thread to avoid triggering warnings in the process accounting code. - Remove the flexibly-sized array from struct xfs_agfl to eliminate compiler warnings about unaligned pointers and packed structures. - Various macro and typedef removals. - Stale metadata buffers if we decide they're corrupt outside of a verifier. - Check directory data/block/free block owners. - Fix a UAF when aborting inactivation of a corrupt xattr fork. - Teach online scrub to report failed directory and attr name lookups as a metadata corruption instead of a runtime error. - Avoid potential buffer overflows in sysfs files by using scnprintf. - Fix a regression in getdents lookups due to a mistake in pointer arithmetic. - Refactor btree cursor private data structures to use anonymous unions. - Cleanups in the log unmounting code. - Fix a potential mishandling of ENOMEM errors on multi-block directory buffer lookups. - Fix an incorrect test in the block allocation code. - Cleanups and name prefix shortening in the scrub code. - Introduce btree bulk loading code for online repair and scrub. - Fix a quotaoff log item leak (and hang) when the fs goes down midway through a quotaoff operation. - Remove di_version from the incore inode. - Refactor some of the log shutdown checking code. - Record the forcing of the log unmount records in the log force counters. - Fix a longstanding bug where quotacheck would purge the administrator's default quota grace interval and warning limits. - Reduce memory usage when scrubbing directory and xattr trees. - Don't let fsfreeze race with GETFSMAP or online scrub. - Handle bio_add_page failures more gracefully in xlog_write_iclog" * tag 'xfs-5.7-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (108 commits) xfs: prohibit fs freezing when using empty transactions xfs: shutdown on failure to add page to log bio xfs: directory bestfree check should release buffers xfs: drop all altpath buffers at the end of the sibling check xfs: preserve default grace interval during quotacheck xfs: remove xlog_state_want_sync xfs: move the ioerror check out of xlog_state_clean_iclog xfs: refactor xlog_state_clean_iclog xfs: remove the aborted parameter to xlog_state_done_syncing xfs: simplify log shutdown checking in xfs_log_release_iclog xfs: simplify the xfs_log_release_iclog calling convention xfs: factor out a xlog_wait_on_iclog helper xfs: merge xlog_cil_push into xlog_cil_push_work xfs: remove the di_version field from struct icdinode xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize xfs: simplify di_flags2 inheritance in xfs_ialloc xfs: only check the superblock version for dinode size calculation xfs: add a new xfs_sb_version_has_v3inode helper xfs: fix unmount hang and memory leak on shutdown during quotaoff xfs: factor out quotaoff intent AIL removal and memory free ...	2020-04-02 13:02:07 -07:00
Linus Torvalds	7db83c070b	New code for 5.7: - Fix a regression where we broke the userspace hibernation driver by disallowing writes to the swap device. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl542q0ACgkQ+H93GTRK tOun8A//QIAvIuMQl9k/S4lDqvVNAmSMJDdp0v3x+BOMBDmbqJeDO+D9u59nVWAP zun1Zp3weO7v8kMBPDyvTVhKP0Z9v8ogQj4yT22W0YiBsKgsaqM9tupc3NPm036V oPusLFC44RRXyLZSjBhNr3xYTBqeGmJMKBUGrnwYeQK2g87o3gi8s9KmVq3olp/L W/ZvFgmTl4FpbA1aNaMtZ1YBawu9wyQDvmtZtnD7xuXGKGsQjGUt20P7yuFu2Mb8 vmUHNcCBG29j8Fwd+6Gub2Jg25BhLGBSjftLHcGdG8aRN4Y5DQ3w+rBwUD7fHQmi u0DXMnPIP8twsQPKwWabfZ3PMqyfUiz5rSnJGGd+T7uPP5xYvpKhYGm8IBPQb699 2LY4NZKQqp9IWSbwmU7jSwCEl0x/GDMflF3frpfTmvCDvpW7TUQf8lJsVsF6OcNP uGJPz3AoE5ebt2XD+IWCurWrfn/nGnxp9ZEKjK69nm3BFXI0GRdqBq6lueBsh6re zKUoFp7IHIb4ET6V21JPq9iyKUlKLqgb+rpqcA4CwA4tvJZkZXVlYOdwi1CWse3o 8o9xaDmW1murvc0XrnQu8Z8way1nkUYBBkhRJsHCdy8Qn2xA3fyVumZGichNNccO Mzu8+IKttCTBK8ZY6iAXsjbL61eR1vrr3GGbz4kh6dZy5fBX99c= =y7GN -----END PGP SIGNATURE----- Merge tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull hibernation fix from Darrick Wong: "Fix a regression where we broke the userspace hibernation driver by disallowing writes to the swap device" * tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: hibernate: Allow uswsusp to write to swap	2020-04-02 12:59:36 -07:00

1 2 3 4 5 ...

910930 Commits