Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable. This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.
No functional change intended.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com> #s390
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Add a capability for enabling secure guests under the Protected
Execution Framework ultravisor
* Various bug fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJegnq3AAoJEJ2a6ncsY3GfrU8IANpaxS7kTAsW3ZN0wJnP2PHq
i8j7ZPlCVLpkQWArrMyMimDLdiN9VP7lvaWonAWjG0HJrxlmMUUVnwVSMuPTXhWY
vSwEqhUXwSF5KKxN7DYkgxqaKFxkElJIubVE/AYJjH9zFpu9ca4vM5sCxnzWcvS3
hxPNe756nKhFH9xhrC/9NRUZWmDAiv75wvzq+5DRKbSVPJGJugchdIPBbi3Yrr7P
qxnUCPZdCxzXXU94bfQl038wrjSMR3S7b4FvekJ12go2FalujqzsL2lVtift4Rf0
jvu+RIINcmTiaQqclY332j7l24LZ6Pni456RygT5OwFxuYoxWKZRafN16JmaMCo=
=rpJL
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
KVM PPC update for 5.7
* Add a capability for enabling secure guests under the Protected
Execution Framework ultravisor
* Various bug fixes and cleanups.
- return the proper error to userspace when a signal interrupts the
KSM unsharing operation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJegd4AAAoJEBF7vIC1phx8sAAP/A28obxzB1GoCujctNQWA4UG
xBTLdgjMS4gb/zE0szA9zK7A+rz9rtmZEUNRDayBnaAg69PvLz/WZTSjt6AqHfJr
rN7rVd4x3YOCmLGLM1ENqyNF19R+PsESxUiwMQ12nYC9jA9rPSP7VjcSQJnJa6A5
1BfTvHdrBW/M5rCsuYX5wFqcwx5VmktaLpwfm8GaXAaF7zqjOF1v8c33n6VDZ/u8
2ZtvEIOWOdufeiZ+hlJd8J+1jrXVYf1ZowdadAxKnqwilgB/gmK55i/Akq/z+/2Y
MbmlKQWh4u5C+TZ1fF11qjRSanFT/4nd/RCPCa3mGc/GvxCkmsWjhQPy6Rih6e7f
QHkMcbcLQHmWnlS8b7B1jgPzr29BgkRppi4CETmlxzBjVPxujuOTfBjQItTUHdvu
vN9M9D0SNJ6vejUI+umvJSPkLeflj2/ym74aWXqzuPKJW493t7ngkyjMQN4Ffnz+
RHX2wDH3igclPRy9j9emln7edYdxHKYPG9ZKe4puNQAs6U5fzwE+i3NtpVxzN7OY
vWM94CN2CQ7yCdzhDXEBjD2MSShdz30HiLP2fNc369I6gyVap18mXHbhZmybenvM
laUTTCArqlkGzBRpOZZzkpYPlySEnpTRMftqWCPu3AfBNLi7lwbaxHp2lFBChrZ1
gbcznpu8iuNi767cS9/d
=Wf8x
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix for error codes
- return the proper error to userspace when a signal interrupts the
KSM unsharing operation
If a signal is pending we might return -ENOMEM instead of -EINTR.
We should propagate the proper error during KSM unsharing.
unmerge_ksm_pages returns -ERESTARTSYS on signal_pending. This gets
translated by entry.S to -EINTR. It is important to get this error
code so that userspace can retry.
To make this clearer we also add -EINTR to the documentation of the
PV_ENABLE call, which calls unmerge_ksm_pages.
Fixes: 3ac8e38015 ("s390/mm: disable KSM for storage key enabled pages")
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
- mark sie control block as 512 byte aligned
- use fallthrough;
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJeegAeAAoJEBF7vIC1phx8PgMP/jcN57GRXU0yML+r8izA1JBZ
0hnV9Mesec5sUDyDLTLuAJBS7tDvfLKi8A6QLdyTuPkUihPpWp0VD/QW4tJe1WDD
MOPA56xH3yN7ll/q+IX7QurwY4jDCReV8DIsHjrOvDYKSBHxZSsgrR/G5ivtqzRJ
VWl1qpB3/FgPtsdTW4x1nciaAYulWp7K75F2zWFpoRe0EZUXY8IyLJMZjScMVRnk
DbdA8+jsFqeYCJC1JSnE6TShP2RvDNu5NjE2pInyWbXAA1PwcFM+bML+ZZ4kJlii
9+cDqctXNC4MdxOJVMKEOEdoQ1M3yYzKn/J7an7Zly7kY54G8uJkQGCaBmlg1K0k
r57WP0DpFu5kuNFFJg52bBcAOUPEgFiyIitICG+nMn9BjLA3zY7bjUSCGHviIzLm
pNPL+t7KVWyyn5t8X25CuFUkCQskDSALpC3SFdL4iPo+fpBBw/GPqngjmx4zdrHg
jkOImPWt2n2KK9I4kfxw/NIx8Q06wHHzIpY2uutHA++TNF8WMgXR1RvJgZvOAO0G
JLGaOUZNV5BcTsK38ftEMh9awqBG9J4l8DPwPoRG/9r394W25O1KdjiNnTVH9qYU
INUArhtfeeAYoSteDCflD7cgORbGQLOXBePtCO8+Ayt2lTJLW98CDfCrCAKawQLl
O+RMxuDshobeH/AGT7yr
=nsOM
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: cleanups for 5.7
- mark sie control block as 512 byte aligned
- use fallthrough;
Fix a copy-paste typo in a comment and error message.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320205546.2396-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reset the LRU slot if it becomes invalid when deleting a memslot to fix
an out-of-bounds/use-after-free access when searching through memslots.
Explicitly check for there being no used slots in search_memslots(), and
in the caller of s390's approximation variant.
Fixes: 36947254e5 ("KVM: Dynamically size memslot array based on number of used slots")
Reported-by: Qian Cai <cai@lca.pw>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320205546.2396-2-sean.j.christopherson@intel.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Delay read msr data until we identify guest accesses ICR MSR to avoid
to penalize all other MSR writes.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585189202-1708-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will. Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests. This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.
This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest. If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest. The ultravisor will then abort the transition and
the guest will terminate.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
The vgic-state debugfs file could do with showing the pending state
of the HW-backed SGIs. Plug it into the low-level code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-24-maz@kernel.org
Just like for VLPIs, it is beneficial to avoid trapping on WFI when the
vcpu is using the GICv4.1 SGIs.
Add such a check to vcpu_clear_wfx_traps().
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-23-maz@kernel.org
Each time a Group-enable bit gets flipped, the state of these bits
needs to be forwarded to the hardware. This is a pretty heavy
handed operation, requiring all vcpus to reload their GICv4
configuration. It is thus implemented as a new request type.
These enable bits are programmed into the HW by setting the VGrp{0,1}En
fields of GICR_VPENDBASER when the vPEs are made resident again.
Of course, we only support Group-1 for now...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-22-maz@kernel.org
The GICv4.1 architecture gives the hypervisor the option to let
the guest choose whether it wants the good old SGIs with an
active state, or the new, HW-based ones that do not have one.
For this, plumb the configuration of SGIs into the GICv3 MMIO
handling, present the GICD_TYPER2.nASSGIcap to the guest,
and handle the GICD_CTLR.nASSGIreq setting.
In order to be able to deal with the restore of a guest, also
apply the GICD_CTLR.nASSGIreq setting at first run so that we
can move the restored SGIs to the HW if that's what the guest
had selected in a previous life.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-21-maz@kernel.org
In order to let a guest buy in the new, active-less SGIs, we
need to be able to switch between the two modes.
Handle this by stopping all guest activity, transfer the state
from one mode to the other, and resume the guest. Nothing calls
this code so far, but a later patch will plug it into the MMIO
emulation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-20-maz@kernel.org
Most of the GICv3 emulation code that deals with SGIs now has to be
aware of the v4.1 capabilities in order to benefit from it.
Add such support, keyed on the interrupt having the hw flag set and
being a SGI.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-19-maz@kernel.org
As GICv4.1 understands the life cycle of doorbells (instead of
just randomly firing them at the most inconvenient time), just
enable them at irq_request time, and be done with it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-18-maz@kernel.org
Now that we have HW-accelerated SGIs being delivered to VPEs, it
becomes required to map the VPEs on all ITSs instead of relying
on the lazy approach that we would use when using the ITS-list
mechanism.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-17-maz@kernel.org
In order to hide some of the differences between v4.0 and v4.1, move
the doorbell management out of the KVM code, and into the GICv4-specific
layer. This allows the calling code to ask for the doorbell when blocking,
and otherwise to leave the doorbell permanently disabled.
This matches the v4.1 code perfectly, and only results in a minor
refactoring of the v4.0 code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org
Just like for vLPIs, there is some configuration information that cannot
be directly communicated through the normal irqchip API, and we have to
use our good old friend set_vcpu_affinity as a side-band communication
mechanism.
This is used to configure group and priority for a given vSGI.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-13-maz@kernel.org
To implement the get/set_irqchip_state callbacks (limited to the
PENDING state), we have to use a particular set of hacks:
- Reading the pending state is done by using a pair of new redistributor
registers (GICR_VSGIR, GICR_VSGIPENDR), which allow the 16 interrupts
state to be retrieved.
- Setting the pending state is done by generating it as we'd otherwise do
for a guest (writing to GITS_SGIR).
- Clearing the pending state is done by emitting a VSGI command with the
"clear" bit set.
This requires some interesting locking though:
- When talking to the redistributor, we must make sure that the VPE
affinity doesn't change, hence taking the VPE lock.
- At the same time, we must ensure that nobody accesses the same
redistributor's GICR_VSGIR registers for a different VPE, which
would corrupt the reading of the pending bits. We thus take the
per-RD spinlock. Much fun.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-12-maz@kernel.org
Implement mask/unmask for virtual SGIs by calling into the
configuration helper.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-11-maz@kernel.org
The GICv4.1 ITS has yet another new command (VSGI) which allows
a VPE-targeted SGI to be configured (or have its pending state
cleared). Add support for this command and plumb it into the
activate irqdomain callback so that it is ready to be used.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-10-maz@kernel.org
Since GICv4.1 has the capability to inject 16 SGIs into each VPE,
and that I'm keen not to invent too many specific interfaces to
manipulate these interrupts, let's pretend that each of these SGIs
is an actual Linux interrupt.
For that matter, let's introduce a minimal irqchip and irqdomain
setup that will get fleshed up in the following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-9-maz@kernel.org
Although we have to bounce between HYP and SVC to decompress and
relocate the kernel, we don't need to be able to use it in the
kernel itself. So let's drop the functionnality.
Since the vectors are never changed, there is no need to reset them
either, and nobody calls that stub anyway. The last function
(SOFT_RESTART) is still present in order to support kexec.
Signed-off-by: Marc Zyngier <maz@kernel.org>
We used to use a set of macros to provide support of vgic-v3 to 32bit
without duplicating everything. We don't need it anymore, so drop it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
That's it. Remove all references to KVM itself, and document
that although it is no more, the ABI between SVC and HYP still
exists.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Only one platform is building KVM by default. How crazy! Remove
it whilst nobody is watching.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
As we're about to drop KVM/arm on the floor, carefully unplug
it from the build system.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
When the call to UV_REGISTER_MEM_SLOT is failing, for instance because
there is not enough free secured memory, the Hypervisor (HV) has to call
UV_RETURN to report the error to the Ultravisor (UV). Then the UV will call
H_SVM_INIT_ABORT to abort the securing phase and go back to the calling VM.
If the kvm->arch.secure_guest is not set, in the return path rfid is called
but there is no valid context to get back to the SVM since the Hcall has
been routed by the Ultravisor.
Move the setting of kvm->arch.secure_guest earlier in
kvmppc_h_svm_init_start() so in the return path, UV_RETURN will be called
instead of rfid.
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The Hcall named H_SVM_* are reserved to the Ultravisor. However, nothing
prevent a malicious VM or SVM to call them. This could lead to weird result
and should be filtered out.
Checking the Secure bit of the calling MSR ensure that the call is coming
from either the Ultravisor or a SVM. But any system call made from a SVM
are going through the Ultravisor, and the Ultravisor should filter out
these malicious call. This way, only the Ultravisor is able to make such a
Hcall.
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Ram Pai <linuxram@us.ibnm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
BIOS, instead of letting the fault crash the system. Now that KVM uses
cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
directly, it's possible for a bug in a different subsystem to cause KVM
to incorrectly attempt VMXON[*]. Crashing the system is especially
annoying if the system is configured such that hardware_enable() will
be triggered during boot.
Oppurtunistically rename @addr to @vmxon_pointer and use a named param
to reference it in the inline assembly.
Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
also faults.
[*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Subsume loaded_vmcs_init() into alloc_loaded_vmcs(), its only remaining
caller, and drop the VMCLEAR on the shadow VMCS, which is guaranteed to
be NULL. loaded_vmcs_init() was previously used by loaded_vmcs_clear(),
but loaded_vmcs_clear() also subsumed loaded_vmcs_init() to properly
handle smp_wmb() with respect to VMCLEAR.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.
Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs(). This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.
Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel. Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.
Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration. list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.
In addition to removing the temporary disabling of VMCLEAR, open code
loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that
the VMCS is deleted from the list only after it's been VMCLEAR'd.
Deleting the VMCS before VMCLEAR would allow a race where the NMI
shootdown could arrive between list_del() and vmcs_clear() and thus
neither flow would execute a successful VMCLEAR. Alternatively, more
code could be moved into loaded_vmcs_init(), but that gets rather silly
as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb()
and would need to work around the list_del().
Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.
[*] https://patchwork.kernel.org/patch/1675731/#3720461
Fixes: 8f536b7697 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For CPU supporting fast short REP MOV (XF86_FEATURE_FSRM) e.g Icelake,
Tigerlake, expose it in KVM supported cpuid as well.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Message-Id: <20200323092236.3703-1-zhenyuw@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an alternative format that can be more easily used for further
processing later on.
Note that we add a timestamp in the first column for both, the regular
and the new csv format.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-5-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This now controls both, the refresh rate of the interactive mode as well
as the logging mode. Which, as a consequence, means that the default of
logging mode is now 3s, too (use command line switch '-s' to adjust to
your liking).
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-4-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
optparse is deprecated for a while, hence switching over to argparse
(which also works with python2).
As a consequence, help output has some subtle changes, the most
significant one being that the options are all listed explicitly
instead of a universal '[options]' indicator. Also, some of the error
messages are phrased slightly different.
While at it, squashed a number of minor PEP8 issues.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-3-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make sure command line arguments are sorted alphabetically
everywhere, and adjusted existing texts for interactive command 's' to
become consistent with the long form --set-delay.
Throwing in some PEP8 fixes (all cosmetics) for good measure.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20200306114250.57585-2-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The sie block must be aligned to 512 bytes. Mark it as such.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
One of the new features of GICv4.1 is to allow virtual SGIs to be
directly signaled to a VPE. For that, the ITS has grown a new
64kB page containing only a single register that is used to
signal a SGI to a given VPE.
Add a second mapping covering this new 64kB range, and take this
opportunity to limit the original mapping to 64kB, which is enough
to cover the span of the ITS registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-8-maz@kernel.org