Go to file
Sean Christopherson eca6be566d KVM: doc: Document the life cycle of a VM and its resources
The series to add memcg accounting to KVM allocations[1] states:

  There are many KVM kernel memory allocations which are tied to the
  life of the VM process and should be charged to the VM process's
  cgroup.

While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process.  This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor.  In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero.  A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.

Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.

Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.

[1]https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-15 19:24:33 +01:00
arch Third PPC KVM update for 5.1 2019-03-15 19:16:51 +01:00
block for-linus-20190209 2019-02-09 10:26:09 -08:00
certs kbuild: remove redundant target cleaning on failure 2019-01-06 09:46:51 +09:00
crypto crypto: sm3 - fix undefined shift by >= width of value 2019-01-10 21:37:32 +08:00
Documentation KVM: doc: Document the life cycle of a VM and its resources 2019-03-15 19:24:33 +01:00
drivers KVM/arm updates for Linux v5.1 2019-02-22 17:45:05 +01:00
firmware kbuild: change filechk to surround the given command with { } 2019-01-06 09:46:51 +09:00
fs for-linus-20190209 2019-02-09 10:26:09 -08:00
include KVM/arm updates for Linux v5.1 2019-02-22 17:45:05 +01:00
init psi: clarify the Kconfig text for the default-disable option 2019-02-01 15:46:24 -08:00
ipc ipc: IPCMNI limit check for semmni 2018-10-31 08:54:14 -07:00
kernel Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-02-10 09:48:18 -08:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 11:21:54 -08:00
LICENSES This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
mm mm: migrate: don't rely on __PageMovable() of newpage after unlocking it 2019-02-01 15:46:24 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-08 11:21:54 -08:00
samples samples: mei: use /dev/mei0 instead of /dev/mei 2019-01-30 15:24:45 +01:00
scripts Bug fixes for gcc-plugins 2019-01-21 13:07:03 +13:00
security apparmor: Fix aa_label_build() error handling for failed merges 2019-02-01 08:01:39 -08:00
sound sound fixes for 5.0-rc6 2019-02-07 08:33:56 +00:00
tools selftests: kvm: add selftest for releasing VM file descriptor while in L2 2019-02-12 13:12:12 +01:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt KVM/arm updates for Linux v5.1 2019-02-22 17:45:05 +01:00
.clang-format clang-format: Update .clang-format with the latest for_each macro list 2019-01-19 19:26:06 +01:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
.mailmap A few early MIPS fixes for 4.21: 2019-01-05 12:48:25 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Add CREDITS entry for Shaohua Li 2019-01-04 14:27:09 -07:00
Kbuild kbuild: use assignment instead of define ... endef for filechk_* rules 2019-01-06 10:22:35 +09:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS MAINTAINERS: Add KVM selftests to existing KVM entry 2019-03-15 19:16:48 +01:00
Makefile Linux 5.0-rc6 2019-02-10 14:42:20 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.