kernel_optimize_test


Johannes Weiner 4e37504d1c psi: avoid divide-by-zero crash inside virtual machines	2019-02-21 09:01:00 -08:00

We've been seeing hard-to-trigger psi crashes when running inside VM instances:

    divide error: 0000 [#1] SMP PTI
    Modules linked in: [...]
    CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    Workqueue: events psi_clock
    RIP: 0010:psi_update_stats+0x270/0x490
    RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
    RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
    RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
    FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     psi_clock+0x12/0x50
     process_one_work+0x1e0/0x390
     worker_thread+0x2b/0x3c0
     ? rescuer_thread+0x330/0x330
     kthread+0x113/0x130
     ? kthread_create_worker_on_cpu+0x40/0x40
     ? SyS_exit_group+0x10/0x10
     ret_from_fork+0x35/0x40
    Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48

The Code-line points to `period` being 0 inside update_stats(), and we divide by that when calculating that period's pressure percentage.

The elapsed period should never be 0. This can happen because of an off-by-one in the idle time / missing period calculation, combined with a coarse sched_clock() in the virtual machine.

The target time for aggregation is advanced into the future on a fixed grid to prevent clock drift. So when an aggregation runs after some idle period, we cannot just set it to "now + psi_period", but have to calculate the downtime and advance the target time relative to itself.

However, if the aggregator was disabled for exactly one psi_period (ns), we drop one idle period in the calculation due to a > where it should be >=. In that case, next_update will be advanced from 'now - psi_period' to 'now' when it should be moved to 'now + psi_period'. The run finishes with last_update == next_update == sched_clock().

With hardware clocks, this exact nanosecond match isn't likely in the first place; but if it does happen, the clock will still have moved on and the period will be non-zero by the time the worker runs. A pointlessly short period, but besides the extra work, no harm no foul. However, a slow sched_clock() like we have on VMs might not have advanced either by the time the worker runs again. And when we calculate the elapsed period, the result, our pressure divisor, will be 0. Ouch.

Fix this by correctly handling the situation when the elapsed time between aggregation runs is precisely two periods, and advance the expiration timestamp correctly to one period into the future.

Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
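The off-by-one in the psi fix is easiest to see by replaying it. The C sketch below is a simplified, hypothetical model of the aggregation timer as the commit message describes it. `struct agg_state`, `aggregate()`, `second_run_divides_by_zero()` and the example `psi_period` of 2000 clock units are assumptions made for illustration; this is not the code in kernel/sched/psi.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the psi aggregation timer; names are
 * illustrative only. Times are in arbitrary clock units (ns in the kernel).
 */
struct agg_state {
	uint64_t next_update;	/* target time on the fixed psi_period grid */
	uint64_t last_update;	/* clock value when the previous run finished */
};

/*
 * One aggregation run at time 'now'. Returns false if the run is not due
 * yet; otherwise stores the elapsed period (the pressure divisor) in
 * *period and returns true. 'fixed' selects '>=' (the fix) over '>'.
 */
static bool aggregate(struct agg_state *s, uint64_t now, uint64_t psi_period,
		      bool fixed, uint64_t *period)
{
	uint64_t expires = s->next_update;
	uint64_t missed_periods = 0;
	bool missed;

	if (now < expires)
		return false;

	/* With '>', a run exactly one psi_period late drops the idle period. */
	missed = fixed ? (now - expires >= psi_period)
		       : (now - expires > psi_period);
	if (missed)
		missed_periods = (now - expires) / psi_period;

	/* Advance the target relative to itself to avoid clock drift. */
	s->next_update = expires + (1 + missed_periods) * psi_period;
	*period = now - (s->last_update + missed_periods * psi_period);
	s->last_update = now;
	return true;
}

/*
 * Replays the scenario: the first run happens exactly one psi_period after
 * its target, then a coarse clock has not advanced by the time the worker
 * runs again. Returns true if the second run computes a zero period.
 */
static bool second_run_divides_by_zero(bool fixed)
{
	const uint64_t psi_period = 2000;	/* arbitrary example value */
	struct agg_state s = { .next_update = psi_period, .last_update = 0 };
	uint64_t now = 2 * psi_period;		/* target was psi_period ago */
	uint64_t period;

	(void)aggregate(&s, now, psi_period, fixed, &period);

	/* Frozen clock: sched_clock() still reads the same value. */
	if (!aggregate(&s, now, psi_period, fixed, &period))
		return false;	/* not due yet, so no division happens */
	return period == 0;
}
```

In this model, `second_run_divides_by_zero(false)` returns true: the first run leaves `last_update == next_update == now`, so the repeated run computes a period of 0. `second_run_divides_by_zero(true)` returns false, because with `>=` the first run counts the skipped idle period and moves `next_update` to `now + psi_period`, so the repeated run at the same clock value is not yet due.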
arch	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-02-19 16:13:19 -08:00
block	for-linus-20190215	2019-02-15 09:12:28 -08:00
certs	kbuild: remove redundant target cleaning on failure	2019-01-06 09:46:51 +09:00
crypto	net: crypto set sk to NULL when af_alg_release.	2019-02-18 12:01:24 -08:00
Documentation	doc: Mention MSG_ZEROCOPY implementation for UDP	2019-02-17 15:30:02 -08:00
drivers	Pin control fixes for the v5.0 series:	2019-02-20 09:39:53 -08:00
firmware	kbuild: change filechk to surround the given command with { }	2019-01-06 09:46:51 +09:00
fs	proc, oom: do not report alien mms when setting oom_score_adj	2019-02-21 09:01:00 -08:00
include	Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2019-02-20 09:09:33 -08:00
init	revert "initramfs: cleanup incomplete rootfs"	2019-02-21 09:00:59 -08:00
ipc	ipc: IPCMNI limit check for semmni	2018-10-31 08:54:14 -07:00
kernel	psi: avoid divide-by-zero crash inside virtual machines	2019-02-21 09:01:00 -08:00
lib	Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2019-02-20 09:09:33 -08:00
LICENSES	This is a fairly typical cycle for documentation. There's some welcome	2018-10-24 18:01:11 +01:00
mm	mm: handle lru_add_drain_all for UP properly	2019-02-21 09:01:00 -08:00
net	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-02-19 16:13:19 -08:00
samples	samples: mei: use /dev/mei0 instead of /dev/mei	2019-01-30 15:24:45 +01:00
scripts	Bug fixes for gcc-plugins	2019-01-21 13:07:03 +13:00
security	keys: Timestamp new keys	2019-02-15 14:12:09 -08:00
sound	sound fixes for 5.0	2019-02-20 09:42:52 -08:00
tools	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-02-15 08:00:11 -08:00
usr	user/Makefile: Fix typo and capitalization in comment section	2018-12-11 00:18:03 +09:00
virt	KVM/ARM fixes for 5.0:	2019-02-13 19:39:24 +01:00
.clang-format	clang-format: Update .clang-format with the latest for_each macro list	2019-01-19 19:26:06 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	kbuild: Add support for DT binding schema checks	2018-12-13 09:41:32 -06:00
.mailmap	A few early MIPS fixes for 4.21:	2019-01-05 12:48:25 -08:00
COPYING	COPYING: use the new text with points to the license files	2018-03-23 12:41:45 -06:00
CREDITS	Add CREDITS entry for Shaohua Li	2019-01-04 14:27:09 -07:00
Kbuild	kbuild: use assignment instead of define ... endef for filechk_* rules	2019-01-06 10:22:35 +09:00
Kconfig	kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt	2018-08-02 08:06:55 +09:00
MAINTAINERS	sound fixes for 5.0-rc7	2019-02-12 10:18:08 -08:00
Makefile	Linux 5.0-rc7	2019-02-17 18:46:40 -08:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.