kernel_optimize_test/fs/proc
Feng Tang 1455083c1d proc/meminfo: avoid open coded reading of vm_committed_as
Patch series "make vm_committed_as_batch aware of vm overcommit policy", v6.

When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
enlarge it for not-so-strict OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS
policies.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  And for that case,
whether it shows improvements depends on if the test mmap size is bigger
than the batch number computed.

We tested 10+ platforms in 0day (server, desktop and laptop).  If we lift
it to 64X, 80%+ platforms show improvements, and for 16X lift, 1/3 of the
platforms will show improvements.

And generally it should help the mmap/unmap usage,as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit
: from a larger batch. E.g. large in memory databases which do large
: mmaps during startups from multiple threads.

Note: There are some style complain from checkpatch for patch 4, as sysctl
handler declaration follows the similar format of sibling functions

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

This patch (of 4):

Use the existing vm_memory_committed() instead, which is also convenient
for future change.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1594389708-60781-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-2-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
..
array.c seccomp: Report number of loaded filters in /proc/$pid/status 2020-07-10 16:01:51 -07:00
base.c proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE 2020-07-19 20:14:42 +02:00
bootconfig.c proc/bootconfig: Fix to use correct quotes for value 2020-06-16 21:21:03 -04:00
cmdline.c
consoles.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 191 2019-05-30 11:29:21 -07:00
cpuinfo.c proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
devices.c block: move block-related definitions out of fs.h 2020-06-24 09:16:02 -06:00
fd.c
fd.h
generic.c proc: add option to mount only a pids subset 2020-04-22 10:51:22 -05:00
inode.c proc: Use new_inode not new_inode_pseudo 2020-06-12 14:13:33 -05:00
internal.h proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
interrupts.c
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
kcore.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00
kmsg.c proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
loadavg.c sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD 2018-10-26 16:26:32 -07:00
Makefile proc: bootconfig: Add /proc/bootconfig to show boot config list 2020-01-13 13:19:39 -05:00
meminfo.c proc/meminfo: avoid open coded reading of vm_committed_as 2020-08-07 11:33:26 -07:00
namespaces.c Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
nommu.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
page.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
proc_net.c bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t 2020-07-25 20:16:32 -07:00
proc_sysctl.c Call sysctl_head_finish on error 2020-07-03 14:10:46 -04:00
proc_tty.c
root.c proc: s_fs_info may be NULL when proc_kill_sb is called 2020-06-10 14:54:54 -05:00
self.c proc: Use new_inode not new_inode_pseudo 2020-06-12 14:13:33 -05:00
softirqs.c
stat.c proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
task_mmu.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
task_nommu.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
thread_self.c proc: Use new_inode not new_inode_pseudo 2020-06-12 14:13:33 -05:00
uptime.c fs/proc: Respect boottime inside time namespace for /proc/uptime 2020-01-14 12:20:56 +01:00
util.c fs/proc/util.c: include fs/proc/internal.h for name_to_int() 2019-01-04 13:13:45 -08:00
version.c
vmcore.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00