kernel_optimize_test/mm
Shakeel Butt 9c4e6b1a70 mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then sets the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU.  Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.

The page will eventually go to right LRU on reclaim but the LRU stats
will remain skewed for a long time.

This patch puts all the pages, even unevictable, to the pagevecs and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability.  This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU.  Also this makes the race with munlock
easier to resolve because the pagevec drains happen in LRU lock.

However there is still one place which makes a page evictable and does
PageLRU check on that page without LRU lock and needs special attention.
TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().

	#0: __pagevec_lru_add_fn	#1: clear_page_mlock

	SetPageLRU()			if (!TestClearPageMlocked())
					  return
	smp_mb() // <--required
					// inside does PageLRU
	if (!PageMlocked())		if (isolate_lru_page())
	  move to evictable LRU		  putback_lru_page()
	else
	  move to unevictable LRU

In '#1', TestClearPageMlocked() provides full memory barrier semantics
and thus the PageLRU check (inside isolate_lru_page) can not be
reordered before it.

In '#0', without explicit memory barrier, the PageMlocked() check can be
reordered before SetPageLRU().  If that happens, '#0' can put a page in
unevictable LRU and '#1' might have just cleared the Mlocked bit of that
page but fails to isolate as PageLRU fails as '#0' still hasn't set
PageLRU bit of that page.  That page will be stranded on the unevictable
LRU.

There is one (good) side effect though.  Without this patch, the pages
allocated for System V shared memory segment are added to evictable LRUs
even after shmctl(SHM_LOCK) on that segment.  This patch will correctly
put such pages to unevictable LRU.

Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21 15:35:42 -08:00
..
kasan kasan: fix prototype author email address 2018-02-06 18:32:43 -08:00
backing-dev.c
balloon_compaction.c
bootmem.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm/compaction.c: fix comment for try_to_compact_pages() 2018-01-31 17:18:39 -08:00
debug_page_ref.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c mm/fadvise: discard partial page if endbyte is also EOF 2018-01-31 17:18:39 -08:00
failslab.c
filemap.c mm/filemap.c: remove include of hardirq.h 2018-01-31 17:18:36 -08:00
frame_vector.c
frontswap.c
gup_benchmark.c
gup.c libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
highmem.c
hmm.c libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
huge_memory.c mm/thp: remove pmd_huge_split_prepare() 2018-01-31 17:18:38 -08:00
hugetlb_cgroup.c
hugetlb.c hugetlb, mbind: fall back to default policy if vma is NULL 2018-01-31 17:18:40 -08:00
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c mm/interval_tree.c: use vma_pages() helper 2018-01-31 17:18:37 -08:00
Kconfig mm: relax deferred struct page requirements 2018-01-31 17:18:36 -08:00
Kconfig.debug
khugepaged.c mm: thp: use down_read_trylock() in khugepaged to avoid long block 2018-01-31 17:18:38 -08:00
kmemleak-test.c
kmemleak.c mm: kmemleak: remove unused hardirq.h 2018-01-31 17:18:36 -08:00
ksm.c mm: docs: fixup punctuation 2018-02-06 18:32:48 -08:00
list_lru.c
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c
Makefile
memblock.c mm/memblock: memblock_is_map/region_memory can be boolean 2018-02-06 18:32:47 -08:00
memcontrol.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
memory_hotplug.c libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
memory-failure.c x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages 2018-02-13 16:25:06 +01:00
memory.c mm: hide a #warning for COMPILE_TEST 2018-02-16 09:41:36 -08:00
mempolicy.c hugetlb, mbind: fall back to default policy if vma is NULL 2018-01-31 17:18:40 -08:00
mempool.c kasan: detect invalid frees for large mempool objects 2018-02-06 18:32:43 -08:00
memtest.c
migrate.c mm, hugetlb: do not rely on overcommit limit during migration 2018-01-31 17:18:40 -08:00
mincore.c
mlock.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks 2018-01-31 17:18:38 -08:00
mmzone.c
mprotect.c mm: numa: do not trap faults on shared data section pages. 2018-01-31 17:18:40 -08:00
mremap.c
msync.c
nobootmem.c
nommu.c mm: docs: fixup punctuation 2018-02-06 18:32:48 -08:00
oom_kill.c mm, oom: avoid reaping only for mm's with blockable invalidate callbacks 2018-01-31 17:18:38 -08:00
page_alloc.c libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
page_counter.c
page_ext.c mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it 2018-01-31 17:18:39 -08:00
page_idle.c
page_io.c
page_isolation.c
page_owner.c mm/page_owner.c: clean up init_pages_in_zone() 2018-01-31 17:18:39 -08:00
page_poison.c
page_vma_mapped.c
page-writeback.c
pagewalk.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c mm: do not lose dirty and accessed bits in pmdp_invalidate() 2018-01-31 17:18:38 -08:00
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c shmem: add sealing support to hugetlb-backed memfd 2018-01-31 17:18:39 -08:00
slab_common.c Currently, hardened usercopy performs dynamic bounds checking on slab 2018-02-03 16:25:42 -08:00
slab.c kasan: don't use __builtin_return_address(1) 2018-02-06 18:32:43 -08:00
slab.h Currently, hardened usercopy performs dynamic bounds checking on slab 2018-02-03 16:25:42 -08:00
slob.c
slub.c kasan: don't use __builtin_return_address(1) 2018-02-06 18:32:43 -08:00
sparse-vmemmap.c
sparse.c libnvdimm for 4.16 2018-02-06 10:41:33 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
swapfile.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
truncate.c mm: add unmap_mapping_pages() 2018-01-31 17:18:37 -08:00
usercopy.c
userfaultfd.c mm/userfaultfd.c: remove duplicate include 2018-02-06 18:32:47 -08:00
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
vmstat.c
workingset.c
z3fold.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zsmalloc.c zsmalloc: use U suffix for negative literals being shifted 2018-01-31 17:18:39 -08:00
zswap.c zswap: only save zswap header when necessary 2018-01-31 17:18:39 -08:00