kernel_optimize_test/mm
Pavel Tatashin ab1e8d8960 mm: don't allow deferred pages with NEED_PER_CPU_KM
It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.

My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.

Below is a more detailed explanation of the problem.

We initialize struct pages in four places:

1. Early in boot a small set of struct pages is initialized to fill the
   first section, and lower zones.

2. During mm_init() we initialize "struct pages" for all the memory that
   is allocated, i.e reserved in memblock.

3. Using on-demand logic when pages are allocated after mm_init call
   (when memblock is finished)

4. After smp_init() when the rest free deferred pages are initialized.

The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2.  Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP

The following path exposes the problem:

  start_kernel()
   trap_init()
    setup_cpu_entry_areas()
     setup_cpu_entry_area(cpu)
      get_cpu_gdt_paddr(cpu)
       per_cpu_ptr_to_phys(addr)
        pcpu_addr_to_page(addr)
         virt_to_page(addr)
          pfn_to_page(__pa(addr) >> PAGE_SHIFT)

We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.

The problems are discussed in these threads:
  http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com

Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
..
kasan
backing-dev.c bdi: Fix use after free bug in debugfs_remove() 2018-05-03 09:36:24 -06:00
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE 2018-04-11 10:28:32 -07:00
cma.h
compaction.c mm/cma: remove ALLOC_CMA 2018-04-11 10:28:32 -07:00
debug_page_ref.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap.c: fix NULL pointer in page_cache_tree_insert() 2018-04-20 17:18:36 -07:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: handle gup failures 2018-04-13 17:10:27 -07:00
gup.c proc: do not access cmdline nor environ from file-backed areas 2018-05-17 09:27:47 -07:00
highmem.c
hmm.c mm/hmm.c: remove superfluous RCU protection around radix tree lookup 2018-04-11 10:28:31 -07:00
huge_memory.c mm: enable thp migration for shmem thp 2018-04-20 17:18:35 -07:00
hugetlb_cgroup.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h mm/cma: remove ALLOC_CMA 2018-04-11 10:28:32 -07:00
interval_tree.c
Kconfig mm: don't allow deferred pages with NEED_PER_CPU_KM 2018-05-18 17:17:12 -07:00
Kconfig.debug
khugepaged.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
kmemleak-test.c
kmemleak.c
ksm.c mm/ksm.c: fix inconsistent accounting of zero pages 2018-04-11 10:28:31 -07:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create() 2018-04-20 17:18:36 -07:00
memory_hotplug.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
memory-failure.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
memory.c
mempolicy.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempool.c
memtest.c
migrate.c mm: migrate: fix double call of radix_tree_replace_slot() 2018-05-11 17:28:45 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c Merge branch 'akpm' (patches from Andrew) 2018-05-11 18:04:12 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c sched/numa: avoid trapping faults and attempting migration of file-backed dirty pages 2018-04-11 10:28:31 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c mm, oom: fix concurrent munlock and oom reaper unmap, v3 2018-05-11 17:28:45 -07:00
page_alloc.c xen, mm: allow deferred page initialization for xen pv domains 2018-04-11 10:28:38 -07:00
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c
page_poison.c
page_vma_mapped.c
page-writeback.c writeback: safer lock nesting 2018-04-20 17:18:35 -07:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
rmap.c mm: enable thp migration for shmem thp 2018-04-20 17:18:35 -07:00
rodata_test.c
shmem.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
slab_common.c
slab.c mm, slab: reschedule cache_reap() on the same CPU 2018-04-13 17:10:27 -07:00
slab.h
slob.c
slub.c kasan, slub: fix handling of kasan_slab_free hook 2018-04-11 10:28:32 -07:00
sparse-vmemmap.c
sparse.c mm: sections are not offlined during memory hotremove 2018-05-11 17:28:45 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
swap.c
swapfile.c mm/swapfile.c: make pointer swap_avail_heads static 2018-04-11 10:28:32 -07:00
truncate.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
usercopy.c
userfaultfd.c
util.c mm/gup.c: document return value 2018-04-13 17:10:27 -07:00
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c mm,vmscan: Allow preallocating memory for register_shrinker(). 2018-04-16 02:06:47 -04:00
vmstat.c mm: don't show nr_indirectly_reclaimable in /proc/vmstat 2018-05-11 17:28:45 -07:00
workingset.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
z3fold.c z3fold: fix reclaim lock-ups 2018-05-11 17:28:45 -07:00
zbud.c
zpool.c
zsmalloc.c
zswap.c