kernel_optimize_test/mm
Pavel Tatashin ab1e8d8960 mm: don't allow deferred pages with NEED_PER_CPU_KM
It is unsafe to do virtual to physical translations before mm_init() is
called if struct page is needed in order to determine the memory section
number (see SECTION_IN_PAGE_FLAGS).  This is because only in mm_init()
we initialize struct pages for all the allocated memory when deferred
struct pages are used.

My recent fix in commit c9e97a1997 ("mm: initialize pages on demand
during boot") exposed this problem, because it greatly reduced number of
pages that are initialized before mm_init(), but the problem existed
even before my fix, as Fengguang Wu found.

Below is a more detailed explanation of the problem.

We initialize struct pages in four places:

1. Early in boot a small set of struct pages is initialized to fill the
   first section, and lower zones.

2. During mm_init() we initialize "struct pages" for all the memory that
   is allocated, i.e reserved in memblock.

3. Using on-demand logic when pages are allocated after mm_init call
   (when memblock is finished)

4. After smp_init() when the rest free deferred pages are initialized.

The problem occurs if we try to do va to phys translation of a memory
between steps 1 and 2.  Because we have not yet initialized struct pages
for all the reserved pages, it is inherently unsafe to do va to phys if
the translation itself requires access of "struct page" as in case of
this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP

The following path exposes the problem:

  start_kernel()
   trap_init()
    setup_cpu_entry_areas()
     setup_cpu_entry_area(cpu)
      get_cpu_gdt_paddr(cpu)
       per_cpu_ptr_to_phys(addr)
        pcpu_addr_to_page(addr)
         virt_to_page(addr)
          pfn_to_page(__pa(addr) >> PAGE_SHIFT)

We disable this path by not allowing NEED_PER_CPU_KM with deferred
struct pages feature.

The problems are discussed in these threads:
  http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com
  http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com

Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18 17:17:12 -07:00
..
kasan
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c
debug_page_ref.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup_benchmark.c
gup.c proc: do not access cmdline nor environ from file-backed areas 2018-05-17 09:27:47 -07:00
highmem.c
hmm.c
huge_memory.c
hugetlb_cgroup.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
Kconfig mm: don't allow deferred pages with NEED_PER_CPU_KM 2018-05-18 17:17:12 -07:00
Kconfig.debug
khugepaged.c
kmemleak-test.c
kmemleak.c
ksm.c
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c
memory_hotplug.c
memory-failure.c
memory.c
mempolicy.c
mempool.c
memtest.c
migrate.c mm: migrate: fix double call of radix_tree_replace_slot() 2018-05-11 17:28:45 -07:00
mincore.c
mlock.c
mm_init.c
mmap.c Merge branch 'akpm' (patches from Andrew) 2018-05-11 18:04:12 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c mm, oom: fix concurrent munlock and oom reaper unmap, v3 2018-05-11 17:28:45 -07:00
page_alloc.c
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_vma_mapped.c
page-writeback.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c
slab_common.c
slab.c
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c mm: sections are not offlined during memory hotremove 2018-05-11 17:28:45 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swapfile.c
truncate.c
usercopy.c
userfaultfd.c
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c
vmstat.c mm: don't show nr_indirectly_reclaimable in /proc/vmstat 2018-05-11 17:28:45 -07:00
workingset.c
z3fold.c z3fold: fix reclaim lock-ups 2018-05-11 17:28:45 -07:00
zbud.c
zpool.c
zsmalloc.c
zswap.c