kernel_optimize_test/mm
Longpeng 3c1d7e6ccb mm/hugetlb: fix a addressing exception caused by huge_pte_offset
Our machine encountered a panic(addressing exception) after run for a
long time and the calltrace is:

    RIP: hugetlb_fault+0x307/0xbe0
    RSP: 0018:ffff9567fc27f808  EFLAGS: 00010286
    RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
    RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
    RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
    R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
    R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
    FS:  00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
      follow_hugetlb_page+0x175/0x540
      __get_user_pages+0x2a0/0x7e0
      __get_user_pages_unlocked+0x15d/0x210
      __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
      try_async_pf+0x6e/0x2a0 [kvm]
      tdp_page_fault+0x151/0x2d0 [kvm]
     ...
      kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
      kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
      do_vfs_ioctl+0x3f0/0x540
      SyS_ioctl+0xa1/0xc0
      system_call_fastpath+0x22/0x27

For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race.  Please look at the
following code snippet:

    ...
    pud = pud_offset(p4d, addr);
    if (sz != PUD_SIZE && pud_none(*pud))
        return NULL;
    /* hugepage or swap? */
    if (pud_huge(*pud) || !pud_present(*pud))
        return (pte_t *)pud;

    pmd = pmd_offset(pud, addr);
    if (sz != PMD_SIZE && pmd_none(*pmd))
        return NULL;
    /* hugepage or swap? */
    if (pmd_huge(*pmd) || !pmd_present(*pmd))
        return (pte_t *)pmd;
    ...

The following sequence would trigger this bug:

 - CPU0: sz = PUD_SIZE and *pud = 0 , continue
 - CPU0: "pud_huge(*pud)" is false
 - CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
 - CPU0: "!pud_present(*pud)" is false, continue
 - CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp

However, we want CPU0 to return NULL or pudp in this case.

We must make sure there is exactly one dereference of pud and pmd.

Signed-off-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21 11:11:55 -07:00
..
kasan kasan: unset panic_on_warn before calling panic() 2020-04-07 10:43:44 -07:00
backing-dev.c blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it 2020-04-01 14:56:42 -06:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops 2019-12-01 12:59:09 -08:00
cma.c mm: cma: NUMA node interface 2020-04-10 15:36:21 -07:00
cma.h
compaction.c mm/compaction: add missing annotation for compact_lock_irqsave 2020-04-07 10:43:41 -07:00
debug_page_ref.c
debug.c mm: dump_page(): additional diagnostics for huge pinned pages 2020-04-02 09:35:27 -07:00
dmapool.c mm/dmapool.c: micro-optimisation remove unnecessary branch 2020-04-07 10:43:42 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c
filemap.c mm: huge tmpfs: try to split_huge_page() when punching hole 2020-04-07 10:43:41 -07:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c
gup_benchmark.c mm/gup_benchmark: support pin_user_pages() and related calls 2020-04-02 09:35:27 -07:00
gup.c mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal 2020-04-08 09:05:58 -07:00
highmem.c mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions 2019-12-10 10:12:55 +01:00
hmm.c mm/hmm: return error for non-vma snapshots 2020-03-30 16:58:36 -03:00
huge_memory.c userfaultfd: wp: support swap and page migration 2020-04-07 10:43:39 -07:00
hugetlb_cgroup.c mm: use fallthrough; 2020-04-07 10:43:41 -07:00
hugetlb.c mm/hugetlb: fix a addressing exception caused by huge_pte_offset 2020-04-21 11:11:55 -07:00
hwpoison-inject.c mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops 2019-12-01 12:59:09 -08:00
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm: add function __putback_isolated_page 2020-04-07 10:43:38 -07:00
interval_tree.c
Kconfig libnvdimm for 5.7 2020-04-08 21:03:40 -07:00
Kconfig.debug mm: add generic ptdump 2020-02-04 03:05:25 +00:00
khugepaged.c khugepaged: skip collapse if uffd-wp detected 2020-04-07 10:43:39 -07:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: use address-of operator on section symbols 2020-04-02 09:35:26 -07:00
ksm.c mm: use fallthrough; 2020-04-07 10:43:41 -07:00
list_lru.c mm: use fallthrough; 2020-04-07 10:43:41 -07:00
maccess.c uaccess: Add strict non-pagefault kernel-space read function 2019-11-02 12:39:12 -07:00
madvise.c mm: do not allow MADV_PAGEOUT for CoW pages 2020-03-21 18:56:06 -07:00
Makefile mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c mm: cma: NUMA node interface 2020-04-10 15:36:21 -07:00
memcontrol.c mm, memcg: do not high throttle allocators based on wraparound 2020-04-10 15:36:20 -07:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug: add pgprot_t to mhp_params 2020-04-10 15:36:21 -07:00
memory-failure.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
memory.c mm/memory.c: add vm_insert_pages() 2020-04-10 15:36:21 -07:00
mempolicy.c libnvdimm for 5.7 2020-04-08 21:03:40 -07:00
mempool.c
memremap.c mm/memremap: set caching mode for PCI P2PDMA memory to WC 2020-04-10 15:36:21 -07:00
memtest.c
migrate.c userfaultfd: wp: support swap and page migration 2020-04-07 10:43:39 -07:00
mincore.c mm: pagewalk: add 'depth' parameter to pte_hole 2020-02-04 03:05:25 +00:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c mm/mm_init.c: clean code. Use BUILD_BUG_ON when comparing compile time constant 2020-04-07 10:43:41 -07:00
mmap.c mm/vma: introduce VM_ACCESS_FLAGS 2020-04-10 15:36:21 -07:00
mmu_context.c
mmu_gather.c asm-generic/tlb: provide MMU_GATHER_TABLE_FREE 2020-02-04 03:05:26 +00:00
mmu_notifier.c mm/mmu_notifier: silence PROVE_RCU_LIST warnings 2020-03-21 18:56:06 -07:00
mmzone.c
mprotect.c mm/vma: introduce VM_ACCESS_FLAGS 2020-04-10 15:36:21 -07:00
mremap.c mm: Fix MREMAP_DONTUNMAP accounting on VMA merge 2020-04-19 14:07:10 -07:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-21 18:56:06 -07:00
oom_kill.c mm, oom: dump stack of victim when reaping failed 2020-01-31 10:30:38 -08:00
page_alloc.c mm/page_alloc: make pcpu_drain_mutex and pcpu_drain static 2020-04-10 15:36:21 -07:00
page_counter.c mm, memcg: prevent memory.min load/store tearing 2020-04-02 09:35:29 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c
page_io.c fs: Enable bmap() function to properly return errors 2020-02-03 08:05:37 -05:00
page_isolation.c mm: add function __putback_isolated_page 2020-04-07 10:43:38 -07:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-19 06:32:31 -04:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_reporting.c mm/page_reporting: add budget limit on how many pages can be reported per pass 2020-04-07 10:43:39 -07:00
page_reporting.h mm: introduce Reported pages 2020-04-07 10:43:38 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page 2020-01-31 10:30:38 -08:00
page-writeback.c mm/gup/writeback: add callbacks for inaccessible pages 2020-04-02 09:35:27 -07:00
pagewalk.c x86: mm: avoid allocating struct mm_struct on the stack 2020-02-04 03:05:25 +00:00
percpu-internal.h
percpu-km.c
percpu-stats.c percpu: update copyright emails to dennis@kernel.org 2020-04-01 10:09:12 -07:00
percpu-vm.c
percpu.c percpu: update copyright emails to dennis@kernel.org 2020-04-01 10:09:12 -07:00
pgtable-generic.c asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED 2019-12-01 06:29:19 -08:00
process_vm_access.c mm: docs: Fix a comment in process_vm_rw_core 2020-03-25 10:04:01 -05:00
ptdump.c x86: mm: avoid allocating struct mm_struct on the stack 2020-02-04 03:05:25 +00:00
readahead.c
rmap.c mm: prevent a warning when casting void* -> enum 2020-04-07 10:43:41 -07:00
rodata_test.c
shmem.c mm: use fallthrough; 2020-04-07 10:43:41 -07:00
shuffle.c mm: adjust shuffle code to allow for future coalescing 2020-04-07 10:43:38 -07:00
shuffle.h mm: adjust shuffle code to allow for future coalescing 2020-04-07 10:43:38 -07:00
slab_common.c mm, slab_common: fix a typo in comment "eariler"->"earlier" 2020-04-10 15:36:20 -07:00
slab.c mm, debug_pagealloc: don't rely on static keys too early 2020-01-13 18:19:02 -08:00
slab.h mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge() 2020-04-02 09:35:28 -07:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c slub: avoid redzone when choosing freepointer location 2020-04-21 11:11:55 -07:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparse.c: move subsection_map related functions together 2020-04-07 10:43:40 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: assign|reset cache slot by value directly 2020-04-02 09:35:27 -07:00
swap_state.c mm/swap_state.c: use the same way to count page in [add_to|delete_from]_swap_cache 2020-04-02 09:35:28 -07:00
swap.c mm: huge tmpfs: try to split_huge_page() when punching hole 2020-04-07 10:43:41 -07:00
swapfile.c proc: faster open/read/close with "permanent" files 2020-04-07 10:43:42 -07:00
truncate.c mm/thp: allow dropping THP from page cache 2019-10-19 06:32:33 -04:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c userfaultfd: wp: support write protection for userfault vma range 2020-04-07 10:43:39 -07:00
util.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
vmacache.c
vmalloc.c mm/vmalloc: fix a typo in comment 2020-04-07 10:43:38 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm: code cleanup for MADV_FREE 2020-04-07 10:43:38 -07:00
vmstat.c mm, thp: track fallbacks due to failed memcg charges separately 2020-04-07 10:43:38 -07:00
workingset.c mm: vmscan: detect file thrashing at the reclaim root 2019-12-01 12:59:07 -08:00
z3fold.c mm/z3fold.c: do not include rwlock.h directly 2020-03-06 07:06:09 -06:00
zbud.c
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c mm: use fallthrough; 2020-04-07 10:43:41 -07:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00