kernel_optimize_test/mm
Mel Gorman a8bef8ff6e mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks
Page migration requires rmap to be able to find all ptes mapping a page
at all times, otherwise the migration entry can be instantiated, but it
is possible to leave one behind if the second rmap_walk fails to find
the page.  If this page is later faulted, migration_entry_to_page() will
call BUG because the page is locked indicating the page was migrated by
the migration PTE not cleaned up. For example

  kernel BUG at include/linux/swapops.h:105!
  invalid opcode: 0000 [#1] PREEMPT SMP
  ...
  Call Trace:
   [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a
   [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e
   [<ffffffff813099b5>] page_fault+0x25/0x30
   [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b
   [<ffffffff8111329b>] search_binary_handler+0x173/0x313
   [<ffffffff81114896>] do_execve+0x219/0x30a
   [<ffffffff8100a5c6>] sys_execve+0x43/0x5e
   [<ffffffff8100320a>] stub_execve+0x6a/0xc0
  RIP  [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129

There is a race between shift_arg_pages and migration that triggers this
bug.  A temporary stack is setup during exec and later moved.  If
migration moves a page in the temporary stack and the VMA is then removed
before migration completes, the migration PTE may not be found leading to
a BUG when the stack is faulted.

This patch causes pages within the temporary stack during exec to be
skipped by migration.  It does this by marking the VMA covering the
temporary stack with an otherwise impossible combination of VMA flags.
These flags are cleared when the temporary stack is moved to its final
location.

[kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:59 -07:00
..
backing-dev.c writeback: fixups for !dirty_writeback_centisecs 2010-05-21 20:00:35 +02:00
bootmem.c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip 2010-04-07 11:02:23 -07:00
bounce.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM 2010-03-06 11:26:25 -08:00
failslab.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap_xip.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
filemap.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
fremap.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
highmem.c grammar fix in comment 2010-02-05 12:22:40 +01:00
hugetlb.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
hwpoison-inject.c HWPOISON: Don't do early filtering if filter is disabled 2009-12-16 12:20:01 +01:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h HWPOISON: add an interface to switch off/on all the page filters 2009-12-16 12:19:59 +01:00
Kconfig mm: allow CONFIG_MIGRATION to be set without CONFIG_NUMA or memory hot-remove 2010-05-25 08:06:59 -07:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
kmemcheck.c kmemcheck: Fix build errors due to missing slab.h 2010-03-30 22:02:32 +09:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
ksm.c mm: migration: share the anon_vma ref counts between KSM and page migration 2010-05-25 08:06:58 -07:00
maccess.c maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write) 2010-01-07 11:58:36 -06:00
madvise.c HWPOISON: Add a madvise() injector for soft page offlining 2009-12-16 12:20:00 +01:00
Makefile percpu: don't implicitly include slab.h from percpu.h 2010-03-30 22:02:32 +09:00
memcontrol.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-05-20 09:20:59 -07:00
memory_hotplug.c mm: introduce dump_page() and print symbolic flag names 2010-03-12 15:52:28 -08:00
memory-failure.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
memory.c mm: avoid null-pointer deref in sync_mm_rss() 2010-04-07 08:38:02 -07:00
mempolicy.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: migration: allow the migration of PageSwapCache pages 2010-05-25 08:06:59 -07:00
mincore.c mincore: do nested page table walks 2010-05-25 08:06:58 -07:00
mlock.c x86, perf, bts, mm: Delete the never used BTS-ptrace code 2010-03-26 11:33:55 +01:00
mm_init.c
mmap.c mmap: check ->vm_ops before dereferencing 2010-04-27 08:26:51 -07:00
mmu_context.c exit: fix oops in sync_mm_rss 2010-03-24 16:31:21 -07:00
mmu_notifier.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mremap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
msync.c sanitize vfs_fsync calling conventions 2010-05-21 18:31:21 -04:00
nommu.c NOMMU: Fix __get_user_pages() to pin last page on offset buffers 2010-03-25 14:13:27 -07:00
oom_kill.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
page_alloc.c mm: default to node zonelist ordering when nodes have only lowmem 2010-05-25 08:06:58 -07:00
page_cgroup.c memcg: avoid use cmpxchg in swap cgroup maintainance 2010-03-17 18:43:47 -07:00
page_io.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c writeback: fix mixed up arguments to bdi_start_writeback() 2010-05-21 20:01:54 +02:00
pagewalk.c pagemap: fix pfn calculation for hugepage 2010-04-07 08:38:04 -07:00
percpu_up.c percpu: don't implicitly include slab.h from percpu.h 2010-03-30 22:02:32 +09:00
percpu-km.c percpu: implement kernel memory based chunk allocation 2010-05-01 08:30:50 +02:00
percpu-vm.c percpu: move vmalloc based chunk management into percpu-vm.c 2010-05-01 08:30:50 +02:00
percpu.c percpu: implement kernel memory based chunk allocation 2010-05-01 08:30:50 +02:00
prio_tree.c
quicklist.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
readahead.c readahead: fix NULL filp dereference 2010-04-07 08:38:03 -07:00
rmap.c mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks 2010-05-25 08:06:59 -07:00
shmem.c shmem: remove redundant code 2010-05-25 08:06:57 -07:00
slab.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
slob.c mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slob_def.h> 2010-05-19 22:03:13 +03:00
slub.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
sparse-vmemmap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
sparse.c sparsemem: on no vmemmap path put mem_map on node high too 2010-05-25 08:06:56 -07:00
swap_state.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
swap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
swapfile.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 2010-05-21 15:26:46 -07:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
util.c slab: Generify kernel pointer validation 2010-04-09 10:09:50 -07:00
vmalloc.c mm: purge fragmented percpu vmap blocks 2010-02-02 12:50:47 -08:00
vmscan.c cpuset,mm: fix no node to alloc memory when changing cpuset's mems 2010-05-25 08:06:57 -07:00
vmstat.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00