kernel_optimize_test/mm
Hugh Dickins 59927fb984 memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().

Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.

It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.

But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.

However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work().  The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-15 17:03:03 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm: bootmem: try harder to free pages in bulk 2012-01-10 16:30:45 -08:00
bounce.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cleancache.c
compaction.c mm: compaction: check for overlapping nodes during isolation for migration 2012-02-08 19:03:51 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c fadvise: only initiate writeback for specified range with FADV_DONTNEED 2012-01-10 16:30:43 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
filemap.c readahead: fix pipeline break caused by block plug 2012-02-03 16:16:41 -08:00
fremap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c mm: thp: fix BUG on mm->nr_ptes 2012-03-05 15:49:43 -08:00
hugetlb.c flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held 2012-03-05 13:51:32 -08:00
hwpoison-inject.c
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: thp: tail page refcounting fix 2011-11-02 16:06:57 -07:00
Kconfig Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c fs: kill i_alloc_sem 2011-07-20 20:47:46 -04:00
Makefile Cross Memory Attach 2011-10-31 17:30:44 -07:00
memblock.c memblock: Fix size aligning of memblock_alloc_base_nid() 2012-03-01 10:53:18 +01:00
memcontrol.c memcg: free mem_cgroup by RCU to fix oops 2012-03-15 17:03:03 -07:00
memory_hotplug.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory-failure.c mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
memory.c mm: fix rss count leakage during migration 2012-01-23 08:38:49 -08:00
mempolicy.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
mincore.c mm: clarify the radix_tree exceptional cases 2011-08-03 14:25:24 -10:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm: fix find_vma_prev 2012-03-06 16:48:03 -08:00
mmu_context.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmu_notifier.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmzone.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
mprotect.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mremap.c mremap: enforce rmap src/dst vma ordering in case of vma_merge() succeeding in copy_vma() 2012-01-10 16:30:44 -08:00
msync.c
nobootmem.c Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
nommu.c NOMMU: Don't need to clear vm_mm when deleting a VMA 2012-02-24 08:59:04 -08:00
oom_kill.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
page_alloc.c vfs: fix panic in __d_lookup() with high dentry hashtable counts 2012-02-13 20:45:38 -05:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
page-writeback.c Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux 2012-01-10 16:59:59 -08:00
pagewalk.c pagewalk: fix code comment for THP 2011-07-25 20:57:09 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c Kmemleak patches 2012-01-14 18:11:11 -08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rmap.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
shmem.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-01-11 18:52:23 -08:00
slob.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
slub.c mm,x86,um: move CMPXCHG_DOUBLE config option 2012-01-12 20:13:03 -08:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
swap_state.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
swap.c memcg: fix GPF when cgroup removal races with last exit 2012-03-05 15:49:43 -08:00
swapfile.c mm: unify remaining mem_cont, mem, etc. variable names to memcg 2012-01-12 20:13:06 -08:00
thrash.c mm/thrash.c: quiet sparse noise 2011-10-31 17:30:50 -07:00
truncate.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
util.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
vmalloc.c mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path 2012-01-12 20:13:10 -08:00
vmscan.c SHM_UNLOCK: fix Unevictable pages stranded after swap 2012-01-23 08:38:48 -08:00
vmstat.c mm,x86,um: move CMPXCHG_LOCAL config option 2012-01-12 20:13:03 -08:00