Mel Gorman 2457aec637 mm: non-atomically mark page accessed during page cache allocation where possible
aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary, which is noticeable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticeable with fast storage.  The objective of the patch is to initialise
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
be called before the page is visible and can be done non-atomically.

The primary APIs of concern in this case are the following and are used
by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behaviour, such as whether the page should be marked accessed or not.  The
old API is preserved but is basically a thin wrapper around this core
function.

Each of the filesystems is then updated to avoid calling
mark_page_accessed() when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() call has now changed, so in rare cases it is possible
a page gets to the end of the LRU as PageReferenced whereas previously it
might have been repromoted.  This is expected to be rare but it is worth
the filesystem people thinking about in case they see a problem with the
timing change.  It is also the case that some filesystems may now be
marking pages accessed that previously did not, but it makes sense that
filesystems have consistent behaviour in this regard.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iteration.  The size of the
file is 1/10th of physical memory to avoid dirty page balancing.  In the
async case it is possible that the workload completes without even
hitting the disk and will have variable results, but it highlights the
impact of mark_page_accessed for async IO.  The sync results are expected
to be more stable.  The exception is tmpfs, where the normal case is for
the "IO" not to hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO; only wall
times are shown for async, as the granularity reported by dd and the
variability make it unsuitable for comparison.  As the async results were
variable due to writeback timings, I'm only reporting the maximum figures.
The sync results were stable enough to make the mean and stddev
uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs   Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4    Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs     Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck, but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:10 -07:00