forked from luck/tmp_suning_uos_patched
6b4f7799c6
The slab shrinkers are currently invoked from the zonelist walkers in kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the eligible LRU pages and assemble a nodemask to pass to NUMA-aware shrinkers, which then again have to walk over the nodemask. This is redundant code, extra runtime work, and fairly inaccurate when it comes to the estimation of actually scannable LRU pages. The code duplication will only get worse when making the shrinkers cgroup-aware and requiring them to have out-of-band cgroup hierarchy walks as well. Instead, invoke the shrinkers from shrink_zone(), which is where all reclaimers end up, to avoid this duplication. Take the count for eligible LRU pages out of get_scan_count(), which considers many more factors than just the availability of swap space, like zone_reclaimable_pages() currently does. Accumulate the number over all visited lruvecs to get the per-zone value. Some nodes have multiple zones due to memory addressing restrictions. To avoid putting too much pressure on the shrinkers, only invoke them once for each such node, using the class zone of the allocation as the pivot zone. For now, this integrates the slab shrinking better into the reclaim logic and gets rid of duplicative invocations from kswapd, direct reclaim, and zone reclaim. It also prepares for cgroup-awareness, allowing memcg-capable shrinkers to be added at the lruvec level without much duplication of both code and runtime work. This changes kswapd behavior, which used to invoke the shrinkers for each zone, but with scan ratios gathered from the entire node, resulting in meaningless pressure quantities on multi-zone nodes. Zone reclaim behavior also changes. It used to shrink slabs until the same amount of pages were shrunk as were reclaimed from the LRUs. Now it merely invokes the shrinkers once with the zone's scan ratio, which makes the shrinkers go easier on caches that implement aging and would prefer feeding back pressure from recently used slab objects to unused LRU pages. [vdavydov@parallels.com: assure class zone is populated] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
69 lines
2.3 KiB
C
69 lines
2.3 KiB
C
#ifndef _LINUX_SHRINKER_H
|
|
#define _LINUX_SHRINKER_H
|
|
|
|
/*
|
|
* This struct is used to pass information from page reclaim to the shrinkers.
|
|
* We consolidate the values for easier extention later.
|
|
*
|
|
* The 'gfpmask' refers to the allocation we are currently trying to
|
|
* fulfil.
|
|
*/
|
|
struct shrink_control {
|
|
gfp_t gfp_mask;
|
|
|
|
/*
|
|
* How many objects scan_objects should scan and try to reclaim.
|
|
* This is reset before every call, so it is safe for callees
|
|
* to modify.
|
|
*/
|
|
unsigned long nr_to_scan;
|
|
|
|
/* current node being shrunk (for NUMA aware shrinkers) */
|
|
int nid;
|
|
};
|
|
|
|
#define SHRINK_STOP (~0UL)
|
|
/*
|
|
* A callback you can register to apply pressure to ageable caches.
|
|
*
|
|
* @count_objects should return the number of freeable items in the cache. If
|
|
* there are no objects to free or the number of freeable items cannot be
|
|
* determined, it should return 0. No deadlock checks should be done during the
|
|
* count callback - the shrinker relies on aggregating scan counts that couldn't
|
|
* be executed due to potential deadlocks to be run at a later call when the
|
|
* deadlock condition is no longer pending.
|
|
*
|
|
* @scan_objects will only be called if @count_objects returned a non-zero
|
|
* value for the number of freeable objects. The callout should scan the cache
|
|
* and attempt to free items from the cache. It should then return the number
|
|
* of objects freed during the scan, or SHRINK_STOP if progress cannot be made
|
|
* due to potential deadlocks. If SHRINK_STOP is returned, then no further
|
|
* attempts to call the @scan_objects will be made from the current reclaim
|
|
* context.
|
|
*
|
|
* @flags determine the shrinker abilities, like numa awareness
|
|
*/
|
|
struct shrinker {
|
|
unsigned long (*count_objects)(struct shrinker *,
|
|
struct shrink_control *sc);
|
|
unsigned long (*scan_objects)(struct shrinker *,
|
|
struct shrink_control *sc);
|
|
|
|
int seeks; /* seeks to recreate an obj */
|
|
long batch; /* reclaim batch size, 0 = default */
|
|
unsigned long flags;
|
|
|
|
/* These are for internal use */
|
|
struct list_head list;
|
|
/* objs pending delete, per node */
|
|
atomic_long_t *nr_deferred;
|
|
};
|
|
#define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
|
|
|
|
/* Flags */
|
|
#define SHRINKER_NUMA_AWARE (1 << 0)
|
|
|
|
extern int register_shrinker(struct shrinker *);
|
|
extern void unregister_shrinker(struct shrinker *);
|
|
#endif
|