mm: memcontrol: rewrite uncharge API

The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.

Because anonymous and file pages were always charged before they had their
page->mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages.  However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:

- Charging, uncharging, page migration, and charge migration all need
  to take a per-page bit spinlock as they could race with uncharging.

- Swap cache truncation happens during both swap-in and swap-out, and
  possibly repeatedly before the page is actually freed.  This means
  that the memcg swapout code is called from many contexts that make
  no sense and it has to figure out the direction from page state to
  make sure memory and memory+swap are always correctly charged.

- On page migration, the old page might be unmapped but then reused,
  so memcg code has to prevent untimely uncharging in that case.
  Because this code - which should be a simple charge transfer - is so
  special-cased, it is not reusable for replace_page_cache().

But now that charged pages always have a page->mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.
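
As an illustration only, the idea of uncharging at the final put can be
sketched in userspace C.  This is a toy model, not the kernel code:
toy_memcg, toy_page, toy_charge(), toy_put_page() and toy_uncharge() are
hypothetical names, and a plain integer stands in for the page refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_memcg {
	long usage;			/* pages charged to the group */
};

struct toy_page {
	int refcount;			/* outstanding references */
	struct toy_memcg *memcg;	/* NULL when the page is not charged */
};

/* Charge one page to a group; the caller already holds a reference. */
static void toy_charge(struct toy_page *page, struct toy_memcg *memcg)
{
	assert(page->memcg == NULL);
	page->memcg = memcg;
	memcg->usage++;
}

/* Uncharge; only legal once nobody else can look at the page. */
static void toy_uncharge(struct toy_page *page)
{
	assert(page->refcount == 0);
	if (page->memcg) {
		page->memcg->usage--;
		page->memcg = NULL;
	}
}

/* Drop a reference; the last put is the single uncharge point. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		toy_uncharge(page);
}
```

In this model the uncharge needs no lock of its own, because it only runs
once the last reference is gone.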

For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped.  Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc->mem_cgroup) from the page holding an actual charge.  The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.
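
A minimal sketch of this decoupling, assuming a simplified page_cgroup:
the three flag values are the ones this patch defines, while
toy_page_cgroup, toy_memcg, toy_migrate() and toy_uncharge() are
hypothetical names for illustration and may differ from what the patch
actually does in mm/memcontrol.c.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined by this patch in page_cgroup.h. */
enum {
	PCG_USED  = 0x01,	/* page is associated with a memcg */
	PCG_MEM   = 0x02,	/* page holds a memory charge */
	PCG_MEMSW = 0x04,	/* page holds a memory+swap charge */
};

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_memcg {
	long usage;		/* pages charged against the memory limit */
};

struct toy_page_cgroup {
	unsigned long flags;
	struct toy_memcg *memcg;
};

/* Move the charge bits from the old page to the new one. */
static void toy_migrate(struct toy_page_cgroup *oldpc,
			struct toy_page_cgroup *newpc)
{
	unsigned long charges = oldpc->flags & (PCG_MEM | PCG_MEMSW);

	if (!(oldpc->flags & PCG_USED))
		return;		/* old page was never charged */
	if (newpc->flags & PCG_USED)
		return;		/* new page already charged, nothing to move */

	/* The old page keeps its association but holds no charge anymore. */
	oldpc->flags &= ~(unsigned long)(PCG_MEM | PCG_MEMSW);

	newpc->memcg = oldpc->memcg;
	newpc->flags = PCG_USED | charges;
}

/* Final uncharge: only a page that holds PCG_MEM gives memory back. */
static void toy_uncharge(struct toy_page_cgroup *pc)
{
	if (!(pc->flags & PCG_USED))
		return;
	if (pc->flags & PCG_MEM)
		pc->memcg->usage--;
	pc->flags = 0;
	pc->memcg = NULL;
}
```

Freeing the old page after toy_migrate() leaves the counter untouched in
this model; only freeing the new page drops the charge.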

mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache().  However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc->mem_cgroup of a
page under migration.  Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.

Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.
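
The charge hand-off can be pictured with the following toy model.  It is
a sketch under stated assumptions rather than the patch's code:
toy_swap_entry, toy_swapout(), toy_uncharge() and toy_uncharge_swap() are
hypothetical, and the two counters stand in for the memory and
memory+swap limits.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined by this patch in page_cgroup.h. */
enum {
	PCG_USED  = 0x01,
	PCG_MEM   = 0x02,
	PCG_MEMSW = 0x04,
};

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_memcg {
	long memory;	/* pages charged against the memory limit */
	long memsw;	/* pages charged against the memory+swap limit */
};

struct toy_page_cgroup {
	unsigned long flags;
	struct toy_memcg *memcg;
};

struct toy_swap_entry {
	struct toy_memcg *owner;	/* group holding the memsw charge */
};

/* Hand the page's memory+swap charge over to the swap entry. */
static void toy_swapout(struct toy_page_cgroup *pc,
			struct toy_swap_entry *entry)
{
	if (!(pc->flags & PCG_MEMSW))
		return;
	entry->owner = pc->memcg;
	pc->flags &= ~(unsigned long)PCG_MEMSW;
}

/* Final put of the page: drop whatever charges it still holds. */
static void toy_uncharge(struct toy_page_cgroup *pc)
{
	if (!(pc->flags & PCG_USED))
		return;
	if (pc->flags & PCG_MEM)
		pc->memcg->memory--;
	if (pc->flags & PCG_MEMSW)
		pc->memcg->memsw--;
	pc->flags = 0;
	pc->memcg = NULL;
}

/* Swap entry freed: the memory+swap charge finally goes away. */
static void toy_uncharge_swap(struct toy_swap_entry *entry)
{
	if (entry->owner) {
		entry->owner->memsw--;
		entry->owner = NULL;
	}
}
```

The memory+swap counter stays charged across the swapout and is only
released when the swap entry itself is freed.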

Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration.  Remove the very costly page_cgroup
lock and set pc->flags non-atomically.

[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner 2014-08-08 14:19:22 -07:00 committed by Linus Torvalds
parent 00501b531c
commit 0a31bc97c8
16 changed files with 389 additions and 768 deletions

View File

@@ -29,28 +29,13 @@ Please note that implementation details can be changed.
 2. Uncharge
   a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
-	mem_cgroup_uncharge_page()
-	  Called when an anonymous page is fully unmapped. I.e., mapcount goes
-	  to 0. If the page is SwapCache, uncharge is delayed until
-	  mem_cgroup_uncharge_swapcache().
-	mem_cgroup_uncharge_cache_page()
-	  Called when a page-cache is deleted from radix-tree. If the page is
-	  SwapCache, uncharge is delayed until mem_cgroup_uncharge_swapcache().
-	mem_cgroup_uncharge_swapcache()
-	  Called when SwapCache is removed from radix-tree. The charge itself
-	  is moved to swap_cgroup. (If mem+swap controller is disabled, no
-	  charge to swap occurs.)
+	mem_cgroup_uncharge()
+	  Called when a page's refcount goes down to 0.
 	mem_cgroup_uncharge_swap()
 	  Called when swp_entry's refcnt goes down to 0. A charge against swap
 	  disappears.
-	mem_cgroup_end_migration(old, new)
-	  At success of migration old is uncharged (if necessary), a charge
-	  to new page is committed. At failure, charge to old page is committed.
 3. charge-commit-cancel
 	Memcg pages are charged in two steps:
 		mem_cgroup_try_charge()
@@ -69,18 +54,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	Anonymous page is newly allocated at
 	  - page fault into MAP_ANONYMOUS mapping.
 	  - Copy-On-Write.
-	It is charged right after it's allocated before doing any page table
-	related operations. Of course, it's uncharged when another page is used
-	for the fault address.
-	At freeing anonymous page (by exit() or munmap()), zap_pte() is called
-	and pages for ptes are freed one by one.(see mm/memory.c). Uncharges
-	are done at page_remove_rmap() when page_mapcount() goes down to 0.
-	Another page freeing is by page-reclaim (vmscan.c) and anonymous
-	pages are swapped out. In this case, the page is marked as
-	PageSwapCache(). uncharge() routine doesn't uncharge the page marked
-	as SwapCache(). It's delayed until __delete_from_swap_cache().
 4.1 Swap-in.
 	At swap-in, the page is taken from swap-cache. There are 2 cases.
@@ -89,41 +62,6 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
-	This swap-in is one of the most complicated work. In do_swap_page(),
-	following events occur when pte is unchanged.
-	(1) the page (SwapCache) is looked up.
-	(2) lock_page()
-	(3) try_charge_swapin()
-	(4) reuse_swap_page() (may call delete_swap_cache())
-	(5) commit_charge_swapin()
-	(6) swap_free().
-	Considering following situation for example.
-	(A) The page has not been charged before (2) and reuse_swap_page()
-	    doesn't call delete_from_swap_cache().
-	(B) The page has not been charged before (2) and reuse_swap_page()
-	    calls delete_from_swap_cache().
-	(C) The page has been charged before (2) and reuse_swap_page() doesn't
-	    call delete_from_swap_cache().
-	(D) The page has been charged before (2) and reuse_swap_page() calls
-	    delete_from_swap_cache().
-	memory.usage/memsw.usage changes to this page/swp_entry will be
-	 Case          (A)      (B)      (C)      (D)
-	 Event
-	 Before (2)    0/ 1     0/ 1     1/ 1     1/ 1
-	 ===========================================
-	 (3)          +1/+1    +1/+1    +1/+1    +1/+1
-	 (4)            -       0/ 0      -      -1/ 0
-	 (5)           0/-1     0/ 0    -1/-1     0/ 0
-	 (6)            -       0/-1      -       0/-1
-	 ===========================================
-	 Result        1/ 1     1/ 1     1/ 1     1/ 1
-	In any cases, charges to this page should be 1/ 1.
 4.2 Swap-out.
 	At swap-out, typical state transition is below.
@@ -136,28 +74,20 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	    swp_entry's refcnt -= 1.
-	At (b), the page is marked as SwapCache and not uncharged.
-	At (d), the page is removed from SwapCache and a charge in page_cgroup
-	is moved to swap_cgroup.
 	Finally, at task exit,
 	(e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
-	Here, a charge in swap_cgroup disappears.
 5. Page Cache
 	Page Cache is charged at
 	- add_to_page_cache_locked().
-	uncharged at
-	- __remove_from_page_cache().
 	The logic is very clear. (About migration, see below)
 	Note: __remove_from_page_cache() is called by remove_from_page_cache()
 	and __remove_mapping().
 6. Shmem(tmpfs) Page Cache
-	Memcg's charge/uncharge have special handlers of shmem. The best way
-	to understand shmem's page state transition is to read mm/shmem.c.
+	The best way to understand shmem's page state transition is to read
+	mm/shmem.c.
 	But brief explanation of the behavior of memcg around shmem will be
 	helpful to understand the logic.
@@ -170,56 +100,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
 	It's charged when...
 	- A new page is added to shmem's radix-tree.
 	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
-	It's uncharged when
-	- A page is removed from radix-tree and not SwapCache.
-	- When SwapCache is removed, a charge is moved to swap_cgroup.
-	- When swp_entry's refcnt goes down to 0, a charge in swap_cgroup
-	  disappears.
 7. Page Migration
-	One of the most complicated functions is page-migration-handler.
-	Memcg has 2 routines. Assume that we are migrating a page's contents
-	from OLDPAGE to NEWPAGE.
-	Usual migration logic is..
-	(a) remove the page from LRU.
-	(b) allocate NEWPAGE (migration target)
-	(c) lock by lock_page().
-	(d) unmap all mappings.
-	(e-1) If necessary, replace entry in radix-tree.
-	(e-2) move contents of a page.
-	(f) map all mappings again.
-	(g) pushback the page to LRU.
-	(-) OLDPAGE will be freed.
-	Before (g), memcg should complete all necessary charge/uncharge to
-	NEWPAGE/OLDPAGE.
-	The point is....
-	- If OLDPAGE is anonymous, all charges will be dropped at (d) because
-	  try_to_unmap() drops all mapcount and the page will not be
-	  SwapCache.
-	- If OLDPAGE is SwapCache, charges will be kept at (g) because
-	  __delete_from_swap_cache() isn't called at (e-1)
-	- If OLDPAGE is page-cache, charges will be kept at (g) because
-	  remove_from_swap_cache() isn't called at (e-1)
-	memcg provides following hooks.
-	- mem_cgroup_prepare_migration(OLDPAGE)
-	  Called after (b) to account a charge (usage += PAGE_SIZE) against
-	  memcg which OLDPAGE belongs to.
-	- mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
-	  Called after (f) before (g).
-	  If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
-	  charged, a charge by prepare_migration() is automatically canceled.
-	  If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.
-	  But zap_pte() (by exit or munmap) can be called while migration,
-	  we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
+	mem_cgroup_migrate()
 8. LRU
 	Each memcg has its own private LRU. Now, its handling is under global

View File

@@ -60,16 +60,18 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
 			      bool lrucare);
 void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
+void mem_cgroup_uncharge(struct page *page);
+/* Batched uncharging */
+void mem_cgroup_uncharge_start(void);
+void mem_cgroup_uncharge_end(void);
+void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
+			bool lrucare);
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
-/* For coalescing uncharge for reducing memcg' overhead*/
-extern void mem_cgroup_uncharge_start(void);
-extern void mem_cgroup_uncharge_end(void);
-extern void mem_cgroup_uncharge_page(struct page *page);
-extern void mem_cgroup_uncharge_cache_page(struct page *page);
 bool __mem_cgroup_same_or_subtree(const struct mem_cgroup *root_memcg,
 				  struct mem_cgroup *memcg);
 bool task_in_mem_cgroup(struct task_struct *task,
@@ -96,12 +98,6 @@ bool mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *memcg)
 extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
-extern void
-mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
-			     struct mem_cgroup **memcgp);
-extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
-	struct page *oldpage, struct page *newpage, bool migration_ok);
 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
 				   struct mem_cgroup *,
 				   struct mem_cgroup_reclaim_cookie *);
@@ -116,8 +112,6 @@ unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
 void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
-extern void mem_cgroup_replace_page_cache(struct page *oldpage,
-					struct page *newpage);
 static inline void mem_cgroup_oom_enable(void)
 {
@@ -235,6 +229,10 @@ static inline void mem_cgroup_cancel_charge(struct page *page,
 {
 }
+static inline void mem_cgroup_uncharge(struct page *page)
+{
+}
 static inline void mem_cgroup_uncharge_start(void)
 {
 }
@@ -243,11 +241,9 @@ static inline void mem_cgroup_uncharge_end(void)
 {
 }
-static inline void mem_cgroup_uncharge_page(struct page *page)
-{
-}
-static inline void mem_cgroup_uncharge_cache_page(struct page *page)
+static inline void mem_cgroup_migrate(struct page *oldpage,
+				      struct page *newpage,
+				      bool lrucare)
 {
 }
@@ -286,17 +282,6 @@ static inline struct cgroup_subsys_state
 	return NULL;
 }
-static inline void
-mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
-			     struct mem_cgroup **memcgp)
-{
-}
-static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg,
-		struct page *oldpage, struct page *newpage, bool migration_ok)
-{
-}
 static inline struct mem_cgroup *
 mem_cgroup_iter(struct mem_cgroup *root,
 		struct mem_cgroup *prev,
@@ -392,10 +377,6 @@ static inline
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
-static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
-				struct page *newpage)
-{
-}
 #endif /* CONFIG_MEMCG */
 #if !defined(CONFIG_MEMCG) || !defined(CONFIG_DEBUG_VM)

View File

@@ -3,9 +3,9 @@
 enum {
 	/* flags for mem_cgroup */
-	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
-	PCG_USED, /* this object is in use. */
-	PCG_MIGRATION, /* under page migration */
+	PCG_USED = 0x01,	/* This page is charged to a memcg */
+	PCG_MEM = 0x02,		/* This page holds a memory charge */
+	PCG_MEMSW = 0x04,	/* This page holds a memory+swap charge */
 	__NR_PCG_FLAGS,
 };
@@ -44,42 +44,9 @@ static inline void __init page_cgroup_init(void)
 struct page_cgroup *lookup_page_cgroup(struct page *page);
 struct page *lookup_cgroup_page(struct page_cgroup *pc);
-#define TESTPCGFLAG(uname, lname)			\
-static inline int PageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_bit(PCG_##lname, &pc->flags); }
-#define SETPCGFLAG(uname, lname)			\
-static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
-	{ set_bit(PCG_##lname, &pc->flags);  }
-#define CLEARPCGFLAG(uname, lname)			\
-static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ clear_bit(PCG_##lname, &pc->flags);  }
-#define TESTCLEARPCGFLAG(uname, lname)			\
-static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
-TESTPCGFLAG(Used, USED)
-CLEARPCGFLAG(Used, USED)
-SETPCGFLAG(Used, USED)
-SETPCGFLAG(Migration, MIGRATION)
-CLEARPCGFLAG(Migration, MIGRATION)
-TESTPCGFLAG(Migration, MIGRATION)
-static inline void lock_page_cgroup(struct page_cgroup *pc)
+static inline int PageCgroupUsed(struct page_cgroup *pc)
 {
-	/*
-	 * Don't take this lock in IRQ context.
-	 * This lock is for pc->mem_cgroup, USED, MIGRATION
-	 */
-	bit_spin_lock(PCG_LOCK, &pc->flags);
-}
-static inline void unlock_page_cgroup(struct page_cgroup *pc)
-{
-	bit_spin_unlock(PCG_LOCK, &pc->flags);
+	return !!(pc->flags & PCG_USED);
 }
 #else /* CONFIG_MEMCG */

View File

@@ -381,9 +381,13 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 }
 #endif
 #ifdef CONFIG_MEMCG_SWAP
-extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
+extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
+extern void mem_cgroup_uncharge_swap(swp_entry_t entry);
 #else
-static inline void mem_cgroup_uncharge_swap(swp_entry_t ent)
+static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
+{
+}
+static inline void mem_cgroup_uncharge_swap(swp_entry_t entry)
 {
 }
 #endif
@@ -443,7 +447,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t, struct page *page);
+extern void swapcache_free(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
@@ -507,7 +511,7 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
-static inline void swapcache_free(swp_entry_t swp, struct page *page)
+static inline void swapcache_free(swp_entry_t swp)
 {
 }

View File

@@ -234,7 +234,6 @@ void delete_from_page_cache(struct page *page)
 	spin_lock_irq(&mapping->tree_lock);
 	__delete_from_page_cache(page, NULL);
 	spin_unlock_irq(&mapping->tree_lock);
-	mem_cgroup_uncharge_cache_page(page);
 	if (freepage)
 		freepage(page);
@@ -490,8 +489,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 		if (PageSwapBacked(new))
 			__inc_zone_page_state(new, NR_SHMEM);
 		spin_unlock_irq(&mapping->tree_lock);
-		/* mem_cgroup codes must not be called under tree_lock */
-		mem_cgroup_replace_page_cache(old, new);
+		mem_cgroup_migrate(old, new, true);
 		radix_tree_preload_end();
 		if (freepage)
 			freepage(old);

File diff suppressed because it is too large

View File

@@ -1292,7 +1292,6 @@ static void unmap_page_range(struct mmu_gather *tlb,
 		details = NULL;
 	BUG_ON(addr >= end);
-	mem_cgroup_uncharge_start();
 	tlb_start_vma(tlb, vma);
 	pgd = pgd_offset(vma->vm_mm, addr);
 	do {
@@ -1302,7 +1301,6 @@ static void unmap_page_range(struct mmu_gather *tlb,
 		next = zap_pud_range(tlb, vma, pgd, addr, next, details);
 	} while (pgd++, addr = next, addr != end);
 	tlb_end_vma(tlb, vma);
-	mem_cgroup_uncharge_end();
 }

View File

@@ -780,6 +780,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 	if (rc != MIGRATEPAGE_SUCCESS) {
 		newpage->mapping = NULL;
 	} else {
+		mem_cgroup_migrate(page, newpage, false);
 		if (remap_swapcache)
 			remove_migration_ptes(page, newpage);
 		page->mapping = NULL;
@@ -795,7 +796,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 {
 	int rc = -EAGAIN;
 	int remap_swapcache = 1;
-	struct mem_cgroup *mem;
 	struct anon_vma *anon_vma = NULL;
 	if (!trylock_page(page)) {
@@ -821,9 +821,6 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		lock_page(page);
 	}
-	/* charge against new page */
-	mem_cgroup_prepare_migration(page, newpage, &mem);
 	if (PageWriteback(page)) {
 		/*
 		 * Only in the case of a full synchronous migration is it
@@ -833,10 +830,10 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 */
 		if (mode != MIGRATE_SYNC) {
 			rc = -EBUSY;
-			goto uncharge;
+			goto out_unlock;
 		}
 		if (!force)
-			goto uncharge;
+			goto out_unlock;
 		wait_on_page_writeback(page);
 	}
 	/*
@@ -872,7 +869,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 			 */
 			remap_swapcache = 0;
 		} else {
-			goto uncharge;
+			goto out_unlock;
 		}
 	}
@@ -885,7 +882,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * the page migration right away (proteced by page lock).
 		 */
 		rc = balloon_page_migrate(newpage, page, mode);
-		goto uncharge;
+		goto out_unlock;
 	}
 	/*
@@ -904,7 +901,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		VM_BUG_ON_PAGE(PageAnon(page), page);
 		if (page_has_private(page)) {
 			try_to_free_buffers(page);
-			goto uncharge;
+			goto out_unlock;
 		}
 		goto skip_unmap;
 	}
@@ -923,10 +920,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	if (anon_vma)
 		put_anon_vma(anon_vma);
-uncharge:
-	mem_cgroup_end_migration(mem, page, newpage,
-				 (rc == MIGRATEPAGE_SUCCESS ||
-				  rc == MIGRATEPAGE_BALLOON_SUCCESS));
+out_unlock:
 	unlock_page(page);
 out:
 	return rc;
@@ -1786,7 +1780,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated = 0;
 	struct page *new_page = NULL;
-	struct mem_cgroup *memcg = NULL;
 	int page_lru = page_is_file_cache(page);
 	unsigned long mmun_start = address & HPAGE_PMD_MASK;
 	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
@@ -1852,15 +1845,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto out_unlock;
 	}
-	/*
-	 * Traditional migration needs to prepare the memcg charge
-	 * transaction early to prevent the old page from being
-	 * uncharged when installing migration entries. Here we can
-	 * save the potential rollback and start the charge transfer
-	 * only when migration is already known to end successfully.
-	 */
-	mem_cgroup_prepare_migration(page, new_page, &memcg);
 	orig_entry = *pmd;
 	entry = mk_pmd(new_page, vma->vm_page_prot);
 	entry = pmd_mkhuge(entry);
@@ -1888,14 +1872,10 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto fail_putback;
 	}
+	mem_cgroup_migrate(page, new_page, false);
 	page_remove_rmap(page);
-	/*
-	 * Finish the charge transaction under the page table lock to
-	 * prevent split_huge_page() from dividing up the charge
-	 * before it's fully transferred to the new page.
-	 */
-	mem_cgroup_end_migration(memcg, page, new_page, true);
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

View File

@@ -1089,7 +1089,6 @@ void page_remove_rmap(struct page *page)
 	if (unlikely(PageHuge(page)))
 		goto out;
 	if (anon) {
-		mem_cgroup_uncharge_page(page);
 		if (PageTransHuge(page))
 			__dec_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);

View File

@@ -419,7 +419,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			pvec.pages, indices);
 		if (!pvec.nr)
 			break;
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -447,7 +446,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
 	}
@@ -495,7 +493,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			index = start;
 			continue;
 		}
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -531,7 +528,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		index++;
 	}
@@ -835,7 +831,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	}
 	mutex_unlock(&shmem_swaplist_mutex);
-	swapcache_free(swap, NULL);
+	swapcache_free(swap);
 redirty:
 	set_page_dirty(page);
 	if (wbc->for_reclaim)
@@ -1008,7 +1004,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 		 */
 		oldpage = newpage;
 	} else {
-		mem_cgroup_replace_page_cache(oldpage, newpage);
+		mem_cgroup_migrate(oldpage, newpage, false);
 		lru_cache_add_anon(newpage);
 		*pagep = newpage;
 	}

View File

@@ -62,6 +62,7 @@ static void __page_cache_release(struct page *page)
 		del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 	}
+	mem_cgroup_uncharge(page);
 }
 static void __put_single_page(struct page *page)
@@ -907,6 +908,8 @@ void release_pages(struct page **pages, int nr, bool cold)
 	struct lruvec *lruvec;
 	unsigned long uninitialized_var(flags);
+	mem_cgroup_uncharge_start();
 	for (i = 0; i < nr; i++) {
 		struct page *page = pages[i];
@@ -938,6 +941,7 @@ void release_pages(struct page **pages, int nr, bool cold)
 			__ClearPageLRU(page);
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		}
+		mem_cgroup_uncharge(page);
 		/* Clear Active bit in case of parallel mark_page_accessed */
 		__ClearPageActive(page);
@@ -947,6 +951,8 @@ void release_pages(struct page **pages, int nr, bool cold)
 	if (zone)
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
+	mem_cgroup_uncharge_end();
 	free_hot_cold_page_list(&pages_to_free, cold);
 }
 EXPORT_SYMBOL(release_pages);

View File

@@ -176,7 +176,7 @@ int add_to_swap(struct page *page, struct list_head *list)
 	if (unlikely(PageTransHuge(page)))
 		if (unlikely(split_huge_page_to_list(page, list))) {
-			swapcache_free(entry, NULL);
+			swapcache_free(entry);
 			return 0;
 		}
@@ -202,7 +202,7 @@ int add_to_swap(struct page *page, struct list_head *list)
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free(entry);
 		return 0;
 	}
 }
@@ -225,7 +225,7 @@ void delete_from_swap_cache(struct page *page)
 	__delete_from_swap_cache(page);
 	spin_unlock_irq(&address_space->tree_lock);
-	swapcache_free(entry, page);
+	swapcache_free(entry);
 	page_cache_release(page);
 }
@@ -386,7 +386,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free(entry);
 	} while (err != -ENOMEM);
 	if (new_page)

View File

@@ -843,16 +843,13 @@ void swap_free(swp_entry_t entry)
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
-void swapcache_free(swp_entry_t entry, struct page *page)
+void swapcache_free(swp_entry_t entry)
 {
 	struct swap_info_struct *p;
-	unsigned char count;
 	p = swap_info_get(entry);
 	if (p) {
-		count = swap_entry_free(p, entry, SWAP_HAS_CACHE);
-		if (page)
-			mem_cgroup_uncharge_swapcache(page, entry, count != 0);
+		swap_entry_free(p, entry, SWAP_HAS_CACHE);
 		spin_unlock(&p->lock);
 	}
 }

View File

@@ -281,7 +281,6 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE),
 			indices)) {
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -307,7 +306,6 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
 	}
@@ -369,7 +367,6 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			pagevec_release(&pvec);
 			break;
 		}
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -394,7 +391,6 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		index++;
 	}
 	cleancache_invalidate_inode(mapping);
@@ -493,7 +489,6 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -522,7 +517,6 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
 	}
@@ -553,7 +547,6 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 	BUG_ON(page_has_private(page));
 	__delete_from_page_cache(page, NULL);
 	spin_unlock_irq(&mapping->tree_lock);
-	mem_cgroup_uncharge_cache_page(page);
 	if (mapping->a_ops->freepage)
 		mapping->a_ops->freepage(page);
@@ -602,7 +595,6 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
-		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -655,7 +647,6 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 		}
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
-		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
 	}

@@ -577,9 +577,10 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 	if (PageSwapCache(page)) {
 		swp_entry_t swap = { .val = page_private(page) };
+		mem_cgroup_swapout(page, swap);
 		__delete_from_swap_cache(page);
 		spin_unlock_irq(&mapping->tree_lock);
-		swapcache_free(swap, page);
+		swapcache_free(swap);
 	} else {
 		void (*freepage)(struct page *);
 		void *shadow = NULL;
@@ -600,7 +601,6 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
 			shadow = workingset_eviction(mapping, page);
 		__delete_from_page_cache(page, shadow);
 		spin_unlock_irq(&mapping->tree_lock);
-		mem_cgroup_uncharge_cache_page(page);
 		if (freepage != NULL)
 			freepage(page);
@@ -1103,6 +1103,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		__clear_page_locked(page);
 free_it:
+		mem_cgroup_uncharge(page);
 		nr_reclaimed++;
 		/*
@@ -1132,12 +1133,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
+	mem_cgroup_uncharge_end();
 	free_hot_cold_page_list(&free_pages, true);
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
-	mem_cgroup_uncharge_end();
 	*ret_nr_dirty += nr_dirty;
 	*ret_nr_congested += nr_congested;
 	*ret_nr_unqueued_dirty += nr_unqueued_dirty;
@@ -1435,6 +1437,8 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
 			__ClearPageActive(page);
 			del_page_from_lru_list(page, lruvec, lru);
+			mem_cgroup_uncharge(page);
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&zone->lru_lock);
 				(*get_compound_page_dtor(page))(page);
@@ -1656,6 +1660,8 @@ static void move_active_pages_to_lru(struct lruvec *lruvec,
 			__ClearPageActive(page);
 			del_page_from_lru_list(page, lruvec, lru);
+			mem_cgroup_uncharge(page);
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&zone->lru_lock);
 				(*get_compound_page_dtor(page))(page);

@@ -507,7 +507,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
 		 */
-		swapcache_free(entry, NULL);
+		swapcache_free(entry);
 	} while (err != -ENOMEM);
 	if (new_page)