kernel_optimize_test/arch/powerpc/mm
Aneesh Kumar K.V 0ac52dd766 powerpc: Make linux pagetable walk safe with THP enabled
We need to have IRQs disabled to handle all the possible parallel updates to
the Linux page table without holding locks.

Events that we are interested in while walking page tables are:
1) Page fault
2) unmap
3) THP split
4) THP collapse

A) local_irq_disabled:
------------------------
1) page fault:
A none-to-valid transition via a page fault is not an issue, because we
would see either a none or a valid entry. If it is none, we error out of
the page table walk. We may need to use on-stack values when checking the
type of page table entries, because if we do

if (!is_hugepd()) {
    if (!pmd_none()) {
       if (pmd_bad()) {

we could take the bad branch, because a hugetlb fault converted the pmd to
a hugepd after the !is_hugepd() check.

The right way is to check for pmd_none higher up, or to use an on-stack value.
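A minimal userspace sketch of the on-stack approach, assuming C11 atomics as a stand-in for the kernel's ACCESS_ONCE read. The entry encoding, the HYP_* macros and the classify_pmd() helper are hypothetical and do not match the real ppc64 layout; the point is only that the entry is read once and every type check runs on that local copy, so the checks cannot disagree with each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical encoding for illustration only (not the real ppc64 layout):
 * 0 = none, bit 0 set = hugepd, top bit set = "bad", else a table pointer. */
#define HYP_HUGEPD_BIT 0x1ULL
#define HYP_BAD_BIT    (1ULL << 63)

enum pmd_kind { PMD_NONE, PMD_HUGEPD, PMD_BAD, PMD_TABLE };

static int hyp_pmd_none(uint64_t v)  { return v == 0; }
static int hyp_is_hugepd(uint64_t v) { return (v & HYP_HUGEPD_BIT) != 0; }
static int hyp_pmd_bad(uint64_t v)   { return (v & HYP_BAD_BIT) != 0; }

/* Read the entry exactly once, then classify the local copy. A concurrent
 * writer (e.g. a hugetlb fault turning the entry into a hugepd) can change
 * *pmdp, but not the value these checks are looking at. */
enum pmd_kind classify_pmd(_Atomic uint64_t *pmdp)
{
    uint64_t pmd = atomic_load(pmdp);

    if (hyp_pmd_none(pmd))
        return PMD_NONE;        /* none: abort the walk */
    if (hyp_is_hugepd(pmd))
        return PMD_HUGEPD;
    if (hyp_pmd_bad(pmd))
        return PMD_BAD;
    return PMD_TABLE;
}
```

Checking none first, on the snapshot, is what "check for pmd_none higher up" amounts to in this model.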

2) A valid to none conversion via unmap:
We can safely walk the upper-level tables, because we don't free the page
table pages until an RCU grace period has elapsed. So even if we follow a
stale pointer, it stays valid until the grace period ends.

The returned PTE pointer needs to be checked atomically for _PAGE_PRESENT
and _PAGE_BUSY: a PTE that is valid when returned could become none later.
To prevent a concurrent pte_clear, we take _PAGE_BUSY.

3) THP split:
A valid transparent hugepage is converted to normal pages. Before we split,
we do pmd_splitting_flush, which sets _PAGE_SPLITTING in the hugepage PTE.
So when walking the page table we need to check for pmd_trans_splitting and
handle that. The returned pte also needs to be checked for _PAGE_SPLITTING
before setting _PAGE_BUSY, just as for _PAGE_PRESENT. We save the value of
the PTE on the stack and check for the flag in the local pte value. If the
flag is not set, we can safely operate on the local pte value and atomically
set _PAGE_BUSY.
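A sketch of the hugepage variant, again with C11 atomics in place of cmpxchg and with hypothetical flag values and a hypothetical huge_pte_try_lock() helper. _PAGE_SPLITTING is tested on the same stack copy as _PAGE_PRESENT and _PAGE_BUSY, and the CAS against that copy is what makes the "not splitting" conclusion still hold when BUSY is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define HYP_PAGE_PRESENT   0x00001ULL
#define HYP_PAGE_BUSY      0x00800ULL
#define HYP_PAGE_SPLITTING 0x20000ULL

enum lock_result { LOCK_OK, LOCK_NONE, LOCK_BUSY, LOCK_SPLITTING };

/* Snapshot the huge PTE, test every flag on the local copy, then set BUSY
 * with a CAS against that same copy. If a split sets SPLITTING after our
 * check, the entry no longer matches, the CAS fails, and the loop re-tests
 * the fresh value, so BUSY is never set on a splitting entry. */
enum lock_result huge_pte_try_lock(_Atomic uint64_t *ptep)
{
    uint64_t old = atomic_load(ptep);

    for (;;) {
        if (!(old & HYP_PAGE_PRESENT))
            return LOCK_NONE;
        if (old & HYP_PAGE_SPLITTING)
            return LOCK_SPLITTING;  /* split in progress: back off */
        if (old & HYP_PAGE_BUSY)
            return LOCK_BUSY;
        if (atomic_compare_exchange_weak(ptep, &old, old | HYP_PAGE_BUSY))
            return LOCK_OK;
        /* CAS failed: 'old' now holds the current value; re-check it. */
    }
}
```

The walker-side pmd_trans_splitting check follows the same pattern: test the flag on a stack copy of the PMD and stop the walk if it is set.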

4) THP collapse:
Normal pages get converted to a hugepage. In the collapse path, we mark the
pmd none early (pmdp_clear_flush). With IRQs disabled, if we are already
walking the page table we will see pmd_none and won't continue. If we see a
valid PMD, we should still check for _PAGE_PRESENT before setting
_PAGE_BUSY, to make sure the PTE hasn't been collapsed into a huge PTE.
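A toy two-level model of the collapse ordering, assuming a PMD slot that atomically holds a pointer to a PTE page. The struct, HYP_* constants, collapse_clear_pmd() and walk_to_pte() are hypothetical; collapse_clear_pmd() models only the "publish none first" effect of pmdp_clear_flush, not its TLB flush. The PTE page is never freed here, standing in for the RCU-deferred free that keeps a stale pointer valid.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_PTRS_PER_PTE 4
#define HYP_PAGE_PRESENT 0x1ULL

/* Hypothetical PTE page; a PMD slot holds an atomic pointer to one. */
struct pte_page { _Atomic uint64_t pte[HYP_PTRS_PER_PTE]; };

/* Collapse, first step: make the PMD none before building the huge PTE. */
void collapse_clear_pmd(struct pte_page *_Atomic *pmdp)
{
    atomic_store(pmdp, NULL);
}

/* Walker: snapshot the PMD once; a none PMD ends the walk. Otherwise return
 * the PTE slot, which the caller must still check for PRESENT before it
 * tries to set BUSY. */
_Atomic uint64_t *walk_to_pte(struct pte_page *_Atomic *pmdp, unsigned idx)
{
    struct pte_page *table = atomic_load(pmdp);

    if (table == NULL || idx >= HYP_PTRS_PER_PTE)
        return NULL;
    return &table->pte[idx];
}
```

A walker that started before the clear may still hold a pointer into the old PTE page; that is why the PRESENT check on the PTE itself remains necessary even after a valid PMD was seen.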

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:56 +10:00
40x_mmu.c memblock: Remove rmo_size, burry it in arch/powerpc where it belongs 2010-08-05 12:56:08 +10:00
44x_mmu.c Disintegrate asm/system.h for PowerPC 2012-03-28 18:30:02 +01:00
dma-noncoherent.c powerpc: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
fault.c powerpc: Exception hooks for context tracking subsystem 2013-05-14 16:00:19 +10:00
fsl_booke_mmu.c powerpc/fsl-booke: Fixup calc_cam_sz to support MMU v2 2012-03-15 12:12:19 -05:00
gup.c powerpc: Update gup_pmd_range to handle transparent hugepages 2013-06-21 16:01:55 +10:00
hash_low_32.S powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly 2012-07-11 14:18:22 +10:00
hash_low_64.S powerpc/mm: handle hugepage size correctly when invalidating hpte entries 2013-06-21 16:01:52 +10:00
hash_native_64.c powerpc/mm: handle hugepage size correctly when invalidating hpte entries 2013-06-21 16:01:52 +10:00
hash_utils_64.c powerpc: Make linux pagetable walk safe with THP enabled 2013-06-21 16:01:56 +10:00
highmem.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
hugepage-hash64.c powerpc: Make linux pagetable walk safe with THP enabled 2013-06-21 16:01:56 +10:00
hugetlbpage-book3e.c powerpc/book3e: Change hugetlb preload to take vma argument 2011-12-07 16:26:24 +11:00
hugetlbpage-hash64.c powerpc/mm: handle hugepage size correctly when invalidating hpte entries 2013-06-21 16:01:52 +10:00
hugetlbpage.c powerpc: Make linux pagetable walk safe with THP enabled 2013-06-21 16:01:56 +10:00
icswx_pid.c powerpc: Split ICSWX ACOP and PID processing 2011-11-25 14:11:27 +11:00
icswx.c powerpc: Fix typo "CONFIG_ICSWX_PID" 2013-04-18 13:03:54 +10:00
icswx.h powerpc/icswx: Fix race condition with IPI setting ACOP 2012-03-07 17:06:09 +11:00
init_32.c Disintegrate asm/system.h for PowerPC 2012-03-28 18:30:02 +01:00
init_64.c powerpc/THP: Double the PMD table size for THP 2013-06-21 16:01:53 +10:00
Makefile powerpc/THP: Add code to handle HPTE faults for hugepages 2013-06-21 16:01:56 +10:00
mem.c powerpc: Make linux pagetable walk safe with THP enabled 2013-06-21 16:01:56 +10:00
mmap.c powerpc/mm: Make mmap_64.c compile on 32bit powerpc 2013-06-20 16:55:11 +10:00
mmu_context_hash32.c powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:30:38 -04:00
mmu_context_hash64.c powerpc: Reduce PTE table memory wastage 2013-04-30 16:00:07 +10:00
mmu_context_nohash.c powerpc/mm/nohash: Ignore NULL stale_map entries 2013-06-20 16:55:10 +10:00
mmu_decl.h powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map 2011-10-11 23:30:41 -05:00
numa.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
pgtable_32.c Disintegrate asm/system.h for PowerPC 2012-03-28 18:30:02 +01:00
pgtable_64.c powerpc/THP: Implement transparent hugepages for ppc64 2013-06-21 16:01:53 +10:00
pgtable.c powerpc: Hugetlb for BookE 2011-09-20 09:19:40 +10:00
ppc_mmu_32.c memblock: Remove rmo_size, burry it in arch/powerpc where it belongs 2010-08-05 12:56:08 +10:00
slb_low.S powerpc: Rename USER_ESID_BITS* to ESID_BITS* 2013-03-17 12:45:44 +11:00
slb.c powerpc: Remove FW_FEATURE ISERIES from arch code 2012-03-21 11:16:11 +11:00
slice.c mm: use vm_unmapped_area() on powerpc architecture 2013-04-30 11:05:17 +10:00
stab.c powerpc/mm: Remove uses of abs_to_virt() and virt_to_abs() 2012-09-05 15:19:31 +10:00
subpage-prot.c powerpc/mm: Match variable types to API 2012-09-10 14:37:31 +10:00
tlb_hash32.c powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:30:38 -04:00
tlb_hash64.c powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte 2013-06-21 16:01:54 +10:00
tlb_low_64e.S powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int 2012-09-05 15:35:52 +10:00
tlb_nohash_low.S powerpc/47x: Use the new ppc-opcode infrastructure 2012-11-15 12:59:24 +11:00
tlb_nohash.c powerpc/fsl-booke: Support detection of page sizes on e6500 2013-03-05 17:10:27 -06:00