kernel_optimize_test/arch/arm64/include/asm
Lorenzo Pieralisi 976d7d3f79 arm64: kernel: build MPIDR_EL1 hash function data structure
On ARM64 SMP systems, cores are identified by their MPIDR_EL1 register.
The MPIDR_EL1 guidelines in the ARM ARM do not provide strict enforcement of
MPIDR_EL1 layout, only recommendations that, if followed, split the MPIDR_EL1
on ARM 64 bit platforms in four affinity levels. In multi-cluster
systems like big.LITTLE, if the affinity guidelines are followed, the
MPIDR_EL1 can not be considered a linear index. This means that the
association between logical CPU in the kernel and the HW CPU identifier
becomes somewhat more complicated requiring methods like hashing to
associate a given MPIDR_EL1 to a CPU logical index, in order for the look-up
to be carried out in an efficient and scalable way.

This patch provides a function in the kernel that starting from the
cpu_logical_map, implement collision-free hashing of MPIDR_EL1 values by
checking all significative bits of MPIDR_EL1 affinity level bitfields.
The hashing can then be carried out through bits shifting and ORing; the
resulting hash algorithm is a collision-free though not minimal hash that can
be executed with few assembly instructions. The mpidr_el1 is filtered through a
mpidr mask that is built by checking all bits that toggle in the set of
MPIDR_EL1s corresponding to possible CPUs. Bits that do not toggle do not
carry information so they do not contribute to the resulting hash.

Pseudo code:

/* check all bits that toggle, so they are required */
for (i = 1, mpidr_el1_mask = 0; i < num_possible_cpus(); i++)
	mpidr_el1_mask |= (cpu_logical_map(i) ^ cpu_logical_map(0));

/*
 * Build shifts to be applied to aff0, aff1, aff2, aff3 values to hash the
 * mpidr_el1
 * fls() returns the last bit set in a word, 0 if none
 * ffs() returns the first bit set in a word, 0 if none
 */
fs0 = mpidr_el1_mask[7:0] ? ffs(mpidr_el1_mask[7:0]) - 1 : 0;
fs1 = mpidr_el1_mask[15:8] ? ffs(mpidr_el1_mask[15:8]) - 1 : 0;
fs2 = mpidr_el1_mask[23:16] ? ffs(mpidr_el1_mask[23:16]) - 1 : 0;
fs3 = mpidr_el1_mask[39:32] ? ffs(mpidr_el1_mask[39:32]) - 1 : 0;
ls0 = fls(mpidr_el1_mask[7:0]);
ls1 = fls(mpidr_el1_mask[15:8]);
ls2 = fls(mpidr_el1_mask[23:16]);
ls3 = fls(mpidr_el1_mask[39:32]);
bits0 = ls0 - fs0;
bits1 = ls1 - fs1;
bits2 = ls2 - fs2;
bits3 = ls3 - fs3;
aff0_shift = fs0;
aff1_shift = 8 + fs1 - bits0;
aff2_shift = 16 + fs2 - (bits0 + bits1);
aff3_shift = 32 + fs3 - (bits0 + bits1 + bits2);
u32 hash(u64 mpidr_el1) {
	u32 l[4];
	u64 mpidr_el1_masked = mpidr_el1 & mpidr_el1_mask;
	l[0] = mpidr_el1_masked & 0xff;
	l[1] = mpidr_el1_masked & 0xff00;
	l[2] = mpidr_el1_masked & 0xff0000;
	l[3] = mpidr_el1_masked & 0xff00000000;
	return (l[0] >> aff0_shift | l[1] >> aff1_shift | l[2] >> aff2_shift |
		l[3] >> aff3_shift);
}

The hashing algorithm relies on the inherent properties set in the ARM ARM
recommendations for the MPIDR_EL1. Exotic configurations, where for instance
the MPIDR_EL1 values at a given affinity level have large holes, can end up
requiring big hash tables since the compression of values that can be achieved
through shifting is somewhat crippled when holes are present. Kernel warns if
the number of buckets of the resulting hash table exceeds the number of
possible CPUs by a factor of 4, which is a symptom of a very sparse HW
MPIDR_EL1 configuration.

The hash algorithm is quite simple and can easily be implemented in assembly
code, to be used in code paths where the kernel virtual address space is
not set-up (ie cpu_resume) and instruction and data fetches are strongly
ordered so code must be compact and must carry out few data accesses.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16 17:17:30 +00:00
..
xen xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device 2013-10-25 10:39:49 +00:00
arch_timer.h ARM64: arch_timer: add support to configure and enable event stream 2013-09-26 09:47:43 +01:00
asm-offsets.h
assembler.h arm64: asm: add CPU_LE & CPU_BE assembler helpers 2013-10-25 15:59:38 +01:00
atomic.h ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h" 2013-11-09 00:00:13 +00:00
barrier.h arm64: Miscellaneous header files 2012-09-17 13:42:21 +01:00
bitops.h arm64: klib: Optimised atomic bitops 2013-03-21 17:39:31 +00:00
cache.h arm64: Cache maintenance routines 2012-09-17 13:42:00 +01:00
cacheflush.h arm64: Remove __flush_dcache_page() 2013-06-07 17:58:30 +01:00
cachetype.h arm64: Cache maintenance routines 2012-09-17 13:42:00 +01:00
cmpxchg.h arm64: cmpxchg: implement cmpxchg64_relaxed 2013-10-24 15:46:35 +01:00
compat.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
compiler.h arm64: Miscellaneous header files 2012-09-17 13:42:21 +01:00
cpu_ops.h arm64: add CPU_HOTPLUG infrastructure 2013-10-25 11:33:21 +01:00
cputable.h arm64: CPU support 2012-09-17 13:41:59 +01:00
cputype.h arm64: kernel: add MPIDR_EL1 accessors macros 2013-12-16 17:17:29 +00:00
debug-monitors.h arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler() 2013-07-19 15:49:43 +01:00
device.h arm64: device: add iommu pointer to device archdata 2013-06-11 18:15:55 +01:00
dma-mapping.h arm64/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain 2013-10-18 16:01:32 +00:00
elf.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
esr.h arm64: add explicit symbols to ESR_EL1 decoding 2013-04-17 15:58:25 +01:00
exception.h arm64: Use irqchip_init() for interrupt controller initialisation 2013-03-26 16:02:23 +00:00
exec.h arm64: Miscellaneous header files 2012-09-17 13:42:21 +01:00
fb.h arm64: Device specific operations 2012-09-17 13:42:04 +01:00
fpsimd.h arm64: elf: fix core dumping definitions for GP and FP registers 2012-11-08 16:06:20 +00:00
fpsimdmacros.h arm64: move FP-SIMD save/restore code to a macro 2012-12-05 11:26:50 +00:00
futex.h arm64: atomics: fix grossly inconsistent asm constraints for exclusives 2013-02-11 18:16:41 +00:00
hardirq.h arm64: Use irqchip_init() for interrupt controller initialisation 2013-03-26 16:02:23 +00:00
hugetlb.h ARM64: mm: HugeTLB support. 2013-06-14 09:52:40 +01:00
hw_breakpoint.h arm64: Debugging support 2012-09-17 13:42:14 +01:00
hwcap.h ARM64: arch_timer: add support to configure and enable event stream 2013-09-26 09:47:43 +01:00
hypervisor.h arm64/xen: introduce asm/xen header files on arm64 2013-06-07 10:39:45 +00:00
io.h arm64: Fix memory shareability attribute for ioremap_wc/cache 2013-12-06 17:21:52 +00:00
irq.h arm64: add CPU_HOTPLUG infrastructure 2013-10-25 11:33:21 +01:00
irqflags.h arm64: Unmask asynchronous aborts when in kernel mode 2013-11-25 16:44:05 +00:00
Kbuild sched, arch: Create asm/preempt.h 2013-09-25 14:07:50 +02:00
kvm_arm.h arm64: KVM: Yield CPU when vcpu executes a WFE 2013-10-29 18:25:25 +00:00
kvm_asm.h arm64: KVM: perform save/restore of PAR_EL1 2013-08-09 13:19:28 +01:00
kvm_coproc.h arm64: KVM: 32bit handling of coprocessor traps 2013-06-12 16:42:16 +01:00
kvm_emulate.h A handful of fixes for KVM/arm64: 2013-11-11 12:05:20 +01:00
kvm_host.h Updates for KVM/ARM including cpu=host and Cortex-A7 support 2013-10-16 15:30:32 +03:00
kvm_mmio.h arm64: KVM: MMIO access backend 2013-06-07 14:03:38 +01:00
kvm_mmu.h KVM: ARM: Support hugetlbfs backed huge pages 2013-10-17 17:06:20 -07:00
kvm_psci.h arm64: KVM: PSCI implementation 2013-06-12 16:40:32 +01:00
linkage.h arm64: fix alignment padding in assembly code 2012-10-20 11:12:01 +01:00
memblock.h arm64: MMU initialisation 2012-09-17 13:41:56 +01:00
memory.h arm64: Use 42-bit address space with 64K pages 2013-11-05 17:23:52 +00:00
mmu_context.h arm64: mm: don't bother invalidating the icache in switch_mm 2013-06-07 18:00:11 +01:00
mmu.h arm64: Add simple earlyprintk support 2013-01-22 17:51:01 +00:00
module.h arm64: Loadable modules 2012-09-17 13:42:19 +01:00
neon.h arm64: add support for kernel mode NEON 2013-08-20 12:12:26 +01:00
page.h arm64: MMU fault handling and page table management 2012-09-17 13:41:57 +01:00
perf_event.h arm64: perf: add guest vs host discrimination 2013-01-29 16:56:17 +00:00
pgalloc.h arm64: handle pgtable_page_ctor() fail 2013-11-15 09:32:16 +09:00
pgtable-2level-hwdef.h arm64: Use 42-bit address space with 64K pages 2013-11-05 17:23:52 +00:00
pgtable-2level-types.h ARM64: include: asm: include "asm/types.h" in "pgtable-2level-types.h" and "pgtable-3level-types.h" 2013-08-22 11:44:41 +01:00
pgtable-3level-hwdef.h arm64: MMU definitions 2012-09-17 13:41:56 +01:00
pgtable-3level-types.h ARM64: include: asm: include "asm/types.h" in "pgtable-2level-types.h" and "pgtable-3level-types.h" 2013-08-22 11:44:41 +01:00
pgtable-hwdef.h arm64: mm: Fix PMD_SECT_PROT_NONE definition 2013-12-06 17:22:44 +00:00
pgtable.h arm64: Move PTE_PROT_NONE higher up 2013-11-29 15:22:59 +00:00
pmu.h arm64: Performance counters support 2012-09-17 13:42:17 +01:00
proc-fns.h arm64: CPU support 2012-09-17 13:41:59 +01:00
processor.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
psci.h arm64: unify smp_psci.c and psci.c 2013-10-25 11:33:19 +01:00
ptrace.h arm64: compat: add support for big-endian (BE8) AArch32 binaries 2013-10-25 15:59:35 +01:00
shmparam.h arm64: ELF definitions 2012-09-17 13:42:07 +01:00
sigcontext.h UAPI: (Scripted) Disintegrate arch/arm64/include/asm 2012-10-11 11:05:13 +01:00
signal32.h arm64: 32-bit (compat) applications support 2012-09-17 13:42:12 +01:00
smp_plat.h arm64: kernel: build MPIDR_EL1 hash function data structure 2013-12-16 17:17:30 +00:00
smp.h arm64: add CPU_HOTPLUG infrastructure 2013-10-25 11:33:21 +01:00
sparsemem.h arm64: MMU definitions 2012-09-17 13:41:56 +01:00
spinlock_types.h arm64: Fix the endianness of arch_spinlock_t 2013-10-25 16:10:22 +01:00
spinlock.h arm64: lockref: add support for lockless lockrefs using cmpxchg 2013-10-24 15:46:34 +01:00
stacktrace.h arm64: Exception handling 2012-09-17 10:24:46 +01:00
stat.h UAPI: (Scripted) Disintegrate arch/arm64/include/asm 2012-10-11 11:05:13 +01:00
string.h arm64: klib: Optimised string functions 2013-03-21 17:39:30 +00:00
sync_bitops.h arm64/xen: introduce asm/xen header files on arm64 2013-06-07 10:39:45 +00:00
syscall.h arm64: check for number of arguments in syscall_get/set_arguments() 2013-10-23 15:45:35 +01:00
syscalls.h arm64: switch to generic sigaltstack 2013-02-14 09:17:29 -05:00
system_misc.h arm64: use common reboot infrastructure 2013-07-19 15:57:08 +01:00
thread_info.h preempt: Make PREEMPT_ACTIVE generic 2013-11-13 20:21:47 +01:00
timex.h arm64: kernel: compiling issue, need delete read_current_timer() 2013-06-10 17:58:20 +01:00
tlb.h Fix TLB gather virtual address range invalidation corner cases 2013-08-16 08:52:46 -07:00
tlbflush.h ARM64: mm: THP support. 2013-06-14 09:52:41 +01:00
traps.h arm64: Exception handling 2012-09-17 10:24:46 +01:00
uaccess.h arm64: avoid multiple evaluation of ptr in get_user/put_user() 2013-09-25 16:42:21 +01:00
ucontext.h arm64: fix padding computation in struct ucontext 2013-03-18 10:42:16 +00:00
unistd32.h unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
unistd.h burying unused conditionals 2013-02-14 09:21:15 -05:00
vdso_datapage.h arm64: VDSO support 2012-09-17 13:42:09 +01:00
vdso.h arm64: VDSO support 2012-09-17 13:42:09 +01:00
virt.h arm64: head: create a new function for setting the boot_cpu_mode flag 2013-10-25 15:59:39 +01:00