kernel_optimize_test/kernel
Linus Torvalds e56b3bc794 cpu masks: optimize and clean up cpumask_of_cpu()
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.

Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.

In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single world (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.

So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).

And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:

  static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
  {
          const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
          p -= cpu / BITS_PER_LONG;
          return (const cpumask_t *)p;
  }

This brings other advantages and simplifications as well:

 - we are not wasting memory that is just filled with a single bit in
   various different places

 - we don't need all those games to re-create the arrays in some dense
   format, because they're already going to be dense enough.

if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).

[ mingo@elte.hu:

  Converted Linus's mails into a commit. See:

     http://lkml.org/lkml/2008/7/27/156
     http://lkml.org/lkml/2008/7/28/320

  Also applied a family filter - which also has the side-effect of leaving
  out the bits where Linus calls me an idio... Oh, never mind ;-)
]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 22:20:41 +02:00
..
irq use WARN() in kernel/irq/chip.c 2008-07-26 12:00:07 -07:00
power kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
time cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu 2008-07-26 16:40:33 +02:00
trace Merge branch 'linus' into cpus4096 2008-07-28 21:14:43 +02:00
.gitignore
acct.c bsdacct: fix and add comments around acct_process() 2008-07-25 10:53:47 -07:00
audit_tree.c
audit.c
audit.h
auditfilter.c
auditsc.c x86_64 syscall audit fast-path 2008-07-23 17:47:32 -07:00
backtracetest.c
bounds.c
capability.c security: filesystem capabilities refactor kernel code 2008-07-24 10:47:22 -07:00
cgroup_debug.c
cgroup.c [PATCH] get rid of indirect users of namei.h 2008-07-26 20:53:42 -04:00
compat.c
configs.c
cpu.c cpu masks: optimize and clean up cpumask_of_cpu() 2008-07-28 22:20:41 +02:00
cpuset.c cpuset: two minor code-cleanups 2008-07-25 10:53:38 -07:00
delayacct.c per-task-delay-accounting: update taskstats for memory reclaim delay 2008-07-25 10:53:47 -07:00
dma.c
exec_domain.c [PATCH] kill altroot 2008-07-26 20:53:20 -04:00
exit.c task IO accounting: improve code readability 2008-07-27 09:58:20 -07:00
extable.c
fork.c task IO accounting: improve code readability 2008-07-27 09:58:20 -07:00
futex_compat.c
futex.c
hrtimer.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
itimer.c
kallsyms.c kallsyms: fix potential overflow in binary search 2008-07-25 10:53:27 -07:00
Kconfig.hz sched: fix hrtick & generic-ipi dependency 2008-07-23 11:18:28 +02:00
Kconfig.preempt
kexec.c kexec jump: save/restore device state 2008-07-26 12:00:04 -07:00
kfifo.c
kgdb.c
kmod.c call_usermodehelper(): increase reliability 2008-07-25 10:53:28 -07:00
kprobes.c kprobes: remove redundant config check 2008-07-25 10:53:30 -07:00
ksysfs.c
kthread.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep.c Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-14 14:55:13 -07:00
Makefile build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
marker.c markers: use rcu_barrier_sched() and call_rcu_sched() 2008-07-25 10:53:45 -07:00
module.c modules: Take a shortcut for checking if an address is in a module 2008-07-22 19:24:28 +10:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
nsproxy.c cgroup_clone: use pid of newly created task for new cgroup 2008-07-25 10:53:37 -07:00
panic.c Add a WARN() macro; this is WARN_ON() + printk arguments 2008-07-25 10:53:29 -07:00
params.c
pid_namespace.c bsdacct: switch from global bsd_acct_struct instance to per-pidns one 2008-07-25 10:53:47 -07:00
pid.c pidns: remove now unused find_pid function. 2008-07-25 10:53:45 -07:00
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c posix timers: release_posix_timer: kill the bogus put_task_struct(->it_process); 2008-07-25 10:53:38 -07:00
printk.c printk ratelimiting rewrite 2008-07-25 10:53:29 -07:00
profile.c build kernel/profile.o only when requested 2008-07-25 10:53:27 -07:00
ptrace.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
rcuclassic.c Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
rcupdate.c Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-15 14:12:03 -07:00
rcupreempt_trace.c
rcupreempt.c Merge branch 'linus' into cpus4096 2008-07-16 00:29:07 +02:00
rcutorture.c
relay.c relay: add buffer-only channels; useful for early logging 2008-07-26 12:00:04 -07:00
res_counter.c cgroup files: convert res_counter_write() to be a cgroups write_string() handler 2008-07-25 10:53:36 -07:00
resource.c
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c sysdev: Pass the attribute to the low level sysdev show/store function 2008-07-21 21:55:02 -07:00
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'sched/clock' into sched/devel 2008-07-14 12:19:13 +02:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c
sched_fair.c Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-23 19:36:53 -07:00
sched_features.h
sched_idletask.c
sched_rt.c Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-07-24 12:53:51 -07:00
sched_stats.h
sched.c tracehook: wait_task_inactive 2008-07-26 12:00:09 -07:00
seccomp.c
semaphore.c
signal.c tracehook: force signal_pending() 2008-07-26 12:00:09 -07:00
smp.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
softirq.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
softlockup.c Full conversion to early_initcall() interface, remove old interface 2008-07-26 12:00:04 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu 2008-07-26 16:40:33 +02:00
sys_ni.c signalfd: fix undefined reference to `compat_sys_signalfd4' when !CONFIG_SIGNALFD 2008-07-25 11:35:41 -07:00
sys.c kexec jump 2008-07-26 12:00:04 -07:00
sysctl_check.c sysctl: check for bogus modes 2008-07-25 10:53:45 -07:00
sysctl.c lost sysctl fix 2008-07-27 09:45:34 -07:00
taskstats.c taskstats: remove initialization of static per-cpu variable 2008-07-25 10:53:47 -07:00
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2008-07-14 16:06:58 -07:00
tsacct.c task IO accounting: move all IO statistics in struct task_io_accounting 2008-07-27 16:12:28 -07:00
uid16.c
user_namespace.c
user.c
utsname_sysctl.c
utsname.c
wait.c
workqueue.c workqueues: do CPU_UP_CANCELED if CPU_UP_PREPARE fails 2008-07-25 10:53:41 -07:00