kernel_optimize_test/kernel
Thomas Gleixner ce31332d3c hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically
Sedat and Bruno reported RCU stalls which turned out to be caused by
the following;

sched_init() calls init_rt_bandwidth() which calls hrtimer_init()
_BEFORE_ hrtimers_init() is called. While not entirely correct this
worked because hrtimer_init() only accessed statically initialized
data (hrtimer_bases.clock_base[CLOCK_MONOTONIC])

Commit e06383db9 (hrtimers: extend hrtimer base code to handle more
then 2 clockids) added an indirection to the hrtimer_bases.clock_base
lookup to avoid gap handling in the hot path. The table which is used
for the translataion from CLOCK_ID to HRTIMER_BASE index is
initialized at runtime in hrtimers_init(). So the early call of the
scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME.

Thus the rt_bandwith timer ends up on CLOCK_REALTIME. If the timer is
armed and the wall clock time is set (e.g. ntpdate in the early boot
process - which also gives the problem deterministic behaviour
i.e. magic recovery after N hours), then the timer ends up with an
expiry time far into the future. That breaks the RT throttler
mechanism as rt runtime is accumulated and never cleared, so the rt
throttler detects a false cpu hog condition and blocks all RT tasks
until the timer finally expires. That in turn stalls the RCU thread of
TINYRCU which leads to an huge amount of RCU callbacks piling up.

Make the translation table statically initialized, so we are back to
the status of <= 2.6.39.

Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: John stultz <johnstul@us.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-29 10:57:11 +02:00
..
debug Fix common misspellings 2011-03-31 11:26:23 -03:00
gcov Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-03-20 18:14:55 -07:00
irq Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-04-07 12:12:58 -07:00
power PM: Fix error code paths executed after failing syscore_suspend() 2011-04-18 23:58:59 +02:00
time posix clocks: Replace mutex with reader/writer semaphore 2011-04-18 10:39:38 +02:00
trace block: make unplug timer trace event correspond to the schedule() unplug 2011-04-16 13:51:05 +02:00
.gitignore
acct.c
async.c
audit_tree.c Fix common misspellings 2011-03-31 11:26:23 -03:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
audit.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
audit.h
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c Fix common misspellings 2011-03-31 11:26:23 -03:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c userns: make has_capability* into real functions 2011-03-23 19:47:06 -07:00
cgroup_freezer.c
cgroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
compat.c posix-timers: Introduce a syscall for clock tuning. 2011-02-02 15:28:19 +01:00
configs.c
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpuset.c cpuset: hold callback_mutex in cpuset_post_clone() 2011-03-23 19:46:35 -07:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c userns: security: make capabilities relative to the user namespace 2011-03-23 19:47:02 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Fix common misspellings 2011-03-31 11:26:23 -03:00
extable.c
fork.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
freezer.c Freezer: Fix a race during freezing of TASK_STOPPED tasks 2010-12-24 15:02:40 +01:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
futex.c futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup 2011-04-15 16:34:32 +02:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table statically 2011-04-29 10:57:11 +02:00
hung_task.c
hw_breakpoint.c perf: Dynamic pmu types 2010-12-16 11:36:43 +01:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c PM: Add missing syscore_suspend() and syscore_resume() calls 2011-04-20 00:36:11 +02:00
kfifo.c
kmod.c
kprobes.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
ksysfs.c
kthread.c Fix common misspellings 2011-03-31 11:26:23 -03:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
lockdep.c Fix common misspellings 2011-03-31 11:26:23 -03:00
Makefile crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
module.c Fix common misspellings 2011-03-31 11:26:23 -03:00
mutex-debug.c
mutex-debug.h
mutex.c Fix common misspellings 2011-03-31 11:26:23 -03:00
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c move x86 specific oops=panic to generic code 2011-03-22 17:44:11 -07:00
params.c Fix common misspellings 2011-03-31 11:26:23 -03:00
perf_event.c perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() 2011-04-11 11:07:55 +02:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pid.c next_pidmap: fix overflow condition 2011-04-18 10:35:30 -07:00
pm_qos_params.c PM QoS: Make pm_qos settings readable 2011-03-15 00:43:18 +01:00
posix-cpu-timers.c Fix common misspellings 2011-03-31 11:26:23 -03:00
posix-timers.c Fix common misspellings 2011-03-31 11:26:23 -03:00
printk.c printk: allow setting DEFAULT_MESSAGE_LEVEL via Kconfig 2011-03-22 17:44:13 -07:00
profile.c
ptrace.c userns: allow ptrace from non-init user namespaces 2011-03-23 19:47:05 -07:00
range.c
rcupdate.c rcu: add comment saying why DEBUG_OBJECTS_RCU_HEAD depends on PREEMPT. 2011-03-04 08:05:41 -08:00
rcutiny_plugin.h rcu: call __rcu_read_unlock() in exit_rcu for tiny RCU 2011-03-04 08:05:08 -08:00
rcutiny.c rcu: avoid pointless blocked-task warnings 2011-01-14 04:58:08 -08:00
rcutorture.c rcutorture: Get rid of duplicate sched.h include 2011-03-04 08:05:17 -08:00
rcutree_plugin.h rcu: increase synchronize_sched_expedited() batching 2010-12-17 12:34:08 -08:00
rcutree_trace.c rcu,cleanup: simplify the code when cpu is dying 2010-11-29 22:01:58 -08:00
rcutree.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
rcutree.h rcu: limit rcu_node leaf-level fanout 2010-12-17 12:34:20 -08:00
relay.c
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resources: add arch hook for preventing allocation in reserved areas 2010-12-17 10:01:09 -08:00
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex.h
rwsem.c
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched, autogroup: Stop going ahead if autogroup is disabled 2011-02-23 11:33:59 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Use a buddy to implement yield_task_fair() 2011-02-03 14:20:33 +01:00
sched_fair.c sched: Fix erroneous all_pinned logic 2011-04-11 11:08:54 +02:00
sched_features.h
sched_idletask.c sched, doc: Update sched-design-CFS.txt 2011-03-23 14:09:41 +01:00
sched_rt.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_stats.h
sched_stoptask.c sched, doc: Update sched-design-CFS.txt 2011-03-23 14:09:41 +01:00
sched.c block: let io_schedule() flush the plug inline 2011-04-16 13:27:55 +02:00
seccomp.c
semaphore.c
signal.c signal.c: fix erroneous syscall kernel-doc 2011-04-08 11:05:24 -07:00
smp.c smp: move smp setup functions to kernel/smp.c 2011-03-22 17:44:11 -07:00
softirq.c Fix common misspellings 2011-03-31 11:26:23 -03:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c kthread: use kthread_create_on_node() 2011-03-22 17:44:01 -07:00
sys_ni.c vfs: Add open by file handle support 2011-03-15 02:21:44 -04:00
sys.c userns: user namespaces: convert all capable checks in kernel/sys.c 2011-03-23 19:47:06 -07:00
sysctl_binary.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
sysctl.c sysctl: restrict write access to dmesg_restrict 2011-03-23 19:46:54 -07:00
taskstats.c taskstats: use appropriate printk priority level 2011-03-23 19:47:14 -07:00
test_kprobes.c
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
timeconst.pl
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
tracepoint.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
tsacct.c
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
utsname_sysctl.c
utsname.c userns: allow sethostname in a container 2011-03-23 19:47:03 -07:00
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events 2011-03-22 17:44:12 -07:00
workqueue_sched.h
workqueue.c Fix common misspellings 2011-03-31 11:26:23 -03:00