kernel_optimize_test/arch/x86
Steven Rostedt 40d6a14662 Kick CPUS that might be sleeping in cpus_idle_wait
Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are
already in idle, have no tasks waiting to run and have no interrupts going
to them.  This is common on bootup when switching cpu idle governors.

This patch gives those CPUS that don't check in an IPI kick.

 Background:
 -----------
I notice this while developing the mcount patches, that every once in a
while the system would hang. Looking deeper, the hang was always at boot
up when registering init_menu of the cpu_idle menu governor. Talking
with Thomas Gliexner, we discovered that one of the CPUS had no timer
events scheduled for it and it was in idle (running with NO_HZ). So the
CPU would not set the cpu_idle_state bit.

Hitting sysrq-t a few times would eventually route the interrupt to the
stuck CPU and the system would continue.

Note, I would have used the PDA isidle but that is set after the
cpu_idle_state bit is cleared, and would leave a window open where we
may miss being kicked.

hmm, looking closer at this, we still have a small race window between
clearing the cpu_idle_state and disabling interrupts (hence the RFC).

    CPU0:                          CPU 1:
  ---------                       ---------
 cpu_idle_wait():                 cpu_idle():
      |                           __cpu_cpu_var(is_idle) = 1;
      |                           if (__get_cpu_var(cpu_idle_state)) /* == 0 */
 per_cpu(cpu_idle_state, 1) = 1;         |
 if (per_cpu(is_idle, 1)) /* == 1 */     |
 smp_call_function(1)                    |
      |                             receives ipi and runs do_nothing.
 wait on map == empty               idle();
   /* waits forever */

So really we need interrupts off for most of this then. One might think
that we could simply clear the cpu_idle_state from do_nothing, but I'm
assuming that cpu_idle governors can be removed, and this might cause a
race that a governor might be used after the module was removed.

Venki said:

  I think your RFC patch is the right solution here.  As I see it, there is
  no race with your RFC patch.  As long as you call a dummy smp_call_function
  on all CPUs, we should be OK.  We can get rid of cpu_idle_state and the
  current wait forever logic altogether with dummy smp_call_function.  And so
  there wont be any wait forever scenario.

  The whole point of cpu_idle_wait() is to make all CPUs come out of idle
  loop atleast once.  The caller will use cpu_idle_wait something like this.

  // Want to change idle handler

  - Switch global idle handler to always present default_idle

  - call cpu_idle_wait so that all cpus come out of idle for an instant
    and stop using old idle pointer and start using default idle

  - Change the idle handler to a new handler

  - optional cpu_idle_wait if you want all cpus to start using the new
    handler immediately.

Maybe the below 1s patch is safe bet for .24.  But for .25, I would say we
just replace all complicated logic by simple dummy smp_call_function and
remove cpu_idle_state altogether.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-14 08:52:22 -08:00
..
boot x86 setup: don't recalculate ss:esp unless really necessary 2007-11-28 18:17:17 -08:00
configs x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU 2007-10-30 00:22:22 +01:00
crypto x86: merge arch/x86/crypto Makefiles 2007-10-23 22:37:23 +02:00
ia32 x86 - 32-bit ptrace emulation mishandles 6th arg 2007-11-10 04:30:36 +01:00
kernel Kick CPUS that might be sleeping in cpus_idle_wait 2008-01-14 08:52:22 -08:00
lguest lguest: prevent VISWS or VOYAGER randconfigs 2007-11-29 09:24:55 -08:00
lib x86: disable preemption in delay_tsc() 2007-11-14 18:45:44 -08:00
mach-default spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
mach-es7000 i386: es7000 minor cleanups 2007-10-17 20:16:15 +02:00
mach-generic spelling fixes: arch/i386/ 2007-10-20 01:13:56 +02:00
mach-visws [x86] remove uses of magic macros for boot_params access 2007-10-16 17:38:31 -07:00
mach-voyager x86: fix smp init sections 2007-11-17 16:27:00 +01:00
math-emu kbuild: fix up CFLAGS usage 2007-10-14 21:49:42 +02:00
mm memory hotplug x86_64: fix section mismatch in init_memory_mapping() 2007-11-29 09:24:54 -08:00
oprofile oprofile: op_model_athlon.c support for AMD family 10h barcelona performance counters 2007-12-18 18:05:58 +01:00
pci Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 2007-11-26 19:41:28 -08:00
power i386: move power 2007-10-11 11:16:34 +02:00
vdso x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
video i386: move video 2007-10-11 11:16:56 +02:00
xen xen: relax signature check 2007-12-10 19:46:58 -08:00
Kconfig Tiny clean-up of OPROFILE/KPROBES configuration 2007-12-06 09:41:12 -08:00
Kconfig.cpu x86: arch/x86/Kconfig.cpu unification 2007-11-12 21:02:19 +01:00
Kconfig.debug remove the dead X86_REMOTE_DEBUG option 2007-10-30 00:22:22 +01:00
Makefile x86: correctly set UTS_MACHINE for "make ARCH=x86" 2007-11-26 17:38:53 -08:00
Makefile_32 x86: do not use $(ARCH) when not needed 2007-11-12 21:02:20 +01:00
Makefile_32.cpu x86: move i386 and x86_64 Makefiles to arch/x86 2007-10-25 22:27:34 +02:00
Makefile_64 x86: do not use $(ARCH) when not needed 2007-11-12 21:02:20 +01:00