kernel_optimize_test/arch/x86
Henry Nestler b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
..
boot x86: fix integer as NULL pointer warning 2008-05-23 08:11:06 -07:00
configs x86: add optimized inlining 2008-04-26 17:44:55 +02:00
crypto [CRYPTO] aes-x86-32: Remove unused return code 2008-04-21 10:19:21 +08:00
ia32 signals: x86 TS_RESTORE_SIGMASK 2008-04-30 08:29:37 -07:00
kernel geode: fix modular build 2008-06-12 21:25:51 +02:00
kvm namespacecheck: automated fixes 2008-05-23 14:08:06 +02:00
lguest lguest: fix ugly <NULL> in /proc/interrupts 2008-05-30 15:09:43 +10:00
lib x86: enable preemption in delay 2008-06-04 13:11:46 +02:00
mach-default
mach-es7000
mach-generic x86: coding style fixes to arch/x86/mach-generic/bigsmp.c 2008-04-17 17:40:48 +02:00
mach-rdc321x x86, rdc321x: remove watchdog file 2008-04-17 17:40:50 +02:00
mach-visws x86: fix compilation error in VisWS 2008-04-24 23:15:44 +02:00
mach-voyager x86, voyager: fix ioremap_nocache() 2008-04-30 23:15:34 +02:00
math-emu x86: fix broken math-emu with lazy allocation of fpu area 2008-06-04 13:11:46 +02:00
mm x86: fix endless page faults in mount_block_root for Linux 2.6 2008-06-12 21:26:07 +02:00
oprofile x86: oprofile: remove NR_CPUS arrays in arch/x86/oprofile/nmi_int.c 2008-04-19 19:44:58 +02:00
pci x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005) 2008-06-05 15:32:15 -07:00
power x86: coding style fixes to arch/x86/power/cpu_32.c 2008-04-17 17:40:50 +02:00
vdso x86: use explicit copy in vdso_gettimeofday() 2008-05-23 14:08:06 +02:00
video x86: video/fbdev.c: add MODULE_LICENSE 2008-05-04 20:04:46 +02:00
xen x86/xen: fix arbitrary_virt_to_machine() 2008-05-23 14:08:06 +02:00
Kconfig PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC 2008-06-05 14:29:25 -07:00
Kconfig.cpu x86: CONFIG_X86_ELAN fix 2008-04-30 23:15:35 +02:00
Kconfig.debug x86: fix CONFIG_NONPROMISC_DEVMEM prompt and help text 2008-06-05 14:21:45 -07:00
Makefile x86: add subarch support (for headers) to x86_64 2008-04-17 17:41:01 +02:00
Makefile_32.cpu