kernel_optimize_test/drivers
Matt Domsch 6b4b78fed4 PCI: optionally sort device lists breadth-first
Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation.  Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively.  Many people have come to expect this naming.  Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations).  I also have reports that various Sun and HP servers
have similar behavior.


Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also).  2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list; the latter
is what is walked at driver load time to match the pci_id tables, and
this klist happens to be in depth-first order.

On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1.  If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.

A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.

-[0000:00]-+-00.0  Intel Corporation 5000P Chipset Memory Controller Hub
           +-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
           +-1c.0-[0000:01-02]----00.0-[0000:02]----00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
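
As a rough illustration of the two orderings (a toy model only, not the
code in this patch), the topology above can be walked both ways in plain
C.  The struct dev type, the walk_* helpers and the node names are all
hypothetical and exist only for this sketch; the bus:dev.fn strings are
read off the tree above.

```c
#include <stddef.h>

#define MAX_KIDS  4
#define MAX_NODES 16

/* Toy PCI device: bridges have children, endpoints have none. */
struct dev {
    const char *addr;              /* bus:dev.fn, for display only */
    struct dev *kid[MAX_KIDS];
    size_t nkids;
};

/* PowerEdge 1955 topology from the tree above (hypothetical names). */
static struct dev nic2    = { "06:00.0", { NULL },     0 };
static struct dev br05    = { "05:00.0", { &nic2 },    1 };
static struct dev br04    = { "04:00.0", { &br05 },    1 };
static struct dev br03    = { "03:00.0", { &br04 },    1 };
static struct dev br_02_0 = { "00:02.0", { &br03 },    1 };
static struct dev nic1    = { "02:00.0", { NULL },     0 };
static struct dev br01    = { "01:00.0", { &nic1 },    1 };
static struct dev br_1c_0 = { "00:1c.0", { &br01 },    1 };
static struct dev mch     = { "00:00.0", { NULL },     0 };
static struct dev *bus0[] = { &mch, &br_02_0, &br_1c_0 };

/* Pre-order depth-first: visit a device, then everything behind it. */
static size_t walk_dfs(struct dev *d, struct dev **out, size_t n)
{
    out[n++] = d;
    for (size_t i = 0; i < d->nkids; i++)
        n = walk_dfs(d->kid[i], out, n);
    return n;
}

/* Depth-first over every device on the root bus, in bus order. */
static size_t walk_dfs_all(struct dev **roots, size_t nroots,
                           struct dev **out)
{
    size_t n = 0;
    for (size_t i = 0; i < nroots; i++)
        n = walk_dfs(roots[i], out, n);
    return n;
}

/* Breadth-first: out[] doubles as the FIFO queue of pending devices. */
static size_t walk_bfs(struct dev **roots, size_t nroots,
                       struct dev **out)
{
    size_t head = 0, tail = 0;
    for (size_t i = 0; i < nroots; i++)
        out[tail++] = roots[i];
    while (head < tail) {
        struct dev *d = out[head++];
        for (size_t i = 0; i < d->nkids; i++)
            out[tail++] = d->kid[i];
    }
    return tail;
}

/* Index of d in out[0..n), or n if it is not present. */
static size_t position(struct dev **out, size_t n, const struct dev *d)
{
    size_t i = 0;
    while (i < n && out[i] != d)
        i++;
    return i;
}
```

In this model the depth-first walk reaches the device labeled NIC2
before NIC1, because the 02.0 bridge and everything behind it is
exhausted before the 1c.0 bridge is entered; the breadth-first walk
reaches NIC1 first because it sits fewer bridges away from bus 00.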


Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.


Solution:

The solution can come in multiple steps.

Suggested fix #1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels.  It adds two new
command line options:
  pci=bfsort
  pci=nobfsort
to force the sort order, or not, as you wish.  It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".
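
One way to picture how the two options and the DMI checks could
interact is a small state value in which an explicit command line
choice always wins over the DMI default.  This is only a sketch under
that assumption: the enum, the function names and the precedence rule
are hypothetical and are not taken from the patch.

```c
#include <string.h>

/* Hypothetical sort-policy states for this sketch. */
enum bf_sort {
    BF_DEFAULT,    /* nothing requested: keep the existing order */
    BF_FORCE_OFF,  /* pci=nobfsort given */
    BF_FORCE_ON,   /* pci=bfsort given */
    BF_DMI         /* enabled because a DMI entry matched */
};

/* Apply one "pci=" sub-option; unrelated options leave the state alone. */
static enum bf_sort parse_pci_opt(enum bf_sort cur, const char *opt)
{
    if (strcmp(opt, "bfsort") == 0)
        return BF_FORCE_ON;
    if (strcmp(opt, "nobfsort") == 0)
        return BF_FORCE_OFF;
    return cur;
}

/* A DMI match only takes effect if the user expressed no preference. */
static enum bf_sort apply_dmi_match(enum bf_sort cur)
{
    return cur == BF_DEFAULT ? BF_DMI : cur;
}

/* Should the device lists be sorted breadth-first? */
static int should_sort(enum bf_sort s)
{
    return s == BF_FORCE_ON || s == BF_DMI;
}
```

With this arrangement a matching Dell system booted with pci=nobfsort
keeps the depth-first order, while any system booted with pci=bfsort
gets the breadth-first order.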


Suggested fix #2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in.  I'm working on a tool that would allow udev to name
ethernet devices in ascending order (embedded, then slot 1 .. slot N),
subsorted by PCI bus/dev/fn breadth-first.  It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.
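
The ordering described for such a tool can be sketched as a qsort()
comparator.  The struct nic layout and the convention that slot 0 means
"embedded" are assumptions made for this example; the tool itself is
not part of this patch, and comparing bus/dev/fn numerically is only a
stand-in for the breadth-first subsort.

```c
#include <stdlib.h>

/* Hypothetical record for one ethernet device. */
struct nic {
    int slot;               /* 0 = embedded, 1..N = add-in PCI slot */
    unsigned bus, dev, fn;  /* PCI address */
};

/* Three-way compare of two unsigned values. */
static int cmp_u(unsigned a, unsigned b)
{
    return (a > b) - (a < b);
}

/* Embedded first, then slot 1 .. slot N, then by bus/dev/fn. */
static int nic_cmp(const void *a, const void *b)
{
    const struct nic *x = a, *y = b;
    int r;

    if (x->slot != y->slot)
        return x->slot < y->slot ? -1 : 1;
    if ((r = cmp_u(x->bus, y->bus)) != 0)
        return r;
    if ((r = cmp_u(x->dev, y->dev)) != 0)
        return r;
    return cmp_u(x->fn, y->fn);
}

/* Sort v[0..n) into naming order: v[0] would become eth0, and so on. */
static void sort_nics(struct nic *v, size_t n)
{
    qsort(v, n, sizeof *v, nic_cmp);
}
```

Sorting on the slot first is what would put embedded NICs ahead of
add-in cards regardless of where their bridges sit in the hierarchy.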

Suggested fix #3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order.  This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers).  I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.


Feedback appreciated.  Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.

You'll also note I took some liberty and temporarily broke the klist
abstraction to simplify and speed up the sort algorithm.  I think
that's both safe and appropriate in this instance.


Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-10-18 11:36:12 -07:00
acorn IRQ: Maintain regs pointer globally rather than passing to IRQ handlers 2006-10-05 15:10:12 +01:00
acpi [PATCH] acpi_processor_latency_notifier(): UP warning fix 2006-10-17 08:18:44 -07:00
amba
ata Merge branch 'master' into upstream-fixes 2006-10-11 04:59:46 -04:00
atm Various drivers' irq handlers: kill dead code, needless casts 2006-10-06 15:00:58 -04:00
base [PATCH] hot-add-mem x86_64: use CONFIG_MEMORY_HOTPLUG_SPARSE 2006-10-01 00:39:18 -07:00
block [PATCH] rd: memory leak on rd_init() failure 2006-10-17 08:18:48 -07:00
bluetooth [Bluetooth] Use work queue to trigger URB submission 2006-10-15 23:14:35 -07:00
cdrom [PATCH] cdrom: add endianness annotations 2006-10-10 16:15:33 -07:00
char Merge git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2006-10-17 08:56:43 -07:00
clocksource [PATCH] scx200_hrt: fix precedence bug manifesting as 27x clock in 1 MHz mode 2006-10-04 07:55:14 -07:00
connector
cpufreq [PATCH] cpufreq: make the transition_notifier chain use SRCU 2006-10-04 07:55:30 -07:00
crypto
dio
dma [PATCH] drivers/dma trivial annotations 2006-10-10 15:37:21 -07:00
edac
eisa [PATCH] EISA: handle sysfs errors 2006-10-11 11:14:25 -07:00
fc4 IRQ: Maintain regs pointer globally rather than passing to IRQ handlers 2006-10-05 15:10:12 +01:00
firmware [PATCH] firmware/efivars: handle error 2006-10-11 11:14:25 -07:00
hwmon Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
i2c [POWERPC] Fix i2c-powermac platform device usage 2006-10-10 13:56:13 +10:00
ide [PATCH] ioc4: Enable build on non-SN2 2006-10-17 08:18:42 -07:00
ieee1394 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 2006-10-08 12:28:41 -07:00
infiniband IB/mthca: Use mmiowb after doorbell ring 2006-10-16 20:22:35 -07:00
input Merge master.kernel.org:/home/rmk/linux-2.6-arm 2006-10-17 14:46:31 -07:00
isdn [PATCH] ISDN: check for userspace copy faults 2006-10-17 08:18:49 -07:00
leds [PATCH] drivers/led: handle sysfs errors 2006-10-17 08:18:46 -07:00
macintosh [POWERPC] Fix windfarm platform device usage 2006-10-10 13:56:13 +10:00
mca [PATCH] drivers/mca: handle sysfs errors 2006-10-11 11:14:25 -07:00
md [PATCH] md: fix /proc/mdstat refcounting 2006-10-17 08:18:43 -07:00
media V4L/DVB (4750): AGC command1/2 is board specific 2006-10-14 00:44:29 -03:00
message [PATCH] I2O: handle a few sysfs errors 2006-10-17 08:18:46 -07:00
mfd IRQ: Maintain regs pointer globally rather than passing to IRQ handlers 2006-10-05 15:10:12 +01:00
misc [PATCH] ioc4: Enable build on non-SN2 2006-10-17 08:18:42 -07:00
mmc [PATCH] passing pointer to setup_timer() should be via unsigned long 2006-10-10 15:37:22 -07:00
mtd [PATCH] mtd: remove several bogus casts to void * in iounmap() argument 2006-10-10 15:37:22 -07:00
net [SPARC]: Fix some section mismatch warnings in sparc drivers. 2006-10-17 19:28:51 -07:00
nubus
oprofile
parisc Build fixes for struct pt_regs removal 2006-10-06 20:47:23 -06:00
parport [PATCH] sparc32 pt_regs fixes 2006-10-08 12:32:35 -07:00
pci PCI: optionally sort device lists breadth-first 2006-10-18 11:36:12 -07:00
pcmcia Merge branch 'irqclean-submit1' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 2006-10-09 14:21:45 -07:00
pnp Fix DMA resource allocation in ACPIPnP 2006-10-18 11:36:11 -07:00
rapidio Fix several typos in drivers/ 2006-10-03 22:31:37 +02:00
rtc [PATCH] rtc: fix printk of 64-bit res on 32-bit platform 2006-10-17 08:18:47 -07:00
s390 [S390] cio: remove casts from/to (void *). 2006-10-11 15:31:47 +02:00
sbus [SPARC] {bbc_,}envctrl: Use call_usermodehelper(). 2006-10-17 19:28:52 -07:00
scsi [SPARC]: Fix some section mismatch warnings in sparc drivers. 2006-10-17 19:28:51 -07:00
serial [PATCH] ioc4: Enable build on non-SN2 2006-10-17 08:18:42 -07:00
sh
sn [PATCH] ioc4: Enable build on non-SN2 2006-10-17 08:18:42 -07:00
spi Various drivers' irq handlers: kill dead code, needless casts 2006-10-06 15:00:58 -04:00
tc [MIPS] Fix DECserial build error by IRQ hander change 2006-10-08 02:38:28 +01:00
telephony
usb Fix USB gadget net2280.c compile 2006-10-17 18:03:33 -07:00
video [PATCH] revert "nvidiafb: use generic ddc reading" 2006-10-11 11:14:14 -07:00
w1 [PATCH] w1 kconfig fix 2006-10-17 08:18:44 -07:00
zorro
Kconfig [PATCH] ioc4: Enable build on non-SN2 2006-10-17 08:18:42 -07:00
Makefile