While testing kernel on machine with "irqpoll" option I've caught such a
lockup:
__do_IRQ()
spin_lock(&desc->lock);
desc->chip->ack(); /* IRQ is ACKed */
note_interrupt()
misrouted_irq()
handle_IRQ_event()
if (...)
local_irq_enable_in_hardirq();
/* interrupts are enabled from now */
...
__do_IRQ() /* same IRQ we've started from */
spin_lock(&desc->lock); /* LOCKUP */
Looking at misrouted_irq() code I've found that a potential deadlock like
this can also take place:
1CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = A */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = B */
if (desc->status & IRQ_INPROGRESS) {
2CPU:
__do_IRQ()
spin_lock(&desc->lock); /* irq = B */
misrouted_irq()
for (i = 1; i < NR_IRQS; i++) {
spin_lock(&desc->lock); /* irq = A */
if (desc->status & IRQ_INPROGRESS) {
As the second lock on both CPUs is taken before checking that this irq is
being handled in another processor this may cause a deadlock. This issue
is only theoretical.
I propose the attached patch to fix booth problems: when trying to handle
misrouted IRQ active desc->lock may be unlocked.
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On running the Stress Test on machine for more than 72 hours following
error message was observed.
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
pc: c000000000060d90: .dup_fd+0x240/0x39c
lr: c000000000060d6c: .dup_fd+0x21c/0x39c
sp: c00000007ce2fa70
msr: 800000000000b032
dar: ffffffff00000028
dsisr: 40000000
current = 0xc000000074950980
paca = 0xc000000000454500
pid = 27330, comm = bash
0:mon> t
[c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
[c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
[c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
[c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
[c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
[c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
The problem is because of race window. When if(expand) block is executed in
dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be
modified. So actual open_files in oldf may not match with open_files
variable.
Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you call set_personality() with an expression such as:
set_personality(foo ? PERS_FOO1 : PERS_FOO2);
then this evaluates to:
((current->personality == foo ? PERS_FOO1 : PERS_FOO2) ? ...
which is obviously not the intended result. Add the missing parents
to ensure this gets evaluated as expected:
((current->personality == (foo ? PERS_FOO1 : PERS_FOO2)) ? ...
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix improper use of "&&" when "&" was intended.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix MSPEC driver to build for non SN2 enabled configs as the driver should
work in cached and uncached modes (no fetchop) on these systems. In
addition make MSPEC select IA64_UNCACHED_ALLOCATOR, which is required for
it and move it to arch/ia64/Kconfig to avoid warnings on non ia64
architectures running allmodconfig. Once the Kconfig code is fixed, we can
move it back.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Fernando Luis Vzquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The PCI sysfs attributes are created after the initial PCI bus scan. With
the addition of more return value checking and assertions in the device and
sysfs layers we now can get dumps like this on sparc64:
[ 20.135032] Call Trace:
[ 20.135042] [0000000000537f88] pci_remove_bus_device+0x30/0xc0
[ 20.135076] [000000000078f890] pci_fill_in_pbm_cookies+0x98/0x440
[ 20.135109] [000000000042e828] sabre_scan_bus+0x230/0x400
[ 20.135139] [000000000078c710] pcibios_init+0x58/0xa0
[ 20.135159] [0000000000416f14] init+0x9c/0x2e0
[ 20.135190] [0000000000417a50] kernel_thread+0x38/0x60
[ 20.135211] [0000000000417170] rest_init+0x18/0x40
[ 20.135514] PCI0(PBMB): Bus running at 33MHz
It's triggering because removal of the "config" PCI sysfs file for the
device fails.
On sparc64, after probing the device, we'll delete the PCI device via
pci_remove_bus_device() if we cannot find the firmware device tree node
corresponding to it.
This is fine, but at this point the sysfs files for the PCI device won't be
setup yet.
So we should not try to do anything in pci_remove_sysfs_dev_files() if
pci_sysfs_init() has not run yet.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- reorder 'struct vm_struct' to speedup lookups on CPUS with small cache
lines. The fields 'next,addr,size' should be now in the same cache line,
to speedup lookups.
- One minor cleanup in __get_vm_area_node()
- Bugfixes in vmalloc_user() and vmalloc_32_user() NULL returns from
__vmalloc() and __find_vm_area() were not tested.
[akpm@osdl.org: remove redundant BUG_ONs]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 6264d69d7d modified the nfsd_create()
error handling in such a way that nfsd_create will usually return
nfserr_perm even when succesful, if the export has the async export option.
This introduced a regression that could cause mkdir() to always return a
permissions error, even though the directory in question was actually
succesfully created.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric's changes to the htirq infrastructure require corresponding
modifications to the ipath HT driver code so that interrupts are still
delivered properly.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a variant of ht_create_irq __ht_create_irq that takes an
aditional parameter update that is a function that is called whenever we want
to write to a drivers htirq configuration registers.
This is needed to support the ipath_iba6110 because it's registers in the
proper location are not actually conected to the hardware that controlls
interrupt delivery.
[bos@serpentine.com: fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This refactoring actually optimizes the code a little by caching the value
that we think the device is programmed with instead of reading it back from
the hardware. Which simplifies the code a little and should speed things up a
bit.
This patch introduces the concept of a ht_irq_msg and modifies the
architecture read/write routines to update this code.
There is a minor consistency fix here as well as x86_64 forgot to initialize
the htirq as masked.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some more errors from the IPMI send message command are retryable, but are not
being retried by the IPMI code. Make sure they get retried.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Frederic Lelievre <Frederic.Lelievre@ca.kontron.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A wrong function was being used to free a list; this fixes the problem.
Otherwise, an oops at unload time was possible. But not likely, since you
can't have any users when you unload the modules and it is very hard to get
messages into this queue without users.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Patrick Schoeller <Patrick.Schoeller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The basic issue is that despite have been deprecated and warned about as a
very bad thing in the man pages since its inception there are a few real
users of sys_sysctl. It was my assumption that because sysctl had been
deprecated for all of 2.6 there would be no user space users by this point,
so I initially gave sys_sysctl a very short deprecation period.
Now that I know there are a few real users the only sane way to proceed
with deprecation is to push the time limit out to a year or two work and
work with distributions that have big testing pools like fedora core to
find these last remaining users.
Which means that the sys_sysctl interface needs to be maintained in the
meantime.
Since I have provided a technical measure that allows us to add new sysctl
entries without reserving more binary numbers I believe that is enough to
fix the sys_sysctl binary interface maintenance problems, because there is
no longer a need to change the binary interface at all.
Since the sys_sysctl implementation needs to stay around for a while and
the worst of the maintenance issues that caused us to occasionally break
the ABI have been addressed I don't see any advantage in continuing with
the removal of sys_sysctl.
So instead of merely increasing the deprecation period this patch removes
the deprecation of sys_sysctl and modifies the kernel to compile the code
in by default.
With committing to maintain sys_sysctl we get all of the advantages of a
fast interface for anything that needs it. Currently sys_sysctl is about
5x faster than /proc/sys, for the same string data.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When ACPI && NUMA, pxm_to_node is used and it exists in drivers/acpi/numa.c
Tony said:
The patch makes sense ... if you pick both of "ACPI" and "NUMA", then you
need (and should automatically be given) ACPI_NUMA too.
The only open question is whether there is a better way of getting there.
Perhaps with less configuration options in the first place? We are heading
towards a future where so many systems will be NUMA that there would seem to
be little benefit in keeping ACPI_NUMA separate from ACPI ... but perhaps
we aren't quite there yet.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are two bugs in the kretprobe-booster.
1) It doesn't make room for gs registers.
2) It doesn't change status of the current kprobe. This status will
effect the fault handling.
This patch fixes these bugs and, additionally, saves skipped registers for
compatibility with the original kretprobe.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If there's a swap file on a software RAID, it should be possible to use this
file for saving the swsusp's suspend image. Also, this file should be
available to the memory management subsystem when memory is being freed before
the suspend image is created.
For the above reasons it seems that md_threads should not be frozen during the
suspend and the appended patch makes this happen, but then there is the
question if they don't cause any data to be written to disks after the suspend
image has been created, provided that all filesystems are frozen at that time.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I forgot to has the size-in-blocks to (loff_t) before shifting up to a
size-in-bytes.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out that CHANGE is preferred to ONLINE/OFFLINE for various reasons
(not least of which being that udev understands it already).
So remove the recently added KOBJ_OFFLINE (no-one is likely to care anyway)
and change the ONLINE to a CHANGE event
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Coverity checker noted that in
drivers/telephony/ixj.c:ixj_build_filter_cadence(), filter_en[4] or
filter_en[5] could be written to.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All device-mapper targets must complete outstanding I/O before suspending.
The mirror target generates I/O in its recovery phase and fails to wait for
it. It needs to be tracked so we can ensure that it has completed before we
suspend.
[akpm@osdl.org: cleanup]
Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <dm-devel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When adding paths to the round-robin path selector, their order gets inverted,
which is not desirable.
Fix by replacing list_add() with list_add_tail().
Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <dm-devel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the device is already suspended, just return the error and skip the code
that would incorrectly wipe md->suspended_bdev.
(This isn't currently a problem because existing code avoids calling this
function if the device is already suspended.)
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <dm-devel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a race between dev_create() and find_device().
If the mdptr has not yet been stored against a device, find_device() needs to
behave as though no device was found. It already returns NULL, but there is a
dm_put() missing: it must drop the reference dm_get_md() took.
The bug was introduced by dm-fix-mapped-device-ref-counting.patch.
It manifests itself if another dm ioctl attempts to reference a newly-created
device while the device creation ioctl is still running. The consequence is
that the device cannot be removed until the machine is rebooted. Certain udev
configurations can lead to this happening.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <dm-devel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Currently there is no specific alignment restriction in linker script
and in some cases it can be placed non 4K aligned addresses. This fails
kexec which checks that segment to be loaded is page aligned.
o I guess, it does not harm data segment to be 4K aligned.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.
This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3. This is ugly. This will be fixed by later patches.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor rearrangement, cleanup of do_open_lookup(). No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_mb() is used by set_current_state() which needs mb(), not wmb(). I
think it would be right to assume that set_mb() implies mb(), all arches
seem to do just this.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the microcode driver is built in (rather than module) there are some,
ehm, interesting effects happening due to the new "call out to userspace"
behavior that is introduced.. and which runs too early. The result is a
boot hang; which is really nasty.
The patch below is a minimally safe patch to fix this regression for 2.6.19
by just not requesting actual microcode updates during early boot. (That
is a good idea in general anyway)
The "real" fix is a lot more complex given the entire cpu hotplug scenario
(during cpu hotplug you normally need to load the microcode as well); but
the interactions for that are just really messy at this point; this fix at
least makes it work and avoids a full detangle of hotplug.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the x86-64 version of f9dadfa71b
that did the same thing on i386.
Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.
The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is just commit 130fe05dbc ported to
x86-64, for all the same reasons. It cleans up the IO-APIC accesses in
order to then fix the ordering issues.
We move the accessor functions (that were only used by io_apic.c) out of
a header file, and use proper memory-mapped accesses rather than making
up our own "volatile" pointers.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit de09bddb9d. It tried
to reserve the MMCONFIG mmio memory ranges, but since the MMCONFIG
information is broken and often bogus (which is why we don't dare use it
most of the time _anyway_), it does more harm than good.
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here are some fixes to endianess problems spotted by Al Viro.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proper upper limits for the loops and check for all error
conditions.
The problem was noticed by Adrian Bunk.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we can hit paths that (legally) do multiple deletes on the
same node and OOPS with the HLIST poison values there instead of
NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes consideration of high memory when determining TCP
hash table sizes. Taking into account high memory results in tcp_mem
values that are too large.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Patrick McHardy <kaber@trash.net> suggested, Traffic Shaper is now
obsolete and alternative to it is no longer CBQ, since its problems with
virtual devices, alter Kconfig text to reflect this -- put a link to the
traffic schedulers as a whole.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>