If a physical device exposed to the OS by hpsa
is replaced (e.g. one hot plug tape drive is replaced
by another, or a tape drive is placed into "OBDR" mode
in which it acts like a CD-ROM device) and a rescan is
initiated, the replaced device will be added to the
SCSI midlayer with target and lun numbers set to -1.
After that, a panic is likely to ensue. When a physical
device is replaced, the lun and target number should be
preserved.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The test to detect OBDR ("One Button Disaster Recovery")
cd-rom devices was comparing against uninitialized data.
Fixed by moving the test for the device to where the
inquiry data is collected, and uninitialized variable
altogether as it wasn't really being used.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (71 commits)
[SCSI] fcoe: cleanup cpu selection for incoming requests
[SCSI] fcoe: add fip retry to avoid missing critical keep alive
[SCSI] libfc: fix warn on in lport retry
[SCSI] libfc: Remove the reference to FCP packet from scsi_cmnd in case of error
[SCSI] libfc: cleanup sending SRR request
[SCSI] libfc: two minor changes in comments
[SCSI] libfc, fcoe: ignore rx frame with wrong xid info
[SCSI] libfc: release exchg cache
[SCSI] libfc: use FC_MAX_ERROR_CNT
[SCSI] fcoe: remove unused ptype field in fcoe_rcv_info
[SCSI] bnx2fc: Update copyright and bump version to 1.0.4
[SCSI] bnx2fc: Tx BDs cache in write tasks
[SCSI] bnx2fc: Do not arm CQ when there are no CQEs
[SCSI] bnx2fc: hold tgt lock when calling cmd_release
[SCSI] bnx2fc: Enable support for sequence level error recovery
[SCSI] bnx2fc: HSI changes for tape
[SCSI] bnx2fc: Handle REC_TOV error code from firmware
[SCSI] bnx2fc: REC/SRR link service request and response handling
[SCSI] bnx2fc: Support 'sequence cleanup' task
[SCSI] dh_rdac: Associate HBA and storage in rdac_controller to support partitions in storage
...
In a shared SAS setup, target devices may be reset by one of
several hosts, and outstanding commands on that device will be
completed to corresponding hosts with status of UNSOLICITED_ABORT.
Such commands should be retried instead of being treated as i/o
errors. Also fixed a nearby spelling error.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This memcpy:
memcpy(cmd->sense_buffer, ei->SenseInfo,
ei->SenseLen > SCSI_SENSE_BUFFERSIZE ?
SCSI_SENSE_BUFFERSIZE :
ei->SenseLen);
The ei->SenseLen field is filled in by the Smart Array. For requests to
logical drives, it will not exceed 32 bytes, so should be ok, but for physical
requests it depends on the target device, not the Smart Array. It's conceivable
that this could exceed the 32 byte size of ei->SenseInfo. In that case, the memcpy
would read past the end of ei->SenseInfo, copying data from the next command,
as if it were sense data, or, if it happened to be the very last command in the
block of allocated commands, could fall off the end of the allocated area and
crash. I'm not aware of anyone ever encountering this behavior, but it could
conceivably happen. This bug was found by Coverity.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Not at all sure this is correct or appropriate to change,
but this seems odd.
Found via coccinelle script
@@
type T;
T* ptr;
expression E1;
@@
* memset(E1, 0, sizeof(ptr));
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Scott Teel <scott.stacy.teel@hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Just go straight to the soft-reset method instead.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
on driver load, if reset_devices is set, and the hard reset
attempts fail, try to bring up the controller to the point that
a command can be sent, and send it a soft reset command, then
after the reset undo whatever driver initialization was done to get
it to the point to take a command, and re-do it after the reset.
This is to get kdump to work on all the "non-resettable" controllers
(except 64xx controllers which can't be reset due to the potentially
shared cache module.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The bit-2-doorbell reset method seemed to cause (survivable) NMIs
on some systems and (unsurvivable) IOCK NMIs on some G7 servers.
Firmware guys implemented a new doorbell method to alleviate these
problems triggered by bit 5 of the doorbell register. We want to
use it if it's available.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
hpsa_scsi_setup at one time contained enough code to justify
its existence, but that time has passed.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When waiting for the board to become "not ready"
don't print a message saying "waiting for board to
become ready" (possibly followed by a message saying
"failed waiting for board to become not ready". Instead,
it should be "waiting for board to reset" and "failed
waiting for board to reset."
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Detect failure of controller reset by noticing if the 32 bytes of
"driver version" we store on the hardware in the config table
fail to get zeroed out. Previously we noticed if the controller
did not transition to "simple mode", but this did not detect reset
failure if the controller was already in simple mode prior to
the reset attempt (e.g. due to module parameter hpsa_simple_mode=1).
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This attribute, requested by Redhat, allows kexec-tools to know
whether the controller can honor the reset_devices kernel parameter
and actually reset the controller. For kdump to work properly it
is necessary that the reset_devices parameter be honored. This
attribute enables kexec-tools to warn the user if they attempt to
designate a non-resettable controller as the dump device.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
My first attempt was botched, got the wrong PCI Device ID
(used PCI_DEVICE_ID_HP_CISSE, should have been PCI_DEVICE_ID_HP_CISSF)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
'!' has higher precedence than '&'. CFGTBL_ChangeReq is 0x1 so the
original code is equivelent to if (!doorbell_value) {...
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
We can get completions left over from before the attempted reset which
will interfere with the kdump. Better to just not make the attempt in
that case.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Controller will transfer only 32-bits on completion if it
knows we are only using 32-bit tags. Also, some newer controllers
apparently (and erroneously) require that we only use 32-bit tags,
and that we inform the controller of this.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
It's not enough to simple avoid putting the board into performant
mode, as we have to set up the interrupts differently, etc. When
I originally tested this module parameter, I tested it incorrectly
without realizing it, and the driver was running in performant mode
the whole time unbeknownst to me.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Driver's internal queues should be FIFO, not LIFO.
This is a port of an almost identical patch from cciss by Jens Axboe.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
memset arg64 to zero in the passthrough ioctls to avoid leaking contents
of kernel stack memory to userland via uninitialized padding fields
inserted by the compiler for alignment reasons.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Thanks to Scott Teel for noticing this.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Need to take the lock while accessing the register to check to
see if config table changes have taken effect.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is to prevent hpsa from resetting older boards
which the cciss driver may be controlling.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is to conserve memory in a memory-limited kdump scenario
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
and use the doorbell reset method if available (which doesn't
lock up the controller if you properly save and restore all
the PCI registers that you're supposed to.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
After a reset, we should first wait for the board to become "not ready",
and then wait for it to become "ready", instead of immediately
waiting for it to become "ready", and do this waiting *after*
restoring PCI config space registers. Also, only wait 10 secs
for board to become "not ready" after a reset (it should quickly
become not ready.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
They are defined in hpsa_cmd.h
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Some low bits might have been set by the driver, causing
a message like this to come out:
[ 13.288062] ------------[ cut here ]------------
[ 13.293211] WARNING: at lib/dma-debug.c:803 check_unmap+0x1a1/0x654()
[ 13.300387] Hardware name: ProLiant DL180 G6
[ 13.305335] hpsa 0000:06:00.0: DMA-API: device driver tries to free
DMA memory it has not allocated [device address=0x000000007f81e001]
[size=640 bytes]
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Otherwise, after doing a RAID level migration, the disk will be
disruptively removed and re-added as a different disk on rescan.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The firmware may have been updated, in which case, it's the same device,
and in that case, we do not want to remove and add the device, we want to
let it continue as is.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
PCI_DEVICE_ID_CISSF is defined as 323b in pci_ids.h but redefined as 3fff in
hpsa.c. The ID of 3fff will _never_ ship as a standalone controller. It is
intended only as part a complete storage solution. As such, this patch
removes the redefinition and the StorageWorks P1210m from the product table.
It also removes a duplicate line for the "unknown" controller support.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.
The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.
Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)
Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The doorbell reset initially appears to work correctly,
the controller resets, comes up, some i/o can even be
done, but on at least some Smart Arrays in some servers,
it eventually causes a subsequent controller lockup due
to some kind of PCIe error, and kdump can end up leaving
the root filesystem in an unbootable state. For this
reason, until the problem is fixed, or at least isolated
to certain hardware enough to be avoided, the doorbell
reset should not be used at all.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Some controllers might try to tell us they support 0 commands
in performant mode. This is a lie told by buggy firmware.
We have to be wary of this lest we try to allocate a negative
number of command blocks, which will be treated as unsigned,
and get an out of memory condition.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There are things which need to be done in the intx
interrupt handler which do not need to be done in
the msi/msix interrupt handler, like checking that
the interrupt is actually for us, and checking that the
interrupt pending bit on the hardware is set (which we
weren't previously doing at all, which means old controllers
wouldn't work), so it makes sense to separate these into
two functions.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>