
DS2300 - SCSI Reset

 
Mike Lynch_8
Occasional Contributor

Hi All

I am having problems with our HP DS2300 disk system. It holds 14 disks and is split between 2 servers, both running HP-UX 11.0. One side of the array works fine. The other side reports SCSI resets whenever we do any serious amount of I/O to certain disks. I'm pretty certain it's not a disk failure, as it either sorts itself out or returns to normal after a reboot.
This error was also seen a few months ago. At that time we needed to replace the disks anyway to increase storage.
We hoped that the new disks would also solve the SCSI reset problem. Obviously they did not.

The problem occurs on the "right hand side" of the array, i.e. disks 7 - 14.
On the 7 disks in question we have 2 Oracle databases installed.
Disks 8 to 10 hold the standby database.
Disk 11 stores archive logs.
Disks 12 to 14 hold a cold copy of the standby database.

Problems occur when we do anything heavy with disks 12 - 14, i.e. SCSI resets followed by inaccessible PVs etc.
Disk 13 and especially disk 14 report problems.

How do I begin troubleshooting something like this? Could the differing firmware versions be causing a problem?

Please see below for ioscan and a sample of the errors.

Thanks

Mike



In the ioscan output below, the disks at 0.8.0 to 0.14.0 are in the DS2300.

Class  I   H/W Path      Driver  S/W State  H/W Type  Description
======================================================================
disk   0   0/0/1/1.0.0   sdisk   CLAIMED    DEVICE    HP 73.4GST373453LC
           /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
disk   1   0/0/1/1.2.0   sdisk   CLAIMED    DEVICE    HP 73.4GST373453LC
           /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk   2   0/0/2/0.0.0   sdisk   CLAIMED    DEVICE    HP 73.4GST373453LC
           /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk   32  0/0/2/0.2.0   sdisk   CLAIMED    DEVICE    HP 73.4GST373453LC
           /dev/dsk/c2t2d0   /dev/rdsk/c2t2d0
disk   3   0/4/0/0.8.0   sdisk   CLAIMED    DEVICE    COMPAQ BD146863B3 Firmware:HPB8
           /dev/dsk/c4t8d0   /dev/rdsk/c4t8d0
disk   5   0/4/0/0.9.0   sdisk   CLAIMED    DEVICE    HP 146 GST3146807LC Firmware:HPC5
           /dev/dsk/c4t9d0   /dev/rdsk/c4t9d0
disk   6   0/4/0/0.10.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146807LC Firmware:HPC5
           /dev/dsk/c4t10d0  /dev/rdsk/c4t10d0
disk   7   0/4/0/0.11.0  sdisk   CLAIMED    DEVICE    HP 146 GMAP3147NC Firmware:HPC6
           /dev/dsk/c4t11d0  /dev/rdsk/c4t11d0
disk   8   0/4/0/0.12.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146707LC Firmware:HPC1
           /dev/dsk/c4t12d0  /dev/rdsk/c4t12d0
disk   9   0/4/0/0.13.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146707LC Firmware:HPC1
           /dev/dsk/c4t13d0  /dev/rdsk/c4t13d0
disk   10  0/4/0/0.14.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146707LC Firmware:HPC1
           /dev/dsk/c4t14d0  /dev/rdsk/c4t14d0
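
Since the question is whether mixed firmware matters, it may help to see which paths share a firmware level. The following is only a rough Python sketch: it works on the device lines pasted above, relies on the hand-added "Firmware:" tag (which is not part of normal ioscan output), and the function name group_by_firmware is made up for illustration.

```python
import re
from collections import defaultdict

# DS2300 device lines copied from the listing above. The "Firmware:" tag was
# added by hand in the post; plain ioscan output does not contain it.
IOSCAN_LINES = [
    "disk 3 0/4/0/0.8.0 sdisk CLAIMED DEVICE COMPAQ BD146863B3 Firmware:HPB8",
    "disk 5 0/4/0/0.9.0 sdisk CLAIMED DEVICE HP 146 GST3146807LC Firmware:HPC5",
    "disk 6 0/4/0/0.10.0 sdisk CLAIMED DEVICE HP 146 GST3146807LC Firmware:HPC5",
    "disk 7 0/4/0/0.11.0 sdisk CLAIMED DEVICE HP 146 GMAP3147NC Firmware:HPC6",
    "disk 8 0/4/0/0.12.0 sdisk CLAIMED DEVICE HP 146 GST3146707LC Firmware:HPC1",
    "disk 9 0/4/0/0.13.0 sdisk CLAIMED DEVICE HP 146 GST3146707LC Firmware:HPC1",
    "disk 10 0/4/0/0.14.0 sdisk CLAIMED DEVICE HP 146 GST3146707LC Firmware:HPC1",
]

# Capture the H/W path (third field) and the firmware tag at the end of the line.
LINE_RE = re.compile(r"^disk\s+\d+\s+(\S+)\s+sdisk\s+.*Firmware:(\S+)")


def group_by_firmware(lines):
    """Return {firmware: [hw_path, ...]} for lines that carry a Firmware tag."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            hw_path, firmware = match.groups()
            groups[firmware].append(hw_path)
    return dict(groups)


if __name__ == "__main__":
    for firmware, paths in sorted(group_by_firmware(IOSCAN_LINES).items()):
        print(firmware, ", ".join(paths))
```

In this listing the three disks that give trouble under load (targets 12 to 14) all report HPC1, while the others report HPB8, HPC5 or HPC6. That is a correlation worth noting, not proof that firmware is the cause.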

Here is a sample of the errors:

We get hundreds of errors like this (the "lbolt" number varies):
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 16722, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 18022, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 18022, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 19322, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 19322, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 20622, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 20622, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 21922, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 21922, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 23222, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 23222, bus: 3
...................
.....................
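
The lbolt values in this sample step by a constant 1300 from one reset to the next. A small Python sketch can turn that into a time interval. It assumes lbolt counts clock ticks at 100 per second (HZ = 100), which is an assumption about this system and not something the log states; reset_intervals is a made-up helper name.

```python
import re

# Sample copied from the syslog excerpt above.
LOG = """\
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 16722, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 18022, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 18022, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 19322, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 19322, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 20622, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 20622, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 21922, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 21922, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 23222, bus: 3
Aug 8 12:58:53 hpback vmunix: SCSI: Reset detected -- lbolt: 23222, bus: 3
"""

HZ = 100  # ASSUMPTION: lbolt ticks per second on this kernel


def reset_intervals(text, event="Resetting SCSI", hz=HZ):
    """Seconds between consecutive log lines that contain `event`."""
    ticks = []
    for line in text.splitlines():
        if event in line:
            match = re.search(r"lbolt: (\d+)", line)
            if match:
                ticks.append(int(match.group(1)))
    return [(later - earlier) / hz for earlier, later in zip(ticks, ticks[1:])]


if __name__ == "__main__":
    print(reset_intervals(LOG))
```

Under that assumption the resets repeat every 13 seconds, which looks more like a fixed retry or timeout cycle than random errors. Note that these lines are on bus 3, while the later excerpt is on bus 4.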


Here is an example of an error affecting a particular disk:

Aug 10 08:44:04 hpback vmunix: SCSI: Unexpected Disconnect -- lbolt: 15802265, dev: 1f04e000, io_id: 41d8b72
Aug 10 08:44:05 hpback vmunix: SCSI Gross Error on 0/4/0/0:
Aug 10 03:43:59 hpback : getty: cannot open "ttyd0p2". errno: 6
Aug 10 08:44:05 hpback above message repeats 11 times
Aug 10 08:44:05 hpback vmunix: shadowed SIST0 00 shadowed SIST1 04.
Aug 10 08:44:05 hpback vmunix: SCSI: isrEscape Controller at 0/4/0/0.
Aug 10 08:44:05 hpback vmunix:
Aug 10 08:44:05 hpback vmunix: SCSI: -- lbolt: 15802412, dev: 1f04e000
Aug 10 08:44:05 hpback vmunix: lbp->state: 30008
Aug 10 08:44:05 hpback vmunix: lbp->offset: ffffffff
Aug 10 08:44:05 hpback vmunix: lbp->nominalOffset: 360
Aug 10 08:44:05 hpback vmunix: lbp->Cmdindex: 5
Aug 10 08:44:05 hpback vmunix: lbp->last_nexus_index: 13
Aug 10 08:44:05 hpback vmunix: lbp->nexus_index: 14
Aug 10 08:44:05 hpback vmunix: uCmdSent: 1900dd40 uNexus_offset: cd650
Aug 10 08:44:05 hpback vmunix: last lbp->puStatus [0000000040335630]:
Aug 10 08:44:05 hpback vmunix: 00030078 00030078 00030057 00030057
Aug 10 08:44:05 hpback vmunix: next lbp->puStatus [0000000040335640]:
Aug 10 08:44:05 hpback vmunix: 00030078 00030078 00030057 00030078
Aug 10 08:44:05 hpback vmunix: From most recent interrupt:
Aug 10 08:44:05 hpback vmunix: ISTAT: 0a, SIST0: 48, SIST1: 00, DSTAT: 00, DSPS: 00000000
Aug 10 08:44:05 hpback vmunix: lsp: 0x000000004ca78900
Aug 10 08:44:05 hpback vmunix: bp->b_dev: 1f04e000
Aug 10 08:44:05 hpback vmunix: scb->io_id: 41d8b77
Aug 10 08:44:05 hpback vmunix: scb->cdb: 2a 00 01 1e a8 00 00 00 40 00
Aug 10 08:44:05 hpback vmunix: lbolt_at_timeout: 0, lbolt_at_start: 0
Aug 10 08:44:05 hpback vmunix: lsp->state: 5
Aug 10 08:44:05 hpback vmunix: Jump Table entry [fffffffffa7fcf68]: 0000006b fa7fc288
Aug 10 08:44:05 hpback vmunix: lsp->puScript [0000000040331100]:
Aug 10 08:44:05 hpback vmunix: 08008000 00270000 00000000 78370000
Aug 10 08:44:05 hpback vmunix: 00000000 80080000 fa7fc288 00000000
Aug 10 08:44:05 hpback vmunix: DSAtbl->host_iocb_index: 5
Aug 10 08:44:05 hpback vmunix: DSAtbl->host_iocb_addr: cde80
Aug 10 08:44:05 hpback vmunix: stored scratcha: 0xff07006b
Aug 10 08:44:05 hpback vmunix: scratch_lsp: 0x000000004ca78900
Aug 10 08:44:05 hpback vmunix: c8xx_iocb [fffffffffa7fcb00]:
Aug 10 08:44:05 hpback vmunix: 1a00de80 ff00006b 000c9100 9f0e1f80
Aug 10 08:44:05 hpback vmunix: 00000003 000cde60 0000000a 000cde68
Aug 10 08:44:05 hpback vmunix: Pre-DSP script dump [fffffffffa7fc220]:
Aug 10 08:44:05 hpback vmunix: 74358000 00000000 80840000 00000650
Aug 10 08:44:05 hpback vmunix: f1640004 00000008 7a5c0100 00000000
Aug 10 08:44:05 hpback vmunix: Script dump [fffffffffa7fc240]:
Aug 10 08:44:05 hpback vmunix: 1e000000 00000010 878b0000 000002a8
Aug 10 08:44:05 hpback vmunix: 1a000000 00000018 e27c0004 000c8700
Aug 10 08:44:05 hpback vmunix: NCR chip register dump for: 0x400200a
Aug 10 08:44:05 hpback vmunix: 00: SCNTL3: 9f SCNTL2: 80 SCNTL1: 10 SCNTL0: da
Aug 10 08:44:05 hpback vmunix: 04: GPREG: 0a SDID: 0e SXFER: 1f SCID: 47
Aug 10 08:44:05 hpback vmunix: 08: SBCL: ae SSID: 8e SOCL: 0e SFBR: 00
Aug 10 08:44:05 hpback vmunix: 0c: SSTAT2: 08 SSTAT1: 06 SSTAT0: 01 DSTAT: 00
Aug 10 08:44:05 hpback vmunix: 10: DSA: fa7fcb00
Aug 10 08:44:05 hpback vmunix: 14: MBOX1: 00 MBOX0: 00 ISTAT1: 00 ISTAT: 08
Aug 10 08:44:05 hpback vmunix: 1c: TEMP: 000c9040
Aug 10 08:44:05 hpback vmunix: 24: DCMDDBC: 1e000000
Aug 10 08:44:05 hpback vmunix: 28: DNAD: 000cde68
Aug 10 08:44:05 hpback vmunix: 2c: DSP: fa7fc248
Aug 10 08:44:05 hpback vmunix: 30: DSPS: 000cde60
Aug 10 08:44:05 hpback vmunix: 34: SCRATCHA: ff07006b
Aug 10 08:44:05 hpback vmunix: 38: DCNTL: a1 DWT: 00 DIEN: 7f DMODE: 4c
Aug 10 08:44:05 hpback vmunix: 3c: ADDER: fa8ca0a8
Aug 10 08:44:05 hpback vmunix: 40: SIST1: 00 SIST0: 00 SIEN1: 97 SIEN0: 8f
Aug 10 08:44:05 hpback vmunix: 44: GPCNTL: 2f MACNTL: 00 SWIDE: 00 SLPAR: 00
Aug 10 08:44:05 hpback vmunix: 48: RESPID1: 00 RESPID0: 80 STIME1: 00 STIME0: fc
Aug 10 08:44:05 hpback vmunix: 4c: STEST3: 80 STEST2: 00 STEST1: 0c STEST0: 76
Aug 10 08:44:05 hpback vmunix: 50: RESV50: 00 RESV51: c0 SIDL1: 00 SIDL0: 57
Aug 10 08:44:05 hpback vmunix: 54: CCNTL1: 01 CCNTL0: 01 SODL1: 00 SODL0: 00
Aug 10 08:44:05 hpback vmunix: 58: RESV58: 00 RESV59: 00 SBDL1: 00 SBDL0: 00
Aug 10 08:44:05 hpback vmunix: 5c: SCRATCHB: 000e0001
Aug 10 08:44:05 hpback vmunix: 60: SCRATCHC: c0ffffff
Aug 10 08:44:05 hpback vmunix: 64: SCRATCHD: 000c9100
Aug 10 08:44:05 hpback vmunix: 68: SCRATCHE: fa7fcfd4
Aug 10 08:44:05 hpback vmunix: 6c: SCRATCHF: 000c8f00
Aug 10 08:44:05 hpback vmunix: 70: SCRATCHG: 9f0e1f80
Aug 10 08:44:05 hpback vmunix: 74: SCRATCHH: 000cd650
Aug 10 08:44:05 hpback vmunix: 78: SCRATCHI: 09819f1f
Aug 10 08:44:05 hpback vmunix: 7c: SCRATCHJ: 1a00de80
Aug 10 08:44:05 hpback vmunix: bc: SCNTL4: 80
Aug 10 08:44:05 hpback vmunix: PCI configuration register dump:
Aug 10 08:44:05 hpback vmunix: Command: 0157
Aug 10 08:44:05 hpback vmunix: Latency Timer: ff
Aug 10 08:44:05 hpback vmunix: Cache Line Size: 10
Aug 10 08:44:06 hpback vmunix:
Aug 10 08:44:06 hpback vmunix: SCSI: Resetting SCSI -- lbolt: 15802512, bus: 4 path: 0/4/0/0
Aug 10 08:44:06 hpback vmunix: SCSI: Reset detected -- lbolt: 15802512, bus: 4 path: 0/4/0/0
Aug 10 08:44:10 hpback EMS [2120]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_4_0_0.14.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 138936362 -r /storage/events/disks/default/0_4_0_0.14.0 -n 138936330 -a
Aug 10 08:44:06 hpback vmunix:
Aug 10 08:44:11 hpback EMS [2120]: ------ EMS Event Notification ------ Value: "SERIOUS (4)" for Resource: "/storage/events/disks/default/0_4_0_0.14.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 138936362 -r /storage/events/disks/default/0_4_0_0.14.0 -n 138936331 -a
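
The kernel messages identify the device only as "dev: 1f04e000". A minimal Python sketch for mapping that back to a device file follows. It assumes the commonly described HP-UX 11.0 sdisk layout: an 8-bit major number, then a 24-bit minor whose fields are card instance (8 bits), target (4 bits), LUN (4 bits) and option bits (8 bits). That layout is an assumption here, and decode_sdisk_dev is a made-up name.

```python
def decode_sdisk_dev(dev_hex):
    """Split a dev_t hex string into major, card instance, target and LUN.

    ASSUMPTION: 8-bit major, then a 24-bit minor laid out as
    instance(8) | target(4) | lun(4) | options(8).
    """
    dev = int(dev_hex, 16)
    major = (dev >> 24) & 0xFF
    minor = dev & 0xFFFFFF
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    return {
        "major": major,
        "instance": instance,
        "target": target,
        "lun": lun,
        "name": f"c{instance}t{target}d{lun}",
    }


if __name__ == "__main__":
    print(decode_sdisk_dev("1f04e000"))
```

With that layout the value decodes to card instance 4, target 14, LUN 0, i.e. c4t14d0. This agrees with the ioscan entry for 0/4/0/0.14.0 and with the resource path in the EMS notifications.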


When I execute the EMS command I get the following report:



ARCHIVED MONITOR DATA:

Event Time..........: Thu Aug 10 08:44:09 2006
Severity............: MAJORWARNING
Monitor.............: disk_em
Event #.............: 100091
System..............: hpback

Summary:
Disk at hardware path 0/4/0/0.14.0 : Software configuration error


Description of Error:

The device is in a condition where it requires action on the part of the
device driver or a human operator.

Probable Cause / Recommended Action:

The device has been reset by a Bus Device Reset message, a hard reset
condition, or a power-on reset.

If this is the case, no action is necessary.

Alternatively, a removable medium has been loaded or replaced.

If this is the case, no action is necessary.

Alternatively, the mode parameters, microcode, or inquiry data for the
device have been changed.

If this is the case, no action is necessary.

Alternatively, the installed version of the device driver does not match
that of the installed version of HP-UX. Install the correct version of the
driver.

Additional Event Data:
System IP Address...: 192.168.0.120
Event Id............: 0x44dae3ca00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
0x44dae3c900000001
Additional System Data:
System Model Number.............: 9000/800/L2000-44
OS Version......................: B.11.00
STM Version.....................: A.44.00
EMS Version.....................: A.03.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100091

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 0/4/0/0.14.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 146 G
Inquiry Product ID.....: ST3146707LC
Firmware Version.......: HPC1
Serial Number..........: 3KS2VJ5A00007617BRW7

Product/Device Identification Information:

Logger ID.........: sdisk
Product Identifier: SCSI Disk
Product Qualifier.: HP146
SCSI Target ID....: 0x0E
SCSI LUN..........: 0x00

I/O Log Event Data:

Driver Status Code..................: 0x0000000B
Length of Logged Hardware Status....: 22 bytes.
Offset to Logged Manager Information: 24 bytes.
Length of Logged Manager Information: 34 bytes.

Hardware Status:

Raw H/W Status:
0x0000: 00 00 00 02 70 00 06 00 00 00 00 0A 00 00 00 00
0x0010: 29 02 02 00 00 00

SCSI Status...: CHECK CONDITION (0x02)
Indicates that a contingent allegiance condition has occurred. Any
error, exception, or abnormal condition that causes sense data to be
set will produce the CHECK CONDITION status.

SCSI Sense Data:

Undecoded Sense Data:
0x0000: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00
0x0010: 00 00

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x06
Information Field Valid : FALSE
Information Field : 0x00000000
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x29
Additional Sense Qualifier : 0x02
Field Replaceable Unit : 0x02
Sense Key Specific Data Valid : FALSE
Sense Key Specific Data : 0x00 0x00 0x00

Sense Key 0x06, UNIT ATTENTION, indicates that the target has been
reset by a BUS DEVICE RESET message, a hard reset condition, or by a
power-on reset. If not a reset, then one of the following may have
occurred.
1. A removable medium may have been changed.
2. The mode parameters in effect for this initiator have been
changed by another initiator.
3. The version or level of microcode has been changed.
4. Tagged commands queued for this initiator were cleared by
another initiator.
5. INQUIRY data has been changed.
6. The mode parameters in effect for this initiator have been
restored from non-volatile memory.
7. A change in the condition of a synchronized spindle.
8. Any other event that requires the attention of the initiator.
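
The report lists several possible meanings for UNIT ATTENTION, but the additional sense code and qualifier narrow it down. Here is a short Python sketch that pulls the fields out of the undecoded sense bytes shown above, using the fixed-format sense layout (sense key in byte 2, ASC in byte 12, ASCQ in byte 13). The small lookup table covers only the 0x29 codes and is not a complete list; decode_fixed_sense is a made-up name.

```python
# Undecoded sense data copied from the EMS report above.
SENSE = bytes.fromhex("70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00")

# Partial table: only the ASC 0x29 entries relevant to this report.
ASC_ASCQ_TEXT = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x01): "POWER ON OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
}


def decode_fixed_sense(sense):
    """Decode the main fields of fixed-format (0x70/0x71) SCSI sense data."""
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    asc, ascq = sense[12], sense[13]
    return {
        "sense_key": sense[2] & 0x0F,
        "additional_length": sense[7],
        "asc": asc,
        "ascq": ascq,
        "fru": sense[14],
        "text": ASC_ASCQ_TEXT.get((asc, ascq), "not in this partial table"),
    }


if __name__ == "__main__":
    print(decode_fixed_sense(SENSE))
```

For these bytes the result is sense key 0x06 with ASC/ASCQ 0x29/0x02, "SCSI BUS RESET OCCURRED". In other words, the disk is reporting that it saw a bus reset; the sense data itself does not say what caused the reset.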

SCSI Command Data Block:

Command Data Block Contents:
0x0000: 28 00 02 43 AD C0 00 00 10 00

Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x28)..: READ
Logical Unit Number..............: 0
DPO Bit..........................: 0
FUA Bit..........................: 0
Relative Address Bit.............: 0
Logical Block Address............: 37989824 (0x0243ADC0)
Transfer Length..................: 16 (0x0010)
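
The same 10-byte layout can be used to decode the other CDB that appears in the kernel dump ("scb->cdb: 2a 00 01 1e a8 00 00 00 40 00"). A minimal Python sketch, covering only the READ(10) and WRITE(10) opcodes seen in this post; decode_cdb10 is a made-up name.

```python
import struct

# Opcodes seen in this post only; not a complete table.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_hex):
    """Decode opcode, logical block address and transfer length of a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Big-endian: opcode, flags, 32-bit LBA, group number, 16-bit length, control.
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    return {
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "lba": lba,
        "blocks": length,
    }


if __name__ == "__main__":
    print(decode_cdb10("28 00 02 43 AD C0 00 00 10 00"))  # CDB from the EMS report
    print(decode_cdb10("2a 00 01 1e a8 00 00 00 40 00"))  # CDB from the kernel dump
```

The first call reproduces the fields EMS printed (READ, LBA 37989824, 16 blocks). The second shows that the command in flight at the time of the kernel dump was a 64-block WRITE at LBA 0x011ea800.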

Manager-Specific Data Fields:
Request ID.............: 0x041D8B72
Data Residue...........: 0x00002000
CDB status.............: 0x00000002
Sense Status...........: 0x00000000
Bus ID.................: 0x04
Target ID..............: 0x0E
LUN ID.................: 0x00
Sense Data Length......: 0x12
Q Tag..................: 0x7E
Retry Count............: 1



2 REPLIES
Steven E. Protter
Exalted Contributor

Re: DS2300 - SCSI Reset

Shalom,

Normally an lbolt error means a bad disk.

With an entire section (side) of the array down, I suspect a hardware problem with the array. Have the hardware people replace that part of the disk array, not the disks.

This will require downtime.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
sysadm_1
Valued Contributor

Re: DS2300 - SCSI Reset



Looks like a problem with the interface board on the disk array.
I suggest logging a call to get a hardware replacement.

-sysadm