
Disk Controller Problem

 
Mike Lynch_8
Occasional Contributor

Hi All

We are getting intermittent SCSI errors and I am pretty confident it is a controller issue as opposed to a disk.

The problem causes logical volumes to become unavailable or corrupted. We recently hot-swapped in 3 new disks, and the problem has only appeared since we started using them.

We have 2 internal disks in our server and 7 disks sitting in an external array. The problem seems to be in the external box.
We have powered down both the server and the disk array, but the errors still come back after a while.


grep -i scsi syslog.log
Nov 7 11:02:38 hpdev vmunix: c8xx BUS: 4 SCSI C1010 Ultra160 Wide LVD A6828-60101 assigned CPU: -1
Nov 7 11:02:38 hpdev vmunix: c8xx BUS: 5 SCSI C1010 Ultra160 Wide LVD A6828-60101 assigned CPU: -1
Nov 7 11:02:38 hpdev vmunix: c8xx BUS: 6 SCSI C1010 Ultra160 Wide LVD A6828-60101 assigned CPU: -1
Nov 7 11:02:38 hpdev vmunix: c8xx BUS: 7 SCSI C1010 Ultra160 Wide LVD A6828-60101 assigned CPU: -1
Nov 7 11:24:18 hpdev vmunix: SCSI: Unexpected Disconnect -- lbolt: 143986, dev: 1f074000, io_id: 708ccb9
Nov 7 13:10:04 hpdev vmunix: SCSI Gross Error on 0/12/0/0:
Nov 7 13:10:04 hpdev vmunix: SCSI: isrEscape Controller at 0/12/0/0.
Nov 7 13:10:04 hpdev vmunix: SCSI: -- lbolt: 778514, dev: cb07f002
Nov 7 13:10:05 hpdev vmunix: SCSI: Resetting SCSI -- lbolt: 778614, bus: 7 path: 0/12/0/0
Nov 7 13:10:05 hpdev vmunix: SCSI: Reset detected -- lbolt: 778614, bus: 7 path: 0/12/0/0
Nov 7 13:10:08 hpdev vmunix: SCSI: Reset detected -- path: 0/12/0/0
Nov 7 13:10:08 hpdev vmunix: SCSI: -- lbolt: 778914, bus: 7
Nov 7 13:10:08 hpdev vmunix: SCSI: Ultra160 Controller at 0/12/0/0: Error: The domain validation test for target 15 determined that communication may not be possible to this target. Verify the hardware configuration.
Nov 7 13:10:08 hpdev vmunix: SCSI: Ultra160 Controller at 0/12/0/0: Error: The domain validation test for target 1 determined that communication may not be possible to this target. Verify the hardware configuration.
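
For anyone wanting to summarise a log like the one above, here is a minimal Python sketch that counts how often each hardware path is mentioned and collects the targets that failed domain validation. The sample lines are copied from the grep output; the function name and regular expressions are my own assumptions, not part of any HP tool.

```python
import re
from collections import Counter

# Five lines copied verbatim from the grep output above.
SAMPLE = """\
Nov 7 13:10:04 hpdev vmunix: SCSI Gross Error on 0/12/0/0:
Nov 7 13:10:04 hpdev vmunix: SCSI: isrEscape Controller at 0/12/0/0.
Nov 7 13:10:05 hpdev vmunix: SCSI: Resetting SCSI -- lbolt: 778614, bus: 7 path: 0/12/0/0
Nov 7 13:10:08 hpdev vmunix: SCSI: Ultra160 Controller at 0/12/0/0: Error: The domain validation test for target 15 determined that communication may not be possible to this target. Verify the hardware configuration.
Nov 7 13:10:08 hpdev vmunix: SCSI: Ultra160 Controller at 0/12/0/0: Error: The domain validation test for target 1 determined that communication may not be possible to this target. Verify the hardware configuration.
"""

# Assumed patterns: a hardware path is digits separated by slashes.
HW_PATH = re.compile(r"\b\d+(?:/\d+)+\b")
DV_TARGET = re.compile(r"domain validation test for target (\d+)")


def summarize(log_text):
    """Return (Counter of hardware paths, sorted list of failed targets)."""
    paths = Counter()
    targets = set()
    for line in log_text.splitlines():
        if "SCSI" not in line:
            continue
        paths.update(HW_PATH.findall(line))
        match = DV_TARGET.search(line)
        if match:
            targets.add(int(match.group(1)))
    return paths, sorted(targets)


paths, targets = summarize(SAMPLE)
print(paths.most_common(), targets)  # [('0/12/0/0', 5)] [1, 15]
```

Every error in the sample points at 0/12/0/0, and the two failed targets line up with the ioscan entries at 0/12/0/0.15.0 and 0/12/0/0.1.0.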


hpdev-root$:ioscan -fnC ctl
Class I H/W Path Driver S/W State H/W Type Description
======================================================================
ctl 0 0/0/1/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c0t7d0
ctl 1 0/0/1/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c1t7d0
ctl 2 0/0/2/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c2t7d0
ctl 3 0/0/2/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c3t7d0
ctl 4 0/8/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c4t7d0
ctl 5 0/9/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c5t7d0
ctl 6 0/10/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c6t7d0
ctl 7 0/12/0/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c7t7d0
ctl 9 0/12/0/0.15.0 sctl CLAIMED DEVICE HP A6491A
/dev/rscsi/c7t15d0


hpdev-root$:ioscan -fnCdisk
Class I H/W Path Driver S/W State H/W Type Description
======================================================================
disk 5 0/0/1/1.2.0 sdisk CLAIMED DEVICE HP 36.4GMAN3367MC
/dev/dsk/c1t2d0 /dev/rdsk/c1t2d0
disk 6 0/0/2/0.2.0 sdisk CLAIMED DEVICE HP 36.4GMAN3367MC
/dev/dsk/c2t2d0 /dev/rdsk/c2t2d0
disk 0 0/0/2/1.2.0 sdisk CLAIMED DEVICE HP DVD-ROM 305
/dev/dsk/c3t2d0 /dev/rdsk/c3t2d0
disk 3 0/12/0/0.0.0 sdisk CLAIMED DEVICE HP 73.4GST373405LC
/dev/dsk/c7t0d0 /dev/rdsk/c7t0d0
disk 4 0/12/0/0.1.0 sdisk CLAIMED DEVICE HP 73.4GATLAS10K3_73_SCA
/dev/dsk/c7t1d0 /dev/rdsk/c7t1d0
disk 7 0/12/0/0.2.0 sdisk CLAIMED DEVICE HP 73.4GATLAS10K3_73_SCA
/dev/dsk/c7t2d0 /dev/rdsk/c7t2d0
disk 8 0/12/0/0.3.0 sdisk CLAIMED DEVICE HP 73.4GST373405LC
/dev/dsk/c7t3d0 /dev/rdsk/c7t3d0
disk 9 0/12/0/0.4.0 sdisk CLAIMED DEVICE MAXTOR ATLAS10K5_73SCA
/dev/dsk/c7t4d0 /dev/rdsk/c7t4d0
disk 10 0/12/0/0.5.0 sdisk CLAIMED DEVICE MAXTOR ATLAS10K5_73SCA
/dev/dsk/c7t5d0 /dev/rdsk/c7t5d0
disk 11 0/12/0/0.6.0 sdisk CLAIMED DEVICE MAXTOR ATLAS10K5_73SCA
/dev/dsk/c7t6d0 /dev/rdsk/c7t6d0


Any ideas?

Thanks

Mike


Steven E. Protter
Exalted Contributor

Re: Disk Controller Problem

Possible conclusions:

1) One of the new disks is bad. I suggest checking the hardware with xstm or mstm.
2) The system has not been rebooted since the hot swap. A reboot makes simple lbolt errors caused by disk swapping go away.
3) An improper LVM procedure was used when you switched the disks.

I'm sure there are others. Was pvcreate used on the new disks? Were they temporarily excluded from the volume group to do this?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
A. Clay Stephenson
Acclaimed Contributor

Re: Disk Controller Problem

First of all, cb07f002 decodes to:
cb == Major device 203
07 == c7
f == t15
0 == d0
02 == driver specific, if 203 is sctl then 02 means inhibit inquiry on open.

Do an lsdev to make sure that major device 203 on your box is sctl.
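
The breakdown above can be checked with a short Python sketch. It assumes the field layout implied by the list (8-bit major, 8-bit card instance, 4-bit target, 4-bit LUN, 8 driver-specific bits); the function names are hypothetical and only for illustration.

```python
def decode_dev(dev_hex):
    """Split a 32-bit device number (hex string) into its fields.

    Assumed layout, taken from the breakdown above:
    major(8) | card instance(8) | target(4) | LUN(4) | driver-specific(8)
    """
    dev = int(dev_hex, 16)
    return {
        "major": (dev >> 24) & 0xFF,
        "card": (dev >> 16) & 0xFF,
        "target": (dev >> 12) & 0xF,
        "lun": (dev >> 8) & 0xF,
        "flags": dev & 0xFF,
    }


def device_name(fields):
    """Format the decoded fields in cXtYdZ style."""
    return "c{card}t{target}d{lun}".format(**fields)


fields = decode_dev("cb07f002")
print(fields["major"], device_name(fields), hex(fields["flags"]))
# prints: 203 c7t15d0 0x2
```

Applying the same layout to the dev: 1f074000 value from the Unexpected Disconnect line gives major 31 and c7t4d0, assuming that driver packs its minor number the same way.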

You didn't mention why you replaced the disks, so the problem may be older than you indicate. There are not enough data to nail this down. Is your external array some kind of smart array or is it simply a JBOD? Have you ensured that your total bus length (including internal connections) does not exceed (or closely approach) the maximum bus length? Is at least one device on the bus supplying termination power? Is the bus terminated in EXACTLY two places, at the physical ends of the bus? Did you possibly leave the terminator jumpers/switches "ON" on one or more of the new drives? Before jumping to the conclusion that you have a bad controller, make certain that your termination is okay. Replace the terminators, because poor termination can cause exactly this sort of problem -- a SCSI bus that almost works perfectly.
If it ain't broke, I can fix that.
Robert-Jan Goossens
Honored Contributor

Re: Disk Controller Problem

Hi Mike,

Check whether patch PHKL_32089 (11.11 SCSI Ultra160 Cumulative Patch) is installed.

http://www4.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.main|patch.breadcrumb.search|&patchid=PHKL_32089&context=hpux:800:11:11

Regards,
Robert-Jan
Sยภเl Kย๓คг
Respected Contributor

Re: Disk Controller Problem

Hi,
This looks like a disk failure that is causing the SCSI lbolt errors.

I advise you to log a call with HP if the system is under contract. You can send the syslog and event logs to the HP solution center so they can decode the error.

Regards,
Sunil
Your imagination is the preview of your life's coming attractions
Mike Lynch_8
Occasional Contributor

Re: Disk Controller Problem

Hi All

Thanks for the speedy replies. Here's some more clarification.

Firstly, the external box is an HP 2300 Disk System A6491A storage cabinet. It has 14 disk bays, which appear to be divided into 2 sets of 7.
There are 2x2 SCSI ports at the back of the enclosure. There are 2 SCSI cables, one from each side, going into 2 different servers (including the server in question). The other 2 ports are terminated.
We usually have the 2 servers up, reading from the one disk system. The other server was also complaining of SCSI errors. I shut down this other server to simplify troubleshooting.
We haven't actually replaced disks; we are adding extra disks. We originally had 4 disks in each side and have now added 3 new disks to both sides.
The LVM configuration is pretty simple for these 3 disks. Each disk is in a separate, standalone volume group.

Here's the output of lsdev

hpdev-root$:lsdev 203
Character Block Driver Class
203 -1 sctl ctl

We do not have PHKL_32089 installed. I will certainly investigate it.

We have just powered down again and checked the SCSI connectors, reseated the disks, etc.

As the problem is intermittent, we are now trying to reproduce it using dd of lvols, prealloc to create huge files, and the stm "exercise" option. I presume this stress testing is good enough?
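
As a rough illustration of what the dd-style read pass does, here is a minimal Python sketch that reads a file or device node sequentially and reports the first I/O error along with how far it got. The function name and block size are my own assumptions, and the demo reads a temporary file rather than a real lvol so it is safe to run anywhere.

```python
import os
import tempfile


def read_through(path, block_size=1024 * 1024):
    """Read path sequentially; return (bytes_read, error or None)."""
    total = 0
    try:
        with open(path, "rb", buffering=0) as handle:
            while True:
                chunk = handle.read(block_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as exc:
        # An I/O error from the driver would surface here.
        return total, exc
    return total, None


# Demo against a scratch file (3 MiB + 5 bytes), not a real device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 5))
print(read_through(tmp.name))  # (3145733, None)
os.remove(tmp.name)
```

Run in a loop against each raw lvol, something like this would give the byte offset at which a read first fails, which can then be compared with the syslog timestamps.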

Regards

Mike