
Cause of and action upon SCSI bus disconnects

 
Ralph Grothe
Honored Contributor


Hello,

we have this

# model;uname -srv
9000/804/K450
HP-UX B.11.11 U

which is exhibiting quite a lot of SCSI bus disconnects

# grep vmunix /var/adm/syslog/syslog.log|tail
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571835, dev: cb00e002, io_id: d1
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571836, dev: cb00e002, io_id: d1
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571833, dev: cb00e002, io_id: d1
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571837, dev: cb00e002, io_id: d1
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571838, dev: cb00f002, io_id: d2
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571839, dev: cb00f002, io_id: d2
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571840, dev: cb00f002, io_id: d2
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571841, dev: cb00f002, io_id: d2
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571838, dev: cb00f002, io_id: d2
Dec 10 15:06:19 tiber vmunix: SCSI: Unexpected Disconnect -- lbolt: 571842, dev: cb00f002, io_id: d2

# grep -c 'SCSI: Unexpected Disconnect' /var/adm/syslog/syslog.log|tail
2485
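
To see how those disconnects are spread across devices, the same log can be tallied per `dev:` value. This is only a minimal sketch: it assumes the message format shown above, and the function name `count_disconnects_by_dev` is made up for illustration.

```shell
# Count "Unexpected Disconnect" messages per dev value.
# Assumes the syslog line format shown above; the function name is illustrative.
count_disconnects_by_dev() {
    grep 'SCSI: Unexpected Disconnect' "$1" |
        sed -n 's/.*dev: \([0-9a-fA-F]*\),.*/\1/p' |
        sort | uniq -c
}

# Usage: count_disconnects_by_dev /var/adm/syslog/syslog.log
```

For the ten lines quoted above this would count 4 entries for cb00e002 and 6 for cb00f002.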


To me this looks like a termination problem, or similar.

The volumes however are all in sync

# lvdisplay -v $(vgdisplay -v|awk '/LV Name/{print$NF}')|grep -ic stale
0


And a look at the disks with OnlineDiag showed no errors.

On the other hand, this box's patch level is rather outdated.
For instance, I'm sure there is a more recent SCSI patch by now (which may have fixed some spurious SCSI bus errors, who knows?)

# swlist -l fileset -a create_date -a install_date -a title PHKL_25896|sed -n 5,\$p
#

# PHKL_25896 Fri Dec 27 13:46:12 MET 2002 200212270852.50 SCSI IO Cumulative Patch
PHKL_25896.C-INC Fri Dec 27 13:46:12 MET 2002 200212270852.50 ProgSupport.C-INC
PHKL_25896.CORE2-KRN Fri Dec 27 13:46:12 MET 2002 200212270852.50 OS-Core.CORE2-KRN


First, how do I translate the device hex identifier from syslog entries to single out the affected devices?

Then what would you recommend?
The SCSI errors have not reappeared in syslog since Dec 10th, though, which also happens to be the date of the last reboot:

# who -b
. system boot Dec 10 13:33

I think this requires further investigation.

Regards
Ralph
Madness, thy name is system administration
Victor BERRIDGE
Honored Contributor

Re: Cause of and action upon SCSI bus disconnects

Hi Ralph,
This reminds me of a failure I had a few years ago, a mirrored root disk failure...
The nasty part was that the disk misbehaved from time to time but never crashed or failed outright. I had unexplained crashes but couldn't diagnose them, because after each reboot everything was fine until the next time (a few days to a few weeks later). The crashes came from an HDS 5750 subsystem sharing the same SCSI controller with the internal root disks: each time the disk went berserk it sent a reset, which the HDS acknowledged. Only by looking at the HDS logs did I find out that thousands of resets had been issued. I called HDS, who said this can happen when a disk fails but not completely, switching itself on and off. The crashes themselves were due to the swap I had on the HDS...

The difficulty was deciding which of the internal mirrored disks was causing all the trouble, since after every reboot EMS found nothing...
A support engineer asked me to type a command I have since forgotten, which returned an error blaming one of the two disks. I was asked which disk it reported as faulty, and we replaced the OTHER one. I was told that experience shows devices often lie and blame their alter ego...
And the problem was solved...

So I would keep an eye on this system to see if you get more occurrences...

The leading cb of the dev value would be major number 203 (0xcb = 203 decimal):
# pwd
/dev/dsk
# ll|more
total 0
brw-r----- 1 bin sys 31 0x003000 Feb 26 2002 c0t3d0
brw-r----- 1 bin sys 31 0x012000 Feb 5 2002 c1t2d0
cr-------- 1 root root 203 0x012000 Feb 26 2002 c1t2d0.pt


These are the vg00 disks, and at one time I did have VxVM...
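
The split of a syslog dev value into major and minor number can be sketched in the shell. Everything about the layout here is an assumption rather than something confirmed in this thread: an 8-bit major in the top byte, a 24-bit minor below it, and a minor of the form 0xIITLxx (II = card instance, T = target, L = LUN), as the c1t2d0 / 0x012000 pairing in the listing suggests. `decode_dev` is an illustrative name, not an HP-UX command, and the snippet was written for a generic POSIX shell; whether the HP-UX shell accepts the 0x constants in arithmetic is untested.

```shell
# Split a dev value from syslog into major and minor numbers.
# Assumed layout: 8-bit major, 24-bit minor, minor = 0xIITLxx
# (II = card instance, T = target, L = LUN). The low byte is not interpreted.
decode_dev() {
    dev=$((0x$1))
    major=$(( (dev >> 24) & 0xff ))
    minor=$(( dev & 0xffffff ))
    instance=$(( (minor >> 16) & 0xff ))
    target=$(( (minor >> 12) & 0xf ))
    lun=$(( (minor >> 8) & 0xf ))
    printf 'major=%d minor=0x%06x c%dt%dd%d\n' \
        "$major" "$minor" "$instance" "$target" "$lun"
}

decode_dev cb00e002
decode_dev cb00f002
```

Under those assumptions cb00e002 comes out as major 203, c0t14d0, and cb00f002 as major 203, c0t15d0. It would be worth cross-checking against `ll /dev/dsk` and `ioscan -fn` before trusting the mapping.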


Good luck

All the best
Victor