Operating System - HP-UX

Reboot after panic: SCSI error

 
Andreas D. Skjervold
Honored Contributor

Reboot after panic: SCSI error

I have a C180 database server (HP-UX 10.20) on our test system that rebooted after a panic this weekend.
The shutdownlog says:
Reboot after panic: SCSI: unrecoverred deferred error.

How can I sort out the reason for this event?

rgds
Andy

Only by ignoring what everyone thinks is important, can you be aware of what everyone ignores!
9 REPLIES
James R. Ferguson
Acclaimed Contributor
Solution

Re: Reboot after panic: SCSI error

Hi:

This suggests an unrecoverable SCSI error. It happens when a SCSI disk hits an unrecoverable error but reports it too late: the operating system has already assumed the I/O was good when in fact it failed, so it has no option left except to panic.

...JRF...
CHRIS_ANORUO
Honored Contributor

Re: Reboot after panic: SCSI error

Hi Andreas,
Have a look at /var/adm/syslog/OLDsyslog.log. Also check /var/adm/crash for crash files, which should be sent to HP for proper diagnosis.
When We Seek To Discover The Best In Others, We Somehow Bring Out The Best In Ourselves.
Rick Garland
Honored Contributor

Re: Reboot after panic: SCSI error

Is there a crash file? If so, you can run a q4 analysis on it and send the resulting file to HP for an answer. Is there a tombstone (TS) file?

These are some of the files that can help further diagnose panics.
Cheryl Griffin
Honored Contributor

Re: Reboot after panic: SCSI error

You also want to make sure that you have the minimum patch level installed: PHKL_20686 (WSIO SCSI cumulative patch).
# swlist -l fileset -a state |grep 20686
(look for configured)

If you do not have this patch, install the updated version of PHKL_22223 and its dependencies:
PHCO_16591 fsck_vxfs(1M) cumulative patch
PHKL_16750 SIG_IGN/SIGCLD,LVM,JFS,PCI/SCSI
PHKL_16959 Physical dump devices config
PHKL_17857 Fix for mount/access of disc
PHCO_18563 LVM commands cumulative patch
PHNE_19937 cumulative ARPA Transport patch
PHKL_20610 Correct process hangs on ufs
PHKL_21594 VxFS (JFS) mount, fsck
PHKL_21660 lo_realvfs panic fix, LOFS patch
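
The check above can be repeated for the whole dependency list in one pass. What follows is only a sketch: `missing_patches` is a made-up helper name, and it assumes that each line of `swlist -l fileset -a state` output carries the patch name and its state (for example `configured`) on the same line. It reads the listing from standard input and prints every patch ID that does not show up as configured.

```shell
#!/bin/sh
# missing_patches (hypothetical helper): print each patch ID given as an
# argument that does NOT appear as "configured" in the swlist listing
# read from standard input.
# Assumption: patch name and state share one line of the listing.
missing_patches() {
    listing=$(cat)
    for patch in "$@"; do
        if ! printf '%s\n' "$listing" | grep "$patch" | grep -q configured; then
            printf '%s\n' "$patch"
        fi
    done
}

# Intended use on the HP-UX host:
#   swlist -l fileset -a state | missing_patches PHKL_20686 PHCO_16591 PHKL_16750
```

An empty result would mean every listed patch was found in the configured state; anything printed is worth checking by hand with swlist.
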
"Downtime is a Crime."
Michael Lampi
Trusted Contributor

Re: Reboot after panic: SCSI error

To prevent such errors from happening in the future, turn off the WCE (write cache enable) bit on the disk drive. Use scsictl(1M) to do this. For example,

scsictl -a -m ir=0 /dev/rdsk/c0t6d0

Yes, the "immediate report" bit is the WCE bit.

Note that this will slow down the write performance of the disk drive, but your system will not crash with a deferred error.
A journey of 1000 steps ends in a mile.
Andreas D. Skjervold
Honored Contributor

Re: Reboot after panic: SCSI error

Thanks for the good responses!
First, I can't apply a patch to the test system without first testing it together with software development, so that solution will have to wait.

Second, how can I determine which disk caused the error, if I'm to use your solution, Michael?

Only by ignoring what everyone thinks is important, can you be aware of what everyone ignores!
James R. Ferguson
Acclaimed Contributor

Re: Reboot after panic: SCSI error

Andy:

The panic message should have looked something like:

panic: (display==0xbf00, flags==0x0) SCSI: unrecovered deferred error(dev = 0x).

The dev value should be matchable to the disk by comparing it with the major and minor numbers shown by ls -l /dev/dsk.
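
To make the matching less error-prone, the dev value can be split into its parts first. This is a rough sketch, not an official tool: `decode_dev` is a made-up name, and it assumes the usual HP-UX layout of a 32-bit device number (top 8 bits major, low 24 bits minor) with a SCSI disk minor of the form 0xIITL00 (II = card instance, T = target, L = LUN). The value 0x1f006000 used below is purely illustrative and is not taken from this panic.

```shell
#!/bin/bash
# decode_dev (hypothetical helper): split a 32-bit device number into
# major, minor and the cXtYdZ name implied by the minor number.
# Assumptions: major = bits 31..24, minor = bits 23..0,
#              minor = 0xIITL00 (instance, target, LUN) for SCSI disks.
decode_dev() {
    dev=$(( $1 ))                       # accepts hex such as 0x1f006000
    major=$(( (dev >> 24) & 0xff ))
    minor=$(( dev & 0xffffff ))
    instance=$(( (minor >> 16) & 0xff ))
    target=$(( (minor >> 12) & 0xf ))
    lun=$(( (minor >> 8) & 0xf ))
    printf 'major=%d minor=0x%06x disk=c%dt%dd%d\n' \
        "$major" "$minor" "$instance" "$target" "$lun"
}

decode_dev 0x1f006000   # illustrative value only
```

For the illustrative value this prints `major=31 minor=0x006000 disk=c0t6d0`, which can then be compared against the major and minor columns of ls -l /dev/dsk.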

...JRF...



Michael Lampi
Trusted Contributor

Re: Reboot after panic: SCSI error

James has the best suggestion regarding locating the failing drive.

However, if you do not have the device information from the panic message, then your best bet is to closely monitor the dmesg output. Look for SCSI-related messages, as there could be other symptoms of drive failure appearing on the console.
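
A small filter can make that monitoring easier. This is only a sketch: `scsi_messages` is a made-up name, and matching on the word "scsi" (in any case) is an assumption about how the relevant messages are worded on your system.

```shell
#!/bin/sh
# scsi_messages (hypothetical helper): show lines that mention SCSI,
# case-insensitively, with their line numbers. Reads the files given as
# arguments, or standard input when no file is given.
scsi_messages() {
    grep -i -n 'scsi' "$@"
}

# Intended use on the HP-UX host:
#   dmesg | scsi_messages
#   scsi_messages /var/adm/syslog/syslog.log /var/adm/syslog/OLDsyslog.log
```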

Once you have determined which drive (or drives!) might be having problems, you can use the scsictl command.

Otherwise, the scsictl command can be used on any and all drives on your system with impunity. The only side effect is that writes will then complete only once the data is safely on the rotating media.
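
If you go the all-drives route, it may be safer to generate the commands first and review them before running anything. A minimal sketch, where `ir_off_commands` is a made-up name and the scsictl invocation is copied from the example earlier in this thread:

```shell
#!/bin/sh
# ir_off_commands (hypothetical helper): print, but do not run, the
# scsictl command that turns immediate report off for each raw disk
# device given as an argument.
ir_off_commands() {
    for dev in "$@"; do
        printf 'scsictl -a -m ir=0 %s\n' "$dev"
    done
}

# Intended use on the HP-UX host (review the output before executing it):
#   ir_off_commands /dev/rdsk/c*t*d*
```

Printing instead of executing keeps this a dry run; once the list looks right, the output can be piped to sh.
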
A journey of 1000 steps ends in a mile.
Andreas D. Skjervold
Honored Contributor

Re: Reboot after panic: SCSI error

Thanks again!
The only place I found the panic msg was in the shutdownlog:
02:11 Sun Oct 08 2000. Reboot after panic: SCSI: unrecoverred deferred error

Is there any other log that's more informative?

Second: can I safely disable WCE when using LVM?

rgds
Andy
Only by ignoring what everyone thinks is important, can you be aware of what everyone ignores!