Operating System - HP-UX

Re: Help...getting intermittent SCSI parity errors

 
Jamie Jones
Occasional Contributor

Help...getting intermittent SCSI parity errors

Getting intermittent SCSI parity errors on an A5001 that a reboot seems to fix temporarily. HPUX 11.0.

syslog:
Oct 22 12:15:35 m12400 vmunix: ected parity error -- lbolt: 8644351, dev: cb000002
Oct 22 12:08:46 m12400 vmunix: SCSI: Target detected parity error -- lbolt: 8603510, dev: bc000000
Oct 22 12:15:35 m12400 above message repeats 11 times
Oct 22 12:15:35 m12400 vmunix: SCSI: Target detected parity error -- lbolt: 8644351, dev: cb000002
Oct 22 12:08:46 m12400 vmunix: SCSI: Target detected parity error -- lbolt: 8603511, dev: bc000000
Oct 22 12:15:35 m12400 above message repeats 13 times
Oct 22 12:15:35 m12400 vmunix: SCSI: Target detected parity error -- lbolt: 8644352, dev: cb000002
Oct 22 12:15:35 m12400 vmunix: SCSI: Target detected parity error -- lbolt: 8644346, dev: cb000002
Oct 22 12:15:35 m12400 above message repeats 13 times

ioscan -fnC disk:
Class  I  H/W Path       Driver  S/W State  H/W Type  Description
======================================================================
disk   1  0/0/1/0.0.0    sdisk   CLAIMED    DEVICE    HP 18.2GST318406LC
                         /dev/dsk/c0t0d0    /dev/rdsk/c0t0d0
disk   3  0/0/1/0.2.0    sdisk   CLAIMED    DEVICE    HP 18.2GST318406LC
                         /dev/dsk/c0t2d0    /dev/rdsk/c0t2d0
disk   4  0/0/1/0.4.0    sdisk   CLAIMED    DEVICE    HP 18.2GST318406LC
                         /dev/dsk/c0t4d0    /dev/rdsk/c0t4d0
disk   0  0/0/1/1.15.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318404LC
                         /dev/dsk/c1t15d0   /dev/rdsk/c1t15d0
disk   2  0/0/2/1.15.0   sdisk   CLAIMED    DEVICE    SEAGATE ST318203LC
                         /dev/dsk/c3t15d0   /dev/rdsk/c3t15d0
disk   5  0/4/0/0.1.0    sdisk   CLAIMED    DEVICE    HP DVD-ROM 305
                         /dev/dsk/c4t1d0    /dev/rdsk/c4t1d0


Any ideas or pointers to where I can translate this?

 

 

P.S. This thread has been moved from General to HP-UX > Sysadmin. - HP Forum Moderator

5 REPLIES
Steven E. Protter
Exalted Contributor

Re: Help...getting intermittent SCSI parity errors

I suspect you have a disk that's getting ready to die. In our shop, lbolt errors have always led to a disk replacement.

I'd have a good make_tape_recovery backup and run it regularly.

The disk throwing the lbolt needs replacement.

If you have done a hot swap, the lbolt messages will go away at the next boot. I don't think that's the case here.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Thayanidhi
Honored Contributor

Re: Help...getting intermittent SCSI parity errors

Hi,

You can find out which device is producing the lbolt errors using the following procedure.

Reasons for lbolt errors:
------------------------------

Hardware: bad SCSI cable, bad SCSI terminator, bad device, bad controller, etc.

Software: SCSI IO patch, bad application, core dumps of an application issuing SCSI requests, etc.

Other reasons: powering a device off or on while the SCSI bus is live, plugging or unplugging a device, etc.

Analysis
----------

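lbolt errors can be analyzed as follows.

The lbolt value is itself a timestamp.

For example, suppose you receive: vmunix: SCSI: Request Timeout; Abort -- lbolt: 120842710, dev: cd061000, io_id: 6684ee3

The SCSI error occurred 120842710 clock ticks after the system booted. Note that lbolt counts clock ticks, not milliseconds: HP-UX ticks 100 times per second, so this is about 1,208,427 seconds, or roughly 14 days, of uptime. (The syslog in the original post agrees: 40841 ticks elapse between two messages logged 409 seconds apart.)

You can find the system uptime with #uptime and calculate when the SCSI error occurred.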
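The timestamp arithmetic can be sketched in a few lines of Python. This is only an illustration, not an HP tool. It assumes lbolt advances 100 times per second (10 ms ticks), which matches the two syslog timestamps in the original post. The function names are made up for this example, and the year is a placeholder because syslog lines do not record one.

```python
from datetime import datetime, timedelta

# Assumption: lbolt advances 100 times per second (10 ms clock ticks).
TICKS_PER_SECOND = 100


def lbolt_to_uptime(lbolt, hz=TICKS_PER_SECOND):
    """Return how long the system had been up when the message was logged."""
    return timedelta(seconds=lbolt / hz)


def estimate_boot_time(logged_at, lbolt, hz=TICKS_PER_SECOND):
    """Subtract the uptime implied by lbolt from the syslog timestamp."""
    return logged_at - lbolt_to_uptime(lbolt, hz)


def observed_tick_rate(lbolt_a, time_a, lbolt_b, time_b):
    """Estimate ticks per second from two logged (lbolt, timestamp) pairs."""
    return (lbolt_b - lbolt_a) / (time_b - time_a).total_seconds()


# Two messages from the syslog in the original post.
# The year 2000 is a placeholder; the log lines carry no year.
first = datetime(2000, 10, 22, 12, 8, 46)
second = datetime(2000, 10, 22, 12, 15, 35)

print(observed_tick_rate(8603510, first, 8644351, second))  # close to 100
print(lbolt_to_uptime(8644351))             # uptime when the error was logged
print(estimate_boot_time(second, 8644351))  # estimated boot time
```

Checking the tick rate against two real log lines first is a cheap way to confirm the 100-per-second assumption on your own system before trusting the computed boot time.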

The second useful field is "dev". In the example dev: cd061000, "cd" is the major number of the device in hex.

The decimal value for "cd" is 205.

#ll /dev/dsk/*
or
#ll /dev/rmt/*

Check for major number 205.

06 is the bus number. Use ioscan to find the bus with that number.

#ioscan -kfnC ext_bus

check for bus number 6.

10 is SCSI id 1 and lun 0.
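The field-by-field decoding above can be written as a small Python helper. This is a sketch, not an official utility. The bit layout (8-bit major, 8-bit bus, 4-bit target, 4-bit LUN, 8 trailing bits) is inferred from the walkthrough in this post, and the function names and the "options" label for the last two hex digits are made up for this example.

```python
def decode_dev(dev_hex):
    """Split a dev value such as 'cd061000' into the fields described above.

    Assumed layout, inferred from the walkthrough:
    major (8 bits), bus (8 bits), target (4 bits), lun (4 bits), options (8 bits).
    """
    value = int(dev_hex, 16)
    return {
        "major": (value >> 24) & 0xFF,
        "bus": (value >> 16) & 0xFF,
        "target": (value >> 12) & 0xF,
        "lun": (value >> 8) & 0xF,
        "options": value & 0xFF,  # driver-dependent trailing byte
    }


def device_file_name(dev_hex):
    """Build the cXtYdZ name to look for under /dev/dsk or /dev/rdsk."""
    fields = decode_dev(dev_hex)
    return "c{bus}t{target}d{lun}".format(**fields)


# The example from this post, then the two values from the original syslog.
for dev in ("cd061000", "cb000002", "bc000000"):
    print(dev, decode_dev(dev), device_file_name(dev))
```

The major number still has to be matched against the ll output, and the bus number against ioscan -kfnC ext_bus, as described above. The helper only saves the hex arithmetic.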

Once you have identified the bus and device, make sure all the cables are secured properly.

Replace whichever cable, terminator, or device is suspect.

Make sure the latest SCSI/IO cumulative patches are installed.

If it is a disk device, you can change the timeout value with the pvchange -t option.

TT
Attitude (not aptitude) determines altitude.
A. Clay Stephenson
Acclaimed Contributor

Re: Help...getting intermittent SCSI parity errors

Everything that is happening is happening on c0t0d0. cb000002 --> cb = Major Device (disk); 00 -> c0; 0 -> t0; 0 -> d0; 02 ==> device-driver dependent, but it is almost always 00, so 02 is unusual. Check the minor device numbers of the other /dev/dsk and /dev/rdsk nodes to see if they end in 00.

The other bc000000 is a scsi pass-thru (sctl) failure to the same device.

The very first thing to check is proper termination. The bus must be terminated in exactly two places -- at the two ends of the bus. Also, at least one device on the bus must supply termination power. Improper termination can cause a SCSI bus to behave almost perfectly -- the kind of problem that will drive you crazy. In your case, I'm leaning towards a failing disk.
If it ain't broke, I can fix that.
Jamie Jones
Occasional Contributor

Re: Help...getting intermittent SCSI parity errors

Thanks for all your help! I will replace the drive and see if that fixes the problem.
Sanjay_6
Honored Contributor

Re: Help...getting intermittent SCSI parity errors

Hi,

Check whether your SCSI terminators are set properly. The problem could be due to improper termination too.

If it is not because of termination, then you may end up having to replace the disk.

Hope this helps.

Regds