Operating System - OpenVMS

A.W.R
Frequent Advisor

SCSI Cluster - extended sense errors

Hi,

I have a cluster of OpenVMS AlphaServer 800 systems. From time to time I get extended sense errors like the one below:

******************************* ENTRY 214 ********************************


Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V6.2-1H3
Event sequence number 134.
Timestamp of occurrence 22-FEB-2007 09:16:36
Time since reboot 0 Day(s) 0:11:07
Host name ESB

System Model AlphaServer 800 5/500

Entry type 1. Device Error


---- Device Profile ----
Unit $1$DKB200
Product Name RZ1BB-CS
Vendor DEC

-- Driver Supplied Info -
Device Firmware Revision 0656
VMS SCSI Error Type 5. Extended Sense Data from Device
SCSI ID x02
SCSI LUN x00
SCSI SUBLUN x00
Port Status x00000001 NORMAL - normal successful completion
Command Opcode x00 Test Unit Ready
Command Data
x00
x00
x00
x00
x00

SCSI Status x02 Check Condition
Remaining Byte Length 18.

--- Sense Data For Device RZ1BB-CS, 2GB SCA-2 Fast 10 & Fast 20 - 7200RPM
Error Code x70 Current Error
Segment # x00
Information Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
Sense Key x06 UNIT ATTENTION
Additional Sense Length x0A
CMD Specific Info Byte 3 x00
Byte 2 x00
Byte 1 x00
Byte 0 x00
ASC & ASCQ x2902 ASC/ASCQ not available from Seagate.
FRU Code x02
Sense Key Specific Byte 0 x00
Byte 1 x00
Byte 2 x00

----- Software Info -----
UCB$x_ERTCNT 16. Retries Remaining
UCB$x_ERTMAX 16. Retries Allowable
IRP$Q_IOSB x0000000000000000
UCB$x_STS x08065910 Online
Busy
Software Valid
Unload At Dismount
"Mount Verification" In-Progress
Volume is Valid on the local node
Suppress "Success" Mount Verification Message
Unit supports the Extended Function bit
IRP$L_PID x84BBAC50 Requestor "PID"
IRP$x_BOFF 0. Byte Page Offset
IRP$x_BCNT 0. Transfer Size In Byte(s)
UCB$x_ERRCNT 1. Errors This Unit
UCB$L_OPCNT 1272. QIO's This Unit
ORB$L_OWNER x00010004 Owners UIC
UCB$L_DEVCHAR1 x1C4D4008 Directory Structured
File Oriented
Sharable
Available
Mounted
Error Logging
Capable of Input
Capable of Output
Random Access


What do they mean and should I be concerned?

Thanks
Andrew
Jim_McKinney
Honored Contributor
Solution

> Time since reboot 0 Day(s) 0:11:07

> ASC & ASCQ x2902 ASC/ASCQ not available from Seagate.


That ASC/ASCQ (29h/02h) indicates that a SCSI bus reset occurred. You said that this system is part of a cluster, and I can see that it had just been booted. Is it possible that another node in the cluster was booted at the same time and wanted to share this disk? If so, the other node may have reset the bus while discovering which devices were available to it during its initialization, and this node was reacting to that reset.
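To make that reading of the log concrete, here is a minimal Python sketch that decodes the 18-byte sense buffer from ENTRY 214. It assumes the standard fixed-format sense layout (response code 0x70/0x71) from the SCSI Primary Commands (SPC) specification. The helper name `decode_fixed_sense` and the abbreviated lookup tables are illustrative only; they are not part of OpenVMS or ANALYZE/ERROR_LOG. The ERF text "ASC/ASCQ not available from Seagate" only means the vendor-specific table had no entry; the generic SPC table does define 29h/02h.

```python
# Hypothetical helper: decode SPC fixed-format sense data (response code 0x70/0x71).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}

# Small subset of the SPC ASC/ASCQ table: the 29h "reset occurred" family.
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x01): "POWER ON OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
}


def decode_fixed_sense(buf: bytes) -> dict:
    if len(buf) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = buf[0] & 0x7F          # bit 7 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02X}")
    sense_key = buf[2] & 0x0F              # low nibble of byte 2
    asc, ascq = buf[12], buf[13]
    return {
        "current": response_code == 0x70,  # 0x70 = current, 0x71 = deferred
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "OTHER"),
        "information": int.from_bytes(buf[3:7], "big"),
        "additional_length": buf[7],
        "asc": asc,
        "ascq": ascq,
        "description": ASC_ASCQ.get((asc, ascq), "see SPC ASC/ASCQ table"),
        "fru": buf[14] if len(buf) > 14 else None,
    }


# The 18 bytes from ENTRY 214, reassembled from the ERF field listing.
entry_214 = bytes([
    0x70, 0x00, 0x06,          # error code, segment number, sense key
    0x00, 0x00, 0x00, 0x00,    # information bytes
    0x0A,                      # additional sense length
    0x00, 0x00, 0x00, 0x00,    # command-specific information
    0x29, 0x02,                # ASC, ASCQ
    0x02,                      # FRU code
    0x00, 0x00, 0x00,          # sense-key-specific bytes
])

d = decode_fixed_sense(entry_214)
print(d["sense_key_name"], "-", d["description"])
# prints: UNIT ATTENTION - SCSI BUS RESET OCCURRED
```

Note that the additional sense length (0x0A = 10) plus the 8 header bytes accounts for all 18 bytes in the entry.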
Jim_McKinney
Honored Contributor

I should also have said that it is normal for these sorts of errors to be logged by active systems when a new system that wishes to share a SCSI bus is introduced into a cluster.
Ian Miller.
Honored Contributor

If that's on a shared SCSI bus and a node was recently rebooted, then it's not unusual.
____________________
Purely Personal Opinion
Hoff
Honored Contributor

Receiving SCSI extended sense data is normal and is not a cause for concern. Some drives return it, some don't. The SCSI drivers log it if and when they receive it, and that's what you are seeing. (The SCSI driver stack dates back to an era before extended sense data was in common use.)

Including the following citation is a little surreal, but that's fodder for another discussion:

http://h71000.www7.hp.com/wizard/wiz_6205.html

Also see the discussion of this same message in the V8.2 documentation. V8.2 is far newer than your V6.2-1H3 release, and the SCSI DKDRIVER stack has changed over the intervening releases:
http://h71000.www7.hp.com/doc/82FINAL/6318/6318pro_023.html

These can crop up particularly in multi-host SCSI environments, where the other system has just lobbed a bus reset at the shared SCSI bus. (This is also why multi-host SCSI buses don't do so well with tapes: the reset can cause the tape to rewind.)
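A toy model may help show why every surviving node logs exactly one of these entries per shared disk when another node resets the bus. This is a simplified sketch, assuming the SCSI Architecture Model behaviour in which a reset leaves a pending unit attention condition for each initiator, reported once on that initiator's next command and then cleared. The class names, SCSI IDs, and the one-shot behaviour as coded here are illustrative assumptions, not DKDRIVER or drive firmware logic, and tape rewinding is not modelled.

```python
from dataclasses import dataclass, field

# Simplified model (an assumption, not DKDRIVER or firmware code): a bus reset
# leaves every target with a pending UNIT ATTENTION for each initiator on the
# bus. The target reports it once, on that initiator's next command, and then
# clears it for that initiator only.


@dataclass
class Target:
    scsi_id: int
    pending_ua: set = field(default_factory=set)

    def bus_reset(self, initiators):
        self.pending_ua.update(initiators)

    def test_unit_ready(self, initiator):
        """Return (status, sense_key, asc, ascq) for a Test Unit Ready."""
        if initiator in self.pending_ua:
            self.pending_ua.discard(initiator)
            return ("CHECK CONDITION", 0x06, 0x29, 0x02)  # UA: bus reset occurred
        return ("GOOD", None, None, None)


@dataclass
class SharedBus:
    initiators: list
    targets: list

    def reset(self):
        for target in self.targets:
            target.bus_reset(self.initiators)


# Hypothetical layout: host adapters at SCSI IDs 7 (this node) and 6 (the
# node that is booting), sharing the disk at SCSI ID 2 ($1$DKB200).
bus = SharedBus(initiators=[7, 6], targets=[Target(scsi_id=2)])
disk = bus.targets[0]

bus.reset()                        # the booting node resets the bus
first = disk.test_unit_ready(7)    # this node's next TUR: logged as an error
second = disk.test_unit_ready(7)   # the condition has already been consumed
print(first)   # ('CHECK CONDITION', 6, 41, 2)
print(second)  # ('GOOD', None, None, None)
```

The point of the model is that the entry is a one-shot notification: once the host has seen the unit attention, the next command to the same disk completes normally.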

Having this entry labeled as an error isn't comforting, true. If the error log started showing other errors, meaning anything other than unit attention conditions reported via extended sense data, then that could be a matter of some concern.

This sense data was returned in response to the host lobbing a Test Unit Ready command at the disk.
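The rule of thumb in this reply can be written down as a small triage function. This is a hypothetical heuristic that paraphrases the advice in this thread. It is not an HP tool and not part of ANALYZE/ERROR_LOG. The status and sense key values are the standard SCSI codes; the category names ("expected", "watch", "concern") are invented for illustration.

```python
# Test Unit Ready (opcode 0x00) is a 6-byte CDB that is all zeros, which is
# what the log shows as "Command Opcode x00" plus five x00 command bytes.
TEST_UNIT_READY_CDB = bytes(6)

GOOD, CHECK_CONDITION = 0x00, 0x02
UNIT_ATTENTION, MEDIUM_ERROR, HARDWARE_ERROR = 0x06, 0x03, 0x04
ASC_RESET_FAMILY = 0x29  # power on, reset, or bus device reset occurred


def triage(status, sense_key=None, asc=None, multi_host=True):
    """Hypothetical rule of thumb for one error log entry (not an HP tool)."""
    if status == GOOD:
        return "ok"
    if status != CHECK_CONDITION:
        return "investigate"          # e.g. BUSY (0x08), RESERVATION CONFLICT (0x18)
    if sense_key == UNIT_ATTENTION and asc == ASC_RESET_FAMILY:
        # Expected when another node resets a shared bus; on a single-host
        # bus, repeated resets (with mount verifications) deserve a closer look.
        return "expected" if multi_host else "watch"
    if sense_key in (MEDIUM_ERROR, HARDWARE_ERROR):
        return "concern"              # candidates for revectoring or replacement
    return "investigate"


print(triage(CHECK_CONDITION, UNIT_ATTENTION, 0x29))   # expected
print(triage(CHECK_CONDITION, MEDIUM_ERROR, 0x11))     # concern
```

ENTRY 214 falls into the "expected" bucket for a shared bus; media or hardware errors are the entries worth acting on.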

If you start to see scan errors, revectoring, or the like, or if this isn't a multihost configuration and you continue to see mount verifications, then I'd start to get worried, and I'd look to swap the disk. Even new disks see substantial failure rates: the most recent published numbers I've seen show an annual failure rate of about 6%, starting around the third year. (There are a couple of very interesting papers recently published in this area, too, from Google and from CMU.)
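To put a figure like 6% per year into perspective for a cluster with several disks, a back-of-the-envelope calculation helps. This sketch assumes every disk fails independently at the same constant annual rate, which is a simplification (real-world failures can be correlated). The disk count of 10 is a hypothetical example, not taken from the original post.

```python
def p_at_least_one_failure(n_disks: int, afr: float = 0.06) -> float:
    """Chance that at least one of n_disks fails within a year.

    Assumes each disk fails independently with the same annual failure
    rate (afr); treat the result as a rough estimate only.
    """
    return 1.0 - (1.0 - afr) ** n_disks


n = 10  # hypothetical number of disks in the cluster
print(f"expected failures/year: {n * 0.06:.1f}")                    # 0.6
print(f"P(at least one failure): {p_at_least_one_failure(n):.1%}")  # 46.1%
```

Under those assumptions, a site with ten such disks has close to even odds of losing at least one in a given year, which is why spares and current backups matter.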

If you're concerned about this, consider contacting your hardware support organization, and ask them for their input.

If you yourself are the hardware support organization, the usual caveats about keeping spares available and keeping current BACKUP copies apply. Older disks can and do fail, and disk-level SMART monitoring and the like are not very good at spotting an impending disk failure. And most sites will need to restore from their disk backups sooner or later, hardware support contract or not.

Stephen Hoffman
HoffmanLabs