Interpreting WSEA Tape Error Event

Jack Trachtman
Super Advisor

I'm trying to get more detail on what shows up as a "...PARITY" error in our backup jobs. Below is the output from a WSEA TRANSLATE FULL command. I'm hoping that someone can point out some useful fields for analysis (so far I've only figured out which fields identify the tape drive reporting the error).

BTW: we have the 110/220 SDLT drives connected via an NSR to our HP SAN.

TIA

Event: 10
Description: VMS Tape Drive Event at Mon 19 Sep 2005 02:38:50 GMT-07:00 from ALPHAD
File: /sys$errorlog/errlog.sys@alphad.vmmc.org
================================================================================

COMMON EVENT HEADER (CEH) V2.0
Event_Leader xFFFF FFFE
Header_Length 284
Event_Length 480
Header_Rev_Major 2
Header_Rev_Minor 1
OS_Type 2 -- OpenVMS
Hardware_Arch 4 -- Alpha
CEH_Vendor_ID 3,564 -- Hewlett-Packard Company
Hdwr_Sys_Type 38 -- Titan Corelogic
Logging_CPU 2 -- CPU Logging this Event
CPUs_In_Active_Set 4
Major_Class 1
Minor_Class 2
Entry_Type 2,001 -- VMS Tape Drive Event
DSR_Msg_Num 1,978 -- AlphaServer ES45
.... Model 2/2B
.... CPU Slots: 4 (1000 Mhz)
.... PCI Slots: 10
.... MMB Slots: 8 (DIMMs)
Chip_Type 12 -- EV68CB - 21264C
CEH_Device 28
CEH_Device_ID_0 x0000 0000
CEH_Device_ID_1 x0000 0000
CEH_Device_ID_2 x0000 0000
Unique_ID_Count 4,401
Unique_ID_Prefix 4
Exact_Length 185
Num_Strings 6

TLV Section of CEH
TLV_DSR_String AlphaServer ES45 Model 2
TLV_DDR_String COMPAQ SuperDLT1
TLV_Sys_Serial_Num 4150JSPZA261
TLV_Time_as_Local Mon 19 Sep 2005 02:38:50 GMT-07:00
TLV_OS_Version V7.3-2
TLV_Computer_Name ALPHAD
Entry_Type 2,001

EMB_Block
emb_ertcnt 16 Error Count
emb_ertmax 16 Max error count
emb_iosb 0
emb_sts x0000 1910
emb_class 2
emb_type 28
emb_rqpid 6,882,214
emb_boff 0
emb_bcnt 32,256
emb_media 0
emb_unit 1
emb_errcnt 206
emb_opcnt 336,605,372
emb_ownuic x0008 0009
emb_char x0DCC 5021
emb_Device_Number 0
emb_func 32,779
emb_name_len 6
emb_name $2$MGA
emb_dtname_len 16
emb_dtname COMPAQ SuperDLT1

OVMS_Tape_Header_Rev3
Longword_length 15
Tape_Hdr_Revision 3
Tape_Hardware_Revision 5555
Tape_Error_Type 5 Extended Sense Data
Tape_SCSI_ID x0000 0000 0000 000D
Tape_SCSI_LUN x0000 0000 0000 0200
Tape_Port_Status x0000 0001
Tape_SCSI_Cmd_Length 6
Tape_SCSI_Command Dump starting at offset: x1b5
[x0] xa
[x1] x0
[x2] x0
[x3] x7e
[x4] x0
[x5] x0
Tape_SCSI_Command_Status 2
Tape_SCSI_Additional_Data_Length 24
Tape_SCSI_Additional_Data Dump starting at offset: x1bd
[x0] xf0
[x1] x0
[x2] x3
[x3] x6
[x4] x64
[x5] x86
[x6] x0
[x7] x16
[x8] x0
[x9] x31
[xa] x99
[xb] xff
[xc] xc
[xd] x0
[xe] x0
[xf] x0
[x10] x0
[x11] x0
[x12] xd0
[x13] x7
[x14] x35
[x15] x0
[x16] x0
[x17] x11
7 REPLIES
Volker Halle
Honored Contributor

Re: Interpreting WSEA Tape Error Event

Jack,

a beautiful tool, isn't it ;-)

Try also decoding this error with DECevent (it's quite good at decoding SCSI errors) and ELV (although I wouldn't expect that tool to be able to decode SCSI tape errors) and compare the results.

The idea here would be to find the Extended Sense Data (namely ASC and ASCQ).

Volker.
Volker Halle
Honored Contributor
Solution

Re: Interpreting WSEA Tape Error Event

Jack,

I've found an old SCSI tape errlog entry example (decoded with DECevent) from a TZ89 and may be able to guess/map/explain some of the fields in the WSEA errlog translation:

...
Tape_SCSI_Cmd_Length 6
[x0] xa = Write (6 byte)
[x1] x0
[x2] x0
[x3] x7e
[x4] x0
[x5] x0
Tape_SCSI_Command_Status 2 = Check Condition
Tape_SCSI_Additional_Data_Length 24
[x0] xf0 = Error Code
[x1] x0 = Segment Number
[x2] x3 = Sense Key (Medium Error) ?
[x3] x6 = additional sense length ?
[x4] x64
[x5] x86
[x6] x0
[x7] x16
[x8] x0
[x9] x31
[xa] x99
[xb] xff
[xc] xc = ASC
[xd] x0 = ASCQ -> ASC/ASCQ = x0C00 Write error
[xe] x0 = FRU code
[xf] x0 = Sense Key specific bytes
[x10] x0
[x11] x0
[x12] xd0 = vendor-specific bytes ?!
[x13] x7
[x14] x35
[x15] x0
[x16] x0
[x17] x11

Note that most of this is guessing based on the values and pattern matching. The key information may be in the vendor-specific bytes...
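
As a cross-check on the command bytes, here is a minimal Python sketch that decodes the 6-byte CDB from the log. It assumes the SCSI-2 WRITE(6) layout for sequential-access devices (byte 1 bit 0 is the Fixed bit, bytes 2-4 are a big-endian transfer length); the function name and dictionary keys are made up for illustration.

```python
# Sketch: decode a 6-byte SCSI WRITE(6) CDB for a tape device.
# Assumed layout (SCSI-2, sequential-access WRITE): opcode, Fixed bit,
# 24-bit big-endian transfer length, control byte.

def decode_write6(cdb: bytes) -> dict:
    if len(cdb) != 6:
        raise ValueError("WRITE(6) CDB must be exactly 6 bytes")
    if cdb[0] != 0x0A:
        raise ValueError(f"not a WRITE(6) opcode: {cdb[0]:#04x}")
    fixed = bool(cdb[1] & 0x01)                # 1 = length in blocks, 0 = length in bytes
    length = int.from_bytes(cdb[2:5], "big")   # bytes 2..4, most significant first
    return {
        "opcode": cdb[0],
        "fixed": fixed,
        "transfer_length": length,
        "control": cdb[5],
    }

# Tape_SCSI_Command bytes [x0]..[x5] from the WSEA output
cdb = bytes([0x0A, 0x00, 0x00, 0x7E, 0x00, 0x00])
fields = decode_write6(cdb)
print(fields)
```

With the Fixed bit clear, the transfer length is a byte count: x007E00 is 32,256, which matches emb_bcnt 32,256 in the EMB_Block of the original entry.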

Volker.
Volker Halle
Honored Contributor

Re: Interpreting WSEA Tape Error Event

Jack,

after looking at the SCSI-2 spec found at:

http://www.danbbs.dk/~dino/SCSI/SCSI2-08.html#8.2.14

I can confirm most of my findings:

Tape_SCSI_Command_Status 2 = Check Condition
Tape_SCSI_Additional_Data_Length 24
[x0] xf0 = Error Code = valid, current error
[x1] x0 = Segment Number
[x2] x3 = Sense Key (Medium Error)
[x3] x6 = Information bytes
[x4] x64 = ...
[x5] x86 = ...
[x6] x0 = ...
[x7] x16 = additional sense length
[x8] x0 = command-specific information
[x9] x31 = ...
[xa] x99 = ...
[xb] xff = ...
[xc] xc = ASC
[xd] x0 = ASCQ -> ASC/ASCQ = x0C00 Write error
[xe] x0 = FRU code
[xf] x0 = Sense Key specific bytes
[x10] x0 = ...
[x11] x0 = ...
[x12] xd0 = vendor-specific additional sense bytes
[x13] x7
[x14] x35
[x15] x0
[x16] x0
[x17] x11

So it looks like some kind of write error, but only the vendor-specific bytes would provide more detailed information.
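
The mapping above can be sketched in Python. This is only an illustration that assumes the fixed-format sense data layout from SCSI-2 (response code x70/x71); the function name, the dictionary keys and the short sense-key table are my own, not anything WSEA or DECevent provides.

```python
# Sketch: decode fixed-format SCSI sense data as laid out in SCSI-2.
# Only the standard fields are decoded; vendor-specific bytes are left alone.

SENSE_KEYS = {
    0x0: "No Sense", 0x1: "Recovered Error", 0x2: "Not Ready",
    0x3: "Medium Error", 0x4: "Hardware Error", 0x5: "Illegal Request",
    0x6: "Unit Attention", 0x7: "Data Protect", 0x8: "Blank Check",
}

def decode_fixed_sense(sense: bytes) -> dict:
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    code = sense[0] & 0x7F
    if code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {code:#04x}")
    key = sense[2] & 0x0F
    return {
        "valid": bool(sense[0] & 0x80),            # information field is valid
        "deferred": code == 0x71,                  # x70 = current error
        "filemark": bool(sense[2] & 0x80),
        "eom": bool(sense[2] & 0x40),
        "ili": bool(sense[2] & 0x20),
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "other"),
        "information": int.from_bytes(sense[3:7], "big"),
        "additional_length": sense[7],
        "asc": sense[12],
        "ascq": sense[13],
        # byte 7 counts the bytes that follow it, so the full length is 8 + byte 7
        "truncated": len(sense) < 8 + sense[7],
    }

# Tape_SCSI_Additional_Data bytes [x0]..[x17] from the WSEA output
sense = bytes.fromhex("f0000306648600160031 99ff0c0000000000d00735000011".replace(" ", ""))
decoded = decode_fixed_sense(sense)
print(decoded)
```

For this entry the sketch reports sense key 3 (Medium Error) and ASC/ASCQ x0C/x00. It also flags the data as truncated: the additional sense length of x16 (22) implies 30 bytes of sense data, but the errlog entry carries only 24.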

Volker.
Volker Halle
Honored Contributor

Re: Interpreting WSEA Tape Error Event

Jack,

after looking at the Quantum SDLT 220/320 SCSI Interface Guide chapter 4.27 Request Sense command:

http://downloads.quantum.com/sdlt320/818500101.pdf

[x12] xd0 = Internal Status code VS (vendor specific)
[x13] x7 = Tape motion hours
[x14] x35 = ...
[x15] x0 = Power-On hours
[x16] x0 = ... does not seem to be used ?!
[x17] x11 = Tape remaining ?

So the last secret remaining is the value of 0xD0 in the internal status code.
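
For what it's worth, the arithmetic for the hours counter can be checked with a few lines of Python. This assumes my reading of the manual above, i.e. that bytes x13-x14 form one 16-bit tape motion hours value, and that it is stored most significant byte first as is usual for SCSI multi-byte fields; the variable names are just for illustration.

```python
# Sketch: pull the vendor-specific values out of sense bytes x12..x17.
# Assumption: x12 = internal status code, x13..x14 = tape motion hours (big-endian).

vendor = bytes([0xD0, 0x07, 0x35, 0x00, 0x00, 0x11])  # [x12]..[x17] from the log

internal_status = vendor[0]
tape_motion_hours = int.from_bytes(vendor[1:3], "big")  # x0735

print(f"internal status code: {internal_status:#04x}")
print(f"tape motion hours:    {tape_motion_hours}")
```

Under that assumption x0735 works out to 1,845 hours of tape motion.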

Volker.
Jack Trachtman
Super Advisor

Re: Interpreting WSEA Tape Error Event

Volker,

Thanks for all the research

- DECevent: I don't have it installed since it's "deprecated" (and I miss it!)

- ELV: tried it and, as you guessed, it cannot interpret the data


BTW: this is the site I use to look up ASC/ASCQ codes:

http://www.t10.org/
select: "Vendor ID, ASC/ASCQ, and Standards Identifier Codes"
then: "ASC/ASCQ - Additional Sense Data Information"

- Thanks for the ref to the Quantum manual - I'll review that.


So it looks like I'm getting a "write error" - not very helpful. But I'll check more WSEA entries to see if there are other codes.
Jan van den Ende
Honored Contributor

Re: Interpreting WSEA Tape Error Event

Jack,

and you should count yourself lucky on WRITE errors. At least you KNOW you have to re-run your backup to get a good saveset.

What I really HATE about SCSI is the fact that there is NO WAY (afaik) to read past the point of a SINGLE parity error, and allow BACKUP to perform the recovery magic it is so good with, if only the rest of the data were presented.

At times like this I long for DSA-compliant tape units.

I vividly remember a tape that held a development environment about one year old.
The client wanted a minor adaptation, but we could not read the (TK70) tape.

Called Digital, they had a DSSI TK70.
Bring in the tape, and an empty one.

A few minutes, and ONE recoverable error...

But are there nowadays any DSA-compliant modern-format tape devices? I think not... :-(


Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Ian Miller.
Honored Contributor

Re: Interpreting WSEA Tape Error Event

Install DECevent V3.4 - there is still no good alternative, as this thread has shown.
____________________
Purely Personal Opinion