Re: SCSI Log Sense -- Error Counter Pages

Brian Eickhoff · 12-27-2005

I have been evaluating two different HP LTO tape drive models (Ultrium 448 & 960) in addition to two other models from different vendors. Comparing these four drive models, the HP tape drives are the only ones that seem to accumulate a large number of errors in the SCSI error counter pages (hundreds or thousands of errors for a large transfer). Specifically, these Log Sense error counter pages are:

write errors (page code 02h)
parameter codes:
0000h - Write errors corrected w/o delay
0004h - Total number of write retries

read errors (page code 03h)
parameter codes:
0000h - Read errors corrected w/o delay
0004h - Total number of read retries
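For anyone pulling these counters with raw SCSI rather than a vendor tool, here is a minimal Python sketch of decoding what a LOG SENSE command returns. It assumes the generic log page layout from the SCSI Primary Commands (SPC) standard: a 4-byte header (page code in the low 6 bits of byte 0, page length in bytes 2-3), followed by parameters that each carry a 2-byte code, a control byte, a length byte and a big-endian value. The name `parse_log_page` and the sample bytes are made up for illustration. How the buffer is fetched from the drive is not shown.

```python
import struct


def parse_log_page(buf):
    """Decode a LOG SENSE response into (page_code, {parameter_code: value}).

    Assumes the generic SPC log page layout; values are treated as
    unsigned big-endian counters.
    """
    if len(buf) < 4:
        raise ValueError("response shorter than the 4-byte log page header")
    page_code = buf[0] & 0x3F                       # low 6 bits of byte 0
    (page_len,) = struct.unpack_from(">H", buf, 2)  # bytes 2-3: length after the header
    end = min(4 + page_len, len(buf))               # tolerate a short transfer
    params = {}
    off = 4
    while off + 4 <= end:
        (code,) = struct.unpack_from(">H", buf, off)  # 2-byte parameter code
        plen = buf[off + 3]                           # length byte follows the control byte
        if off + 4 + plen > end:
            break                                     # value was cut off; stop here
        params[code] = int.from_bytes(buf[off + 4:off + 4 + plen], "big")
        off += 4 + plen
    return page_code, params


# Hypothetical write error counters page (02h) holding two parameters:
# 0000h = 5 (2-byte value) and 0004h = 1234 (4-byte value).
sample = bytes([0x02, 0x00, 0x00, 0x0E,
                0x00, 0x00, 0x00, 0x02, 0x00, 0x05,
                0x00, 0x04, 0x00, 0x04, 0x00, 0x00, 0x04, 0xD2])
print(parse_log_page(sample))  # (2, {0: 5, 4: 1234})
```

A truncated buffer simply yields the parameters that arrived intact, which is handy when the allocation length was set too small.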

Since the exact definition of these error counters is not part of the SCSI standard, can someone tell me how they are implemented on the HP Ultrium tape drives? Despite the high error counts, the drives perform as advertised. However, I would feel better about the reliability of these HP LTO drives if I knew what these errors really meant. I have not been able to find an HP SCSI reference document that answers any of these questions. Does anyone else have experience working with HP's SCSI Log Sense data? Thanks!

Richard Bickers · 12-28-2005

Hi Brian,

The logs have 7 parameters each, and the HP drives only use a subset. You can use HP Library and Tape Tools (LTT) to extract and view the data in a support ticket, though it sounds like you're using a more direct SCSI approach.
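With a direct SCSI approach, the request side is a 10-byte LOG SENSE CDB (operation code 4Dh). Below is a minimal Python sketch, assuming the SPC field layout. `log_sense_cdb` is a hypothetical helper name, and actually sending the CDB (for example through the Linux SG_IO pass-through) is left out. Page control 01b selects current cumulative values, the usual choice when reading counters.

```python
def log_sense_cdb(page_code, alloc_len=0xFFFF, page_control=0b01, param_pointer=0):
    """Build a 10-byte SCSI LOG SENSE CDB (operation code 4Dh)."""
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page code must fit in 6 bits")
    if not 0 <= page_control <= 0b11:
        raise ValueError("page control must fit in 2 bits")
    if not 0 <= alloc_len <= 0xFFFF or not 0 <= param_pointer <= 0xFFFF:
        raise ValueError("allocation length and parameter pointer are 16-bit fields")
    return bytes([
        0x4D,                                     # LOG SENSE operation code
        0x00,                                     # PPC = 0, SP = 0 (do not save parameters)
        (page_control << 6) | page_code,          # PC in bits 7-6, page code in bits 5-0
        0x00,                                     # subpage code (reserved in older SPC)
        0x00,                                     # reserved
        param_pointer >> 8, param_pointer & 0xFF, # first parameter code to return
        alloc_len >> 8, alloc_len & 0xFF,         # max bytes the host will accept
        0x00,                                     # CONTROL byte
    ])


print(log_sense_cdb(0x02, alloc_len=0x1000).hex())  # 4d004200000000100000
print(log_sense_cdb(0x03, alloc_len=0x1000).hex())  # 4d004300000000100000
```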

Write error counters log page (page code 02h):
0 Errors corrected without substantial delay - not used - are you finding numbers in here?
1 Errors corrected with possible delays - not used - are you finding numbers in here?
2 Total - sum of parameters 3 and 6
3 Total errors corrected - the number of data sets that had to be physically rewritten through repositioning; this only happens if 4m of CCQ rewrites haven't been successful
4 Total times error correction processed - the number of CCQ sets rewritten. A CCQ is a piece of a data set that can be rewritten. Rewriting CCQs uses more tape but allows the writes to continue streaming; it's like sparing over small sections of media
5 Total data sets written - each data set is about 400 KB on LTO 2 and 1.6 MB on LTO 3
6 Total uncorrected errors - the number of data sets that could not be written even after CCQ rewrites and retries, i.e., a write failure
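To make a raw page 02h dump easier to read, the meanings listed above can be attached to the decoded counters, and the stated relationship (parameter 2 = parameter 3 + parameter 6) can be checked. Here is a minimal Python sketch. The labels paraphrase this post rather than an HP document, and the input is assumed to be a dict mapping parameter code to counter value. The names `HP_WRITE_LABELS`, `label_counters` and `total_matches`, along with the sample counts, are hypothetical.

```python
# Meanings paraphrased from this thread for HP Ultrium drives, not from an HP spec.
HP_WRITE_LABELS = {
    0: "corrected without substantial delay (not used)",
    1: "corrected with possible delays (not used)",
    2: "total: sum of parameters 3 and 6",
    3: "data sets rewritten after repositioning",
    4: "CCQ sets rewritten",
    5: "data sets written",
    6: "data sets that could not be written (write failures)",
}


def label_counters(params, labels):
    """Return (code, meaning, value) tuples sorted by parameter code."""
    return [(code, labels.get(code, "not described"), value)
            for code, value in sorted(params.items())]


def total_matches(params):
    """Check parameter 2 == parameter 3 + parameter 6; None if any is missing."""
    if not all(code in params for code in (2, 3, 6)):
        return None
    return params[2] == params[3] + params[6]


# Hypothetical counters after a large transfer: many CCQ rewrites, no failures.
write_page = {2: 3, 3: 3, 4: 812, 5: 40000, 6: 0}
for code, meaning, value in label_counters(write_page, HP_WRITE_LABELS):
    print(f"{code}: {value:>6}  {meaning}")
print("total consistent:", total_matches(write_page))  # total consistent: True
```

Per the description of the read page, the same `total_matches` check applies to page 03h.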

The drives use read-while-write to measure the quality of the data on tape as it goes. The usual impact of poor media or dirt is to use more tape by writing extra CCQs; we call this capacity loss. Capacity loss is the most accurate measure of write quality because it also takes account of other factors, such as tracking, that can cause CCQs to be rewritten. You can calculate it by comparing CCQs written with CCQ retries in the 'write error rate log' of the LTT support ticket. It is usually <1% but can vary: 5% is fine, 10% is unusual, and the drive will still work up to 50% (though we consider 20% to be returnable).

Even at 1% you will normally see lots of CCQ re-writes for large transfers. Don't worry!
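The capacity-loss rule of thumb above can be written down as a small Python sketch. Two assumptions apply. First, capacity loss is taken to be CCQ rewrites as a percentage of CCQs written. Second, both inputs come from LTT's 'write error rate log' (page 02h has CCQ rewrites but no CCQs-written count). The band edges between the quoted points (for example, treating everything between 5% and 20% as "unusual") are also assumptions. The function names and figures are hypothetical.

```python
def capacity_loss_pct(ccqs_written, ccq_rewrites):
    """Capacity loss as CCQ rewrites per 100 CCQs written (assumed formula)."""
    if ccqs_written <= 0:
        raise ValueError("ccqs_written must be positive")
    return 100.0 * ccq_rewrites / ccqs_written


def rate_capacity_loss(pct):
    """Band a capacity-loss percentage using the rough guide in this post."""
    if pct < 1:
        return "usual"
    if pct <= 5:
        return "fine"
    if pct < 20:
        return "unusual"                 # the post calls 10% unusual
    if pct <= 50:
        return "returnable, though the drive still works"
    return "outside the range described"


# Hypothetical figures read off an LTT 'write error rate log'.
pct = capacity_loss_pct(ccqs_written=250000, ccq_rewrites=3000)
print(f"{pct:.1f}% -> {rate_capacity_loss(pct)}")  # 1.2% -> fine
```

Even this "fine" 1.2% works out to thousands of CCQ rewrites, which matches the large counts seen in parameter 4.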

Read error counters log page (page code 03h):
0 Errors corrected without substantial delay - not used - are you finding numbers in here?
1 Errors corrected with possible delays - not used - are you finding numbers in here?
2 Total - sum of parameters 3 and 6
3 Total errors corrected - the number of data sets that were corrected after a physical read retry
4 Total times error correction processed - the number of times logical (C2) error correction is invoked, i.e., some of the write redundancy is used
5 Total data sets processed (read)
6 Total uncorrected errors - the number of data sets that could not be read after retries, i.e., a read failure

I wouldn't expect to see many C2 error corrections, but 1 per 100 data sets is reasonable.
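The "about 1 C2 correction per 100 data sets" guide can be applied directly to page 03h counters. Here is a minimal Python sketch. It treats parameter 5 as data sets read and parameter 4 as C2 invocations, as described above, and takes a dict of parameter code to counter value. The name `read_page_summary` and the sample counts are hypothetical.

```python
def read_page_summary(params):
    """Summarize an HP read error counters page (03h) as described in this thread.

    params maps parameter code -> counter value.
    """
    datasets = params.get(5, 0)     # data sets processed (read)
    c2 = params.get(4, 0)           # times C2 correction was invoked
    return {
        "c2_per_100_datasets": 100.0 * c2 / datasets if datasets else None,
        "retry_corrected": params.get(3, 0),   # corrected after a physical read retry
        "read_failures": params.get(6, 0),     # data sets that could not be read
    }


# Hypothetical counters from a large restore.
summary = read_page_summary({2: 2, 3: 2, 4: 30, 5: 4000, 6: 0})
print(summary)
# {'c2_per_100_datasets': 0.75, 'retry_corrected': 2, 'read_failures': 0}
```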

It's really hard to compare different vendors' drives with these measures because they are so vendor-specific, but I hope the above helps you untangle some of the numbers coming back from the HP drives. We watch these figures very carefully during production and also use them in support (via LTT) to determine drive health. Watch out for 'LTT reports' coming in February (LTT 4.0 SR1), which translate all of this into English!

Good luck with your selection. Be interested to know how you get on.