Tape drive scsi problems

Fredrik.eriksson · ‎01-09-2009

Hi again guys :)

Thanks for the help in my last post about the broken tape drive.

Now it's acting up again, but this time with another type of error; the drive itself actually seems to be working properly this time.

During the holidays the tape drive worked for the most part. But when I got back to work it started reporting device timeout errors.

To explain a bit more: when I run "mount /for mke500:" it reports that the medium is offline and then just prints "%MOUNT-F-TIMEOUT, device timeout". After that, it returns the error message immediately (within 1 second) whenever I execute mount.

I've tried rebooting the machine in hopes that it would solve something, but when I did it didn't even show up as a device. After some work I got it working again, but it seemed more like a coincidence than a solution. I've also tried changing the scsi channel which worked for about 2 hours and then just gave up with the same result.

By my reasoning this could be one of three things:
1) scsi cable is broken?
2) scsi card is broken? (Qlogic ISP1020 SCSI)
3) the tape drive is broken; this is unlikely though, since I got it to work fine for periods of time before the timeout errors occurred.

Is there a more common cause than these, or am I on the right track?

Best regards
Fredrik Eriksson

Steven Schweda · ‎01-09-2009

> [...] it didn't even show up as a device.

If SYSMAN IO AUTOCONFIGURE (or a reboot)
does not detect the device now (but did
before), then you would seem to have some bad
hardware somewhere in the chain.

> [...] this is unlikely thou[gh]... [...]

Working things can fail. (Often, it's
exactly the working things which do fail.)
Replacing things other than the (new) tape
drive would certainly be a reasonable way to
start, however. What is the system here?
SCSI cables and old Qlogic PCI SCSI cards are
normally pretty easy to find for close to no
money.

> [...] my last post [...]

Including a URL would make that easier to
find. (For best results, leave out the XX in
"forumsXX.itrc.hp.com", and any "admit=X+Y+Z"
segment in the query string.)

Fredrik.eriksson · ‎01-09-2009

Yes of course, I should've included a link.
(http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1295322)

It's OpenVMS 7.3-2. I might've gotten hold of a proper scsi cable, but it probably won't be available to try out until Monday.

I know that "working" hardware can malfunction in weird ways, and it is a possibility since we just got a "new" one from HP just before Christmas.
I'd rather it were just a scsi cable problem, mostly because that would be the simplest fix ;P

Running $ MC SYSMAN IO AUTO /LOG works fine until the drive starts reporting device timeout; after that it just can't detect it.

Best regards
Fredrik Eriksson

Allan Large · ‎01-09-2009

We have actually encountered a very similar problem with an rx2660. The solution was simple in that the controller card had to be reseated. The problem was resolved.

Dennis Handly · ‎01-09-2009

>should've included a link.

Also be careful about punctuation (your trailing ")"); it's better to put the URL on a line by itself.
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1295322

cnb · ‎01-13-2009

Did HP swap out the entire drive and enclosure or just the drive?

In addition to your suspect list don't overlook the terminator.

It could be the enclosure p/s as the older table-top supplies had a high failure history.

Turn off the enclosure for a while, then power up and retry. If you can connect, then most likely the p/s is failing when hot.

Just a thought.

HTH,

Fredrik.eriksson · ‎01-13-2009

Hi cnb,

You're correct, they didn't change the enclosure, just the internal bits.
I haven't checked the terminator, so you could be correct. But it's somewhat like you describe.
If I turn it off for a while it does reconnect and work again. I didn't notice this in exactly the way you described; it has usually started working again when I've disconnected and reconnected the scsi cable, which might give it sufficient time to cool down, I guess.
But it's weird... It has run for several hours before it stops responding with device timeout errors, even when I've only had it powered off for about 2 minutes.
I haven't checked yet, but I moved it to another ES40 machine yesterday around 1pm and it was still working when I went home around 5pm. Hopefully (this would actually be the simplest outcome) it fails there too, which would mean the drive itself is causing these issues.
If it does work there, by my reasoning there can be two sources of error: either my OpenVMS installation is doing this or the SCSI card is broken.
Replacing the SCSI card isn't much of an issue, but if it's an operating system problem then I need to find some other temporary solution, since these machines are to be shut down within 6 months.

Best regards
Fredrik Eriksson

cnb · ‎01-13-2009

Is there anything in the VMS error log to indicate GROSS or SCSI Phase Errors?

I'll bet my Carlsberg on the power supply being 'noisy'.
;-)

HTH,

David Lethe · ‎01-17-2009

There is an industry standard spec for monitoring health and decoding errors for tapes and autochangers. Google tapealert and you'll find info. One software product that is not ported to VMS, but ported to HP-UX, Windows, and just about everything else has some screenshots and further info. If you are able, temporarily hook the tape to a host running a supported O/S. Check out the manual and links for tapealert at http://www.santools.com/smart/unix/manualo

Hoff · ‎01-17-2009

Certainly do watch the SMART data (as there are some data points that do tend to predict failure), but do keep your data archives or recovery strategies current.

There are standards for all sorts of things to do with SCSI, too. Some of which sort-of match reality. (The best part of working with storage standards is that there are so many to choose from.)

The quote from Smart Reseller magazine over at that cited web site aside, the SMART monitoring (for disks) has been found to detect and report only a surprisingly small fraction of disk device failures. Prior to catastrophic failure, that is. SMART simply isn't a reliable predictor of failure, based on some large-scale empirical studies from folks at CMU and Google.

As for tools, here's some open source that might well be (reasonably) portable:

http://sourceforge.net/projects/smartmontools/

The ioctl() code that's very likely included (I haven't looked at the source code) would need to be switched over to IO$_DIAGNOSE calls to send the SCSI command packets, etc.

With OpenVMS and the specific device timeout case, that's already a failure somewhere in the chain. SMART and related tools likely won't help all that much. HP SIM / HP SEA / WBEM+WEBES / whateverthisstuffiscallednow might be worth a look. But I'd start swapping some SCSI parts here first, and see if or where the bug moved to. That's simpler.

marsh_1 · ‎01-17-2009

there's also the library and tape tools which supports 7.3-2 and up and dlt 1 and onwards.
i would tend to agree with hoff about narrowing it down physically - you won't get the sort of info you are after from hp sim or webes, even though hp sim contains a lot of tape info mib wise it's not comprehensive.

HTH

marsh_1 · ‎01-17-2009

sorry forgot to include link to L&TT download page.

marsh_1 · ‎01-17-2009

this time i've really put the link in !!

http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?pnameOID=406731&locale=en_US&taskId=135&prodSeriesId=406729&prodTypeId=12169

Ian Miller. · ‎01-19-2009

An LTT FAQ is available to answer the most common L&TT usage questions: www.hp.com/support/lttfaq

For more information on LTT and to download it see www.hp.com/support/tapetools

____________________
Purely Personal Opinion

Fredrik.eriksson · ‎01-19-2009

Hi guys, first of all, thanks for all your replies and sorry for not answering sooner, I've been sick.

I've tried running LTT on HP's request but it seems I'm missing LIBHBAAPI.EXE and it won't start. I've sent it back to them but they haven't answered yet.

I'm starting to lean more towards the theory about the chassis being bonkers. I've moved the tape drive to the other machine in our cluster and it displays the same kind of error (device timeout) after a couple of hours. There is no recovering from it except by power-cycling the tape drive (mostly it just needs a couple of minutes to get its act together again).

Best Regards
Fredrik Eriksson

Jur van der Burg · ‎01-19-2009

Install the latest fibrescsi patchkit. It should include this library.

Jur.

David Lethe · ‎01-19-2009

LIBHBAAPI is for the SNIA FC API drivers. I would think that this wouldn't prevent the app from running, since the drive is SCSI-attached, not Fibre Channel-attached, but in all fairness I have never used this particular HP product.

Do you have the ability to attach the tape drive to another O/S?

cnb · ‎01-19-2009

Fredrik,

Since it's following the external tape sub-system to the other system, I suggest you swap out the external tape enclosure or replace its power supply. A very common problem with these units.

Regards,

Let me know what the outcome is as I have a Carlsberg waiting on ice! ;-)

Fredrik.eriksson · ‎01-22-2009

Good morning guys :)

Jur, I would install it if I knew where to find it. We currently don't have anything connected to these machines over FC. SAN and tape drives use SCSI interfaces only.

David, I currently don't have the possibility to connect it to another OS.

cnb, I'll get back to you when HP finally gets a move on. At the moment we're still stuck in the "we-want-information-you-cannot-give-us" loop and I'm just waiting for them to get back to me so I can tell them to send out a tech guy and fix it :P
(PS. They didn't really accept the situation when I told them that both my (previously working) Alpha machines present the same symptoms when the tape drive is connected.)

Best regards
Fredrik Eriksson

Jur van der Burg · ‎01-22-2009

The latest fibre scsi kit is here:

ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.3-2/VMS732_FIBRE_SCSI-V1500.ZIPEXE

Jur.

Fredrik.eriksson · ‎01-27-2009

Well this seems like a dead thread at the moment... HP shunted us with some ridiculous notion that our Digital Alpha machines and Digital DLT7000 (aka 35/70-GB DLT Drive) tape drive aren't supported if we run OpenVMS :P

I'll get back to y'all when we've solved that confusion :)

Best regards
Fredrik Eriksson

David Lethe · ‎01-27-2009

Sorry, I know you were trying to reach me to get details on that diagnostic software for tape drives that I copied the dump from. The software does NOT support VMS, but works with just about everything else including HP/UX on both IA64 & PA-RISC architectures.
The url for the manual is: http://www.santools.com/smart/unix/manual
and the software also has tapealert capability, along with ability to look at log pages, and even script something so you can watch for errors/warnings and set up some trigger during I/O.
You can get contact info from the website

Hoff · ‎01-27-2009

>Well this seems like a dead thread at the moment... HP shunted us with some ridiculous notion that our Digital Alpha machines and Digital DLT7000 (aka 35/70-GB DLT Drive) tape drive aren't supported if we run OpenVMS :P

You need to be very careful with that word. Very careful. It's loaded.

In classic OpenVMS terminology, "supported" had very specific meaning. Likely still has it, too.

The word "supported" does not mean "it works". To the software folks, that word means "it works and we'll fix the code or replace the drive if it doesn't." It's that latter part of the definition that causes care around use of that word.

The DEC DLT series was typically not supported by OpenVMS. The DLT drives were platform-generic drives intended for OEMs. For details on what is supported, check the AlphaServer support matrix for your widget, and the device specs, and the OpenVMS SPD.

The TZ-series equivalent was supported. At various points in history, these TZ-series drives were based on the DLT series drive kernels.

HP is correct. The DLT7000 is not AFAIK supported.

The question is "will it work", and the answer to that is usually "yes", so long as you're patched to current and you're running V6.2 or later, and later is better.

If you're going to work with SCSI and particularly with SCSI device integration and testing (for drives that are not "supported"), you have to know how SCSI works. And how to troubleshoot the SCSI bus. This is at the core of why there's a specific interpretation of "supported"; the device firmware and host drivers can and do vary, and there can be oddities.

The same holds with ATA and ATAPI devices.

These devices are not as compatible as we might all want. (And "compatible" is industry code for "different". If the device was truly identical, it would be called "identical" and not "compatible". And the definition of "compatible" is nebulous at best.)

Here, do some basic SCSI troubleshooting. Check the bus length and bus under- or over-termination, correct addresses, and start swapping parts, and do definitely double-check the termination. If you're doing SCSI device integration and testing work, you have to know how SCSI works. Or you'll learn how over time.

Or get some help in to have a look at this.

Here, I'd verify the bus configuration and integrity, and then swap the drive. I'd see if the problem moved with the drive, or stayed with the bus. (I have a DLT8000 around that I use for this sort of testing, as well as some supported TZ drives.)
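The swap-and-see-where-the-problem-moves approach above can be written down as a small bookkeeping exercise. What follows is only a rough sketch in Python, not anything OpenVMS provides: the function name `suspects`, the component labels, and the example trials are all hypothetical, and it assumes a single faulty part. Each trial records which parts were in the chain and whether the drive kept working; the suspects are the parts present in every failing trial and absent from every passing one.

```python
from typing import Iterable, Set, Tuple

# One trial: (parts in the SCSI chain, True if the drive kept working).
Trial = Tuple[Set[str], bool]


def suspects(trials: Iterable[Trial]) -> Set[str]:
    """Return the parts consistent with a single fault.

    A part stays a suspect only if it was present in every failing
    trial and in none of the passing trials.
    """
    trials = list(trials)
    failing = [parts for parts, passed in trials if not passed]
    passing = [parts for parts, passed in trials if passed]
    if not failing:
        return set()  # nothing has failed yet, so nothing to blame
    candidates = set.intersection(*failing)
    for parts in passing:
        candidates = candidates - parts
    return candidates


# Hypothetical labels, for illustration only.
chain_trials = [
    ({"host_a", "card_a", "cable", "terminator", "enclosure", "drive_old"}, False),
    ({"host_b", "card_b", "cable", "terminator", "enclosure", "drive_old"}, False),
]
print(sorted(suspects(chain_trials)))

# A later trial with a replacement drive in the same enclosure that passes
# narrows the list further.
chain_trials.append(
    ({"host_a", "card_a", "cable", "terminator", "enclosure", "drive_new"}, True)
)
print(sorted(suspects(chain_trials)))
```

With an intermittent fault, a "pass" only counts once the setup has run well beyond the usual time to failure (several hours, in this thread); otherwise a passing trial can wrongly clear a bad part.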

Part of the reason why supported drives are more expensive is because specific configurations have been tested and are known to work; the drives are part of the host software and I/O controller compatibility test matrix. And because the vendor will swap bad drives or will fix bad software.

Stephen Hoffman
HoffmanLabs LLC

Fredrik.eriksson · ‎01-27-2009

Hi Hoff,

Well, I do know the difference between "supported" and "it works".
But HP's response was all of a sudden that it wasn't _supported_ and the case is now closed. There were no problems right before Christmas when they came the first time to replace the tape drive's internal bits (because of a stuck tape).
I did describe it very briefly... the thing is (at least as my boss and I read it) that they told us that our configuration is not valid under our support deal (anymore?) if we run OpenVMS on the Alpha servers or if we run it with that tape drive.
But I won't know anything until Thursday when I'm back at work and hopefully my boss has solved this mess :)

And my method is similar to yours when it comes to testing which part is at fault... I moved it to an identical Alpha server and it still does the same as earlier (I've written about this a bit further up in the thread).

Anyway, I'll tell you all when this is solved... Either by HP or by us buying a shelved one ourselves.

Best regards
Fredrik Eriksson

Fredrik.eriksson · ‎02-09-2009

So, HP finally got here and changed my tape drive, and so far it's been working (since Monday last week, knock on wood).

They didn't change the power supply though, only the actual tape drive, since the new power supply they brought didn't fit into the old power supply box.

Anyway, thanks again for your insight :)
Best regards
Fredrik Eriksson
