Tru64 on GS60E: my DLT drives have disappeared!

Matt Hearn
Regular Advisor

Tru64 on GS60E: my DLT drives have disappeared!

I've got a real head-scratcher. The good news is that I have PLENTY of symptoms! Boy do I have symptoms.

It all started last week, when I did a kernel parameter change and rebooted the box. It came up fine, but it never occurred to me to look and make sure the tape drives (2 DLTs in a TLX891 library) were still there, since why on earth would they disappear?

Fast forward to Sunday morning, our weekly backup (this is just a development server) fails when "robot move slot 0 drive 0" fails. I check it out, and run a hwmgr -show scsi and discover that the changer seems okay, but the drives have lost their device names (they were tape0 and tape1; now they're just nothing). I'd like to print a copy of the original hwmgr but it's long gone, sadly.

I'd run into a similar situation when we'd had to replace the entire library to fix a robot problem; I ended up deleting the drives, running hwmgr -scan component, and then using dsfmgr to move the device files to where they oughta be. So I said, no problem, I'll just delete 'em again, which I did.

Then I ran the hwmgr -scan scsi, and it just hung. Never came back. Eventually I left the data center; I assume it must have died, but I have no idea if it ever returned anything.

Anyway, today I finally get back to fiddling with it and trying to bring the drives back to life. A little research reveals I should have run "hwmgr -scan component," so I do that and monitor EVM for the success response, which never comes. I do a ps to see if the scan's still running, but it quit. I'm thinking maybe it wrote something to my mail, but unfortunately the mail file filled up, which is when I noticed the SECOND symptom: SCSI CAM errors, whatever those are, and tons of 'em, all looking like this:

Formatted Message:
SCSI event

Event Data Items:
Event Name : sys.unix.binlog.hw.scsi
Priority : 700
PID : 614
PPID : 1
Event Id : 110080
Timestamp : 25-Mar-2008 22:00:03
Host IP address : 170.212.26.37
Host Name : pdsdev
User Name : root
Format : SCSI event
Reference : cat:evmexp.cat:300

Variable Items:
subid_class (INT32) = 199
subid_num (INT32) = 1
subid_unit_num (INT32) = 96
subid_type (INT32) = 34
binlog_event (OPAQUE) = [OPAQUE VALUE: 1096 bytes]

============================ Translation =============================
Sequence number of error: 2593
Time of error entry: 25-Mar-2008 22:00:03
Host name: pdsdev

SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 1
Target number: 4
Lun Number: 0

Bus 1 is definitely where my drives and changer are, so this makes sense. No idea why they're throwing out errors, but they're coming fast and furious.
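
When the log is flooded with packets like the one above, a quick tally by bus/target/LUN shows whether they all point at one device. What follows is a minimal sketch in Python, assuming the translated events have been saved as plain text and are processed off the box. The function name tally_cam_errors and the SAMPLE text are illustrative only; the field labels are copied from the event quoted above.

```python
import re
from collections import Counter

# Sample text copied from the translated event quoted in the post.
# In practice this would be the saved text of many such events.
SAMPLE = """\
SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 1
Target number: 4
Lun Number: 0
"""

# Matches the three address lines of a CAM error packet.
FIELD = re.compile(r"^(Bus Number|Target number|Lun Number):\s*(\d+)$")


def tally_cam_errors(text):
    """Count CAM error packets per (bus, target, lun) address."""
    counts = Counter()
    packet = None  # fields of the packet currently being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line == "SCSI CAM ERROR PACKET":
            packet = {}  # a new packet starts here
            continue
        match = FIELD.match(line)
        if packet is None or not match:
            continue
        packet[match.group(1)] = int(match.group(2))
        if len(packet) == 3:  # bus, target and lun all seen
            key = (packet["Bus Number"],
                   packet["Target number"],
                   packet["Lun Number"])
            counts[key] += 1
            packet = None
    return counts


for (bus, target, lun), n in tally_cam_errors(SAMPLE).most_common():
    print(f"bus {bus} target {target} lun {lun}: {n} packet(s)")
```

If every packet lands on the same address, as in the event quoted above (bus 1, target 4, LUN 0), that device or the cabling to it is the first thing to look at.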

Then I go into scu, to see what I can see, and I run "scan edt," and it says:
Scanning all available buses, please be patient...

And then just sits there. I'm not a patient man by any means, but it's been almost 2 hours, I don't think it's coming back any time soon.

Any ideas? I'm wondering if I can just reboot the box to get it to come back to life, but that may not happen this week because the developers are working fast and furious. Even then, I'm not sure if a reboot will do a lick of good, since I'm guessing last week's reboot screwed it up. I find it hard to believe the kernel parameters (just a few things to line up with Oracle 10g, which we want to install in a few weeks) would cause a problem with the SCSI path.

I'm gonna send stuff over to the hardware vendor, but I'm hoping that this is just a funky OS thing I can fix myself.

THANKS!!!
17 REPLIES
Steven Schweda
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

What does the SRM console say for "show device"? Bad hardware (cable, termination, gizmos, ...) can foul a lot of software.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Well, one problem is that I can't get to the SRM console to check that stuff without taking the box down, something I can't do while the developers are doing their thing. It might be next week before I have an opportunity to do that, I fear, and I'd like to get this working by the end of the week so I can get a good backup on Sunday before we start doing upgrades next week.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

BTW: it occurs to me I didn't do the "dsfmgr -R hwid #" on the tape drives before hwmgr -delete, which might be causing some OS-level sadness. I'm not sure if a reboot will clear that up, or what.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Could I put my stuff back with the hwmgr -add command? The manpage for it and internal help stuff is almost completely unhelpful, sadly. I'm pretty sure I know the bus, target, and lun of the drives, I just can't figure out if there's any way to translate that into something hwmgr understands. Argh.
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Post output of:
# scu show edt
In vino veritas, in VMS cluster
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

CAM Equipment Device Table (EDT) Information:

Bus/Target/Lun  Device Type  ANSI    Vendor ID  Product ID        Revision  N/W
--------------  -----------  ------  ---------  ----------------  --------  ---
  0    0    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  0    0    1   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0    2   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0    3   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  101   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  102   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  103   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  104   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  105   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  106   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  107   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  108   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  109   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    1    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  1    0    0   Changer      SCSI-2  DEC        TL800 (C) DEC     0525      W
  2    0    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1    1   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1    2   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1    3   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  101   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  102   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  103   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  104   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  105   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  106   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  107   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  108   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  109   Direct       SCSI-2  DEC        HSG80             V87F      W
  3    5    0   Sequential   SCSI-2  COMPAQ     SuperDLT1         4B4B      W
  4    4    0   CD-ROM       SCSI-2  DEC        RRD47 (C) DEC     1206      N
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Thanks Matt
And now please do the following and send output:
# scu scan edt
# scu show edt
# hwmgr -show scsi
# hwmgr -view device
Khairy
Esteemed Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

hi matt,

cd to /dev and look for rmt or tape entries. I ran into a similar problem a long time ago. It may not relate to yours, but I think it's worth a thought.

If the old entries are still there (tape0, tape1, etc.), run the 'file' command to determine whether the device is still accessible.

# file /dev/tape1

If it's still there, it will show the tape drive info, like in the old days of Tru64 V4. If it doesn't show anything and isn't accessible, the problem may be a bad terminator or the internal SCSI bus in the library itself. I'm no expert on this, but it's based on what I ran into a long time ago, when backups hung and scu show edt didn't show any tape drives.

To further isolate the problem, if you have an unused SCSI adapter or another server you could use, try attaching the library to it and scanning. For HP-UX the command would be:

# ioscan -fnC tape

For Tru64 version 4.0F, `scu scan edt`.

If the tape drives are detected, that means there is nothing wrong with the library.

good luck!
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Hey guys; all the tape0 and tape1 stuff is gone, I'm assuming as a result of the hwmgr -delete I ran.

If I run a scu scan edt, it hangs after:

Scanning all available buses, please be patient...

Here's the hwmgr -view device, though:

 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
    6: /dev/dmapi/dmapi
    7: /dev/scp_scsi
    8: /dev/kevm
   40: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   48: /dev/cport/scp0               SWXCR            xcr0
   49: /dev/disk/dsk0c               SWXCR            ctlr-0-unit-0
   50: /dev/disk/dsk1c               SWXCR            ctlr-0-unit-1
   59: /dev/disk/dsk3c      DEC      HSG80            bus-0-targ-0-lun-2
   60: /dev/disk/dsk4c      DEC      HSG80            bus-0-targ-0-lun-3
   61: /dev/disk/dsk5c      DEC      HSG80            bus-0-targ-0-lun-101
   62: /dev/disk/dsk6c      DEC      HSG80            bus-0-targ-0-lun-102
   63: /dev/disk/dsk7c      DEC      HSG80            bus-0-targ-0-lun-103
   64: /dev/disk/dsk8c      DEC      HSG80            bus-0-targ-0-lun-104
   65: /dev/disk/dsk9c      DEC      HSG80            bus-0-targ-0-lun-105
   66: /dev/disk/dsk10c     DEC      HSG80            bus-0-targ-0-lun-106
   67: /dev/disk/dsk11c     DEC      HSG80            bus-0-targ-0-lun-107
   68: /dev/disk/dsk12c     DEC      HSG80            bus-0-targ-0-lun-108
   69: /dev/disk/dsk13c     DEC      HSG80            bus-0-targ-0-lun-109
   70: /dev/disk/cdrom0c    DEC      RRD47 (C) DEC    bus-4-targ-4-lun-0
   74: /dev/ntape/tape2     COMPAQ   SuperDLT1        bus-3-targ-5-lun-0
   75: /dev/cport/scp1               HSG80CCL         bus-0-targ-0-lun-0
   76: /dev/random
   77: /dev/urandom
   88: /dev/disk/dsk2c      DEC      HSG80            bus-0-targ-0-lun-1
   95: /dev/changer/mc0     DEC      TL800 (C) DEC    bus-1-targ-0-lun-0
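
The Location column in this listing already encodes bus, target, and LUN, so it can be turned into numbers and filtered by bus. Below is a rough sketch in Python, assuming the listing has been saved as text and is processed off the box. The function name scsi_devices and the three-row HWMGR_SAMPLE are illustrative; the rows themselves are taken from the output above.

```python
import re

# A few rows copied from the hwmgr -view device output in the post.
HWMGR_SAMPLE = """\
49: /dev/disk/dsk0c SWXCR ctlr-0-unit-0
74: /dev/ntape/tape2 COMPAQ SuperDLT1 bus-3-targ-5-lun-0
95: /dev/changer/mc0 DEC TL800 (C) DEC bus-1-targ-0-lun-0
"""

# Location strings for SCSI devices look like bus-1-targ-0-lun-0.
LOCATION = re.compile(r"^bus-(\d+)-targ-(\d+)-lun-(\d+)$")


def scsi_devices(view_text):
    """Map device file name -> (hwid, bus, target, lun) for SCSI rows."""
    found = {}
    for line in view_text.splitlines():
        parts = line.split()
        # A device row starts with "<hwid>:" and ends with a location.
        if len(parts) < 3 or not parts[0].endswith(":"):
            continue
        hwid = parts[0][:-1]
        if not hwid.isdigit():
            continue  # skips the "HWID:" header line
        match = LOCATION.match(parts[-1])
        if match:
            bus, target, lun = (int(g) for g in match.groups())
            found[parts[1]] = (int(hwid), bus, target, lun)
    return found


devices = scsi_devices(HWMGR_SAMPLE)
on_bus_1 = sorted(name for name, (_, bus, _, _) in devices.items() if bus == 1)
print(on_bus_1)
```

Run against the full listing above, the only entry on bus 1 is the changer, which matches the missing tape0 and tape1 device files.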
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Now try:
# scu scan edt bus 3
# scu show edt
and post output
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

You may have a hardware problem with the tape device on bus 1, SCSI ID 4.
That may be the cause of your problem.
The system can still see the tape drive on bus 3, SCSI ID 5.
Which kernel parameters did you change (I also do not think they had anything to do with the tape problem)?
First I would turn off the machine and disconnect the device on bus 1, ID 4.
Then I would check whether there are any software problems.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

CAM Equipment Device Table (EDT) Information:

Bus/Target/Lun  Device Type  ANSI    Vendor ID  Product ID        Revision  N/W
--------------  -----------  ------  ---------  ----------------  --------  ---
  0    0    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  0    0    1   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0    2   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0    3   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  101   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  102   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  103   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  104   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  105   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  106   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  107   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  108   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0  109   Direct       SCSI-2  DEC        HSG80             V87F      W
  0    1    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  1    0    0   Changer      SCSI-2  DEC        TL800 (C) DEC     0525      W
  2    0    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1    0   RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1    1   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1    2   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1    3   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  101   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  102   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  103   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  104   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  105   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  106   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  107   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  108   Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1  109   Direct       SCSI-2  DEC        HSG80             V87F      W
  3    5    0   Sequential   SCSI-2  COMPAQ     SuperDLT1         4B4B      W
  4    4    0   CD-ROM       SCSI-2  DEC        RRD47 (C) DEC     1206      N

I should note that the SuperDLT on bus 3 is a totally separate thing from the TL891 library. The library drives are on the same bus as the changer, bus 1. I did run a scan on bus 1, but it hangs.
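
To check which buses still report a tape drive, the EDT listing can be grouped by bus and searched for "Sequential" entries. Here is a small sketch in Python, assuming the scu show edt output has been saved as text. The names devices_by_bus and has_tape are made up for illustration, and EDT_SAMPLE is four rows copied from the listing above.

```python
# Four rows copied from the scu show edt listing in the post.
EDT_SAMPLE = """\
0 0 0 RAID SCSI-2 DEC HSG80CCL V87F W
1 0 0 Changer SCSI-2 DEC TL800 (C) DEC 0525 W
3 5 0 Sequential SCSI-2 COMPAQ SuperDLT1 4B4B W
4 4 0 CD-ROM SCSI-2 DEC RRD47 (C) DEC 1206 N
"""


def devices_by_bus(edt_text):
    """Group EDT rows as {bus: [(target, lun, device_type), ...]}."""
    buses = {}
    for line in edt_text.splitlines():
        parts = line.split()
        # Device rows start with three integers: bus, target, lun.
        # Header and separator lines fail this check and are skipped.
        if len(parts) < 4 or not all(p.isdigit() for p in parts[:3]):
            continue
        bus, target, lun = (int(p) for p in parts[:3])
        buses.setdefault(bus, []).append((target, lun, parts[3]))
    return buses


def has_tape(buses, bus):
    """True if any device on the given bus is a Sequential (tape) device."""
    return any(dev_type == "Sequential" for _, _, dev_type in buses.get(bus, []))


buses = devices_by_bus(EDT_SAMPLE)
for bus in sorted(buses):
    print(bus, buses[bus], "tape" if has_tape(buses, bus) else "no tape")
```

On the posted listing, bus 1 shows only the changer at target 0 and no Sequential devices, while the standalone SuperDLT on bus 3 is still visible.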
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

So you did run a scan on bus 1, but it hangs.
This indicates there is some problem with something on that bus.
Turn off the library, change the SCSI ID on the other tape device, and do:
# hwmgr -scan scsi
DCBrown
Frequent Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Sounds like a hardware issue. Disconnect ALL three devices from bus 1. Run a hwmgr scan component... it should now finish unless there is a scsi bus problem (i.e. termination or something else nasty like that).

If it finishes, attach changer, repeat.
If it finishes, remove changer, attach library box, repeat.

Are there any events in the binary.errlog? If so, what? Should be lots of cam errors and these should give a good indication of what's going on.

Bud
Pieter 't Hart
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

I would go to the library and check its settings from the front panel.
If for some reason the changer and a drive have reverted to a default SCSI ID (both 0!), you get strange responses on this bus.
If so, reconfigure the IDs from the front panel.
If you have multiple drives you might disconnect one of them from the bus.

As already suggested, check the cables and terminator (replace them if you have spares).
You mention a TLX drive, so be sure the replacement terminator is an LVD type.
Rob Leadbeater
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Hi Matt,

If you look at the output of a "hwmgr show scsi -full" do you see any null devices on the relevant SCSI bus ?

If so, you may well have to get rid of these with "dsfmgr -R hwid ..." and "hwmgr delete component -id ..."

Hope this helps,

Regards,

Rob
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Hey guys; turned out we had a number of hardware problems. The root cause was PROBABLY a scsi cable that's either too long or faulty, or both; the LVD SCSI controller is also bad, although that might be because I had to jiggle the bejebus out of it to replace the cable. :) The server had a spare SCSI card, so we're attached to that with a new, shorter cable, and the drives are there! Thanks for all your help!