Operating System - Tru64 Unix

Matt Hearn
Regular Advisor

Tru64 on GS60E: my DLT drives have disappeared!

I've got a real head-scratcher. The good news is that I have PLENTY of symptoms! Boy do I have symptoms.

It all started last week when I made a kernel parameter change and rebooted the box. It came up fine, but it never occurred to me to check that the tape drives (2 DLTs in a TLX891 library) were still there, since why on earth would they disappear?

Fast forward to Sunday morning: our weekly backup (this is just a development server) died when "robot move slot 0 drive 0" failed. I checked it out, ran hwmgr -show scsi, and discovered that the changer seems okay, but the drives have lost their device names (they were tape0 and tape1; now they're just nothing). I'd like to post a copy of the original hwmgr output, but it's long gone, sadly.

I'd run into a similar situation when we'd had to replace the entire library to fix a robot problem; I ended up deleting the drives, running hwmgr -scan component, and then using dsfmgr to move the device files to where they oughta be. So I said, no problem, I'll just delete 'em again, which I did.

Then I ran the hwmgr -scan scsi, and it just hung. Never came back. Eventually I left the data center; I assume it must have died, but I have no idea if it ever returned anything.

Anyway, today I finally get back to fiddling with it and trying to bring the drives back to life. A little research reveals I should have run "hwmgr -scan component," so I do that and monitor EVM for the success response, which never comes. I do a ps to see if the scan's still running, but it quit. I'm thinking maybe it wrote something to my mail, but unfortunately the mail file filled up, which is when I noticed the SECOND symptom: SCSI CAM errors, whatever those are, and tons of 'em, all looking like this:

Formatted Message:
SCSI event

Event Data Items:
Event Name : sys.unix.binlog.hw.scsi
Priority : 700
PID : 614
PPID : 1
Event Id : 110080
Timestamp : 25-Mar-2008 22:00:03
Host IP address : 170.212.26.37
Host Name : pdsdev
User Name : root
Format : SCSI event
Reference : cat:evmexp.cat:300

Variable Items:
subid_class (INT32) = 199
subid_num (INT32) = 1
subid_unit_num (INT32) = 96
subid_type (INT32) = 34
binlog_event (OPAQUE) = [OPAQUE VALUE: 1096 bytes]

============================ Translation =============================
Sequence number of error: 2593
Time of error entry: 25-Mar-2008 22:00:03
Host name: pdsdev

SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 1
Target number: 4
Lun Number: 0

Bus 1 is definitely where my drives and changer are, so this makes sense. No idea why they're throwing out errors, but they're coming fast and furious.
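
When entries like the one above arrive by the hundreds, it can help to tally them by bus/target/LUN to see whether a single address is responsible. The following is a minimal Python sketch, not something from the original post. It assumes the translated log text has already been saved as plain text, and it relies only on the field labels visible in the packet quoted above; the function name `tally_cam_errors` and the sample string are made up for illustration.

```python
import re
from collections import Counter

# Field labels exactly as they appear in the translated packet quoted above.
FIELD_RE = re.compile(r"^(Bus Number|Target number|Lun Number):\s*(\d+)\s*$")


def tally_cam_errors(text):
    """Count SCSI CAM error packets per (bus, target, lun) address."""
    counts = Counter()
    current = None  # fields collected for the packet being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line == "SCSI CAM ERROR PACKET":
            current = {}
            continue
        if current is None:
            continue
        match = FIELD_RE.match(line)
        if match:
            current[match.group(1)] = int(match.group(2))
            if len(current) == 3:
                key = (current["Bus Number"],
                       current["Target number"],
                       current["Lun Number"])
                counts[key] += 1
                current = None  # wait for the next packet header
    return counts


# Illustrative sample copied from the packet in the post.
CAM_SAMPLE = """\
SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 1
Target number: 4
Lun Number: 0
"""

if __name__ == "__main__":
    for (bus, target, lun), n in tally_cam_errors(CAM_SAMPLE).most_common():
        print(f"bus {bus} target {target} lun {lun}: {n} error(s)")
```

If nearly all packets point at one address on bus 1, that narrows the search to one device or its cabling rather than the whole bus.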

Then I go into scu, to see what I can see, and I run "scan edt," and it says:
Scanning all available buses, please be patient...

And then just sits there. I'm not a patient man by any means, but it's been almost 2 hours, I don't think it's coming back any time soon.
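
A scan that may never return is easier to live with if it runs under a time limit. Below is a hedged Python sketch of that idea, assuming a Python 3.7+ interpreter is available on the host (an assumption, not something stated in the thread). The helper name `run_with_timeout` is hypothetical. Note that a process stuck in uninterruptible kernel I/O, which a hung SCSI scan may well be, might not die when killed, in which case this wrapper would block too.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False when the command was killed after timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
        return True, proc.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may be missing, or may arrive as bytes.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on the affected host
    # the command list would be something like ["scu", "scan", "edt"].
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    finished, output = run_with_timeout(slow, timeout_s=1)
    print("finished" if finished else "gave up after timeout")
```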

Any ideas? I'm wondering if I can just reboot the box to get it to come back to life, but that may not happen this week because the developers are working fast and furious. Even then, I'm not sure if a reboot will do a lick of good, since I'm guessing last week's reboot screwed it up. I find it hard to believe the kernel parameters (just a few things to line up with Oracle 10g, which we want to install in a few weeks) would cause a problem with the SCSI path.

I'm gonna send stuff over to the hardware vendor, but I'm hoping that this is just a funky OS thing I can fix myself.

THANKS!!!
Steven Schweda
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

What does the SRM console say for "show device"? Bad hardware (cable, termination, gizmos, ...) can foul a lot of software.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Well, one problem is that I can't get to the SRM console to check that stuff without taking the box down, something I can't do while the developers are doing their thing. It might be next week before I have an opportunity to do that, I fear, and I'd like to get this working by the end of the week so I can get a good backup on Sunday before we start doing upgrades next week.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

BTW: it occurs to me I didn't do the "dsfmgr -R hwid #" on the tape drives before hwmgr -delete, which might be causing some OS-level sadness. I'm not sure if a reboot will clear that up, or what.
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Could I put my stuff back with the hwmgr -add command? The manpage for it and the internal help stuff are almost completely unhelpful, sadly. I'm pretty sure I know the bus, target, and lun of the drives; I just can't figure out if there's any way to translate that into something hwmgr understands. Argh.
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Post output of:
# scu show edt
In vino veritas, in VMS cluster
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

CAM Equipment Device Table (EDT) Information:

Bus/Target/Lun  Device Type  ANSI    Vendor ID  Product ID        Revision  N/W
--------------  -----------  ------  ---------  ----------------  --------  ---
  0    0     0  RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  0    0     1  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0     2  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0     3  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   101  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   102  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   103  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   104  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   105  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   106  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   107  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   108  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    0   109  Direct       SCSI-2  DEC        HSG80             V87F      W
  0    1     0  RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  1    0     0  Changer      SCSI-2  DEC        TL800 (C) DEC     0525      W
  2    0     0  RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1     0  RAID         SCSI-2  DEC        HSG80CCL          V87F      W
  2    1     1  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1     2  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1     3  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   101  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   102  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   103  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   104  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   105  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   106  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   107  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   108  Direct       SCSI-2  DEC        HSG80             V87F      W
  2    1   109  Direct       SCSI-2  DEC        HSG80             V87F      W
  3    5     0  Sequential   SCSI-2  COMPAQ     SuperDLT1         4B4B      W
  4    4     0  CD-ROM       SCSI-2  DEC        RRD47 (C) DEC     1206      N
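
The EDT listing above shows only the changer on bus 1 and no Sequential (tape) device there. To check that kind of thing mechanically, the rows can be parsed and grouped by bus. This is a rough Python sketch under the assumption that every data row starts with three integers (bus, target, lun) and ends with the revision and N/W fields, as in the listing; the function names and the shortened sample are illustrative only.

```python
from collections import defaultdict


def parse_edt(text):
    """Parse data rows of an `scu show edt` listing into dicts."""
    devices = []
    for line in text.splitlines():
        tok = line.split()
        # Data rows begin with three integers; header and rule lines do not.
        if len(tok) < 9 or not all(t.isdigit() for t in tok[:3]):
            continue
        devices.append({
            "bus": int(tok[0]),
            "target": int(tok[1]),
            "lun": int(tok[2]),
            "type": tok[3],
            "ansi": tok[4],
            "vendor": tok[5],
            # Product IDs may contain spaces, e.g. "TL800 (C) DEC".
            "product": " ".join(tok[6:-2]),
            "revision": tok[-2],
            "width": tok[-1],
        })
    return devices


def types_by_bus(devices):
    """Map each bus number to the set of device types seen on it."""
    by_bus = defaultdict(set)
    for dev in devices:
        by_bus[dev["bus"]].add(dev["type"])
    return dict(by_bus)


# A few rows copied from the listing above.
EDT_SAMPLE = """\
Bus/Target/Lun Device Type ANSI Vendor ID Product ID Revision N/W
0 0 0 RAID SCSI-2 DEC HSG80CCL V87F W
1 0 0 Changer SCSI-2 DEC TL800 (C) DEC 0525 W
3 5 0 Sequential SCSI-2 COMPAQ SuperDLT1 4B4B W
4 4 0 CD-ROM SCSI-2 DEC RRD47 (C) DEC 1206 N
"""

if __name__ == "__main__":
    for bus, kinds in sorted(types_by_bus(parse_edt(EDT_SAMPLE)).items()):
        print(f"bus {bus}: {', '.join(sorted(kinds))}")
```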
Vladimir Fabecic
Honored Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Thanks, Matt.
Now please run the following and post the output:
# scu scan edt
# scu show edt
# hwmgr -show scsi
# hwmgr -view device
In vino veritas, in VMS cluster
Khairy
Esteemed Contributor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Hi Matt,

cd to /dev and look for rmt or tape entries. I ran into a similar problem a long time ago. It may not be related to yours, but I think it's worth a thought.

If the old entries are still there (tape0, tape1, etc.), run the 'file' command to check whether they are still accessible.

# file /dev/tape1

If the device is still there, it will show the tape drive info, like in the old days of Tru64 V4. If it shows nothing and isn't accessible, the cause may be a bad terminator or the internal SCSI bus in the library itself. I'm no expert on this, but it's based on what I ran into a long time ago, when backups hung and scu show edt didn't show any tape drives.

To further isolate the problem, if you have an unused SCSI adapter or another server you could use, try attaching the library to it and scanning. For HP-UX the command would be:

# ioscan -fnC tape

For Tru64 version 4.0F, `scu scan edt`.

If the tape drives are detected, that means there is nothing wrong with the library.

good luck!
Matt Hearn
Regular Advisor

Re: Tru64 on GS60E: my DLT drives have disappeared!

Hey guys; all the tape0 and tape1 stuff is gone, I'm assuming as a result of the hwmgr -delete I ran.

If I run a scu scan edt, it hangs after:

Scanning all available buses, please be patient...

Here's the hwmgr -view device, though:

 HWID: Device Name         Mfg      Model            Location
 ------------------------------------------------------------------------------
    6: /dev/dmapi/dmapi
    7: /dev/scp_scsi
    8: /dev/kevm
   40: /dev/disk/floppy0c           3.5in floppy     fdi0-unit-0
   48: /dev/cport/scp0              SWXCR            xcr0
   49: /dev/disk/dsk0c              SWXCR            ctlr-0-unit-0
   50: /dev/disk/dsk1c              SWXCR            ctlr-0-unit-1
   59: /dev/disk/dsk3c     DEC      HSG80            bus-0-targ-0-lun-2
   60: /dev/disk/dsk4c     DEC      HSG80            bus-0-targ-0-lun-3
   61: /dev/disk/dsk5c     DEC      HSG80            bus-0-targ-0-lun-101
   62: /dev/disk/dsk6c     DEC      HSG80            bus-0-targ-0-lun-102
   63: /dev/disk/dsk7c     DEC      HSG80            bus-0-targ-0-lun-103
   64: /dev/disk/dsk8c     DEC      HSG80            bus-0-targ-0-lun-104
   65: /dev/disk/dsk9c     DEC      HSG80            bus-0-targ-0-lun-105
   66: /dev/disk/dsk10c    DEC      HSG80            bus-0-targ-0-lun-106
   67: /dev/disk/dsk11c    DEC      HSG80            bus-0-targ-0-lun-107
   68: /dev/disk/dsk12c    DEC      HSG80            bus-0-targ-0-lun-108
   69: /dev/disk/dsk13c    DEC      HSG80            bus-0-targ-0-lun-109
   70: /dev/disk/cdrom0c   DEC      RRD47 (C) DEC    bus-4-targ-4-lun-0
   74: /dev/ntape/tape2    COMPAQ   SuperDLT1        bus-3-targ-5-lun-0
   75: /dev/cport/scp1              HSG80CCL         bus-0-targ-0-lun-0
   76: /dev/random
   77: /dev/urandom
   88: /dev/disk/dsk2c     DEC      HSG80            bus-0-targ-0-lun-1
   95: /dev/changer/mc0    DEC      TL800 (C) DEC    bus-1-targ-0-lun-0
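
In the listing above, the only tape device left is tape2 on bus 3, while the changer mc0 sits alone on bus 1. A short script can pull that out of a longer listing. The following Python sketch assumes each row begins with a numeric HWID followed by a colon and the device name, and that SCSI locations end the line in the `bus-N-targ-N-lun-N` form shown above; `parse_hwmgr_view` is a hypothetical helper name and the sample is a shortened copy of the output.

```python
import re

ROW_RE = re.compile(r"^\s*(\d+):\s+(\S+)")
LOC_RE = re.compile(r"bus-(\d+)-targ-(\d+)-lun-(\d+)\s*$")


def parse_hwmgr_view(text):
    """Return {device_name: (hwid, (bus, target, lun) or None)}."""
    devices = {}
    for line in text.splitlines():
        row = ROW_RE.match(line)
        if not row:
            continue  # header, rule line, or blank
        loc = LOC_RE.search(line)
        btl = tuple(int(g) for g in loc.groups()) if loc else None
        devices[row.group(2)] = (int(row.group(1)), btl)
    return devices


def tape_and_changer(devices):
    """Keep only tape and media changer device names."""
    prefixes = ("/dev/ntape/", "/dev/changer/")
    return {name: info for name, info in devices.items()
            if name.startswith(prefixes)}


# Shortened copy of the listing above.
HWMGR_SAMPLE = """\
HWID: Device Name Mfg Model Location
8: /dev/kevm
59: /dev/disk/dsk3c DEC HSG80 bus-0-targ-0-lun-2
74: /dev/ntape/tape2 COMPAQ SuperDLT1 bus-3-targ-5-lun-0
95: /dev/changer/mc0 DEC TL800 (C) DEC bus-1-targ-0-lun-0
"""

if __name__ == "__main__":
    found = tape_and_changer(parse_hwmgr_view(HWMGR_SAMPLE))
    for name, (hwid, btl) in sorted(found.items()):
        print(f"HWID {hwid}: {name} at bus/target/lun {btl}")
```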