Operating System - OpenVMS

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

 
Ziggy Filek
Frequent Advisor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

Everybody: As I specified in the original question, I did everything by the book: changed the tape drive inside the library, power-cycled the library to make sure the fibre router inside the library is set to automatic discovery, then issued SYSMAN> IO REPLACE_WWID $2$MGA3 clusterwide. This command was successful on nodes B and C, but bombed on A with a "Device is active" error. SYS$DEVICES.DAT has been correctly updated with the new WWID and the drive works just fine on B and C. The WWID on A is different from the one on B and C because the SYSMAN REPLACE_WWID bombed on this node! I logged a call with HP, and this problem has already been escalated to engineering, since it is indeed silly to reboot a "never down" machine because of a lousy tape drive. I will keep you posted, since this can happen to anyone.
Jan van den Ende
Honored Contributor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

@ Rick:

>>>
That forced a full HBV Shadow merge (this was prior to HBMM!) of all 30+ volumes in the middle of the afternoon... I was not popular!
<<<
Oh yeah! Been there, also have those scars! In a police cluster running the callroom this did not win the popularity poll either :-)

If you ever went to European DECUSes, or the 1999 San Diego one, you may remember me from the Engineering Panel discussions.
I have brought up the mini-merge missing from SCSI devices every time since SCSI became popular, pointing out the giant step back from DSA devices.
... and I also was the one to ask the audience of that same panel during one of the first Bootcamps to loudly applaud the realisation of HBMM.

But, Ziggy, it need not be a cluster reboot.
A well-planned one-node reboot should be relatively unnoticeable for the users. And the accompanying inconvenience for system management should be part of the job. A good occasion to demonstrate to Management that your job _IS_ important :-)

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Ziggy Filek
Frequent Advisor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

Jan: A "well-planned reboot" would be fine without any applications running. Unfortunately we run applications using Oracle RAC in active-passive mode, and the node in question happens to be the "active" one. If you reboot it, Oracle is supposed to fail over. The application is also supposed to fail over, but our application people are scared to death to induce the failover even though it has supposedly been tested... Real life bites again. Fortunately there WILL be a reboot in the middle of the night one day next week to install application patches, so my problem will disappear.
By the way, the HP people offered to walk me through booting DELTA and fixing some UCBs manually, but I said I was not brave enough. It is strange they don't have a fix for this common problem (it happened to me a few times before, but in a non-critical scenario). Let's hope the VMS engineering people come up with a patch, or something.

Brit and Rick: Yeah, we are talking Cerner Millennium here, with the world-famous "High Availability Toolkit".
The Brit
Honored Contributor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

Ziggy,
I know this shouldn't be necessary, but did you follow the instructions in the "Description" section, i.e. get the WWID from an "IO LIST_WWID" command, then include the "/WWID=" qualifier, followed by an "IO AUTO"?

SYSMAN> io replace_wwid /wwid=02000008:5006-0B00-0029-1683
SYSMAN> io auto

??

(Just trying to rule out everything.)

Dave
Ziggy Filek
Frequent Advisor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

No, same error message "device is active".
The Brit
Honored Contributor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

Hi again Ziggy.

Just a couple of final questions:

Does the NEW tape device appear in an "IO LIST_WWID" display?

I was thinking that you might be able to create a new "mga" device with the new WWID, possibly by editing it into the SYS$DEVICES.DAT file directly, i.e. put in something like

[Device $2$MGA4]
WWID=

(just throwing out suggestions, but I suspect this will still require a reboot)
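
For anyone wanting to double-check what the file actually says on each node before trying anything like this, here is a minimal sketch in Python. It assumes you have copied the text of SYS$DEVICES.DAT from each node to a machine where Python runs, and that entries look like the `[Device ...]` / `WWID=` pair above; the function names `parse_devices_dat` and `wwid_mismatches` are made up for illustration. Note that it only compares file contents and says nothing about the in-memory device state on a running node.

```python
import re


def parse_devices_dat(text):
    """Map device names to WWID strings, assuming [Device ...] / WWID= entries."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"\[Device\s+(\S+)\]$", line, re.IGNORECASE)
        if header:
            current = header.group(1).upper()
            devices[current] = None  # no WWID seen yet for this device
            continue
        wwid = re.match(r"WWID\s*=\s*(\S*)$", line, re.IGNORECASE)
        if wwid and current is not None:
            # An empty value (as in the placeholder entry above) is stored as None.
            devices[current] = wwid.group(1) or None
    return devices


def wwid_mismatches(per_node):
    """Return {device: {node: wwid}} for devices whose WWID differs between nodes."""
    all_devices = set()
    for devs in per_node.values():
        all_devices.update(devs)
    result = {}
    for dev in sorted(all_devices):
        seen = {node: devs.get(dev) for node, devs in per_node.items()}
        if len(set(seen.values())) > 1:
            result[dev] = seen
    return result
```

Feed `wwid_mismatches` a dictionary keyed by node name, each value being the result of `parse_devices_dat` on that node's copy of the file; an empty result means the files agree.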

I think this is probably the end of my contribution -- the idea bucket is empty.

Wish you luck, and if you do find a solution be sure to post it.

Dave.
Ziggy Filek
Frequent Advisor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

No, it does not show, because it is already configured in SYS$DEVICES.DAT! As to your suggestion, I know I could have tried to side-step the problem by creating a new device: replace the tape drive, then instead of running REPLACE_WWID, run LIST_WWID on all 3 nodes and then CREATE_WWID $2$MGA4/wwid=... on all three nodes. But that would leave me with a now permanently screwed-up MGA3, and I would have to change all command procedures using tape devices etc. In any case my problem disappears tomorrow at 4AM, since I'll have a reboot...
Thanks for your efforts! Ziggy
Ziggy Filek
Frequent Advisor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only

Will have a re-boot tomorrow, so that the problem will be fixed. Thank you everybody for your input. -Ziggy
Tom O'Toole
Respected Contributor

Re: Fiber Channel tape drive doesn't work on one node of a cluster only


Ziggy,

Please let us know of any further results of your escalation, in particular why you are getting the "device is active" message. VMS still seems to have a few bugs like this in the handling of tape devices, which are very annoying because there aren't satisfactory workarounds.
Can you imagine if we used PCs to manage our enterprise systems? ... oops.