Operating System - OpenVMS

Re: MessageQ automatic failover

 
Paul Jerrom
Valued Contributor

MessageQ automatic failover

Hi all,
Greetings from sunny NZ.
I am currently migrating from a single-node Alpha to an Integrity cluster.
I want to set up MessageQ so that the groups will fail over from one node to another seamlessly. According to the doco, all I need to do is set the DMQ$GROUP_SYNCHRONIZE flag in the startup procedure (it is set by default). Thereafter, the first node to start will be the primary, and the second will wait until the lock is released by the primary before it takes over as primary. Sounds great in theory!! In practice, each node attempts to become the primary; the second one up complains about the log files being locked.
Has anyone got this to work on Integrity? (2 x rx2620) Has anyone got this to work on Alpha? Am I missing something?
Many thanks,

Peejay
[See you in Nashua!!]
Have fun,

Peejay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it can't be done with a VT220, who needs it?
Wim Van den Wyngaert
Honored Contributor

Re: MessageQ automatic failover

Wish I was there ... (NZ)

Have no knowledge of MQ.

You can try to start the stuff with a log file on a concealed device that is node specific (node specific disk or directory).
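
A minimal sketch of what such a node-specific concealed device could look like. The device DSA1:, the directory [DMQ_LOGS] and the logical name DMQ_NODE_LOG are hypothetical placeholders, not names from MessageQ itself, and the DEFINE needs SYSNAM privilege. Because the system logical name table is local to each node, every cluster member ends up pointing at its own directory:

```dcl
$! Sketch only: DSA1:, [DMQ_LOGS] and DMQ_NODE_LOG are hypothetical names.
$ node = F$GETSYI("NODENAME")
$! Create a per-node directory if it does not exist yet
$ IF F$SEARCH("DSA1:[DMQ_LOGS]''node'.DIR") .EQS. "" THEN -
     CREATE/DIRECTORY DSA1:[DMQ_LOGS.'node']
$! Define a concealed, rooted logical name pointing at that directory
$ DEFINE/SYSTEM/EXECUTIVE_MODE/TRANSLATION_ATTRIBUTES=(CONCEALED,TERMINAL) -
     DMQ_NODE_LOG DSA1:[DMQ_LOGS.'node'.]
$! Log files can then be referenced as DMQ_NODE_LOG:[000000]name.LOG
```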

Wim
Thomas Ritter
Respected Contributor

Re: MessageQ automatic failover

Paul, we are a big user of BEA MessageQ V5.0 RP28 on VMS 7.3-2 on a 4-node VMS cluster. What are you trying to achieve by failing a DMQ group over from one node to another? We run separate DMQ groups on each node, each with a unique name, and use DECnet for cross-group communications. Two of our point-to-point interfaces use a dedicated DMQ group. In the event of a failure such as a node crash, we have procedures which move that group to another node and point the interfaces to the new node. Nothing automatic about it.
Are you using journaling for all your transactions (DQF) ?

Hans Adriaanse
Advisor

Re: MessageQ automatic failover

Hi Paul,

We had the same problem. We solved it by specifying the logfile during startup of the BMQ bus/group. Sample code:

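$! BUS and GROUP are assumed to be defined earlier in the procedure
$ BUSGROUP = "0500_00001"
$! Node-specific log directory name: <bus>_<group>_<nodename>
$ LOGDIR = BUSGROUP + "_" + F$GETSYI("NODENAME")
$! Translate DMQ$ROOT once, so no quoted lexical call sits inside a string
$ ROOT = F$TRNLNM("DMQ$ROOT")
$ IF F$SEARCH("DMQ$DISK:[''ROOT'.LOG]''LOGDIR'.DIR") .EQS. "" THEN -
    CREATE/DIRECTORY DMQ$DISK:['ROOT'.LOG.'LOGDIR']
$ @DMQ$EXE:DMQ$STARTUP 'BUS' 'GROUP' "" Y "" DMQ$DISK:[DMQ$V50.USER.'BUSGROUP'] -
    DMQ$DISK:['ROOT'.LOG.'LOGDIR']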

I hope this is clear. It should solve your problem.

Good luck,
Hans
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Thanks guys. The purpose of failover is that in the event of a crash or maintenance the messages still arrive at the applications. I know I can have a manual failover between the nodes, but I really wanted it to be automatic, as it is advertised as a feature of the middleware.
Hans, does this actually solve the problem? It's just that BOTH my nodes currently think they are the primary, when only the first one up should. Are you using automatic failover?
BTW, I would ask BEA, but because I pay my maintenance dollars via HP they refuse to provide any support, even access to their website. Go figure...

[Wim, we have airports here so come on over. But unfortunately very little VMS...]

Cheers,
Peejay
[See you in Nashua!!]
Hans Adriaanse
Advisor

Re: MessageQ automatic failover

Hi Paul,

In our case it solves the problem. That both nodes think they are primary is strange. The mechanism they use is a clusterwide lock. The lock is named DMQ$PRIMARY_GRP_bbbb_ggggg, where bbbb is the bus and ggggg is the group (see http://edocs.bea.com/tuxedo/msgq/vmsconf/chap13.htm for the documentation). This lock is requested in exclusive mode, so it is not possible for 2 nodes to hold the same lock. The lock is per bus/group, so it is possible to have an active bus/group on one node and another active bus/group on another node.
With ANALYZE/SYSTEM you should be able to find these locks and see what their status is. This should tell you where the primary is running.
At our site it works fine.
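
A rough sketch of checking for the lock described above from a command procedure. The bus and group values and the scratch file name are placeholders, and zero-padded decimal for the bbbb and ggggg fields is an assumption, not something the documentation quoted here confirms. ANALYZE/SYSTEM needs CMKRNL privilege:

```dcl
$! Sketch only: bus/group values and the scratch file name are placeholders.
$! Assumes the bbbb/ggggg fields of the lock name are zero-padded decimal.
$ bus = 500
$ group = 1
$ lock_name = F$FAO("DMQ$PRIMARY_GRP_!4ZL_!5ZL", bus, group)
$! Capture the SDA lock summary in a scratch file
$ DEFINE/USER_MODE SYS$OUTPUT SYS$SCRATCH:DMQ_LOCKS.TMP
$ ANALYZE/SYSTEM
SHOW LOCK/BRIEF
EXIT
$! Look for the primary lock of this bus/group
$ SEARCH SYS$SCRATCH:DMQ_LOCKS.TMP "''lock_name'"
$ DELETE/NOLOG SYS$SCRATCH:DMQ_LOCKS.TMP;*
```

If the brief display shortens resource names, searching for a shorter string such as DMQ$PRIMARY may be needed instead.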

Good luck,
Hans
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Thanks Hans.
Yes, it is strange, that's why I was wondering if anyone had managed to get this to work on Itanium - especially as there are several new 'features' to the Itanium version (like menus not working etc).
As I am in Nashua this week I won't be able to investigate the locks, but I will on my return. I'll keep you posted!
Cheers,

PJ
Jeffrey Goodwin
Frequent Advisor

Re: MessageQ automatic failover

Paul,

We use failover groups extensively on Alpha without any issues. Your explanation of how it works in your original post is correct.

You mention the complaint about the logfiles. There is an issue with VAX/Alpha MessageQ with logfiles and failover groups. If you reboot a cluster member more than four times, the logfile open on the active group cannot be purged and the group doesn't start.

Here's what we did to get around the issue:
************
File SYS03:[DMQ$V50.EXE]DMQ$START_SERVER.COM;2
144 $!FSC if f$search("''log_file'") .nes. "" then -
145 $!FSC purge/keep=4/nolog 'log_file'
146 $
******
File SYS03:[DMQ$V50.EXE]DMQ$START_SERVER.COM;1
142 $ if f$search("''log_file'") .nes. "" then -
143 purge/keep=4/nolog 'log_file'
144 $
************

-Jeff

Jeffrey Goodwin
Frequent Advisor

Re: MessageQ automatic failover

Paul,

I had another thought on your problem. We've had an issue where cluster transitions on nodes not running DMQ groups would trigger a secondary group into an attempt to become primary.

We'd see these messages in the DMQ log:

DmQ E 05:31.0 Checkpoint open failed -- Group already running on another node.
DmQ E 05:31.0 %RMS-E-FLK, file currently locked by another user

We sent this to HP, but the problem was not repeatable and very rare. We never got a fix. It's possible the port to Itanium has further exposed the issue.

-Jeff
Klaes-Göran Carlsson
Frequent Advisor

Re: MessageQ automatic failover

Hi

Not an answer on your question, but...

One of my customers has split the DMQ groups: 3 groups running *only* CLS on 3 different VMS nodes (1 CLS group per node). These CLS groups are configured with automatic failover. Then they have some "normal" DMQ groups with multireader queues. No failover on these groups. This makes sure the clients can always send messages: one of the CLS groups will always receive a message and confirm it, no matter if one of the CLS nodes is down.

/Klaes-Göran
Everyone is talking about Nashua, what's that? Just a joke or?
Volker Halle
Honored Contributor

Re: MessageQ automatic failover

re: Klaes-Goeran,

Nashua is not a joke, it's the home of OpenVMS Engineering !

Last week, about 200 people from all over the world visited the OpenVMS Bootcamp, where lots of the OpenVMS engineers gave presentations about current and future features of our beloved operating system.

Volker.
Jan van den Ende
Honored Contributor

Re: MessageQ automatic failover

Klaes-Goeran:

to give (just) an indication: many Engineers were actively presenting, but if you came up with a specific issue, EVERY Engineer was available at the next meal to sit at your table and go to just about any depth that suits you (I did just that a previous time).

5 of the top-10 VMS ITRC members were there.

Nearly everyone from OpenVMS.ORG was there.

A LOT of what is coming soon, AND OF WHAT is being considered, was presented (but, like all present, I also signed the Non-Disclosure Agreement, so I will have to wait with those jewels till Engineering brings them out).

Got you interested?
Same week next year is the next occasion.
Maybe see you there and then?
Be prepared for an exhausting, but beautiful week!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Hi gents,

I've attached a word document which shows the output from both nodes starting the same group/bus combination. I've highlighted where they both think they are the primary.
There are subsequently no DMQ processes on the second node, so when I stop DMQ on the first node there is no failover.
Jeffrey, looks like I have the same problem as you, except I can repeat it at will!! Let me know if you hear from HP, please.

Re. Nashua, if you are contemplating doing ANY VMS-related training next year, I would thoroughly recommend the Nashua Bootcamp. It ain't basic, it is darn hard work, and it costs some (especially coming from this side of the world), but the value in learning from the actual VMS engineers, in meeting others using VMS around the world, in meeting and supping with ITRC contributors, and in being part of the all-around camaraderie that is VMS makes it the highlight of my professional career to date.
So there.
Hans Adriaanse
Advisor

Re: MessageQ automatic failover

Hi Paul,

I am wondering what the situation of the lock is in this case. The primary group should own the BMQ_... lock exclusively and the second node should be waiting for that lock. This can be seen in SDA. If that is correct, this looks like a bug in DMQ.

Good luck,
Hans
PS today starts my holiday, so no responses from me for the next 2 weeks.
Jeffrey Goodwin
Frequent Advisor

Re: MessageQ automatic failover


I had my DMQ expert look at your logs. We doubt it's a problem with your DMQ$INIT.TXT, but you might want to post it anyway.

One thought we had was that you need to be using the same DMQ installation and not DMQ installations on separate disks. Are you on one disk?

BTW, this is what the logs should look like:

COM_SERVER 24-DEC-2005 12:12:57.12 I Begin initial configuration database load
EVENT_LOGGER 24-DEC-2005 12:12:57.30 I Event Logger Initialized
DMQ_LOADER 24-DEC-2005 12:12:57.87 I Loader is running
DMQ_LOADER 24-DEC-2005 12:12:57.92 I Loader is exiting
COM_SERVER 24-DEC-2005 12:12:57.92 I Group 1205 (G1205) Initialized
COM_SERVER 24-DEC-2005 12:12:57.92 I This group is waiting to become the primary

-Jeff
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Hi again,

I've attached several files here:
1) The DMQ$INIT.TXT file
2) A zip file of the output from a SDA>SHOW LOCK/BRIEF - I can see some DMQ locks, but not the one mentioned - any ideas?
3) The output from the startup of the COM server, so you can see that the GROUP_SYNCHRONIZE logical is defined

In case it helps, the idea is that there will be two groups on this cluster: group 1 for the application PPL3 and group 5 for the application LAB2. I want each to be able to run on either node. The cluster is actually 3 nodes: the Itanium servers LOKI and HELA, which are rx2620s, and an Alpha DS10 which is used only as a tie-breaker for votes. (The cluster is spread over 3 locations on the site; the virtual disks are all volume shadowed, with one shadow set member in each of the locations housing the rx2620s.) DMQ does not run, nor is it loaded, on the DS10.
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Attachment 2 - the locks.
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Attachment 3 - the com_server startup log
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Me again.

I find that on my standalone Alpha server the lock dmq$primary_grp_bbbb_ggggg is, indeed, granted. But on my Integrity cluster servers it is not.
If anyone out there in VMS land is running DMQ on Integrity, even on a standalone node, can you see whether this lock exists on your server please?

[$ assign/user qq.qq sys$output
$ anal/sys
SDA> show lock/brief
SDA> exit
$ search qq.qq dmq$pr ]

Methinks either I'm configured wrong, or DMQ has a flaw...
Many thanks,

PJ
Thomas Ritter
Respected Contributor

Re: MessageQ automatic failover

Paul,
You wrote
"BTW, I would ask BEA, but because I pay my maintenance dollars via HP they refuse to provide any support, even access to their website. Go figure "

Your support contract sounds identical to ours. You are able to log a call with HP, and HP can then contact BEA if need be.

Regards
Tom
Jeffrey Goodwin
Frequent Advisor

Re: MessageQ automatic failover


Paul,

- We use DECnet as the transport for most of our failover groups. For the TCPIP failover groups we do have, we use a different endpoint for each node. You have the same endpoint for both nodes in each group.

- We have MessageQ support through HP and we've found their support to be acceptable. I don't believe that MessageQ for Itanium is released yet. If whoever got it for you can't help you, I suggest you contact Gene.Spadi@hp.com.

- You never really stated whether both nodes access the DMQ installation from the same disk.

-Jeff/Anthony
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Thanks Jeff, I'll play with the endpoints. I think we had a standard of the endpoint being a function of the bus/group number; I'll see what happens when I change them.
Yes, both nodes use the same DMQ disk. The disk is a shadowset made up of a member local to each server.
I downloaded the Itanium software from BEA's public website, but there's no info with it. I have been in touch with John Egolf, who I believe is the HP/BEA relationship manager, but no joy as yet.

Thanks for your help.
PJ
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Hi all,
Just in case anyone else wants to use DMQ on Itanium with automatic failover, the issue was a programming error - the IA64 port simply didn't set the lock it was supposed to. So there is a patch available now, if you find yourself in the same boat.
Big thanks to Justin and the engineering team for finding the issue and managing the resolution.
Jeffrey Goodwin
Frequent Advisor

Re: MessageQ automatic failover


Great news Paul. Where can one get the patch?

-Jeff
Paul Jerrom
Valued Contributor

Re: MessageQ automatic failover

Jeff (and anyone else, for that matter),

The patch is mq-v501-rp4. I got it from HP support, but I don't think it is on the ftp site anymore. Let me have an email address and I'll flick it to you. [Like previous rolling patches, this is relatively small, as it patches the kit you already have. So you'll need the main kit before you can apply this patch to it.]
My email is paul.jerrom@bluescopesteel.com

Have fun.