
VMS Failover

 
Kelly Phillipps
New Member

VMS Failover

I have two non-clustered nodes with a TCP cluster alias address that I would like to fail over between. Users are required to connect to an address, so I need an IP address, not a hostname, to move between the two systems. How can I do this? I see the address defined on both systems when I issue a show interface /cluster, and when I issue an arp -a command I see the MAC address of one of the servers. However, the alias does not fail over when the host tied to that MAC address dies unless I change the interface with an ifconfig command.

Thanks,
Kelly
12 REPLIES
Bart Zorn_1
Trusted Contributor

Re: VMS Failover

Hi Kelly,

I am afraid that you are expecting a little bit too much of OpenVMS.

You cannot expect an IP cluster alias to work between non clustered systems.

That said, I think you mean to say that your systems ARE clustered. But we would need more information. Which versions of OpenVMS and TCP/IP are you using? Can you show how you configured the cluster alias?

Regards,

Bart Zorn
Volker Halle
Honored Contributor

Re: VMS Failover

Kelly,

As Bart said, the TCPIP cluster alias is based on the OpenVMS cluster functionality.

You may be able to implement your own 'poor man's alias' by writing a background batch job that PINGs the other node and, if the PING fails, activates the 'cluster' alias on the local node. But this will not be as perfect as a real cluster alias.
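
A minimal sketch of such a background job, written in Python purely for illustration. The names `watch_peer`, `probe` and `activate_alias` are hypothetical: `probe` stands for whatever PING check you use and `activate_alias` for whatever command defines the alias on the local node. Requiring several missed probes in a row is an assumption added here so that a single lost PING does not trigger a takeover.

```python
import time
from typing import Callable


def watch_peer(
    probe: Callable[[], bool],            # hypothetical: True if the peer answered
    activate_alias: Callable[[], None],   # hypothetical: defines the alias locally
    max_misses: int = 3,                  # assumed threshold, not from the thread
    interval_s: float = 5.0,              # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the peer until it misses max_misses probes in a row, then claim the alias.

    Returns the total number of probes sent before the takeover.
    """
    misses = 0
    probes = 0
    while misses < max_misses:
        probes += 1
        if probe():
            misses = 0          # any good answer resets the counter
        else:
            misses += 1
        if misses < max_misses:
            sleep(interval_s)   # wait before the next probe
    activate_alias()
    return probes
```

Note that this job cannot tell a dead peer from a broken network path between the two nodes, which is one reason it is weaker than a real cluster alias.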

TCPIP V5.4 has introduced failSAFE IP, which replaces the traditional TCPIP cluster alias, but to allow failover between 2 nodes, those nodes also need to be clustered.

Volker.
Willem Grooters
Honored Contributor

Re: VMS Failover

A TCP cluster alias is only feasible in a cluster, as stated by others. However, I don't see real use for it at all, since neither the current TCPIP (V4) nor the next generation (V6) is capable of handling the cluster concept.
You will need to rely on external (= non-VMS-like) solutions, involving the METRIC service and an (external) DNS server, and a name for the combination of those unrelated machines.
The DNS server will need to allow periodic updates to translate the cluster NAME to a specific machine, by its IP address, based on the METRIC outcome, by time, or by whatever scheme you wish to use.
The advantage is that the machines do not need to be clustered - it's the "simple", *x-like solution. I don't say "bad" - it seems to work in that environment, but it's definitely NOT the VMS way of doing things, where synchronisation isn't even an issue.

If TCPIP had the cluster awareness of DECnet, we wouldn't need this. One of those missed opportunities :-(

Willem
Willem Grooters
OpenVMS Developer & System Manager
Ian Miller.
Honored Contributor

Re: VMS Failover

You need a third system to monitor availability (as is done for the METRIC, load broker stuff) to make this work properly. I guess you can have the standby node connect to the live node and, when the connection breaks, modify the config with an ifconfig command.
____________________
Purely Personal Opinion
Mobeen_1
Esteemed Contributor

Re: VMS Failover

Kelly,
I think this should probably help you. Years back I was confronted with a similar issue, and this is the solution we adopted:

1. An application that was very critical
required it to be tied to an IP address
rather than a host name.

2. I had 2 stand alone servers, one was live
and the other backup.

3. In efforts to fail over this application
or switch the application from Server A
to Server B, what we did was define
a secondary IP address

To give you a little more detail: each of my servers, Serv A and Serv B, had its own IP address. The application that needed an IP address pointed to a secondary IP address on Server A (so Server A had 2 IPs). When failover of the application from Server A to Server B was required, we would shut down the application on Server A, remove the secondary IP from Server A, add the same IP to Server B as a secondary, and start the application on Server B.
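
The order of those four steps matters, so here is a small illustrative sketch in Python. The function names and the `execute` callback are hypothetical; `execute` stands for however you run a command on a given host, and the action strings are labels, not real commands.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (host, action label)


def plan_failover(old_host: str, new_host: str) -> List[Step]:
    """Return the failover steps in the order described above."""
    return [
        (old_host, "stop application"),
        (old_host, "remove secondary IP"),
        (new_host, "add secondary IP"),
        (new_host, "start application"),
    ]


def run_failover(
    old_host: str,
    new_host: str,
    execute: Callable[[str, str], None],  # hypothetical: runs one action on one host
) -> None:
    """Carry out the plan strictly in order, one step at a time."""
    for host, action in plan_failover(old_host, new_host):
        execute(host, action)
```

Removing the address from the old server before adding it to the new one avoids having the same IP active on both systems at once.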

Let me know if any clarifications are needed

regards
Mobeen
Kelly Phillipps
New Member

Re: VMS Failover

What a great response! I appreciate the information.

I took the cluster apart recently and now have two totally independent nodes. I have written a C program that assesses the application health on these two nodes and initiates a failover. I am using decent to do some of the inter-node communications (using TCP to see if TCP is ok did not make as much sense). I was looking for tricks to play so I could get a cluster alias working, and hoped that a TCP service would alias without a VMS cluster. For those wondering: a VMS cluster is more reliable than a stand-alone system, but I un-clustered because sometimes there are issues that cannot be totally eliminated and that affect the whole cluster. Most applications cannot be completely distributed, so clustering is best for them, but mine can be completely redundant once I get a redirector working to route the requests to the systems that work.
Thanks,
Kelly
Stanley F Quayle
Valued Contributor

Re: VMS Failover

Hmmm. Decnet is "decent". I knew that all along!

Seriously, I've been involved with doing redundant VMS systems that aren't clustered -- to avoid the cluster transition time. The time can be made smaller, but not zero. There are some applications that can't take that. Process automation, for one.

It's a lot harder than you think.

What happens if DECnet connectivity between the nodes fails? Who "wins" the race condition, or do both become "primary"? And how do you force one node to become the primary so you can do maintenance on the other?

I've seen this handled with a special Q-bus arbitration card. I've also seen a pair of programmable controllers connected via serial ports.

In all cases, it seems to need a third system to be the "tiebreaker". Of course, if that node fails, then what?

http://www.stanq.com/charon-vax.html
Keith Parris
Trusted Contributor

Re: VMS Failover

> I've been involved with doing redundant VMS systems that aren't clustered -- to avoid the cluster transition time. The time can be made smaller, but not zero. There are some applications that can't take that. Process automation, for one. <

I've started a discussion of this issue in another thread, entitled "Real-time process control in an OpenVMS Cluster environment".
Ian Miller.
Honored Contributor

Re: VMS Failover

"In all cases, it seems to need a third system to be the "tiebreaker". Of course, if that node fails, then what?"

You have to have the third node (which does not have to be a VMS system - it can be other hardware like Stanley said). If that node fails then the application stays how it is and you lose the ability to automagically switch to standby.
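
A rough sketch of that rule in Python, for illustration only. The function and parameter names are hypothetical, and `None` is used here as an assumed way to say "the tiebreaker itself could not be reached".

```python
from typing import Optional


def standby_should_take_over(
    peer_seen_by_standby: bool,
    peer_seen_by_tiebreaker: Optional[bool],  # None: tiebreaker unreachable
) -> bool:
    """Decide whether the standby may claim the service.

    The standby only takes over when both it and the tiebreaker
    agree that the live node is gone.
    """
    if peer_seen_by_standby:
        return False  # live node is fine from here, nothing to do
    if peer_seen_by_tiebreaker is None:
        return False  # tiebreaker down: stay as is, no automatic switch
    return not peer_seen_by_tiebreaker
```

If only the link between the two nodes fails, the tiebreaker still sees the live node and the standby stays put, so both do not become "primary".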
____________________
Purely Personal Opinion
Kelly Phillipps
New Member

Re: VMS Failover

The application is serving a TCP request and either system can respond, which makes this an easier task than some of the challenges others have listed here. The watchdog program I have created runs on both hosts and watches both hosts, so there are four examinations being done. As long as any one node is providing a good response, the system does not take action; it only notifies. If the system that was responding goes bad, then both systems will attempt to remove the alias on the formerly good system and define it on the fail-to system. Then the "bad" system will be reset (even rebooted if it is called for). Since TCP may be the problem, I thought it would not be good to use TCP to monitor TCP. In my tests there seems to be little ill effect (at least the customers don't see it) when both systems have the alias defined; it seems to go to the last one who grabbed it. I think I am going to add a heartbeat on the serial interface and call it good at this point. The next generation is a third system (as suggested in this thread) that will be a forwarder, which examines the responses and sends requests to both boxes or omits a system if it fails to return a proper response. I thank all for the great responses. I am still pondering them.
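
For readers who want the decision rule spelled out, here is one possible reading of it as a Python sketch. This is not the actual C watchdog; the function name, the action strings, and the handling of the case where no host is healthy are all assumptions made for illustration.

```python
from typing import Dict, List


def decide(alias_holder: str, healthy: Dict[str, bool]) -> List[str]:
    """Return the ordered actions for one watchdog pass.

    alias_holder is the host currently answering on the alias;
    healthy maps each host name to the result of its health check.
    """
    others = [h for h in healthy if h != alias_holder]
    if healthy[alias_holder]:
        # The serving host is fine: only report a bad standby, take no action.
        return [f"notify: {h} is unhealthy" for h in others if not healthy[h]]
    target = next((h for h in others if healthy[h]), None)
    if target is None:
        # Assumed behaviour: with nowhere to go, just raise the alarm.
        return ["notify: no healthy host to fail over to"]
    return [
        f"remove alias on {alias_holder}",
        f"define alias on {target}",
        f"reset {alias_holder}",
    ]
```

Because the same function runs on both hosts with the same inputs, both watchdogs arrive at the same list of actions, which matches the note above that both systems attempt the same alias move.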

Kelly
Wim Van den Wyngaert
Honored Contributor

Re: VMS Failover

The article Keith is talking about (Google found it, the HP search didn't ...)
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=861945

Wim
Wim
Stanley F Quayle
Valued Contributor

Re: VMS Failover

How about assigning some points to the responses?

Pointer to help on points:
http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Thanks in advance.
http://www.stanq.com/charon-vax.html