Operating System - OpenVMS
Cluster IP failover - notification possible in some way?
03-04-2008 01:22 AM
I have the following problem:
In an OpenVMS cluster, I have configured a cluster IP address (no METRIC/LBROKER configuration) plus failSAFE IP. failSAFE IP can start a procedure (SYS$MANAGER:TCPIP$SYFAILSAFE.COM) when one of the monitored interfaces on a single node fails, but it does not seem to be possible to start a procedure when a node fails and the cluster IP address becomes active on another member of the cluster.
My requirement is that interactive users log in to the node where certain processes are running. If the node running these processes crashes, the processes should be restarted automatically on another node (which is easy, of course), and at the same time users should be diverted to that node instead.
I can see messages in OPERATOR.LOG such as the following:
%%%%%%%%%%% OPCOM 4-MAR-2008 16:45:10.83 %%%%%%%%%%%
Message from user INTERnet on H3A01
%TCPIP-I-FSVALNOTVALID, IE0 10.200.1.138 take ownership
%%%%%%%%%%% OPCOM 4-MAR-2008 16:45:10.83 %%%%%%%%%%%
Message from user INTERnet on H3A01
%TCPIP-I-FSIPADDRUP, IE0 10.200.1.138 alias active on node H3A01, interface IE0
and I could of course scan for these messages and take some action, but I thought there should be a cleverer way...
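A minimal sketch of that scanning approach, in Python: it picks the %TCPIP-I-FSIPADDRUP lines out of OPERATOR.LOG text and reports whether the alias most recently came up on a given node. It assumes the message text looks exactly like the excerpt above (other TCP/IP Services versions may word it differently); the function names are hypothetical, and reading the log file and restarting the processes are left out.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, copied from the OPERATOR.LOG excerpt:
# "%TCPIP-I-FSIPADDRUP, IE0 10.200.1.138 alias active on node H3A01, interface IE0"
FSIPADDRUP = re.compile(
    r"%TCPIP-I-FSIPADDRUP,\s+(?P<intf>\S+)\s+(?P<addr>\d+(?:\.\d+){3})\s+"
    r"alias active on node\s+(?P<node>\w+)"
)


class AliasUp(NamedTuple):
    interface: str
    address: str
    node: str


def alias_up_events(lines: Iterable[str]) -> Iterator[AliasUp]:
    """Yield one AliasUp for every FSIPADDRUP message in the log lines."""
    for line in lines:
        m = FSIPADDRUP.search(line)
        if m:
            yield AliasUp(m["intf"], m["addr"], m["node"])


def alias_moved_to(lines: Iterable[str], address: str, local_node: str) -> bool:
    """True if the latest FSIPADDRUP message for `address` names `local_node`."""
    last_node = None
    for event in alias_up_events(lines):
        if event.address == address:
            last_node = event.node
    return last_node == local_node
```

A job on each node could call alias_moved_to() with its own node name and start the processes only where it returns True.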
Does anyone have an idea to that problem, or even better, solved the same problem before?
Thanks in advance!
Cheers
Matthias
03-04-2008 01:57 AM
Re: Cluster IP failover - notification possible in some way?
What tasks are these interactive users performing?
Some random thoughts: It might be easier to communicate outwards from the host and raise the session(s) or X Window displays "the other way" from your current connection, or it might be feasible to communicate within the cluster and not have to be co-resident with the host running the applications or it might be feasible to have the application running everywhere and simply use the cluster as a cluster and not as one big fail-over server.
I might also ask why the users need be on the box. Might a remote management interface either for the host or for the application(s) be a better solution?
You might well insinuate some processing into the FailSAFE locking, but that's probably not going to be considered documented and supported. Locking is the usual solution for coordinating a primary process and multiple secondaries.
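To illustrate the primary/secondary locking pattern in general terms: on OpenVMS this would be done with the distributed lock manager ($ENQ), which works cluster-wide. The Python sketch below substitutes POSIX flock() on a shared lock file, which only coordinates processes on one host, so it shows the shape of the pattern rather than a cluster solution. The lock file path and the function names are hypothetical.

```python
import fcntl
import os
from typing import Optional


def try_become_primary(lock_path: str) -> Optional[int]:
    """Try to take an exclusive lock without waiting.

    Returns the open file descriptor (keep it open to stay primary), or
    None if another holder already owns the lock (stay secondary).
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def wait_to_become_primary(lock_path: str) -> int:
    """Secondary path: block until the current primary releases the lock."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    # The kernel drops a flock() lock when its holder exits, so a crashed
    # primary hands the lock to one of the waiting secondaries.
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd
```

A secondary would call wait_to_become_primary() and start the application when it returns.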
I'd also look to use DNS, and have the host running the applications also advertise itself via DNS. Or you might configure your DNS and simply run applications on multiple hosts in the cluster. (Most folks don't actually use an IP address directly, they translate via DNS, and DNS and DNS appliances can have some other uses here.)
Some web appliances can provide this sort of capability, as well.
You might be able to use LAN Failover, which was added circa V7.3-2, too.
That TCPIP$SYFAILSAFE.COM doesn't fire reliably seems odd. I'd expect it to fire on whichever node in the cluster the IP address lands on. (Pardon the question, but is FailSAFE configured to operate across all of the nodes in the cluster?)
What might work best here (and what won't work so well) depends on your environment; on what control you have over the applications and application code, over the ability of the application to operate in parallel (is the IP address the limit here, or the application?) and other bits of background on your particular situation. Some added details and background, please?
Stephen Hoffman
HoffmanLabs LLC
03-04-2008 01:57 AM
Re: Cluster IP failover - notification possible in some way?
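$ ! Poll the ARP entry of the cluster alias; when its hardware address
$ ! changes, the alias has moved to another node. "alias" stands for the
$ ! alias host name. Untested.
$ set noon                           ! keep looping even if DIFF fails
$ ucx ping alias                     ! make sure there is an ARP entry
$ define/user sys$output wim.lis     ! baseline for the first compare
$ ucx show arp alias
$b:
$ ucx ping alias                     ! refresh the ARP entry
$ define/user sys$output wim.lis
$ ucx show arp alias                 ! writes a new version of wim.lis
$ diff wim.lis                       ! compare with the previous version
$ if $status .ne. %x006c8009         ! anything but "no differences"
$ then
$    @what_you_like_when_it_changes
$ else
$    delete wim.lis;                 ! discard the identical new version
$ endif
$ wait 00:01
$ goto b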
When the hardware address changes, the alias has moved. I haven't tested it because I'm stuck on old stuff.
Wim
03-04-2008 02:24 AM
Re: Cluster IP failover - notification possible in some way?
We're running an in-house application and are not using X Window. The users log in interactively via plain old TELNET sessions, and parts of the interactive programs rely on the existence of mailbox devices provided by the background processes.
Now, we have configured a multi-site cluster and I am trying to automate (or at least semi-automate) a failover situation.
The VT emulators of course have DNS names for that IP address in their session profiles, and the failover of the IP address works really well. But I would like to restart the processes on the particular node where the IP address was brought up. Imagine an 8-node cluster, 4 nodes per site: you cannot really predict on which member of the backup site the IP address is brought up. DNS zone updates would be an option, but I have to ask our network team whether I will be allowed to do that. Running BIND on OpenVMS could be another option, although I doubt I will get permission for that...
Regarding failSAFE: yes, it is configured throughout the cluster. From the documentation on TCPIP$FAILSAFE and the logs, it appears that the procedure is only started when an interface(!) fails, not when the node fails, because it is only fired when certain polls fail locally. The IP address migrates to another node, but there is no trace whatsoever in the failSAFE logs, only in OPERATOR.LOG...
Cheers
Matthias
03-04-2008 02:27 AM
Re: Cluster IP failover - notification possible in some way?
But this might be the last resort: if all else fails, I will test it then...
Cheers
Matthias
03-04-2008 02:35 AM
Re: Cluster IP failover - notification possible in some way?
Wim
03-04-2008 02:41 AM
Re: Cluster IP failover - notification possible in some way?
You could conceivably swap out the mailboxes for another communications protocol (e.g. ICC, TCP, DECnet). This has the advantage of allowing remote access, too.
Why even use telnet? Why not connect your management tools to the web, or to TCP, or to multicast UDP for periodic status update messages, or...
Yeah, I know, no application changes. :-)
You could set up a mailbox relay application (if the mailbox protocol uses one mailbox for reads and one for writes, this relay can likely be insinuated into the environment easily), or the code could open remote mailboxes using DECnet.
BTW: If you do decide to migrate your command and control protocol, realize that TCP streams have requirements around how you pass over datagrams, if you're going to swap from your existing mailbox datagram protocol over to a TCP stream. You have to build the arriving datagram from the bytes as the bytes arrive; don't expect the datagram will always arrive as one entire I/O, and don't expect that one I/O is necessarily one datagram.
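A minimal sketch of that reassembly, in Python. It assumes each message is sent with a 4-byte big-endian length prefix; that framing is an illustrative choice, not something from the thread. The receiver appends whatever each read returns to a buffer and only hands out a message once all of its bytes have arrived, so one read can yield zero, one, or several messages.

```python
import struct
from typing import List

# Assumed framing: 4-byte big-endian length, then the message bytes.
HEADER = struct.Struct("!I")


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


class Reassembler:
    """Rebuilds whole messages from arbitrarily split stream reads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add the bytes from one read; return every message now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages
```

The sender writes frame(msg) for each former mailbox message; the receiver passes every recv() result to feed() and processes what comes back.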
03-04-2008 05:32 AM
Re: Cluster IP failover - notification possible in some way?
The inter-node communication (non-cluster) will soon be replaced by RTR, and there we see it: the biggest problem is development resources.
So long
Matthias
03-04-2008 06:55 AM
Solution
We had some apps with somewhat similar restraints.
The way we solved this (also covering the fact that the app itself DID occasionally fail) was by using the QUEMANAGER locking.
We made the app start from a batch job.
Just before actually starting it, we did $ SET PROCESS/NAME=RUNNING_
But before getting there, we did a clusterwide scan for that process name.
If it was found, we got its queue entry number and synchronized on it. When SYNCHRONIZE fell through and the job had NOT been running on the local node, we waited a short delay (shorter on the preferred site) and looped back to looking for the process again, before changing the name to RUNNING_
So, if the job failed on the current node, it restarted immediately. If the node had gone down along with the process, another node of the preferred site (in our case, after 10 seconds) would activate the app. And if the whole site was gone (as happened once, during Millennium testing of the fire alarm), the wait was 15 seconds before the app restarted at the other site.
So, the end result was to have the app running on one node, and on every node a job (appropriately with process name
You will need some double checking after the SET PROCESS/NAME= to avoid timing issues; and you need to submit a new batch job to become SYNC after this SYNC has been promoted to RUNNING.
Of course, this general mechanism needs adapting to the specific app, but the general idea has served us well for many years. (Actually, the hardest part for the guys who had to decommission one of those apps while I was not available was that it was REALLY hard to kill unless you first killed ALL the SYNC jobs.)
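To make the timing concrete, here is a small Python model of the failover rules described above: restart at once on the same node, after 10 seconds on another node of the preferred site, after 15 seconds at the other site. It only computes which surviving node would re-check first; the batch jobs, process names and SYNCHRONIZE are not modeled, and the node and site names are hypothetical. Breaking ties by node name is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Delays (seconds) as described in the post.
SAME_NODE_DELAY = 0
PREFERRED_SITE_DELAY = 10
OTHER_SITE_DELAY = 15


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    up: bool = True


def restart_delay(candidate: Node, failed_on: Node, preferred_site: str) -> int:
    """Seconds a waiting SYNC job on `candidate` sleeps before re-checking."""
    if candidate.name == failed_on.name:
        return SAME_NODE_DELAY  # the app died but its node survived
    if candidate.site == preferred_site:
        return PREFERRED_SITE_DELAY
    return OTHER_SITE_DELAY


def next_runner(nodes: Iterable[Node], failed_on: Node,
                preferred_site: str) -> Optional[Tuple[str, int]]:
    """Return (node name, delay) of the surviving node that re-checks first.

    Ties are broken by node name here; in the real scheme the first job to
    rename itself wins, hence the double check after SET PROCESS/NAME.
    """
    live = [n for n in nodes if n.up]
    if not live:
        return None
    best = min(live, key=lambda n: (restart_delay(n, failed_on, preferred_site), n.name))
    return best.name, restart_delay(best, failed_on, preferred_site)
```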
Oh, and one of those apps DID communicate by way of IP mailboxes.
We had no way to get sufficient influence on the DNS server, so we also put the cluster node names in the (cluster-common) TCPIP$HOSTS, and made the startup of the server process also define an IP alias pointing to the actual node. Tell the app that the connection has to be made to the alias name, and you are in business.
hth
Proost.
Have one on me.
jpe
03-04-2008 07:30 AM
Re: Cluster IP failover - notification possible in some way?
Another solution is to move the alias around with the application (ifconfig intf alias xxx netmask 255.255.255.255). We have been using this for a year without problems. Combined with the lock file, this could be a solution for you.
Wim
03-04-2008 07:02 PM
Re: Cluster IP failover - notification possible in some way?
I had (have) the same problem. We use MessageQ which can be configured to run in a cluster, but a particular bus/group can only be active on one cluster member at a time. TCPIP$SYFAILSAFE.COM runs on the node giving up the address, which is okay in a controlled shutdown/failover but no good for a crash.
I'm told the TCPIP team is looking at implementing something that would fit the bill in a future version (TBA); this would probably involve OPCOM scraping, so you are looking at a solution similar to theirs! In the meantime, I have a DCL procedure on each cluster member that opens a network mailbox on the other nodes and issues a read; when the read completes with an error, I know the other node has disappeared for some reason and can act accordingly. Not perfect, but better than being rung at 3 in the morning.
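PJ's detector relies on a read that should never complete normally. A Python sketch of the same idea over a socket (standing in for the DECnet network mailbox; the function names are hypothetical): the peer never sends anything, so when the read returns, the connection has ended. One caveat for TCP: a node that crashes outright sends neither FIN nor RST, so the read may only complete once keepalive probes give up, which with default OS settings can take a long time; enable_keepalive() only switches keepalives on and leaves the timing to the OS.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the OS to probe an idle TCP connection (timing is OS-defined)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def wait_for_peer_loss(sock: socket.socket) -> str:
    """Block on a read the peer never satisfies; return why it completed."""
    try:
        data = sock.recv(1)
    except OSError as exc:  # e.g. ConnectionResetError, or a keepalive timeout
        return "error: " + type(exc).__name__
    if not data:
        return "closed"  # orderly shutdown by the peer
    return "unexpected data"
```

A watcher would run wait_for_peer_loss() on one connection per peer node and start its failover action when the call returns.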
Have fun,
PJ
Peejay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it can't be done with a VT220, who needs it?
04-17-2008 02:50 AM
Re: Cluster IP failover - notification possible in some way?
After a long time investigating the options, I have come to the conclusion that I will build a solution as outlined by Jan, but maybe a little different ;-)
In the end, we need to have some application startup process that defines the alias address on the interface and that should be it, I hope.
There is still potential for this not to work correctly, e.g. someone stops the queue on one node, and the processes fail over to the other node, including the definition of the IP alias. In that situation there might still be connections to this IP address on the originating node, so it will not move. Yes, I could use ifconfig -alias ... abort or whatever, but that doesn't help if the killed process never gets to that point...
Still, this solution is better than nothing!
A big THANKS to all who have answered!
Matthias