Re: How to Implement Application Switch in OpenVMS Cluster!

Feige · ‎02-28-2007

I have set up a two-node OpenVMS cluster (nodes A and B) on HP Alpha DS25 systems (OpenVMS V8.2, RISC processors), with a Gigabit Ethernet network interconnect and shared SCSI storage. A shared disk serves as both the quorum disk and the system disk. An image, APP.EXE, written in HP C++, resides on the shared disk.

When I start APP.EXE on node A and then shut down node A, APP.EXE does not automatically start on node B. Why not?

I remember that a Linux cluster has a cluster auto-switch script: as long as you add your application to the script, when the application runs abnormally or stops on one node, the cluster automatically starts it on another node.

So my questions are:
1) How can this be implemented in an OpenVMS cluster?

2) If it requires programming, could you share an example?

Thanks in advance!

Jon Pinkley · ‎02-28-2007

There is no way that I am aware of to save the state of a generic running program and have it continue execution on another node. If all you want is for a service to be available and running only a single simultaneous instance within the system, then there are several ways to do this.

The simplest, requiring no programming, is to run the program in a batch job, and have the batch job run on an autostart queue. If the job is submitted with /RESTART, it will automatically be restarted if it doesn't explicitly exit, and the restart can happen on a different node in the cluster.

Read

http://h71000.www7.hp.com/doc/731FINAL/4477/4477pro_012.html#clusterque

If you want more control than that gives you, note that the VMS queue manager is aware that a job has been restarted and sets DCL symbols ($RESTART, plus BATCH$RESTART if the job used SET RESTART_VALUE) that the batch job can check, similar in concept to the way a Unix forked process can tell it is a clone of another instance.

For the most control, there is the VMS distributed lock manager, which uses the system services SYS$ENQ (or SYS$ENQW), SYS$DEQ and SYS$GETLKI. See

http://h71000.www7.hp.com/doc/82FINAL/5841/5841pro_022.html#synch_accs_res_chap

In this case, you would have multiple instances of the program running, up to one per cluster node, but only one would be holding the lock and doing the processing. The others would be waiting for the lock.

See the VMS User's Manual chapters 13 - 16 for information about writing DCL command procedures (scripts). Chapter 16 is about batch jobs, and starting with 16.5.8 it discusses restartable batch jobs. Restartable batch jobs running in an autostart batch queue may be similar to the autoswitch script, but I have no knowledge of Linux Clusters, so that is pure guesswork.

it depends

Feige · ‎02-28-2007

Thank you for your quick response! I will try it.

Robert Gezelter · ‎02-28-2007

Feige,

OpenVMS clusters are significantly different than the various Linux clustering schemes.

What does your application do? If it is simply accessing files, then the combination of RMS and the lock manager may allow you to run two copies of the application simultaneously.

If the application must be master/slave, then the lock usage described in the previous posting will accomplish the goal.

More details about your application would be helpful in order to identify the correct answer.

-Bob Gezelter, http://www.rlgsc.com

Feige · ‎03-01-2007

Hi,Robert,

Thank you in advance!

The APP.EXE mentioned above was just an example. In fact, it is a command procedure, BBS.COM, whose main functions are:
1) Start 4 images (.EXE)
2) Define logical names
3) Access Oracle, for example, @Orauser instance

What's more, in the cluster environment the 4 images may run on only one node (A or B); they cannot run on nodes A and B at the same time.

Besides, the 4 images (.EXE) also fork other processes.

Robert Gezelter · ‎03-01-2007

Feige,

I try not to speculate. Without understanding the precise processing that is being done in the image, it is not possible to know whether it would be safe to run on both nodes simultaneously.

If it is a straightforward business application using Oracle for all of its data, it should be able to run on both nodes at the same time (the Oracle instances, if set up properly, will handle the database lock synchronization between the two members of the cluster).

Personally, if I were working on this project, I would take a good careful look at the four images to see precisely what they are doing.

- Bob Gezelter, http://www.rlgsc.com

John Gillings · ‎03-01-2007

Feige,

Automatically STARTING the application on another cluster node after a failure is very easy to do. Recovering state from a process lost for an arbitrary reason is NOT easy - I'd even go so far as to say intractable.

The simplest mechanism for automatic takeover is a "dead man switch". Using the Distributed Lock Manager, your application uses $ENQW to request an EX (exclusive) mode lock on an application-specific resource. The first process to request the lock will get it. Subsequent processes will wait on the lock. If the process holding the lock fails for any reason (programming bug, node crash, power failure, comms, anything) then another process will get it and continue.

The application needs to be written to realise that once it's got the lock, it should check around for evidence of a failed process and do whatever is necessary to recover.

This mechanism is very general. You can start as many "backup" processes on as many nodes as you like, and be sure that only one will ever get past the $ENQW. Also note that this is not a polled wait - it's event driven, so there is minimal delay in the standby process taking over, and almost zero overhead. By the sound of the mechanism you describe in Linux, it's a polling loop, checking that the processes on its list are running.

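The call would look something like this (with <starlet.h>, <lckdef.h> and <descrip.h> included; the lock mode is LCK$K_EXMODE and the resource name is passed as a string descriptor):

$DESCRIPTOR(resnam, "MY_APP_DEAD_MAN_LOCK");  /* resource name as a descriptor */
status = sys$enqw(0, LCK$K_EXMODE, &lksb, 0, &resnam,
                  0, 0, 0, 0, 0, 0, 0);
/* if status is odd (success), also check the status word in lksb */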

That said, in an OpenVMS cluster, rather than architecting your application as active/standby, it's usually better to make it active/active - that is have multiple processes working simultaneously and load balanced. If one process goes away, the other(s) are already working. Properly designed, this type of architecture can be easily scaled, whereas active/standby is typically limited to the power of a single node or process.

A crucible of informative mistakes

John Gillings · ‎03-01-2007

Oops, now that I've read some of the other responses, I see you want DCL...

That can be done too; in the simplest case it requires polling, using an RMS file as the "lock". For example:

DEADMAN.COM
$ IF p1.EQS."" THEN INQUIRE p1 "Command to execute"
$ IF p1.EQS."" THEN EXIT
$ IF p2.EQS."" THEN INQUIRE p2 "Lock file name"
$ IF p2.EQS."" THEN EXIT
$ IF p3.EQS."" THEN p3="00:00:10" ! Default retry interval (WAIT delta time): 10 seconds
$ IF F$SEARCH(p2).EQS."" THEN CREATE 'p2'
$ SET NOON
$ w="0"
$ Retry: wait 'w'
$ w=p3
$ DEFINE/USER/NOLOG SYS$OUTPUT NL:
$ DEFINE/USER/NOLOG SYS$ERROR NL:
$ OPEN/READ/WRITE/ERROR=Retry lock 'p2' ! NOSHARE
$ DEASSIGN/USER SYS$OUTPUT
$ DEASSIGN/USER SYS$ERROR
$ 'p1'
$ w="0"
$ GOTO Retry

A crucible of informative mistakes

Hoff · ‎03-01-2007

OpenVMS is designed to run applications entirely in parallel, across up to 96 nodes in a cluster.

One of the comparatively weird features of OpenVMS is that locking is built into the system APIs used by applications, meaning that multiple applications can have the same file open at the same time without stomping on each other. Transparently.

Unix tends to use "stub" locking files to coordinate application activity and to handle determining a primary process, while OpenVMS doesn't. OpenVMS uses the distributed lock manager when coordinating access and application activity, and uses RMS record locks -- basically a built-in hierarchical database -- to handle and to coordinate file access.

It's a slightly different approach to application design.

There are discussions on C and file sharing in the OpenVMS FAQ, among other places. There are also common C coding confusions listed there, too.

Stephen Hoffman
HoffmanLabs

Feige · ‎03-04-2007

Thanks everyone!

I will try it. After that, I will share the source code with you.
