
TCPIP$FAILSAFE on IA64

 
Paul Jerrom
Valued Contributor

Hi all,

I am currently setting up a cluster of rx2620s. The idea is that one application will normally run on one node, and the other application on the second node, but that the apps will fail over to the remaining node in the case of a server or network failure.
So far I have managed to set up TCPIP$FAILSAFE so that the addresses successfully fail over to the other NIC or server. However:
1) I do not get any messages in the FAILSAFE logfile SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_nodename.LOG telling me that anything has happened after a fail over or fail back. In fact, the only messages I get there say that the process has started. Is there a logical I can set to increase the log level so I can actually see FAILSAFE working?
2) The logical "TCPIP$SYFAILSAFE" is set to point to a command file, which should email me or send a REPLY message when FAILSAFE detects a fail over or fail back, but this command file is never invoked. Do I need to do something else to get this command file invoked?
[Long term what I want is to use the mechanism of the IP address failing over to fire up an application on the surviving node, and I thought TCPIP$SYFAILSAFE would be a good way of doing so.]

Versions are VMS V8.3 and TCPIP V5.6.

Cheers from sunny New Zealand.

PJ
Have fun,

Peejay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it can't be done with a VT220, who needs it?
7 REPLIES
John Gillings
Honored Contributor

Re: TCPIP$FAILSAFE on IA64

PJ,

Is the logical name defined /SYSTEM/EXEC? Try creating SYS$COMMON:[SYSMGR]TCPIP$SYFAILSAFE.COM, see if it gets executed from its default location.
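One way to test whether the procedure is invoked at all is to put a trace-only procedure in the default location. The following is a minimal sketch, not a recommended production procedure. The trace file name SYFAILSAFE_TRACE.LOG is hypothetical, and since this thread does not say which parameters failSAFE IP passes, P1 and P2 are simply recorded as received.

```dcl
$ ! TCPIP$SYFAILSAFE.COM - trace-only sketch to prove the procedure ran
$ ! Assumption: SYFAILSAFE_TRACE.LOG is a hypothetical file name; use any
$ ! location the failSAFE process is able to write to.
$ SET NOON
$ TRACE_FILE = "SYS$SYSDEVICE:[TCPIP$FSAFE]SYFAILSAFE_TRACE.LOG"
$ ! Append if the file exists, otherwise create it
$ OPEN/APPEND/ERROR=NEW_FILE TRACE 'TRACE_FILE'
$ GOTO WRITE_RECORD
$NEW_FILE:
$ OPEN/WRITE TRACE 'TRACE_FILE'
$WRITE_RECORD:
$ ! P1 and P2 are logged untouched; their meaning is not assumed here
$ WRITE TRACE F$TIME(), " invoked, P1=", P1, " P2=", P2
$ CLOSE TRACE
$ ! Also notify the operator terminals through OPCOM
$ REQUEST "TCPIP$SYFAILSAFE invoked: P1=''P1' P2=''P2'"
$ EXIT
```

If neither the trace record nor the operator message appears after a forced failover, the procedure is not being called, as opposed to being called and failing.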


Logical name TCPIP$FAILSAFE_LOG_LEVEL controls the volume of log messages sent to OPCOM and the log file. If the logical name is undefined or has a value of zero, the default log level is assumed. Larger values are used for debugging. This logical name is translated each time failSAFE IP logs a message.
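A sketch of setting and clearing the log level might look like this. The value 8 is only an example of a "larger" value, and defining the name /SYSTEM/EXECUTIVE_MODE is an assumption that mirrors the question about TCPIP$SYFAILSAFE above. Because the name is translated each time a message is logged, a restart of failSAFE IP should not be needed.

```dcl
$ ! Raise failSAFE IP logging verbosity (8 is an example value only)
$ DEFINE/SYSTEM/EXECUTIVE_MODE TCPIP$FAILSAFE_LOG_LEVEL 8
$ ! Confirm the table and access mode of the definition
$ SHOW LOGICAL/FULL TCPIP$FAILSAFE_LOG_LEVEL
$ ! Return to the default log level when finished debugging
$ DEASSIGN/SYSTEM/EXECUTIVE_MODE TCPIP$FAILSAFE_LOG_LEVEL
```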
A crucible of informative mistakes
Paul Jerrom
Valued Contributor

Re: TCPIP$FAILSAFE on IA64

Hi John,

How is Oz??
I have set the logfile level to 8. All I get is the following in TCPIP$FAILSAFE_node.LOG:

13-Mar-2007 15:09:46 DEBUG: Primary IE0 - 152.153.10.47
13-Mar-2007 15:09:46 DEBUG: Primary IE1 - 152.153.10.44
13-Mar-2007 15:09:46 DEBUG: Primary IE2 - 192.168.1.3
13-Mar-2007 15:09:46 DEBUG: Primary IE3 - 192.168.1.4
13-Mar-2007 15:09:46 DEBUG: ifname_list[0] - IE0 [152.153.10.47]
13-Mar-2007 15:09:46 DEBUG: ifname_list[1] - IE1 [152.153.10.44]
13-Mar-2007 15:09:46 DEBUG: ifname_list[2] - IE2 [192.168.1.3]
13-Mar-2007 15:09:46 DEBUG: ifname_list[3] - IE3 [192.168.1.4]
13-Mar-2007 15:09:46 INFO: Reading configuration file: SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE.CONF
13-Mar-2007 15:09:46 DEBUG: IE0 - netstat -rn
13-Mar-2007 15:09:46 DEBUG: IE0 - type SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_HELA_IE0_ROUTE_TABLE.TMP
13-Mar-2007 15:09:46 DEBUG: Primary IE0 - 152.153.10.47
13-Mar-2007 15:09:46 DEBUG: Primary IE1 - 152.153.10.44
13-Mar-2007 15:09:46 DEBUG: Primary IE2 - 192.168.1.3
13-Mar-2007 15:09:46 DEBUG: Primary IE3 - 192.168.1.4
13-Mar-2007 15:09:46 DEBUG: ifname_list[0] - IE0 [152.153.10.47]
13-Mar-2007 15:09:46 DEBUG: ifname_list[1] - IE1 [152.153.10.44]
13-Mar-2007 15:09:46 DEBUG: ifname_list[2] - IE2 [192.168.1.3]
13-Mar-2007 15:09:46 DEBUG: ifname_list[3] - IE3 [192.168.1.4]
13-Mar-2007 15:09:46 DEBUG: IE0 - netstat -rn
13-Mar-2007 15:09:46 DEBUG: IE1 - netstat -rn
13-Mar-2007 15:09:46 DEBUG: IE1 - type SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_HELA_IE1_ROUTE_TABLE.TMP
13-Mar-2007 15:09:46 DEBUG: Primary IE0 - 152.153.10.47
13-Mar-2007 15:09:46 DEBUG: Primary IE1 - 152.153.10.44
13-Mar-2007 15:09:46 DEBUG: Primary IE2 - 192.168.1.3
13-Mar-2007 15:09:46 DEBUG: Primary IE3 - 192.168.1.4
13-Mar-2007 15:09:46 DEBUG: ifname_list[0] - IE0 [152.153.10.47]
13-Mar-2007 15:09:46 DEBUG: ifname_list[1] - IE1 [152.153.10.44]
13-Mar-2007 15:09:46 DEBUG: ifname_list[2] - IE2 [192.168.1.3]
13-Mar-2007 15:09:47 DEBUG: ifname_list[3] - IE3 [192.168.1.4]
13-Mar-2007 15:09:47 DEBUG: IE1 - netstat -rn
13-Mar-2007 15:09:47 DEBUG: IE2 - netstat -rn
13-Mar-2007 15:09:47 DEBUG: IE2 - type SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_HELA_IE2_ROUTE_TABLE.TMP
13-Mar-2007 15:09:47 DEBUG: Primary IE0 - 152.153.10.47
13-Mar-2007 15:09:47 DEBUG: Primary IE1 - 152.153.10.44
13-Mar-2007 15:09:47 DEBUG: Primary IE2 - 192.168.1.3
13-Mar-2007 15:09:47 DEBUG: Primary IE3 - 192.168.1.4
13-Mar-2007 15:09:47 DEBUG: ifname_list[0] - IE0 [152.153.10.47]
13-Mar-2007 15:09:47 DEBUG: ifname_list[1] - IE1 [152.153.10.44]
13-Mar-2007 15:09:47 DEBUG: ifname_list[2] - IE2 [192.168.1.3]
13-Mar-2007 15:09:47 DEBUG: ifname_list[3] - IE3 [192.168.1.4]
13-Mar-2007 15:09:47 DEBUG: IE2 - netstat -rn
13-Mar-2007 15:09:47 DEBUG: IE3 - netstat -rn
13-Mar-2007 15:09:47 DEBUG: IE3 - type SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_HELA_IE3_ROUTE_TABLE.TMP
13-Mar-2007 15:09:47 DEBUG: Primary IE0 - 152.153.10.47
13-Mar-2007 15:09:47 DEBUG: Primary IE1 - 152.153.10.44
13-Mar-2007 15:09:47 DEBUG: Primary IE2 - 192.168.1.3
13-Mar-2007 15:09:47 DEBUG: Primary IE3 - 192.168.1.4
13-Mar-2007 15:09:47 DEBUG: ifname_list[0] - IE0 [152.153.10.47]
13-Mar-2007 15:09:47 DEBUG: ifname_list[1] - IE1 [152.153.10.44]
13-Mar-2007 15:09:47 DEBUG: ifname_list[2] - IE2 [192.168.1.3]
13-Mar-2007 15:09:47 DEBUG: ifname_list[3] - IE3 [192.168.1.4]
13-Mar-2007 15:09:47 DEBUG: IE3 - netstat -rn
13-Mar-2007 15:09:47 EVENT: started on node HELA:
Logfile : SYS$SYSDEVICE:[TCPIP$FSAFE]TCPIP$FAILSAFE_HELA.LOG
Monitoring: IE0,IE1,IE2,IE3
Info Poll : 3s
Warn Poll : 2s (1 retry)
Error Poll: 30s
Generate : mac
Rt Retry : 1s
13-Mar-2007 15:09:47 DEBUG: IE0 - 152.153.10.47 heartbeat
13-Mar-2007 15:09:48 DEBUG: IE1 - 152.153.10.44 heartbeat
13-Mar-2007 15:09:48 DEBUG: IE2 - 192.168.1.3 heartbeat
13-Mar-2007 15:09:48 DEBUG: IE3 - 192.168.1.4 heartbeat
$

It is now two hours later. I have shut down the other node, manually failed and unfailed interfaces, and so on, but nothing has been reported.

TCPIP$FAILSAFE_RUN.LOG just reports the routing tables every so often. I would expect an entry saying that the interface had failed over.

As for the command file, yes, it is defined /SYS/EXEC. It points to another logical for a device name, which is itself defined /SYS/EXEC.

$ sh log tcpip$syfailsafe/full
"TCPIP$SYFAILSAFE" [exec] = "CMN$MANAGER:TCPIP$SYFAILSAFE.COM" (LNM$SYSTEM_TABLE)
$ sh log cmn$manager/full
"CMN$MANAGER" [exec] = "DSA2:[SYSMGR]" (LNM$SYSTEM_TABLE)

I can log on locally as tcpip$fsafe and @ the command file, so it is not a protection problem. I have also tried copying the command file to sys$manager just in case, but no change. It would be nice if the logfile reported the reason why the command file can't be run...

Cheers,

PJ
John Gillings
Honored Contributor

Re: TCPIP$FAILSAFE on IA64

PJ,

If you suspect it's an issue accessing the log file, put an AUDIT and/or ALARM ACE on it.

(ALARM=SECURITY,ACCESS=R+W+E+D+C+S+F)
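As a sketch of how that ACE could be applied: the example below attaches it to the command procedure shown in PJ's SHOW LOGICAL output, which is an assumption about the object under suspicion. Substitute the log file specification if that is the file you want to watch. Alarms are only delivered if ACL events are enabled for auditing and a terminal is enabled for security operator messages.

```dcl
$ ! Make sure ACL-generated events raise security alarms
$ SET AUDIT/ALARM/ENABLE=ACL
$ ! Receive security alarms on this terminal
$ REPLY/ENABLE=SECURITY
$ ! Attach the alarm ACE; the file specification is taken from the thread
$ SET SECURITY/ACL=(ALARM=SECURITY,ACCESS=READ+WRITE+EXECUTE+DELETE+CONTROL+SUCCESS+FAILURE) -
      CMN$MANAGER:TCPIP$SYFAILSAFE.COM
$ ! Verify the ACE is in place
$ SHOW SECURITY CMN$MANAGER:TCPIP$SYFAILSAFE.COM
```

Any attempt to access the file, successful or not, should then produce an alarm, which shows whether failSAFE IP ever tries to open it.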

Otherwise, time to log a case.

If all you're trying to do is fail over an application, you're probably better off using deadman locks. They are simpler, more reliable, and have much broader scope. You may need some logic after obtaining the lock to determine if you're "it" as far as failsafe is concerned.
Paul Jerrom
Valued Contributor

Re: TCPIP$FAILSAFE on IA64

Howdy,

No, I don't think it is a protection issue. I've granted the tcpip$fsafe account all privileges, and it has made no difference. I think Failsafe simply doesn't work as advertised!
Re deadman locks: I don't see why I should go to the effort, as the mechanism attributed to Failsafe should give me exactly what I want.
Is anyone out there using Failsafe on IA64? Is anyone getting messages in the logfile? Does anyone have the TCPIP$SYFAILSAFE command file mechanism working?
[As an aside, after I have manually failed and un-failed the interfaces several times, I lose my default gateway definition. Not sure if this is a Failsafe issue or something unrelated.]

Have fun.

PJ
Martin Hughes
Regular Advisor

Re: TCPIP$FAILSAFE on IA64

I have seen a similar discussion to this before. The bottom line is that the functionality you are looking for does not exist (yet): SYFAILSAFE does not get run on the node that is failed over to.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
Paul Jerrom
Valued Contributor

Re: TCPIP$FAILSAFE on IA64

Gah! The use of SYFAILSAFE has been documented in at least the last two versions of TCPIP for over a year!
Can you point me to that discussion please Martin?
Martin Hughes
Regular Advisor

Re: TCPIP$FAILSAFE on IA64

Hi PJ,

I can't find the earlier discussion I was referring to; it may not have been in the ITRC forums. If you have a support contract with HP, I would encourage you to log a call and get an official response.

Just to clarify my earlier point: my understanding is that SYFAILSAFE runs on the node where the interface fails. In the event of a system crash, however, SYFAILSAFE is not invoked on the other cluster nodes, which suggests it is not designed to be a trigger for application failover.