Operating System - OpenVMS


 
Accullise
Frequent Advisor

Reg. LAN switchover time...

Hi friends,

We have four Alpha nodes and one Itanium node. The four Alpha nodes form a Memory Channel cluster, and all five nodes are in a LAN cluster.
If I switch off the Memory Channel, how long does it take to switch over from MC to LAN? Are there any parameters other than RECNXINTERVAL to consider when calculating the time?

Can anyone briefly explain this concept?

----------------------------------------------


If I connect all nodes to an independent switch with some IP addresses and run cluster_config_lan.com (where I think I have to give the nodes some alias name), will it work as an independent cluster, so that no node leaves the cluster if a network disturbance occurs?

Can anyone tell me about this?

Regards,
Saravanan.M
Wim Van den Wyngaert
Honored Contributor

Re: Reg. LAN switchover time...

Check the SYSGEN parameter TIMVCFAIL, which is 16 seconds (on 7.3). I don't understand what you are asking in the second part.

Wim
Volker Halle
Honored Contributor

Re: Reg. LAN switchover time...

Saravanan.M,

the SCS messages between the nodes travel via the VC (Virtual Circuit). The VC is multiplexed over the available and functioning channels, the so-called ECS (Equivalent Channel Set). Between the Alpha nodes, which are connected to the Memory Channel, most of the SCS traffic may go via the Memory Channel, but between the Alphas and the Itanium, data travels via the LAN anyway. You can check the channels and their counters with SCACP to determine which channels are used most. You can also change the priority settings to direct the traffic the way you want.

If you turn off Memory Channel, you should not see any effect on the VCs between the cluster nodes.

In a cluster, you should always have a secondary interconnect between ALL nodes, i.e. a separate switch to which ALL nodes are connected. In your case, the Itanium system will probably crash with CLUEXIT if there is any disturbance on the network lasting more than RECNXINTERVAL seconds.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Reg. LAN switchover time...

For the second part: the cluster protocol will fail over to the independent switch (or that switch will be the permanently selected communication path). So, whatever happens on the network or on the Memory Channel, the cluster will continue via the independent switch.

The alias?

Wim
Andy Bustamante
Honored Contributor

Re: Reg. LAN switchover time...

>>>If i switch off the memory channel, how much time it takes to switch over from MC to lan.

By default, cluster traffic uses all connections. Your system doesn't fail over; it's active/active. The systems favor the channel with the least latency.

>>> In a cluster, you should always have a secondary interconnect between ALL nodes, i.e. a separate switch, to which ALL nodes are connected.

A second NIC in each system and another switch provide an easy reliability increase. There is no need to configure another cluster traffic path; VMS is configured to work out of the box. Make sure the network team doesn't simply set up another VLAN on the same switch that your present traffic is using.

Andy Bustamante
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Volker Halle
Honored Contributor

Re: Reg. LAN switchover time...

If you 'turn off' the Memory Channel interconnect, the SCS port driver is most likely notified immediately, with no timeout involved. Cluster traffic would continue undisturbed via the available channels.

In case of a failure on the LAN, there may be a timeout of up to 8-9 seconds (2-3x Hello Timer), depending upon the nature of the failure.

Can you explain what 'the problem' is? Why are you asking these questions? Did you experience 'a problem' and want to know why it happened?

Volker.
Accullise
Frequent Advisor

Re: Reg. LAN switchover time...

Thanks, friends.


If the Memory Channel is disturbed, does the system hang (for the RECNXINTERVAL period), or does it keep working via the LAN? Can we access the nodes during that time?
Also, if one node is disturbed, the system waits in the cluster for the RECNXINTERVAL time (20 sec). Will the node crash after that time or not?
If not, please explain briefly why.


Regards,
Saravanan.M
Volker Halle
Honored Contributor

Re: Reg. LAN switchover time...

Saravanan.M,

it depends on the nature of the failure.

If the Memory Channel interface in one of the nodes fails, it may cause that system to crash immediately. If only the communication via MC fails, the systems should continue undisturbed, as long as the LAN interconnect is correctly configured and working on ALL nodes. You can check at any time with SCACP> SHOW CHAN whether all PEDRIVER channels (LAN) are working.

If a node has a problem (e.g. code looping at high IPL), the other nodes will hang for RECNXINTERVAL seconds before dropping that node from the cluster and continuing. If the loop somehow ends and the node tries to re-establish cluster communications with the surviving nodes, it will be forced to take a CLUEXIT crash and reboot.
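The timing described above can be pictured with a small Python sketch. It is only an illustration of the explanation in this reply, not OpenVMS code: the function name `node_outcome` is made up, the 20-second default is the RECNXINTERVAL value quoted earlier in the thread, and the exact behaviour at the boundary is an assumption.

```python
from typing import Optional


def node_outcome(outage_seconds: Optional[float],
                 recnxinterval: float = 20.0) -> str:
    """Rough outcome for a node that stops talking to the cluster.

    outage_seconds is how long the node was unreachable; None means
    communication never comes back. Hypothetical helper for illustration.
    """
    if outage_seconds is None:
        # The others drop the node after RECNXINTERVAL; with no contact
        # there is nothing to force a CLUEXIT on the silent node.
        return "removed by the other nodes; no CLUEXIT until it reconnects"
    if outage_seconds < recnxinterval:
        # Back in time: the other nodes were only stalled while waiting.
        return "connection re-established; node stays in the cluster"
    # Back too late: it was already removed, so it must crash and reboot.
    return "removed by the other nodes; CLUEXIT crash on reconnect"


print(node_outcome(5))
print(node_outcome(45))
print(node_outcome(None))
```

During the waiting period the other nodes are stalled, which is why the cluster appears to hang for up to RECNXINTERVAL seconds.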

Volker.
Accullise
Frequent Advisor

Re: Reg. LAN switchover time...

Thanks Volker,


But three months ago I tested with an Itanium and a DS20 in a LAN cluster. When I unplugged the interface from one node, the other node did not crash; it just remained hung. What is the reason for that? How long does it remain in this state?

After the RECNXINTERVAL time it should crash, but the node only remained hung. Why?
Volker Halle
Honored Contributor

Re: Reg. LAN switchover time...

You may need to supply more information about your cluster configuration for a more detailed answer, but here is a possible scenario:

If you unplug the LAN cable (which cable did you unplug?), the node will lose its connections to the rest of the cluster and hang waiting for quorum. The other nodes will time out this node and remove it from the cluster after RECNXINTERVAL seconds. Depending on VOTES, the other nodes may continue running. In a 2-node cluster with VOTES=1 each, the other node will also hang.
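The two-node case comes down to quorum arithmetic. Here is a minimal Python sketch, assuming the usual OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 with the fraction dropped, and ignoring quorum disks; the function names are made up for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, using integer division."""
    return (expected_votes + 2) // 2


def survivors_keep_running(votes_remaining: int, expected_votes: int) -> bool:
    """True if the nodes that can still see each other hold quorum."""
    return votes_remaining >= quorum(expected_votes)


# Two nodes with VOTES=1 each, one node unreachable: 1 vote left, quorum is 2.
print(survivors_keep_running(votes_remaining=1, expected_votes=2))

# Five nodes with VOTES=1 each, one node unreachable: 4 votes left, quorum is 3.
print(survivors_keep_running(votes_remaining=4, expected_votes=5))
```

With two one-vote nodes neither side can reach quorum alone, so both hang rather than crash. In the five-node case the remaining four continue.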

A node will only take a CLUEXIT crash if communication has been restored and the other nodes have already removed that node from the cluster.

Volker.