Shivkumar
Super Advisor

serviceguard

Dear Sirs,

We are using ServiceGuard to run an Oracle 9i cluster. The cluster has 2 nodes. One node went down due to some network problem. We are also using a quorum server.

In this situation, where only 2 nodes exist in the cluster, how does the reformation of a new group of nodes take place?

The server which went down has 2 network cards, and both were in the same subnet. Should I propose 2 LAN cards in different subnets or VLANs to avoid a network failure taking down ServiceGuard?

Thanks,
Shiv
13 REPLIES
Steven E. Protter
Exalted Contributor

Re: serviceguard

If properly configured, the node that went down rejoins the cluster when it comes back up.

Packages may go back to node two, depending on how they are configured.

You might want to look at the log file /var/adm/syslog/syslog.log and the package and cluster logs in /etc/cmcluster for more information.

I don't understand the question in your second paragraph, can you try to elaborate?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Shivkumar
Super Advisor

Re: serviceguard

My question is: "We need a minimum of 2 nodes to form a cluster. If only one cluster node is alive, how can the cluster work?"
Shivkumar
Super Advisor

Re: serviceguard

What does TOC mean, and how does it work?
Adisuria Wangsadinata_1
Honored Contributor
Solution

Re: serviceguard

Hi Shiv,

Your question: "We need a minimum of 2 nodes to form a cluster. If only one cluster node is alive, how can the cluster work?"

The main reason for a cluster system is to create a 'high availability' environment. If one system goes down, the other system takes over and users can continue their work with very little interruption (ideally zero interruption).

On your system with 2 nodes, if node A goes down, node B will take over the package from node A. So in your cluster it is a normal situation for the cluster to stay alive with only 1 node.

You need to check the cluster configuration (see Node_Switching_Parameters) with:

# cmviewcl -v

You will see which node is the primary and which is the alternate.

With 3 or more nodes, we can set a quorum rule (for example, at least 2 nodes must stay alive), partly in consideration of the load a single node would carry if the other 2 nodes went down. Check the cluster configuration with the 'cmgetconf' command.
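The quorum reasoning above can be made concrete with a small model. This is a minimal Python sketch, assuming the reformation rule commonly described for ServiceGuard: after a failure, a group of surviving nodes re-forms the cluster if it holds more than half of the previous membership; at exactly half it must win the tie-breaker (cluster lock disk or quorum server); with less than half it halts (TOC). The names `Outcome` and `reformation_outcome` are hypothetical, and the sketch models only the decision, not the heartbeat protocol or the lock acquisition itself.

```python
from enum import Enum


class Outcome(Enum):
    REFORM = "re-form cluster (majority)"
    REFORM_WITH_LOCK = "re-form cluster (won the tie-breaker)"
    TOC = "halt node (TOC)"


def reformation_outcome(previous_nodes: int, survivors: int,
                        won_tie_breaker: bool) -> Outcome:
    """Decide what a group of surviving nodes does after losing contact.

    previous_nodes: members of the cluster before the failure.
    survivors: members of this group that can still see each other.
    won_tie_breaker: whether this group obtained the cluster lock disk
    or the quorum server vote (only consulted at exactly 50%).
    """
    if not 0 < survivors <= previous_nodes:
        raise ValueError("survivors must be between 1 and previous_nodes")
    # Integer comparison avoids floating-point rounding at the 50% boundary.
    if 2 * survivors > previous_nodes:
        return Outcome.REFORM
    if 2 * survivors == previous_nodes:
        # Exactly half: the lock disk / quorum server breaks the tie.
        return Outcome.REFORM_WITH_LOCK if won_tie_breaker else Outcome.TOC
    # Minority: halt so two sub-clusters never run the same package.
    return Outcome.TOC


if __name__ == "__main__":
    # The situation in this thread: 2-node cluster, one node lost.
    print(reformation_outcome(2, 1, won_tie_breaker=True).value)
    print(reformation_outcome(2, 1, won_tie_breaker=False).value)
    # 3-node cluster, one node lost: 2 of 3 is a majority, no tie-breaker.
    print(reformation_outcome(3, 2, won_tie_breaker=False).value)
```

In a 2-node cluster the lone survivor always sits at exactly 50%, which is why the lock disk or quorum server matters there: it decides which node may carry on.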

TOC (transfer of control) is the system's way of releasing itself from a panic situation. The TOC process usually creates a fingerprint of the system data at the moment it runs, which lets HP engineers trace what happened when the system crashed or panicked.

A TOC can also be triggered manually. Older systems have a TOC button at the back; on newer ones, a TOC can be started by executing the 'TC' command from the GSP.

Hope this information can help you.

Cheers,
AW




now working, next not working ... that's unix
Adisuria Wangsadinata_1
Honored Contributor

Re: serviceguard

Addendum :

As I mentioned, TOC is the system's way of releasing itself from a panic situation. It does this by resetting the system and writing crash dump information to /var/adm/crash.

Usually, if the system panics, HP support will ask for the files under the /var/adm/crash directory and decode them.

Since a TOC can be done manually and leaves a fingerprint of what was going on when it started, it is not helpful to run a TOC when your system is healthy ... the fingerprint won't show anything unusual.
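Before sending the dump files to support, it helps to find the most recent one. The following is a small Python sketch that lists dump directories under /var/adm/crash, newest first. It assumes the dumps are saved as numbered subdirectories (crash.0, crash.1, ...), which is the usual savecrash layout but should be checked on your system; `CRASH_DIR`, `CRASH_NAME` and `list_crash_dumps` are illustrative names, not part of any HP tool.

```python
import os
import re
import sys
import time
from typing import List, Tuple

# Assumed default location; savecrash may be configured to use another one.
CRASH_DIR = "/var/adm/crash"
# Assumed naming scheme for saved dumps: crash.0, crash.1, ...
CRASH_NAME = re.compile(r"^crash\.\d+$")


def list_crash_dumps(crash_dir: str = CRASH_DIR) -> List[Tuple[str, float]]:
    """Return (path, mtime) for each crash.N directory, newest first."""
    try:
        entries = os.listdir(crash_dir)
    except FileNotFoundError:
        return []  # no dump directory at all
    dumps = []
    for name in entries:
        path = os.path.join(crash_dir, name)
        if CRASH_NAME.match(name) and os.path.isdir(path):
            dumps.append((path, os.path.getmtime(path)))
    # Most recently written dump first: usually the one support wants.
    dumps.sort(key=lambda item: item[1], reverse=True)
    return dumps


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else CRASH_DIR
    for path, mtime in list_crash_dumps(target):
        print(f"{path}  {time.ctime(mtime)}")
```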

Hope this information can help you.

Cheers,
AW
Ranjith_5
Honored Contributor

Re: serviceguard

Hi Shiva,

:***: In a two node cluster, only one node is enough to run the service if package switching is enabled on both nodes.

i.e., if a node goes down, the packages (services) will switch to the other node and your service will be available without much interruption. That's why it is also called a High Availability Cluster (HA Cluster).

:***: For high availability you have to connect the network cards to physically different networks rather than VLANs (that means you need to use 2 different switches/hubs for the connections).

:***: According to the traditional methods, a quorum server is not really required for a 2 node cluster; a cluster lock disk alone will do the work. A quorum server is normally used when the cluster contains 4 or more nodes.
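The first point above (one node is enough if package switching is enabled) can be illustrated with a simplified failover choice. This is a Python sketch only, assuming a package has an ordered node list (primary first, then alternates), a package-wide switching flag, and a per-node switching flag. The `Package` class and `choose_adoptive_node` function are hypothetical; the real decision is made by the ServiceGuard daemons and also depends on the configured failover policy and other settings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Package:
    name: str
    node_list: List[str]             # ordered: primary first, then alternates
    switching_enabled: bool = True   # package may move to another node at all
    # Per-node switching flag; nodes not listed are treated as enabled.
    node_switching: Dict[str, bool] = field(default_factory=dict)


def choose_adoptive_node(pkg: Package, up_nodes: Set[str],
                         failed_node: str) -> Optional[str]:
    """Pick the node that should start pkg after failed_node goes down.

    Walks the package's node list in order and returns the first node that
    is still up and allowed to run the package, or None if there is none.
    """
    if not pkg.switching_enabled:
        return None  # package stays down until an operator intervenes
    for node in pkg.node_list:
        if node == failed_node:
            continue
        if node in up_nodes and pkg.node_switching.get(node, True):
            return node
    return None


if __name__ == "__main__":
    ora = Package("oracle_pkg", ["nodeA", "nodeB"])
    print(choose_adoptive_node(ora, {"nodeB"}, "nodeA"))
    # Switching disabled on nodeB: the package has nowhere to go.
    ora_locked = Package("oracle_pkg", ["nodeA", "nodeB"],
                         node_switching={"nodeB": False})
    print(choose_adoptive_node(ora_locked, {"nodeB"}, "nodeA"))
```

The second case shows why switching must be enabled on both nodes: with it disabled on the surviving node, the cluster stays up but the package does not move.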




Regards,
Syam
Shivkumar
Super Advisor

Re: serviceguard

What are the commands to find out the following:

(1) How many network interfaces are being used by ServiceGuard (I see many network interfaces on the server)?

(2) Whether they are connected to the same switch or to different ones?

I ask because I noticed one of the ServiceGuard nodes is configured in one subnet.

Secondly, will network unavailability cause a panic and crash of the node?

Thanks,
Shiv
A. Clay Stephenson
Acclaimed Contributor

Re: serviceguard

# cmquerycl -c clustername -l net

MC/SG has no direct way of knowing how these are connected on the switch.

If you have multiple LAN cards defined, multiple switches, multiple heartbeat networks, etc. then the failure of any single component (NIC, switch, cable) should not trigger a TOC and should be considered a completely normal (more or less) and expected event.
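Since, as noted above, ServiceGuard cannot see how the cards are patched into the switches, one way to check for single points of failure is to keep a hand-maintained inventory of what each heartbeat path depends on. This Python sketch assumes such an inventory (the component names like "switch1" are made up); a single component can stop every heartbeat only if every path depends on it, so the function returns the intersection of the component sets.

```python
from typing import Dict, Set


def single_points_of_failure(paths: Dict[str, Set[str]]) -> Set[str]:
    """Return components whose individual failure breaks every heartbeat path.

    paths maps a heartbeat network name to the components it depends on
    (NICs on each node, switches, cables). A single component can take
    down all heartbeats only if every path depends on it, so the answer
    is the intersection of all the component sets.
    """
    if not paths:
        raise ValueError("no heartbeat paths defined")
    return set.intersection(*(set(parts) for parts in paths.values()))


if __name__ == "__main__":
    redundant = {
        "hb_lan1": {"nodeA:lan1", "switch1", "nodeB:lan1"},
        "hb_lan2": {"nodeA:lan2", "switch2", "nodeB:lan2"},
    }
    shared_switch = {
        "hb_lan1": {"nodeA:lan1", "switch1", "nodeB:lan1"},
        "hb_lan2": {"nodeA:lan2", "switch1", "nodeB:lan2"},
    }
    print(single_points_of_failure(redundant))
    print(single_points_of_failure(shared_switch))
```

An empty result means no single NIC, switch, or cable failure should take down all heartbeats; a non-empty one names the component that could trigger a TOC.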
If it ain't broke, I can fix that.
Shivkumar
Super Advisor

Re: serviceguard

What is the role of Hyperfabric cards in ServiceGuard? Are they different from the usual Ethernet network cards?
Deoncia Grayson_1
Honored Contributor

Re: serviceguard

You may want to check this link in regards to hyperfabric cards:

http://www.hp.com/products1/serverconnectivity/adapters/adapter06/specifications/index.html
If no one ever took risks, Michelangelo would have painted the Sistine floor. -Neil Simon
Geoff Wild
Honored Contributor

Re: serviceguard

What you might want to do is review this doc.

http://docs.hp.com/en/B3936-90079/index.html

It gives a lot of information on how ServiceGuard works.

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Mel Burslan
Honored Contributor

Re: serviceguard

Shiv,

I think you have misunderstood the subnet and/or VLAN concepts for ServiceGuard. Please take a look at the attached drawing before reading any further.

The cluster network ports (not the heartbeat connections, though) should be on the same subnet, and hence the same VLAN, so that all the ports can talk to each other if any one of them fails. At the same time, as shown in the drawing, they need to be cross-connected to two separate core switches (or to managed switches, where you can join ports from different devices and pool them into the same VLAN).

On the other hand, your heartbeat LANs should be connected to two separate switches for maximum redundancy, and they should not be on the same VLAN or anything. Just one heartbeat LAN from each server needs to go to each switch, so that if a switch fails, the backup heartbeat LAN survives, preventing a TOC reboot.
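The data LAN rule described here (primary and standby on the same subnet, but on different switches) is easy to check against a hand-kept inventory. This is a Python sketch under that assumption: the `Nic` record, the switch names, and `check_bridged_pair` are hypothetical, and the subnet is written as a network such as "10.1.1.0/24" because a standby card normally carries no address of its own until it takes over the primary's.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Nic:
    node: str
    name: str     # e.g. "lan0"
    subnet: str   # network the port is attached to, e.g. "10.1.1.0/24"
    switch: str   # switch the cable is patched into (recorded by hand)


def check_bridged_pair(primary: Nic, standby: Nic) -> List[str]:
    """Check a primary/standby data LAN pair against the described layout.

    Both ports must be on the same subnet (so the standby can take over the
    primary's address) but patched into different switches.
    """
    problems = []
    if ipaddress.ip_network(primary.subnet) != ipaddress.ip_network(standby.subnet):
        problems.append(f"{primary.node}: {primary.name} and {standby.name} "
                        "are on different subnets")
    if primary.switch == standby.switch:
        problems.append(f"{primary.node}: {primary.name} and {standby.name} "
                        f"share switch {primary.switch}")
    return problems


if __name__ == "__main__":
    lan0 = Nic("nodeA", "lan0", "10.1.1.0/24", "core1")
    lan1 = Nic("nodeA", "lan1", "10.1.1.0/24", "core2")
    print(check_bridged_pair(lan0, lan1) or "OK")
    same_switch = Nic("nodeA", "lan1", "10.1.1.0/24", "core1")
    print(check_bridged_pair(lan0, same_switch))
```

Note that `ipaddress.ip_network` rejects an entry with host bits set (e.g. "10.1.1.5/24"), so the inventory should record networks, not host addresses.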


Hope this makes it a little clearer.
________________________________
UNIX because I majored in cryptology...
Shivkumar
Super Advisor

Re: serviceguard

Mel Burslan, thanks for the clarification.