Problem ServiceGuard in a MSA500 Cluster Pack

SnIphe
Advisor

Hi all.

I'm installing ServiceGuard on two DL380 servers with an MSA500 disk array, which has a 4-port SCSI controller.
The OS is RedHat 4 AS U4, 32-bit.
The ServiceGuard version is 11.6.02

I get messages at startup time. They say something like:
- Process 'cmviewconf' is using obsolete setsockopt SO_BSDCOMPAT
- Process 'cmclconfd' is using obsolete setsockopt SO_BSDCOMPAT
- Process 'cmviewconfd' is using obsolete setsockopt SO_BSDCOMPAT
- Process 'cmviewconf' is using obsolete setsockopt SO_BSDCOMPAT
- Process 'cmrunmode' is using obsolete setsockopt SO_BSDCOMPAT

When I power on one of the servers, the OS startup stops at the 'cmrunmode' message. It stays stopped until the second server displays the same message. At that moment both servers continue booting normally. So I can start only one server at a time.

Do you know anything about this?

Thanks a lot.
14 REPLIES
Steven E. Protter
Exalted Contributor
Solution

Re: Problem ServiceGuard in a MSA500 Cluster Pack

Shalom,

So the cluster is non-functional until both nodes get the message?

Have you conducted any failover tests? I'm wondering if the SG is doing you any good here.

My search turned up nothing, but if the messages are meaningful as a group I'd say there is a software problem. I'd look into patching and checking whether this install is clean.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Serviceguard for Linux
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

The message is informational and is fixed in a later patch. You should use the latest patch version of the SGLX. Call support and they can help you get these latest bits.

I don't think that the hang is related to this message.

I am confused about your description. As I understand it, you boot one server, it hangs after those messages. Then you boot the other server, those messages are displayed and both servers start. If this is correct I cannot understand the statement "So I can start only one server at a time."

But the first thing to do is get the latest patch version.

SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, you are right -> "I can't start a single node", because the node stops at these messages.

I'm looking for a patch on this site: http://haweb.cup.hp.com/Support/Linux/RecmdPatches.html

I usually apply SGLX_00113. But this time I have two DL380s with 64-bit processors, and we have installed the 32-bit RedHat4-AS U4 because the customer asked for 32 bits.

Any idea which patch to apply? SGLX_00111 and SGLX_00046 are for IA32.

SG works well when both nodes are up. The same messages appear, but the cluster starts.


Thanks a lot.
Serviceguard for Linux
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

The latest patches available for RH4 32-bit are SGLX_00111 and SGLX_00046. You'll need both.
SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, thanks a lot. I think this is the problem.

I found a similar problem in the SAW, and it says the same thing: install the SGLX_00111 patch. But it says nothing about the SGLX_00046 patch. What exactly is the purpose of this patch?

On Monday I'll go to the customer to install another SG, so I'll tell you the results.

Thanks a lot.
Serviceguard for Linux
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

There is a link between the version of the serviceguard rpm and the sgcmom RPM. The second patch is necessary for that reason.
SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, and in what order should the patches be applied? SGLX_00111 first?

I ask because I looked at the patch RPMs and I think they can overwrite some other files.

Tomorrow is the day of the installation...

Thanks
Serviceguard for Linux
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

I believe the patch documentation has all the information about the order of installation. If not, I believe 00046 goes first. 00046 covers cmom and 00111 covers serviceguard.
SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, many thanks. Now everything works well. I no longer get these warnings and the cluster starts.

But now I have noticed another problem:
With both nodes down, I switch on one node and everything starts fine. The cluster waits for one minute, because I set AUTO_START_TIMEOUT to one minute in the configuration file.
But after that, once I reach the login screen, I run cmviewcl and find the cluster in the "unknown" state.
If I try to run cmruncl, it says that the cluster is waiting for the other node to start.

Is there another parameter that tells the cluster not to wait for the other node?
If the cluster and both nodes are up and running and I switch off one node, everything works OK, so I don't think it is a locklun problem.

Thanks a lot once again.
Serviceguard for Linux
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

This is the way it is designed.

A "single node" cluster will not start by itself. It will wait for another node. If you want a single node to start by itself, you can force a cluster to start manually. (I'm not 100% sure but it may be cmruncl).
SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, you are right, that is how SG is meant to work.
I can run cmruncl -n and the cluster will work with a single node.

OK thanks for the reply.
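
For reference, the manual sequence could be wrapped roughly like this. It is only a sketch: run, start_single_node and DRY_RUN are hypothetical names, not part of Serviceguard, and by default it only prints the commands instead of executing them.

```sh
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical helper that either prints or executes a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# start_single_node NODE: look at the cluster state, then form the
# cluster with only the named node.
start_single_node() {
    run cmviewcl
    run cmruncl -v -n "$1"
}

# Example (prints two "would run:" lines):
#   start_single_node ingrids1
```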

Now I'd like to know whether there is a script that I can put in the init level to run cmruncl -n automatically if the other node is down.
This cluster is going to a farm with no staff on site, and I need the cluster to come up if the power fails and only one node starts.

I'm sure there is a script that can solve this problem.

Thanks a lot.
Matti_Kurkela
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

If you automate the "cmruncl -n" command, your automation *MUST* include some way to ensure that the other node of the cluster is definitely down.

If the other node is active but isolated by a network problem, your "cmruncl -n" will effectively force a split-brain situation in the cluster: both nodes will access the package filesystems simultaneously, each node assuming that it's the only one. This will rapidly lead to filesystem corruption, which is not fixable by any means other than restoring from a backup.

This is exactly the reason why any ServiceGuard cluster node does not start in a single-node mode automatically.
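
As a rough illustration of that requirement, here is a minimal sketch, not a tested Serviceguard procedure. The names peer_seen_alive and decide_start are hypothetical, and so are the probe commands: each probe is assumed to exit 0 only when it can still see the other node. Failed network probes alone cannot tell a dead peer from an isolated one, so at least one probe should use a path that does not depend on the cluster network, for example the peer's power state queried out of band.

```sh
#!/bin/sh
# Sketch only: decide whether forming a single-node cluster is safe.
# Each argument is a probe command (a string) that must succeed only
# when the peer node can still be seen alive.

# peer_seen_alive: returns 0 if ANY probe still reaches the peer.
peer_seen_alive() {
    for probe in "$@"; do
        if sh -c "$probe" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# decide_start: prints the action the init script should take.
decide_start() {
    if [ "$#" -lt 2 ]; then
        echo "refuse"        # one probe cannot tell "down" from "isolated"
    elif peer_seen_alive "$@"; then
        echo "join"          # peer is up: take the normal cmrunnode path
    else
        echo "single-node"   # every independent probe failed
    fi
}

# Example with made-up host names for two separate network paths:
#   decide_start "ping -c 1 -w 2 ingrids2" "ping -c 1 -w 2 ingrids2-hb"
```

Only a "single-node" result would justify going on to "cmruncl -n"; anything else should leave the node waiting as it does by default.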

MK
SnIphe
Advisor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

OK, I understand you, this sounds very logical. Thanks.

But what if I put something like this in the cluster.init script?

#
# Check to see if the daemon is already running
#
findproc cmcld
if [ "$pid" = "" ]
then
    #
    # The daemon isn't running already
    #
    isnodeup ingrids2
    if [ "$node_status" = "down" ]
    then
        # Peer node is reported down: start the cluster with this node only
        action "Node ingrids2 is down, starting the cluster with node ingrids1 only"
        ${SGSBIN}/cmruncl -v -f -n ingrids1
        exit 0
    fi

    if [ -f ${SGSBIN}/cmrunnode ]
    then
        #
        # Attempt to join the cluster
        #

As you can see, on this node I check whether node "ingrids2" is up, and on the other node I check for "ingrids1".
This works OK, but if one node starts this way and I start the other node later, could I run into the problem you described? Is that right?

Thanks a lot.
skt_skt
Honored Contributor

Re: Problem ServiceGuard in a MSA500 Cluster Pack

Split-brain syndrome won't occur when you run cmrunnode to make the failed node part of the cluster again. When the failed node comes back online, both nodes will be able to communicate with each other.