Red Hat Cluster suite 5 question

 
Hi All,

The cluster package fails to fail over when I press the power button on node 1 directly, even though I have already upgraded the iLO firmware from 1.30 to 1.42.


Re: Red Hat Cluster suite 5 question

Is heartbeat still used in Red Hat Cluster Suite 5?
I would appreciate any reply or help.

thanks.
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

Probably node 2 cannot fence node 1 properly. What fence device are you using?
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?

Re: Red Hat Cluster suite 5 question

Hi,

I'm using HP iLO to fence both nodes. Both node 1 and node 2 are fenced through another server's HP iLO.

thanks.
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

What do you get in /var/log/messages? Have you confirmed that fencing is working?
Steven E. Protter
Exalted Contributor

Re: Red Hat Cluster suite 5 question

Shalom,

Test manually that you can connect to ilo and do a power reset on the server.

RH4 required a password be hard coded in the cluster.conf file. Use that password to manually test ilo.

This could be a simple problem of not having the correct authentication. It could also be a bad ilo card.

Please post the output of the following command from both nodes:

service fenced status

The iLO cards need to be on the same network and be able to talk to each other.
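
As a rough sketch of the manual test described above, the fence agent can be run by hand from one node against the other node's iLO. The address and credentials are placeholders, and the FENCE_AGENT override is a hypothetical convenience for dry runs. The -a, -l, -p and -o options are the ones I would expect to find in man fence_ilo, so check them against the version installed on your nodes.

```shell
#!/bin/sh
# Query or change the power state of a node through its iLO.
# Usage: ilo_fence <ilo-address> <login> <password> [action]
# The action defaults to "status", which only queries the power state.
ilo_fence() {
    addr=$1
    login=$2
    passwd=$3
    action=${4:-status}
    # FENCE_AGENT is a hypothetical override so the call can be dry-run.
    "${FENCE_AGENT:-fence_ilo}" -a "$addr" -l "$login" -p "$passwd" -o "$action"
}

# Example (placeholder address and credentials):
#   ilo_fence 192.0.2.11 fenceuser 'secret' status
#   ilo_fence 192.0.2.11 fenceuser 'secret' reboot
```

If the status query fails when run from node 2 against node 1's iLO with the same credentials that are in cluster.conf, fencing from the cluster is likely to fail the same way.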

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Re: Red Hat Cluster suite 5 question

Hi all,

How should the physical connections for fencing be set up? Currently each node uses only two network ports, bonded together, and I use another server's iLO port to fence both nodes. Is this workable? All the IP addresses are configured in the same network segment.

Thanks.

Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

It does not matter as long as the ILO and your public network are in the same subnet. What about the messages? Is your ILO configuration correct?

Re: Red Hat Cluster suite 5 question

I have checked the configuration a few times and I'm sure it is configured properly. The cluster log shows no errors for the iLO. Besides the cluster log, is there another log for the iLO?

Re: Red Hat Cluster suite 5 question

This cluster suite is driving me crazy. Today I tried to set up the fence device using each node's own HP iLO port. Someone has set credentials on the iLO of node 2 (an ML370 G3), so now every time I press F8 it prompts for a username and password. I asked the owner of that machine, but they don't remember the username and password. Is there a way to reset the iLO?
Steven E. Protter
Exalted Contributor

Re: Red Hat Cluster suite 5 question

Shalom,

To reset iLO to factory defaults, see:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=845909

SEP

Re: Red Hat Cluster suite 5 question

Hi all,

Is this normal? I configured the HP iLO on the same segment as the public IP. When I press the power button on node 1, the node goes down, a few seconds later it powers up again by itself, and only then does the service fail over to node 2.
Ivan Ferreira
Honored Contributor
Solution

Re: Red Hat Cluster suite 5 question

I think it is. When you power down node 1, node 2 will try to fence it, and by default fence_ilo reboots the server. According to man fence_ilo, there is an option to power the server off instead of restarting it.

And yes, the service will fail over to node 2 and, unless you configure a failover domain with priority, it will stay there.

Re: Red Hat Cluster suite 5 question

Hi Ivan,

I have read the fence_ilo man page. The description says that fence_ilo -o sets the action, but where should I set it? Do I just open a terminal and run it, or do I need to hard-code it somewhere, e.g. in the cluster config file or the fence_ilo script?

Re: Red Hat Cluster suite 5 question

After changing the action from reboot to off in the fence_ilo script, the service is now able to fail over when I press the power button. If the power cable is unplugged, however, the service won't fail over. Is this a limitation of fence_ilo?
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

Yes, because the remaining node cannot confirm that the other node was fenced: with the power cable unplugged, the iLO itself has no power and cannot answer. You can then configure another fence device, such as Brocade if you use a SAN switch.

Re: Red Hat Cluster suite 5 question

I'm facing another problem here. According to /var/log/messages, after about 30 minutes the cluster service restarts the application by itself and reports return code 1. How do I debug this error?



Error log:

Mar 28 07:25:15 BA-GW1 clurgmgrd: [4751]: script:MDCS_QUERY_SERVICE: status of /etc/init.d/queryService failed (returned 1)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: status on script "MDCS_QUERY_SERVICE" returned 1 (generic error)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Stopping service service:MDCS_QUERY_SERVICE
Mar 28 07:25:15 BA-GW1 clurgmgrd: [4751]: script:MDCS_GATEWAY_SERVICE: status of /etc/init.d/mdcs failed (returned 1)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: status on script "MDCS_GATEWAY_SERVICE" returned 1 (generic error)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Stopping service service:MDCS_GATEWAY_SERVICE
Mar 28 07:25:15 BA-GW1 clurgmgrd: [4751]: script:MDCS_DBLOADER_SERVICE: status of /etc/init.d/dbloader failed (returned 1)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: status on script "MDCS_DBLOADER_SERVICE" returned 1 (generic error)
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Service service:MDCS_QUERY_SERVICE is recovering
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Stopping service service:MDCS_DBLOADER_SERVICE
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Service service:MDCS_GATEWAY_SERVICE is recovering
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Recovering failed service service:MDCS_QUERY_SERVICE
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Service service:MDCS_DBLOADER_SERVICE is recovering
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Recovering failed service service:MDCS_GATEWAY_SERVICE
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Service service:MDCS_QUERY_SERVICE started
Mar 28 07:25:15 BA-GW1 clurgmgrd[4751]: Recovering failed service service:MDCS_DBLOADER_SERVICE
Mar 28 07:25:16 BA-GW1 clurgmgrd[4751]: Service service:MDCS_GATEWAY_SERVICE started
Mar 28 07:25:16 BA-GW1 clurgmgrd[4751]: Service service:MDCS_DBLOADER_SERVICE started



Also attached is a sample of the script used to start the application.
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

script:MDCS_QUERY_SERVICE: status of /etc/init.d/queryService failed (returned 1)


You must identify why the status part of the script is not returning 0. Run the script manually outside red hat cluster suite and ensure that you always get exit code 0.
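
One way to do that check, sketched under the assumption that the init script accepts a status argument as the log lines suggest: call it repeatedly and record every exit code, so that an intermittent non-zero result is caught. The function name, interval and log path here are illustrative only.

```shell
#!/bin/sh
# Run "<script> status" <count> times, <interval> seconds apart, and
# append a timestamped line with each exit code to <logfile>.
# Returns 0 if every check returned 0, non-zero otherwise.
poll_status() {
    script=$1
    count=$2
    interval=$3
    logfile=$4
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$script" status >/dev/null 2>&1
        rc=$?
        echo "$(date '+%b %d %H:%M:%S') status rc=$rc" >> "$logfile"
        if [ "$rc" -ne 0 ]; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    [ "$failures" -eq 0 ]
}

# Example: check every 30 seconds for an hour.
#   poll_status /etc/init.d/queryService 120 30 /tmp/queryService_status.log
```

Any line with rc other than 0 in the log marks a moment at which the cluster would have treated the service as failed and restarted it.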

Re: Red Hat Cluster suite 5 question

Sorry, I posted the old script. Attached is the new script that is used with the cluster.

Re: Red Hat Cluster suite 5 question

My script is actually located in /etc/init.d. When it runs under the cluster, the service restarts by itself after 30 minutes or less. I have tried running the script without the cluster, and everything seems normal: it does not stop or restart, and there is no error. I then moved the script out of /etc/init.d into another directory and ran it under the cluster, and the service restarted by itself after 22 minutes.
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

I would try logging the status check by adding a redirection to service_check.log, for example:

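statuses() {
    cd_path
    # Compare the PID stored in the pid file with the PID reported by the helper scripts
    if [ -e "$pidpath" ]; then
        if [ "$(cat "$pidpath")" -eq "$(.script/getGateway | .script/clusterpid)" ]; then
            # tee -a appends, so earlier checks are not overwritten
            echo "Process is running fine" | tee -a /var/log/service_check.log
            exit 0
        fi
    fi
    # Log the failure as well, so the time of a failed check is recorded
    echo "Process is not running..." | tee -a /var/log/service_check.log
    exit 1
}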
skt_skt
Honored Contributor

Re: Red Hat Cluster suite 5 question

What would be the difference between setting self_fence to zero and setting it to one?

Re: Red Hat Cluster suite 5 question

Hi Ivan,

I have tested your suggestion and added the tee -a /var/log/service_check.log line to the script. When I run the script through the cluster service, service_check.log appears, and with tail -f I can see "Process is running fine" being appended to it repeatedly. I can also see "Process is not running" logged once, at the time the service restarts. In the second scenario I start the script without the cluster service, and service_check.log does not appear in /var/log. Does this confirm that the script has a problem checking its own status?
Ivan Ferreira
Honored Contributor

Re: Red Hat Cluster suite 5 question

I have tested your suggestion and added the tee -a /var/log/service_check.log line to the script. When I run the script through the cluster service, service_check.log appears, and with tail -f I can see "Process is running fine" being appended to it repeatedly. I can also see "Process is not running" logged once, at the time the service restarts.

Then yes, it is a problem with the status section of the script.

In the second scenario I start the script without the cluster service, and service_check.log does not appear in /var/log.

The log file won't appear unless you run the script with the status option.
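
To illustrate why, here is a minimal sketch of how an init-style script dispatches on its first argument. It is not the poster's script: the function names, messages and the LOG variable are hypothetical. Only the status branch writes to the log, so running start by hand never creates the file, whereas the cluster resource manager calls status periodically.

```shell
#!/bin/sh
# Hypothetical log location; the thread uses /var/log/service_check.log.
LOG=${LOG:-/tmp/service_check.log}

do_start()  { echo "starting"; }
do_stop()   { echo "stopping"; }
do_status() { echo "Process is running fine" | tee -a "$LOG"; }

# Dispatch on the action word, as an init script does with "$1".
dispatch() {
    case "$1" in
        start)  do_start ;;
        stop)   do_stop ;;
        status) do_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2; return 2 ;;
    esac
}

# In a real init script the last line would be:  dispatch "$1"
```

Calling the script by hand with status as the argument exercises the same branch the cluster uses, so the log file then appears.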

Re: Red Hat Cluster suite 5 question

The nodes are unable to boot after moving all the connections to the core switch. Initially everything tested OK: the nodes booted, failed over when a machine was restarted, and even failed over when the power button was pressed. Once all the connections were moved to the core switch, the servers could no longer boot: each one starts up and then goes down as soon as the fenced daemon starts. When the LAN is isolated, this problem never occurs. I really have no idea why this is happening. As a reminder, I am using iLO as the fence device.