Cluster error in RHEL 5 servers

 
Santosh Balan
Occasional Contributor

Hi Friends,

I am getting the following errors on the cluster at my site. I am using the RHEL 5 cluster suite for HA of my database and web server. At some point each day, my cluster service restarts automatically. Can you please guide me on this issue? On checking my logs, i.e. /var/log/messages, I see the following:

Jul 6 18:18:11 DB01 clurgmgrd: [4350]: Failed to ping xxx.xxx.xxx.xxx
Jul 6 18:18:11 DB01 clurgmgrd[4350]: status on ip "xxx.xxx.xxx.xxx" returned 1 (generic error)
Jul 6 18:18:11 DB01 clurgmgrd[4350]: Stopping service service:mysql
Jul 6 18:18:11 DB01 clurgmgrd: [4350]: Executing /etc/init.d/mysql stop
Jul 6 18:18:19 DB01 clurgmgrd: [4350]: Removing IPv4 address xxx.xxx.xxx.xxx from bond0
Jul 6 18:18:19 DB01 snmpd[2238]: Connection from UDP: [127.0.0.1]:36318
Jul 6 18:18:29 DB01 clurgmgrd: [4350]: unmounting /data
Jul 6 18:18:29 DB01 clurgmgrd[4350]: Service service:mysql is recovering
Jul 6 18:18:29 DB01 clurgmgrd[4350]: Recovering failed service service:mysql
Jul 6 18:18:30 DB01 clurgmgrd: [4350]: mounting /dev/mapper/vg01-DB on /data
Jul 6 18:18:30 DB01 kernel: kjournald starting. Commit interval 5 seconds
Jul 6 18:18:30 DB01 kernel: EXT3 FS on dm-3, internal journal
Jul 6 18:18:30 DB01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 6 18:18:30 DB01 clurgmgrd: [4350]: Adding IPv4 address xxx.xxx.xxx.xxx to bond0
Jul 6 18:18:31 DB01 clurgmgrd: [4350]: Executing /etc/init.d/mysql start
Jul 6 18:18:33 DB01 clurgmgrd[4350]: Service service:mysql started


Thanks in advance. I look forward to your reply.

Thanks and Regards
Santosh Balan
9819419509
2 REPLIES
Steven E. Protter
Exalted Contributor

Re: Cluster error in RHEL 5 servers

Shalom Santosh,

Jul 6 18:18:11 DB01 clurgmgrd: [4350]: Failed to ping xxx.xxx.xxx.xxx
Jul 6 18:18:11 DB01 clurgmgrd[4350]: status on ip "xxx.xxx.xxx.xxx" returned 1 (generic error)

The heartbeat failed on the cluster.

Then mysql was stopped on one node and started on a second node.

Check network connectivity:

cables
NIC
switch
switch configuration
logs on the switch if they exist.

Replace or repair accordingly.
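
Since the log shows the service address on bond0, one quick way to check the NICs is to look at the link failure counters that the Linux bonding driver reports in /proc/net/bonding/bond0. Below is a minimal Python sketch, assuming the usual layout of that file ("Slave Interface:", "MII Status:", "Link Failure Count:" lines); the function name parse_bond_status is only illustrative.

```python
def parse_bond_status(text):
    """Parse bonding status text into {slave: {"mii_status", "link_failures"}}.

    Assumes the usual /proc/net/bonding/<bond> layout, where each slave
    section starts with a "Slave Interface:" line.
    """
    slaves = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
            slaves[current] = {"mii_status": None, "link_failures": 0}
        elif current is not None and line.startswith("MII Status:"):
            # Lines before the first slave describe the bond itself and are skipped.
            slaves[current]["mii_status"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Link Failure Count:"):
            slaves[current]["link_failures"] = int(line.split(":", 1)[1].strip())
    return slaves
```

On a cluster node you could pass it the contents of /proc/net/bonding/bond0. A slave whose link failure count keeps growing, or whose MII status is down, points at a cable, NIC, or switch port.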

If your heartbeat is on a public corporate network and not a private heartbeat network, collisions or network congestion could be triggering this.
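
To see whether congestion is a plausible cause, it may help to check whether the "Failed to ping" events fall at the same hour every day. Here is a rough Python sketch that counts them per hour from syslog lines shaped like the ones you posted. The regular expression and the function name failed_pings_by_hour are my own assumptions, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jul 6 18:18:11 DB01 clurgmgrd: [4350]: Failed to ping xxx.xxx.xxx.xxx
PING_FAIL = re.compile(
    r"^\w{3}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"clurgmgrd:?\s*\[\d+\]:?\s+Failed to ping"
)

def failed_pings_by_hour(lines):
    """Count rgmanager 'Failed to ping' events per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = PING_FAIL.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

You could feed it the lines of /var/log/messages (and the rotated copies). If nearly all failures land in one hour, compare that hour with backup jobs or other heavy traffic on the same network.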

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Santosh Balan
Occasional Contributor

Re: Cluster error in RHEL 5 servers

Hi Friends,

Thanks for your reply. However, I found that at one of my locations with the same installation this event does not happen, while at the other three locations it does.

Anyway, thanks for your support. Let me check the points mentioned above.

Thanks and Regards
Santosh Balan
9819419509