Red Hat AS3 Update 6 Cluster Suite

Mike Hedderly · 01-20-2006

I am running Red Hat Cluster Suite on two DL380 servers. When I run clustat, one member remains fairly stable, but the second member switches between active and inactive every 5 seconds or so. I have bonded my two NICs together and enabled an 802.1Q trunk, so my cluster traffic is on bond0.13. I can always ping each cluster member. This is the error I get in the messages file:

Jan 20 15:49:56 ralph clusvcmgrd[4311]: State change: huey-c UP
Jan 20 15:49:57 ralph clumembd[4144]: Member huey-c DOWN
Jan 20 15:49:58 ralph clumembd[4144]: Membership View #7350:0x00000001
Jan 20 15:49:59 ralph cluquorumd[4119]: --> Commencing STONITH <--
Jan 20 15:49:59 ralph cluquorumd[4119]: STONITH: Falsely claiming that huey-c has been fenced
Jan 20 15:49:59 ralph cluquorumd[4119]: STONITH: Data integrity may be compromised!
Jan 20 15:50:00 ralph clusvcmgrd[4311]: Quorum Event: View #12657 0x00000001
Jan 20 15:50:00 ralph clusvcmgrd[4311]: State change: huey-c DOWN
Jan 20 15:50:08 ralph clumembd[4144]: Member huey-c UP
Jan 20 15:50:12 ralph clumembd[4144]: Member huey-c DOWN
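To put a number on how often a member flaps, the clumembd lines in a log like the one above can be parsed. This is only a minimal sketch: the function names are made up for illustration, the year is assumed because syslog timestamps do not carry one, and the regular expression is written against the line format shown in this post only.

```python
import re
from datetime import datetime

# Matches clumembd membership lines in the format quoted above, e.g.
#   Jan 20 15:49:57 ralph clumembd[4144]: Member huey-c DOWN
# "Membership View" lines are deliberately not matched.
MEMBER_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+clumembd\[\d+\]:\s+"
    r"Member\s+(\S+)\s+(UP|DOWN)\s*$"
)


def membership_events(lines, year=2006):
    """Return (timestamp, member, state) tuples. The year is an assumption."""
    events = []
    for line in lines:
        m = MEMBER_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), m.group(3)))
    return events


def flap_intervals(events, member):
    """Return (old_state, new_state, seconds) for each state change of one member."""
    changes = [(t, state) for t, name, state in events if name == member]
    return [
        (prev_state, state, (t - prev_t).total_seconds())
        for (prev_t, prev_state), (t, state) in zip(changes, changes[1:])
        if state != prev_state
    ]


# Lines copied from the messages file excerpt above.
sample = [
    "Jan 20 15:49:57 ralph clumembd[4144]: Member huey-c DOWN",
    "Jan 20 15:49:58 ralph clumembd[4144]: Membership View #7350:0x00000001",
    "Jan 20 15:50:08 ralph clumembd[4144]: Member huey-c UP",
    "Jan 20 15:50:12 ralph clumembd[4144]: Member huey-c DOWN",
]

for old, new, seconds in flap_intervals(membership_events(sample), "huey-c"):
    print(f"huey-c {old} -> {new} after {seconds:.0f}s")
```

Run against the full /var/log/messages, the same functions would show whether the flapping interval is regular, which tends to point at a network or heartbeat issue rather than a failing node.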

Mike Hedderly · 01-20-2006

Some further information: I am running clumanager-1.2.28-1 and redhat-config-cluster-1.0.8-1.

I do not have any power switches, and the external disks are on an MSA1000 attached via Fibre Channel.

Steven E. Protter · 01-21-2006

Shalom Mike,

I don't think you've fully configured the cluster.

STONITH: Falsely claiming that huey-c has been fenced

STONITH stands for "Shoot The Other Node In The Head".

It's trying to shut down the other node because it thinks the node is down or that there is a risk of data corruption.

Checklist:
MSA1000 firmware is up to date.
SANsurfer package is installed on both servers to check the state of shared storage.
Shared storage is configured so the sd# devices are the same on both nodes.
Firmware on the QLogic cards is the same on all cards and all servers, and reasonably up to date.
Cluster configuration files.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Vitaly Karasik_1 · 01-21-2006

I suggest you re-check the cluster configuration against the Red Hat documentation (Chapter 3):
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/ch-software.html
Rgds,
Vitaly

John McNulty_2 · 01-21-2006

Thanks for the advice, but we found the problem. The STONITH errors were a red herring. This cluster has no power switches, so it's not possible to STONITH a node that the cluster perceives as having changed to a "down" state.

The cause of "huey" dropping in and out of the cluster every few seconds turned out to be a clash between two Red Hat clusters on the same network using the same 225.0.0.11 multicast address. We changed the multicast address to a unique one, reloaded the config, and restarted the cluster, and the problem has gone away. The cluster is stable now.
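When picking a replacement address, two things are worth checking: that it really is a multicast address (IPv4 multicast is 224.0.0.0/4) and that no other cluster on the network already uses it. The sketch below does both using Python's standard ipaddress module. The function name and the example addresses are placeholders, and the list of addresses already in use has to be collected by hand from the other clusters' configurations.

```python
import ipaddress


def check_multicast(addr, in_use=()):
    """Return a list of problems with a candidate cluster multicast address.

    `in_use` is a hand-maintained list of multicast addresses that other
    clusters on the same network are known to use (an assumption of this
    sketch; nothing discovers them automatically).
    """
    ip = ipaddress.ip_address(addr)
    problems = []
    if not ip.is_multicast:
        problems.append(f"{addr} is not a multicast address")
    if ip in {ipaddress.ip_address(a) for a in in_use}:
        problems.append(f"{addr} is already used by another cluster")
    return problems


# Placeholder addresses for illustration only.
other_clusters = ["239.192.0.11"]

for candidate in ("239.192.0.11", "239.192.0.12", "10.0.0.11"):
    issues = check_multicast(candidate, other_clusters)
    print(candidate, "OK" if not issues else "; ".join(issues))
```

An empty result only means the address passed these two checks; it does not prove that nothing else on the network is listening on it.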

It would have been nice if Red Hat had documented this somewhere. We only discovered what was going on after pinging the multicast address and seeing more DUP responses than we expected, from IP addresses belonging to the other Red Hat cluster.
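The ping check described above can be made less error-prone by listing which hosts answered and flagging any that are not cluster members. This is a rough sketch that assumes Linux iputils-style reply lines ("64 bytes from <address>: ..."); the sample output and all IP addresses are invented for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumes reply lines of the form "64 bytes from 10.0.13.2: icmp_seq=1 ...".
REPLY_RE = re.compile(r"bytes from ([0-9.]+):")


def responders(ping_output):
    """Count replies per source address in captured ping output."""
    return Counter(REPLY_RE.findall(ping_output))


def unexpected_responders(ping_output, expected):
    """Return sorted source addresses that are not known cluster members."""
    return sorted(set(responders(ping_output)) - set(expected))


# Invented sample output: two local members plus two hosts from another cluster.
sample_output = """\
64 bytes from 10.0.13.1: icmp_seq=1 ttl=64 time=0.05 ms
64 bytes from 10.0.13.2: icmp_seq=1 ttl=64 time=0.21 ms (DUP!)
64 bytes from 10.0.13.21: icmp_seq=1 ttl=64 time=0.34 ms (DUP!)
64 bytes from 10.0.13.22: icmp_seq=1 ttl=64 time=0.36 ms (DUP!)
"""

members = {"10.0.13.1", "10.0.13.2"}  # placeholder cluster member addresses
print(unexpected_responders(sample_output, members))
```

Any address in the result is a host outside the cluster that has joined the same multicast group, which is the symptom that exposed the clash here.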
