RH Cluster 4.5 without shared storage - cluster cannot start

SOLVED
Vitaly Karasik_1
Honored Contributor

I deployed RH Cluster 4.5 on two nodes without shared storage. As far as I understand, this is a supported configuration.
On both nodes I see the following in syslog:

ccsd[xxxx]: Unable to connect to cluster infrastructure after xxx seconds

and system-config-cluster says that the nodes aren't part of the cluster.

What might be the reason for this error?

8 REPLIES
Steven E. Protter
Exalted Contributor

Shalom,

I emailed you.

I just fixed this problem yesterday.

Change your boot kernel to .55 with no updates.

uname -a Linux tekoa 2.6.9-55.EL

That's the good one. After a full up2date or yum update (if you use CentOS), the kernel is left at 2.6.9-55.0.2.

For some reason that won't work with the cluster suite. You don't need shared storage, iLO is good, but I've got cluster monitor daemons and scripts posted to ITRC and can get them from the actual systems if you need them.

Nice to hear from you.

Shmuel
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Vitaly Karasik_1
Honored Contributor

Thanks!

I use the 2.6.9-42.0.3.EL kernel from RHEL and the cluster suite from the CentOS repositories.
ccsd starts OK, but reports a connection problem about 30 seconds later:

Jul 18 10:25:33 node1 ccsd[3211]: Starting ccsd 1.0.10:
Jul 18 10:25:33 node1 ccsd[3211]: Built: Jun 17 2007 17:38:08
Jul 18 10:25:33 node1 ccsd[3211]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jul 18 10:25:33 node1 ccsd: succeeded
Jul 18 10:26:02 node1 ccsd[3211]: Unable to connect to cluster infrastructure after 30 seconds.
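
When checking several nodes, it can help to pull just the timeout lines out of syslog. The following is a minimal Python sketch, assuming the message format is exactly as in the excerpt above; the function name `find_ccsd_timeouts` and the sample list are illustrative, not part of any cluster tool.

```python
import re

# Matches the ccsd timeout line as it appears in the excerpt above.
# Assumption: standard syslog prefix "Mon DD HH:MM:SS host ccsd[pid]:".
CCSD_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+ccsd\[\d+\]:\s+"
    r"Unable to connect to cluster infrastructure after (?P<secs>\d+) seconds"
)

def find_ccsd_timeouts(lines):
    """Return (timestamp, host, seconds) for each ccsd timeout line found."""
    hits = []
    for line in lines:
        m = CCSD_TIMEOUT.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("host"), int(m.group("secs"))))
    return hits

# Sample lines copied from the log excerpt in this post.
sample = [
    "Jul 18 10:25:33 node1 ccsd[3211]: Starting ccsd 1.0.10:",
    "Jul 18 10:25:33 node1 ccsd: succeeded",
    "Jul 18 10:26:02 node1 ccsd[3211]: Unable to connect to cluster infrastructure after 30 seconds.",
]

if __name__ == "__main__":
    for stamp, host, secs in find_ccsd_timeouts(sample):
        print(f"{host}: ccsd gave up after {secs}s at {stamp}")
```

In practice the lines would come from the node's syslog file rather than a hard-coded list; the path to that file depends on the syslog configuration and is not assumed here.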
Steven E. Protter
Exalted Contributor
Solution

I'm bouncing from email back to here.

CentOS never released the .10 patch set for clustering but did release it for the OS. This broke clustering rather nicely.

You caught me 5 days before my planned CentOS upgrade in the US. I did half the systems yesterday and will be doing the other half in the US on Monday. Two clusters there.

Here is a good working combination

shalom1.investmenttool.com 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:21:39 CDT 2006 i686 i686 i386 GNU/Linux
kernel-2.6.9-42.EL
kernel-hugemem-2.6.9-42.0.3.EL
kernel-smp-2.6.9-42.EL
kernel-ib-1.0-1
kernel-smp-2.6.9-42.0.3.EL
kernel-2.6.9-42.0.3.EL
dlm-kernel-2.6.9-44.3
cman-kernel-hugemem-2.6.9-45.8
dlm-kernel-hugemem-2.6.9-44.3
kernel-utils-2.4-13.1.83
cman-kernel-2.6.9-45.8
cman-kernel-smp-2.6.9-45.8
dlm-kernel-smp-2.6.9-44.3

Note the uname -a output; that's important.

SEP
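
The point about uname -a is that the cman-kernel and dlm-kernel packages come in the same flavors (plain, smp, hugemem) as the kernel itself. A rough Python sketch of that check follows, assuming package names look like the ones in the list above. The helper names `kernel_flavor` and `missing_module_packages` are hypothetical, and the default `required` tuple is an assumption based on this thread rather than an official requirement list. The sketch checks only the flavor, not whether the module packages were built for the exact kernel version.

```python
import re

def kernel_flavor(uname_r):
    """Return the suffix after '.EL' in a kernel release ('' for the plain kernel)."""
    m = re.search(r"\.EL(\w*)$", uname_r)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {uname_r!r}")
    return m.group(1)

def missing_module_packages(uname_r, installed,
                            required=("cman-kernel", "dlm-kernel")):
    """List the module packages with no installed build for this kernel flavor."""
    flavor = kernel_flavor(uname_r)
    missing = []
    for base in required:
        name = f"{base}-{flavor}" if flavor else base
        # Require a digit right after the name so that "cman-kernel"
        # does not accidentally match "cman-kernel-smp-...".
        pattern = re.compile(re.escape(name) + r"-\d")
        if not any(pattern.match(pkg) for pkg in installed):
            missing.append(name)
    return missing

# Package names copied from the working combination listed above.
installed = [
    "kernel-smp-2.6.9-42.0.3.EL",
    "dlm-kernel-2.6.9-44.3",
    "cman-kernel-hugemem-2.6.9-45.8",
    "dlm-kernel-hugemem-2.6.9-44.3",
    "cman-kernel-2.6.9-45.8",
    "cman-kernel-smp-2.6.9-45.8",
    "dlm-kernel-smp-2.6.9-44.3",
]

if __name__ == "__main__":
    print(missing_module_packages("2.6.9-42.0.3.ELsmp", installed))
```

On a real node, the release string would come from `uname -r` and the package list from `rpm -qa`; here both are hard-coded so the sketch runs anywhere.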

Steven E. Protter
Exalted Contributor

Additional news:

My results were on Dell PowerEdge systems.

I have a pair of HP VL systems at home with the same cluster setup.

I was never able to get them past .2 on the kernel.

You may need to experiment, based on your hardware.

SEP
Vitaly Karasik_1
Honored Contributor

>CentOS never released the .10 patch set for clustering but did release it for the OS. This broke clustering rather nicely.
What is the .10 patch set?
I used the latest releases from the CentOS 4.5 csgfs repository.


>Here is a good working combination
>2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6
>kernel-2.6.9-42.0.3.EL
>dlm-kernel-2.6.9-44.3
>kernel-utils-2.4-13.1.83
>cman-kernel-2.6.9-45.8
>dlm-kernel-smp-2.6.9-44.3

I have the same versions of the kernel and cman.
Are you sure I need dlm for my configuration?
Steven E. Protter
Exalted Contributor

Sorry, I was in a hurry.

kernel-2.6.9-42.0.10.EL

Also for hugemem and smp.

They released the OS versions of the kernel.

There need to be equivalent versions in RHCS to match this kernel. Red Hat released a series of RHCS patch sets, but CentOS did not bother to port them.

So when you run a yum update on your system, it breaks clustering because RHCS can't work with:
kernel-2.6.9-42.0.10.EL or the smp or hugemem versions.

It's a relatively major problem, and all efforts I've made to contact CentOS have failed.

Shmuel
Vitaly Karasik_1
Honored Contributor

Shmuel, thank you for your help!
Installing "dlm" solved my problem.
For some reason I was under the mistaken impression that I didn't need dlm if I wasn't using GFS/CLVM.

Vitaly Karasik_1
Honored Contributor

Problem solved by installing the "dlm" packages.
Thanks to Shmuel Protter and the "4U5 CSS/CMAN/fence quorum confusion" thread (http://www.spinics.net/lists/cluster/msg08953.html).