Operating System - Linux

RH update 2 Clustering configuration

 
Steven E. Protter
Exalted Contributor

RH update 2 Clustering configuration

We are in the testing phase of RHEL 4 Update 2 clustering with GFS.

We have set up a cluster with two GFS filesystems on shared storage and are encountering some unusual behavior.

With DLM locking configured, the cluster comes up using the configuration file below. The problem is that the cluster does not fail over as configured: when we bring down node 1, node 2 just sits there instead of coming online and mounting the filesystems.

We have the standard documents from the RH site and are working on the issue. This post is not complete; I will add to it as I gather details.

Configuration file:
-----------
[cluster.conf XML did not survive posting]
----------------

Both nodes have it.
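
For orientation while the actual file is not shown, a minimal two-node RHEL 4 cluster.conf with DLM/GFS and manual fencing might look roughly like the sketch below. The node names follow this thread (linux1/linux2); the cluster name, IP address, device, and mount point are placeholders, not values from our setup:

<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="linux1" votes="1">
      <fence>
        <method name="1">
          <!-- manual fencing: requires fence_ack_manual after a failure -->
          <device name="manual" nodename="linux1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="linux2" votes="1">
      <fence>
        <method name="1">
          <device name="manual" nodename="linux2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="fd1" ordered="1" restricted="1">
        <failoverdomainnode name="linux1" priority="1"/>
        <failoverdomainnode name="linux2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <!-- placeholder floating IP and GFS volume -->
      <ip address="192.168.1.100" monitor_link="1"/>
      <clusterfs name="gfs1" device="/dev/vg0/gfs1" mountpoint="/gfs1" fstype="gfs"/>
    </resources>
    <service name="svc1" domain="fd1" autostart="1">
      <ip ref="192.168.1.100"/>
      <clusterfs ref="gfs1"/>
    </service>
  </rm>
</cluster>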

In the morning (IST), I'll provide details from the syslog.

What I want to know is the following:
1) Is anyone running RHEL 4 Update 2 clustering with GFS in production? If so, can you share a conf file?
2) If you do not believe the software is production quality, let me know.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
11 REPLIES
Ivan Ferreira
Honored Contributor

Re: RH update 2 Clustering configuration

Hi SEP, are you using Red Hat GFS? All nodes are supposed to be able to access the file system simultaneously; you don't have to unmount it on one node and mount it on the other.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

You still need a GFS mount for access though, right? Otherwise how does access begin?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ivan Ferreira
Honored Contributor

Re: RH update 2 Clustering configuration

You should run:

mount -t gfs BlockDevice MountPoint -o option

For example:

mount -t gfs /dev/pool/pool0 /gfs1

For initial access.
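
If you want the filesystem mounted automatically at boot on each node, one common approach (assuming the same device and mount point as above, and that ccsd/cman/fenced and the gfs init script are enabled at boot) is an /etc/fstab entry:

/dev/pool/pool0    /gfs1    gfs    defaults    0 0
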
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

update:

The cluster is now functional. It's very important NOT to have two NICs coming up on the same network unless they are bonded.

My original issue was caused by this: cman would not start on both nodes. The NIC involved is very flaky and, unknown to me, had revived itself and gotten an IP address from DHCP.

This scenario is very, very, very bad.
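
For what it's worth, one standard RHEL 4 way to bond the two interfaces looks roughly like this (the interface names and address below are placeholders, not our actual values):

In /etc/modprobe.conf:

alias bond0 bonding
options bond0 mode=1 miimon=100

/etc/sysconfig/network-scripts/ifcfg-bond0:

DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.11
NETMASK=255.255.255.0

/etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise ifcfg-eth1):

DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes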

Now we're having fencing troubles.

When fencing through a Brocade switch disables a port, it stays disabled unless someone manually intervenes to re-enable it.

We have been told a script can be written to re-enable fenced Brocade switch ports. There is a bunny in it for anyone who submits a working script.

Also:

We'd like to know if anyone is using APC power switches as fence devices. If so, we'd like to know the model number in use and to see configuration files if possible.
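
As a starting point only (the values are placeholders, and the attribute names are worth checking against the fence_apc man page), an APC power switch typically shows up in cluster.conf along these lines:

<fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.60" login="apc" passwd="apc"/>

and in each clusternode's fence method:

<device name="apc1" switch="1" port="1"/>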

Also II:
HP server iLO is supported as a fence device.

IBM servers use the SlimLine Remote Supervisor Adapter. This does not appear to be supported by RH clustering at this time. Is anyone using it as a fence device anyway? If so, how? Scripts and config files earn bunnies. Does anyone have a solid doc from RH on whether this type of fencing device can be used, and how?

Our conclusion at this time is that DLM locking is production quality and GULM locking is not. Opinions?

Lots and lots of points available here.

Inquiring minds want to know.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ivan Ferreira
Honored Contributor

Re: RH update 2 Clustering configuration

Brocade switches run a Linux-based OS. SSH is enabled, so you can try something like:

ssh admin@sanswitch portenable portnumber

The only problem will be supplying the password. You may be able to generate a public key pair without a passphrase if you log on to the sanswitch as root.
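
Building on that idea, a minimal wrapper script might look like the sketch below. It assumes key-based SSH login as admin is already set up on the switch, and takes the switch name and port number as arguments (adjust to taste):

#!/bin/sh
# Re-enable a Brocade switch port that was disabled by fencing.
# Usage: unfence_port.sh <switch> <portnumber>
SWITCH=$1
PORT=$2
if [ -z "$SWITCH" ] || [ -z "$PORT" ]; then
    echo "usage: $0 <switch> <portnumber>" >&2
    exit 1
fi
ssh admin@"$SWITCH" "portenable $PORT"
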
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

It seems we're running into intended RH behavior.

If node1 detects a problem on node2, it fences node2 off from the storage. This makes node2 inoperable but does not force a reboot.

Potentially node2 can still be online, holding on to a floating IP address that node1 needs in order to handle failover properly.

The answer seems to be either a custom script like the one suggested in the prior post, or iLO fencing, which would reboot node2 immediately and cause failover to node1.

The problem is that IBM's equivalent of iLO, and for that matter Dell's, is not supported by RH.
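
For the HP boxes, an iLO fence device in cluster.conf generally looks something like this (hostname, login, and password are placeholders, and the attribute names should be checked against the fence_ilo man page):

<fencedevice agent="fence_ilo" name="ilo-linux2" hostname="linux2-ilo" login="Administrator" passwd="secret"/>

referenced from the node's fence method as:

<device name="ilo-linux2"/>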

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

Interesting problem.

The cluster was built on machine:
linux1
A second node was added, called:
linux2

No matter where the packages are running, if a normal shutdown -ry is run on either node, they fail over properly if needed. Service remains online.

If I kill power to linux2, all packages fail over to linux1 in a reasonable time.

If I kill power to linux1, the cluster freezes up.

clustat produces no results.

Fencing is manual, as posted above. The problem is created by the manual fence: it fences off linux1, but linux2 cannot function.

There is supposed to be a command that can force the cluster to continue running when it gets frozen like this.

I have two bunny-eligible questions:

What command can I use to force linux2 to take over the cluster? Or what change can I make to make this automatic?

Attaching a more current cluster.conf file.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

Duh,

Command is fenc_ack_manual -n

Why the heck doesn't the cluster configuration do this itself?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

Type much?
fence_ack_manual -n
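
(For anyone finding this later: -n takes the name of the node that was fenced, so in the scenario above you would run something like the following on the surviving node, linux2:)

fence_ack_manual -n linux1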

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

fence_ack_manual -O -n

The -O option bypasses the manual acknowledgement.

Don't use this in a cluster that includes shared storage.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: RH update 2 Clustering configuration

Has anyone ever run Samba in one of these clusters?

How do you handle the net join issues?

I hope I'm not just talking to myself. Even a comment will garner you folks a point or two.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com