cmcheckconf and linkloop fail - but when cluster starts all is OK!

 
Dave Polshaw
Frequent Advisor

Hi. Had occasion to re-apply a cluster yesterday due to fibre disk changes. No changes at all to the network or its configuration.

Performed the changes (replaced FC hubs with switches, which changes the HW paths, in case that is of interest). Changed the ascii file to reflect the new lock disks and tried cmcheckconf. It failed with multiple LAN errors. Checked via linkloop and, lo and behold, connection failures as detailed. Some OK. Some not.

Scratched my head a while. Couldn't figure it out as all seemed to be in order. In pure frustration I started the cluster with the original config (I have not tried to apply the new conf yet) and, lo and behold, all the errors went away.

cmcheckconf reported no network errors. All the linkloops worked. Could not apply it though as disk changes require the cluster to be down. So...

Stopped the cluster. Tried cmcheckconf again. Guess what? Same errors! Linkloop? Same errors.
Started the cluster again. Errors went away...

Any ideas? Could I be going mad and posting to a psychiatric site instead???

Please help save my sanity...

:D
Knowledge speaks. Wisdom listens...

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

Dave, that's very strange - never seen it before...

That said, here's a couple of things to try...

1. Serviceguard doesn't exactly do a linkloop, so there are some situations where linkloop might fail and Serviceguard actually works (and vice versa). You should talk to HP support and try to get hold of the utility they have for checking these situations (I believe it's called 'dlping' or something like that).

2. I've seen situations where the contents of an existing /etc/cmcluster/cmclconfig binary file prevent the creation of a new one. You could try (with the cluster down) renaming the existing files to cmclconfig.orig or whatever, and then try the cmcheckconf/cmapplyconf.

HTH

Duncan

I am an HPE Employee
melvyn burnard
Honored Contributor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

OK, so what version of SG, and what patch level?
What were the errors?
If you do a cmquerycl -v -C (ascii file) and review the ascii file, do you see any differences? Or errors?
Also do a cmscancl to see if that reveals anything.
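
To make the comparison suggested above less error-prone, the freshly generated ascii file can be diffed against the one currently in use. The following is only a minimal Python sketch: the function names are made up for illustration, it compares text only, and it does not run any Serviceguard command.

```python
import difflib


def _parameter_lines(text):
    """Keep only non-blank, non-comment lines, stripped of surrounding whitespace."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def diff_cluster_ascii(old_text, new_text):
    """Return unified-diff lines between two cluster ascii files.

    Comments and blank lines are ignored so that only parameter
    lines are compared. An empty list means no differences.
    """
    old = _parameter_lines(old_text)
    new = _parameter_lines(new_text)
    return list(
        difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm="")
    )
```

Reading both files into strings and printing the returned lines should show, for example, whether the lock disk entries still point at the old hardware paths.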

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Uday_S_Ankolekar
Honored Contributor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

Linkloop can be tested with the cmscancl command. Try it out and see if it helps.

-USA..
Good Luck..
Dave Polshaw
Frequent Advisor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

Thanks guys. I have seen the existing binary affecting things as well. I was a little reluctant to delete the cluster in case I screwed anything up but I will try the rename in the new year. Will renaming the binaries work or do I really need to do a cmdeleteconf?

I will also get back with versions, patches etc. I know it is UX11.0 and not well patched but can't remember the SG version.

I did do a cmquerycl and the odd thing is it did see the NICs but it then gave me the old HW addresses of the lock disks:-( I assumed that was also a problem of having the original binaries in place.

Anyroads. Not doing any more now. Off for the festivities. Back (In body at least) on the 5th. I will check from home on the 2nd in case anyone feels inclined to add anything..;-)

All the best for the new year.

Cheers

Dave
Knowledge speaks. Wisdom listens...
Stephen Doud
Honored Contributor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

mv'ing /etc/cmcluster/cmclconfig effectively prevents that node from loading the cluster binary - and being aware of how the cluster is configured. However, the clustered VGs still maintain a cluster ID in their LVM VGDA - so a cmdeleteconf is preferable after copying the cmclconfig file to a backup file.
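
The backup step mentioned above could be scripted along these lines. This is a hedged sketch only: the function name is hypothetical, the ".orig" suffix is borrowed from the rename suggestion earlier in the thread, and the code merely copies the file without touching the cluster.

```python
import shutil
from pathlib import Path


def backup_cluster_binary(binary_path, suffix=".orig"):
    """Copy the cluster binary to '<name><suffix>' in the same directory.

    Refuses to overwrite an existing backup and returns the backup path.
    """
    src = Path(binary_path)
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    # copy2 also preserves timestamps and permission bits where possible
    shutil.copy2(src, dst)
    return dst
```

It would be called with the path from this thread, /etc/cmcluster/cmclconfig, while the cluster is halted and before running cmdeleteconf.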

-SD.
Dave Polshaw
Frequent Advisor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

OK - ServiceGuard is 11.09. I think there may be more patches installed but the last patch bundle was March 2000. Anything specific to look for?

I have attached the cmscancl in txt format. This is BEFORE the disk changes and reflects the old configuration.

Cheers

:D
Knowledge speaks. Wisdom listens...
melvyn burnard
Honored Contributor
Solution

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

Right, first thing to note is that 11.09 of SG is no longer supported; you should look at updating to 11.13 or 11.14.
Failing that, ensure you have the last patch for 11.09, PHSS_27158.

Second thing to note is the settings:
node timeout: 70.00 (seconds)
heartbeat connection timeout: 20.00 (seconds)

These are way out of recommended limits. The maximum limit for node timeout is 30 seconds.
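
A check like this is easy to automate. The sketch below assumes the timing lines appear in the report exactly in the form quoted above ("name: value (seconds)"), and it uses the 30-second maximum stated in this reply. The function names are illustrative and not part of any Serviceguard tool.

```python
import re

# Maximum node timeout quoted in this reply, in seconds.
MAX_NODE_TIMEOUT_SECONDS = 30.0

# Matches lines such as "node timeout: 70.00 (seconds)".
_TIMING_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z ]+?):\s*(?P<value>\d+(?:\.\d+)?)\s*\(seconds\)\s*$"
)


def parse_timings(report_text):
    """Return {lowercased setting name: value in seconds} for matching lines."""
    timings = {}
    for line in report_text.splitlines():
        match = _TIMING_LINE.match(line)
        if match:
            timings[match.group("name").strip().lower()] = float(match.group("value"))
    return timings


def node_timeout_exceeds_limit(report_text, limit=MAX_NODE_TIMEOUT_SECONDS):
    """True if a 'node timeout' line is present and its value is above the limit."""
    value = parse_timings(report_text).get("node timeout")
    return value is not None and value > limit
```

Fed the two lines quoted above, the check flags the 70-second node timeout as being over the limit.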

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Dave Polshaw
Frequent Advisor

Re: cmcheckconf and linkloop fail - but when cluster starts all is OK!

That'll work for me, Melvyn:-) It was a customer site which I believe should have been upgraded but it looks like it wasn't! My fault - I should have checked, but at least we now know that some major upgrading needs to occur before we can do any more troubleshooting.

Cheers

:D
Knowledge speaks. Wisdom listens...