
Re: very strange issue with service guard

 
SOLVED
nobleboi
Advisor

very strange issue with service guard

# remsh rhino2 /usr/sbin/swlist | grep -i service
T1905BA A.11.17.00 Serviceguard
# pwd
/etc/cmcluster
#
# uname -r
B.11.23
#

I am trying to set up a single-node cluster and am getting this error.

# cmcheckconf -v -C cmclconfig.ascii
Checking cluster file: cmclconfig.ascii
Checking nodes ... Done
Checking existing configuration ... Done
This node is at revision A.11.17.00 of Serviceguard, node rhino2 is at A.11.16.00.
Unable to make configuration changes when a node in the cluster is at a different revision.
#

likid0
Honored Contributor

Re: very strange issue with service guard

Hi,

Could you please post your ASCII config so we can check it?

How did you generate cmclconfig.ascii? Using cmquerycl?
Windows?, no thanks
nobleboi
Advisor

Re: very strange issue with service guard

Yes, the ASCII file was generated with the command "cmquerycl -v -C cmclconfig.ascii -n node2".

Also, it has not identified any lock disk in the ASCII file.
nobleboi
Advisor

Re: very strange issue with service guard

See this. I tried it twice without success, and the third time it got past the version check, but then failed with the network error below.

linkloop is successful.



# ll
total 64
drwxr-xr-x 2 bin bin 8192 Feb 18 00:41 cfs
-rw-rw-rw- 1 root sys 7816 Feb 18 02:21 cmclconfig.ascii
-rwxrwxrwx 1 root sys 25 Feb 18 02:06 cmclnodelist
-r-------- 1 bin bin 70 Sep 24 2005 cmknowncmds
# cmcheckconf -v -C cmclconfig.ascii
Checking cluster file: cmclconfig.ascii
Checking nodes ... Done
Checking existing configuration ... Done
This node is at revision A.11.17.00 of Serviceguard, node node2 is at A.11.16.00.
Unable to make configuration changes when a node in the cluster is at a different revision.
#
# cmcheckconf -v -C cmclconfig.ascii
Checking cluster file: cmclconfig.ascii
Checking nodes ... Done
Checking existing configuration ... Done
This node is at revision A.11.17.00 of Serviceguard, node node2 is at A.11.16.00.
Unable to make configuration changes when a node in the cluster is at a different revision.
# ksh
# ./create_depot_hpux.11.23
# cmcheckconf -v -C cmclconfig.ascii
Checking cluster file: cmclconfig.ascii
Checking nodes ... Done
Checking existing configuration ... Done
Gathering storage information
Found 2 devices on node node2
Analysis of 2 devices should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Found 1 volume groups on node node2
Analysis of 1 volume groups should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
Network interface lan0 on node node2 couldn't talk to itself.
Network interface lan1 on node node2 couldn't talk to itself.
Network interface lan2 on node node2 couldn't talk to itself.
Checking for inconsistencies
No bridged net specified for NM_ID 0 at node2
No bridged net specified for NM_ID 0 at node2
No bridged net specified for NM_ID 0 at node2
cmcheckconf: Unable to verify cluster file: cmclconfig.ascii.



# lanscan
Hardware Station        Crd Hdw   Net-Interface  NM  MAC   HP-DLPI DLPI
Path     Address        In# State NamePPA        ID  Type  Support Mjr#
0/1/2/0  0x001321BD3E8D 0   UP    lan0 snap0     1   ETHER Yes     119
0/2/1/0  0x00306EF5C6EF 1   UP    lan1 snap1     2   ETHER Yes     119
0/6/1/0  0x00306EF5E6A1 2   UP    lan2 snap2     3   ETHER Yes     119
# linkloop -i 0 0x001321BD3E8D
Link connectivity to LAN station: 0x001321BD3E8D
-- OK
# linkloop -i 1 0x00306EF5C6EF
Link connectivity to LAN station: 0x00306EF5C6EF
-- OK
# linkloop -i 2 0x00306EF5E6A1
Link connectivity to LAN station: 0x00306EF5E6A1
-- OK
#
likid0
Honored Contributor

Re: very strange issue with service guard

With only one node in the cluster there is no need for a cluster lock, so there is no problem there.

In the network configuration:

NODE_NAME node2
NETWORK_INTERFACE lan0
STATIONARY_IP 172.16.231.49
NETWORK_INTERFACE lan1
HEARTBEAT_IP 172.16.100.49
NETWORK_INTERFACE lan2

You should change lan0 from STATIONARY_IP to HEARTBEAT_IP:

NODE_NAME node2
NETWORK_INTERFACE lan0
HEARTBEAT_IP 172.16.231.49
NETWORK_INTERFACE lan1
HEARTBEAT_IP 172.16.100.49
NETWORK_INTERFACE lan2

On the other hand, it says that lan2 is a failover for lan1 and that lan0 doesn't have any standby interfaces.

You should check connectivity between the 3 interfaces with linkloop.
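
To compare the two layouts above at a glance, the interface entries can be pulled out of the cluster ASCII file. This is only a minimal sketch: `list_lan_roles` is a hypothetical helper name, it reads the file on stdin, and it assumes that an interface with no IP line after it is meant as a standby, which is how the lan2 entry above is being read.

```shell
#!/bin/sh
# Sketch: summarize NETWORK_INTERFACE entries from a cluster ASCII file on stdin.
# list_lan_roles is a hypothetical helper, not a Serviceguard command.
list_lan_roles() {
    awk '
        # Print the interface collected so far, if any.
        function emit() { if (lan != "") print lan, role, ip }
        $1 == "NODE_NAME"         { emit(); lan = "" }
        # Assumption: an interface with no IP line following it is a standby.
        $1 == "NETWORK_INTERFACE" { emit(); lan = $2; role = "STANDBY"; ip = "-" }
        $1 == "HEARTBEAT_IP" || $1 == "STATIONARY_IP" { role = $1; ip = $2 }
        END { emit() }
    '
}

# Usage (on the node):
#   list_lan_roles < /etc/cmcluster/cmclconfig.ascii
```

Each output line has the form `interface role ip`, so an interface that was expected to carry a heartbeat but shows up as STATIONARY_IP or STANDBY stands out quickly.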
Windows?, no thanks
nobleboi
Advisor

Re: very strange issue with service guard

Tried that, and I am getting the same error.

linkloop is already verified for these three interfaces, as you can see above.

Re: very strange issue with service guard

Hi,

some of what you have written here is confusing me...

-is this host called 'node2' or 'rhino2'?
-what does hostname return?
-was this previously part of another cluster?
-are cluster services actually running on this host at the moment - what does cmviewcl return?
-is there a cmclconfig file in /etc/cmcluster?


All this suggests that this node and rhino2 were previously in a cluster which wasn't dismantled correctly before you attempted to create a new single node cluster...did 'cmdeleteconf' get run when the previous cluster was dismantled?

HTH

Duncan

I am an HPE Employee
melvyn burnard
Honored Contributor
Solution

Re: very strange issue with service guard

Well, your linkloop shows all OK, but from your symptoms I almost suspect there is another node out on your network that is responding. I assume you are running these commands on the node node2?
If you do nslookup node2, do you get the IP address you expect?
And then nslookup using the IP address?
Are you using /etc/hosts or DNS?
I would suspect a duplicate IP address or duplicate hostname right now.
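
If name resolution comes from files, the forward and reverse checks above can be roughly approximated against a hosts-format file. This is a sketch under that assumption: `addrs_for_name` and `names_for_addr` are hypothetical helper names, and they only inspect the text fed to them on stdin. They cannot detect a second machine on the wire answering with the same hostname.

```shell
#!/bin/sh
# Sketch: forward and reverse lookups against hosts-format text on stdin.
# Both helper names are hypothetical.

# addrs_for_name NAME: print each address whose entry lists NAME.
addrs_for_name() {
    sed 's/#.*//' | awk -v n="$1" '
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }
    ' | sort -u
}

# names_for_addr ADDR: print every name or alias listed for ADDR.
names_for_addr() {
    sed 's/#.*//' | awk -v a="$1" '
        $1 == a { for (i = 2; i <= NF; i++) print $i }
    ' | sort -u
}

# Usage (on the node):
#   addrs_for_name node2 < /etc/hosts
#   names_for_addr 172.16.100.55 < /etc/hosts
```

More than one address for the node name, or an unexpected name for the node's address, would be worth looking into before rerunning cmcheckconf.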
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
nobleboi
Advisor

Re: very strange issue with service guard

I had verified the name resolution part at the very beginning.

I am using files for name resolution, and nslookup looks fine too.

I just brought down the lan0 and lan1 interfaces and tried to ping those IPs to see if there is any server with duplicate IPs, and found nothing.
nobleboi
Advisor

Re: very strange issue with service guard

Hi Duncan,
(The actual node name is rhino2; I was manually editing it to node2 for the forum.)

1)-
# hostname
node2
#

2)-
No, it's a new installation.

3)-
# cmviewcl
cmviewcl: Cannot view the cluster configuration: No such file or directory.
Either this node is not configured in a cluster, user doesn't have
access to view the cluster configuration, or there is some obstacle
to viewing the configuration. Check the syslog file for more information.
For a list of possible causes, see the Serviceguard manual for cmviewcl.
#

4)-
i touched a cmclconfig file in /etc/cmcluster as i was getting this error in syslog

Feb 18 03:47:43 node2 cmclconfd[6704]: Unable to stat /etc/cmcluster/cmclconfig, No such file or directory



Siju Vadakkan
Trusted Contributor

Re: very strange issue with service guard

You don't need to specify a heartbeat LAN and IP in the cluster ASCII file, since there is no other node to communicate with.

Remove the entries below from your cluster ASCII file.

HEARTBEAT_IP 172.16.100.49
NETWORK_INTERFACE lan2

and then execute
# cmcheckconf -v -C <cluster ascii file>

If it still fails, try

# cmcheckconf -v -w none -C <cluster ascii file>

-w none will avoid network probing.
F Verschuren
Esteemed Contributor

Re: very strange issue with service guard

Hi,
It looks like two different installations. Please check the version of the software on both systems.
Serviceguard needs to be at the same level on both nodes.

#########################
swlist -l product |grep ServiceGuard
ServiceGuard A.11.16.00 ServiceGuard
#########################
It looks like you have one server on 11.16 and one on 11.17.
Solution: upgrade to 11.17.
F Verschuren
Esteemed Contributor

Re: very strange issue with service guard

Sorry, I did not see that you want a single-node cluster.
It seems that your cmclconfig is corrupted. You can try moving it aside, rebooting, and trying again. Somehow old data seems to be kept in there, and I do not know another way to remove the corrupted cluster config.

Re: very strange issue with service guard

I'm still concerned about that reporting of different versions of Serviceguard. It suggests either a corrupt installation of Serviceguard or, as Melvyn mentioned, that the system is trying to connect to another host for some reason.

A couple more things to try:

1. If you have access to the console, try pulling *all* the LAN connections and run the cmcheckconf again - does it still print the message about different versions? (other things might fail, but does the version message go away?)

2. What does the command 'cmversion' return?

3. Look through the output of 'what /usr/sbin/cm*' - does everything look consistent at version 11.17?

4. Could you post the file created by running 'cmscancl -n node2'

HTH

Duncan

I am an HPE Employee
nobleboi
Advisor

Re: very strange issue with service guard

I changed the IP for node2 and tried running cmcheckconf, which is still reporting the version difference, but it goes through after a couple of attempts.

Here is the output after disabling the LAN cards from the console:

# cmcheckconf -v -C cmclconfig.ascii
Checking cluster file: cmclconfig.ascii
Checking nodes ... Done
Checking existing configuration ... Done
Node rhino2 is refusing Serviceguard communication.
Please make sure that the proper security access is configured on node
rhino2 through either file-based access (pre-A.11.16 version) or role-based
access (version A.11.16 or higher) and/or that the host name lookup
on node rhino2 resolves the IP address correctly.
cmcheckconf: Failed to gather configuration information
#

# cmversion
A.11.17.00
#

cmscancl output

HP-UX node2 B.11.23 U 9000/800 3015666796 unlimited-user license

(Mon Feb 18 22:33:52 WAT 2008)

A.11.17.00 Date: 09/23/05 (node2)

Version: B.04.00.00 (node2)


################# INFORMATION FOR THE NODE node2 ###############


------ Output of lanscan (node2) ------

Hardware Station        Crd Hdw   Net-Interface  NM  MAC   HP-DLPI DLPI
Path     Address        In# State NamePPA        ID  Type  Support Mjr#
0/1/2/0  0x001321BD3E8D 0   UP    lan0 snap0     1   ETHER Yes     119
0/2/1/0  0x00306EF5C6EF 1   UP    lan1 snap1     2   ETHER Yes     119
0/6/1/0  0x00306EF5E6A1 2   UP    lan2 snap2     3   ETHER Yes     119

------ Auto Port Aggregation information (node2) ------

------ Output of netstat -in (node2) ------

Name Mtu  Network      Address       Ipkts Ierrs Opkts Oerrs Coll
lan1 1500 172.16.100.0 172.16.100.55 74665 0     21901 0     0
lan0 1500 172.16.231.0 172.16.231.49 514   0     401   0     0
lo0  4136 127.0.0.0    127.0.0.1     2516  0     2516  0     0

------ Output of netstat -i (node2) ------

Name Mtu  Network      Address       Ipkts Ierrs Opkts Oerrs Coll
lan1 1500 172.16.100.0 node2         74665 0     21901 0     0
lan0 1500 172.16.231.0 172.16.231.49 514   0     401   0     0
lo0  4136 loopback     localhost     2516  0     2516  0     0

------ Output of netstat -r (node2) ------

Routing tables
Destination   Gateway        Flags Refs Interface Pmtu
localhost     localhost      UH    0    lo0       4136
node2         node2          UH    0    lan1      4136
172.16.231.49 172.16.231.49  UH    0    lan0      4136
172.16.231.0  172.16.231.49  U     2    lan0      1500
172.16.100.0  node2          U     2    lan1      1500
loopback      localhost      U     0    lo0       0
default       172.16.100.100 UG    0    lan1      0

------ Output of mount (node2) ------

/ on /dev/vg00/lvol3 ioerror=nodisable,log,dev=40000003 on Mon Feb 18 19:25:53 2008
/stand on /dev/vg00/lvol1 defaults,dev=40000001 on Mon Feb 18 19:25:54 2008
/var on /dev/vg00/lvol8 ioerror=mwdisable,delaylog,nodatainlog,dev=40000008 on Mon Feb 18 19:25:58 2008
/usr on /dev/vg00/lvol7 ioerror=mwdisable,delaylog,nodatainlog,dev=40000007 on Mon Feb 18 19:25:58 2008
/tmp on /dev/vg00/lvol6 ioerror=mwdisable,delaylog,nodatainlog,dev=40000006 on Mon Feb 18 19:25:58 2008
/opt on /dev/vg00/lvol5 ioerror=mwdisable,delaylog,nodatainlog,dev=40000005 on Mon Feb 18 19:25:58 2008
/home on /dev/vg00/lvol4 ioerror=mwdisable,delaylog,nodatainlog,dev=40000004 on Mon Feb 18 19:25:58 2008
/net on -hosts ignore,indirect,nosuid,soft,nobrowse,dev=2000000 on Mon Feb 18 19:26:35 2008

------ Output of strings on lvmtab (node2) ------

/dev/vg00
/dev/dsk/c2t1d0
/dev/vglock
/dev/dsk/c7t0d2
/dev/dsk/c9t0d2
/dev/dsk/c10t0d2
/dev/dsk/c11t0d2
/dev/vg01
/dev/dsk/c7t0d1
/dev/dsk/c9t0d1
/dev/dsk/c10t0d1
/dev/dsk/c11t0d1

------ Output of lvmpvg (node2) ------

cat: Cannot open /etc/lvmpvg: No such file or directory

------ Checking LOCAL network connections (node2) ------


(The linkloop command will test for link level connections between all LAN
hardware displayed by lanscan. A -- OK after the line means those two
devices can talk to each other. A (NO CONNECTION) after a line means
the two devices can not talk at the link (MAC) level. Network connectivity
check will not be performed for non-LAN hardware (HyperFabric, ATM. etc),
if any, since linkloop command is supported only for LAN hardware.)

------ lan0 to lan1 ------
PPA 0 link test to 0x00306EF5C6EF (NO CONNECTION)

------ lan0 to lan2 ------
PPA 0 link test to 0x00306EF5E6A1 (NO CONNECTION)

------ lan1 to lan0 ------
PPA 1 link test to 0x001321BD3E8D (NO CONNECTION)

------ lan1 to lan2 ------
PPA 1 link test to 0x00306EF5E6A1 -- OK

------ lan2 to lan0 ------
PPA 2 link test to 0x001321BD3E8D (NO CONNECTION)

------ lan2 to lan1 ------
PPA 2 link test to 0x00306EF5C6EF -- OK



------ Contents of the Binary Configuration File (node2) ------
cmviewconf: Either binary file does not exist, or the user doesn't
have access to view the cluster configuration.
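
The linkloop section of the cmscancl output above is easier to read as one line per interface pair. The following is a rough sketch that assumes the section layout exactly as pasted in this post (a `------ lanX to lanY ------` header followed by a `PPA` result line); `linkloop_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: condense the "lanX to lanY" link tests from cmscancl output on stdin.
# linkloop_summary is a hypothetical helper; the layout is assumed from this thread.
linkloop_summary() {
    awk '
        # Header such as: ------ lan0 to lan1 ------
        $1 == "------" && $3 == "to" { pair = $2 " -> " $4; next }
        # Result line such as: PPA 0 link test to 0x... -- OK
        $1 == "PPA" && pair != "" {
            status = ($0 ~ /-- OK/) ? "ok" : "no-connection"
            print pair, status
            pair = ""
        }
    '
}

# Usage: linkloop_summary < saved_cmscancl_output   (file name is a placeholder)
```

For the output in this post, that gives lan1 and lan2 reaching each other in both directions and lan0 reaching neither, which matches the earlier remark that lan2 acts as a failover for lan1 while lan0 has no standby.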

Siju Vadakkan
Trusted Contributor

Re: very strange issue with service guard

Please post the output below. If the user field is bin, change it to root and try again.

# more /etc/inetd.conf | grep auth
auth stream tcp6 wait root /usr/lbin/identd identd
#
melvyn burnard
Honored Contributor

Re: very strange issue with service guard

I suggest you now log a call with your local HP Response Centre
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Emil Velez
Honored Contributor

Re: very strange issue with service guard

Bottom line: no configuration changes can be made to nodes in a Serviceguard cluster if they are at different versions of Serviceguard. That is the first error you got.

You cannot build a cluster initially if the nodes are at different versions.
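
To see at a glance which node name and revision the check is complaining about, the message can be split into its parts. This is a minimal sketch that assumes the message wording exactly as quoted earlier in this thread; `parse_revision_message` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract "local-revision remote-node remote-revision" from the
# mismatch message on stdin. The message wording is assumed from this thread;
# parse_revision_message is a hypothetical helper.
parse_revision_message() {
    sed -n 's/^This node is at revision \([^ ]*\) of Serviceguard, node \([^ ]*\) is at \([^ ]*\)\.$/\1 \2 \3/p'
}

# Usage (on the node; 2>&1 because the output stream is not known here):
#   cmcheckconf -v -C cmclconfig.ascii 2>&1 | parse_revision_message
```

If the remote node name printed is the local host itself, as in the original post, the reported revision can be compared against what cmversion prints locally.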

Siju Vadakkan
Trusted Contributor

Re: very strange issue with service guard

But he is trying to build a single-node cluster, so how does the version issue come into the picture?
Stephen Doud
Honored Contributor

Re: very strange issue with service guard

You wrote the following:

"i touched a cmclconfig file in /etc/cmcluster as i was getting this error in syslog"

cmclconfig is the name of the cluster binary file. It is ONLY created by a cmapplyconf. It cannot be created by merely 'touch'ing the file!!!! Doing so will confuse Serviceguard, as it expects the file, if it exists, to be populated with legitimate cluster information.

Since you were not aware of this, and since you have been floundering without remedy, I suggest you open a case with HP to make faster progress on the problem.
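
A quick way to spot a binary file that was created by touch is to check whether it exists but is empty. This is only a sketch: `check_binary_config` is a hypothetical helper name, and a non-empty file is not proof that the contents are valid, only that the file was not merely touched.

```shell
#!/bin/sh
# Sketch: report whether the cluster binary file is missing, empty, or present.
# check_binary_config is a hypothetical helper; "present" says nothing about validity.
check_binary_config() {
    f=${1:-/etc/cmcluster/cmclconfig}
    if [ ! -e "$f" ]; then
        echo "missing"      # expected before the first cmapplyconf
    elif [ ! -s "$f" ]; then
        echo "empty"        # zero length, for example left behind by touch
    else
        echo "present"
    fi
}

# Usage (on the node):
#   check_binary_config
#   check_binary_config /etc/cmcluster/cmclconfig
```

On a node that has never had cmapplyconf run, "missing" is the expected answer; "empty" would point at a touched file like the one described above.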
nobleboi
Advisor

Re: very strange issue with service guard

This issue was resolved after changing the hostname, which doesn't make any sense to me though!
nobleboi
Advisor

Re: very strange issue with service guard

Thank you to everyone who responded!
melvyn burnard
Honored Contributor

Re: very strange issue with service guard

Ah!, that goes back to my suspicion about another host.
I believe there may be another configured or inactive cluster out there with the same hostname that was responding to the cmquerycl/cmcheckconf requests.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!