Operating System - HP-UX

Re: Service Guard cluster not starting

 
Neeraj Bajpai
Advisor

Service Guard cluster not starting

Hi Gurus,

I changed some configuration in my Serviceguard 11.12 cluster. After applying the cluster configuration, I started the cluster, and one of my nodes is giving the following error:

"cmcld: Assertion failed: pnet != NULL, file: comm_link.c, line: 140."

syslog.log -- error
**********************
Jul 23 05:21:37 tel3 cmcld: Reserving 1848 Kbytes of memory and 54 threads
Jul 23 05:21:37 tel3 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 21.
Jul 23 05:21:38 tel3 cmcld: Assertion failed: pnet != NULL, file: comm_link.c, line: 140
Jul 23 05:21:40 tel3 cmsrvassistd[8267]: Unable to notify ServiceGuard main daemon (cmcld): Connection reset by peer
Jul 23 05:21:40 tel3 cmsrvassistd[8266]: Unable to notify ServiceGuard main daemon (cmcld): Connection reset by peer
Jul 23 05:21:40 tel3 cmclconfd[8260]: The ServiceGuard daemon, /usr/lbin/cmcld[8261], died upon receiving signal number 6.
Jul 23 05:21:40 tel3 cmsrvassistd[8265]: Lost connection to the cluster daemon.
Jul 23 05:21:40 tel3 cmsrvassistd[8265]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
*******************************************

After that, the cluster does not start on this node, even though I am able to start the cluster and my new package on the other node.

Does anybody have any idea or clue?

thanks in advance
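
The syslog excerpt above carries two facts worth pulling out when comparing nodes: the failed assertion (expression, source file, line) and the signal that killed cmcld. The following is a minimal Python sketch, not part of the original post, that extracts both from text in the same format. The regular expressions are assumptions based only on the two lines quoted here and may not match other Serviceguard releases.

```python
import re

# Two lines copied from the syslog excerpt in the post above.
SYSLOG = """\
Jul 23 05:21:38 tel3 cmcld: Assertion failed: pnet != NULL, file: comm_link.c, line: 140
Jul 23 05:21:40 tel3 cmclconfd[8260]: The ServiceGuard daemon, /usr/lbin/cmcld[8261], died upon receiving signal number 6.
"""

# Patterns are assumptions derived from the quoted lines only.
ASSERT_RE = re.compile(
    r"Assertion failed: (?P<expr>.+), file: (?P<file>\S+), line: (?P<line>\d+)"
)
SIGNAL_RE = re.compile(r"died upon receiving signal number (?P<sig>\d+)")


def summarize(log_text):
    """Return (assertions, signals) found in the log text."""
    assertions = [
        (m["expr"], m["file"], int(m["line"]))
        for m in ASSERT_RE.finditer(log_text)
    ]
    signals = [int(m["sig"]) for m in SIGNAL_RE.finditer(log_text)]
    return assertions, signals


assertions, signals = summarize(SYSLOG)
print(assertions)
print(signals)
```

On most Unix systems signal 6 is SIGABRT, which is what a failed assertion normally raises, so the two lines most likely describe a single event rather than two separate faults.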
melvyn burnard
Honored Contributor

Re: Service Guard cluster not starting

Well, you say you have Serviceguard A.11.12, but which patch do you have installed for it? Use the what command:
what /usr/lbin/cmcld

And look for the line containing PHSS_

It is most probable that you are missing a patch.
Also be aware that this version of SG is no longer supported.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
wip
Frequent Advisor

Re: Service Guard cluster not starting

Neeraj,

Please tell us what change/modification you made.

wip
Neeraj Bajpai
Advisor

Re: Service Guard cluster not starting

Hi, I know you will not like my patch level: it is "PHSS_26270". I am worried about updating the patches, as it may create more problems for me. I don't have a support contract for this machine, and we are working on an upgrade.

I added the new VG to the cluster configuration file, as I have added a new VG to the cluster.

Currently I have mounted all my filesystems without the cluster. What do you suggest?


melvyn burnard
Honored Contributor

Re: Service Guard cluster not starting

Well, you have the last patch that was created for that release, so there is no new patch to install.
I would suggest you try restarting the whole cluster, but you may find there is now an inconsistency in the CDB of the cluster, and you may have to do a cmapplyconf again.
I would also advise you to update to a supported release of Serviceguard.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Neeraj Bajpai
Advisor

Re: Service Guard cluster not starting

Thanks. Upgrading the cluster may take time, and it is a production system, so I need to start the services ASAP. Let me explain what I have done so far.

1. First I created a cluster configuration file that includes the new VG, then I updated the .cntl file with the VG and mount point.

2. I checked the cluster configuration with cmcheckconf. It reported a network (heartbeat/standby) error (something like the cluster configuration is inconsistent and the configuration cannot be updated), so I corrected that. See the attached cmscancl output.

3. Then I halted the cluster and updated the cluster configuration with cmapplyconf.
The cluster was updated, but when I tried to start it, one node gives "cmcld: Assertion failed: pnet != NULL, file: comm_link.c, line: 140." The other node can start the cluster and the package too.

4. I tried to apply the cluster configuration several times on the problematic node.

I don't understand what to do next!
Enrico P.
Honored Contributor

Re: Service Guard cluster not starting

Hi,
I think you have some problem in your network configuration.

In your binary configuration you have:

lan0 --> heartbeat
lan3 --> data
lan2 --> standby for the other lan

Then you need this node1-node2 communication:

lan0 <--> lan0
lan0 <--> lan2
lan0 <--> lan3
lan2 <--> lan0
lan2 <--> lan2
lan2 <--> lan3
lan3 <--> lan0
lan3 <--> lan2
lan3 <--> lan3

But from your linkloop command you have:

lan1 --> lan1
lan1 --> lan2
lan1 --> lan3
lan2 --> lan0
lan3 --> lan0
lan3 --> lan1
lan3 --> lan3

from tel3 to tel4

You didn't send the linkloop output from tel4 to tel3, but I think the network configuration is your problem.

Enrico
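
The comparison in the post above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original reply, that builds the full mesh for the three configured interfaces and subtracts the links listed from the linkloop run. It assumes the second list contains only the links that answered from tel3 to tel4; the function name `compare_links` is made up for illustration.

```python
from itertools import product

# Interfaces in the binary cluster configuration, per the post above.
configured = ["lan0", "lan2", "lan3"]

# Links listed from the linkloop run (tel3 to tel4), per the post above.
# Assumption: each entry is a link that answered.
observed = {
    ("lan1", "lan1"), ("lan1", "lan2"), ("lan1", "lan3"),
    ("lan2", "lan0"),
    ("lan3", "lan0"), ("lan3", "lan1"), ("lan3", "lan3"),
}


def compare_links(interfaces, working):
    """Return (missing, unexpected) links relative to a full mesh."""
    expected = set(product(interfaces, repeat=2))
    missing = sorted(expected - working)
    unexpected = sorted(working - expected)
    return missing, unexpected


missing, unexpected = compare_links(configured, observed)
for src, dst in missing:
    print(f"missing:    {src} --> {dst}")
for src, dst in unexpected:
    print(f"unexpected: {src} --> {dst}")
```

With the data from the post, six of the nine required links are missing and four observed links involve lan1, which is not in the configuration at all.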
Neeraj Bajpai
Advisor

Re: Service Guard cluster not starting

Thanks for your time and effort, but I don't understand one thing:
node1-lan0 --> node2-lan0 fails, and the other way too, but I can ping each node's IP address from the other node.

The same happens with some other adapters too.

Any idea?
Neeraj Bajpai
Advisor

Re: Service Guard cluster not starting

Oops, typo:
node1-lan0 --> node2-lan0 linkloop fails, and the other way too, but I can ping each node's IP address from the other node.

node1-lan0 --> node2-lan2 linkloop okay
node2-lan2 --> node1-lan0 linkloop fails

The same happens with some other adapters too.

Any idea?
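
A link that works in one direction but not the other, as described above, is easy to miss when reading results by eye. The following is a small Python sketch, not part of the original post, that flags such pairs. The dictionary holds only the four results reported in this post, and the helper name `asymmetric_pairs` is made up for illustration.

```python
# Directed linkloop results as described in the post above:
# (source, destination) -> True if linkloop succeeded.
results = {
    ("node1-lan0", "node2-lan0"): False,
    ("node2-lan0", "node1-lan0"): False,
    ("node1-lan0", "node2-lan2"): True,
    ("node2-lan2", "node1-lan0"): False,
}


def asymmetric_pairs(results):
    """Return (src, dst) pairs that work while the reverse direction fails."""
    found = []
    for (src, dst), ok in results.items():
        reverse = results.get((dst, src))  # None if the reverse was not tested
        if ok and reverse is False:
            found.append((src, dst))
    return found


for src, dst in asymmetric_pairs(results):
    print(f"{src} --> {dst} works, but {dst} --> {src} fails")
```

A pair that fails both ways (node1-lan0 and node2-lan0 here) is not reported, since that pattern points to no layer-2 path at all rather than to an asymmetry.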
Enrico P.
Honored Contributor

Re: Service Guard cluster not starting

Are you sure you don't have a duplicate IP address in your network? Try to telnet to 192.168.0.3/4 to check that you reach the correct system.

Is the linkloop command correct?

from tel3 to tel4:

linkloop -i 0 0x001083FF4B51

from tel4 to tel3:

linkloop -i 0 0x001083FF5B60

Enrico
Ravi Bridglal
New Member

Re: Service Guard cluster not starting

Neeraj,
Was this resolved? If so, what was the fault? I seem to be having the same issue.
nanan
Trusted Contributor

Re: Service Guard cluster not starting

check

arp -a on each system
and check the mac addresses

When you ping the other system, your system seems to pick up the wrong MAC address from its own ARP table.

Check for IP duplication and your switch's flush time.
If that is the problem,
you can flush the ARP table manually:
arp -d

Regards
nanan
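
Comparing MAC addresses by eye is error-prone because linkloop takes them as one hex string (for example 0x001083FF4B51 earlier in this thread), while ARP listings commonly print them colon-separated, sometimes without leading zeros. The following is a minimal Python sketch, not part of the original reply, that normalizes both spellings before comparing. The colon-separated sample is an assumed rendering of the same address, not output captured from these systems.

```python
HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(text):
    """Return a MAC address as 12 lowercase hex digits.

    Accepts '0x001083FF4B51', '001083ff4b51', and separated forms
    such as '0:10:83:ff:4b:51' or '00-10-83-FF-4B-51'.
    """
    text = text.strip().lower()
    if ":" in text or "-" in text:
        parts = text.replace("-", ":").split(":")
        if len(parts) != 6:
            raise ValueError(f"not a MAC address: {text!r}")
        # Restore leading zeros dropped by some listings.
        digits = "".join(part.zfill(2) for part in parts)
    else:
        digits = text[2:] if text.startswith("0x") else text
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {text!r}")
    return digits


# The address used in the linkloop command from tel3 to tel4 in this thread.
linkloop_mac = "0x001083FF4B51"
# Hypothetical ARP-table spelling of the same address.
arp_mac = "0:10:83:ff:4b:51"

print(normalize_mac(linkloop_mac) == normalize_mac(arp_mac))
```

If the normalized address from the ARP table differs from the one the remote interface actually has, that supports the stale or duplicated entry suggested above.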