Operating System - HP-UX

node2 down; cluster lock not activated; DLPI error

 
SOLVED
S.N.S
Valued Contributor

node2 down; cluster lock not activated; DLPI error

Hi Folks,

I need some good advice.
Node2 of the cluster (all nodes are HP-UX 11.23 IA, HP Serviceguard 11.18) is down.

When I run
vgdisplay -v /dev/vg_lk
I get:
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vg_lk".
vg_lk is the lock disk. I also see this DLPI error:
DLPI error ack for primitive 11 with 8 0

Can you good people guide me?

Thanks,
SNS
"Genius is 1% inspiration, 99% Perspiration" - Edison
18 replies
rariasn
Honored Contributor

Re: node2 down; cluster lock not activated; DLPI error

Hi,

# vgchange -a e /dev/vg_lk

# vgdisplay -v /dev/vg_lk


Regards,
Rita C Workman
Honored Contributor

Re: node2 down; cluster lock not activated; DLPI error

How many nodes does your cluster have?

You say node2 is down, but is your cluster down?

If only a single node in a multi-node cluster is down, then that node may have an issue seeing the lock disk. Remember: only one node gets the lock disk, but all nodes need to be able to see it in the event of a failover. Whichever node gets it first becomes the owner (i.e. has exclusive rights) of that disk.

Rita
rariasn
Honored Contributor

Re: node2 down; cluster lock not activated; DLPI error

S.N.S
Valued Contributor

Re: node2 down; cluster lock not activated; DLPI error

Thank you all for the very swift replies.

Rita, the cluster is up, running on a single node for now; only node2 is down.


CLUSTER STATUS
scocl up

NODE STATUS STATE
sco1 up running

PACKAGE STATUS STATE AUTO_RUN NODE
pkg1 up running enabled sco1

NODE STATUS STATE
sco2 down unknown


However, both node1 and node2 show the same message:
vgdisplay: Volume group not activated.
But on sco1, syslog doesn't show any issue, only:
Feb 22 11:33:50 sco1 cmclconfd[13377]: Querying volume group /dev/vg_lk for node sco1
Feb 22 11:33:50 sco1 cmclconfd[16801]: Querying volume group /dev/vg_lk for node sco1
Feb 22 11:33:50 sco1 cmclconfd[16801]: Volume group /dev/vg_lk is configured exclusive
Mar 11 14:50:48 sco1 LVM[10725]: /usr/sbin/vgexport -s -p -m /etc/lvmconf/vg_lk.mapfile /dev/vg_lk

Even if the lock disk is held by a single node (here sco1 is primary), vgdisplay should still work, since it is a shared disk. Am I right?

And is the DLPI error related to this in any way?


Can you good people shed some light?


Good that HP has ITRC; the GSCs and GCCs get less traffic :-)

Thanks,
SNS
Rita C Workman
Honored Contributor

Re: node2 down; cluster lock not activated; DLPI error

Think of the game of musical chairs. When the music stops, the first person to grab the chair gets to keep it. The lock disk is the chair that every node wants.
Every node must be able to see the lock disk, but only the first node to sit on it gets it! The lock disk then becomes exclusive to that node: it got the lock disk, and it is the only one sitting on it.

Now, if and when that node goes down, the lock disk is up for grabs again, and the first node that can grab it gets it.

I like to illustrate, so I hope my little tale of the lock disk (musical chairs) helps. In technical terms, the lock disk is what grants quorum so the cluster can form. It is granted to only one node at a time, strictly on a first-come, first-served basis.
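To put the musical-chairs picture in code, here is a toy Python model. It assumes the quorum rule as it is commonly described for Serviceguard: a re-forming group holding more than half of the nodes carries on, a group holding exactly half must win the cluster lock as a tie-breaker, and a group holding less than half halts. `ClusterLock` and `can_reform` are hypothetical names, and the in-memory lock only stands in for the real on-disk lock.

```python
import threading


class ClusterLock:
    """Toy stand-in for the cluster lock: the first claimant owns it.

    Hypothetical in-memory model; Serviceguard keeps the real lock on disk.
    """

    def __init__(self):
        self._mutex = threading.Lock()  # makes check-and-claim atomic
        self.owner = None

    def try_acquire(self, node):
        with self._mutex:
            if self.owner is None:
                self.owner = node  # first come, first served
            return self.owner == node

    def release(self, node):
        with self._mutex:
            if self.owner == node:
                self.owner = None  # the chair is free again


def can_reform(surviving, total, lock, candidate):
    """Assumed quorum rule: >50% re-forms, exactly 50% needs the lock, <50% halts."""
    if 2 * surviving > total:
        return True
    if 2 * surviving == total:
        return lock.try_acquire(candidate)
    return False


# Two-node cluster scocl: sco2 is down, so sco1 is exactly half the cluster.
lock = ClusterLock()
sco1_ok = can_reform(1, 2, lock, "sco1")  # True: sco1 reaches the lock first
sco2_ok = can_reform(1, 2, lock, "sco2")  # False: the chair is taken
print(sco1_ok, sco2_ok, lock.owner)  # True False sco1
```

Under this assumed rule, the surviving node of a two-node cluster such as scocl always holds exactly half, so the lock decides; with three of four nodes up, the lock is never consulted.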

Rita
melvyn burnard
Honored Contributor
Solution

Re: node2 down; cluster lock not activated; DLPI error

Quite simply put: although the cluster lock disk HAS to be in an LVM VG and must be shared, that VG does NOT have to be activated to use the cluster lock mechanism.
Therefore you could quite conceivably see vgdisplay failing.
I know of a number of sites that have a small LUN as their cluster lock (CL) disk, in a VG, and that VG is NOT part of any package, so it NEVER gets activated.
Check and see if that VG is in any of your packages.
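One way to do that check is to scan the package files for volume group entries. Below is a rough Python sketch. It assumes legacy package control scripts list VGs as `VG[n]=...` and modular package files use a `vg <name>` line; the file contents are made up for illustration, and in practice you would feed in the real control scripts or the output of `cmgetconf -p pkg1`. The helper names are hypothetical.

```python
import re

# Assumed formats: legacy control scripts use VG[n]="vgname";
# modular package files use a "vg  vgname" parameter line.
LEGACY_VG = re.compile(r'^\s*VG\[\d+\]\s*=\s*"?([^"\s]+)"?')
MODULAR_VG = re.compile(r'^\s*vg\s+(\S+)')


def normalize(vg):
    """Treat '/dev/vg_lk' and 'vg_lk' as the same VG."""
    return vg.rsplit("/", 1)[-1]


def vgs_in_package(text):
    """Collect the VG names a package file activates, ignoring comments."""
    found = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments such as #VG[2]=...
        match = LEGACY_VG.match(line) or MODULAR_VG.match(line)
        if match:
            found.add(normalize(match.group(1)))
    return found


def packages_using_vg(packages, vg):
    """Return the names of packages whose files reference vg."""
    target = normalize(vg)
    return sorted(name for name, text in packages.items()
                  if target in vgs_in_package(text))


# Hypothetical file contents, for illustration only.
packages = {
    "pkg1": 'VG[0]="vg01"\nVG[1]=vg02\n#VG[2]="vg_lk"\n',
    "pkg_modular": "package_name pkg_modular\nvg   /dev/vg03\n",
}
print(packages_using_vg(packages, "/dev/vg_lk"))  # []
```

An empty result would mean no package ever activates the lock VG, in which case a `vgdisplay` failure on it is expected rather than a fault.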

And a DLPI error is normally a networking issue.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Rita C Workman
Honored Contributor

Re: node2 down; cluster lock not activated; DLPI error

DLPI stands for Data Link Provider Interface.

Could you post more detailed output of the DLPI message? It looks like we have just one little piece of it in your post, and we need the full picture to respond.

Thanks,
Rita
S.N.S
Valued Contributor

Re: node2 down; cluster lock not activated; DLPI error

Thank you, Rita and Melvyn. I will be back after checking the server on Monday!

I appreciate the example, Rita; nice.
And Melvyn, experience speaks volumes.

I think you both deserve more than 7 points, so I am holding off on assigning points until Monday.

As for the DLPI error, I think I know the reason, and since you gurus say it is not connected, let me see if I can fix it on Monday.

Will keep you posted.

Have a good weekend,

SNS
S.N.S
Valued Contributor

Re: node2 down; cluster lock not activated; DLPI error

Hi,

Melvyn was spot on: the cluster works even when vg_lk isn't activated.

So, will that also be the case for larger systems, or does it depend only on whether the lock VG is activated by a package?

As for the DLPI error: the problem started when the LAN card was replaced. It was lan1; now it is lan10. I changed /etc/rc.config.d/netconf accordingly, but cmgetconf still says lan1.

This is even after the cluster configuration file was edited (or maybe the wrong file was edited, since there seem to be multiple files with confusing names).
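A possible reason `cmgetconf` still shows lan1: as far as I know, it reports the applied binary configuration, so edits to an ASCII cluster file only take effect once that file has been checked and applied with `cmcheckconf -C` and `cmapplyconf -C`. The Python sketch below is a hypothetical helper that parses a cluster ASCII file in the `NODE_NAME` / `NETWORK_INTERFACE` layout and lists, per node, configured interfaces that no longer exist. The config text, IP addresses and the set of present interfaces (which you might take from `lanscan`) are made-up sample data.

```python
def interfaces_by_node(config_text):
    """Map each NODE_NAME to the NETWORK_INTERFACE entries listed under it."""
    result = {}
    node = None
    for raw in config_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, then tokenize
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1]
        if key == "NODE_NAME":
            node = value
            result[node] = []
        elif key == "NETWORK_INTERFACE" and node is not None:
            result[node].append(value)
    return result


def stale_interfaces(config_text, present):
    """Return {node: [configured interfaces missing from that node]}."""
    stale = {}
    for node, ifaces in interfaces_by_node(config_text).items():
        missing = [i for i in ifaces if i not in present.get(node, set())]
        if missing:
            stale[node] = missing
    return stale


# Hypothetical excerpt of a cluster ASCII file (sample data only).
CONFIG = """
CLUSTER_NAME            scocl
FIRST_CLUSTER_LOCK_VG   /dev/vg_lk
NODE_NAME               sco1
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        192.168.10.1
  NETWORK_INTERFACE     lan1
NODE_NAME               sco2
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        192.168.10.2
  NETWORK_INTERFACE     lan1
"""

# Interfaces actually present on each node after the card swap (assumed).
present = {"sco1": {"lan0", "lan1"}, "sco2": {"lan0", "lan10"}}

print(stale_interfaces(CONFIG, present))  # {'sco2': ['lan1']}
```

A result like this would point to changing lan1 to lan10 under `NODE_NAME sco2` and re-applying the file; whether that can be done online or needs the node or cluster halted may depend on the Serviceguard version.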
Here are the related errors from node2's syslog:


cmnetd[4224]: Assertion failed: NULL != element, file: netsen/cmnetd_ip_hpux.c, line: 1350

cmclconfd[2020]: DLPI error ack for primitive 11 with 8 0
cmclconfd[2020]: Unable to attach to network interface 1
cmclconfd[2020]: Unable to attach to DLPI: I/O error

cmcld[2052]: Service cmnetd terminated due to a signal(6).
cmcld[2052]: Utility Daemon cmnetd died unexpectedly! It may be due to a pending reboot or panic
cmcld[2052]: Exiting with status 1.
cmsrvassistd[2072]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
cmclconfd[1980]: The cluster daemon aborted our connection (231).
cmclconfd[2026]: The Serviceguard daemon, cmcld[2052], exited with a status of 1.
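To pick the essentials out of a longer syslog, a small Python sketch like the following could help. It assumes, as is usual on HP-UX, that the number in `Unable to attach to network interface N` is the PPA, i.e. device `lanN`; the function name and result keys are made up for illustration.

```python
import re

# Message patterns copied from the syslog excerpt above.
ATTACH_FAIL = re.compile(r"Unable to attach to network interface (\d+)")
CMCLD_EXIT = re.compile(r"cmcld\[\d+\]: Exiting with status (\d+)")


def summarize(log_text):
    """Summarize Serviceguard networking failures found in syslog text."""
    ppas = sorted({int(n) for n in ATTACH_FAIL.findall(log_text)})
    exit_match = CMCLD_EXIT.search(log_text)
    return {
        # Assumes interface number N corresponds to device lanN.
        "unattachable": [f"lan{n}" for n in ppas],
        "cmnetd_died": "cmnetd died unexpectedly" in log_text,
        "cmcld_exit_status": int(exit_match.group(1)) if exit_match else None,
    }


LOG = """\
cmclconfd[2020]: DLPI error ack for primitive 11 with 8 0
cmclconfd[2020]: Unable to attach to network interface 1
cmclconfd[2020]: Unable to attach to DLPI: I/O error
cmcld[2052]: Utility Daemon cmnetd died unexpectedly! It may be due to a pending reboot or panic
cmcld[2052]: Exiting with status 1.
"""

print(summarize(LOG))
# {'unattachable': ['lan1'], 'cmnetd_died': True, 'cmcld_exit_status': 1}
```

Here the only interface Serviceguard cannot attach to is lan1, the card that was replaced, which fits the netconf and cluster-configuration mismatch described earlier in this post.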

Details of the syslog are attached.

Thanks,
SNS