node2 down; cluster lock not activated; DLPI error
03-26-2010 04:54 AM
Some good advice needed.
The node2 of the cluster (all are 11.23 IA, HP SG 11.18) is down.
When I try:
# vgdisplay -v /dev/vg_lk
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vg_lk".
vg_lk is the lock disk. I also have
the DLPI error: "DLPI error ack for primitive 11 with 8 0".
Can you good people guide me?
Merci/Danke
SNS
Solved! Go to Solution.
03-26-2010 05:05 AM
Re: node2 down; cluster lock not activated; DLPI error
# vgchange -a e /dev/vg_lk
# vgdisplay -v /dev/vg_lk
rgs,
03-26-2010 05:09 AM
Re: node2 down; cluster lock not activated; DLPI error
You say node2 is down... but is your cluster down?
If it is only a single node in a multi-node cluster, then that node may have an issue seeing the lock disk. Remember: only one node gets the lock disk, but all nodes need to be able to see it in the event of a failover. Whichever node gets it first becomes the owner (i.e. has exclusive rights) of that disk.
Rita
03-26-2010 05:12 AM
Re: node2 down; cluster lock not activated; DLPI error
How to verify cluster lock is working?
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=34030
rgs,
03-26-2010 06:00 AM
Re: node2 down; cluster lock not activated; DLPI error
Rita, the cluster is up - running on a single node as of now; only node2 is down:
CLUSTER STATUS
scocl up
NODE STATUS STATE
sco1 up running
PACKAGE STATUS STATE AUTO_RUN NODE
pkg1 up running enabled sco1
NODE STATUS STATE
sco2 down unknown
However, both node 1 and node 2 show the same message:
vgdisplay: Volume group not activated.
But on sco1, the syslog doesn't show any issue. Rather:
Feb 22 11:33:50 sco1 cmclconfd[13377]: Querying volume group /dev/vg_lk for node sco1
Feb 22 11:33:50 sco1 cmclconfd[16801]: Querying volume group /dev/vg_lk for node sco1
Feb 22 11:33:50 sco1 cmclconfd[16801]: Volume group /dev/vg_lk is configured exclusive
Mar 11 14:50:48 sco1 LVM[10725]: /usr/sbin/vgexport -s -p -m /etc/lvmconf/vg_lk.mapfile /dev/vg_lk
Even if the lock disk is held by a single node (here sco1 is the primary), vgdisplay should still work - it is a shared disk - am I right?
And is the DLPI error related to this in any way?
Can you good people throw some light?
Good that HP has the ITRC; the GSCs & GCCs would have less traffic :-)...
Danke/Merci,
SNS
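For anyone scanning such listings by hand: the cmviewcl-style output above can be filtered with a few lines of portable shell. This is only a sketch - the sample text is copied from the listing above; on a live cluster you would feed the output of `cmviewcl` itself into the awk filter instead.

```shell
# Sample cmviewcl-style output, copied from the listing above.
cmviewcl_output='NODE           STATUS       STATE
sco1           up           running
NODE           STATUS       STATE
sco2           down         unknown'

# Print the name of every node whose STATUS column reads "down".
down_nodes=$(printf '%s\n' "$cmviewcl_output" |
    awk '$2 == "down" { print $1 }')

echo "Nodes down: $down_nodes"
```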
03-26-2010 06:27 AM
Re: node2 down; cluster lock not activated; DLPI error
Every node must be able to see the lock disk, but only the first node to sit on it gets it! The lock disk then becomes exclusive to that node: it grabbed the lock disk, and it is the only one sitting on it.
Now, if and when that node goes down, the lock disk is up for grabs again, to the first node that can grab it.
I like to illustrate, so I hope my little tale of the lock disk (musical chairs) helps. In technical terms, the lock disk is what grants quorum so the cluster can form. It is granted to only one node at a time, strictly on a first-come basis.
Rita
03-26-2010 06:35 AM
Re: node2 down; cluster lock not activated; DLPI error
Solution
Therefore you could conceivably see the vgdisplay failing.
There are a number of sites I know of who have a small LUN as their CL disk, in a VG, and that VG is NOT part of any package so it NEVER gets activated.
Check and see if that VG is in any of your packages.
And a DLPI error is normally a networking issue.
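The check suggested above - is the lock VG referenced by any package? - can be sketched in shell. The directory, file names, and VG[] lines below are illustrative stand-ins, not real package control files; on the cluster itself you would grep your actual package files under /etc/cmcluster.

```shell
# Build a throwaway stand-in for a package control-file directory.
pkgdir=$(mktemp -d)
printf 'VG[0]="/dev/vg_app"\n' > "$pkgdir/pkg1.cntl"
printf 'VG[0]="/dev/vg_db"\n'  > "$pkgdir/pkg2.cntl"

# grep -l lists every package file that mentions the lock VG; empty output
# means no package ever activates vg_lk, so "Volume group not activated"
# on the lock VG is the expected, healthy state.
hits=$(grep -l 'vg_lk' "$pkgdir"/*.cntl || true)
if [ -z "$hits" ]; then
    echo "vg_lk is not activated by any package"
fi
rm -rf "$pkgdir"
```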
03-26-2010 06:36 AM
Re: node2 down; cluster lock not activated; DLPI error
Could you post a more detailed output of the DLPI message? It looks like we have just one little piece of it in your post. We need the full picture to respond.
Thanks,
Rita
03-26-2010 07:19 AM
Re: node2 down; cluster lock not activated; DLPI error
Appreciate the example, Rita - nice.
And Melvyn, experience speaks volumes.
I think you both need to be assigned more than 7 pts, so I am keeping the assigning on hold till Monday.
And on the DLPI: I think I know the reason - and since it's not connected, as per you gurus, let me see if I can fix it on Monday.
Will keep you posted.
Bon Weekend
SNS
03-29-2010 07:57 AM
Re: node2 down; cluster lock not activated; DLPI error
Melvyn was right on the dot - the cluster works even when vg_lk isn't activated.
So, will that be the case for a larger system, or does it only depend on whether the lock disk is activated by a package?
As for the DLPI error - the problem started when the LAN card was replaced. It was lan1; now it is lan10. I had changed it in /etc/rc.config.d/netconf, but cmgetconf still says lan1.
This is even after the cluster config file was edited (or maybe the wrong file was edited, since there seem to be multiple files with confusingly similar names).
Here are the related errors from syslog of node2 :
cmnetd[4224]: Assertion failed: NULL != element, file: netsen/cmnetd_ip_hpux.c, line: 1350
cmclconfd[2020]: DLPI error ack for primitive 11 with 8 0
cmclconfd[2020]: Unable to attach to network interface 1
cmclconfd[2020]: Unable to attach to DLPI: I/O error
cmcld[2052]: Service cmnetd terminated due to a signal(6).
cmcld[2052]: Utility Daemon cmnetd died unexpectedly! It may be due to a pending reboot or panic
cmcld[2052]: Exiting with status 1.
cmsrvassistd[2072]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
cmclconfd[1980]: The cluster daemon aborted our connection (231).
cmclconfd[2026]: The Serviceguard daemon, cmcld[2052], exited with a status of 1.
Details of the syslog are attached.
Merci
SNS
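One way to read the syslog excerpt above: the cmclconfd line names the network instance (PPA) Serviceguard tried to open. A small sketch, using the quoted lines as sample input:

```shell
# Sample daemon lines, copied from the syslog excerpt above.
syslog='cmclconfd[2020]: DLPI error ack for primitive 11 with 8 0
cmclconfd[2020]: Unable to attach to network interface 1
cmcld[2052]: Service cmnetd terminated due to a signal(6).'

# "Unable to attach to network interface 1" carries the instance number
# Serviceguard is still configured for - the old lan1, which no longer
# exists now that the replacement card came up as lan10.
ppa=$(printf '%s\n' "$syslog" |
    sed -n 's/.*Unable to attach to network interface \([0-9][0-9]*\).*/\1/p')
echo "Serviceguard is still trying to attach instance lan$ppa"
```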
03-30-2010 04:04 AM
Re: node2 down; cluster lock not activated; DLPI error
I may need some more help here...
When I tried to cmapplyconf to the clusterconfig.ascii file:
Detected a partition of IPv4 subnet 192.168.220.0.
Partition 1
sco1 lan1
Partition 2
sco2 lan10
Failed to evaluate network
cmapplyconf: Unable to reconcile configuration file socbencl.ascii
with discovered configuration information
OK, this has to be the networking/DLPI mismatch...
Fine, the 192.168.220.0 subnet is only the secondary LAN/HB network. Why should node2 (sco2) not be in the cluster when the primary network is up and running, and the only issue is with the secondary 192.168.220.0 subnet?
Highly appreciate your inputs.
Merci,
SNS
03-30-2010 04:18 AM
Re: node2 down; cluster lock not activated; DLPI error
SG is very picky when you're doing the build. So, if it's complaining about some HB, then take a look at that. You seem to have a good handle on the network work.
Now for:
"...Melvyn was right on the dot - the cluster works even when vg_lk isn't activated. So, will that be the case for a larger system, or does it only depend on whether the lock disk is activated by a package?"
>>>When your cluster grows, you might consider switching from a lock disk to a quorum server. I find them much easier, with fewer problems. Check it out:
http://docs.hp.com/en/B8467-90048/ch01s03.html
Kindest regards,
Rita
03-30-2010 06:08 AM
Re: node2 down; cluster lock not activated; DLPI error
Thank you Rita; your posts are good to read - technically rich, but always very well phrased.
Very true, SG is very picky;
Some history for you folks:
I wasn't there when the cluster was set up, and (un)fortunately I came into a situation where I was told that the cluster is running on node 1 (primary) only, and node2 is down due to a LAN card failure. Now, for a properly designed cluster, no single LAN card failure should make node2 inaccessible to the cluster itself, especially when the primary LAN is up on node2.
And the admins here say that the cluster never worked.
Now, the question - allow me to rephrase - how can the cluster be dependent on the secondary LAN/HB of any node?
The primary LANs on both nodes are up and running.
lan2 - primary LAN on both nodes
lan1 - LAN/HB subnet on node 1
lan10 - LAN/HB subnet on node 2
lan3 - dedicated HB network for both nodes
lan4 - standby LAN
Am I missing something fundamental here, folks?
Highly Appreciate your time & inputs,
Thank You all very much,
SNS
04-02-2010 05:31 AM
Re: node2 down; cluster lock not activated; DLPI error
It would appear that the LAN card was replaced incorrectly for an HA configuration, since the instance number changed. This has confused Serviceguard. You will need to edit the cluster ascii file and change the instance numbers to reflect the actual current configuration. You may need to halt the cluster to make this change, as I am not sure cmapplyconf will let you do this online.
Proper procedures to replace hardware in a cluster are documented in the SG manual in Chapter 8 - see http://docs.hp.com/en/ha.html#Serviceguard Note that SG continues to expand what you are allowed to do online vs offline, but since this is already confused, you should be prepared to halt the cluster to fix it. Make sure you have enough of a time window to absorb some unexpected problems, because if something goes weird, you may not be able to start the cluster at all on either node until you get it sorted out.
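The offline edit described above boils down to replacing the stale instance name in the cluster ascii file. A minimal sketch with a made-up file (node name, keywords, and addresses below are illustrative; on the real system this edit sits between cmhaltcl and cmcheckconf/cmapplyconf):

```shell
# Create a throwaway stand-in for the cluster ascii file.
ascii=$(mktemp)
cat > "$ascii" <<'EOF'
NODE_NAME sco2
  NETWORK_INTERFACE lan1
  HEARTBEAT_IP 192.168.220.2
EOF

# Swap the stale instance (lan1) for the one the replacement card
# actually received (lan10); anchoring on end-of-line avoids also
# rewriting lan10, lan11, etc.
updated=$(sed 's/NETWORK_INTERFACE lan1$/NETWORK_INTERFACE lan10/' "$ascii")
echo "$updated"
rm -f "$ascii"
```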
04-02-2010 07:41 AM
Re: node2 down; cluster lock not activated; DLPI error
http://www13.itrc.hp.com/service/patch/patchDetail.do?patchid=PHSS_40363&sel={hpux:11.23,}&BC=main|search|
Search on cmnetd under patch keywords for 11.23.
04-03-2010 03:29 AM
Re: node2 down; cluster lock not activated; DLPI error
I have heard you - very good presentation.
I was in an HP GSC myself some time back.
I had initially looked at node2 only. But the linkloop test (cmscancl makes it easier) showed that on node 1, lan1 - which forms the local LAN/HB network - cannot see itself.
Now this adds to the HP SG network woes!
linkloop -i 1 0x001A4B06F293
Link connectivity to LAN station: 0x001A4B06F293
error: expected primitive 0x30, got DL_ERROR_ACK
dl_error_primitive = 0x2d
dl_errno = 0x04
dl_unix_errno = 57
error - did not receive data part of message
I need to fix lan1 of node 1 first - yes, I have the answer for the error 57;
thereafter cmquerycl, and cmcheckconf on the ascii file from the cmquerycl.
So my question would be: why is it that, when lan1 was down at the linkloop level, HP SG started with no errors in the syslog of node1? It clearly shows the needed bridged net:
Apr 2 17:14:28 sco1 cmcld[11370]: lan1 0x001a4b06f293 192.168.220.1 bridged net:1
Meaning SG can start the cluster (using cmrunnode on node1 only) even if lan1 is down? [lan2 is the primary public LAN]
I would like to confirm this, please.
It is a networking issue for sure now.
The cluster lock had nothing to do with it.
The DLPI error would be 99% solved if lan1 were OK.
I am not closing this thread as of now: I would like my HP folks to know how it was finally resolved.
The solution is very near.
And you people are great - and I thought the Linux folks were the coolest!
Cheers,
SNS
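To make the failure mode above concrete: a linkloop run has failed at layer 2 whenever DL_ERROR_ACK appears in its output (a healthy run reports OK instead). A sketch using the failing output quoted above as sample input:

```shell
# Sample output copied from the failing linkloop run above.
linkloop_out='Link connectivity to LAN station: 0x001A4B06F293
error: expected primitive 0x30, got DL_ERROR_ACK
  dl_error_primitive = 0x2d
  dl_errno = 0x04
  dl_unix_errno = 57'

# DL_ERROR_ACK means the DLPI attach/test was rejected: there is no
# layer-2 connectivity, so no amount of IP-level configuration will help.
case "$linkloop_out" in
  *DL_ERROR_ACK*) status="FAILED: no layer-2 connectivity" ;;
  *)              status="passed" ;;
esac
echo "linkloop test $status"
```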
04-05-2010 05:10 AM
Re: node2 down; cluster lock not activated; DLPI error
a) linkloop is a level-two test of the physical MAC address. If you get no physical acknowledgment, then you have no physical connection. Get it? Check your cable. Swap your cable. Verify the NICs are up by pinging their IP addresses.
b) The patches provided were for corrections to the IPv4/IPv6 exchange of 4- and 6-byte IP addresses.
b1) Have you installed the patches?
b2) Do you have IPv6 turned on somewhere accidentally?
04-06-2010 06:01 AM
Re: node2 down; cluster lock not activated; DLPI error
Here is how -
The LAN/HB network wasn't present, and so HP SG didn't see node2.
The initial configuration had lan1 of both nodes on this LAN/HB network.
On node 2 there was a LAN card failure, and when the card was replaced it wasn't put back in the same position as the faulty card.
But here is the fun part - all along it was thought the issue was due to a LAN or some other failure on node 2...
But...
On further analysis of the cmscancl output, I saw that lan1 of node1 - yes, node 1 - doesn't see itself... linkloop fails with the same error 57.
Now, lan5 on node1 and lan0 on node2 can linkloop with themselves (locally). So, instead of setting lan1 right on both nodes:
1. I configured the LAN/HB network (subnet) on lan5 and lan0.
Then I halted the package - cmhaltpkg - and then the cluster - cmhaltcl.
2. Got the new cluster config file - cmquerycl -C XYZ -n node1 -n node2
3. Edited the config file to give the VG and PV lock disk entries, and the shared VGs.
[I had compared with the running config file obtained using cmgetconf]
4. cmcheckconf - it succeeded
5. cmapplyconf -C XYZ - succeeded
6. Restarted the cluster
And it worked!!!
Thanks to all you out there...
GOD bless Us all
SNS
04-12-2010 10:00 PM
Re: node2 down; cluster lock not activated; DLPI error
Cheers!
SNS