03-01-2004 10:38 PM
running cluster services on 2nd node causes first to fail
Using cmrunnode, I can bring one machine up into the cluster by itself, but when I try to bring the 2nd machine into the cluster, it causes the first machine to fail.
If I am actually running packages on the first machine, then the first machine hangs and reboots. As you can imagine, this is not a good situation.
We are running HP-UX 11.11 on both machines, and the ServiceGuard version is A.11.09. I am running the latest patch, PHSS_27158, although I have the same problems after removing that patch as well.
(I tried removing it and then running the cluster, with no improvement, so I reinstalled the patch.)
Below is the sequence of messages in the syslog from the 1st node (called switch) when the 2nd node (called mouse) tried to join the cluster. Not long after this, switch failed completely and rebooted itself.
I was then able to bring the cluster up on the 2nd node, once the 1st node had failed. However, when the 1st node finished rebooting, it caused the 2nd node to panic in the same manner when the cluster services tried to start at boot time.
Anyone have any advice? Thanks,
Marty
Mar 2 04:42:41 switch cmcld: New node mouse is joining the cluster
Mar 2 04:42:41 switch cmcld: Error. Cannot accept mouse into the cluster
Mar 2 04:42:41 switch cmcld: Please use cmrunnode instead of cmruncl
Mar 2 04:42:41 switch cmcld: Attempting to kill node mouse
Mar 2 04:42:41 switch cmcld: Reason: Incorrect use of a cluster run command
Mar 2 04:42:53 switch cmcld: New node mouse is joining the cluster
Mar 2 04:42:53 switch cmcld: Error. Cannot accept mouse into the cluster
Mar 2 04:42:53 switch cmcld: Please use cmrunnode instead of cmruncl
Mar 2 04:42:53 switch cmcld: Attempting to kill node mouse
Mar 2 04:42:53 switch cmcld: Reason: Incorrect use of a cluster run command
Mar 2 04:43:43 switch cmcld: New node mouse is joining the cluster
Mar 2 04:43:43 switch cmcld: Attempting to adjust cluster membership
Mar 2 04:43:47 switch cmcld: Enabling safety time protection
Mar 2 04:43:47 switch cmcld: Clearing First Dual Cluster Lock
Mar 2 04:43:47 switch vmunix: Failed to set socket receive buffer, Invalid argument
Mar 2 04:43:47 switch vmunix: Service Guard Aborting!
Mar 2 04:43:47 switch vmunix: Cause: setsockopt failed
Mar 2 04:43:47 switch cmcld: Failed to set socket receive buffer, Invalid argument
Mar 2 04:43:48 switch vmunix: (File: comm_ip.c, Line: 5709)
Mar 2 04:43:48 switch vmunix: Aborting! setsockopt failed
Mar 2 04:43:48 switch vmunix: (file: comm_ip.c, line: 5709)
Mar 2 04:43:47 switch cmcld: Aborting! setsockopt failed
Mar 2 04:43:49 switch cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Mar 2 04:43:49 switch cmlvmd: CLVMD exiting
Mar 2 04:43:49 switch cmtaped[12368]: Lost connection to the cluster daemon.
Mar 2 04:43:49 switch cmtaped[12368]: cmtaped terminating. (ATS 1.14)
Mar 2 04:43:49 switch cmclconfd[12360]: The ServiceGuard daemon, /usr/lbin/cmcld[12361], died upon receiving the signal 6.
Mar 2 04:43:49 switch cmsrvassistd[12365]: Lost connection to the cluster daemon.
Mar 2 04:43:49 switch cmsrvassistd[12365]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
03-01-2004 10:43 PM
Re: running cluster services on 2nd node causes first to fail
Not positive, but I believe 11.09 may no longer be supported - 11.15 is current. I would *strongly* recommend that you come up to at least 11.13 or 11.14. You may be running into a conflict between that SG version and a newer patch.
Also, you only need to run cmruncl on a *single* node to initially start the cluster. Then, if you need another node to join, you run cmrunnode on it (see the sketch after this reply). Try that first, and if you still have trouble, then it's time to upgrade MC/SG. I'd be planning on that anyway.
Rgds,
Jeff
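In case it helps to see that sequence spelled out, here is a minimal sketch using the node names from this thread (switch and mouse); it reflects general MC/ServiceGuard command usage rather than anything specific to A.11.09:

# On the first node only: start the cluster with just that node
cmruncl -v -n switch

# Check that the cluster and the node are up
cmviewcl -v

# Then, from a node already in the cluster, have the second node join
cmrunnode -v mouse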
03-01-2004 10:55 PM
Re: running cluster services on 2nd node causes first to fail
-Karthik S S
03-01-2004 11:51 PM
Re: running cluster services on 2nd node causes first to fail
It appears that there may be some issue with the network that is causing sockets to become lost or unavailable.
As already stated, 11.09 is no longer supported; you should seriously consider updating to 11.14 or 11.15, with patches.
A few suggestions to start with (for both nodes):
1) Confirm you do have the 11.09 SG patch:
what /usr/lbin/cmcld
and verify the version and patch level.
2) Confirm all of those patches installed and configured OK:
swlist -l fileset -a state | grep -e corrupt -e transient -e install
and see if anything gets returned.
3) Again using swlist, check that you have the correct filesets for the version of Serviceguard you are running.
4) Confirm the correct command was used to get the second node to join the running cluster (as per the log file, use cmrunnode).
If the cluster is actually running on one node, you could consider removing the failing node from the running cluster, and then re-adding it to see if this fixes the symptoms.
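If it is useful, a hedged sketch of that remove/re-add sequence, assuming mouse is the node to cycle and the commands are run from a node where the cluster is already up:

# Check current cluster and node status
cmviewcl -v

# Halt cluster services on the problem node (-f would force it even with packages running)
cmhaltnode -v mouse

# ...investigate, patch, etc., then have it rejoin the running cluster
cmrunnode -v mouse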
03-02-2004 02:20 AM
Re: running cluster services on 2nd node causes first to fail
03-02-2004 02:25 AM
Re: running cluster services on 2nd node causes first to fail
If you do not have a contract, you will need to purchase one, I am afraid.
03-02-2004 02:38 AM
Re: running cluster services on 2nd node causes first to fail
Well, if you had one, then you could load:
MC/SG 11.14 => PHSS_30028
MC/SG 11.15 => PHSS_30087
Rgds,
Jeff
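As a rough illustration of what loading one of those patches would look like (the download location and the unshar step are assumptions about how the patch arrives; halt cluster services on the node before patching ServiceGuard):

# Patch shar file downloaded from the support site to /tmp (hypothetical path)
cd /tmp
sh PHSS_30028                  # unpacks PHSS_30028.depot and PHSS_30028.text
cmhaltnode -v switch           # halt cluster services on this node first
swinstall -s /tmp/PHSS_30028.depot PHSS_30028
swlist PHSS_30028              # confirm the patch is now installed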
03-02-2004 06:48 AM
Re: running cluster services on 2nd node causes first to fail
Thanks.
PHKL_25209.CORE2-KRN installed
PHKL_25212.C-INC installed
PHKL_25212.CORE-KRN installed
PHKL_25212.CORE2-KRN installed
PHKL_25238.CORE2-KRN installed
PHKL_25367.CORE2-KRN installed
PHKL_25368.CORE2-KRN installed
PHKL_25375.CORE2-KRN installed
PHKL_25428.KERN2-RUN installed
PHKL_25506.C-INC installed
PHKL_25506.CORE-KRN installed
PHKL_25506.CORE2-KRN installed
PHKL_25593.CORE2-KRN installed
PHKL_25602.CORE2-KRN installed
PHKL_25761.CORE2-KRN installed
PHKL_25773.CORE2-KRN installed
PHKL_25871.CORE2-KRN installed
PHKL_26002.CORE2-KRN installed
PHKL_26032.C-INC installed
PHKL_26032.CORE2-KRN installed
PHKL_26074.CORE2-KRN installed
PHKL_26087.CORE2-KRN installed
PHKL_26104.VXFS-BASE-KRN installed
PHKL_26269.C-INC installed
PHKL_26269.CORE-KRN installed
PHKL_26405.C-INC installed
PHKL_26405.CORE-KRN installed
PHKL_26405.CORE2-KRN installed
PHKL_26425.CORE-KRN installed
PHKL_26425.CORE2-KRN installed
PHKL_26464.CORE2-KRN installed
PHKL_26552.VXFS-BASE-KRN installed
PHKL_26705.CORE2-KRN installed
PHKL_26719.CORE2-KRN installed
PHKL_26755.CORE2-KRN installed
PHKL_26834.CORE2-KRN installed
PHKL_27025.CORE2-KRN installed
PHKL_27025.KERN2-RUN installed
PHKL_27054.CORE2-KRN installed
PHKL_27179.CORE2-KRN installed
PHKL_27200.CORE2-KRN installed
PHKL_27266.CORE2-KRN installed
PHKL_27304.C-INC installed
PHKL_27304.CORE-KRN installed
PHKL_27304.CORE2-KRN installed
PHKL_27304.KERN2-RUN installed
PHKL_27321.CORE2-KRN installed
PHKL_27431.KERN2-RUN installed
PHKL_27447.CORE2-KRN installed
PHKL_27498.CORE2-KRN installed
PHKL_27531.CORE2-KRN installed
PHKL_27532.CORE2-KRN installed
PHKL_27682.CORE2-KRN installed
PHKL_27686.CORE2-KRN installed
PHKL_27688.CORE2-KRN installed
PHKL_27715.CORE2-KRN installed
PHKL_27727.CORE2-KRN installed
PHKL_27734.VXFS-BASE-KRN installed
PHKL_27737.CORE2-KRN installed
PHKL_27751.CORE2-KRN installed
PHKL_27751.FCMS-ENG-A-MAN installed
PHKL_27757.CORE2-KRN installed
PHKL_27778.CORE2-KRN installed
PHKL_27839.CORE2-KRN installed
PHKL_27918.CORE2-KRN installed
PHKL_27949.CORE2-KRN installed
PHKL_28025.C-INC installed
PHKL_28025.CORE-KRN installed
PHKL_28025.CORE2-KRN installed
PHKL_28100.CORE2-KRN installed
PHKL_28113.CORE2-KRN installed
PHKL_28114.CORE2-KRN installed
PHKL_28185.VXFS-BASE-KRN installed
PHKL_28326.CORE2-KRN installed
PHKL_28410.CORE2-KRN installed
PHNE_24829.INET-ENG-A-MAN installed
PHNE_24829.INETSVCS-RUN installed
PHNE_24829.NET2-KRN installed
PHNE_25083.STRTIO-KRN installed
PHNE_25083.STRTIO2-KRN installed
PHNE_26939.GE-KRN installed
PHNE_26939.GE-RUN installed
03-02-2004 06:50 AM
Re: running cluster services on 2nd node causes first to fail
If the cluster is actually running on one node, you could consider removing the failing node from the running cluster, and then re-adding it to see if this fixes the symptoms.
----------------
How do you remove a machine from the cluster and then re-add it? Sorry for the basic questions, but I am a relative novice at HP administration (our HP fellow was laid off some time ago).
Thanks.
Marty
03-02-2004 10:34 AM
Re: running cluster services on 2nd node causes first to fail
Run the following
swconfig \*
to configure the installed filesets. If that fails, you may have to force-reinstall them, or even swremove & reinstall them.
HTH,
Jeff
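For completeness, a hedged sketch of that recovery path; the depot path and the patch name below are placeholders, substitute whatever swlist actually flags on your systems:

# Configure everything that is installed but not yet configured
swconfig \*

# Re-check for anything still not in the "configured" state
swlist -l fileset -a state | grep -e corrupt -e transient -e install

# If a fileset is stuck, force a reinstall of the same revision from its source depot
swinstall -x reinstall=true -s /var/spool/sw PHKL_27304

# ...or remove it and install it again
swremove PHKL_27304
swinstall -s /var/spool/sw PHKL_27304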