07-14-2005 02:40 AM
network problem starting cluster
Jul 14 12:27:06 jmar1 SAM cl adm[6569]: Start cluster jmar_cluster1 on all nodes
Jul 14 12:27:09 jmar1 cmclconfd[6576]: Executing "/usr/lbin/cmcld" for node jmar1
Jul 14 12:27:09 jmar1 cmcld: Logging level changed to level 0.
Jul 14 12:27:09 jmar1 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 10.
Jul 14 12:27:09 jmar1 cmcld: Global Cluster Information:
Jul 14 12:27:09 jmar1 cmcld: Heartbeat Interval is 1 seconds.
Jul 14 12:27:09 jmar1 cmcld: Logging level changed to level 0.
Jul 14 12:27:09 jmar1 cmcld: Node Timeout is 2 seconds.
Jul 14 12:27:09 jmar1 cmcld: Network Polling Interval is 2 seconds.
Jul 14 12:27:09 jmar1 cmcld: Auto Start Timeout is 600 seconds.
Jul 14 12:27:09 jmar1 cmcld: Information Specific to node jmar1:
Jul 14 12:27:09 jmar1 cmcld: Cluster lock disk: /dev/dsk/c9t0d0.
Jul 14 12:27:09 jmar1 cmcld: lan0 0x00306e0960b2 140.139.46.121 bridged net:1
Jul 14 12:27:09 jmar1 cmcld: lan1 0x00306e08171b 10.1.1.1 bridged net:2
Jul 14 12:27:09 jmar1 cmcld: Heartbeat Subnet: 10.0.0.0
Jul 14 12:27:09 jmar1 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 1014.
Jul 14 12:27:09 jmar1 cmcld: Lookup of link /nodes/jmar1/networks/lan/lan1/peers failed.
Jul 14 12:27:09 jmar1 cmcld: Unable to send DLPI info request, Bad file number
Jul 14 12:27:09 jmar1 cmcld: cl_abort: abort cl_kepd_printf failed: Invalid argument
Jul 14 12:27:09 jmar1 cmcld: cl_kepd_printf, fstat: kepd_fd=8, st_dev=1073741827, st_ino=446, st_rdev=-486539264
Jul 14 12:27:09 jmar1 cmcld: Aborting! Failed to communicate with DLPI
Jul 14 12:27:09 jmar1 cmlvmd: init_cdb_callback: starting
Jul 14 12:27:09 jmar1 cmcld: Waiting for connection request from CMGMSD
Jul 14 12:27:09 jmar1 cmcld: CMGMSD successfully started
Jul 14 12:27:12 jmar1 cmsrvassistd[6580]: The cluster daemon aborted our connection.
Jul 14 12:27:12 jmar1 cmsrvassistd[6580]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Jul 14 12:27:12 jmar1 cmlvmd: callback_thread: Calling process callback
Jul 14 12:27:12 jmar1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Jul 14 12:27:12 jmar1 cmlvmd: CLVMD exiting
Jul 14 12:27:12 jmar1 cmgmsd[6587]: The cluster daemon aborted our connection.
Jul 14 12:27:12 jmar1 cmgmsd[6587]: Unable to send 92 bytes (Software caused connection abort).
Jul 14 12:27:12 jmar1 cmclconfd[6578]: The cluster daemon aborted our connection.
Jul 14 12:27:12 jmar1 cmclconfd[6576]: The ServiceGuard daemon, /usr/lbin/cmcld[6577], died upon receiving signal number 6.
Jul 14 12:27:12 jmar1 cmclconfd[6589]: Failed to open connection to cmcld: No such file or directory
Jul 14 12:27:12 jmar1 cmtaped[6588]: cmtaped - failed to set up sdb callback. (ATS 1.8)
Jul 14 12:27:12 jmar1 cmtaped[6588]: Failed to set callback: 6004
Jul 14 12:28:04 jmar1 SAM cl adm[6569]: Fail to form and start cluster jmar_cluster1
It appears to be a network problem of some sort.
I have attached the cluster config file.
cheers .. rob
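To narrow a long syslog down to the lines that matter, a filter along these lines may help. It is only a sketch: the function name `cmcld_errors` is made up for illustration, and the keyword list (fail, abort, unable) is an assumption based on the messages in the excerpt above, not a complete list of Serviceguard error strings.

```shell
# Hypothetical helper: read syslog text on stdin and keep only the cmcld
# lines that mention a failure. The keyword list is an assumption.
cmcld_errors() {
    grep 'cmcld:' | grep -i -E 'fail|abort|unable'
}

# Usage on HP-UX, where syslog normally lives in /var/adm/syslog/syslog.log:
#   cmcld_errors < /var/adm/syslog/syslog.log
```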
07-14-2005 02:56 AM
Re: network problem starting cluster
What version of SG and what version of SGeRAC are you running?
What SG and SGeRAC patches are installed?
Please run cmscancl and post the contents of the /tmp/scancl.out file.
07-14-2005 03:26 AM
Re: network problem starting cluster
Serviceguard Extension for RAC A.11.15.00
scancl.out is attached
thanks .. rob
07-14-2005 04:44 AM
Re: network problem starting cluster
lan2* 1500 10.0.0.0 10.1.1.2
lan1 1500 10.0.0.0 10.1.1.1
I would also suggest that you change the heartbeat and node timeout intervals before this goes into production, as the default settings are normally insufficient:
heartbeat interval: 1.00 (seconds)
node timeout: 2.00 (seconds)
I would suggest changing these to 2 and 4 seconds respectively.
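For reference, the cluster ASCII configuration file expresses these timing parameters in microseconds, so 2 and 4 seconds become 2000000 and 4000000. A minimal sketch of the conversion follows. The helper name `to_usec` and the file path /etc/cmcluster/cluster.ascii are placeholders, so substitute the file you actually applied.

```shell
# Hypothetical helper: convert whole seconds to the microsecond values
# used for HEARTBEAT_INTERVAL and NODE_TIMEOUT in the cluster ASCII file.
to_usec() {
    echo $(( $1 * 1000000 ))
}

# Print the two lines as they would appear in the file.
echo "HEARTBEAT_INTERVAL $(to_usec 2)"
echo "NODE_TIMEOUT $(to_usec 4)"

# After editing the two lines in the file (path is a placeholder):
#   cmcheckconf -C /etc/cmcluster/cluster.ascii
#   cmapplyconf -C /etc/cmcluster/cluster.ascii
```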
One possibility I have seen before is that the CDB has become corrupted.
If sorting out the above network config does not fix it, you may be forced to delete the config using cmdeleteconf and then recreate it.
As a final comment, you appear not to have either SG or SGeRAC patched.
To check, run:
what /usr/lbin/cmcld |grep PHSS
and
what /usr/lbin/cmgmsd | grep PHSS
If no patches are listed, obtain them from the ITRC.
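The two checks can be wrapped up so that only the patch IDs are printed. This is a sketch under assumptions: the helper name `extract_patches` is invented here, and it expects `what` output in the "Patch: PHSS_nnnnn" form.

```shell
# Hypothetical helper: pull PHSS patch IDs out of `what` output on stdin.
extract_patches() {
    sed -n 's/.*Patch: \(PHSS_[0-9][0-9]*\).*/\1/p'
}

# Report the patch level of each daemon, if `what` is available (HP-UX).
if command -v what >/dev/null 2>&1; then
    for daemon in /usr/lbin/cmcld /usr/lbin/cmgmsd; do
        echo "$daemon: $(what "$daemon" | extract_patches)"
    done
fi
```

An empty result after the daemon name would suggest that the binary is unpatched.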
07-15-2005 12:42 AM
Re: network problem starting cluster
A.11.15.00 Date: 09/16/03 Patch: PHSS_29053
# what /usr/lbin/cmgmsd |grep PHSS
A.11.15.00 Date: 03/09/05 Patch: PHSS_32859
This is what I got. Are these the patches that are already installed, or patches that I need to install?
cheers ... rob
07-15-2005 01:04 AM
Re: network problem starting cluster
# what /usr/lbin/cmcld |grep PHSS
A.11.15.00 Date: 03/09/05 Patch: PHSS_32660
# what /usr/lbin/cmgmsd |grep PHSS
A.11.15.00 Date: 03/09/05 Patch: PHSS_32859
#
07-24-2005 04:31 PM
Re: network problem starting cluster
Since the cmcld process aborted, there should be a core file in /var/adm/cmcluster.
A. Verify that the core file creation time matches the time of the dump.
B. Use adb to obtain the stack from the core file:
# adb cmcld core
Attach the output.
regards
vinod
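Steps A and B above could be scripted roughly as follows. Treat it as a sketch: the function name `core_stack` is invented, the core file is assumed to be named `core` in /var/adm/cmcluster, and the binary path is taken from the syslog excerpt earlier in the thread. `$c` is the adb command that prints a stack backtrace.

```shell
# Hypothetical helper: show the core file's timestamp, then ask adb for a
# stack backtrace. The default paths are assumptions, not confirmed values.
core_stack() {
    core=${1:-/var/adm/cmcluster/core}
    bin=${2:-/usr/lbin/cmcld}
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    ls -l "$core"                    # compare this time with the abort in syslog
    echo '$c' | adb "$bin" "$core"   # $c asks adb for the stack backtrace
}

# Usage:
#   core_stack > /tmp/cmcld_stack.txt 2>&1
```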