06-16-2004 06:35 AM
MC/SG primary node is down (crashing)
I have a two-node cluster running Oracle (rel 7.3.2.3.0) and a custom program (these are my PKG) on HP-UX B.10.20. Node A suddenly failed and halted all processes, and MC/SG then started the PKG on node B, where it is now running properly. However, I have lost the primary server. I tried to switch back to node A manually, but I get the same fault and MC/SG switches to node B again. Could somebody help me with this problem, please? I have attached a file with parts of the log files.
06-16-2004 06:43 AM
Re: MC/SG primary node is down (crashing)
--------------------------------
Error Timeout:AllStatusEnd file.
Error Timeout. SNMP Extensible Agent Statup Failure
----------------------------------
That appears to be coming from one of your scripts. It looks as though something is not configured the same way on the two nodes.
Are there any more clues in the syslog?
06-16-2004 06:46 AM
Re: MC/SG primary node is down (crashing)
Check /etc/shutdownlog
Check syslog.log
Did it generate a crash dump?
Is there anything in the /var/tombstones/ts99 file?
Anil
06-16-2004 07:01 AM
Re: MC/SG primary node is down (crashing)
recv error!! errno=242
If this is a system error, it corresponds to "no route to host".
I would first find out what changed on the primary node.
Look at OLDsyslog.log around the time of the crash. Also compare the versions of the DCE products installed on both nodes.
The log also says "/etc/cmcluster/toolkit/oracle/oracle.cntl[6]: 8753 Killed"
Something caused the control script to be killed. Verify your package configuration parameters and check whether
NODE_FAIL_FAST_ENABLED and SERVICE_FAIL_FAST_ENABLED are set to YES. If so, this behaviour is expected.
-Sri
06-16-2004 07:23 AM
Re: MC/SG primary node is down (crashing)
regards
Radhakrishnan
06-16-2004 07:50 AM
Re: MC/SG primary node is down (crashing)
This is picked up by the monitoring service ORACLE_RFT, and since it is a service failure it shuts the package down; MC/SG then starts it on the second node.
It looks like something is configured differently at the application level on the two nodes, which is why the package runs on one and not on the other.
There were some DCE errors at the beginning. Please also make sure that you have the same DCE version and patch levels on both nodes and that DCE is running on both.
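One way to compare the installed versions is to save a product listing from each node to a file and diff the two offline. The following is only a rough sketch in Python. It assumes each listing has one product per line with the name in the first column and the revision in the second, and the product names and revisions in the sample are invented for illustration, not taken from these nodes.

```python
def parse_listing(text):
    """Map product name -> revision from a whitespace-separated listing."""
    products = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) >= 2:
            products[fields[0]] = fields[1]
    return products


def diff_listings(node_a, node_b):
    """Return {name: (revision_on_a, revision_on_b)} for every mismatch."""
    diffs = {}
    for name in sorted(set(node_a) | set(node_b)):
        if node_a.get(name) != node_b.get(name):
            diffs[name] = (node_a.get(name), node_b.get(name))
    return diffs


# Hypothetical sample data: names and revisions are made up.
listing_a = """
# listing saved on node A
ExampleDCE-Core   B.10.20     Example DCE core
ExampleDCE-Tools  B.10.20     Example DCE tools
"""
listing_b = """
# listing saved on node B
ExampleDCE-Core   B.10.20.01  Example DCE core
ExampleDCE-Tools  B.10.20     Example DCE tools
"""

differences = diff_listings(parse_listing(listing_a), parse_listing(listing_b))
for name, (rev_a, rev_b) in differences.items():
    print(f"{name}: node A has {rev_a}, node B has {rev_b}")
```

A product that is missing on one node shows up with None on that side, which is usually the first thing to look at.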
06-16-2004 08:18 AM
Re: MC/SG primary node is down (crashing)
Excuse me if the problem description is not very clear; I am new to HP-UX and MC/SG.
Unfortunately I did not capture a syslog when the server crashed. However, I have restarted both servers several times with the same results. I have attached a zip file with syslogs and cluster configuration files.
The parameters NODE_FAIL_FAST_ENABLED and SERVICE_FAIL_FAST_ENABLED are both set to NO.
Node A is now up and running as a member of the cluster, but it is not the current server; please see the cmviewcl output at the end of the file attached earlier.
I ran the commands #cmquerycl -v -n nodeA -n nodeB -C cfg_cluster.log
and #cmcheckconf -v -C cfg_cluster.log
It seems to me that no errors were found (I included the log in the attached zip file).
The command cmgetconf did not work.
regards,
Gonzalo
06-16-2004 08:29 AM
Re: MC/SG primary node is down (crashing)
//
Jun 11 17:26:29 rf05sbpe cmcld[7770]: Communication to node rf05sape has been interrupted
Jun 11 17:26:29 rf05sbpe cmcld[7770]: Attempting to form a new cluster
Jun 11 17:26:29 rf05sbpe cmcld[7770]: Communication with node rf05sape has been interrupted
//
The above is suspicious. Make sure the network interfaces are all up and running, particularly the heartbeat interfaces.
-Sri
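To see how often this happens, the cmcld messages can be pulled out of a saved copy of the syslog. The sketch below is in Python and assumes the line layout shown in the excerpt above (timestamp, host, cmcld[pid], message). The function names are made up for illustration, and the sample lines are the three quoted in this post.

```python
import re

# Matches lines such as:
# "Jun 11 17:26:29 rf05sbpe cmcld[7770]: Communication to node ... interrupted"
CMCLD_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+cmcld\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def cmcld_events(lines):
    """Return (timestamp, host, message) for every cmcld syslog line."""
    events = []
    for line in lines:
        match = CMCLD_LINE.match(line.strip())
        if match:
            events.append((match.group("stamp"), match.group("host"), match.group("msg")))
    return events


def interruptions(events):
    """Keep only the events that report lost communication with a node."""
    return [event for event in events if "has been interrupted" in event[2]]


sample_syslog = [
    "Jun 11 17:26:29 rf05sbpe cmcld[7770]: Communication to node rf05sape has been interrupted",
    "Jun 11 17:26:29 rf05sbpe cmcld[7770]: Attempting to form a new cluster",
    "Jun 11 17:26:29 rf05sbpe cmcld[7770]: Communication with node rf05sape has been interrupted",
]

events = cmcld_events(sample_syslog)
lost = interruptions(events)
for stamp, host, msg in lost:
    print(stamp, host, msg)
```

Running the same filter over the syslog from both nodes and lining up the timestamps should show whether the interruptions are one-off or recurring.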
06-16-2004 09:15 AM
Re: MC/SG primary node is down (crashing)
The LAN interfaces look OK; the lanscan and ping commands work fine. Please see the log files inside the attached zip file.
I have also attached a zip file with our cluster configuration files, in case it helps you get an idea of our MC/SG environment.
regards,
Gonzalo.
06-16-2004 09:44 AM
Re: MC/SG primary node is down (crashing)
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
Your node will time out if it misses two successive heartbeats. Your heartbeat interval is 1 second, so the node timeout is only 2 seconds. That may be causing the issue.
I know you will say that the same configuration worked fine before. Yes, but something may have changed elsewhere on the system since then that causes the interfaces to lock up temporarily. The DCE errors about network failures support this. I would check parameters such as buffer cache and memory utilization to make sure they are not causing intermittent freezes on the system.
If the system crashes again, send the core dump to HP for further analysis.
-Sri
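The arithmetic behind this can be checked against the cluster ASCII file. The following Python sketch assumes both parameters are given in microseconds and that each appears on its own line as "NAME value"; the function names are hypothetical, and the sample text is just the two values quoted above.

```python
import re

TIMING_LINE = re.compile(r"^(HEARTBEAT_INTERVAL|NODE_TIMEOUT)\s+(\d+)$")


def parse_cluster_timing(config_text):
    """Return the two timing parameters in seconds (input assumed to be microseconds)."""
    values = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = TIMING_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2)) / 1_000_000
    return values


def heartbeats_per_timeout(values):
    """How many heartbeat intervals fit into one node timeout."""
    return values["NODE_TIMEOUT"] / values["HEARTBEAT_INTERVAL"]


sample_config = """
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
"""

timing = parse_cluster_timing(sample_config)
ratio = heartbeats_per_timeout(timing)
print(f"heartbeat every {timing['HEARTBEAT_INTERVAL']} s, "
      f"node timeout {timing['NODE_TIMEOUT']} s, ratio {ratio}")
```

With the values from this cluster the ratio comes out at 2, which leaves very little margin for a short freeze or LAN hiccup.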
06-19-2004 08:37 PM
Re: MC/SG primary node is down (crashing)
Sridhar is correct. The node timeout should be at least 4 times the heartbeat interval.
Apart from that, I strongly suspect something has changed in the service control on your node A. It would be useful if you posted the /etc/cmcluster/
Check the cmrunserv part on node A. Something has surely changed there.
Cheers,
Mohan.
06-23-2004 07:36 AM
Re: MC/SG primary node is down (crashing)
This does not overly delay the restart of the package, but it does prevent some failures caused by a brief LAN outage.
Best regards,
Kent M. Ostby