MC ServiceGuard Cluster Failover Problem
12-26-2007 07:14 PM
I have a two-node cluster in which the primary server is named "datasvr" and the secondary server is named "appl".
Cluster failover used to work. Recently the primary server ran out of disk space, so I switched over to the secondary server (appl) to act as a temporary primary while I fixed the original primary (datasvr). This time the failover ran into a problem.
These are the steps I followed:
1. cmviewcl -v on datasvr
----------------------------------------------
# cmviewcl -v
CLUSTER        STATUS
FEU_CLUSTER    up

  NODE           STATUS       STATE
  appl           up           running

    Network_Parameters:
    INTERFACE    STATUS       PATH           NAME
    PRIMARY      up           0/0/0/0        lan0
    PRIMARY      up           0/4/0/0/6/0    lan1
    STANDBY      up           0/7/0/0/6/0    lan3

  NODE           STATUS       STATE
  datasvr        up           running

    Network_Parameters:
    INTERFACE    STATUS       PATH           NAME
    PRIMARY      up           0/0/0/0        lan0
    STANDBY      up           0/4/0/0/7/0    lan2
    PRIMARY      up           0/4/0/0/6/0    lan1

    PACKAGE      STATUS       STATE        PKG_SWITCH   NODE
    ORADB        up           running      enabled      datasvr

      Policy_Parameters:
      POLICY_NAME     CONFIGURED_VALUE
      Failover        configured_node
      Failback        manual

      Script_Parameters:
      ITEM       STATUS     MAX_RESTARTS   RESTARTS   NAME
      Service    uninitia   0              0          Oracle_DB
      Subnet     up                                   192.168.0.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS       SWITCHING    NAME
      Primary      up           enabled      datasvr      (current)
      Alternate    up           enabled      appl
----------------------------------------------
2. cmhaltnode datasvr
----------------------------------------------
# cmhaltnode datasvr
cmhaltnode : Package ORADB is still running on datasvr.
Use the -f option to forcefully halt the node including halting packages.
# cmhaltnode -f datasvr
Disabling package switching to all nodes being halted.
Warning: Do not modify or enable packages until the halt operation is completed.
Halting Package ORADB
Halting cluster services on node datasvr
..
cmhaltnode : Successfully halted all nodes specified.
Halt operation completed.
----------------------------------------------
3. cmviewcl -v on datasvr
----------------------------------------------
# cmviewcl -v
CLUSTER        STATUS
FEU_CLUSTER    up

  NODE           STATUS       STATE
  appl           up           running

    Network_Parameters:
    INTERFACE    STATUS       PATH           NAME
    PRIMARY      up           0/0/0/0        lan0
    PRIMARY      up           0/4/0/0/6/0    lan1
    STANDBY      up           0/7/0/0/6/0    lan3

    PACKAGE      STATUS       STATE        PKG_SWITCH   NODE
    ORADB        up           running      enabled      appl

      Policy_Parameters:
      POLICY_NAME     CONFIGURED_VALUE
      Failover        configured_node
      Failback        manual

      Script_Parameters:
      ITEM       STATUS     MAX_RESTARTS   RESTARTS   NAME
      Service    up         0              0          Oracle_DB
      Subnet     up                                   192.168.0.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS       SWITCHING    NAME
      Primary      down                      datasvr
      Alternate    up           enabled      appl         (current)

  NODE           STATUS       STATE
  datasvr        down         halted

    Network_Parameters:
    INTERFACE    STATUS       PATH           NAME
    PRIMARY      unknown      0/0/0/0        lan0
    STANDBY      unknown      0/4/0/0/7/0    lan2
    PRIMARY      unknown      0/4/0/0/6/0    lan1
----------------------------------------------
The output now shows package ORADB running on appl, which is now the current node.
4. bdf on appl
----------------------------------------------
# bdf
Filesystem         kbytes      used     avail  %used  Mounted on
/dev/vg00/lvol3    143360     45092     92176    33%  /
/dev/vg00/lvol1     83733     52870     22489    70%  /stand
/dev/vg00/lvol8   1105920    512825    556484    48%  /var
/dev/vg00/lvol7   1179648    521665    616880    46%  /usr
/dev/vg00/u01     4096000   1288668   2631934    33%  /u01
/dev/vg00/lvol4     65536     43907     20282    68%  /tmp
/dev/vg00/lvol6    536576    401811    126392    76%  /opt
/dev/vg00/lvol5     20480      2392     17018    12%  /home
/dev/vgdb/u02     2048000   1554689    462484    77%  /u02
/dev/vgdb/u03     1536000    375122   1088329    26%  /u03
/dev/vgdb/u04    10240000   1643575   8060965    17%  /u04
/dev/vgdb1/u05   10240000   2234660   7755178    22%  /u05
/dev/vgdb2/u6    15360000  14032994   1285542    92%  /u06
/dev/vgdb2/u7    15360000   5741138   9318688    38%  /u07
----------------------------------------------
The failover looked successful, since /u02, /u03, /u04, /u05, /u06 and /u07 were mounted.
5. After a few seconds, I ran bdf again on the appl server
----------------------------------------------
# bdf
Filesystem         kbytes      used     avail  %used  Mounted on
/dev/vg00/lvol3    143360     43152     93994    31%  /
/dev/vg00/lvol1     83733     52870     22489    70%  /stand
/dev/vg00/lvol8   1105920    512824    556485    48%  /var
/dev/vg00/lvol7   1179648    521665    616880    46%  /usr
/dev/vg00/u01     4096000   1288663   2631938    33%  /u01
/dev/vg00/lvol4     65536     43907     20282    68%  /tmp
/dev/vg00/lvol6    536576    401811    126392    76%  /opt
/dev/vg00/lvol5     20480      2392     17018    12%  /home
#
----------------------------------------------
It seems that the failover is breaking down: /u02, /u03, /u04, /u05, /u06 and /u07 are no longer mounted.
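The check in steps 4 and 5 (are the shared file systems still mounted?) can be repeated by filtering bdf output for the package's mount points. This is a minimal sketch, assuming the mount-point list is the one visible in the step 4 output (/u02 to /u07); the function name `missing_mounts` is hypothetical, and it only reads bdf output on stdin, so on appl it would be run as `bdf | missing_mounts`.

```shell
#!/bin/sh
# Sketch: report which shared file systems of package ORADB are missing
# from bdf output. ASSUMPTION: the mount points below are taken from the
# bdf output in step 4; adjust them to match what your package mounts.
SHARED_MOUNTS="/u02 /u03 /u04 /u05 /u06 /u07"

# Reads bdf output on stdin, prints each missing mount point on its own
# line, and returns 1 if any are missing, 0 if all are mounted.
missing_mounts() {
    listing=$(cat)
    rc=0
    for m in $SHARED_MOUNTS; do
        # The mount point is the last field of a bdf line (also when bdf
        # wraps a long device name onto a second line).
        if ! printf '%s\n' "$listing" |
            awk -v m="$m" '$NF == m { found = 1 } END { exit !found }'; then
            echo "$m"
            rc=1
        fi
    done
    return $rc
}
```

Run in a loop (for example every few seconds right after a failover), this makes it easy to see the moment the volume groups are deactivated again.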
I have also attached the syslog.log file.
I hope you can help me with this.
Thanks in advance.
Kenrick
12-27-2007 12:23 AM
To debug this, the cluster logs are necessary.
The package log is in /etc/cmcluster/"name of package or cluster"/"name".cntl.log
If the problem is with a VG, first mount the file systems by hand, look for errors, then unmount the file systems.
Have you exported the VG on the first node (vgexport) and imported it on the secondary node (vgimport)?
Check your log (*.cntl.log) for more information.
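The vgexport/vgimport question refers to the usual HP-UX LVM procedure for keeping a shared volume group's definition identical on both nodes. A minimal sketch, assuming the map file is kept in /tmp and using a hypothetical group minor number (0x010000) that must not clash with any other volume group on appl; the helper names are illustrative, and by default (DRY_RUN=1) the functions only print the commands instead of running them.

```shell
#!/bin/sh
# Sketch of re-syncing a shared VG definition from datasvr to appl.
# ASSUMPTIONS: map file path /tmp/<vg>.map and the minor number passed to
# import_on_adoptive are hypothetical; check existing ones with
# "ls -l /dev/*/group" on appl. DRY_RUN=1 (default) only prints commands.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on datasvr: write a map file without removing the VG
# (-p = preview only, -s = record the VG ID so vgimport can scan for disks).
export_map() {
    vg=$1
    run vgexport -p -s -m "/tmp/$vg.map" "/dev/$vg"
}

# Step 2, on appl (map file copied over, VG deactivated on appl):
# drop the old definition, recreate the group file and import again.
import_on_adoptive() {
    vg=$1 minor=$2
    run vgexport "/dev/$vg"
    run mkdir "/dev/$vg"
    run mknod "/dev/$vg/group" c 64 "$minor"
    run vgimport -s -m "/tmp/$vg.map" "/dev/$vg"
}
```

The same would be repeated for vgdb1 and vgdb2, with the package halted so that none of the volume groups is active while it is re-imported.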
Regards
L-DERLYN
12-27-2007 12:27 AM
Dec 27 10:54:08 appl cmcld: Service PKG*18433 terminated due to an exit(0).
Dec 27 10:54:08 appl cmcld: Started package ORADB on node appl.
Dec 27 10:54:38 appl cmcld: Service Oracle_DB terminated due to an exit(1).
Dec 27 10:54:38 appl cmcld: Service Oracle_DB in package ORADB has gone down.
Dec 27 10:54:38 appl cmcld: Disabled node appl from running package ORADB.
Dec 27 10:54:38 appl cmcld: Executing '/etc/cmcluster/oradb/oradb.cntl stop' for package ORADB, as service PKG*18433.
Dec 27 10:54:38 appl CM-ORADB[8793]: cmhaltserv Oracle_DB
Dec 27 10:54:38 appl : su : + tty?? root-oracle
Dec 27 10:54:38 appl CM-ORADB[8830]: cmmodnet -r -i 192.168.0.93 192.168.0.0
Dec 27 10:54:42 appl LVM[8878]: vgchange -a n vgdb
Dec 27 10:54:42 appl LVM[8881]: vgchange -a n vgdb1
Dec 27 10:54:42 appl LVM[8884]: vgchange -a n vgdb2
Dec 27 10:54:48 appl cmcld: Service PKG*18433 terminated due to an exit(0).
Dec 27 10:54:48 appl cmcld: Halted package ORADB on node appl.
Dec 27 10:54:48 appl cmcld: Package ORADB cannot run on this node because switching has been disabled for this node.
You need to check this package's cntl.log to find the root cause of why the package did not start on this node.
12-27-2007 12:28 AM
[Accepted solution]
Have a look at these events in appl's syslog:
Dec 27 09:48:53 appl cmcld: Started package ORADB on node appl.
Dec 27 09:49:23 appl cmcld: Service Oracle_DB terminated due to an exit(1).
Dec 27 09:49:23 appl cmcld: Service Oracle_DB in package ORADB has gone down.
Dec 27 09:49:23 appl cmcld: Disabled node appl from running package ORADB.
Dec 27 09:49:23 appl cmcld: Executing '/etc/cmcluster/oradb/oradb.cntl stop' for package ORADB, as service PKG*18433.
30 seconds after the package has started successfully, the service associated with it ends. When the service fails, the cluster is supposed to halt the package and then move it to another node, but no other node is suitable (datasvr is halted)...
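The 30-second gap between "Started package" and "Service ... terminated" can be measured for every failed start in the log rather than read off by hand. A minimal sketch, assuming the standard syslog line layout shown above and that the start and the failure fall on the same day (midnight rollover is not handled); the function name is hypothetical, and it would be run as `service_lifetimes ORADB < /var/adm/syslog/syslog.log`.

```shell
#!/bin/sh
# Sketch: after each "Started package <pkg>" line, report how many seconds
# passed until the next "Service ... terminated" line.
# ASSUMPTION: lines look like "Mon DD HH:MM:SS host cmcld: ...", so the
# time is field 3; events crossing midnight are not handled.
service_lifetimes() {
    pkg=$1
    awk -v pkg="$pkg" '
        function secs(t,   a) { split(t, a, ":"); return a[1] * 3600 + a[2] * 60 + a[3] }
        index($0, "cmcld: Started package " pkg " ") { start = secs($3); started = 1; next }
        started && /cmcld: Service .* terminated/ {
            printf "%s: service ended %d s after package start\n", $3, secs($3) - start
            started = 0
        }'
}
```

Service terminations that occur before the package start, or after the first failure, are ignored, so each start is paired with exactly one failure.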
You should look more closely at the application: what this service is, how it is configured in the package, why it stops, and so on.
I suggest you look at the package's log file on node appl, probably /etc/cmcluster/oradb/oradb.sh.log (this depends on how you built your cluster). Please post this file.
If you can't find an explanation, first modify the package on node appl to deactivate the service, then start the package on appl. If the package keeps running, start by hand the script or process that was associated with the service, and examine what happens and why it terminates after 30 s.
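After the service has been removed from the package, the syslog line "Disabled node appl from running package ORADB" means appl must be re-enabled for the package before it can start there. A minimal sketch of that sequence with standard ServiceGuard commands, assuming the package and node names from this thread; the `run` wrapper and function name are illustrative, and with DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Sketch: start package ORADB on node appl after troubleshooting changes.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
PKG=${PKG:-ORADB}
NODE=${NODE:-appl}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_pkg_on_node() {
    # Re-enable the node that the failed service disabled for this package.
    run cmmodpkg -e -n "$NODE" "$PKG"
    # Start the package explicitly on that node.
    run cmrunpkg -n "$NODE" "$PKG"
    # cmrunpkg leaves package switching disabled; turn it back on.
    run cmmodpkg -e "$PKG"
    # Show package, service and node-switching status.
    run cmviewcl -v -p "$PKG"
}
```

Printing first and running second makes it easy to review the sequence on the console before touching a cluster that is already in a degraded state.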
I have questions about /u01. First, is it a component of your application; more precisely, does it have an impact on package ORADB? If so, since it is in vg00, it can't follow the package during a failover, so you must have the same /u01 on node datasvr. Right? My idea is that there are significant differences between /u01 on node appl and /u01 on node datasvr that could explain why the service works on datasvr and not on appl.
If there is no difference in /u01, keep in mind that both nodes must offer the same environment for package ORADB to work. Your problem is probably there. Along the same lines, you should also check that /etc/cmcluster/
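One way to verify that both nodes offer the same files is to take a cksum listing on each node and compare them. A minimal sketch, assuming each listing is produced with something like `find /u01 /etc/cmcluster/oradb -type f -exec cksum {} \; > /tmp/$(hostname).cksum` and copied to one node; the function name is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: compare two "cksum" listings (checksum size path), one per node.
# Prints paths whose checksum/size differ and paths present on one side only.
# ASSUMPTION: paths contain no spaces (the path is taken from field 3).
compare_cksums() {
    awk 'FILENAME == ARGV[1] { sum[$3] = $1 " " $2; next }
         {
             if (!($3 in sum))                print "only on second: " $3
             else if (sum[$3] != ($1 " " $2)) print "differs: " $3
             delete sum[$3]
         }
         END { for (p in sum) print "only on first: " p }' "$1" "$2"
}
```

Usage: `compare_cksums /tmp/appl.cksum /tmp/datasvr.cksum`. Any "differs" line under /u01 is a candidate explanation for why Oracle starts on one node and not the other.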
Hope this helps.
Eric
12-27-2007 01:16 AM
Attached are the oradb.sh and oradb.sh.abort.log files. /u01 is where the Oracle application was installed.
It resides on the server's local hard disk, while /u02, /u03, /u04, /u05, /u06 and /u07 reside on the SC10. I have the same /u01 on both appl and datasvr. This cluster has been running for more than 3 years, and oradb.sh has a timestamp of Nov. 16, 2000.
Thanks.
Kenrick
12-27-2007 01:40 AM
ORADB_MONITOR: Exiting with failed status 1
########### Node "appl": Halting package at Thu Dec 27 10:54:38 EAT 2007 ###########
Dec 27 10:54:38 - Node "appl": Halting service Oracle_DB
cmhaltserv : Service name Oracle_DB is not running.
*** /etc/cmcluster/oradb/oradb.sh called with shutdown argument! ***
This was caused by Oracle not starting properly, which led the monitoring script to shut Oracle down.
12-27-2007 01:47 AM
Well, there is clearly a problem with the database itself: apart from the listener, there are no Oracle processes in the ps output! It seems the database crashes a few seconds after starting. Since the service monitors Oracle processes such as ora_pmon_XXX, the service goes down, and then the package goes down.
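The chain described above (Oracle processes vanish, so the service exits, so the package halts) comes from the kind of check a monitor service performs. A minimal sketch of such a check, not the actual ORADB monitor: the processes checked (pmon and smon), the polling interval and the function names are assumptions, and the SID is left to the caller.

```shell
#!/bin/sh
# Sketch of an Oracle monitor service: keep running while the instance's
# background processes exist, exit 1 (taking the package down) otherwise.
# ASSUMPTIONS: checked processes (pmon, smon) and 30 s interval are
# illustrative; the real oradb.sh monitor may check different things.

# Reads a "ps -ef" listing on stdin; returns 0 if ora_pmon_<SID> and
# ora_smon_<SID> are both present (as the last field), 1 otherwise.
oracle_procs_running() {
    sid=$1
    listing=$(cat)
    for p in pmon smon; do
        printf '%s\n' "$listing" |
            awk -v want="ora_${p}_${sid}" '$NF == want { found = 1 } END { exit !found }' ||
            return 1
    done
    return 0
}

# Poll until a background process disappears, then fail.
monitor_oracle() {
    sid=$1
    interval=${2:-30}
    while ps -ef | oracle_procs_running "$sid"; do
        sleep "$interval"
    done
    echo "Oracle background processes for $sid are gone; exiting 1" >&2
    exit 1
}
```

With only the listener running, as in the ps output posted here, `oracle_procs_running` fails on the first poll, which matches the service exiting with status 1.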
So, to investigate, you must modify the package so that it starts neither the database nor the service. Package oradb should only mount the file systems and assign the floating IP, nothing else.
If you need help with that, post the oradb.cntl file; I expect it is standard and not very difficult to modify.
Once that's done, you will be able to start the package and the file systems will stay mounted, so a DBA will be able to analyze what is happening with Oracle. But I am not a DBA ;-(
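In a legacy ServiceGuard control script such as oradb.cntl, "mount file systems and assign the IP only" usually means commenting out the service definition and the Oracle startup in `customer_defined_run_cmds`. This is an excerpt, not a complete file; since the real oradb.cntl was not posted, the service command and the `oradb.sh startup` line below are hypothetical placeholders. The matching SERVICE_NAME entry in the package configuration file will likely need the same treatment, followed by cmapplyconf.

```shell
# Excerpt (sketch) of a legacy package control script. Only the parts to
# change are shown; the VG, LV/FS and IP sections stay untouched so the
# package still activates vgdb/vgdb1/vgdb2, mounts /u02-/u07 and adds the IP.

# SERVICE NAMES AND COMMANDS.
# Commented out for troubleshooting (values are hypothetical placeholders):
#SERVICE_NAME[0]="Oracle_DB"
#SERVICE_CMD[0]="/etc/cmcluster/oradb/oradb.sh monitor"
#SERVICE_RESTART[0]=""

function customer_defined_run_cmds
{
# ADD customer defined run commands.
# Oracle startup disabled for troubleshooting (hypothetical original line):
#   /etc/cmcluster/oradb/oradb.sh startup
: # do nothing instruction, because a function must contain some command.

    test_return 51
}
```

With the package up this way, the DBA can start the instance by hand and see the startup error directly instead of having the package halt underneath it.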
Eric
12-27-2007 01:53 AM
In fact, the database doesn't start at all!
Oracle8i Enterprise Edition Release 8.1.6.0.0 - Production
With the Partitioning option
JServer Release 8.1.6.0.0 - Production
SVRMGR> Connected.
SVRMGR> ORA-27146: post/wait initialization failed
SVRMGR>
Server Manager complete.
Eric
12-27-2007 03:45 AM
Thanks for your help. I have already left the site; I will post the oradb.cntl file tomorrow.
Regards,
Kenrick
12-29-2007 02:52 AM
Try:
cmapplyconf -P /etc/cmcluster/
Answer "yes" at the prompt, then try running the cluster again.
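Before applying a package configuration it is worth validating it with cmcheckconf, which reports errors without changing anything. A minimal sketch, assuming a placeholder path for the package's ASCII configuration file (the real file name is not given in this thread); with DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Sketch: validate, then apply, a package configuration file.
# ASSUMPTION: PKG_CONF is a placeholder; point it at your package's ASCII
# configuration file. DRY_RUN=1 (default) prints, DRY_RUN=0 runs.
PKG_CONF=${PKG_CONF:-/path/to/package.conf}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

check_and_apply() {
    conf=$1
    # Verify the file first; stop if cmcheckconf reports errors.
    run cmcheckconf -v -P "$conf" || return 1
    # Apply it; cmapplyconf asks for confirmation ("yes") before changing it.
    run cmapplyconf -v -P "$conf"
}
```

Usage: `DRY_RUN=0 check_and_apply /etc/cmcluster/<package dir>/<package>.conf`, then start the package or cluster again.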
Best RGS
Alex