03-09-2006 08:40 PM
Problem stopping packages/cluster on one node cluster
Help...
We use a one-node cluster for our disaster environment.
Configuring and starting the cluster works fine.
When we stop one (or all) of the packages, the following happens:
- The applications in the package are shut down;
- The filesystems are unmounted;
- The package is shut down, as stated in the package log:
########### Node "
22 MET 2006 ###########
a. But the command (cmhaltpkg
b. cmviewcl says that the package is in status halting and stays in this status:
CLUSTER STATUS
NODE STATUS STATE
PACKAGE STATUS STATE AUTO_RUN NODE
c. There are
It looks like these are the package processes?
d. We have to kill all "cm" processes to stop the cluster.
Can someone tell me what is wrong here?
Regards,
CvB
03-09-2006 08:57 PM
Re: Problem stopping packages/cluster on one node cluster
What does cmviewcl -p "pkg_name" -v say?
03-09-2006 09:02 PM
Re: Problem stopping packages/cluster on one node cluster
Yes, that is so.
Is this the problem?
Regards,
CvB
03-09-2006 09:03 PM
Re: Problem stopping packages/cluster on one node cluster
# UNIX95=1 ps -efH
to see what the child processes of the package halt scripts are.
You should consider setting HALT_SCRIPT_TIMEOUT in the package configuration file and rerunning cmapplyconf (see the SG manual). This would only be a workaround, though (and not a really good one either).
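A minimal sketch of that workaround. The package name pkg1, the file path /etc/cmcluster/pkg1/pkg1.conf and the 600-second value are assumptions for illustration only; use the configuration file and a timeout that fit your own package.

```shell
# Fragment of the package configuration file
# (hypothetical path: /etc/cmcluster/pkg1/pkg1.conf).
# The value is in seconds; 600 is only an illustrative choice.
#
#   HALT_SCRIPT_TIMEOUT    600

# Check the edited file first, then apply it to the cluster configuration.
PKG_CONF=/etc/cmcluster/pkg1/pkg1.conf   # hypothetical path
cmcheckconf -P "$PKG_CONF" && cmapplyconf -P "$PKG_CONF"
```

With a timeout set, a halt script that never returns should no longer leave the package in "halting" indefinitely, but the underlying hang is still there.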
Carsten
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
03-09-2006 09:09 PM
Re: Problem stopping packages/cluster on one node cluster
What configuration changes do I have to make, and where, to change the packages from "switching enabled" to "switching disabled"?
Regards,
CvB
03-09-2006 09:52 PM
Re: Problem stopping packages/cluster on one node cluster
Also try putting -x in the shutdown scripts and see where it hangs.
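A small sketch of what the -x tracing gives you. The function names and the two halt steps are made up for illustration and are not taken from any real package control script; the idea is only that the last line in the trace file shows the command that was running when the script stalled.

```shell
# Hypothetical stand-in for one step of a package halt script.
halt_step() {
    echo "halting $1"
}

# Run the halt steps with tracing enabled.
# The trace (written by the shell to stderr) goes to the file named in $1.
trace_halt() {
    (
        set -x                    # print each command before it is executed
        halt_step application
        halt_step filesystems
    ) 2>"$1"
}
```

For example, run `trace_halt /tmp/halt.trace` and then look at the end of /tmp/halt.trace. In a real control script the same effect can be had by adding `set -x` near the top of the script.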
03-09-2006 09:59 PM
Re: Problem stopping packages/cluster on one node cluster
In all the package logfiles the last line says that the package is halted!
########### Node "
22 MET 2006 ###########
So, it looks to me that the ctrl.sh script has ended!
Also, all filesystems are unmounted!
What makes the problem more confusing is that halting packages works now, but after the cluster has been running for some days it is no longer possible to halt the packages.
Help...
Regards,
CvB
03-09-2006 10:04 PM
Re: Problem stopping packages/cluster on one node cluster
I changed "AUTO_RUN" param of all packages to "NO".
I tested it and now it seems to work. But I also tested "AUTO_RUN YES", and now that works as well?
How can I change "SWITCHING" from "enabled" to "disabled"?
(see "Node_Switching_Parameters" below)
# cmviewcl -v
CLUSTER STATUS
NODE STATUS STATE
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/3/0/0 lan1
STANDBY up 0/6/0/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 10.164.0.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled
PACKAGE STATUS STATE AUTO_RUN NODE
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 10.164.0.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled
PACKAGE STATUS STATE AUTO_RUN NODE
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 10.164.0.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled
PACKAGE STATUS STATE AUTO_RUN NODE
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 10.164.0.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled
Regards,
CvB
03-10-2006 12:05 AM
Re: Problem stopping packages/cluster on one node cluster
> script has ended!
> Also, all filesystems are unmounted!
This is not a good enough criterion for deciding whether the script has finished. cmsrvassistd waits for the shell script to finish (see the wait(2) manual page). If the script has not finished, cmsrvassistd will not notice. When the problem recurs, you should check the process list.
It is also possible that cmsrvassistd is unable to communicate with cmcld and update it on the new status. This is probably less likely.
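The wait(2) point can be illustrated with a small sketch. Everything here is a made-up stand-in (fake_halt_script is not a Serviceguard script, and the sleep simply plays the role of whatever keeps the process alive): a log line saying "completed" proves nothing, because the parent's wait only returns once the process has really exited.

```shell
# Stand-in for a halt script: it logs "completed" but keeps running afterwards.
fake_halt_script() {
    echo "Package halt completed" >"$1"
    sleep 2        # plays the role of anything that keeps the process alive
}

# Start the stand-in in the background and wait for it, as a parent daemon would.
demo_wait() {
    log=$1
    fake_halt_script "$log" &
    child=$!
    sleep 1
    # The log already claims completion, yet the process is still alive.
    if grep -q "completed" "$log" && kill -0 "$child" 2>/dev/null; then
        echo "log says completed, process still running"
    fi
    wait "$child"  # returns only when the process has really exited
    echo "exit status: $?"
}
```

Calling `demo_wait /tmp/halt.log` shows the window in which the log is already final but the waiting parent has not been released yet. That is the situation the process list would reveal on the real system.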
Carsten
03-10-2006 12:23 AM
Re: Problem stopping packages/cluster on one node cluster
The last commands executed in the package ctrl.sh script are:
- print "\n\t########### Node \"$(hostname)\": Package halt completed at $(date) ###########"
and
- exit 0
So, for me the script has ended...
Maybe cmsrvassistd is unable to communicate with cmcld.
In syslog.log some messages are missing when it goes wrong.
Normally it should look like:
Mar 10 11:28:19
Mar 10 11:28:19
for package
Mar 10 11:28:28
Mar 10 11:28:28
Mar 10 11:28:28
But if it goes wrong it looks like:
Mar 10 11:28:19
Mar 10 11:28:19
for package
Then the last three lines are NOT displayed!!
Regards,
CvB
03-13-2006 01:31 AM
Re: Problem stopping packages/cluster on one node cluster
use "what /usr/lbin/cmcld | grep Date" to get that info.
What has changed recently, or has it always failed in this manner?
Does only one package do this, or multiple?
If only one package normally runs, create a second package that does nothing (do not modify the package control script), start it up, and let it run for a while as well before halting it and checking whether SG registers the package halt completion.
03-13-2006 01:37 AM
Re: Problem stopping packages/cluster on one node cluster
1. Version info of "/usr/lbin/cmcld":
A.11.12 Date: 11/10/2000; PATCH: PHSS_22541
2. It has always failed in this manner;
3. All four packages fail in this manner;
Regards,
CvB
03-13-2006 07:36 PM
Re: Problem stopping packages/cluster on one node cluster
Because the issue can be reproduced it might make sense to perform a tusc trace of cmsrvassistd before the next cmhaltpkg. tusc can be obtained from http://gatekeep.cs.utah.edu and you should run it with
# tusc -f -p -E -v -T%X -r all -w all -o /tmp/cmsrv.trc
This would show you whether cmsrvassistd keeps waiting (i.e. does not get signaled) or tries to send a message to cmcld.
Not sure this is worth it. You might also consider updating your patch level.
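A sketch of how the attach step could be scripted. Two assumptions here are not from the thread: that `ps -e` lists the daemon's command name in the last column, and that tusc accepts the PID of a running process as its final argument. The helper names are made up.

```shell
# Extract the PID of cmsrvassistd from `ps -e` output read on stdin.
# Assumes the PID is the first column and the command name the last one.
find_cmsrv_pid() {
    awk '$NF == "cmsrvassistd" { print $1; exit }'
}

# Attach tusc to the running daemon with the options given above.
# Assumption: tusc takes the PID to attach to as its last argument.
trace_cmsrv() {
    pid=$(ps -e | find_cmsrv_pid)
    [ -n "$pid" ] || { echo "cmsrvassistd is not running" >&2; return 1; }
    tusc -f -p -E -v -T%X -r all -w all -o /tmp/cmsrv.trc "$pid"
}
```

Start `trace_cmsrv` in one session, then run cmhaltpkg in another and read /tmp/cmsrv.trc afterwards.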
Carsten
03-13-2006 10:36 PM
Re: Problem stopping packages/cluster on one node cluster
I tried tusc.
The cluster went DOWN (aborted) the moment I started tusc and ran "cmcheckconf".
All the filesystems were still mounted and the applications were still running.
No cmcld or other MC/SG processes were running any more.
I had to stop all applications and unmount all filesystems by hand.
Because "netstat -in" also still showed all package IP addresses assigned to the LAN card, I rebooted the server to be able to start the cluster!
Regards,
CvB
03-13-2006 11:49 PM
Re: Problem stopping packages/cluster on one node cluster
If you can't afford to run tusc, follow Carsten's advice and consider upgrading Serviceguard. This will give you current files with a current SG patch, replacing potentially corrupt or sick bits.
03-14-2006 01:27 AM
Re: Problem stopping packages/cluster on one node cluster
If you have gdb (product WDB) installed you might want to check out the core file of cmcld to get a stack trace:
# gdb /usr/lbin/cmcld /var/adm/cmcluster/core
gdb> bt
I suspect that cmsrvassistd died for some reason and could not be restarted by cmcld. What does syslog say?
Do you by any chance have PHNE_28895, the cumulative ARPA Transport patch, installed? This patch is known to remove the route to the loopback network (127.0.0.0), with the result that cmsrvassistd can no longer talk to cmcld and cmcld cannot restart cmsrvassistd when it dies. It is not a perfect match for what you have seen so far, which is why I had not mentioned it yet. But perhaps you can double-check.
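One quick way to double-check the loopback route is to look for a 127.x destination in the routing table. This is a sketch under the assumption that `netstat -rn` prints the destination in the first column; the helper name is made up.

```shell
# Succeeds if `netstat -rn` output on stdin contains a route
# whose destination (first column) starts with 127.
has_loopback_route() {
    awk '$1 ~ /^127\./ { found = 1 } END { exit !found }'
}
```

On the node this would be used as `netstat -rn | has_loopback_route || echo "no loopback route"`.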
Carsten
03-14-2006 01:34 AM
Re: Problem stopping packages/cluster on one node cluster
In the output file of the tusc command the following text is displayed:
<27>Mar 14 11:51:19 cmsrvassistd[7996]: Lost connection to the cluster daemon.
<27>Mar 14 11:51:19 cmsrvassistd[7996]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
One question here.
What happens if I run "cmdeleteconf" after shutting down the cluster, and then run cmcheckconf and cmapplyconf?
Will this clear all the "old" configuration files and rebuild the configuration from scratch?
Regards,
CvB
03-14-2006 01:56 AM
Re: Problem stopping packages/cluster on one node cluster
The tusc trace you show is what I would expect if cmcld dies. cmcld and cmsrvassistd have a TCP connection open. When this goes down the daemons will notice this and cmcld would try to re-establish it (of course if it is just the death of the TCP connection and cmcld is still alive).
Well, you already said that cmcld died. The question is why? Because of the tusc trace?? Only a stack trace of cmcld and/or famous last words of cmcld from syslog would tell us. If you do not have gdb you can try using adb
# adb /usr/lbin/cmcld /var/adm/cmcluster/core
adb> $c
This does not always work though. gdb is better. I think this would be interesting because I could imagine that the problems reported are related (cmcld death, packages fail to halt).
Carsten
03-14-2006 03:14 AM
Re: Problem stopping packages/cluster on one node cluster
Here is the syslog.log output from the moment that cmcld died:
Mar 14 11:51:16
_MAX_USEC, file: timers.c, line: 792
Mar 14 11:51:19
oftware caused connection abort
Mar 14 11:51:19
luster daemon (cmcld): Software caused connection abort
Also the cmcld messages after startup in the syslog.log:
Mar 14 12:31:37
Mar 14 12:31:38
Mar 14 12:31:38
Mar 14 12:31:38
Mar 14 12:31:38
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:31:39
Mar 14 12:32:24
Mar 14 12:32:24
Mar 14 12:32:39
Mar 14 12:32:39
Mar 14 12:32:39
Mar 14 12:32:53
Mar 14 12:32:53
Mar 14 12:32:59
Mar 14 12:32:59
Mar 14 12:32:59
Mar 14 12:33:06
Mar 14 12:33:06
Mar 14 12:33:19
Mar 14 12:33:19
Mar 14 12:33:19
Mar 14 12:33:29
Mar 14 12:33:29
Mar 14 12:35:39
Mar 14 12:35:39
Mar 14 12:35:39
Regards,
CvB
03-14-2006 03:30 AM
Re: Problem stopping packages/cluster on one node cluster
No, patch PHNE_28895 isn't installed.
We run on a HP-UX 11.00 server.
Regards,
CvB
03-14-2006 04:00 AM
Re: Problem stopping packages/cluster on one node cluster
> We run on a HP-UX 11.00 server.
This makes perfect sense, because SG A.11.12 was only supported on 11.00 ... ok.
> Mar 14 11:51:16
> failed: (tsb_tmp).tsb_low <= TICKS_PER
> _MAX_USEC, file: timers.c, line: 792
This is the key message. An assertion has failed (i.e. a specific condition in the code that was expected to be true was really false and caused cmcld to abort).
I think this might be fixed in PHSS_23373 for SG A.11.12. Search the patch text for "TICKS_PER_MAX_USEC". This message can also be caused by a system hang that starved cmcld of CPU, though. Judging from the fact that you only run a 1-node cluster, this might be even more likely.
In no way would I expect the tusc trace to be responsible for the cmcld abort.
To fix the most prominent system hangs it is advisable to call your support rep and to ask him to prepare a patch bundle that contains kernel patches that fix kernel hangs (LVM, Filesystem, SCSI, FC, ARPA, LAN, process management.. )
A system hang could potentially also explain the package halt problems. If cmsrvassistd is starved out it won't be able to deliver the return value of the package halt scripts back to cmcld.
Carsten