01-03-2008 09:31 PM
MC/SG and Sterling Direct Connect issue
The products are Sterling Connect Direct version 3.8 and Sterling Connect Enterprise.
Recently, cmcld has been restarting the file agent service (ceu_fileagt) frequently, and after the third restart it fails the package over to the secondary node, even though the service is still running on the primary node.
Dec 17 12:26:27 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
Dec 17 19:33:44 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 17 19:54:22 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 18 19:41:37 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
Dec 19 09:35:17 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 20 19:23:40 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 20 19:29:02 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
Dec 21 06:26:33 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 22 12:33:45 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 22 19:34:32 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
Dec 24 13:10:26 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 24 16:05:35 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 24 18:46:36 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
Dec 24 18:51:58 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 26 07:57:35 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 26 19:09:36 cmcld: Automatically restarted service ceu_fileagt for the 2nd time after failure.
The old syslog shows:
Dec 26 07:57:35 cmcld: Service ceu_fileagt terminated due to an exit(1).
Dec 26 07:57:35 cmcld: Automatically restarted service ceu_fileagt for the 1st time after failure.
Dec 24 18:50:47 sshd[1764]: connection from
Dec 24 18:50:47 sshd[21656]: Remote host disconnected: Connection closed by remote host.
Dec 24 18:50:47 sshd[21656]: connection lost: 'Connection closed by remote host.'
Dec 24 18:51:36 inetd[28576]: registrar/tcp: Connection from at Mon Dec 24 18:51:36 2007
Dec 24 18:51:58 cmcld: Service ceu_fileagt terminated due to an exit(1).
Dec 24 18:51:58 cmcld: Automatically restarted service ceu_fileagt for the 3rd time after failure.
Dec 24 18:51:58 su: + tty?? root-ceadmin
Dec 24 18:52:12 inetd[3722]: bpcd/tcp: Connection from at Mon Dec 24 18:52:12 2007
Dec 24 18:52:32 inetd[6474]: pblocald/tcp: Connection from hostb at Mon Dec 24 18:52:32 2007
Dec 24 18:53:36 inetd[15662]: registrar/tcp: Connection from hostx at Mon Dec 24 18:53:36 2007
Dec 24 18:55:36 inetd[2057]: registrar/tcp: Connection from (10.9.210.21) at Mon Dec 24 18:55:36 2007
Dec 24 18:55:47 sshd[1764]: connection from "10.42.2.14"
Dec 24 18:55:47 sshd[3719]: Remote host disconnected: Connection closed by remote host.
Dec 24 18:55:47 sshd[3719]: connection lost: 'Connection closed by remote host.'
Dec 24 18:57:21 cmcld: Service ceu_fileagt terminated due to an exit(1).
Dec 24 18:57:21 cmcld: Service ceu_fileagt in package pkg has gone down.
Dec 24 18:57:21 cmcld: Disabled node x from running package pkg.
Dec 24 18:57:21 cmcld: Executing '/etc/cmcluster/pkg/pkg.cntl stop' for package pkg, as service PKG*40705.
Dec 24 18:57:22 pkg[16440]: cmhaltserv ceu_core
Dec 24 18:57:23 pkg[16629]: cmhaltserv ceu_ftp
Dec 24 18:57:24 pkg[16804]: cmhaltserv ceu_svd
Dec 24 18:57:25 pkg[16979]: cmhaltserv ceu_sshftp
Dec 24 18:57:26 pkg[17152]: cmhaltserv ceu_fileagt
Dec 24 18:57:26 pkg[17157]: cmhaltserv ceu_admin
Dec 24 18:57:27 _pkg[17340]: cmhaltserv ndm
Dec 24 18:57:28 pkg[17526]: cmhaltserv apache_monitor
Dec 24 18:57:29 pkg[17712]: cmhaltserv tomcat_monitor
01-04-2008 03:37 AM
Re: MC/SG and Sterling Direct Connect issue
ServiceGuard is not aware of the manually-started file agent and tries to start its own copy. As there is already a file agent running, the start-up of the second agent fails. When the file agent process that was started by SG fails, SG tries to restart it... and when it has failed enough times, SG will move the package to the secondary node.
If the file agent has been configured to run as a ServiceGuard service, you should not use any methods other than ServiceGuard to start it: otherwise there will be confusion.
Examine the parent PID of the file agent process when the file agent is running on the secondary node. If I remember correctly, the parent PID should refer to some ServiceGuard process. Then examine the parent PID of the file agent process running on the primary node: you're likely to find that it's started by something else, or that it's been adopted by the init process (process number 1).
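A minimal sketch of that check, assuming the usual `ps -ef` column layout (user, PID, PPID, ...). The function name `fileagt_parents` is hypothetical; the pattern `ceufileagt` is the executable name that appears in the ps output later in this thread. A parent PID of 1 in the result means the process has been adopted by init.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and report the PID and
# parent PID (PPID, column 3) of every line whose command matches PATTERN.
# Lines containing "grep" or "awk" are skipped so the search never reports
# itself (awk's own command line contains the pattern).
fileagt_parents() {
    awk -v pat="$1" '
        $0 ~ pat && $0 !~ /grep|awk/ {
            note = ($3 == 1) ? "  <-- adopted by init" : ""
            printf "PID %s PPID %s%s\n", $2, $3, note
        }'
}

# Usage on a live system (run on both nodes and compare):
#   ps -ef | fileagt_parents ceufileagt
```

Run it on each node while the agent is up: the wrapper process should show a ServiceGuard-related parent, and the agent itself should be a descendant of that wrapper rather than a child of PID 1.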
MK
01-04-2008 11:54 AM
Re: MC/SG and Sterling Direct Connect issue
From what you explained, the file agent process in this ps output should not have a parent PID of 1:
root:# ps -ef | grep -i fileagt
root 28862 20565 0 13:36:49 ttyp2 0:00 grep -i fileagt
ceadmin 12157 1 0 06:26:23 ? 0:07 /opt/ce_sterling/ceunix/hpux/bin/ceufileagt -l /opt/ce_sterling
ceadmin 12006 11890 0 06:26:07 ? 0:01 /bin/ksh /etc/cmcluster/pkg/ce_pkg/ce_services ceufileagt
ceadmin 11890 11367 0 06:26:07 ? 0:00 -ksh -c /etc/cmcluster/pkg/ce_pkg/ce_services ceufileagt
01-04-2008 12:27 PM
Re: MC/SG and Sterling Direct Connect issue
When you stop a package, it should stop everything it was running. Everything. Otherwise, you get what you are seeing now.
Above, it was suggested that the process may have been started manually and that the package then tried to start it again, hence the confusion. It may have happened that way. Or, as I suspect, the process was never cleaned up the last time somebody stopped the package.
Clean up processes on package shutdown: test and confirm. Fail the package over: test and confirm that everything runs, and everything stops, with the package.
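One way to do that cleanup, sketched under assumptions: a legacy package control script (such as the `pkg.cntl` seen in the syslog) normally has a `customer_defined_halt_cmds` function where site-specific halt steps can go. The helper names `pids_matching` and `kill_leftovers`, the user `ceadmin` and the 10-second grace period are illustrative only, not taken from Sterling or ServiceGuard documentation.

```shell
#!/bin/sh
# Hypothetical cleanup helpers for a package halt step.

# Print the PIDs of processes owned by USER ($1) whose command line matches
# PATTERN ($2), reading `ps -ef` output on stdin so it can be tested offline.
# Lines containing "awk" are skipped so the filter never matches itself.
pids_matching() {
    awk -v u="$1" -v pat="$2" '$1 == u && $0 ~ pat && $0 !~ /awk/ { print $2 }'
}

# Send TERM to every matching process, wait GRACE seconds, then send KILL to
# anything still there. Always returns 0 so the rest of the halt continues.
kill_leftovers() {
    kl_user=$1 kl_pattern=$2 kl_grace=${3:-10}
    kl_pids=$(ps -ef | pids_matching "$kl_user" "$kl_pattern")
    [ -z "$kl_pids" ] && return 0
    # $kl_pids is deliberately unquoted so each PID becomes its own argument.
    kill -TERM $kl_pids 2>/dev/null
    sleep "$kl_grace"
    kl_pids=$(ps -ef | pids_matching "$kl_user" "$kl_pattern")
    [ -n "$kl_pids" ] && kill -KILL $kl_pids 2>/dev/null
    return 0
}

# Example call inside customer_defined_halt_cmds (assumed names):
#   kill_leftovers ceadmin ceufileagt 10
```

After adding something like this, halt the package and check with `ps -ef` that nothing owned by the application user is left behind before starting it on the other node.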
Regards,
Rita
01-04-2008 12:57 PM
Re: MC/SG and Sterling Direct Connect issue
Knowing what that executable (ce_services; maybe a script?) does would probably help a lot in understanding this problem. I guess it starts the file agent and then spins in an endless loop, sleeping a while and then checking that the actual file agent process is still running.
The point is: when all's well, this monitor process should just keep running. If something goes wrong, the termination of the monitor process signals ServiceGuard that a restart is needed; ServiceGuard then starts it again, unless it has already been restarted too many times, in which case the package is failed over. That's the behavior ServiceGuard expects from any "service" process.
When an application is packaged for ServiceGuard, wrapper/monitor executables like this are created if the application process's behavior does not match ServiceGuard's expectations.
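To make that expected behavior concrete, here is a minimal sketch of such a wrapper. It is not the actual `ce_services` script, whose contents have not been posted; the function name `monitor_service` and the polling interval are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of a ServiceGuard-style service wrapper (hypothetical):
# start the real program, then stay in the foreground for as long as it is
# alive. The wrapper's own exit is what tells ServiceGuard the service failed.
monitor_service() {
    ms_interval=$1
    shift
    "$@" &                      # start the real program in the background
    ms_child=$!
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$ms_child" 2>/dev/null; do
        sleep "$ms_interval"
    done
    echo "monitor: process $ms_child has gone away" >&2
    return 1                    # non-zero status: counted as a service failure
}

# In a real wrapper script the last lines might look like:
#   monitor_service 30 /path/to/daemon args...
#   exit $?
```

This version tracks the child by `$!`. A wrapper for a program that detaches itself (forks and lets its parent exit) would instead have to look the process up by name, for example with `ps -ef`, and any failure in that lookup is exactly the kind of place where the wrapper could exit 1 while the real agent keeps running.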
Something seems to cause that wrapper/monitor to exit while the actual file agent keeps running. The syslog message
Dec 26 07:57:35 cmcld: Service ceu_fileagt terminated due to an exit(1).
reveals that the wrapper exited with result code 1. Result code 0 would be "normal termination"; anything else indicates trouble of some sort. As the wrapper does not seem to have produced any output (or you haven't shown anything identifiable as such), it's impossible for me to know why it stopped.
The next step would be to examine the wrapper. If it's a shell script, there should be a loop of some sort inside it: maybe a "while true; do [...] done" structure, or something with the same effect.
I would then examine the commands within the loop structure, and think about how and why any of them could fail. I would pay particular attention to any conditional clauses that contain "exit 1".
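A quick way to list those candidates, assuming the wrapper is a shell script; `find_error_exits` is a hypothetical helper name.

```shell
#!/bin/sh
# List, with line numbers, every line of the script given as $1 that contains
# a literal "exit N" with N non-zero. Each hit is a place that could produce
# "terminated due to an exit(1)" in syslog.
find_error_exits() {
    grep -n -E '(^|[^[:alnum:]_])exit[[:space:]]+[1-9][0-9]*' "$1"
}

# Usage:
#   find_error_exits /etc/cmcluster/pkg/ce_pkg/ce_services
```

It only finds literal non-zero exits. A bare `exit` right after a failing command, `exit $rc`, or a failing command under `set -e` can also end the script with a non-zero status, so those still have to be found by reading the script.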
MK
01-04-2008 08:54 PM
Re: MC/SG and Sterling Direct Connect issue
01-06-2008 03:19 AM
Re: MC/SG and Sterling Direct Connect issue
If the wrapper is part of some "Sterling Direct Connect - MC/SG Integration Kit", the first thing to do is to examine the integration kit's documentation.
If there is no useful documentation available, the next step is to examine the wrapper. Is it a script, or something else?
What does the command
file /etc/cmcluster/pkg/ce_pkg/ce_services
report?
If it's a script, try to read and understand it: you may find it's so simple that there's only one place that can cause an exit with error code 1. If so, you've found out how the problem happens, and only need to find out why.
If you cannot understand the script, attach it to this thread if possible (note: you may need permission to post it if the script is considered a "trade secret").
The standard method of tracing a shell script is to add the "set -x" command to the beginning of the script. It causes the shell to print each command to standard error, with its arguments expanded, before executing it.
Note that this will cause your ServiceGuard package log to grow quickly. Normally the log is located on the root filesystem, and filling it to 100% is very bad, so you should redirect the wrapper's output to /var/tmp or some other place that has plenty of free space. Or you might use symbolic links to move the entire package log to some other location.
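A minimal sketch of that redirection, to be placed at the top of the wrapper (just after its `#!` line) while debugging. The trace file name is a hypothetical choice; any filesystem with plenty of free space will do.

```shell
#!/bin/sh
# Debugging header for the wrapper script (remove when done).
# `set -x` trace output goes to standard error, so sending stderr to a file
# under /var/tmp keeps the trace out of the package log on the root filesystem.
TRACE_FILE=/var/tmp/ce_services.trace.$$   # hypothetical name; $$ = this shell's PID
exec 2>>"$TRACE_FILE"                       # from here on, stderr is appended to the file
set -x                                      # print each command before running it
```

Because all of stderr goes to the file, any error message the wrapper prints just before its exit(1) ends up next to the traced command that produced it.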
If /etc/cmcluster/pkg/ce_pkg/ce_services is not a script, you might need to use tools like tusc to find out what it's doing. You'll also need someone with basic programming knowledge to understand the output.
MK