Operating System - HP-UX

Problem with remsh and start/stop scripts

 
Dave Johnson_1
Super Advisor

Problem with remsh and start/stop scripts

I am running the command "remsh hosta /sbin/init.d/OraclePTCHapp stop" from hostd and it works fine. When I run "remsh hosta /sbin/init.d/OraclePTCHapp start" from hostd, the script runs on hosta, completes, and exits; however, hostb never gets the completion code and waits forever for it to complete. What could I have mucked up in the script so that the stop always sends the exit signal but the start section does not?
A. Clay Stephenson
Acclaimed Contributor

Re: Problem with remsh and start/stop scripts

You may have done nothing wrong at all. I suspect that it has more to do with how the Oracle daemon was written. The first thing that I would look for is whether remshd on hosta is still running from one of your "start" commands. If so, then your remsh will never terminate, as it is still connected to remshd. It's very common when writing daemons to call setsid(), which has the effect of disassociating a child process from its parent and means that signals do not behave as expected. I'll bet that the "stop" command is not daemonized and does not suffer from this problem.

It is also (barely) possible that your remsh is waiting for input so that using -n may fix you but I think you are seeing the results of a daemonized process.
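One mechanism consistent with the daemon explanation above, though the thread does not confirm it, is that a background process which inherits the session's output descriptors keeps them open, so whoever is reading them (remshd in this case) never sees end-of-file. The sketch below reproduces that effect locally, with a pipe and `cat` standing in for the remsh connection and remshd. `time_pipeline` is a hypothetical helper, and `date +%s` is assumed to be available (some older `date` implementations lack it).

```shell
#!/bin/sh
# Local demo only: no remsh involved. A background child that inherits
# stdout keeps the pipe open, so the reader does not see EOF until the
# child exits.

time_pipeline() {
    # $1 = command string to run on the writing side of a pipe.
    # Prints the whole number of seconds the reader waited for EOF.
    start=$(date +%s)
    sh -c "$1" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The child inherits the pipe: the reader waits until sleep finishes.
held=$(time_pipeline 'sleep 3 &')

# The child has all three descriptors redirected: the reader gets EOF
# as soon as the launching shell exits.
released=$(time_pipeline 'sleep 3 </dev/null >/dev/null 2>&1 &')

echo "inherited descriptors: ${held}s, redirected descriptors: ${released}s"
```

The first figure should be roughly the length of the sleep and the second close to zero. If the same thing is happening here, a long-lived process started by the "start" script would hold the connection open for as long as it runs.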

If it ain't broke, I can fix that.
Dave Johnson_1
Super Advisor

Re: Problem with remsh and start/stop scripts

When the start option is executed, I watch ps and see the processes start, monitor the log file while it runs, then see the processes end and disappear when they have completed. However, hostb does not seem to get the signal that hosta has completed executing the script. The start script calls another script that calls a script that calls a number of scripts, but I am not aware of any "funny" business going on. I believe the start and stop options call the same scripts with just the start or stop command line option added.
FYI: I did try adding an explicit exit 0 statement at the end of the start case; that did not affect the result.
A. Clay Stephenson
Acclaimed Contributor

Re: Problem with remsh and start/stop scripts

The exit 0 will have no effect. I assume that all your tasks have finished at this point. On the remote host execute this command:
UNIX95=1 ps -C remshd

If you see any remsh daemons running then you are seeing an artifact of a daemonized process. You need to make this determination before going any further. If remshd's persist on the remote host after you think the remsh has finished all of its tasks then you need to know that. There may still be a way to outbushwhack this but it ain't pretty.
If it ain't broke, I can fix that.
Dave Johnson_1
Super Advisor

Re: Problem with remsh and start/stop scripts

While the script is running on the remote host, here is the ps command:
# ps -C remshd
PID TTY TIME CMD
19415 ? 00:00 remshd
# ps -fe | grep remsh
root 19415 966 0 15:40:50 ? 00:00 remshd
root 20178 19288 0 15:41:06 ttyp1 00:00 grep remsh

After the script exits on the remote host, here is the output on the remote host:
# ps -fe | grep remsh
root 20590 19288 0 15:42:24 ttyp1 00:00 grep remsh

At this point the local host is still waiting for the exit on the remote host:
Wed Aug 1 15:44:40 CDT 2007
root 7635 24731 0 15:44:40 ttyp8 00:00 grep remsh
root 6990 6989 0 15:40:50 ttyp9 00:00 remsh hosta /sbin/init.d/OraclePTCHapp start
root 6989 29138 0 15:40:50 ttyp9 00:00 remsh hosta /sbin/init.d/OraclePTCHapp start

Any other things to try?
A. Clay Stephenson
Acclaimed Contributor

Re: Problem with remsh and start/stop scripts

OK, it's ugly, but here's the outline of a scheme.

Add a command at the end of your start script on hosta to remove a lockfile on hostd. This can be done via an NFS mount or a remsh command from hosta to hostd.

Inside your script on hostd:
1) Create a lockfile, e.g. /var/tmp/mylock
2) Invoke your remsh to hosta in the background.
   remsh hosta xxxx start &
   REMSH_PID=${!}
3) Loop until the lockfile is removed or a count is exceeded.
   typeset -i MAXCOUNT=100
   typeset -i COUNT=1
   typeset DELAY=10
   typeset LOCKFILE=/var/tmp/mylock
   # Poll every DELAY seconds, giving up after MAXCOUNT tries
   while [[ -f "${LOCKFILE}" && ${COUNT} -le ${MAXCOUNT} ]]
   do
       sleep ${DELAY}
       ((COUNT += 1))
   done
   if [[ -f "${LOCKFILE}" ]]
   then
       echo "Timed out." >&2
   fi
   # Now kill the remsh if it's still around
   # (kill -0 only tests whether the process exists)
   kill -0 ${REMSH_PID} 2>/dev/null
   STAT=${?}
   if [[ ${STAT} -eq 0 ]]
   then
       kill -15 ${REMSH_PID}
   fi
   rm -f "${LOCKFILE}" # just making sure

If it ain't broke, I can fix that.
Dave Johnson_1
Super Advisor

Re: Problem with remsh and start/stop scripts

You are right, the solution you propose is on the ugly side. It will work, but I feel the need to "solve" this problem before I move this from my sandbox to the development environment and then to production.
When the DBA returns next week, I will work with him to see if we can find some other cause along the lines of a daemon issue.
A. Clay Stephenson
Acclaimed Contributor

Re: Problem with remsh and start/stop scripts

You might try adding -m to the remshd entry in inetd.conf, but I don't normally like to run remshd in this mode. See man remshd.
If it ain't broke, I can fix that.
Dave Johnson_1
Super Advisor

Re: Problem with remsh and start/stop scripts

Thanks for your suggestions so far Clay.
The purpose is to start/stop Oracle Applications 11.5.10.2.
I have been dissecting the scripts that are called by the scripts, etc. I went through the first script and commented out all but one step at a time. That process allowed me to pinpoint the problem: it occurs when the underlying script calls the script that starts the Oracle Application listener. I am going to dissect that further to see if I can determine a fix.
Sandman!
Honored Contributor

Re: Problem with remsh and start/stop scripts

Just a thought - how about adding the "-n" option to your remsh so that its stdin is redirected from /dev/null? It is possible that the remsh command is stuck waiting because the started Oracle processes run as daemons waiting for client connections and do not return any status to the calling environment. So try changing your remsh command line as follows:

# remsh hosta -n "/sbin/init.d/OraclePTCHapp start"
Dave Johnson_1
Super Advisor

Re: Problem with remsh and start/stop scripts

Thank you, Sandman, it was a nice idea. However, it did not work: hostb still waits forever for hosta to finish. I did perform a new test. While hostb was waiting, I issued a command to hosta from a new window to shut down the Application Listener. As soon as the listener was down, the remsh from hostb to hosta completed and I got the command line back. This confirms that the startup of the listener is causing the problem.

I will work with the DBA to see if there is another way to get the listener started without calling it from the script I am trying to run.
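One possibility to explore, sketched here as an assumption rather than a fix verified in this thread, is to launch the listener so that it keeps no handle on the caller's stdin, stdout, or stderr. `start_detached` is a hypothetical helper, and the log file and script path in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: start a long-running command in the background
# with stdin from /dev/null and stdout/stderr appended to a log file,
# so it does not hold the invoking session's descriptors open.
start_detached() {
    logfile=$1
    shift
    nohup "$@" </dev/null >>"$logfile" 2>&1 &
}

# Example call (placeholder paths, not the real Oracle scripts):
# start_detached /var/tmp/listener.log /path/to/listener_script start
```

A variant sometimes used with rsh-style tools is to put the same redirections on the remote command line itself, e.g. remsh hosta -n "/sbin/init.d/OraclePTCHapp start </dev/null >/dev/null 2>&1"; whether that is enough for this listener is untested here.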

I will update when I have more news,
-Dave
Dennis Handly
Acclaimed Contributor

Re: Problem with remsh and start/stop scripts

>Sandman: how about adding the "-n" option

Except Clay mentioned it above in his first reply. (That was my first thought too.)