Operating System - HP-UX

kill -9 $$ fails! So does exit! Remsh bug?

Trever Furnish
Regular Advisor

Under what circumstances might the exit command cause a shell to print "logout" but never actually close?

The same thing happens if I type kill -9 $$ (which ought to kill the current process): "logout" gets printed, but the session never actually ends; it just hangs.

It doesn't always happen, only after I've run certain scripts (Oracle startup scripts, apowsctl.sh start). It doesn't matter whether or not I use nohup to run the command.

Also, perhaps nohup provides a clue. Even though I'm starting a process with "nohup process &", when I type "exit" I'm warned that there are running processes. I thought that shouldn't happen when I use nohup, since the process I started should have been dissociated from the controlling terminal.

It's probably worth mentioning that the hanging only happens when I'm using a remote login shell, such as one obtained via remsh or ssh.

But it seems like a bug in HP-UX to me.

BTW, I've also tried every I/O redirection I could think of, including this one:
process 1>/dev/null 2>&1
Hockey PUX?
Bill Hassell
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

The remsh comment is critical. If you are scripting remsh commands, you won't have a controlling tty device, and a lot of things get messed up unless you tell remsh to take its standard input from /dev/null. Use remsh -n.
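A common way a missing -n bites in scripts: without it, remsh passes the local standard input through to the remote command, so a remsh inside a `while read` loop can swallow the rest of the loop's input. The sketch below reproduces that locally so it runs anywhere; `fake_remsh` and `loop_hosts` are hypothetical stand-ins (fake_remsh just reads and discards its stdin, as a remote command reading stdin would), not remsh itself.

```shell
#!/bin/sh
# Local sketch of why "remsh -n" matters in scripts.
# fake_remsh is a HYPOTHETICAL stand-in for "remsh host cmd" without -n:
# it consumes whatever is on its standard input.
fake_remsh() {
    cat >/dev/null
}

# Echo each host name read from stdin, "running" fake_remsh for each.
# $1 = plain   -> fake_remsh shares the loop's stdin (like no -n)
# $1 = devnull -> fake_remsh gets stdin from /dev/null (what -n does)
loop_hosts() {
    while IFS= read -r host; do
        echo "$host"
        if [ "$1" = devnull ]; then
            fake_remsh </dev/null
        else
            fake_remsh
        fi
    done
}

hosts=/tmp/hosts.$$
printf 'host1\nhost2\nhost3\n' >"$hosts"
echo "without -n: $(loop_hosts plain <"$hosts" | tr '\n' ' ')"
echo "with -n:    $(loop_hosts devnull <"$hosts" | tr '\n' ' ')"
rm -f "$hosts"
```

On a real system the equivalent fix inside such a loop is `remsh host -n cmd`, or `remsh host cmd </dev/null`.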


Bill Hassell, sysadmin
Trever Furnish
Regular Advisor

Re: kill -9 $$ fails! So does exit! Remsh bug?

The -n makes sense - forgot to mention that I had already tried it. :-) I'll try it again though...
Martin Johnson
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

Even when you use nohup, the system will warn you that you have running processes. Therefore, you have to enter exit twice.

Marty
Trever Furnish
Regular Advisor

Re: kill -9 $$ fails! So does exit! Remsh bug?

I should also mention that this happens interactively, not just when issuing a command within the remsh statement. Interactively, there IS a controlling tty, right?

If I were issuing a command with the remsh like so:

remsh host2 'tty'

...then I would not expect a controlling tty (and hence would get the message "not a tty" from the tty command above), but I would NOT expect this to be an issue when actually running an interactive session via remsh. For example, if I just remsh into host2, wait for the shell prompt, then issue my command, wait for the shell prompt again, then exit (twice if using nohup), I would not expect to have a problem related to the absence of a controlling tty, because there *was* a tty allocated. Nonetheless, the connection hangs after the remote shell prints "logout".

Again, this seems like a bug to me. I'm not sure what conditions might cause this behavior to be valid...?
Bill Hassell
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

remsh hosta

will actually translate into rlogin, in which case there is a controlling tty. Both "remsh hosta tty" and "remsh hosta -n tty" should produce "not a tty". This may be an obscure patch issue. Do you have the 2002 patch bundles loaded?
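The `tty` command only reports on standard input. To see which of the three standard descriptors are attached to a terminal in a given kind of session (an interactive remsh/rlogin login versus `remsh host cmd`), a small script like the following can be run in each and the results compared. It relies only on the POSIX `test -t` operator; the function name `report_ttys` is illustrative.

```shell
#!/bin/sh
# Report whether each standard descriptor (0=stdin, 1=stdout, 2=stderr)
# is a terminal. Run it from an interactive login and via
# "remsh host script" to compare the two environments.
report_ttys() {
    for fd in 0 1 2; do
        if [ -t "$fd" ]; then
            echo "fd $fd: terminal"
        else
            echo "fd $fd: not a terminal"
        fi
    done
}

report_ttys
```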


Jordan Bean
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?


nohup does not disassociate a process from the controlling terminal or shell. It only makes the process ignore SIGHUP and, if stdout and stderr are terminals, redirects them to a file (nohup.out).
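Both halves of that statement can be checked on any POSIX system. Under nohup, a shell that sends itself SIGHUP keeps running, while without nohup it dies. And because stdout here is a pipe rather than a terminal (as it is for a command run by remsh), nohup leaves it alone: the output still flows down the caller's pipe instead of into nohup.out. The function names below are made up for this sketch.

```shell
#!/bin/sh
# With nohup, SIGHUP is ignored, so the inner shell survives sending
# SIGHUP to itself and goes on to print "survived".
hup_survivor() {
    nohup sh -c 'kill -HUP $$; echo survived' </dev/null 2>/dev/null
}

# Without nohup, the inner shell is killed by its own SIGHUP: no output.
hup_victim() {
    sh -c 'kill -HUP $$; echo survived' </dev/null 2>/dev/null
}

# Both run inside $(...), so stdout is a pipe, not a terminal. nohup does
# not create nohup.out in that case: "survived" arrives through the same
# pipe the caller is reading, i.e. the descriptor was simply inherited.
echo "with nohup:    [$(hup_survivor)]"
echo "without nohup: [$(hup_victim)]"
```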

To completely avoid this problem, try passing the command to batch instead of nohup. For example:

remsh host
echo "command" | batch

or

remsh host -n 'echo "command" | batch'


This way the command doesn't have to be disassociated, since it is run under cron rather than from your session.

Trever Furnish
Regular Advisor

Re: kill -9 $$ fails! So does exit! Remsh bug?

The technique of using "batch" or "at now" does indeed allow the original process to continue merrily along without waiting. However, that's less than ideal because the next step depends on the successful completion of that step - if it starts too soon, then it won't work. It also makes me uneasy because I don't believe the problem is going away - it's just getting passed on to the cron daemon, which seems like a Bad Idea.

Here's the exact situation. I need to automate startup and shutdown of Oracle processes where the actual processes are distributed across several accounts and several machines.

For each step there is an Oracle-provided shell script which launches various Oracle binaries. The processes resulting from these scripts are daemons (for example the web listeners), not the scripts that I run directly.

Even if I do this interactively within an rlogin session, and even though the script that launches the binaries (apowsctl.sh, for example) successfully completes and exits, when I exit the rlogin shell I get the shell's "logout" message but the connection hangs.

In answer to your question about patch levels, Bill, we have various individual patches applied, along with the quality pack from March of 2001.

Thanks for the clarification of how nohup works - I should have read the man page before posting. :-)
Jordan Bean
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

I'm wondering whether the resulting daemons are holding open the file descriptors they inherited from the session (stdin, stdout, stderr) and thereby preventing remshd from terminating. Do you have lsof?
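The suspected mechanism can be reproduced locally with a pipe. A reader such as `cat` (standing in, hypothetically, for remshd relaying the remote command's output) only sees end-of-file once every process holding the write end has closed it. A background process that inherits stdout therefore keeps the reader waiting after the script that launched it has exited. In this sketch `sleep` stands in for a daemon, and it assumes a `date` that supports `+%s` (GNU and BSD do; the HP-UX `date` may not).

```shell
#!/bin/sh
# Print how many whole seconds the given command took.
elapsed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo $((end - start))
}

# The background sleep inherits the pipe to cat as its stdout, so cat
# (and therefore this function) waits about 3 seconds for EOF.
leaky() {
    ( sleep 3 & ) | cat
}

# Same launch with all three descriptors redirected: once the subshell
# exits, the last write end is closed and cat sees EOF immediately.
closed() {
    ( sleep 3 </dev/null >/dev/null 2>&1 & ) | cat
}

echo "inherited stdout:  $(elapsed leaky)s"
echo "redirected stdout: $(elapsed closed)s"
```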
Trever Furnish
Regular Advisor

Re: kill -9 $$ fails! So does exit! Remsh bug?

Regarding file handles and lsof: I was thinking the same thing. Unfortunately I didn't have a version of lsof that worked on 64-bit HP-UX. Maybe I gave up trying to compile it too quickly.

Got any advice for compiling lsof for 64-bit HP-UX? Or suggestions for other "process profiling" tools that may be helpful?

Of course, I'm not sure I could get an update out of Oracle if I determined exactly the behavior that leads to this symptom (although I have more faith in HP having a fix).
Sridhar Bhaskarla
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

Hi Trever,

Look at the following thread on compiling lsof for 64bit.

I tried a workaround, but there are some other good suggestions there too.

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x15eaf715edc6d5118ff10090279cd0f9,00.html

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Martin Johnson
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

Re: lsof for 64 bit

If you have HP Openview Operations (OVO/VPO/ITO/OpC or whatever they are calling it these days) you may have a copy of it. Look for:

/opt/OV/contrib/OpC/lsof_64bit

HTH
Marty
Jordan Bean
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

On the client, in root's home directory, I wrote this simple script to run lsof on the invoking shell:

#!/sbin/sh
lsof -p $$

The attached shows the results from two remsh invocations.

This one

remsh client -n 'nohup script </dev/null >/dev/null 2>&1'

_should_ eliminate the problem, but then you won't get any output back.

Since remsh does not return the exit status of the invoked process, the process must report it as readable output.

Try this:

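#!/sbin/sh
# remsh does not pass back the remote exit status, so the remote side
# echoes $? and we read it here. This relies on the shell running read,
# the last command of the pipeline, in the current shell (ksh-style
# shells such as the HP-UX POSIX shell do; bash does not by default).
typeset -i errno=0
if remsh client -n './script </dev/null >/dev/null 2>&1; echo $?' | read errno
then
    # read got a line: errno is the remote script's exit status
    if [ $errno -gt 0 ]
    then
        echo script failed
    fi
else
    # nothing came back at all, so remsh itself failed
    echo remsh failed
fi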
Jack Werner
Frequent Advisor

Re: kill -9 $$ fails! So does exit! Remsh bug?

If you launch a process inside parentheses, i.e. (process &), the process will be inherited by the init process. When you exit, there will be no background jobs tied to your session.
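The re-parenting claim can be checked by comparing the parent PID of a plain background job with that of one started from a subshell. After the subshell exits, the orphan's parent is no longer this shell; traditionally it becomes init (PID 1), although some modern systems hand orphans to another "subreaper" process instead. This sketch assumes a POSIX `ps` that accepts `-o ppid= -p PID` (on HP-UX that typically requires the UNIX95 environment variable to be set). `ppid_of` and the temporary file name are made up for the example.

```shell
#!/bin/sh
# Parent PID of a process, with ps's column padding removed.
ppid_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

pidfile=/tmp/orphan.$$

# Ordinary background job: its parent is this shell (one of its jobs).
sleep 30 >/dev/null 2>&1 &
direct=$!

# "(process &)": the subshell starts sleep and exits right away,
# so sleep is re-parented and is no longer a child of this shell.
( sleep 30 >/dev/null 2>&1 & echo $! >"$pidfile" )
orphan=$(cat "$pidfile")

echo "this shell:      $$"
echo "direct job ppid: $(ppid_of "$direct")"
echo "(cmd &) ppid:    $(ppid_of "$orphan")"

# Clean up the demo processes.
kill "$direct" "$orphan"
rm -f "$pidfile"
```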
i'm retired
Frank Slootweg
Honored Contributor

Re: kill -9 $$ fails! So does exit! Remsh bug?

Can you reproduce the problem *outside* the original environment, i.e. without using Oracle etc., and with using only standard HP-UX commands and standard HP-UX files, i.e. only using some simple scripts?

If so, post that and we can look at it, try to reproduce it, etc.

Other than that, adding to the other responses:

While nohup(1) should redirect standard output and standard error and "remsh -n" should redirect standard input, I have seen cases where that was not sufficient and only *explicit* redirection solved the problem.

So in any location which *could* generate (error) output and/or ask for input, do yourself a favor and *explicitly* redirect, i.e.

</dev/null >/dev/null 2>/dev/null
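That advice can be packaged as a small launcher for daemon start scripts: nohup for SIGHUP, plus explicit redirection of all three standard descriptors, so nothing left running holds the remote session open. `detach` and its log-file argument are hypothetical names for this sketch, not an HP-UX or Oracle facility.

```shell
#!/bin/sh
# detach LOGFILE COMMAND [ARGS...]
# Start COMMAND in the background, ignoring SIGHUP, with stdin from
# /dev/null and stdout/stderr appended to LOGFILE.
detach() {
    log=$1
    shift
    nohup "$@" </dev/null >>"$log" 2>&1 &
}

# The pipe to cat closes as soon as the subshell exits, even though the
# detached sleep is still running, because sleep holds none of its ends.
( detach /dev/null sleep 3 ) | cat
echo "returned without waiting for sleep"
```

If the next step must wait for the start script to finish, the same redirections can be used in the foreground instead, e.g. `nohup ./apowsctl.sh start </dev/null >>/tmp/apows.log 2>&1` (the log path is only an example); the daemons it spawns then inherit the redirected descriptors rather than the session's.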