
init.d startup script errors out only during boot .. Works fine when run manually after reboot

 
Richard Ross

Guys .. I'm posting the following on behalf of one of our developers. I ran into the same issue with another customer but never found the cause. Hopefully someone will have some insight:

Here are the steps I've taken to try to solve the HP defect. I've also included the startup script below.

1) Moved the startup script from rc2.d to rc3.d. Also tried rc4.d but the machine only boots to run level 3, so it was never executed. Also tried renaming the script to S999cqserver_startup in rc3.d.
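For reference, the HP-UX layout being moved around here is a script in /sbin/init.d with a start link in /sbin/rcN.d, where the sequence number in the link name controls where it runs within that run level. A minimal sketch of (re)creating such a link; the helper name install_rc_link and the script name cqserver_startup are hypothetical, and the directories are parameters so it can be tried in a scratch directory first:

```sh
#!/bin/sh
# Sketch: create an rc start link that points at an init.d script.
# Usage: install_rc_link INITD_DIR RC_DIR NAME SEQ
#   e.g. install_rc_link /sbin/init.d /sbin/rc3.d cqserver_startup 999
# "cqserver_startup" is a hypothetical script name.
install_rc_link() {
    initd_dir=$1 rc_dir=$2 name=$3 seq=$4
    target="$initd_dir/$name"
    link="$rc_dir/S$seq$name"
    if [ ! -f "$target" ]; then
        echo "install_rc_link: $target not found" >&2
        return 1
    fi
    # Remove any existing link first so re-running is harmless.
    rm -f "$link"
    ln -s "$target" "$link"
}
```

Run as root against the real directories, the example above would produce /sbin/rc3.d/S999cqserver_startup pointing at /sbin/init.d/cqserver_startup.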

2) Provided absolute paths for all the UNIX commands. This reduced the number of errors but didn't fix the problem. The Java wrapper script supplied by Sun, which is invoked when the java command is executed, has errors as well. As a test, I modified this script to use absolute paths and was able to reduce the number of errors even more, but ultimately I still receive a Java exception when the server tries to bind to a port. This error message repeats until the CQ Server successfully starts, which happens once the HP machine has fully booted. The script is designed to keep trying to start the server until it succeeds.
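Instead of writing an absolute path for every command, a common alternative is to set PATH once near the top of the startup script and export it. A minimal sketch with a hypothetical helper add_to_path that appends directories only if they are missing; the directory list is an assumption, and whether processes started through su inherit the result depends on how su handles the environment on this system:

```sh
#!/bin/sh
# Sketch: make sure the startup script has the standard command
# directories on PATH, regardless of what rc provided at boot.
# The directory list passed below is an assumption; add the JRE's
# bin directory if the child scripts need it.
add_to_path() {
    for dir in "$@"; do
        case ":$PATH:" in
            *":$dir:"*) ;;                      # already present, skip
            *) PATH="${PATH:+$PATH:}$dir" ;;    # append (no leading ':' if PATH was empty)
        esac
    done
    export PATH
}

add_to_path /usr/sbin /usr/bin /sbin
```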

3) Tried simply sleeping for a while at the beginning of the script; however, this just makes the whole boot process take longer. The startup scripts appear to run synchronously, so adding a delay at the beginning of the script delays the entire boot process, not just the script.
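Since rc waits for each start script to return, a delay only avoids holding up the boot if it runs in a background subshell. A minimal sketch with a hypothetical helper start_deferred: the caller returns immediately while the subshell sleeps and then runs the command. The delay is a guess on the caller's part; a fixed sleep does not guarantee that whatever the Java processes need is ready by then.

```sh
#!/bin/sh
# Sketch: run a command after a delay without blocking the caller.
# Usage: start_deferred SECONDS COMMAND [ARGS...]
start_deferred() {
    delay=$1
    shift
    # The subshell sees the remaining arguments as "$@". Its stdin/stdout/stderr
    # are detached so the background job does not keep the caller's output open.
    ( sleep "$delay"; "$@" ) </dev/null >/dev/null 2>&1 &
}
```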

4) Tried adding a loop that waits until a UNIX command can be found without its absolute path. This essentially hangs the machine: the boot cannot continue while the loop is running, so the condition never becomes true (the same synchronous behavior described in step 3).
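The hang in step 4 comes from polling in the foreground with no upper bound. A sketch of a bounded wait, with a hypothetical helper wait_for, that is meant to be run from a background job so rc is never blocked. Which condition actually signals that the system is ready is still the open question in this thread, so the check in the example is a placeholder.

```sh
#!/bin/sh
# Sketch: poll a condition once per second, giving up after MAX_TRIES attempts.
# Usage: wait_for MAX_TRIES COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_for() {
    max=$1
    shift
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example (run in the background so the boot continues):
# "condition_check" and "start_cq_servers" are hypothetical placeholders.
# ( wait_for 300 condition_check && start_cq_servers ) &
```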

The problem happens when the script runs two other scripts, which start two Java processes as a user other than root. The lines that do that are:

/usr/bin/su www -c "/usr/bin/nohup ./start_cqreg.sh > logs/jvm.stdout 2>&1 &"
/usr/bin/su www -c "/usr/bin/nohup ./start_cqrm.sh > logs/jvm.stdout 2>&1 &"
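Rather than relying on su to pass PATH through to the child (which is exactly the behavior in question), the command string given to su -c can set PATH and change directory itself. A sketch with a hypothetical helper su_cmd that only builds the string, so it can be inspected before being handed to /usr/bin/su www -c; the PATH value is an assumption and may need the JRE's bin directory added.

```sh
#!/bin/sh
# Sketch: build the command string for "su www -c" so the child shell
# sets its own PATH and working directory instead of inheriting them.
# Usage: su_cmd DIR SCRIPT LOGFILE
su_cmd() {
    dir=$1 script=$2 log=$3
    # Assumed PATH; extend it if start_*.sh or .java_wrapper need more.
    printf '%s' "PATH=/usr/sbin:/usr/bin:/sbin; export PATH; cd $dir && nohup ./$script > $log 2>&1 &"
}

# Usage in the startup script (directories taken from the posted script):
# /usr/bin/su www -c "$(su_cmd $CQWEB_HOME/cqregsvr start_cqreg.sh logs/jvm.stdout)"
# /usr/bin/su www -c "$(su_cmd $CQWEB_HOME/cqserver start_cqrm.sh logs/jvm.stdout)"
```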

Again, the exact same script runs fine when started after the boot process has completed on HP-UX, and it runs fine as an rc2.d startup script on Solaris.

The script is as follows:

#!/bin/sh
CQWEB_HOME=/opt/rational/clearquest/cqweb
CQREG_PIDFILE=$CQWEB_HOME/cqregsvr/logs/cqreg.pid
CQRM_PIDFILE=$CQWEB_HOME/cqserver/logs/cqrm.pid

### This must be run as root
/usr/bin/rm -f /tmp/cqrmtempfile
/usr/bin/id > /tmp/cqrmtempfile
ROOT=`/usr/bin/grep "uid=0(root)" /tmp/cqrmtempfile`
if [ "x$ROOT" != "x" ] ; then
    root=1
    /usr/bin/rm -f /tmp/cqrmtempfile
else
    /usr/bin/echo "you must run this script as root"
    /usr/bin/rm -f /tmp/cqrmtempfile
    exit 1
fi

### Use the pidfile to see if CQREG is already running
if [ -f $CQREG_PIDFILE ] ; then
    CQREG_PID=`/usr/bin/cat $CQREG_PIDFILE`
    if [ "x$CQREG_PID" != "x" ] && /usr/bin/kill -0 $CQREG_PID 2>/dev/null ; then
        CQREG_RUNNING=1
    else
        CQREG_RUNNING=0
    fi
else
    CQREG_RUNNING=0
fi

### Use the pidfile to see if CQRM is already running
if [ -f $CQRM_PIDFILE ] ; then
    CQRM_PID=`/usr/bin/cat $CQRM_PIDFILE`
    if [ "x$CQRM_PID" != "x" ] && /usr/bin/kill -0 $CQRM_PID 2>/dev/null ; then
        CQRM_RUNNING=1
    else
        CQRM_RUNNING=0
    fi
else
    CQRM_RUNNING=0
fi

if [ $CQREG_RUNNING -eq 1 ]; then
    /usr/bin/echo "Rational ClearQuest Registry Server is already running"
else
    /usr/bin/rm -f $CQWEB_HOME/cqregsvr/logs/jvm.stdout
    /usr/bin/echo "Starting Rational ClearQuest Registry Server, redirecting output to $CQWEB_HOME/cqregsvr/logs/jvm.stdout"
    cd $CQWEB_HOME/cqregsvr || exit 1
    /usr/bin/su www -c "/usr/bin/nohup ./start_cqreg.sh > logs/jvm.stdout 2>&1 &"
fi

if [ $CQRM_RUNNING -eq 1 ]; then
    /usr/bin/echo "Rational ClearQuest Request Manager is already running"
else
    /usr/bin/rm -f $CQWEB_HOME/cqserver/logs/jvm.stdout
    /usr/bin/echo "Starting Rational ClearQuest Request Manager, redirecting output to $CQWEB_HOME/cqserver/logs/jvm.stdout"
    cd $CQWEB_HOME/cqserver || exit 1
    /usr/bin/su www -c "/usr/bin/nohup ./start_cqrm.sh > logs/jvm.stdout 2>&1 &"
    /usr/bin/sleep 20
fi

exit 0

And additional info from the developer:

I'm running on an HP-UX machine, OS version B.11.11. The machine boots to run level 3.

A few notes on the problem:

- The main script S78cqserver_startup is run as root and seems to execute correctly, with no path issues.
- The main script executes two other scripts, which have problems finding UNIX commands in /usr/bin unless the absolute path is given:
/usr/bin/su www -c "/usr/bin/nohup ./start_cqreg.sh > logs/jvm.stdout 2>&1 &"
/usr/bin/su www -c "/usr/bin/nohup ./start_cqrm.sh > logs/jvm.stdout 2>&1 &"

The original version didn't have an absolute path for nohup, so it failed immediately with the error "nohup: not found". Adding the absolute path got me past this problem, only to hit more of the same inside start_cqreg.sh, start_cqrm.sh, and .java_wrapper (the Sun JRE script).
- After fixing all path-related issues, I still receive the following Java errors until the system finishes booting (the Java process keeps restarting until it succeeds):

./start_cqreg.sh (251) Beginning Execution
RegistryServer Serving registry on port 1130
java.rmi.server.ExportException: Listen failed on port: 1131; nested exception is:
    java.net.SocketException: No such file or directory
java.net.SocketException: No such file or directory
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.PlainSocketImpl.bind(Unknown Source)
    at java.net.ServerSocket.<init>(Unknown Source)
    at java.net.ServerSocket.<init>(Unknown Source)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(Unknown Source)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(Unknown Source)
    at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.listen(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.exportObject(Unknown Source)
    at sun.rmi.transport.tcp.TCPEndpoint.exportObject(Unknown Source)
    at sun.rmi.transport.LiveRef.exportObject(Unknown Source)
    at sun.rmi.server.UnicastServerRef.exportObject(Unknown Source)
    at sun.rmi.server.UnicastServerRef.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryImpl.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLBaseRegistryImpl.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryServer.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryServer.main(Unknown Source)
./start_cqreg.sh (251) Restarting RegistryServer
./start_cqreg.sh (251) Restarted RegistryServer (343)
RegistryServer Serving registry on port 1130
java.rmi.server.ExportException: Listen failed on port: 1131; nested exception is:
    java.net.SocketException: No such file or directory
java.net.SocketException: No such file or directory
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.PlainSocketImpl.bind(Unknown Source)
    at java.net.ServerSocket.<init>(Unknown Source)
    at java.net.ServerSocket.<init>(Unknown Source)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(Unknown Source)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(Unknown Source)
    at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.listen(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.exportObject(Unknown Source)
    at sun.rmi.transport.tcp.TCPEndpoint.exportObject(Unknown Source)
    at sun.rmi.transport.LiveRef.exportObject(Unknown Source)
    at sun.rmi.server.UnicastServerRef.exportObject(Unknown Source)
    at sun.rmi.server.UnicastServerRef.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.exportObject(Unknown Source)
    at java.rmi.server.UnicastRemoteObject.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryImpl.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLBaseRegistryImpl.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryServer.<init>(Unknown Source)
    at com.rational.clearquest.cqweb.jtl.JTLRegistryServer.main(Unknown Source)


Thanks for any help .. Richard
Steven E. Protter

You can change the run level the machine boots to on the first line of /etc/inittab:

change
init:3:initdefault:

to
init:4:initdefault:

Then your script in run level 4 will execute.

If that's not enough, check the environment.

Add a line to the startup script:

env > /tmp/environment.txt

Compare that file across the various run levels and you may find that something the product needs is missing.
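A slightly extended version of that check, as a sketch: it records both the environment the startup script receives and the one the www user's shell sees through su, so a capture taken during boot can be diffed against one taken by hand afterwards. The helper name capture_env and the /tmp file names are hypothetical; the su part only runs when the script runs as root, as the startup script does.

```sh
#!/bin/sh
# Sketch: capture the environment at this point, tagged with a label
# (e.g. "boot" or "manual") so two runs can be compared with diff.
# Absolute paths are used because PATH itself is what is being examined.
# Usage: capture_env LABEL [OUTDIR]
capture_env() {
    label=$1
    outdir=${2:-/tmp}
    /usr/bin/env | /usr/bin/sort > "$outdir/env.root.$label"
    # Environment as seen by the www user's shell under su (root only).
    if [ "$(/usr/bin/id -u)" -eq 0 ]; then
        /usr/bin/su www -c "/usr/bin/env" 2>&1 | /usr/bin/sort > "$outdir/env.www.$label"
    fi
}

# Afterwards, for example:
#   diff /tmp/env.www.boot /tmp/env.www.manual
```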

This sounds like an application that may need full networking, so it should probably be the last script to execute at startup.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Richard Ross

Steve,

Thanks .. but they would like to keep it in run level 3. If I link the init.d script into rc3.d so that it is the last item run, the system should theoretically be up and available .. no?