Operating System - HP-UX

Enrico Venturi
Super Advisor

KSH: a never ending SLEEP()

Hello colleagues,
something strange is happening in my application!
We have a Korn shell script that starts 2 children and then sleeps at intervals.
It sleeps 5 seconds, checks something, sleeps again, and so on.
At some point the sleep never ends.
With "tusc" we see that the script is waiting (waitpid(-1, ...)) for one of its children to terminate; the sleep process should be one of them.
I mean: under normal conditions, when the master script runs sleep (5), a new UNIX process is created and the master script executes a waitpid(-1).
As soon as the sleep expires, the master script is woken up.
In the faulty condition the sleep process no longer exists, so the master script never receives a signal.
If we kill one of the 2 children, the master script is woken up and continues working.
My question is:
which conditions make the sleep (5) never end? Or even: which conditions make the sleep UNIX process die without signalling anything to its parent?

Thanks a lot

Enrico
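
The pattern described above can be reduced to a small test script, which makes it easier to check whether the hang depends on the application or on the shell itself. This is only a minimal sketch, assuming a POSIX-style shell; `poll_loop`, its arguments and the `sleep 30` stand-in children are hypothetical and not taken from the real application.

```shell
#!/bin/sh
# Sketch of the reported pattern: a parent starts two long-lived children
# in the background, then polls in a loop with a short sleep.
# poll_loop and its arguments are illustrative names only.

poll_loop() {
    polling_time=$1    # seconds to sleep between checks
    iterations=$2      # stop after this many rounds (the real script loops forever)

    # Two background children standing in for the real worker processes.
    sleep 30 &
    child_1=$!
    sleep 30 &
    child_2=$!

    count=0
    while [ "$count" -lt "$iterations" ]
    do
        echo "before sleep: $(date '+%H:%M:%S') children=$child_1,$child_2"
        # When sleep is an external command, the shell forks it and waits for it.
        sleep "$polling_time"
        echo "after sleep:  $(date '+%H:%M:%S')"
        count=$((count + 1))
    done

    # Clean up the stand-in children.
    kill "$child_1" "$child_2" 2>/dev/null
    wait "$child_1" "$child_2" 2>/dev/null || true
    echo "completed $count iterations"
}

poll_loop 1 2
```

If a "before sleep" line is ever printed without a matching "after sleep" line, the hang has been reproduced outside the application.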
Enrico Venturi
Super Advisor

Re: KSH: a never ending SLEEP()

Please help us, it's a very urgent problem!!

thanks
Enrico
A. Clay Stephenson
Acclaimed Contributor

Re: KSH: a never ending SLEEP()

Without seeing a small example that illustrates the problem, it's difficult to know. It's possible that you have a signal handler (trap) that is already catching SIGALRM. It appears that your sleep command is never receiving the SIGALRM. First, are you sure that the sleep you are executing is actually /usr/bin/sleep? If it is /usr/bin/sleep, has it been replaced with an "improved" version?
If it ain't broke, I can fix that.
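
Both checks suggested in this reply can be run from the shell that executes the script. What follows is a minimal sketch, assuming a POSIX-style shell; `signal_name` is a hypothetical helper, and the numbers are the ones passed to `trap` in the script posted later in this thread, plus 14. Signal numbers are platform-specific, so the names printed depend on the system where this is run.

```shell
# Which "sleep" would the shell run? This prints a path when sleep
# resolves to an external command such as /usr/bin/sleep.
command -v sleep

# signal_name is a hypothetical helper: it maps a signal number to its
# name on the local system using "kill -l".
signal_name() {
    kill -l "$1" 2>/dev/null || echo "unknown"
}

# 1 18 25 27 28 30 and 2 15 come from the two trap lines in the posted
# script; 14 is the usual number for SIGALRM, mentioned in the reply.
for sig in 1 2 14 15 18 25 27 28 30
do
    echo "signal $sig = $(signal_name "$sig")"
done

# With no arguments, trap lists the traps currently set in this shell.
trap
```

Running the loop on the affected machine shows which signals the script ignores by name rather than by number.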
Enrico Venturi
Super Advisor

Re: KSH: a never ending SLEEP()

Please see below the script where the problem occurs:

export TAB=-1

. /usr/Systems/1359HA_8.0.0_Presentation/HA_1353NM_2_7.4_Presentation/etc/env_osres_common
. ${HA_HOME}/etc/commfunc
. ${HA_CAT}

InitOutAndErr CONTROL/Control.log 200

trap '' 1 18 25 27 28 30
trap 'ContinueClient STOP ; exit' 2 15

MSG_ACTIVE_KO_RETRY=0

rm -f ${INSTANCE_TMP_DIR}/OSResControl_Proc* > /dev/null 2>&1

export SwitchOutFile=${INSTANCE_TMP_DIR}/OSResControl_Proc.out
export MsgFileName=${INSTANCE_TMP_DIR}/OSResControl_Proc.msg
export CntFileName=${INSTANCE_TMP_DIR}/OSResControl_Proc.cnt

set -a

SwitchProcStatus=OFF
IconErrCnt=0
RetCnt=0
rm ${HA_HOME}/tmp/ul_integ_ret_cnt_* > /dev/null 2>&1

xdialog_processes_name=`basename $HA_MESSAGEBOX`

my_pid=$$
my_name="$0"

if [ "X_${INSTANCE_ROLE}" = "X_${MASTER_STRING}" ]; then
    [ -z "$SERVER_PORT1" -o -z "$SERVER_PORT2" -o -z "$LOCAL_HOST" -o -z "$COMPANION_HOSTNAME" ] && errorexit "HA not correclty configured!"
    host_type=Master
else
    [ -z "$SERVER_PORT1" -o -z "$SERVER_PORT2" -o -z "$LOCAL_HOST" ] && errorexit "HA not correclty configured!"
    host_type=Client
fi

TestInt $SERVER_PORT1
[ $? -ne 0 ] && errorexit "HA not correclty configured!"
TestInt $SERVER_PORT2
[ $? -ne 0 ] && errorexit "HA not correclty configured!"

if [ "X_${INSTANCE_ROLE}" = "X_${MASTER_STRING}" ]; then
    host_1="$LOCAL_HOST"
    host_2="$COMPANION_HOSTNAME"
fi

if [ "X_${INSTANCE_ROLE}" = "X_${CLIENT_STRING}" ]; then
    host_1="$HOST1"
    host_2="$HOST2"
fi

pid_1=0
pid_2=0

client_log_file_1=${INSTANCE_LOG_DIR}/SERVER/client_${host_1}_CNT.log
client_log_file_2=${INSTANCE_LOG_DIR}/SERVER/client_${host_2}_CNT.log

server_status_log_file_1=${INSTANCE_TMP_DIR}/server_status_${host_1}.log
server_status_log_file_2=${INSTANCE_TMP_DIR}/server_status_${host_2}.log
server_status_img_file_1=${INSTANCE_TMP_DIR}/server_status_${host_1}.img
server_status_img_file_2=${INSTANCE_TMP_DIR}/server_status_${host_2}.img

pids_list_file="${WINDOW_PIDS_DIR}/`basename $0`"

ContinueClient START

while [ 1 ]
do
    echo " " >> ${CONTROL_LOG_FILE} 2>&1
    echo "Date : `date +'%D %H:%M:%S'`" >> ${CONTROL_LOG_FILE} 2>&1

    echo "-------------------------------------------------------------------------------"
    echo "$(date)"
    echo "-------------------------------------------------------------------------------"

    ContinueClient CHECK

    UpdateIcons

    [ -s $SV_SITE_FILE ] && UpdateIcons_SV $(cat $SV_SITE_FILE)

    if [ "X_${INSTANCE_ROLE}" = "X_${CLIENT_STRING}" -a "X_${INST_TYPE}" = "X_DR" ]; then
        CheckSwitch
        CheckSwitchProc
        CkeckULInstance
    fi

    du -sk $CONTROL_LOG_FILE | read control_log_file_size b
    if [ $control_log_file_size -gt 50 ]; then
        mv ${CONTROL_LOG_FILE}.bak ${CONTROL_LOG_FILE}.old
        cp $CONTROL_LOG_FILE ${CONTROL_LOG_FILE}.bak
        > $CONTROL_LOG_FILE
    fi

    if [ "X_${INSTANCE_ROLE}" = "X_${MASTER_STRING}" ]; then
        du -sk $SERVER_TRACE_FILE | read server_log_file_size b
        if [ $server_log_file_size -gt 200 ]; then
            mv ${SERVER_TRACE_FILE}.bak ${SERVER_TRACE_FILE}.old
            cp $SERVER_TRACE_FILE ${SERVER_TRACE_FILE}.bak
            > $SERVER_TRACE_FILE
        fi
    fi

    du -sk $HA_LOG_FILE | read osres_log_file_size b
    if [ $osres_log_file_size -gt $MAX_HA_LOG_SIZE ]; then
        cp $HA_LOG_FILE ${HA_LOG_FILE}.bak
        > $HA_LOG_FILE
    fi

    if [ -f $client_log_file_1 ]; then
        du -sk $client_log_file_1 | read client_log_file_1_size b
        if [ $client_log_file_1_size -gt 200 ]; then
            > $client_log_file_1
        fi
    fi

    if [ -f $client_log_file_2 ]; then
        du -sk $client_log_file_2 | read client_log_file_2_size b
        if [ $client_log_file_2_size -gt 200 ]; then
            > $client_log_file_2
        fi
    fi

    CheckOutAndErrSize

    echo " "
    note "sleep ${POLLING_TIME} ..."
    sleep ${POLLING_TIME}
done
Enrico Venturi
Super Advisor

Re: KSH: a never ending SLEEP()

This is a screenshot where the current status is described:
the master process is sleeping while no child sleep exists, so there is no chance of waking the process up.
Dennis Handly
Acclaimed Contributor

Re: KSH: a never ending SLEEP()

>This is a screenshot where the current status is described:
>the master process is sleeping while no child sleep exists,

It appears the parent is 17180 and has two children: 17228 and 17215.
This would imply you aren't doing a sleep but a wait?

>which conditions make the sleep (5) never end?

Somehow you got past the sleep? Can you instrument your script with set -x?
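
One way to follow that suggestion without flooding the normal log is to trace only the polling step and send the trace to its own file. This is a minimal sketch, assuming a POSIX-style shell; `traced_sleep` and the trace file path are hypothetical names, not part of the posted script.

```shell
# Hypothetical trace file, one per run of the script.
trace_file="${TMPDIR:-/tmp}/control_trace.$$"

# traced_sleep is a hypothetical wrapper: it runs sleep with "set -x"
# active, sends the trace to trace_file, and records sleep's exit status.
traced_sleep() {
    exec 3>&2 2>>"$trace_file"   # save stderr on fd 3, send xtrace output to the file
    set -x
    sleep "$1"
    status=$?
    set +x
    exec 2>&3 3>&-               # restore the original stderr
    echo "sleep $1 returned status $status" >>"$trace_file"
    return "$status"
}

traced_sleep 1
cat "$trace_file"
```

In the posted loop, the call would replace the plain `sleep ${POLLING_TIME}`; a trace that shows the sleep command starting with no "returned status" line after it would confirm where the script is blocked.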

> or even: which conditions make the sleep UNIX process die without signalling anything to its parent?

Someone did a kill?

What does tusc say about the whole process tree, from the start? I assume you can repeat it?

>Please see below the script where the problem occurs:

Where do you create the child processes? Is it in this script fragment?