Operating System - HP-UX

remsh issue in the failover scripts

 
Bibi
Occasional Contributor


I have a remsh issue that causes our MC/ServiceGuard script to hang. It is a custom-built script that calls functions under various UNIX accounts to halt our applications and packages. When I run it as root with "./script-name", it hangs at the remsh command, which never returns a value. If I run it as "ksh -x ./script-name", it works fine. At first I thought it was a remsh problem, so I checked /etc/hosts, inetd.conf and the .rhosts file, and I also checked permissions. Now I am not sure whether it is a remsh issue or an environment issue. Can anyone assist?
Massimo Bianchi
Honored Contributor

Re: remsh issue in the failover scripts

Hi,
I had a similar issue with SG/SAP integration.

It was due to the way remsh handles stdin/stdout.


Try redirecting every output you do not need to /dev/null, and every output you do need to a file.

Also pay attention to daemons, such as commands started with nohup, because these processes keep an open connection to their output.
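The effect can be reproduced locally without remsh. The following is only a sketch: command substitution stands in for remsh, since both wait until every process holding the output has closed it. The function names are made up for illustration, and `date +%s` is assumed to be available (on HP-UX ksh the SECONDS variable could be used instead).

```shell
#!/bin/sh
# Local stand-in for the remote side: command substitution waits for EOF on
# its pipe, much as remsh waits for the remote command's output to close.

# The background child inherits stdout, so the reader waits until it exits.
run_attached() {
    sh -c 'sleep 3 & echo started'
}

# The background child has its descriptors redirected, so the reader sees
# EOF as soon as the foreground part has finished.
run_detached() {
    sh -c 'sleep 3 </dev/null >/dev/null 2>&1 & echo started'
}

# Print how many whole seconds the given function takes to hand back output.
elapsed() {
    start=$(date +%s)
    out=$("$@")
    end=$(date +%s)
    echo $((end - start))
}

attached_secs=$(elapsed run_attached)
detached_secs=$(elapsed run_detached)
echo "attached: ${attached_secs}s, detached: ${detached_secs}s"
```

The attached variant should take about as long as the sleep, while the detached one returns almost at once. A stop or start script that leaves a daemon behind would behave like the attached case.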

It may take you a while to work this out.

One trick I used was to start the remsh in the background, then wait and check whether the jobs had finished (using "jobs -l").

I used a timeout, and if the jobs did not finish in time, I killed them.


loops=0
again=1
# Poll every 8 seconds until no background jobs remain.
while [ "$again" != "0" ]
do
    pids=$(jobs -l | cut -c 7- | awk '{ print $1 }')
    print "$(date '+%b %e %X') - Node \"$(hostname)\": Waiting for remsh [$pids] to exit"
    sleep 8
    let loops=loops+1
    # After 450 polls (450 x 8 s, about one hour), kill whatever is still running.
    if [ $loops -ge 450 ]
    then
        print "$(date '+%b %e %X') - Node \"$(hostname)\": Timeout exceeded. Killing all remsh still alive"
        for idle in $(jobs -l | cut -c 7- | awk '{ print $1 }')
        do
            print "$(date '+%b %e %X') - Node \"$(hostname)\": kill -9 $idle"
            kill -9 $idle
        done
        sleep 8
        jobs >/dev/null 2>/dev/null   # discard the "Terminated..." message
    fi
    # After 500 polls, give up and carry on regardless.
    if [ $loops -ge 500 ]
    then
        print "$(date '+%b %e %X') - Node \"$(hostname)\": WARNING - Unable to kill all remsh. Continuing Oracle startup sequence anyway."
        break
    fi
    # Strip the padding wc may add so the string comparison above is reliable.
    again=$(jobs | wc -l | tr -d ' ')
done
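A smaller variant of the same idea, for a single background job, could look like the sketch below. It is not the script above: `wait_with_timeout` is a hypothetical helper, it polls with `kill -0` instead of parsing the output of `jobs -l`, and the one-second poll interval is an arbitrary choice.

```shell
#!/bin/sh
# wait_with_timeout PID SECONDS   (hypothetical helper)
# Returns 0 if the process exits within the limit, 1 if it had to be killed.
wait_with_timeout() {
    pid=$1
    limit=$2
    waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null
    do
        if [ "$waited" -ge "$limit" ]
        then
            kill -9 "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null   # reap the killed child
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demo with a local stand-in for a hanging remsh.
sleep 30 </dev/null >/dev/null 2>&1 &
if wait_with_timeout $! 2
then
    echo "finished in time"
else
    echo "killed after timeout"
fi
```

As noted later in the thread, killing the remsh is only an option when the caller does not depend on its return value.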




If the issue is with the Oracle lsnrctl, I use the "at" command to start it, so that it is detached from the calling script.

HTH,
Massimo
Massimo Bianchi
Honored Contributor

Re: remsh issue in the failover scripts

Hi,
I forgot the part that starts the remsh :)
It comes before all of the previous code.

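# Start the stop script on every node in the background and log each PID.
for i in $NODES
do
    print "$(date '+%b %e %X') - Node \"$(hostname)\": Killing Baan processes on $i with remsh; PID = \c"
    # -n takes remsh's stdin from /dev/null so it cannot block on input.
    remsh $i -n "/etc/cmcluster/MP1/baanIVc.sh stop" &
    pid=$(jobs -l | cut -c 7- | awk '{ print $1 }'| head -1)
    print $pid
done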
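An alternative way to record the PIDs is the shell's `$!`, which holds the PID of the most recent background command, so the output of `jobs -l` does not need to be parsed. This is a sketch only: `start_on_nodes`, `REMOTE_SHELL` and `PIDS` are names invented here, and the argument order simply copies the remsh call above.

```shell
#!/bin/sh
# start_on_nodes COMMAND NODE...   (hypothetical helper)
# Starts COMMAND on every node in the background and collects the PIDs of
# the local remote-shell processes in the variable PIDS.
start_on_nodes() {
    cmd=$1
    shift
    PIDS=""
    for node in "$@"
    do
        # REMOTE_SHELL is an assumed override hook; it defaults to remsh.
        "${REMOTE_SHELL:-remsh}" "$node" -n "$cmd" </dev/null >/dev/null 2>&1 &
        # $! is the PID of the background command that was just started.
        PIDS="$PIDS $!"
    done
}
```

It would be called as, for example, `start_on_nodes "/etc/cmcluster/MP1/baanIVc.sh stop" $NODES`, after which `$PIDS` can be logged or passed to a wait loop.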


Massimo
Bibi
Occasional Contributor

Re: remsh issue in the failover scripts

Thanks Massimo,

I tried the suggestions, but to no avail. I think you are on the right track with the stdin/stdout, but the redirection still produced the same result: remsh hanging. Unfortunately I cannot kill the remsh, because the script expects its return value, and that value is what causes the packages to switch.

Thanks though. If you have any other ideas, let me know.

melvyn burnard
Honored Contributor

Re: remsh issue in the failover scripts

A few questions here:
1) Did this EVER work?
2) If so, what has changed? Patches installed, etc.?
3) What OS version?
4) Is remshd patched?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Massimo Bianchi
Honored Contributor

Re: remsh issue in the failover scripts

Hi,
In another case it was solved by piping the output.

Let's say the command was


remsh HOST -l sidadm "startsap"



This hung. I never understood why, but the error appeared after a patch that affected the r-commands; I don't remember which one. You could check the patch descriptions...


It was solved with this syntax:

remsh HOST -l sidadm "startsap 2>&1 | cat -tve >/dev/null"



Also make sure the redirection is done on the remote host (inside the quotes), and not on the local host.
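Why the side of the redirection matters can be sketched locally. In the example below, `remote` is a made-up stand-in for `remsh HOST`, with `cat` playing the part of remsh relaying the remote output; it is a model of the behaviour, not the real r-command, and `date +%s` is assumed to be available.

```shell
#!/bin/sh
# remote CMD: hypothetical local stand-in for 'remsh HOST CMD'.
# 'cat' models remsh relaying the remote output until it sees EOF.
remote() {
    sh -c "$1" | cat
}

# Redirection on the calling side only: the background sleep still holds
# the relay pipe open, so the call lasts as long as the sleep does.
start=$(date +%s)
remote 'sleep 3 & echo started' >/dev/null 2>&1
local_secs=$(( $(date +%s) - start ))

# Redirection inside the quoted command: nothing keeps the pipe open.
start=$(date +%s)
remote 'sleep 3 >/dev/null 2>&1 & echo started' >/dev/null
remote_secs=$(( $(date +%s) - start ))

echo "local redirect: ${local_secs}s, remote redirect: ${remote_secs}s"
```

Only the second form returns promptly, because there the redirection applies to the process that would otherwise keep the connection busy.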


HTH,
Massimo