sleep commands fail

 
Dave Chamberlin
Trusted Contributor

sleep commands fail

Greetings! I have a script that syncs two processes and has to sleep for a while. The script worked for several months with the sleep set at 600. Recently I had to increase the value to 1300 and the script errored out, with sleep exceeding its maximum value. I tried calling multiple sleep commands (sleep 600, sleep 600, sleep 100) but this also failed. Does anyone know how to increase the system value (without rebuilding the kernel), or a workaround? Thanks
5 REPLIES
Bill Hassell
Honored Contributor

Re: sleep commands fail

Something is wrong with your script. sleep can accept any positive integer so sleep 123456789 is just fine. There is no maximum value for sleep except UINT_MAX (hint: getconf UINT_MAX which returns 4294967295).

However, whenever I see sleep in association with multiple processes as a 'sync' method, a red flag goes up. It will eventually fail because it is incredibly simplistic and unreliable. It assumes that the processes will always take the same time to run and, of course, your processes aren't running as expected.

Start with tracing the process that is taking too long. Is it full of errors and completely failing? You need a method to bail out, report the errors and signal the dependent processes not to start or continue. Also read the man page for the command designed to handle this type of situation:

man wait
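
As a rough sketch of what wait gives you over a fixed sleep: the script below starts a job in the background, then blocks until the job has actually finished and picks up its exit status. The function first_job is only a stand-in for the real process, not anything from the original script.

```shell
#!/bin/sh
# first_job is a hypothetical stand-in for the process that must finish first.
first_job() {
    sleep 1        # pretend to do some work
    return 0       # a non-zero value here would signal failure
}

first_job &        # start it in the background
job_pid=$!         # remember its process ID

# wait blocks until that PID exits and hands back its exit status,
# however long the job actually takes.
job_rc=0
wait "$job_pid" || job_rc=$?

if [ "$job_rc" -eq 0 ]; then
    echo "first job finished, starting the dependent step"
else
    echo "first job failed with status $job_rc, not continuing" >&2
fi
```

Note that wait only works for children of the current shell, so this fits the case where one script launches both steps.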

But better yet is to communicate interprocess status with a common file. When processes have completed, they write a message to the file.
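
A minimal sketch of the common-file idea, under some assumptions: the file name STATUS_FILE, the message "done" and the helper names are all made up for illustration. The producer writes its message when it completes; the consumer polls for it but gives up after a fixed number of checks instead of waiting forever.

```shell
#!/bin/sh
# STATUS_FILE and the "done" message are assumptions for illustration only.
STATUS_FILE="${TMPDIR:-/tmp}/sync_demo.$$.status"
rm -f "$STATUS_FILE"

# Producer: does its work, then records the outcome in the common file.
producer() {
    sleep 1
    echo "done" > "$STATUS_FILE"
}

# Consumer: checks the file once a second, up to max_tries times.
wait_for_status() {
    max_tries=$1
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if [ -f "$STATUS_FILE" ] && grep -q '^done$' "$STATUS_FILE"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    return 1        # the message never appeared
}

producer &
if wait_for_status 10; then
    result="ok"
else
    result="timeout"
fi
echo "producer status: $result"
rm -f "$STATUS_FILE"
```

The short sleep inside the loop is only a polling interval; the decision to continue depends on the message in the file, not on a guess about how long the producer takes.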


Bill Hassell, sysadmin
Yang Qin_1
Honored Contributor

Re: sleep commands fail

Can you capture the exit status code when the sleep command fails? If you run sleep 1300 from the command line, does it fail immediately or after a certain period? Did the multiple sleep commands fail at the first one or at the second one?

From sleep man page:

sleep exits with one of the following values:

0 The execution was successfully suspended for time seconds, or a SIGALRM signal was received.

>0 If the time operand is missing, is not a decimal integer, is negative, or is greater than UINT_MAX, sleep returns with exit status 2.
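
One way to capture that status, as a sketch only: the helper report_sleep is a made-up name, and the exact non-zero value for a bad argument may differ between systems, so the script just records whatever sleep returns.

```shell
#!/bin/sh
# report_sleep is a hypothetical helper: it runs sleep and keeps the exit status in rc.
report_sleep() {
    rc=0
    sleep "$1" 2>/dev/null || rc=$?   # record the status instead of losing it
    echo "sleep $1 exited with status $rc"
}

report_sleep 1              # a valid argument
good_rc=$rc

report_sleep notanumber     # an invalid argument
bad_rc=$rc
```

Putting the same capture around the real sleep call in the script would show whether sleep itself is returning the error.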

Yang
Dave Chamberlin
Trusted Contributor

Re: sleep commands fail

I had changed the value in sleep from 600 to 1300, and since then the script fails with error code -2, which I attributed to a sleep issue (I did not know how to query UINT_MAX). I will look for something else in the script. Thanks
Bill Hassell
Honored Contributor

Re: sleep commands fail

To query UINT_MAX:

getconf UINT_MAX

The value for UINT_MAX is 4294967295 so:

# sleep 4294967296
sleep: illegal argument

and

# sleep -2
sleep: illegal option -- 2
usage: sleep time

However, sleep actually ignores any number larger than 2147483647: a value from there up to 4294967295 causes sleep to return immediately with no error code.
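
If a script builds its sleep value from a variable, a guard along these lines can catch an out-of-range value before sleep silently ignores it. This is a sketch: safe_sleep_arg is a hypothetical helper, and the limit is simply the 2147483647 figure mentioned above.

```shell
#!/bin/sh
# SAFE_MAX is the largest value said above to be honoured; safe_sleep_arg is a made-up helper.
SAFE_MAX=2147483647

safe_sleep_arg() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;      # empty, or contains something other than digits
    esac
    [ "${#1}" -le 10 ] || return 1    # more than 10 digits cannot be in range
    [ "$1" -le "$SAFE_MAX" ]
}

for arg in 1300 2147483648 -2 abc; do
    if safe_sleep_arg "$arg"; then
        echo "$arg: ok to pass to sleep"
    else
        echo "$arg: rejected"
    fi
done
```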

Unless you trace the script, you won't be able to determine where the error code was generated. Add the command set -x at the front of the script and make sure stderr is redirected into a logfile.
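
A small sketch of that tracing setup. LOGFILE is a hypothetical path, and the single sleep is a placeholder for the real script body.

```shell
#!/bin/sh
# LOGFILE is a hypothetical path; use whatever suits your system.
LOGFILE="${TMPDIR:-/tmp}/sync_trace.$$.log"

exec 3>&2 2>"$LOGFILE"   # save the original stderr on fd 3, then send stderr to the logfile
set -x                   # from here on, each command is echoed to stderr before it runs

sleep 1                  # placeholder for the real work
sleep_rc=$?

set +x                   # stop tracing
exec 2>&3 3>&-           # put stderr back where it was
echo "sleep returned $sleep_rc; trace is in $LOGFILE"
```

In a real script it is usually enough to put set -x on the first line and run the script with 2> pointing at the logfile; the trace then shows the last command executed before the error code appeared.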


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: sleep commands fail

Sometimes it's good to look at the underlying system call or function to really understand what is going on. Sleep normally terminates when a SIGALRM is caught. Essentially a signal handler for SIGALRM is set up, an alarm(sleep_seconds) is done and then a pause() is done. The pause terminates whenever a signal is received. What is probably happening is that your process is receiving another signal during the sleep.

As suggested, you will probably be far better served by setting up some sort of lock file to coordinate the processes, or some sort of sockets-based semaphore server if this must be coordinated across multiple hosts. The sleeps simply cannot cope with expected events such as seemingly spurious signals or processes that take atypical amounts of time to complete.
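
For the single-host case, one common way to build such a lock is with mkdir, which either creates the directory or fails in a single step. The sketch below assumes a made-up LOCKDIR path and helper names; the demo uses $$ in the name only to avoid clashing with leftovers, whereas real cooperating processes would agree on one fixed path.

```shell
#!/bin/sh
# LOCKDIR is a hypothetical name; real processes would share one fixed path.
LOCKDIR="${TMPDIR:-/tmp}/sync_demo.$$.lock"

# mkdir succeeds only for the first caller, so it doubles as "test and take the lock".
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

if acquire_lock; then
    first="acquired"
else
    first="busy"
fi

# A second attempt while the lock is still held should be refused.
if acquire_lock; then
    second="acquired"
else
    second="busy"
fi

release_lock
echo "first attempt: $first, second attempt: $second"
```

A process that finds the lock busy can retry after a short pause or bail out with an error, rather than assuming the other process finished within a fixed time.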


If it ain't broke, I can fix that.