Operating System - Tru64 Unix

/usr/sbin/collect: received fatal SIGALRM

moonzh
Occasional Advisor

I wrote a Korn shell script to collect CPU/memory data using the "/usr/sbin/collect -i1 -R 2s -scm" command. The script is called every 5 minutes by a customized program, but every 5 minutes I get the error messages below:

Oct 3 14:28:58 testsys /usr/sbin/collect[407507]: exiting

Oct 3 14:34:22 testsys /usr/sbin/collect[407700]: started by root

Oct 3 14:34:23 testsys /usr/sbin/collect[407700]: received fatal SIGALRM

But when I run that shell script or the "/usr/sbin/collect -i1 -R 2s -scm" command manually, everything is fine and I get the data correctly.

Any help is appreciated.
Ralf Puchner
Honored Contributor

Maybe a heap/stack limitation in the script or user environment?
Help() { FirstReadManual(urgently); Go_to_it;; }
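To test this theory, the script could record its own limits and environment every time it runs, so a manual run can be compared line by line with an agent-driven run. A minimal sketch; the function name log_run_context and the log path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: append the current user, resource limits,
# terminal status and environment to a log file, so that a manual run
# and an agent-driven run of the same script can be compared.
log_run_context() {
    logfile=$1
    {
        echo "=== $(date) pid=$$ ==="
        id
        ulimit -a                      # heap/stack and other resource limits
        if [ -t 0 ]; then
            echo "stdin: terminal"
        else
            echo "stdin: not a terminal"
        fi
        env | sort                     # full environment, sorted for diffing
    } >>"$logfile" 2>&1
}

# Example: call near the top of the collection script (path is a placeholder).
# log_run_context /tmp/collect_context.log
```

Running diff on the blocks from a manual run and an agent run should quickly show whether the limits or environment differ.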
Joris Denayer
Respected Contributor

The SIGALRM comes from collect itself, because you have specified the duration option (-R 2s).
The setduration function in collect implements this with alarm(seconds).

This signal is caught and logged via syslog, so it ends up in /var/adm/syslog.dated/current/daemon.log.

So this looks quite normal to me.

Joris
To err is human, but to really foul things up requires a computer
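The point that the SIGALRM is the expected end of a -R run can be illustrated with a small shell analogue. This is not collect's code: it is a sketch of a sampler that schedules its own SIGALRM after a fixed number of seconds and treats receiving it as the normal way to stop. The function name run_with_duration is hypothetical.

```shell
#!/bin/sh
# Illustration only, not collect's source: a sampler that schedules its own
# SIGALRM after N seconds, similar in spirit to collect -R using alarm(seconds).
# Receiving SIGALRM here is the normal end of the run, not an error.
run_with_duration() {
    sh -c '
        # On SIGALRM: log a message and exit cleanly.
        trap "echo received SIGALRM, exiting; exit 0" ALRM
        # Background timer: after $1 seconds, send SIGALRM to this shell.
        ( sleep "$1"; kill -ALRM $$ ) &
        while :; do
            echo "sample at $(date +%H:%M:%S)"
            sleep 1
        done
    ' sh "$1"
}

# Usage: run_with_duration 2
```

The child sh -c process has its own $$, so the alarm only ever reaches the sampler and not the calling shell.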
moonzh
Occasional Advisor

Thanks, Joris. But if I do not use -R 2s (which means 1s by default), I sometimes get empty output, so I have to let it run for 2s. Any more suggestions? Thanks in advance!
Joris Denayer
Respected Contributor

For this, you would need a (non-existent) option to specify the number of samples to take.
This shouldn't be very difficult to add, because the alarm time is already the result of some calculations: SIGALRM would simply be raised after (nr_of_samples * interval) seconds. It could be a suggestion for improvement.

In practice, you must experiment with the -R value depending on the system load, the system size, the selected subsystems (memory/CPU in your case), etc.
I don't have an easy solution for this.
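Until such an option exists, the "number of samples" idea can be approximated in a wrapper that derives the -R duration from samples * interval. A minimal sketch; duration_for_samples and collect_samples are hypothetical names, and collect itself has no samples option:

```shell
#!/bin/sh
# Hypothetical helpers: compute the -R duration as samples * interval,
# following the formula suggested above. expr is used instead of $(( ))
# for portability to older shells.
duration_for_samples() {
    samples=$1
    interval=$2
    echo "$(expr "$samples" \* "$interval")s"
}

collect_samples() {
    samples=$1
    interval=$2
    /usr/sbin/collect -scm -i "$interval" -R "$(duration_for_samples "$samples" "$interval")"
}

# Example: collect_samples 3 1    # runs collect -scm -i 1 -R 3s
```

Since a 1-second run sometimes produced empty output in this thread, asking for one more sample than you actually need is a cheap safety margin.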

There are solutions if you use awk/grep/sed/perl/head/tail or similar commands.
Here is a quick-and-dirty example:

collect -scm -i 1 -R 5s | awk '/RECORD    1/, /RECORD    2/' | grep -v "RECORD    2"


First, run long enough to get at least one sample. Then select all output between "RECORD 1" and "RECORD 2", and finally throw out the "RECORD 2" line.
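The same selection can be done in a single awk program whose patterns accept any run of spaces or tabs between RECORD and the number, so the exact spacing no longer matters. A sketch, assuming (as described in this thread) that each record starts with a header line containing RECORD followed by whitespace and the record number; first_record is a hypothetical name:

```shell
#!/bin/sh
# Print the first record: from the "RECORD 1" header line up to, but not
# including, the "RECORD 2" header. [ \t]+ matches any run of spaces or tabs.
first_record() {
    awk '
        /RECORD[ \t]+2/ { finished = 1 }  # record 2 starts: stop printing
        finished        { next }          # keep reading so collect is not cut off
        /RECORD[ \t]+1/ { inrec = 1 }     # record 1 header: start printing
        inrec           { print }
    '
}

# Usage:
# /usr/sbin/collect -scm -i 1 -R 5s | first_record > /tmp/$$.cpumem
```

Unlike an awk range followed by grep -v, this reads all of collect's output instead of exiting early, so collect is never hit by a broken pipe.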

Enjoy

Joris
Joris Denayer
Respected Contributor

This input tool is really not good for entering commands.

There must be 4 spaces between the strings RECORD and 1:
"RECORD    1"

Obviously, the same spacing applies between RECORD and 2.
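If the pipeline still selects nothing, it is worth checking what actually separates RECORD and the number in your collect output, since the forum display hides runs of spaces and tabs. A minimal diagnostic sketch; show_record_headers is a hypothetical name:

```shell
#!/bin/sh
# Show the RECORD header lines with invisible characters made visible:
# sed's "l" command prints a tab as \t and marks the end of each line with $.
# Very long lines may be folded with a trailing backslash.
show_record_headers() {
    grep 'RECORD' | sed -n 'l'
}

# Usage:
# /usr/sbin/collect -scm -i 1 -R 5s | show_record_headers
```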

Joris
moonzh
Occasional Advisor

Hi Joris, I tried the way you suggested, but it did not work. :( Any more suggestions?
Joris Denayer
Respected Contributor

You can try the attached program.
It works fine on my system.
moonzh
Occasional Advisor

What I tried is:

collect -i1 -R7s -scm | awk '/RECORD 1/,/RECORD 2/' | grep -v 'RECORD 2' >/tmp/$$.cpumem 2>/dev/null

But it did not work.

I would also like to give more details about this issue. When I manually run the above command, or the ksh script containing it, it works fine. But on my system I have an agent running as a service, which calls the ksh script every 5 minutes and transfers the data to a central monitoring server. I cannot get any data from my Tru64 system, and I keep getting the SIGALRM error message.

Any more ideas? Thanks in advance!
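Since the command works interactively but not under the agent, one first check is to run the script in conditions closer to a service: an almost empty environment and stdin from /dev/null. A sketch only; run_like_agent is a hypothetical name, the script path is a placeholder, and it cannot reproduce the agent's user, signal settings or resource limits:

```shell
#!/bin/sh
# Hypothetical reproduction helper: run a script with a minimal environment
# and stdin redirected from /dev/null, then report its exit status.
run_like_agent() {
    script=$1
    shell=${2:-/bin/sh}     # pass /bin/ksh for a Korn shell script
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin "$shell" "$script" </dev/null
    echo "exit status: $?"
}

# Usage (script path is a placeholder):
# run_like_agent /path/to/cpumem.ksh /bin/ksh > /tmp/agent_test.out 2>&1
```

If the script succeeds here but still fails under the agent, the difference lies in something the agent sets up itself, which points back at the agent rather than at collect.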
Joris Denayer
Respected Contributor

I started this from cron and got nice output in /tmp/$$_cpumem.

I think there is a problem with the agent. Can you give any details about it?