
Re: sadc processes

 
JDL_2
Advisor

sadc processes

Hi all,
Can anybody tell me why these sadc processes hang? After a couple of days they accumulate until they use up every available process slot on the system. The result is that I cannot telnet, rlogin, or even log in from the console. Below is the output from ps -ef, along with what I have in the root crontab.
##############################
root 8078 1 0 09:33:01 ? 0:00 /usr/lbin/sa/sadc 1 11
root 8494 1 0 09:35:01 ? 0:00 /usr/lbin/sa/sadc 1 11
##############################

##############################
0 * * * * /usr/lbin/sa/sa1
20,40 6-18 * * 1-5 /usr/lbin/sa/sa1
05 18 * * * /usr/lbin/sa/sa2 -s 6:00 -e 18:01 -i 90
0 -A
##############################

system is N-Class HP-UX 11.00


9 REPLIES
David_246
Trusted Contributor

Re: sadc processes

Hi JDL,

Just type :

sar -d 10 10

This means: sar, disk activity (-d), every 10 seconds, 10 times.

If I then do a ps -ef | grep sa, I get:

root 27061 11287 0 18:41:25 pts/tf 0:00 sar -d 10 10
root 27062 27061 0 18:41:25 pts/tf 0:00 /usr/lbin/sa/sadc 10 11

So this might explain where your sadc processes come from, but not why they hang. Run truss/tusc against the process (tusc -p ) and see what it actually does.

Is the directory /var/adm/sa writable, or does it have other problems?
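
One detail worth checking in the two listings: the sadc processes in the original post have a parent PID of 1, whereas the one shown above is a child of a running sar. A minimal sketch for counting both kinds from ps -ef output follows. The function name sadc_summary is hypothetical, and it assumes the PPID is the third column, as in the listings in this thread.

```shell
# Summarise sadc processes from `ps -ef` output read on stdin.
# Assumption: column 2 is the PID and column 3 is the PPID.
sadc_summary() {
    awk '
        /\/sa\/sadc/ {
            total++                      # every sadc process
            if ($3 == 1) orphans++       # parent has exited, adopted by init
        }
        END { printf "total=%d orphaned=%d\n", total, orphans }
    '
}

# Usage on the affected host:
#   ps -ef | sadc_summary
```

A large orphaned count would suggest that whatever starts sar exits (or is killed) before sadc finishes, but that is only a hint, not a diagnosis.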


Regs David
@yourservice
S.K. Chan
Honored Contributor

Re: sadc processes

I don't have much experience running sar from cron, so I can only suggest a few things:
1- Comment the entries out of cron for the time being, until you figure out what's wrong.
2- Check whether you have the latest sar cumulative patch (PHCO_25174).
3- Run sar from the command line to see if it produces an error.
4- See if you can save the existing data (saXX files) and then purge those files to refresh them.
David_246
Trusted Contributor

Re: sadc processes

Ah,

Got it again.
Just run sar without any options. It will show you an error.
That error is the reason why it hangs. If it doesn't give you a proper error, use "tusc sar".
That will display a lot of detailed info.

Regs David
@yourservice
Darren Prior
Honored Contributor

Re: sadc processes

In addition to S K Chan's troubleshooting ideas: do you know when this problem started, and whether any changes (patching, new software, cron changes, etc.) occurred around that time?

regards,

Darren.
Calm down. It's only ones and zeros...
JDL_2
Advisor

Re: sadc processes

Thanks for all the speedy replies.

David,
I know the processes hang because some from this morning are still out there, and each one should have stopped after its 11th iteration. I also ran sar without any options and did get a core dump. I attached it; maybe someone could interpret it for me. Also, I do not have a truss or tusc command. Is that a utility that I could download and install?

S.K./Darren,
I do not have the latest patch. This system has an older patch rev, Sept 2001. It is also a standby system to our primary production system, for MC Serviceguard. The only change on it is that I have copied 2 SAP instances from 2 V-class boxes in order to consolidate servers. The primary box is exactly the same as this standby as far as patches go. The SAP applications do not do much and are only used for queries by 2 or 3 users. The cron entries were not there initially; we discovered all the sadc processes first and then added the entries to see if that would stop the sadc processes from accumulating. I have accumulated 149 sadc processes since the start of this thread. Any more input is greatly appreciated. Thanks.



keith persons
Valued Contributor

Re: sadc processes

A couple of questions: do you really want a single snapshot of sar data at 20 and 40 minutes past the hour (2nd sa1 entry)? And is the -A really on its own line in the crontab?

Regarding sa1, let me suggest you add time (interval) and count arguments to the 2nd sa1 entry. The first line of output from sar should always be ignored, as it is the cumulative average of that variable since the system was rebooted. The information that matters most from sar is what happened in the interval between the iterations; this is what provides the performance measurements.

And if the -A is on its own line, that could explain the sadc core dump. The reason for failure would be "sar: Number of samples and interval must be more than 0". To ensure it's not alone, edit the file, place the cursor on the sa2 line and press J to join the lines (assuming vi), then delete the extraneous 0.
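
Whether the -A sits on its own line can also be checked without opening an editor. A rough sketch, assuming a valid crontab entry has five schedule fields plus a command (six fields at minimum); check_crontab is a hypothetical helper name, not an HP-UX utility.

```shell
# Flag crontab entries that have fewer than six whitespace-separated fields.
# Reads crontab text on stdin; returns 1 if any suspicious line is found.
check_crontab() {
    awk '
        BEGIN { bad = 0 }
        /^[ \t]*#/ || /^[ \t]*$/ { next }    # skip comments and blank lines
        NF < 6 {
            printf "line %d: only %d field(s): %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad }
    '
}

# Usage as root on the affected host:
#   crontab -l | check_crontab
```

Run against the crontab exactly as pasted in the original post, this would flag the "0 -A" line; if the -A is really on the sa2 line, nothing is reported.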

One last suggestion - do not run sar, sa1, sa2 on intervals less than 5 seconds - an interval below 5 seconds is less reliable since the unix kernel structures are typically only updated every 5 seconds.

Let us know the results.

Keith
JDL_2
Advisor

Re: sadc processes

Keith,

The -A is on the same line and not on its own. I commented out the 3 cron entries and rebooted the system. The processes start 2 minutes after the system is up, and another one keeps appearing every 2 minutes after that. What could be spawning this process?
keith persons
Valued Contributor

Re: sadc processes


If I interpret correctly, you no longer have active sa* entries in root's crontab file, yet after a reboot sadc processes are still being spawned? Hmm, interesting to say the least. Looks like you've got a runaway process, which is sometimes difficult to track. A few things to try:

Run crontab -l as root on the system and verify the sa entries are commented out. I prefer this method to reading the file directly, because it shows what crontab itself returns.

Run the ps command twice in a row, making sure you leave a pause in between, and note the PID separation (with no arguments, ps will show just your processes and not all of them). This should give us an idea of at least how many processes are getting spawned in between.

Also, you mentioned this is a cluster, correct? If you're using root's .rhosts file, connect to the other system and rerun the crontab -l above. Let's make sure there's not a remote call to it from the other node; unlikely, but something to verify.
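
A variation on the two-snapshot idea, sketched under assumptions: instead of plain ps, it saves two ps -ef listings to files and prints the sadc PIDs that appear only in the second. The function name new_sadc_pids and the /tmp file names in the usage comment are hypothetical.

```shell
# Print PIDs of sadc processes present in the second saved `ps -ef`
# listing but not in the first. $1 = earlier snapshot, $2 = later snapshot.
new_sadc_pids() {
    awk '
        /\/sa\/sadc/ {
            if (FILENAME == ARGV[1]) seen[$2] = 1     # PID existed earlier
            else if (!($2 in seen)) print $2          # PID is new
        }
    ' "$1" "$2"
}

# Usage on the affected host (a new sadc reportedly appears every 2 minutes):
#   ps -ef > /tmp/ps.before
#   sleep 120
#   ps -ef > /tmp/ps.after
#   new_sadc_pids /tmp/ps.before /tmp/ps.after
```

Comparing a new PID with the PIDs on either side of it in the second listing may show which other processes started at about the same moment.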

Take a look in /var/spool/cron/atjobs on both systems, make sure there's no reference to the sa command there.

Then, check the /sbin/init.d/cron file - possible corruption, hidden characters, or a loop may have been inserted.

Lastly, might want to take a look for an updated cron patch.

If none of this illuminates the source, we may want to open a case for more formal troubleshooting or isolation. At the least, these items will have been eliminated as primary suspects.

Let me know the results,

Keith
JDL_2
Advisor

Re: sadc processes

Hi Keith. Sorry it took so long to respond, but we couldn't find the source of what forks these processes. We ended up cloning the system through Ignite from the production system and making the necessary modifications to the netconf file to restore its identity. We then just laid all the data back from our OmniBack backups. That seems to have solved the issue. Thank you for your help, and thanks to all the others who responded. This one definitely belongs in the oddest-issue vault. Thanks again.