Operating System - HP-UX

Re: Cron not running for one account, password not expired

 
Brian Tawney
New Member

Cron not running for one account, password not expired

In the last two days I have seen something happening with cron that I have never seen before. We have an account that has been running a few jobs through the cron with no problems since 2007. It runs one particular job every five minutes.

Yesterday, it ran this job until 8:55 am, and after that it ceased to run any jobs at all. The owner account is still active, and the password was not expired. To be safe, we changed the password anyway, but that did not resolve the problem.

Here is the crontab entry for the job that runs every five minutes:

00,05,10,15,20,25,30,35,40,45,50,55 * * * * /aps/store/commands/roll_ai.sh Production > /aps/store/commands/logs/roll_ai.log 2>&1

We are certain that the scripts are not even starting. As you can see in the example above, we redirect stdout and stderr to a log file, and the date/time stamp on this file is not being updated.

However, in the /var/adm/cron/log file, it shows the tasks as starting. For example, it shows the following:

> CMD: /aps/store/commands/roll_ai.sh Production > /aps/store/commands/logs/roll_ai.log 2>&1
> mfg 538 c Thu Oct 8 10:00:00 PDT 2009
< root 536 c Thu Oct 8 10:00:00 PDT 2009
< mfg 538 c Thu Oct 8 10:00:31 PDT 2009 ts=13

The account that owns the crontab can execute the script exactly as it appears in the crontab when we run it from the shell. Nothing obvious has changed: file permissions are all the same, and none of the partitions are full. We stopped and restarted the cron daemon, and that had no effect.

After researching it all day, and being unable to explain the problem, we rebooted the server at 7:00 pm last night, and the cron began working again. It ran every five minutes, as scheduled, until 7:55 pm, then it stopped again. It started again this morning and ran until 9:05 am.

This is only happening for the crontab associated with one account. Cron jobs owned by other users do not appear to be affected.
7 REPLIES
Pete Randall
Outstanding Contributor

Re: Cron not running for one account, password not expired

Brian,

Any chance you're running into the queue limit for how many jobs can be scheduled (100)? See "man queuedefs".


Pete

Matti_Kurkela
Honored Contributor

Re: Cron not running for one account, password not expired

Which version of HP-UX and Quality Pack level are we talking about?

A reboot would silently fix any issues related to /etc/utmp corruption, as the utmp file is reinitialized at boot. It would also fix any issues related to over-long /var/adm/wtmp(x) for the same reason (but the old copies are kept safe).

The cron log would indicate that cron did try to run the job, but failed with return code 13. The redirections would take effect for the actual job command only... so if there was a problem in transitioning to the "mfg" user, there might be some error messages in the local mailbox of the "mfg" user instead of anything in the log file.

Have you looked into /var/mail/mfg? There might be some more clues.

It is also possible that the job just dies with error code 13 without outputting anything at all. In that case, there would be no need to write anything to the job's log file.

In your log example, the process has spent at least 30 seconds before terminating with return code 13. That sounds like something might be timing out... is the time between job start and job end messages always about the same when the job does not run properly?
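One way to answer the timing question without reading the log by eye is to pair each start record with its end record. This is only a sketch: cron_job_times is a made-up helper name, and the field layout is assumed from the four log lines quoted in the original post (marker, user, PID, queue, then the date, with an optional trailing status field such as ts=13).

```shell
# cron_job_times: pair start ('>') and end ('<') records for one user in a
# cron log and print the elapsed time and the trailing status field, if any.
# Assumed record layout (taken from the excerpt quoted in this thread):
#   > user pid queue Dow Mon DD HH:MM:SS TZ YYYY
#   < user pid queue Dow Mon DD HH:MM:SS TZ YYYY [status]
cron_job_times() {
    # $1 = user name, $2 = cron log file
    awk -v user="$1" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,  p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }

        # Start record: remember the start time, keyed by PID.
        $1 == ">" && $2 == user && $3 ~ /^[0-9]+$/ { start[$3] = secs($8); next }

        # End record for a PID we saw start: report elapsed time and status.
        $1 == "<" && $2 == user && ($3 in start) {
            d = secs($8) - start[$3]
            if (d < 0) d += 86400          # job ran across midnight
            status = (NF >= 11) ? $11 : "-"  # "-" means no status field on the line
            print $6, $7, $8, "pid=" $3, "elapsed=" d "s", status
            delete start[$3]
        }
    ' "$2"
}
```

Running it as `cron_job_times mfg /var/adm/cron/log` should give one line per finished job, so a run of failures that all show roughly the same elapsed time would stand out.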

Run the 'ps' command with suitable options in an infinite loop to see what's going on with the user "mfg":

while true; do clear; ps -fu mfg; sleep 2; done

Wait for the next timeslot of the cron job and see what happens.

Or add "set -x" to the beginning of the /aps/store/commands/roll_ai.sh script: that guarantees you get at least _some_ output in the job log file if the script even begins running.

MK
Steven E. Protter
Exalted Contributor

Re: Cron not running for one account, password not expired

Shalom,

I would look at:
cron logs
/var/adm/syslog/syslog.log

I would also put debugging diagnostics into the scripts that cron runs, to see whether they are running at all.
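As a rough idea of what such diagnostics could look like, the function below prints a few facts about the environment the job starts in. It is a hypothetical helper (cron_debug_header is not part of roll_ai.sh) and would need to be pasted at the top of the script and called before anything else.

```shell
# cron_debug_header: print basic facts about the environment a job runs in.
# Hypothetical helper for illustration; not part of the original script.
cron_debug_header() {
    echo "=== job started: $(date) ==="
    echo "id:    $(id)"                 # effective user and groups of the job
    echo "cwd:   $(pwd)"                # cron usually starts in the home directory
    echo "shell: ${SHELL:-unset}"
    echo "PATH:  ${PATH:-unset}"        # cron's PATH is often shorter than a login PATH
}
```

Because the crontab entry already redirects stdout and stderr to roll_ai.log, a single call to cron_debug_header at the top of the script would update that file whenever the script starts at all.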

SEP
Steven E Protter
Brian Tawney
New Member

Re: Cron not running for one account, password not expired

Pete--
Thank you for the suggestion. However, it does not appear we are running into the queue limit.

Matti--
Thank you for your suggestions. Unfortunately, /var/mail/mfg doesn't have anything related to these jobs. I didn't realize 13 was the return code from the process. If so, then it looks like an EACCES error, which might be related to messages I am seeing in the syslog.

Steven--
Thanks for mentioning the syslog. After my original post, our job that is scheduled to run every five minutes ran once, at 11:05 am. When I take the messages in the syslog at times when the jobs have run, and compare them to times when the jobs have not run, I see a definite pattern, but I don't understand it yet.

On those five-minute intervals where the cron jobs have failed to run, there are messages in the syslog from adclient saying:

INFO <41 nssnextprpasswd=""> daemon.ipcclient Skipping orphaned user 'CN=rdutt,CN=Users,CN=PSGWMPP1-SOX,CN=Zones,CN=Centrify,CN=Program Data,DC=jnj,DC=com'

Two or three of these messages appear in the log, and about 40 seconds later cdcwatch detects that the adclient process is not running, and it restarts it.

I don't know what the adclient does, so I am not sure how that might cause the cron jobs to fail, but there seems to be a correlation.
Centrify
Occasional Advisor

Re: Cron not running for one account, password not expired

Brian

Centrify support has been working with JNJ on a similar issue. We will be more than happy to help resolve the potential issue; please contact us at support.us@centrify.com.

Centrify Technical Support
Brian Tawney
New Member

Re: Cron not running for one account, password not expired

The problem is being caused by version 4.0.0 of the Centrify DirectControl agent, which is part of a service that allows a Unix machine to join an Active Directory domain. It appears that the root cron job forks, but the child process fails to setuid() to the user that owns the crontab.
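For anyone wanting to check for the same symptom, one thing that can be watched from outside cron is whether the account can be looked up at all around the failing time slots. This is a sketch under an assumption the thread does not confirm: that the failed switch to the crontab owner goes together with user lookups failing or stalling while adclient is being restarted. probe_user is a hypothetical helper name.

```shell
# probe_user: report whether the account database can currently resolve a user.
# Hypothetical helper; it only shows whether "id" succeeds, not why cron failed.
probe_user() {
    # $1 = user name to look up
    if id "$1" >/dev/null 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $1 lookup OK"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') $1 lookup FAILED"
        return 1
    fi
}
```

Left running across a five-minute boundary, for example with `while true; do probe_user mfg; sleep 5; done >> /tmp/probe_mfg.log`, it records both outright failures and long gaps between timestamps, which would point to a lookup that hung.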
Dennis Handly
Acclaimed Contributor

Re: Cron not running for one account, password not expired

>the child process fails to setuid() to the user that owns the crontab.

At least on 11.31, both setgid(2) and setuid(2) are done for the user who submitted the crontab.

I'm not sure why you are getting that SIGPIPE.
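The SIGPIPE reading comes from treating the ts=13 in the cron log as a terminating signal number, not as an exit code. Assuming that interpretation, the shell can translate the number into a name; signal_name is just an illustrative wrapper around the standard kill -l.

```shell
# signal_name: translate a signal number into its name using "kill -l".
signal_name() {
    # $1 = signal number, e.g. the value after "ts=" in the cron log
    kill -l "$1"
}

signal_name 13
```

On systems where signal 13 is SIGPIPE this prints PIPE, which is a different thing from errno 13 (EACCES) mentioned earlier in the thread.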