Operating System - HP-UX

SOLVED
Ralph Grothe
Honored Contributor

Meaning of syslog entry

Hello,

this box has exhibited strange behaviour since yesterday evening.
An application user, as well as the colleague sysadmin who was on duty, told me that no logins were possible and that the server finally had to be bounced.
However, the SG applications, which are Oracle instances, seemed to continue service,
and from the logs I couldn't detect any loss of cluster membership up to the forced reboot this morning (i.e. no such cmcld entries in the syslogs on either node).

I inspected all conceivable logs (also stm, ems, mwa, nettl etc.) but haven't found any useful clues.

I merely discovered a freshly dumped ts99 tombstone with populated TOC registers, but
no crash dump.

In the OLDsyslog.log the following entries have been appearing for the past 5 days:

"Failed to open target localhost@null:Error logging in: TIMEOUT EXPIRED"

Unfortunately there is no process, PID, facility or level qualifier along with these messages, so I don't know who the originator was.
Probably these log events are totally unrelated to the login difficulties since yesterday?
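
One way to judge whether these messages have anything to do with the login problems is to count them per day and see if they cluster around yesterday evening. A minimal Python sketch, assuming every line starts with the usual "Mon DD HH:MM:SS host" syslog prefix; the sample lines, dates and the host name node1 are made up for illustration:

```python
from collections import Counter

NEEDLE = "Failed to open target localhost@null"

def count_by_day(lines, needle=NEEDLE):
    """Count lines containing `needle`, keyed by their 'Mon DD' date prefix."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            counts[f"{fields[0]} {fields[1]}"] += 1
    return dict(counts)

# Made-up sample lines; the real ones would be read from OLDsyslog.log.
sample = [
    "Mar  3 20:05:11 node1 Failed to open target localhost@null:Error logging in: TIMEOUT EXPIRED",
    "Mar  3 21:05:11 node1 Failed to open target localhost@null:Error logging in: TIMEOUT EXPIRED",
    "Mar  4 02:00:00 node1 some unrelated message",
    "Mar  4 06:05:11 node1 Failed to open target localhost@null:Error logging in: TIMEOUT EXPIRED",
]
print(count_by_day(sample))
```

A roughly constant count over all 5 days would suggest the messages are unrelated to what started yesterday.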

While going through the perfmon data I came across a weird, continuous linear rise of the global memory queue, starting from zero yesterday at about 20:00 and reaching a hundred by this morning, when the forced reboot happened.
Does this mean that procs were blocked waiting on memory pages, and what could be the cause,
maybe some leak?
Nevertheless there were no deactivations (viz. the global swapout rate stayed at zero throughout),
and the total memory utilization was below 80%.
The other performance metrics were negligible (i.e. CPU, RunQ, CSwitches, Disk I/O, VM Disk Reads/Writes, LAN).

Thanks for your attention
Ralph

Madness, thy name is system administration
4 REPLIES
Chan 007
Honored Contributor

Re: Meaning of syslog entry

Hi Ralph,

I had a similar problem, where an orphaned/rogue process held on to /tmp and kept filling it up to 100%, which forced a reboot around midnight.

Yours may be similar: a runaway process that started filling up the FS.

Hope this is of some help.

I am just sharing my experience, not a solution.

Chan
Ralph Grothe
Honored Contributor

Re: Meaning of syslog entry

Hi Chan,

I think you are right that the cause must be a run-away proc.
Meanwhile I had a look at cron and batch jobs
and could identify one that started yesterday at 20:05, exactly when the MemQ started to build up according to a zoom into the PerfView chart.
Unfortunately the script is only a wrapper and dispatcher for a whole lot of SQL scripts that must have been placed there by the apps' Oracle DBAs.
I know too little about Oracle to know what's going on there (besides, I'm not too keen to wade through that morass),
so I kindly passed the task of revisiting those jobs to the DBAs, their originators.
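
The matching of job start times against the point where the memory queue begins to climb can also be done mechanically. The following is only a sketch under stated assumptions: the samples, the date and the job names (sql_dispatcher.sh, cleanup.sh) are hypothetical, and the onset is simply taken as the start of the last unbroken run of non-zero queue values.

```python
from datetime import datetime, timedelta

def rise_onset(samples):
    """Return the timestamp where the final unbroken run of non-zero values starts."""
    onset = None
    for ts, value in samples:
        if value > 0:
            if onset is None:
                onset = ts
        else:
            onset = None  # back to zero: any earlier rise did not persist
    return onset

def jobs_near(onset, job_starts, window=timedelta(minutes=10)):
    """Names of jobs that started within `window` of the onset."""
    return [name for name, start in job_starts if abs(start - onset) <= window]

# Hypothetical data: (timestamp, memory queue) samples and job start times.
day = datetime(2000, 1, 1)
samples = [
    (day.replace(hour=19, minute=55), 0),
    (day.replace(hour=20, minute=0), 0),
    (day.replace(hour=20, minute=5), 1),
    (day.replace(hour=20, minute=10), 2),
    (day.replace(hour=20, minute=15), 3),
]
job_starts = [
    ("sql_dispatcher.sh", day.replace(hour=20, minute=5)),
    ("cleanup.sh", day.replace(hour=3, minute=0)),
]
onset = rise_onset(samples)
print(onset, jobs_near(onset, job_starts))
```

The 10-minute window is an arbitrary choice; it should be at least as wide as the sampling interval of the performance data.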
Chan 007
Honored Contributor
Solution

Re: Meaning of syslog entry

Ralph,

My runaway process was also a DB job, one that did some archiving.

All you can do is keep monitoring any arc process with ps -ef | grep arc; normally it should not accumulate much CPU time.
If such a job has consumed a lot of CPU time, better ask the DBA to shut down and restart the DB.
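
If you would rather not eyeball the ps output each time, the check could be scripted roughly like this. It is a sketch that makes assumptions about the ps -ef layout (UID PID PPID C STIME TTY TIME COMMAND, with STIME being either one token such as 20:05:12 or two tokens such as "Mar  3"); the sample output, the SID PROD and the 600-second limit are made up:

```python
def cpu_seconds(time_field):
    """Convert a ps TIME value such as '95:40' or '1:02:03' to seconds."""
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def busy_processes(ps_output, pattern="arc", limit_seconds=600):
    """Return (pid, cpu_seconds, command) for matching processes over the limit."""
    hits = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header, blank or truncated line
        # STIME is one token ('20:05:12') or two ('Mar', '3'); TTY follows it.
        tty_index = 5 if fields[4][0].isdigit() else 6
        if len(fields) < tty_index + 3:
            continue
        command = " ".join(fields[tty_index + 2:])
        try:
            used = cpu_seconds(fields[tty_index + 1])
        except ValueError:
            continue  # TIME in a format this sketch does not handle
        if pattern in command and used > limit_seconds:
            hits.append((int(fields[1]), used, command))
    return hits

# Made-up sample; real input would be the captured output of ps -ef.
sample_ps = "\n".join([
    "     UID   PID  PPID  C    STIME TTY       TIME COMMAND",
    "  oracle  2301     1  0 20:05:12 ?        95:40 ora_arc0_PROD",
    "  oracle  2302     1  0  Mar  3  ?         0:07 ora_arc1_PROD",
    "    root  1234     1  0  Mar  3  ?         0:01 /usr/sbin/syslogd -D",
])
print(busy_processes(sample_ps))
```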
Chan
Ralph Grothe
Honored Contributor

Re: Meaning of syslog entry

Yes, now that I know the potential culprit,
I will set up another Nagios monitor to keep an eye on its RSS (for potential leaks) and CPU consumption.
This doesn't require much work since I already have a compiled check_proc Nagios plug-in.
So it's merely another service entry on the Nagios server, or an NRPE command definition on the monitored cluster node.
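
For illustration only, the logic such a check applies looks roughly like the Python sketch below. It is not the check_proc plug-in itself; the thresholds are invented, and only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) and the "text | perfdata" output line follow the usual Nagios plug-in conventions.

```python
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

def grade(value, warn, crit):
    """Map a measured value to a Nagios state using warn/crit thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK

def check_process(rss_kb, cpu_pct,
                  rss_limits=(500_000, 1_000_000),  # assumed thresholds in KB
                  cpu_limits=(50.0, 90.0)):         # assumed thresholds in percent
    """Return (exit_code, output_line); the worse of the two metrics wins."""
    state = max(grade(rss_kb, *rss_limits), grade(cpu_pct, *cpu_limits))
    perfdata = f"rss={rss_kb}KB cpu={cpu_pct}%"
    return state, f"PROC {LABELS[state]} - {perfdata} | {perfdata}"

code, line = check_process(rss_kb=600_000, cpu_pct=10.0)
print(code, line)
```

A real plug-in would print the line and exit with the code, so that Nagios or NRPE can pick up the state.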