
Oracle Query for Error Statii

 
Ralph Grothe
Honored Contributor

Hello Oracle Wizards,

I would like to write a plug-in for my Nagios monitoring that checks for severe Oracle errors or other indications of trouble ahead.
So far I have been using good old check_log2.pl,
which I have parse each SID's Oracle alert log for any entry containing the string "ORA-".
Though this has been working quite well,
I thought there must also be a more active method: sending an SQL query against some table or view in the Oracle data dictionary that collects error or critical states.
Such a query would also kill two birds with one stone, as it would check the SID's availability and responsiveness on the fly
(I think I could measure the latency and emit it as the plugin's performance data, to be charted by Nagiosgraph).
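
To make the latency idea concrete, here is a minimal sketch of what such a plugin core might look like. It is only an illustration under assumptions: `run_query` is a hypothetical callable standing in for whatever actually talks to the database, and the warning/critical thresholds are made-up defaults. Only the exit codes and the `label=value;warn;crit` performance data layout follow the usual Nagios plugin conventions.

```python
import time

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def timed_check(run_query, warn_s=1.0, crit_s=5.0, clock=time.monotonic):
    """Run a probe, time it, and return (exit_code, status_line).

    run_query is a hypothetical zero-argument callable that raises an
    exception when the database cannot be reached or the query fails.
    warn_s and crit_s are illustrative latency thresholds in seconds.
    """
    start = clock()
    try:
        run_query()
    except Exception as exc:
        # No latency is reported when the probe itself fails.
        return CRITICAL, f"ORACLE CRITICAL - query failed: {exc}"
    latency = clock() - start

    if latency >= crit_s:
        code, label = CRITICAL, "CRITICAL"
    elif latency >= warn_s:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Performance data after the pipe: label=value[UOM];warn;crit
    perfdata = f"latency={latency:.3f}s;{warn_s:.3f};{crit_s:.3f}"
    return code, f"ORACLE {label} - answered in {latency:.3f}s | {perfdata}"
```

A real plugin would print the status line and call `sys.exit(code)`; the `clock` parameter exists only so the timing can be replaced in tests.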

Can you Oracle experts please tell me which tables or views of the data dictionary show these data (if possible without having to extend or patch the DBMS in any way, using only its vanilla setup)?
I guess the STATUS field of the view V$INSTANCE doesn't quite cover this?

Also, I am convinced that there must be a configuration option that makes the Oracle DBMS send out traps to some definable management node.
Or maybe even some event handler with, at best, an interface to some sort of scripting language.

Thanks

Ralph

Madness, thy name is system administration
Steven E. Protter
Exalted Contributor

Re: Oracle Query for Error Statii

Shalom Ralph,

I would simply parse the alert logs and be done with it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Hein van den Heuvel
Honored Contributor

Re: Oracle Query for Error Statii

Ditto... process the alert log.


>> checks for severe Oracle errors or other indications of trouble ahead.

The thing with 'severe' errors is that they often block further writes to the database. So the alert log is the only place for Oracle to leave a message.

A quick query to v$instance is not a bad plan.
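
As a rough sketch of how the result of such a query could be turned into a Nagios state: V$INSTANCE exposes the STATUS and DATABASE_STATUS columns, but the severity mapping below is purely an assumption for illustration, and `instance_state` is a hypothetical helper name.

```python
# SQL a plugin might send; STATUS and DATABASE_STATUS are V$INSTANCE columns.
INSTANCE_SQL = "SELECT instance_name, status, database_status FROM v$instance"

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def instance_state(status, database_status):
    """Map one V$INSTANCE row to a Nagios exit code.

    The mapping is an assumption, not an Oracle recommendation:
    only an OPEN instance with an ACTIVE database counts as healthy.
    """
    status = status.strip().upper()
    database_status = database_status.strip().upper()
    if status == "OPEN" and database_status == "ACTIVE":
        return OK
    if status == "OPEN":
        # Open, but e.g. SUSPENDED or in INSTANCE RECOVERY.
        return WARNING
    # STARTED, MOUNTED, or anything unexpected.
    return CRITICAL
```

Note that a failed connection never reaches this function at all, so the caller still has to treat "no row came back" as critical.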

For trouble ahead, disk space tends to be a big, somewhat predictable culprit. There are many scripts out there to monitor that, although auto-extend muddles that somewhat.

Good luck!
Hein.
Ralph Grothe
Honored Contributor

Re: Oracle Query for Error Statii

Hello SEP & Hein,

You are probably right that I should stick to checking the instances' alert logs periodically.
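
For reference, the periodic alert log check boils down to something like the following sketch. It is not how check_log2.pl is implemented; it just illustrates one way to read only what was appended since the previous run. The function name `scan_alert_log` and the idea of keeping the byte offset between runs are assumptions for the example.

```python
import re

# Matches Oracle error codes such as ORA-00600.
ORA_CODE = re.compile(r"ORA-\d+")


def scan_alert_log(path, offset=0):
    """Return (lines containing an ORA- code, new byte offset).

    offset is where the previous run stopped; a caller would keep it
    in a small state file between runs (not shown here).
    """
    hits = []
    with open(path, "rb") as fh:
        fh.seek(0, 2)              # jump to the end to learn the file size
        size = fh.tell()
        if offset > size:          # log was rotated or truncated: start over
            offset = 0
        fh.seek(offset)
        for raw in fh:
            line = raw.decode("utf-8", errors="replace").rstrip()
            if ORA_CODE.search(line):
                hits.append(line)
        new_offset = fh.tell()
    return hits, new_offset
```

Calling it again with the returned offset yields only entries written in the meantime, so the cost of each run depends on how much was appended rather than on the size of the whole log.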

At the moment I have scheduled Nagios checks running every 5 minutes.
A colleague, however, who sees the customers more regularly at jour fixe meetings, claimed that I should narrow the check intervals down to one minute,
which I think is overkill and only detrimental to performance on the Nagios server.
What do you think about the necessity of check intervals shorter than 5 minutes?

Meanwhile I have also installed the freely downloadable Instant Oracle Client on my Nagios server.
With it I hope to make the indirect check_oracle checks via NRPE unnecessary.
So far Instant Client's sqlplus is working quite well against remote DB instances,
but I am still struggling enormously with building DBD::Oracle, the Oracle driver for the Perl DBI interface.
I wonder whether any of you has successfully compiled DBD::Oracle on Linux against the Instant Oracle Client (IOC)?
Well, if I don't get DBD::Oracle installed, there is at least always the fallback of writing a mere wrapper script around IOC's sqlplus.
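
The sqlplus fallback could look roughly like this sketch. The `-S` (silent) and `-L` (attempt the logon only once) options and the `WHENEVER SQLERROR` and `SET` commands are standard SQL*Plus. The helper names and the connect string format passed in are assumptions for illustration. Keep in mind that credentials passed on the command line are visible in the process list.

```python
import subprocess

# SQL*Plus script fed on stdin; output is reduced to the bare value.
SCRIPT = """\
WHENEVER SQLERROR EXIT FAILURE
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT status FROM v$instance;
EXIT
"""


def build_sqlplus_command(connect_string):
    # -S: silent mode (no banner or prompts); -L: attempt the logon only once.
    return ["sqlplus", "-S", "-L", connect_string]


def parse_status(output):
    """Return the first non-empty output line, or raise on an error line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    for ln in lines:
        # ORA- errors come from the database, SP2- errors from SQL*Plus itself.
        if ln.startswith(("ORA-", "SP2-")):
            raise RuntimeError(ln)
    if not lines:
        raise RuntimeError("empty sqlplus output")
    return lines[0]


def query_status(connect_string, timeout=10):
    """Run sqlplus (assumed to be on PATH) and return the instance status."""
    proc = subprocess.run(
        build_sqlplus_command(connect_string),
        input=SCRIPT,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if the DB hangs
    )
    return parse_status(proc.stdout)
```

The timeout matters for a Nagios check: a hung listener should surface as a failed check rather than as a plugin that never returns.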

As for disk/filesystem space,
I already monitor this via check_disk and NRPE.
Luckily, our Oracle installations write to filesystems rather than to raw devices, as our Informix instances do.
Madness, thy name is system administration