Operating System - OpenVMS

roose
Regular Advisor

OPCOM error messages for monitoring

Hi,

We are currently setting up CA NSM and I would like to implement log monitoring. Does anyone have a list, or know of any links, of the error strings that we can use to monitor our operator.log file? I am just looking for the basic critical message strings to start with. We're on OpenVMS 7.3-1, ES80. Thanks in advance for your help.
labadie_1
Honored Contributor

Re: OPCOM error messages for monitoring

"-E-" and "-F-" spring to mind :-)

But I guess you want something more specific.

Steve-Thompson
Regular Advisor

Re: OPCOM error messages for monitoring

Hi Roose

We use a CA Console monitoring software...
I assume it's the same one ... "Console Manager"

We have serial lines connected to the Alphas, the HSGs and the SAN switches at one end and a DECserver at the other, plus a dedicated Alpha server running the CA software to monitor the alarms.

Is that the setup you need to discuss?

Regards
Steven
roose
Regular Advisor

Re: OPCOM error messages for monitoring

Hi Steven,

Not necessarily Console Manager, but I believe you can set it up to monitor OPCOM messages as well. So, yes, I am looking for any information on the common error messages that are sent to OPCOM and the operator.log file, so that NSM can help us monitor them.

Thanks.
Thomas Ritter
Respected Contributor

Re: OPCOM error messages for monitoring

We use the prefixes %OPS-I-, %OPS-W- and %OPS-E- with a trailing message. The scan looks for the prefixes. Wrap it up with some guidelines and procedures and it works really well.
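
To make the idea of a prefix scan concrete, here is a minimal sketch in Python. It is not how CA NSM is configured, and Python is only an assumption chosen for illustration. The function name `scan`, the default severity set and the sample lines are all hypothetical. It relies only on the standard VMS message layout of %FACILITY-S-, where S is the severity letter (S, I, W, E or F).

```python
import re

# A VMS-style message prefix is "%FACILITY-S-", where S is one of
# S (success), I (informational), W (warning), E (error), F (fatal).
PREFIX_RE = re.compile(r"%(?P<facility>[A-Z0-9$_]+)-(?P<severity>[SIWEF])-")


def scan(lines, severities="WEF"):
    """Return (facility, severity, rest_of_line) for each matching line.

    Only the first prefix found on a line is examined, and a line is
    reported only when that prefix has one of the wanted severities.
    """
    hits = []
    for line in lines:
        match = PREFIX_RE.search(line)
        if match and match.group("severity") in severities:
            rest = line[match.end():].strip()
            hits.append((match.group("facility"), match.group("severity"), rest))
    return hits


# Hypothetical sample lines, not taken from a real operator.log.
sample = [
    "%OPS-I-BACKUP, nightly backup started",
    "%OPS-E-BACKUP, nightly backup failed",
    "a line without any message prefix",
]

for facility, severity, text in scan(sample):
    print(facility, severity, text)
```

Passing `severities="IWEF"` would also report the informational line; which severities deserve an alarm is a site decision.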

Steve-Thompson
Regular Advisor

Re: OPCOM error messages for monitoring

Hi Roose

As Thomas stated, you can generate your own messages within batch scripts: follow a given format and create an "event" keyed on the prefix of the error message.

As I said we monitor not only the Alphas but HSG's and the serial port of SANswitches too!

Starting with the console, probably the first "event" to look for might be ">>>"
Then:-
Please enter date and time (DD-MMM-YYYY HH:MM)
bootstrap failure
Machine check
failed to open
Halted
Restarting

For processing (and the startup procedure):-
%CNXMAN (If you have a cluster)
%SHADOW (if you use it! I.e. match %SHADOW-W, as %SHADOW-I will produce a message for every disk mounted during the startup!)
PAGEFILE.SYS
%SYSTEM-F
%SYSINIT-W
%SYSINIT-E
Unrecoverable error
%SHUTDOWN
QMAN-W (use -E and -F too)
%SMP-I (-F)

Security:-
Last interactive login on (see when someone uses the console)
(or) Username:

Operation:-
Please mount volume*^

There are a few ideas!
The *^ at the end of the last example shows how to allow "anything that begins with" our event string.
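
As a rough illustration of how an event list like the one above can be applied, here is a small Python sketch. It is not Console Manager or NSM syntax. The dictionary `EVENTS`, the function `classify`, and the test lines are hypothetical, and it uses plain substring matching as a stand-in for the "begins with" wildcard. Only a subset of the strings above is included.

```python
# Event strings grouped by category, taken from the list above (subset only).
EVENTS = {
    "console": [">>>", "bootstrap failure", "Machine check", "Halted"],
    "processing": ["%CNXMAN", "%SHADOW-W", "%SYSTEM-F", "%SYSINIT-E",
                   "Unrecoverable error"],
    "operation": ["Please mount volume"],
}


def classify(line, events=EVENTS):
    """Return every (category, event_string) whose event string occurs in line.

    Matching is a case-sensitive substring test, so anything that follows
    the event string on the line is accepted.
    """
    return [
        (category, event)
        for category, strings in events.items()
        for event in strings
        if event in line
    ]


# Hypothetical lines for illustration only.
print(classify("%SHADOW-W-EXAMPLE, hypothetical warning text"))
print(classify("%SHADOW-I-EXAMPLE, hypothetical informational text"))
print(classify("Please mount volume EXAMPLE in device EXAMPLE"))
```

Listing %SHADOW-W rather than %SHADOW keeps the informational mount messages out of the alarm window, as described above.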

If you want to see the alarms on your PC, i.e. when you're not on the NSM workstation:

Start an X server (an "X" listener) on your PC, and then:

On the Alpha with the alarm software, make sure DECwindows is running and do this:

$ set display /crea /trans=tcpip /node=
$console C3

You should see the alarm window on your PC.

You can also set up your own "watchdog"...
E.g. a process (run/detached) that wakes up every few minutes and sends a single ping to the critical servers or other network elements.

You would need something like:-
$ reply /enable
$ set def sys$%manager
$ @tcpip$define_commands
$ ping -c1 somenode
$ if .not. $STATUS
$ then
$ request "%Your error message that's picked up by an "
$ etc...
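
The same watchdog logic, written out as a minimal Python sketch for anyone who wants to see the whole flow. This is an illustration only, not a replacement for the DCL above: the names `ping_once` and `check_hosts`, the message text, and the %OPS-E- prefix (borrowed from Thomas's convention) are assumptions, and the probe assumes a Unix-style `ping -c 1`. The function only builds the message strings; handing them to OPCOM (e.g. via REQUEST) is left out.

```python
import subprocess


def ping_once(host, timeout_s=5):
    """Return True if one ping to host succeeds.

    Assumes a Unix-style ping that accepts "-c 1" and exits with status 0
    on success.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_hosts(hosts, probe=ping_once, prefix="%OPS-E-"):
    """Return one alarm message for each host that fails the probe.

    The prefix is what the log monitor would be told to look for.
    """
    return [
        f"{prefix}WATCHDOG, no ping reply from {host}"
        for host in hosts
        if not probe(host)
    ]
```

Calling `check_hosts(["somenode"])` every few minutes gives an empty list while the node answers and a single prefixed message when it does not. The `probe` argument lets you swap in a different check (or a fake one for testing) without touching the rest.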

Good luck
Steven

Steve-Thompson
Regular Advisor

Re: OPCOM error messages for monitoring

Sorry!
... there's a typo in my last message!
I should imagine it's obvious!

"set def sys$manager" works better without the "%" in it!

:-)
Steven