
EMS and resmon

 
Paul Mckay_2
Occasional Contributor


We are looking to enable hardware monitoring of HP-UX machines.
I'm currently struggling to figure out the relationship between EMS and resmon.
The EMS system accessed through SAM shows no monitors set up, and when you set up monitors it uses rules based on the status of various components.
However when I run monconfig from under /etc/opt/resmon/lbin it shows a bunch of monitors set up, with rules based around severity of events.
A look at the documentation (Using the Event Monitoring Service - B7612-90015.pdf) tells me that

'If there are no requests and Sentinel monitors are not installed
- a message displays on your screen: Currently no resources are being monitored. Use the action
- The field area of the main screen is empty
If Sentinel monitors are installed, the screen is simply blank.'

I just get a blank screen, so this seems to mean that Sentinel monitors are installed. There is no mention of these Sentinel monitors again in the docs. What are these Sentinel monitors?

All I know from my tests is that if I pull a disk out of the array attached to the machine, I only get any indication of it if I have set up EMS through the SAM GUI to check the disks.
Do I have to explicitly switch on the checking of such components on every HP-UX install in order to get hardware error messages? Is there a quick way around doing this? Are there any configuration files I can copy across the different installations?

Any help anybody can give me on this is greatly appreciated.

Cheers,

Paul


5 REPLIES
Carsten Krege
Honored Contributor

Re: EMS and resmon

With SAM's EMS GUI you can only configure status type monitors. These are monitors that are polled for status. You cannot configure hardware monitors.

With monconfig you can manage only hardware monitors, which are provided by the hardware online diagnostics. These are event type monitors that automatically report events without being polled by the EMS Framework. Event type resources often contain the word "event" in the resource name.


# resls /storage
Contacting Registrar on grcdg319

NAME: /storage
DESCRIPTION: Storage Resources
TYPE: /storage is a Resource Class.

There are 2 resources configured below /storage:
Resource Class
/storage/events
/storage/status

The first resource tree, "/storage/events", is the event resource and can be configured with monconfig. The second resource tree, "/storage/status", is the status monitor that can be set up through SAM's EMS GUI.


SAM and monconfig are simply different clients to the EMS framework that do different jobs.
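
The naming convention described above can be sketched as a small shell helper. This is only a heuristic illustration: the function name ems_client_for is hypothetical, and since event resources are said to "often" (not always) contain the word "event", the answer is a guess from the name, not something reported by the EMS framework.

```shell
#!/bin/sh
# Hypothetical helper: guess which client configures an EMS resource,
# using only the naming convention described in this thread.
ems_client_for() {
    case "$1" in
        *event*)  echo "monconfig" ;;    # event type (hardware) monitors
        *status*) echo "SAM EMS GUI" ;;  # polled status type monitors
        *)        echo "unknown" ;;      # name gives no hint
    esac
}

ems_client_for /storage/events
ems_client_for /storage/status
```

Run against the two resource trees listed by resls, the first call prints monconfig and the second prints SAM EMS GUI.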

Carsten
-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
Paul Mckay_2
Occasional Contributor

Re: EMS and resmon

Thanks for the info.

It seems I may not have been giving events enough time to appear in syslog. After I pulled a disk out, it took around 7-8 minutes before a line appeared in syslog.

Is this a usual thing to expect with EMS?
It seems like an awfully long time to me before reporting a hardware error. Or is this perhaps peculiar to the test I am doing?
Carsten Krege
Honored Contributor

Re: EMS and resmon

It depends on how the hardware problem is detected by the monitor. I would expect an EMS event to pop up shortly after the hardware diagnostics detect the problem. Of course, hardware monitors also need to "poll" the hardware at regular intervals to detect its health status.

However, this might require someone more familiar with hardware monitoring to answer.

Carsten
Paul Mckay_2
Occasional Contributor

Re: EMS and resmon

Ah, after having a root around, it seems the default EMS polling interval is 60 minutes.
This pretty much means that your users/customers are going to be a far better monitor of your hardware than EMS!

The configuration files are kept under /var/stm/config/tools/monitor (of course, not under any ems or even resmon directory).
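
For anyone wanting to snapshot these files so they can be compared or carried between machines, a minimal sketch follows. The function name archive_monitor_config and the default output path /tmp/monitor-config.tar are hypothetical; only the source directory comes from this thread. Whether the monitors pick up files copied from another host without further steps is not confirmed anywhere here, so treat this as a way to capture and inspect the configuration, not a tested deployment method.

```shell
#!/bin/sh
# Hypothetical helper: pack the monitor configuration directory into a tar file.
# The default source path is the one named in this thread; both arguments are optional.
archive_monitor_config() {
    src=${1:-/var/stm/config/tools/monitor}
    out=${2:-/tmp/monitor-config.tar}   # assumed output location
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    # Archive with relative paths so the contents can be listed or unpacked anywhere.
    (cd "$src" && tar cf - .) > "$out"
}
```

Calling archive_monitor_config with no arguments uses the defaults above; tar tf on the resulting file lists what was captured.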

Andrew Merritt_2
Honored Contributor

Re: EMS and resmon

EMS is a framework used to deliver notifications from a number of monitors, mainly the EMS Hardware Monitors and the HA (High Availability) Monitors. The Peripheral Status Monitor is one of the HW monitors, and is configured through SAM.

The other HW monitors are configured through monconfig. That configuration works at a different level: it controls where notifications are sent rather than what is monitored.

You can find documentation for the EMS HW Monitors at http://www.docs.hp.com/hpux/diag/index.html#EMS%20Hardware%20Monitors%20%28for%20HP%209000%29

How long it takes for a problem to be noticed depends on the problem. If a disk is pulled out (or goes bad), and no I/O is attempted, then the first time the problem is detected may well be when the disk monitor next polls the device to check its status, on an hourly basis. However, if I/O is attempted then the drivers will spot the problem, which should then be reported by the disk monitor. Most of the EMS HW monitors report events based on both polling (checking the status of the device) and those detected by the drivers when they occur.

Critical Events are not only reported to syslog, but also by mail to root, and all severities of events are logged to /var/opt/resmon/log/event.log.
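
Since all severities end up in that log file, a quick look at its tail is one way to check whether an event was recorded before it shows up elsewhere. The sketch below is illustrative only: the function name recent_events and the default of 20 lines are hypothetical, and nothing is assumed about the format of the entries.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of the EMS hardware event log.
# The default path is the one named in this thread; override it with the first argument.
recent_events() {
    log=${1:-/var/opt/resmon/log/event.log}
    count=${2:-20}                      # assumed default line count
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    tail -n "$count" "$log"
}
```

For example, recent_events with no arguments shows the last 20 lines of the default log, and a second argument changes the line count.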

Andrew