Server Management (Insight Manager 7)

Inconsistent Behaviour of Agents

 
Sinisa Kolaric
Occasional Advisor



Hi,

My configuration: IM7 SP2 + HP Management Agents 6.40 running on
W2K machines.

Are Agents inconsistent, or do I have an error in my
setup?

First point: Agents can detect when CPU time "fails".
In this case, Agents BOTH: 1) send a trap from CPQWINOS.MIB
and 2) flag (within themselves) the whole machine as "major"
(as shown at https://localhost:2381).

Next, HP Management Agents can also detect when a disk passes
its threshold. The corresponding trap from CPQTHRSH.MIB DOES get
sent. But Agents do NOT flag the machine as "major",
as is the case with CPUs.

Consequently, IM7 recognizes (through the SNMP polling task)
that the machine is in "major" state for CPU, but does
not recognize the same fact for disk-full states.

Question: is this by design, or did I simply
make a mistake in my HP Mgmt Agents setup?

If this is by design, then it is totally inconsistent,
because from my point of view, and even more so from my customers'
point of view, a machine is in "major" state BOTH 1) when
CPU time fails, AND 2) when a disk gets full.

If this is a mistake in my setup, can somebody please explain
how I can force HP Mgmt Agents to flag the machine as "major"
when a disk threshold gets passed?

Thanks.
4 REPLIES
Rob Buxton
Honored Contributor

Re: Inconsistent Behaviour of Agents

I can only say that it is consistent, but probably not in the way you want.

CPU degradation is an ongoing condition that causes a performance problem, whereas passing a disk threshold does not necessarily affect performance or the ability of the Server to deliver. I set the disk thresholds at 80% and 90% and do something about it before they run out of room.

The only equivalent of the CPU degradation is the Logical Disk performance degradation parameter. This retains the consistency as it is a direct ongoing performance issue that affects how the Server can deliver.

You will get Events generated whenever thresholds are passed and these can be used to alert people.
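
To make the two-level threshold idea concrete, here is a minimal Python sketch. It is not how the Management Agents implement thresholds; it only illustrates the 80% / 90% scheme described above. The function names and the "warning" / "critical" labels are assumptions made for the example.

```python
import os
import shutil

# Thresholds taken from the 80% / 90% figures mentioned above.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def classify_usage(used_pct, warn=WARN_PCT, crit=CRIT_PCT):
    """Map a percent-used figure to an illustrative label."""
    if used_pct >= crit:
        return "critical"
    if used_pct >= warn:
        return "warning"
    return "ok"


def percent_used(path):
    """Return the percentage of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # root of the current drive
    pct = percent_used(root)
    print(f"{root}: {pct:.1f}% used -> {classify_usage(pct)}")
```

Acting at the first level (80%) leaves time to clean up before the second level (90%) is reached.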


Sinisa Kolaric
Occasional Advisor

Re: Inconsistent Behaviour of Agents

Thanks Mr. Buxton.

You have a point. I had discussed this matter with my colleague, and we came to the same conclusion. CPU-failed is a critical state (a showstopper) that can affect the business immediately, and disk-threshold passed does not have to.

However, we (you, my colleague and I) are technical people. But let me tell you one thing: our customers can't understand the point above. They want to have the machine flagged as "major" in the case of disks as well.
David Claypool
Honored Contributor

Re: Inconsistent Behaviour of Agents

From a hardware point of view, the disk is always full :). When a disk is formatted, stuff is tucked into every sector. A failed device is a hardware condition whereas a full disk is an operating system condition.

More seriously, status polling is only a rough indication and really is supposed to be used as a "backup plan" in case the call for help in the form of an SNMP event trap is not received (since SNMP does not guarantee delivery).

Further, SNMP status polling on anything that is not a ProLiant is really limited to up/down conditions (not just from IM7 but for anything that tries to do an SNMP status poll) and does not reflect individual component failures.

When a device is recognized by Insight Manager as a ProLiant running the Insight agents, it is able to go beyond the normal SNMP up/down indication by asking for the results of the "ProLiant Status Array." It is this unique SNMP OID available from the Insight agent that can provide the Major and Minor status indications (without this you are limited to OK/Critical).

All that being said, there is a way to get disk full conditions, and that is through setting the threshold. IM7 tries to make it easy to broadcast the threshold with a configuration task. SNMP status polling (even with the ProLiant Insight agent enhancement) is just not appropriate for gathering data such as disk utilization.
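
The split between polled status and threshold traps can be pictured with a small conceptual model in Python. This is only a sketch of the behaviour described in this thread, not the Insight agents' actual code: the `Status` ordering, the component names, and both function names are assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    # Assumed severity ordering, lowest to highest.
    OK = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def rollup(component_status):
    """Return the worst status among polled components (OK if there are none)."""
    return max(component_status.values(), default=Status.OK)


def handle_threshold_trap(events, source, message):
    """Record a threshold trap as an event; polled status is left untouched."""
    events.append({"source": source, "message": message, "cleared": False})


# Hypothetical machine: CPU problem reported by polling, disk threshold via trap.
components = {"cpu": Status.MAJOR, "logical_disk": Status.OK}
events = []
handle_threshold_trap(events, "disk C:", "usage threshold passed")

print(rollup(components).name)  # overall status comes from components only
print(len(events))              # the disk condition lives in the event list
```

In this model a full disk never changes the rolled-up status; it only shows up as an uncleared event, which matches what was observed in the original question.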
Rob Buxton
Honored Contributor

Re: Inconsistent Behaviour of Agents

Ahhh... Business Users and Techies.

I certainly would not want to see disk space thresholds affecting the HW status. So, it would need to be a configurable option.

If your Business Users want to know on a regular basis what disks are over a certain % full, then you can write SQL queries against the IM Database to extract this kind of information.
I've got a report I run weekly that does just that.
I'm afraid at this point you're going to have to tell the BU that this particular "free" product does not show disk space thresholds as a flag to the overall status of the Server.
It does generate an event, which you could leave uncleared until the issue was resolved. So the BU could trace the Event to the issue.
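
For anyone wanting to try the reporting approach, here is a rough Python sketch of the kind of query involved. The real IM database schema is not shown in this thread, so the `disk_usage` table, its columns, and the sample rows are all made up for illustration, and an in-memory SQLite database stands in for the actual IM database.

```python
import sqlite3

# Stand-in database with a hypothetical table; the real IM schema will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE disk_usage (server TEXT, volume TEXT, percent_used REAL)"
)
# Invented sample rows, purely for demonstration.
conn.executemany(
    "INSERT INTO disk_usage VALUES (?, ?, ?)",
    [
        ("srv01", "C:", 84.0),
        ("srv01", "D:", 41.2),
        ("srv02", "D:", 93.5),
    ],
)


def disks_over(conn, limit_pct):
    """Return (server, volume, percent_used) rows at or above limit_pct, fullest first."""
    cursor = conn.execute(
        "SELECT server, volume, percent_used FROM disk_usage "
        "WHERE percent_used >= ? ORDER BY percent_used DESC",
        (limit_pct,),
    )
    return cursor.fetchall()


for server, volume, pct in disks_over(conn, 80):
    print(f"{server} {volume} {pct:.1f}%")
```

A weekly report would run the same kind of query on a schedule and send the result to the business users.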