Operating System - HP-UX

Adey
Advisor

EMS Notification of Disk Failing

I have an L2000 system with 3 x 18GB drives.
I have HP-UX 11i installed and am using the EMS GUI in SAM to set up disk monitoring. This should send an SNMP trap if a disk is not in the "UP" state.

The problem is that when I pull a disk from the system, ioscan reports NO_HW as it should, but the EMS GUI front end still lists the disks as "UP", so no SNMP traps are sent.

Have I missed anything in setting this up?


A Whitby


Steven E. Protter
Exalted Contributor

Re: EMS Notification of Disk Failing

EMS doesn't need SNMP to work.

I used it for years on systems with no SNMP configuration.

Just use the GUI in SAM to set up for notification on disk failure by email (though this should be the default) and put in a valid email address and you should be good to go.

I suppose EMS has SNMP integration, but I've never used it.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Adey
Advisor

Re: EMS Notification of Disk Failing

I know SNMP is not required; however, we are setting up remote monitoring in the office and using the SNMP traps for it.
For some reason the disk, when pulled from the server, does not change state in the EMS GUI, but HP-UX does see the change and lists it as NO_HW.
John Waller
Esteemed Contributor

Re: EMS Notification of Disk Failing

Hi,
It might not be a case of EMS failing to send traps but simply a matter of timing. I've found in the past that EMS is not the most responsive piece of software. I've no doubt other people on this forum could give a better explanation, but from what I've seen EMS appears to monitor the system for events. If you look at an error message reported by EMS, it mentions that it has received a number of events.
What you are seeing is that when you pull the disk, no events are generated unless something attempts to access it, so the disk still appears UP. You can see the same effect with ioscan: the -k option reads the kernel's I/O structures instead of probing the hardware, so if you pull a disk and run ioscan -fnkC disk it will still show all the disks as up and running. If you then try to dd from the disk, your system will start complaining and ioscan will report NO_HW. The same goes for EMS: if you dd from the disk (dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024) I believe you will then get the results you expect (possibly more than you expect, so don't do this on a live server).
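
A rough sketch of the test John describes, assuming a POSIX shell; the function names are made up for illustration and are not part of HP-UX or EMS. The first function forces a small read from the raw device so the driver has to touch the hardware (the count limit is an addition here, to keep the read to about 1 MB). The second filters ioscan -f style output on standard input for disk entries whose S/W State column is NO_HW.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of HP-UX or EMS.

# Force real I/O against a raw disk device so the driver must talk to the
# hardware. $1 is the raw device file for the disk under test.
# count=1024 is an assumption added here to limit the read to about 1 MB.
probe_disk() {
    dd if="$1" of=/dev/null bs=1024 count=1024 2>/dev/null
}

# Read "ioscan -f" style output on stdin and print the hardware path
# (field 3) of every disk whose S/W State (field 5) is NO_HW.
# Device-file lines printed by the -n option do not start with "disk",
# so they are skipped.
list_no_hw() {
    awk '$1 == "disk" && $5 == "NO_HW" { print $3 }'
}
```

On a test system this might be used as probe_disk on the pulled disk's /dev/rdsk device file, followed by ioscan -fnC disk | list_no_hw (without -k, so that the hardware is actually scanned). As John says, this is not something to try on a live server.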
Andrew Merritt_2
Honored Contributor

Re: EMS Notification of Disk Failing

Hi,

What type of disk is involved, how are they connected (SCSI, FC, SAN, etc.), and what version of OnlineDiags do you have installed?

How long are you leaving the disk disconnected? Is any I/O attempted on the disk?

It depends on how the disk is connected whether the device driver can detect the disk removal.

The EMS HW monitors can report events detected in two ways, either by something being found by the device drivers and passed to the monitor (via diaglogd), or by the monitor polling the device and detecting the condition. Some of the monitors use just one method, and some use both. The disk monitor (disk_em) uses both.

The default polling interval for disk_em is 60 minutes, so if for some reason the device driver was not able to detect the disk removal, it could be up to an hour before disk_em would notice.

For some types of connection, such as Fibre Channel, there were some issues with the drivers which needed a patch and some configuration changes to get disk removal detected. If you say how your disks are connected, that would tell us whether that is relevant.

In other cases, because of the nature of the connection, the device driver cannot detect that the disk has been pulled until I/O occurs.

Regarding John's point, something like a disk being unresponsive would be reported as soon as the monitor detected it. It is the less serious conditions, which can occur occasionally without indicating a real problem, that must occur a threshold number of times before an event is generated.
Andrew
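
The threshold behaviour Andrew mentions for less serious conditions can be pictured with a small sketch. This is only an illustration of the idea in POSIX shell and awk, with a made-up function name and made-up condition names; it is not how EMS or disk_em is implemented.

```shell
#!/bin/sh
# Hypothetical illustration only; this is not how EMS works internally.

# Read one condition name per line on stdin and print each name that
# occurs at least $1 times, mimicking a "threshold before an event is
# generated" rule. Output is sorted so the order is predictable.
events_over_threshold() {
    awk -v limit="$1" '
        { seen[$0]++ }
        END {
            for (name in seen)
                if (seen[name] >= limit) print name
        }
    ' | sort
}
```

With a threshold of 3, a condition seen three times is printed while one seen only once is ignored. A serious condition, such as the disk not responding at all, would correspond to a threshold of 1.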