
EMS reports failed disk on ClarIIon every 24 hours

 
Paul F Rose
Advisor

EMS reports failed disk on ClarIIon every 24 hours

We just installed the following patch bundles on two of our 11.11 systems:

GOLDAPPS11i - June 2007
GOLDBASE11i - June 2007
HWEnable11i - December 2006
OnlineDiag - December 2006

# swlist OnlineDiag
# Initializing...
# Contacting target "facets"...
#
# Target: facets:/
#

# OnlineDiag B.11.11.18.05 HPUX 11.11 Support Tools Bundle, Dec 2006
OnlineDiag.Sup-Tool-Mgr B.11.11.18.05 Support Tools Manager for HPUX systems
OnlineDiag.EMS-KRMonitor A.11.11.05 EMS Kernel Resource Monitor
OnlineDiag.EMS-Core A.04.20.11 EMS Core Product
OnlineDiag.EMS-Config A.04.20.11 EMS Config
OnlineDiag.Contrib-Tools B.11.11.18.05 Contributed Tools

cstm is version A.53.05.

Since we applied these bundles we have begun receiving the following EMS messages regarding the disks on our ClarIIon disk array on a daily basis:

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Wed Oct 10 23:12:16 2007

facets sent Event Monitor notification information:

/storage/events/disks/default/1_0_4_0_0.10.4.0.0.0.6
is >= 3.
Its current value is SERIOUS(4).



Event data from monitor:

Event Time..........: Wed Oct 10 23:12:15 2007
Severity............: SERIOUS
Monitor.............: disk_em
Event #.............: 100472
System..............: facets

Summary:
Disk at hardware path 1/0/4/0/0.10.4.0.0.0.6 : Device connectivity or
hardware failure


I have configured EMS to stop generating these errors. However, from reading other posts it seems that this type of message from EMC ClarIIon disk arrays should already have been corrected, and we were not receiving them before we installed the above patch bundles.

Is there a patch I don't know about?
5 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: EMS reports failed disk on ClarIIon every 24 hours

I wouldn't ignore those out of hand. It's just possible that the new Online Diagnostics actually fixed a problem, and errors that should have been detected all along are now being reported. Are you also getting messages in syslog indicating that LVM has switched to an alternate path?
If it ain't broke, I can fix that.
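
The syslog check suggested above can be sketched as a small shell function. This is only a rough filter: it assumes the default HP-UX syslog location, /var/adm/syslog/syslog.log, and matches any line mentioning LVM rather than one exact message text. The function name lvm_syslog_lines is made up for illustration.

```shell
#!/bin/sh
# Show LVM-related lines from a syslog file.
# Assumption: default HP-UX syslog path; pass a different file as $1
# (for example an archived copy) to search older logs.
lvm_syslog_lines() {
    logfile="${1:-/var/adm/syslog/syslog.log}"
    # Broad, case-insensitive match on "lvm"; not an exact message pattern.
    grep -i 'lvm' "$logfile"
}
```

Running lvm_syslog_lines with no argument searches the current log. An empty result means no LVM messages were logged in that file, which is consistent with, but does not prove, a false alert.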
Paul F Rose
Advisor

Re: EMS reports failed disk on ClarIIon every 24 hours

When we first got these errors I checked with powermt and syslog to ensure we weren't getting any error messages, and they looked fine. I've also checked the old syslogs going back to Feb 2005, and we haven't received any such messages in at least that long.

So I really think it's a false positive from the patches installed...
Patrick Wallek
Honored Contributor

Re: EMS reports failed disk on ClarIIon every 24 hours

Do this:

# /etc/opt/resmon/lbin/set_fixed -L

The above will show all devices that EMS thinks are **DOWN**.

Now for each device listed do:

# /etc/opt/resmon/lbin/set_fixed -n <resource_name>

to set the device to UP.

Or to set ALL DEVICES to UP you can do:

# /etc/opt/resmon/lbin/set_fixed -n \*

To see the status of ALL devices do:

# /etc/opt/resmon/lbin/set_fixed -l
(lower case L).

Once all devices show UP, monitor to see whether you still get alerts. If you do, then either you really have a problem or you are getting false alerts.
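
As a rough sketch of the per-device step, the snippet below builds the set_fixed command line for a given hardware path without running it. It assumes that set_fixed -n accepts the full EMS resource name, and that the resource name is the hardware path with slashes replaced by underscores under /storage/events/disks/default, as the notification in the original post suggests. The function names are hypothetical.

```shell
#!/bin/sh
SET_FIXED=/etc/opt/resmon/lbin/set_fixed
# Assumption: resource class taken from the EMS notification in this thread.
RESOURCE_CLASS=/storage/events/disks/default

# Convert a hardware path (1/0/4/...) to an EMS resource name (1_0_4_...).
hwpath_to_resource() {
    printf '%s/%s\n' "$RESOURCE_CLASS" "$(printf '%s' "$1" | tr '/' '_')"
}

# Print, but do not execute, one set_fixed command per hardware path.
print_set_fixed_cmds() {
    for hwpath in "$@"; do
        printf '%s -n %s\n' "$SET_FIXED" "$(hwpath_to_resource "$hwpath")"
    done
}
```

For example, print_set_fixed_cmds 1/0/4/0/0.10.4.0.0.0.6 prints a single command ending in /storage/events/disks/default/1_0_4_0_0.10.4.0.0.0.6. Compare the printed names against the output of set_fixed -L before running anything.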
A. Clay Stephenson
Acclaimed Contributor

Re: EMS reports failed disk on ClarIIon every 24 hours

I've seen quite a few false positive EMS warnings over the years. There is a later patch, PHSS_37100, but it doesn't specifically address your issue. You might try it.
If it ain't broke, I can fix that.
Andrew Merritt_2
Honored Contributor

Re: EMS reports failed disk on ClarIIon every 24 hours

Hi Paul,
There was a problem with that event being erroneously reported on Clariion drives, but it should be fixed in A.53.00.

There is a patch for that revision, which is PHSS_35808, though that has no fixes for disk_em.

A.53.00 is no longer a supported version. I would recommend upgrading to A.57.00, which was the June 2007 release. I don't recall any reports of the problem with that release.

The patch that Clay mentions, PHSS_37100, is for the A.57.00 release; you can only apply it after installing A.57.00. Again, though, I don't see any signs of the particular problem you are reporting being fixed.

Do the disks show up normally in the ioscan output? Problems have been seen if they are, for some reason, showing as NO_HW.
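
One way to run that check is to filter the ioscan output for disks in the NO_HW state. The sketch below assumes the usual ioscan -f column layout (Class, I, H/W Path, Driver, S/W State, H/W Type, Description); the function name no_hw_paths is made up for illustration.

```shell
#!/bin/sh
# Read `ioscan -fnC disk` style output on stdin and print the hardware
# path of every disk whose S/W State column (field 5) is NO_HW.
# Header lines and the device-file lines added by -n are skipped because
# their first field is not "disk".
no_hw_paths() {
    awk '$1 == "disk" && $5 == "NO_HW" { print $3 }'
}

# On the HP-UX host this would be used as:
#   ioscan -fnC disk | no_hw_paths
```

No output means every disk that ioscan lists is in some state other than NO_HW.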

My recommendation would be to upgrade to a currently supported version of the OnlineDiags, and contact HP support if that does not fix the problem.


Andrew