
EMS Alert level At specific time

 
shabash
Frequent Advisor

EMS Alert level At specific time

I am getting a recurring alert every day between 10:50 and 11:00 PM.

I have checked the system MP logs but could not find any hardware problem. Only temperature issues were reported, and those have been resolved.

The daily alert now forces me to check system health every day, and I want to find out why this event is raised at a specific time when there is no power issue.

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Wed Sep 15 22:51:08 2010

osm sent Event Monitor notification information:

/system/events/ia64_corehw/core_hw is >= 1.
Its current value is CRITICAL(5).



Event data from monitor:

Event Time..........: Wed Sep 15 22:51:06 2010
Severity............: CRITICAL
Monitor.............: ia64_corehw
Event #.............: 103001
System..............: osm

Summary:

Power Supply : Failure is detected.

Description of Error:

The system has detected that one of the power supplies has failed.

Probable Cause / Recommended Action:

The power supply has failed. Contact your HP support representative to
check the power supply.

For information on the sensor that generated this event, refer to
FRU ID in Event Details section.

Additional Event Data:
System IP Address...: 10.1.67.16
Event Id............: 1030012920100915225101
Monitor Version.....: C.04.00.05
Event Class.........: System
Client Configuration File............:
/var/stm/config/tools/monitor/default_ia64_corehw.clcfg
Client Configuration File Version....: A.01.00
Qualification criteria met.
Number of events: 1
Associated OS error log entry id(s)
None
Additional System Data:
System Model Number.............: ia64 hp server rx6600
EMS Version.....................: A.04.20
STM Version.....................: NA
System Serial Number............: SGH4843FKJ
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#E103001


v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

Event Date ...................: Mon Aug 30 14:41:32 2010
Sensor Number .................: 0x40
Sensor Type ...................: Power Supply
Sensor Class ..................: Sensor specific
Sensor Reading/Offset .........: 0x1 (Sensor Reading)
Event Type ...................: Assertion
Entity ID .....................: 0xa
Generic Message ...............:
Power Supply Failure detected
Entity FRU Id Info ............: Power Supply 0(Sensor ID Power Supply 00(ff-ff-ff-ff-ff-0-ff-44))

Error Details:

Additional information on this event can be obtained from evweb
logviewer (Refer SFM User Guide) with the following log id: 114088



>---------- End Event Monitoring Service Event Notification ----------<
Dennis Handly
Acclaimed Contributor

Re: EMS Alert level At specific time

>but unable to find any hardware problem.

You've checked all of the power supplies and made sure that they are all plugged in securely?
shabash
Frequent Advisor

Re: EMS Alert level At specific time

Yes, the MP logs show all hardware is fine.

SMH shows the same critical errors reporting power supply issues, but the system health status is OK.

The log entry repeats every night and sends the alert.
Viktor Balogh
Honored Contributor

Re: EMS Alert level At specific time

Hi,

Maybe there is no HW problem at all, only the EMS subsystem has some defects. Try the following procedure:

1. stop EMS

# /etc/opt/resmon/lbin/monconfig

and then "k" for Kill

2. remove/rename the file /var/stm/logs/monitor/remindersel.dat

3. start EMS

# /etc/opt/resmon/lbin/monconfig

and then "e" for enable


Regards,
Viktor
****
Unix operates with beer.
shabash
Frequent Advisor

Re: EMS Alert level At specific time

ServiceGuard is configured, so we cannot disable EMS. This is the warning shown when stopping it:

Hardware event monitoring watches the system for hardware problems. If
you shut this facility down, the system will no longer be able to alert
you to many hardware problems.
In addition, if ServiceGuard is configured to use hardware event monitoring
to determine the health of your system, then disabling monitoring may cause
it to consider this system as having failed. This will result in a package
failover. Type "h" for help to find out more about the implications of
shutdown on ServiceGuard.
Furthermore, if you have used the Event Monitoring Service (EMS) Graphical
User Interface (GUI) within the System Administration Manager (SAM) to
configure the event monitors, this configuration will not be saved, and no
actual monitoring will take place until hardware event monitoring is
re-enabled and you add the monitoring requests back again using the EMS
GUI. Event monitoring resources show up in the EMS GUI under the resource
class "status".



Are you sure you wish to disable event monitoring?


Is there an alternative way around this?
Viktor Balogh
Honored Contributor

Re: EMS Alert level At specific time

hi shabash,

Please read my comment above carefully. It's not about disabling EMS, only about restarting it. We also use ServiceGuard, and this advice came directly from HP support when we had problems with EMS ourselves: the monitoring subsystem kept sending error messages about a failed power supply that had already been replaced a month earlier.

Regards,
Viktor
****
Unix operates with beer.
shabash
Frequent Advisor

Re: EMS Alert level At specific time

I have tried re-enabling it, but the problem remains the same.

The error is still generated every night at the same time, but the MP logs do not show any hardware error.
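
To confirm that the alerts really cluster at one time of day, the notification timestamps can be tallied by hour. This is a minimal sketch, assuming the notifications have been collected in a single text file in the format shown in the first post; the function name `notify_hours` and the file argument are hypothetical.

```shell
# Tally EMS notifications by hour of day to confirm the nightly pattern.
# Assumption: $1 is a text file holding one or more notifications, each with
# a line such as "Notification Time: Wed Sep 15 22:51:08 2010".
notify_hours() {
    awk '/^Notification Time:/ {
        split($6, t, ":")      # $6 is the HH:MM:SS field
        count[t[1]]++          # count notifications per hour
    }
    END { for (h in count) print h ":00", count[h] }' "$1" | sort
}
```

Each output line is an hour followed by the number of notifications seen in that hour, so a single dominant line points to a scheduled reminder rather than a fresh hardware fault.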
Viktor Balogh
Honored Contributor

Re: EMS Alert level At specific time

You can try this:

1. stop EMS
# /etc/opt/resmon/lbin/monconfig
and then "k" for Kill

2. remove file /var/stm/logs/monitor/remindersel.dat

3. start EMS
# /etc/opt/resmon/lbin/monconfig
and then "e" for enable

Let's see if it works...

****
Unix operates with beer.
Viktor Balogh
Honored Contributor

Re: EMS Alert level At specific time


>Event Date ...................: Mon Aug 30 14:41:32 2010

Is the event current? Maybe you just need to clear the old issues from EMS. See my reply above..
****
Unix operates with beer.
Prashanth.D.S
Honored Contributor

Re: EMS Alert level At specific time

Hi There,

First things first: this is an old event being triggered now, i.e., the actual issue could have occurred on Aug 30, but the alert was written to event.log on Sept 15.
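
The gap is easy to see by pulling both timestamps out of a saved copy of the notification. This is a minimal sketch, assuming the notification text from the first post has been saved to a file; the function name `event_dates` is hypothetical.

```shell
# Print the notification time and the underlying event date from a saved
# EMS notification, so a replayed (stale) event is easy to spot.
# Assumption: $1 is a file holding one notification in the format shown
# in the first post.
event_dates() {
    awk -F': ' '
        /^Notification Time:/ { print "notified: " $2 }
        /^Event Date \.+:/    { print "occurred: " $2 }
    ' "$1"
}
```

For the notification in the first post this prints `notified: Wed Sep 15 22:51:08 2010` and `occurred: Mon Aug 30 14:41:32 2010`, a gap of just over two weeks.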

Also noticed this..

EMS Version.....................: A.04.20
STM Version.....................: NA <=== STM version not seen here

Verify that Support Tools Manager is properly installed. If not, reinstall it.

If you are using SFM, check that it is installed properly.

You may also try clearing the FPL and SEL logs from the MP; that could stop the alerts from being triggered.

Best Regards,
Prashanth
shabash
Frequent Advisor

Re: EMS Alert level At specific time

We killed EMS.

No such file was found under /monitor:
ls -l
total 32
-rw-r--r-- 1 root root 14416 Apr 22 2009 cmc_em.dat
-rw-r--r-- 1 root root 0 Apr 22 2009 fpl_em.dat


We then started EMS again and chose "e" to enable it.

Still no progress; the event still pops up at the same time.

All MP logs have already been cleared.

There are no indications on the MP.

Prashanth.D.S
Honored Contributor

Re: EMS Alert level At specific time

Hi,

Try this and let me know the results..

1. Clear all logs in MP, SEL and FPL with the MP:>SL (System Logs) and the Clear all logs commands.

2. Stop the EMS monitoring with /etc/opt/resmon/lbin/monconfig ==> disable event monitoring

3. Move or remove all logfiles /var/stm/logs/os/fpl.log* to prevent resending the old events after the restart of EMS.

4. Remove /var/stm/logs/monitor/fpl_em.dat .

5. Remove /var/stm/logs/monitor/fpltime.dat .

6. Remove /var/stm/logs/monitor/remindersel.dat .

7. Start the EMS monitoring with /etc/opt/resmon/lbin/monconfig ==> enable event monitoring
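
Steps 3 to 6 can be done by moving the files aside rather than deleting them, so they can be put back if needed. This is a minimal sketch, assuming it runs as root while event monitoring is stopped; the function name `backup_ems_state`, the root-prefix argument, and the backup directory are hypothetical, and only the file paths come from the steps above.

```shell
# Move EMS state files aside instead of deleting them.
# $1 = root prefix ("" on the real system, a scratch directory for a dry run)
# $2 = backup directory (hypothetical; choose any location with free space)
backup_ems_state() {
    root=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$root"/var/stm/logs/os/fpl.log* \
             "$root/var/stm/logs/monitor/fpl_em.dat" \
             "$root/var/stm/logs/monitor/fpltime.dat" \
             "$root/var/stm/logs/monitor/remindersel.dat"
    do
        if [ -f "$f" ]; then
            mv "$f" "$dest/" && echo "moved: $f"
        else
            # An unmatched glob or a missing file is reported, not an error.
            echo "absent: $f"
        fi
    done
}
```

On the real system it would be called as `backup_ems_state "" /var/tmp/ems_state_backup`. Files that do not exist are only reported, which matters here because remindersel.dat was not present on this machine.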

Best Regards,
Prashanth
shabash
Frequent Advisor

Re: EMS Alert level At specific time

I performed all the steps, but the problem remains the same.

I am still getting the event alert at the same time.

Viktor Balogh
Honored Contributor

Re: EMS Alert level At specific time

> perform all steps but still the problem remain the same

If you have a support contract with HP, open a case for this problem. And don't forget to post the result of the analysis here; it may help someone who runs into a similar problem.
****
Unix operates with beer.
shabash
Frequent Advisor

Re: EMS Alert level At specific time

The issue was solved by performing the steps below.

Test run to check the alert:
# /opt/sfm/bin/sfmconfig -w -q
# /etc/opt/resmon/lbin/send_test_event
For example:
# /etc/opt/resmon/lbin/send_test_event disk_em
# /opt/sfm/bin/sfmconfig -t -p
****************************************
Steps:

1. Disable the SFM provider module:
# cimprovider -d -m SFMProviderModule

2. Back up and remove the file /var/opt/sfm/data/reminderEvent.dat.

3. Enable the SFM provider module:
# cimprovider -e -m SFMProviderModule

Test run to check the alert:
# /opt/sfm/bin/sfmconfig -w -q
# /etc/opt/resmon/lbin/send_test_event
For example:
# /etc/opt/resmon/lbin/send_test_event disk_em
# /opt/sfm/bin/sfmconfig -t -p
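
The disable, back up, and enable steps can be wrapped in one function so the provider module is always re-enabled. This is a minimal sketch, assuming it runs as root with cimprovider on the PATH; the function name `reset_sfm_reminders`, the `.bak` suffix, and the `CIMPROVIDER` override are hypothetical, and the test run afterwards is still done by hand as above.

```shell
# Sketch of the SFM reminder reset: disable the provider module, move the
# reminder file aside, then enable the provider module again.
# $1 = reminder file (defaults to the path given in the steps above)
# CIMPROVIDER may be set to a stand-in command for a dry run (assumption).
reset_sfm_reminders() {
    file=${1:-/var/opt/sfm/data/reminderEvent.dat}
    cimprov=${CIMPROVIDER:-cimprovider}

    $cimprov -d -m SFMProviderModule || return 1
    if [ -f "$file" ]; then
        # Renaming keeps a backup and removes the original in one step.
        mv "$file" "$file.bak"
    fi
    status=$?
    # Re-enable the provider module even if the rename failed.
    $cimprov -e -m SFMProviderModule || return 1
    return $status
}
```

Renaming instead of deleting means the old reminder data can be restored if the reset turns out not to be the cause.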