
Re: Who is wrong: EMS or Hardware?

 
Volodimir
Occasional Contributor

Who is wrong: EMS or Hardware?

Hi All!

I have run into a problem with the EMS monitor installed on an HP 9000 T600 server.

Problem description:

The host has two HP 28696A HP-PB FWD SCSI adapters.
The EMS monitor started logging error messages after a firmware upgrade (namely rev. 3944) was applied to the HP 28696A adapters.

The error messages look as follows:
**************************************************
Notification Time: Wed Jan 23 04:02:46 2002

giant03 sent Event Monitor notification information:

/adapters/events/scsi123_em/0_28_36 is >= 3.
Its current value is CRITICAL(5).



Event data from monitor:

Event Time..........: Wed Jan 23 04:02:45 2002
Severity............: CRITICAL
Monitor.............: scsi123_em
Event #.............: 103104
System..............: giant03

Summary:
Disk at hardware path : Hardware Failure.


Error Description:

The SCSI3 driver exhausted all permissible retries while attempting to
download a transaction recipe to the Fast/Wide SCSI adapter.

Possible Cause / Recommended Action:

The Fast/Wide SCSI adapter is in a hung state. Perform a system shutdown.
Cycle power to the computer and wait for it to reboot. If this does not
clear the condition, contact HP support representative to have the Fast/Wide
SCSI adapter checked.


Additional Event Data:
System IP Address...: 192.168.10.2
Event Id............: 0x3c4e19c600000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_scsi123_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/893
EMS Version.....................: A.03.20
STM Version.....................: A.28.00
System Serial Number............: unavailable
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi123_em.htm#103104
**************** end of message ***************

The host runs Oracle 8 server on top of the HP-UX 11.00 operating environment.
The XSWGR1100 B.11.00.51.2 general release patches and the PHSS_24149 patch are also installed.
According to the EMS message, the error is critical (the adapter is in a hung state), and a system should not be able to work with such a problem.
But the system works without any problem. It works even though the hardware path points to the Cascade
disk array where the database tables reside.
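
When several of these notifications pile up, it can help to pull the dotted "Name....: value" lines out of each one instead of reading them by eye. The following is only a minimal Python sketch under stated assumptions: the function name parse_ems_fields is hypothetical, the sample is copied from the message above, and the line layout is inferred from that single message rather than from any EMS documentation.

```python
import re

# Matches lines such as "Severity............: CRITICAL":
# a label, a run of two or more dots, a colon, then the value.
# Layout inferred from the one notification quoted in this thread.
FIELD_RE = re.compile(r"^\s*(?P<key>[^.:\n][^:\n]*?)\.{2,}:\s*(?P<value>.*)$")


def parse_ems_fields(text):
    """Return a dict of the dotted 'label....: value' fields in one notification."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields


# Sample lines copied from the notification above.
sample = """\
Event Time..........: Wed Jan 23 04:02:45 2002
Severity............: CRITICAL
Monitor.............: scsi123_em
Event #.............: 103104
System..............: giant03
"""

fields = parse_ems_fields(sample)
print(fields["Monitor"], fields["Event #"], fields["Severity"])
```

Lines without the dotted separator, such as "Notification Time:" or the URL, are skipped by this pattern.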

Dear guys, if you have met such a problem and know how to solve it, please reply.

Sincerely,
Volodimir
vvcher
Xavier Gutierrez
Frequent Advisor

Re: Who is wrong: EMS or Hardware?

Hi, Volodimir

Maybe it is an intermittent problem.
Did you try to stress the failed hardware with STM?

It may also be that there were a lot of I/O requests when EMS tried to test that hardware as part of its periodic tests.

Anyway, try to stress it with STM and see what happens (be aware of performance impact while STM is testing the hardware).

Best regards,

Xavier
Live fast, die young!
John Waller
Esteemed Contributor

Re: Who is wrong: EMS or Hardware?

Hi,

I have a similar thread running at the moment. See:

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x44bea2db8513d6118ff40090279cd0f9,00.html

I am asking whether to trust EMS or not, as I have run the STM tools on a disk that EMS reported as faulty and it appears fine. This is on a G70 running HP-UX 11.00, and the disk reported as faulty is the root disk, but my system is running fine.

Volodimir
Occasional Contributor

Re: Who is wrong: EMS or Hardware?

Thanks a lot, Xavier.

I'll try to follow your recommendations.
You are probably right about an excessive
number of I/O requests during system operation.
I'm going to compare the times when the errors appear with the host workload.
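
A rough way to do that comparison is to count the EMS events per hour of day and set the counts beside whatever workload figures the host keeps. This is only a sketch under assumptions: the notifications are taken to be saved as plain text, the function names are hypothetical, the second timestamp in the sample is made up for illustration, and the date format is the one seen in the single message posted earlier.

```python
from collections import Counter
from datetime import datetime


def event_times(text):
    """Yield a datetime for every 'Event Time....: ...' line in the saved text."""
    for line in text.splitlines():
        if line.startswith("Event Time") and ":" in line:
            # Everything after the first colon is the timestamp.
            stamp = line.split(":", 1)[1].strip()
            # Format as seen in the notification: "Wed Jan 23 04:02:45 2002"
            yield datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def events_per_hour(text):
    """Count events by hour of day (0-23)."""
    return Counter(t.hour for t in event_times(text))


# First line is from the real notification; the second is a hypothetical
# extra event added only to show the counting.
saved = """\
Event Time..........: Wed Jan 23 04:02:45 2002
Event Time..........: Wed Jan 23 04:31:10 2002
"""

for hour, count in sorted(events_per_hour(saved).items()):
    print(f"{hour:02d}:00  {count} event(s)")
```

If the errors cluster in the hours when backups or batch jobs run, that would support the heavy I/O explanation; an even spread would point more toward the adapter or its firmware.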

Thanks again,
Volodimir
vvcher