Disk Enclosures

Temperature problems in SC10

Marcel Burggraeve
Trusted Contributor

We have the following problem at a customer site:
In the old situation we had an SC10 (configured as JBOD) with one controller connected to one HP9000 system, and it ran fine.
A second controller has since been installed, the configuration has been changed to split bus, and the new controller has been connected to another HP9000.
A couple of days ago we received low-temperature warnings via EMS for the newly installed controller.
That controller has been replaced, but now we are receiving temperature warnings on both controllers.
One controller claims the temperature is too low (11 degrees Celsius, event 305); the other one (the 'old' controller) reports questionable data (event 306).
I find it hard to believe that both controllers are defective. Does anyone have a clue where to search for a solution (patches, firmware, another STM version)?

TIA

Marcel Burggraeve
Plusine B.V.
5 REPLIES
Steven E. Protter
Exalted Contributor

Re: Temperature problems in SC10

I would suspect the controller, or some component common to both (Core I/O?).

I'd need a model number to be more specific.

What is the actual temperature in there? I have a little keychain thermometer that I can put in strategic parts of my servers to check the accuracy of the other devices.

It gets pretty hot inside those 9000 boxes.

It's only accurate, in my opinion, to plus or minus three degrees, but that's good enough for me to double-check.
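
To make that double-check concrete, here is a minimal sketch in Python. The function name `sensor_agrees` and the 22 C thermometer reading are assumptions for illustration only; the 11 C value is the reading quoted for event 305, and the three-degree tolerance is the thermometer accuracy mentioned above.

```python
def sensor_agrees(sensor_c, reference_c, tolerance_c=3.0):
    """Return True when the enclosure reading is within tolerance of the reference."""
    return abs(sensor_c - reference_c) <= tolerance_c


# 11 C is the value reported by event 305.
# 22 C is a hypothetical handheld thermometer reading, not a measured value.
print(sensor_agrees(11.0, 22.0))
```

If the two readings disagree by more than the tolerance, the enclosure sensor (or whatever reports on its behalf) is the more likely suspect than the room temperature.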

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Eugeny Brychkov
Honored Contributor

Re: Temperature problems in SC10

Marcel,
please attach here:
1. these events;
2. information log for SC10 controllers from STM (cstm, map, sel dev #, info, il);
3. swlist -l fileset
As Steve said, check the temperature :o)
Eugeny
Anu Mathew
Valued Contributor

Re: Temperature problems in SC10

Hi Marcel,

Firmware version is definitely something to check. As you added the second controller recently, there is a small chance that the firmware versions differ between the two controllers. Could you find out which F/W versions you have?

Also, from my experience, I would suggest power-cycling the entire array, as it has worked magic for us many times. Shutting down involves the drive enclosures first and then the drives. The power-on order is drive enclosures first; wait a couple of seconds and then start the controller enclosures. Obviously, this would need a downtime of about 30 minutes or so.

Hope this helps.

~AM
Marcel Burggraeve
Trusted Contributor

Re: Temperature problems in SC10

At the moment we're still waiting on more info regarding installed software and patches and the info log from STM.
What I do have is the EMS message from both HP9000 systems.

System 1 :
cmihp1(1)>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Wed Jun 18 18:47:03 2003

cmihp1 sent Event Monitor notification information:

/storage/events/enclosures/ses_enclosure/0_4_0_0.15.0 is >= 3.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time : Wed Jun 18 18:47:03 2003
Hostname : cmihp1.mi.pg.tno.nl IP Address : xxx.221.125.1
Event Id : 0x003ef0978700000001 Monitor : dm_ses_enclosure
Event # : 305 Event Class : I/O
Severity : MAJORWARNING

Enclosure at hardware path 0/4/0/0.15.0 : Hardware failure

Associated OS error log entry id(s):
None

Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_ses_enclosure.htm#305

Description of Error:

The temperature within the enclosure has dropped below the low warning
threshold of 15 C degrees.

Probable Cause / Recommended Action:

The current temperature in the enclosure is 11 C degrees. Check the
airflow, room temperature, and fans on and around the enclosure.

User Defined Annotation: .

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

System 2 :
cmihp4(1)>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Wed Jun 18 15:09:59 2003

cmihp4 sent Event Monitor notification information:

/storage/events/enclosures/ses_enclosure/0_4_0_0.15.0 is >= 3.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time : Wed Jun 18 15:09:59 2003
Hostname : cmihp4.mi.pg.tno.nl IP Address : xxx.87.168.10
Event Id : 0x003ef064a700000000 Monitor : dm_ses_enclosure
Event # : 306 Event Class : I/O
Severity : MAJORWARNING

Enclosure at hardware path 0/4/0/0.15.0 : Hardware failure

Associated OS error log entry id(s):
None

Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_ses_enclosure.htm#306

Description of Error:

The temperature sensor on the enclosure services controller in slot A is
reporting questionable data.

Probable Cause / Recommended Action:

The temperature sensor within the enclosure may be failing. Replace the
controller card.

User Defined Annotation: .

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
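
For anyone comparing notifications like the two above across hosts, the following is a rough Python sketch that pulls the event number, severity, and temperatures out of the pasted text. The function name `parse_ems_event` and the field names are assumptions for illustration; the patterns only match the wording seen in these two messages and are not an official EMS format definition.

```python
import re


def parse_ems_event(text):
    """Extract a few fields from an EMS notification pasted as plain text."""
    # Collapse all whitespace so sentences wrapped across lines still match.
    flat = " ".join(text.split())
    fields = {}

    m = re.search(r"Event #\s*:\s*(\d+)", flat)
    if m:
        fields["event"] = int(m.group(1))

    m = re.search(r"Severity\s*:\s*(\w+)", flat)
    if m:
        fields["severity"] = m.group(1)

    # Only event 305 carries these two values in the messages shown above.
    m = re.search(r"threshold of (\d+) C degrees", flat)
    if m:
        fields["threshold_c"] = int(m.group(1))

    m = re.search(r"current temperature in the enclosure is (\d+) C degrees", flat)
    if m:
        fields["current_c"] = int(m.group(1))

    return fields


# Excerpt of the event 305 notification from System 1.
SAMPLE = """\
Event # : 305 Event Class : I/O
Severity : MAJORWARNING
The temperature within the enclosure has dropped below the low warning
threshold of 15 C degrees.
The current temperature in the enclosure is 11 C degrees. Check the
airflow, room temperature, and fans on and around the enclosure.
"""

print(parse_ems_event(SAMPLE))
```

Run against the event 306 text, the same function would return only the event number and severity, since that message carries no temperature values.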
Andrew Merritt_2
Honored Contributor

Re: Temperature problems in SC10

What revision of the OnlineDiags do you have installed?

I'm not sure if it's relevant, but there is a known issue where event 306 is generated erroneously for DS2300 and DS2405 enclosures. The fix is to install the latest version of OnlineDiags (HWE0303 A.38.00 for DS2300, or HWE0306, A.41.00 for 11.11 for DS2405 or HWE0303 plus PHSS_28956 for 11.00).

The other thing to check, as already mentioned: what is the actual temperature in the room?