ML 350 G5: Internal SAS Enclosure Device Failure (Bay 1, Box 1, Port 2I, Slot 0)

blentes
Frequent Advisor

Hi,

This morning an ML 350 G5 still answered pings but did nothing else: no login, and no services (SSH, web, virtual machines) were responding. I had to power it off hard. After the reboot nearly everything is running fine. The Integrated Management Log contains the following entry: Internal SAS Enclosure Device Failure (Bay 1, Box 1, Port 2I, Slot 0).

The status of the entry is "repaired". The "time of event" is 1/7/2014 3:49PM, but the "updated time" is 1/23/2014 1:37PM, which confuses me.

What could be the reason for this error? Do I have to do anything, for example update the firmware (currently 1.80) of the RAID controller?

Can I prevent it in the future? And why is the updated time some months older than the event time?

The server has eight 146 GB SAS disks and runs 64-bit SLES 10 SP4. It has two CPUs with four cores each and 32 GB RAM.

Thanks for any help.

Bernd

2 REPLIES
waaronb
Respected Contributor

Re: ML 350 G5: Internal SAS Enclosure Device Failure (Bay 1, Box 1, Port 2I, Slot 0)

The older log messages from January of this year sound like a drive failed on Jan 7 and was replaced on Jan 23.

That's probably unrelated to the event you had this morning where the server was responding to pings but nothing else. Since the IML didn't show any events from today, it was probably a software error.

I don't know how SLES prioritizes net activity over everything else, but if there was something using 100% CPU or otherwise taking extra system resources, it may have been responding to pings, just barely, but nothing else was able to get in.

I've had that happen with Windows too, where some misbehaving software used all the memory and started paging heavily. The system interrupts were so bad that nothing else could happen; I couldn't log in, even locally.

I was surprised the HP ASR didn't kick in, so it must have just been able to send/receive heartbeats and avoid an ASR reboot.

So, it can happen... I don't know if that's what it was in your case, but that would be my first guess. Check your OS and software logs to see what might have been running at the time and go from there.
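If memory exhaustion is the suspect, one place to start in the OS logs is any kernel out-of-memory message from before the hang. The following is only a minimal sketch: the default path `/var/log/messages` and the two search patterns are assumptions (the exact kernel wording differs between versions), and the function names are made up for illustration. It also assumes a Python 3 interpreter, which a stock SLES 10 install may not provide.

```python
import re

# Heuristic patterns only; the exact kernel wording varies between versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines that look like kernel out-of-memory messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan one syslog file; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling `scan_log()` as root and printing the result would list candidate lines. An empty result does not rule out memory pressure, since a system that is thrashing may never get as far as logging anything.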
blentes
Frequent Advisor

Re: ML 350 G5: Internal SAS Enclosure Device Failure (Bay 1, Box 1, Port 2I, Slot 0)

Hi,

I think you are right. It's possible that I have seen this phenomenon once before, but I don't remember exactly. The logs show nothing, presumably because the system was so busy that nothing could be logged. What can I do? I will run atop on the suspect machine to see what happens just before the heavy load. For that I have to log at very short intervals; I'm thinking of one second. Does anyone have another idea?

Bernd
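As a complement to atop, the one-second logging idea could be sketched roughly as follows. This is a hedged illustration, not a tested monitoring tool: it assumes a Python 3 interpreter (which a stock SLES 10 install may not provide), the log path `/var/log/loadwatch.log` and all function names are hypothetical, and it only records load average, process counts, free memory, and swap use from `/proc`.

```python
import os
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "running": int(running),
        "total": int(total),
    }


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integers."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def format_sample(now, load, mem):
    """Build one log line from a timestamp and the parsed /proc data."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    return "%s load1=%.2f procs=%d/%d memfree_kb=%d swapused_kb=%d" % (
        stamp, load["load1"], load["running"], load["total"],
        mem.get("MemFree", 0), swap_used)


def sample_forever(path="/var/log/loadwatch.log", interval=1.0):
    """Append one sample per interval; the path is a hypothetical choice."""
    with open(path, "a") as log:
        while True:
            with open("/proc/loadavg") as f:
                load = parse_loadavg(f.read())
            with open("/proc/meminfo") as f:
                mem = parse_meminfo(f.read())
            log.write(format_sample(time.time(), load, mem) + "\n")
            # Flush and fsync so the last samples survive a hard power-off.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Running `sample_forever()` in the background would leave a trail of timestamped lines leading up to the next hang. Syncing after every sample costs some I/O, and on a machine that is already thrashing the sampler itself may stall, so gaps in the timestamps are a useful signal too. atop can also write its own raw samples at a fixed interval (`atop -w <file> 1`), which captures per-process detail that this sketch does not.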