
Multiple E200i Cache Module failures in BL460c G1 servers

fms-jerry
Occasional Visitor


In the past 12 days, we've had 3 servers go down due to what appear to be bad controller cache modules. Last night alone, 2 went down. Since it was after hours and the servers (and the TS farm they're part of) were not in use, I was able to do a lot of troubleshooting. I swapped parts between the working and non-working servers and pinpointed the problem to the controller cache modules. The batteries are fine, the controller/backplanes are fine, and so on. Everything works in the other servers except the cache modules, and the error follows the cache modules. The error is that the controller failed the self-test. It also appears there is no way to boot these servers without a working cache module.

 

The blade servers have been in use for about 2 years now. Are we just hitting the life expectancy of the cache modules? Was there maybe a bad batch of cache modules? Does anyone know what might be going on? 1 module failed after just a system reboot, and the 2 last night failed after I installed more RAM.

 

I already contacted support and we have 2 more modules on their way, but I'd like to know for the future what may be causing the failures. Maybe we need to purchase a couple of extra modules to keep on hand? We have 40 blades right now, so I'd just like to get to the bottom of this before I have to go through 40 tech support calls and get fired for having 40 servers fail in the next month. ;)

 

The servers are all BL460c G1s with E200i controllers.

1 REPLY
fms-jerry
Occasional Visitor

Re: Multiple E200i Cache Module failures in BL460c G1 servers

Also, since it will probably help: the cache modules are part number 413486-001, the 128MB modules with the attached battery pack.