BladeSystem - General

Multiple E200i Cache Module failures in BL460c G1 servers

fms-jerry

In the past 12 days, we've had 3 servers go down due to what appears to be bad controller cache modules. Last night alone, 2 went down. Since it was after hours and the servers (and the TS farm they're part of) were not in use, I was able to do a lot of troubleshooting. I swapped parts between the working and non-working servers and pinpointed the problem to the controller cache modules. The batteries are fine, the controllers/backplanes are fine, etc. Every other part works when moved into another server; only the cache modules don't. The error follows the cache modules: the controller fails its self-test. It also appears there is no way to boot these servers without a working cache module.

The blade servers have been in use for about 2 years now. Are we just hitting the life expectancy of the cache modules? Was there maybe a bad batch? Does anyone know what might be going on? One failed after just a system reboot, and the 2 from last night went down after I installed more RAM.

I've already contacted support and we have 2 more modules on the way, but for the future I'd like to know what may be causing the failures. Maybe we should purchase a couple of extra modules to keep on hand? We have 40 blades right now, so I'd like to get to the bottom of this before I have to go through 40 tech support calls and get fired for having 40 servers fail in the next month. ;)

The servers are all BL460c G1s with E200i controllers.

fms-jerry

Re: Multiple E200i Cache Module failures in BL460c G1 servers

Also, in case it helps: the cache modules are part number 413486-001, the 128MB modules with the attached battery pack.