03-15-2012 10:50 AM
Temperature of I/O module in slot 3 is at alert level.
Fredy's customer was having some issues:
I desperately need some help or ideas about the above problem. I have been fighting this issue at a remote site for months now.
There are 3 different c3000 enclosures in which ALL the Cisco MDS9124e’s in bays 3-4 are constantly logging these types of errors:
Mar 13 07:18:55 OA: Temperature of Interconnect in slot 4 is at alert level
Mar 13 07:18:58 OA: Temperature of Interconnect in slot 3 is at alert level
Mar 13 07:19:03 OA: Temperature of Interconnect in slot 3 is normal
Mar 13 07:19:05 OA: Temperature of Interconnect in slot 4 is normal
The only problem in the enclosures is with the 9124e’s (all 6 of them).
The fan speed goes through the roof every time it happens, and if 2 of the enclosures happen to do it at the same time the noise is unbearable.
Every unused slot is plugged, and so is every unused port on all modules. There is also adequate cooling right in front of the rack.
I have followed the “solution” for this issue in SAW: http://saw.cce.hp.com/km/saw/view.do?docId=emr_na-c02435334&hsid=34497407
I took the OA to 3.50 and reseated all trays but have not replaced any OA trays (I can’t imagine that I have 3 bad trays??). The OA 3.50 update takes the OA tray firmware to 2.10. I also tried the 4-fan configuration since there are only 4 blades (per Monty’s suggestion many months back), but it does not make a difference. The MDS9124e’s are at firmware 5.0(4), and I am not sure whether that is an issue (I could not find any information). I am out of ideas on what else to look at!
Any and all suggestions or ideas welcome!
HP c-Class BladeSystem offers several interconnect cooling methods. Cisco selected the simplest method, in which the switch processor measures its own temperature and sets one of three bits to indicate a Warm, Alert or Critical temperature on the module.
The OA responds to this method of interconnect cooling with a fixed increase in system fan speed as each threshold is asserted. When the asserted threshold is removed, the OA brings the fans back down to the levels needed to cool the servers.
The Warm threshold does not generate an entry in the OA syslog, but the Alert and Critical thresholds do. The Cisco 9124 switch does not use the Warm threshold; it only supports Alert and Critical. So the first thing the OA sees from the switch is an Alert, which it logs, and that explains the syslog messages about Alert temperatures on the interconnect.
The Alert also drives the fans high, which explains the fan behavior: the fans run fast until they have cooled the switch a minute or so later, the Alert clears, the fans drop back, and the cycle repeats.
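To make the back-and-forth easier to picture, here is a rough Python sketch of the loop described above. Every number in it (the 60 C alert threshold, the fan percentages, the heating and cooling rates) is made up for illustration, and the function names are hypothetical; it is not the OA's or the switch's actual logic, just a toy model of "a fixed fan boost while the Alert bit is set".

```python
# Toy model of the Alert-bit / fan-boost cycle. All constants are
# assumptions for illustration, not values from the OA or the MDS9124e.
ALERT_THRESHOLD_C = 60.0       # assumed temperature at which the Alert bit is set
BASE_FAN_PCT = 40              # assumed fan level needed just for the servers
ALERT_BOOST_PCT = 40           # assumed fixed increase while Alert is asserted
HEAT_PER_STEP_C = 1.0          # assumed heating of the sensor per time step
COOLING_PER_FAN_PCT_C = 0.02   # assumed cooling per step per fan percent


def oa_fan_percent(alert_asserted: bool) -> int:
    """Fan level: server baseline plus a fixed boost while Alert is asserted."""
    return BASE_FAN_PCT + (ALERT_BOOST_PCT if alert_asserted else 0)


def simulate(start_temp_c: float, steps: int) -> list[tuple[int, str]]:
    """Return (step, state) pairs for every change of the Alert bit."""
    temp = start_temp_c
    alert = False
    events = []
    for step in range(steps):
        now_alert = temp >= ALERT_THRESHOLD_C
        if now_alert != alert:
            alert = now_alert
            events.append((step, "alert" if alert else "normal"))
        fan = oa_fan_percent(alert)
        # Sensor warms at a fixed rate and is cooled in proportion to fan speed.
        temp += HEAT_PER_STEP_C - COOLING_PER_FAN_PCT_C * fan
    return events


if __name__ == "__main__":
    for step, state in simulate(start_temp_c=59.5, steps=20):
        print(f"step {step:2d}: interconnect temperature is {state}")
```

With a sensor sitting just under the threshold, the baseline fans are not quite enough, the Alert trips, the boost overcools the switch, and the Alert clears again. Raising the baseline is one way to break that cycle in this model.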
Let’s try increasing the fan speeds a bit by configuring the BL460c G7 servers in each enclosure for increased cooling.
Reboot the servers to RBSU and select Advanced Options > Thermal Configuration > Increased Cooling, then reboot them.
This will increase all the server fan requests, which will in turn increase the cooling for the switches. According to the Cisco show tech output, one of the four temperature sensors is right at the alert threshold and the other three are considerably cooler.
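To judge whether the Increased Cooling setting helps, it may be useful to count Alert entries per interconnect slot in the OA syslog before and after the change. The sketch below is a small Python script that does this, assuming the syslog lines look exactly like the ones quoted earlier in this thread; the function name and regular expression are my own and would need adjusting if the OA formats the lines differently.

```python
import re
from collections import Counter

# Matches lines in the format quoted in this thread, for example:
#   Mar 13 07:18:55 OA: Temperature of Interconnect in slot 4 is at alert level
# The exact format is an assumption based on those samples.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2} OA: Temperature of Interconnect "
    r"in slot (?P<slot>\d+) is (?P<state>at alert level|normal)\s*$"
)


def count_alerts(lines):
    """Count 'at alert level' entries per interconnect slot."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("state") == "at alert level":
            counts[int(match.group("slot"))] += 1
    return counts


sample = [
    "Mar 13 07:18:55 OA: Temperature of Interconnect in slot 4 is at alert level",
    "Mar 13 07:18:58 OA: Temperature of Interconnect in slot 3 is at alert level",
    "Mar 13 07:19:03 OA: Temperature of Interconnect in slot 3 is normal",
    "Mar 13 07:19:05 OA: Temperature of Interconnect in slot 4 is normal",
]

if __name__ == "__main__":
    for slot, count in sorted(count_alerts(sample).items()):
        print(f"slot {slot}: {count} alert(s)")
```

Running it over a day of syslog from before the RBSU change and a day from after gives a simple per-slot comparison.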
Any other suggestions or comments?