
ProLiant 6400R environmental abnormality shutdown problem

 
Candace_5
New Member

We have just applied SoftPaq 6.31a to 14 6400 servers. On 3 of those servers, an "environmental abnormality" error occurs after reboot when Windows NT starts, and the server automatically shuts down in 60 seconds. The shutdown cannot be stopped with "shutdown -a". There were no problems before the new SoftPaq was installed. The hardware works if an older version of the SoftPaq is used, containing version 5.3 or 5.5 of the Management Agents. Although there does not seem to be a problem with the fans, there were errors detecting the fans during a couple of the restarts. These fan detection errors also do not occur with the older support pack configurations.
Mark_601
Frequent Advisor

Re: ProLiant 6400R environmental abnormality shutdown problem

The info below is from a 6400r maintenance document.

Thermal Sensors
There are five Sensor Zones corresponding to three general areas in the ProLiant 6400R. The thermal sensors are located on the System Board in the processor areas and on the Power Backplane Board, where they monitor both power supply bays. Caution and Critical temperature thresholds are set, and crossing them causes specific system events to occur. The following considerations should be taken into account when assessing possible causes of thermal events:
Hot plug fans in the server and in the power supply are not monitored for marginal operation. Fan blades may be visibly spinning but not producing sufficient RPM to provide adequate cooling.
Primary and Secondary redundant hot plug fans are in constant operation.

The overheating of components on major assemblies is one of the most likely causes of a thermal event, but not the only one.

The failure of the thermal sensors themselves.

From the System Configuration Utility, it is possible to toggle the Thermal Shutdown parameter in NVRAM, which corresponds to EAAS (Environmental Abnormality Automatic Shutdown). It is enabled by default, which allows for graceful system shutdowns when certain temperature thresholds are reached. If disabled, only warnings/cautions are reported.

Downlevel System ROM firmware and O/S Health Drivers. (Unlikely, but thermal events are managed by these software components, so they could also contribute to problems.)

The ASM-3 (Advanced Server Management) chip on the Standard Peripheral Board, SPB, receives feedback from the Thermal Sensors on the I2C bus. ASM-3 manages activities between the System ROM, O/S Health Drivers, redundant hot plug power supplies and fans, and the thermal sensors. If the ASM-3 fails or if there are other hardware problems on the I2C bus, then the SPB may need to be replaced.

Because different temperature thresholds are monitored during POST and during normal system operation, and because both programmable and non-programmable thermal sensors are used in the server, the intervals between the reporting of thermal warnings and immediate power shutdowns may not always appear consistent.
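
For anyone trying to picture how the Caution/Critical thresholds and the EAAS setting interact, here is a rough Python sketch of the decision described above. It is only an illustration, not the actual ASM-3 or Health Driver logic: the threshold values, the function name `thermal_action`, and the `Action` names are all made up for the example, since the maintenance document excerpt does not give real numbers.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    WARN = "warning/caution reported"
    SHUTDOWN = "graceful shutdown"


# Hypothetical thresholds in degrees Celsius (assumed values; the real
# Caution and Critical thresholds are not stated in the thread).
CAUTION_C = 60.0
CRITICAL_C = 70.0


def thermal_action(temp_c, eaas_enabled=True,
                   caution_c=CAUTION_C, critical_c=CRITICAL_C):
    """Return the action for one sensor reading.

    With EAAS enabled (the default), a reading at or above the critical
    threshold leads to a graceful shutdown. With EAAS disabled, the same
    reading is only reported as a warning.
    """
    if temp_c >= critical_c:
        return Action.SHUTDOWN if eaas_enabled else Action.WARN
    if temp_c >= caution_c:
        return Action.WARN
    return Action.NONE


if __name__ == "__main__":
    for reading in (40.0, 65.0, 75.0):
        print(reading, thermal_action(reading).value,
              "| EAAS off:", thermal_action(reading, eaas_enabled=False).value)
```

Note that in this model a faulty sensor that reports a value above the critical threshold would trigger the same shutdown as real overheating, which is one reason the sensors themselves are on the list of possible causes.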

Hope this helps.