ProLiant Servers (ML,DL,SL)

ML370G3 Hanging Problem

 
Rob Pursley
New Member

Re: ML370G3 Hanging Problem

Yep, got all the latest updates including the bios and drivers for all the bits.
Simon Black
Occasional Advisor

Re: ML370G3 Hanging Problem

I have updated the BIOS.
Server has now been running for 24 hours.
I need to wait until it next hangs.
Simon Black
Occasional Advisor

Re: ML370G3 Hanging Problem

Our server has been up for 7 days without 'hanging'.

However, in response to a support request, HP have suggested clearing the CMOS on the server:
"Let's clear the CMOS on the server.
Clearing CMOS
Power down the server to clear NVRAM.
With the server powered down, remove the access cover. See the cover for the location of the system maintenance switch. Move switch #6 on the system maintenance switch from its default position (off) to on.
Then apply power to the server.
When you see video, turn the server off.
Return switch #6 to its default position and apply power. Press F9 at POST, then save to reconfigure with the correct controller order, and exit."

We have done this today and rebooted.
If this has any impact, I'll update this thread.
Rob Pursley
New Member

Re: ML370G3 Hanging Problem

Good news, at least for my machine. Last week I called HP support and they said it was the backplane for the power supply. According to the tech, the old ones didn't have enough juice to handle the redundant fans that were installed. I dropped in the new one, fired up the server, and it hasn't had a single issue for almost a week now.

Good call HP; thanks!
Simon Black
Occasional Advisor

Re: ML370G3 Hanging Problem

Hi
One of our ML370s hung last Monday having been up for 32 days.
We updated both servers with RedHat up2date, latest version of PSP and BIOS last Thursday.
They then both hung during the weekend.
We have now been advised to disable Hyperthreading on the processors, so this has been done and the machines rebooted.

Edmund White
Frequent Advisor

Re: ML370G3 Hanging Problem

Which RAID controllers are you running? The Smart Array 641 had a similar problem for a while until a firmware upgrade came out.
Simon Black
Occasional Advisor

Re: ML370G3 Hanging Problem

Hi Edmund
Yes, our ML370s are running on Smart Array 641 RAID controllers.
How do we tell which firmware version is running?
What version should we upgrade to, and where can it be downloaded?
Incidentally, our two ML530s, installed at the same time with the same versions of Red Hat Linux, PSP, etc., have Smart Array 6402 RAID controllers, and these two servers have never exhibited the 'hanging' problem.
Edmund White
Frequent Advisor

Re: ML370G3 Hanging Problem

Right. The problem was related to the SA640, not the 6400 series. What version of Linux are you running? My reply to a similar issue can be found at:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=645078

->snip<-
I bet it's the firmware on your Smart Array 641 controller....

I recently experienced a problem with the SA641 controller on ML350 and ML370 servers that caused the system load to rise very rapidly (> 40), halting most network services. It appeared as though the controller would shut down, and processes that depended on disk I/O would go into state D (uninterruptible sleep), forcing the load up by one unit per process. Programs already loaded into memory (the kernel, top, etc.) were unaffected. This always occurred after 3-7 days of uptime (usually once physical memory was fully cached and swapping occurred).
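The symptom described above (load climbing by one per process stuck in state D) can be checked directly from /proc. A minimal sketch, assuming a Linux /proc where the state letter is the field immediately after the parenthesised command name in /proc/<pid>/stat; the function names are hypothetical, not part of any HP tool:

```python
import os


def proc_state(stat_text):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the state relative to the LAST ')'.
    return stat_text[stat_text.rfind(")") + 2]


def d_state_pids(proc_root="/proc"):
    """List PIDs currently in state D (uninterruptible sleep)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if proc_state(f.read()) == "D":
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open().
            continue
    return pids


if __name__ == "__main__" and os.path.isdir("/proc"):
    stuck = d_state_pids()
    print(f"{len(stuck)} processes in state D: {stuck}")
```

A handful of D-state processes that come and go is normal; a count that keeps growing alongside the load average is the pattern reported here.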

This problem was fixed by replacing the 641 with a 6400 or 5300 series controller... OR downgrading the firmware (to the last revision from 2003) on the 641. The new firmware on the 641 was just released last week, and seems to have corrected the issue.

I spoke with several HP techs, as I have about 100 systems around the country to support. They told me to simply stop selling that RAID controller until they released new firmware. Messy. All of my systems are Red Hat 8.0, run the 6.40 hpasm and cmastor drivers, and use 5300, 6400 or 64x series RAID controllers with custom vanilla 2.4.21 or 2.4.26 kernels. I experienced this in a repeatable fashion on a new ML350, and a coworker had the same issues with RHEL 3.0 on a 641-equipped ML370 with the 7.0 agents.

The bad firmware is the March 2004 Smart Array 1.92A. The good ones seem to be 2.26B or 1.30.
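The thread doesn't say how to read the running firmware version from Linux. One possibility, assuming the controller is driven by the cciss driver and that it exposes a text file per controller under /proc/driver/cciss/ containing a line like "Firmware Version: 1.92" (the exact layout may differ between driver versions), is sketched below. The 1.92 "known bad" entry comes from this thread; the helper names are hypothetical:

```python
import glob
import re
import string

# Assumed line format in /proc/driver/cciss/cciss0, cciss1, ...
FW_LINE = re.compile(r"^Firmware Version:\s*(\S+)", re.MULTILINE)

# Version reported as bad in this thread (March 2004, "1.92A").
KNOWN_BAD = {"1.92"}


def parse_firmware(proc_text):
    """Return the firmware string from a cciss /proc entry, or None."""
    match = FW_LINE.search(proc_text)
    return match.group(1) if match else None


def is_suspect(version):
    """True if the version matches the firmware reported as bad here."""
    if version is None:
        return False
    # Drop a trailing revision letter such as the "A" in "1.92A".
    return version.rstrip(string.ascii_uppercase) in KNOWN_BAD


def report():
    """Print the firmware version of every cciss controller found."""
    for path in sorted(glob.glob("/proc/driver/cciss/cciss*")):
        with open(path) as f:
            version = parse_firmware(f.read())
        flag = "  <-- reported as bad in this thread" if is_suspect(version) else ""
        print(f"{path}: firmware {version}{flag}")


if __name__ == "__main__":
    report()
```

If nothing prints, the controller is probably not under the cciss driver (or the /proc layout differs), and the version will need to be read another way, e.g. from the ROM banner at POST.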

http://h18000.www1.hp.com/support/files/server/us/download/21214.html

So in this case, try downloading the new firmware for the 640 and see if it stops the crashes. To test, you may want to leave a console open running top on the server and watch the load rise. Most services will stop responding once the load hits 40+.
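Instead of watching top by eye, the load can be logged with timestamps so the time of a hang is recorded. A minimal sketch, assuming Linux /proc/loadavg (whose first three fields are the 1-, 5- and 15-minute averages); the 40 threshold is taken from the post above, and the function names are hypothetical:

```python
import time

# Threshold from the thread: most services stopped responding at 40+.
LOAD_ALERT = 40.0


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def watch(interval=60, path="/proc/loadavg"):
    """Print the 1-minute load every `interval` seconds and flag spikes."""
    while True:
        with open(path) as f:
            one, _, _ = parse_loadavg(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        warn = "  <-- load above alert threshold" if one >= LOAD_ALERT else ""
        # flush so the line reaches a log file even if the box hangs soon after
        print(f"{stamp} load {one:.2f}{warn}", flush=True)
        time.sleep(interval)
```

Run `watch()` from a console (or redirect it to a file on a disk other than the array under suspicion, since writes to a stalled array may block).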
Simon Black
Occasional Advisor

Re: ML370G3 Hanging Problem

Hi Edmund
This is looking promising.
The Smart Array is currently running firmware 1.92.
I'll install 2.26B and see if it sorts it out.
Thanks for your help.
Simon