
Intermittent Server Hangs - DL360

I have many DL360 servers. Some run Linux, some run W2K. Recently a DL360 running Linux started giving me kernel panics. I suspected a drive problem but wasn't sure. Rather than spend much time on it, I rebuilt the Linux server on a different DL360.

Now I am trying to determine what is wrong with the bad DL360, which fails at random. Sometimes it runs for days, sometimes for only minutes, and other times it will not boot at all.

Most of the time when it fails I get a W2K blue screen with stop error 0x0000001E (0xC0000006, ...), although on rare occasions the stop error has been different.

I've followed some threads regarding the memory.dmp file, but the disk drive usually shows an amber drive array light, so no memory.dmp is written.

So far, I have:
1) Installed NEW RAM.
2) Installed a NEW Integrated Smart Array Controller.
3) Tried other hard drives; the original drive works fine in another of my DL360s.
4) Swapped the drive to the 2nd bay.
5) Upgraded the P21 System Bios to the 11/15/2002 release.
6) Upgraded the Smart Array Controller firmware from 1.42 to 1.50.
7) Replaced the CD/Floppy module.
8) Removed the 866 MHz CPUs one at a time.

None of these made any difference. Temperature is not an issue.

I've run Compaq Diagnostics. All tests pass with flying colors. If I run it continuously the problem will eventually happen again, but the server generally reboots without providing any kind of error information.

Sometimes right after W2K startup I will get "Unknown Hard Error" dialog boxes.

I've tried 3 different drives in this box. All do the same thing, so I'm just not convinced it is a drive problem.

What can I do to test/troubleshoot this further?

Bob Kramer
22 REPLIES
Steven Clementi
Honored Contributor

Re: Intermittent Server Hangs - DL360

The system board?

The VRMs?


Have you checked the Integrated Management Log (IML) for errors?



Steven
Steven Clementi
HP Master ASE, Storage, Servers, and Clustering
MCSE (NT 4.0, W2K, W2K3)
VCP (ESX2, Vi3, vSphere4, vSphere5, vSphere 6.x)
RHCE
NPP3 (Nutanix Platform Professional)

Re: Intermittent Server Hangs - DL360

Blue Screen Trap (BugCheck, STOP: 0x0000001E (0xC0000006, 0x5FFC0EBC, 0x00000000, 0x5FFC0EBC)) - Operating System

ASR Detected by System ROM

POST Error: 1779-Drive Array Controller Detects Replacement Drives


There are other variations of these STOP errors, maybe 3 or 4. Unfortunately I've been clearing the IML frequently after replacing a piece of hardware. Most of the STOP errors are 0x0000001E/0xC0000006. Some of them say Drive Array Device Failure.

I know the drive/controller is failing... but the problem does not stay with the drive or the controller. Replacing them both with NEW parts doesn't stop the problem... :S
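
For anyone comparing IML entries across several boxes, the STOP line quoted above can be decoded mechanically. What follows is only a rough sketch: the function name `decode_stop` and the small lookup tables are made up for illustration and cover just a handful of codes. The two names that matter here are standard Windows definitions: bug check 0x0000001E is KMODE_EXCEPTION_NOT_HANDLED, and its first parameter is the exception code, where 0xC0000006 is STATUS_IN_PAGE_ERROR (a page could not be read in from disk).

```python
import re

# Partial lookup tables (illustrative, not exhaustive).
BUGCHECKS = {
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000077: "KERNEL_STACK_INPAGE_ERROR",
    0x0000007A: "KERNEL_DATA_INPAGE_ERROR",
}
NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000006: "STATUS_IN_PAGE_ERROR",
}

# Matches "STOP: 0x........ (p1, p2, p3, p4)" anywhere in a log line.
STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]{8})\s*\(([^)]*)\)")


def decode_stop(line):
    """Return a dict describing the STOP code in `line`, or None if absent."""
    match = STOP_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    result = {"bugcheck": BUGCHECKS.get(code, "unknown"), "params": params}
    # For bug check 0x1E the first parameter is the unhandled exception code.
    if code == 0x1E and params:
        result["exception"] = NTSTATUS.get(params[0], "unknown")
    return result


iml_line = ("Blue Screen Trap (BugCheck, STOP: 0x0000001E "
            "(0xC0000006, 0x5FFC0EBC, 0x00000000, 0x5FFC0EBC)) - Operating System")
print(decode_stop(iml_line))
```

An in-page error only says that a read from the paging path failed. It fits the drive array failures in the log, but it does not say which component in the storage path is at fault.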

What is a VRM? Also, when you say System Board do you mean the main motherboard?

Thanks.

Bob Kramer
Steven Clementi
Honored Contributor

Re: Intermittent Server Hangs - DL360

VRM = Voltage Regulator Module. Pretty sure the 360 has them, usually right next to the CPUs. Is this a 360 G2 or a G3?

When I say system board, yes, I mean the motherboard. However unlikely it might be, it is one of the things you have not swapped out. It would/should be the "last resort" of course, but it is still a possible suspect.


Steven

Re: Intermittent Server Hangs - DL360

This is just a DL360. Dual 866 MHz CPUs, 512 MB RAM, 18.2 GB Ultra Wide SCSI.
Jonathan Bonney
New Member

Re: Intermittent Server Hangs - DL360

Bob, I am very interested in this. I have a DL360 of the same series, and I have started to see the same problems. It took me a week to install Windows Server 2003 on this machine. After replacing the parts below I was able to finish the install, but I still have problems.

1. Upgraded the BIOS to the 11/15/2002 release.
2. Upgraded the firmware on the Array Controller to 1.50
3. Swapped the drives to other servers.
4. Installed a new SCSI Backplane.
5. Installed a new Power Supply.
6. Installed a new SPS-Filter.
7. Installed a new 16MB RAID card.
8. Installed a new System Board.
9. Replaced one SCSI Drive.

Now I see random shutdowns, sometimes hours apart and sometimes days apart. Since 12/7 I have seen 8 Drive Array Device Failure errors.
I have also run the HP/Compaq diagnostic tools multiple times and seen no errors. I used a burn-in program to throw continuous load at the processors and it ran literally all day long with no adverse results, but when I tried to test read/write to the drives, the server locked up.
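
A disk read/write test of this kind can be approximated with a short script when no burn-in tool is at hand. This is only a minimal sketch under stated assumptions: `rw_verify_pass` is a made-up name, the block count and block size are arbitrary, and the read-back may be served from the operating system's cache rather than the physical disk, so a clean pass proves little, while a hang or a mismatch is still informative.

```python
import hashlib
import os
import tempfile


def rw_verify_pass(directory, blocks=64, block_size=1024 * 1024):
    """Write random blocks to a scratch file in `directory`, read them back,
    and return the number of blocks whose contents did not match."""
    digests = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as scratch:
            for _ in range(blocks):
                data = os.urandom(block_size)
                digests.append(hashlib.sha256(data).digest())
                scratch.write(data)
            scratch.flush()
            os.fsync(scratch.fileno())  # ask the OS to push data to the device
        mismatches = 0
        with open(path, "rb") as scratch:
            for expected in digests:
                if hashlib.sha256(scratch.read(block_size)).digest() != expected:
                    mismatches += 1
        return mismatches
    finally:
        os.remove(path)  # always clean up the scratch file


if __name__ == "__main__":
    # Small demonstration run in a temporary directory.
    with tempfile.TemporaryDirectory() as workdir:
        print("mismatched blocks:", rw_verify_pass(workdir, blocks=4, block_size=4096))
```

Running it in a loop against a directory on the suspect array, and printing the pass number with a timestamp each time, leaves a record of the last completed pass if the server locks up.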

This is a non-production server, but I'm told that I need to have it ready by the end of the year. Any info that you get on this would be appreciated.
Fred Armantrout
New Member

Re: Intermittent Server Hangs - DL360

I have had similar problems with two DL360s. Both were running Windows 2000/2003 or SuSE Linux. The system gives no warning other than the internal drives starting to fail: first one, then both. After a reboot it will usually boot back up and possibly start rebuilding one drive from the other. The more I tinker with it, the faster it fails. Once both drives failed almost at once and the console screen blanked, but the licensing service (FlexLM) was still responding, slowly.

Both systems are single-CPU 866 MHz machines with 512 MB RAM and 9 or 18 GB drives.

I have done about the same as you. Swapped out the power supply, which I can do in about 2 minutes. Replaced one of the failed drives. Replaced the integrated controller card. I even PULLED the integrated card completely, so that the system just sees the attached disks as drives on a SCSI channel, and it still acted up under Linux. I did not try it under Windows. All the BIOS and firmware levels are up to date.

I don't have a spare CPU regulator to swap out. Time to check for a spare part online. I have two systems about to head for the scrap pile and half a rack of them that I am now leery about. All were installed around December 2000.

Re: Intermittent Server Hangs - DL360

I finally broke down and bought 3 new motherboards.

The old memory, integrated controller, hard drives, CPUs, etc. were all used with the new motherboards.

These boxes have not failed since the new MBs were installed.

While it 'fixed' the problems I was having I am not convinced that the old motherboards are defective.

I just don't know what the problem was/is, but it really bothers me because I'm sure these 3 problematic motherboards are basically OK. I suspect that they have some sort of conflict with drivers, firmware, etc., but I just can't put my finger on it.

Also, on the newly installed motherboards I upgraded the firmware and drivers in the same way as I did on the ones that began failing.

The problematic motherboards always reported "Unknown Hardware Error" or some other problem related to disk controller or hard drive failure. Yet I am using the same hard drives and disk controllers with the new motherboards, and there has been no problem for weeks now.

*sigh*

I wish I had an answer for you...

Bob Kramer
CIS Internet Services
Jonathan Bonney
New Member

Re: Intermittent Server Hangs - DL360

I was lucky enough to have another server of the same model come available. I ended up swapping the drives out. The old drives are working great in the "new" server.
I still have the problem with the previous one though...and I'm supposed to have 15 more on the way. Kind of makes me nervous. I am still interested in a fix; I have to get the first one back into production.
Tom Parker_1
Advisor

Re: Intermittent Server Hangs - DL360

Bob,

I have had 4 servers with the same situation in the last two months. Did you ever get to the bottom of this issue? I have not yet gotten to the system board/motherboard, but that is what HP is recommending because I have tried everything else. I am nervous as hell because I have 20 of these things running in production and now I feel like I am just sitting here waiting for them all to fail.

I have one theory however. What about the battery? All these systems are about the same age. Besides disk drives, what typically wears out? Batteries.

HP seems in the dark about this situation, but I think that is because the systems are so old and out of warranty that they don't care. Anyone got any ideas on how to get HP to get serious about looking at this situation?