
Re: Blue Hardware Malfunction Screen on LC2000r server

 
SOLVED
Glen Kelly
Occasional Advisor

Blue Hardware Malfunction Screen on LC2000r server

We bought a used server and initially everything looked fine, but it intermittently shows a blue screen saying the system has halted and to contact your vendor for support. It was using NIC, video, Fibre Channel, and SCSI adapter cards. I took all of those out and went with the embedded NIC, video, and SCSI controller. I also reseated all of the RAM and VRAM and both processors. The RAM had all been replaced. It still gets the error. It boots fine and there are scarcely any errors in the event log. It doesn't create a minidump or give any other clue as to what has happened. The hard drives have been ruled out because we replaced all of those too. It's down to the board, processors, or SCSI components. For the embedded SCSI controller, I have checked the manual and I have the proper cabling and termination. I haven't installed TopTools or the RAID utilities yet. Does anyone have any advice on how I can figure out what is causing this?
12 REPLIES
e4services
Honored Contributor

Re: Blue Hardware Malfunction Screen on LC2000r server

Is this an NT 'Blue Screen' we speak of?
Because if so, removing the hardware will not remove the driver.
I would reload the OS if it is NT.
Hot Swap Hard Drives
Glen Kelly
Occasional Advisor

Re: Blue Hardware Malfunction Screen on LC2000r server

Thanks for your reply! No, the blue screen is one I haven't seen before. We're on Windows Server 2003. I don't know if those get NT-type blue screens? It boots fine and doesn't get the error in any of the safe modes, but it does in VGA mode. So yeah, I'm thinking it's definitely a driver issue, but how do I find the specific driver? If it made a minidump I'd be able to use the debugger, as I understand it, but there's no trace.
Wouldn't Win2003 server reload the same old bad drivers again if I reloaded the OS?
Sean T. Craig
Honored Contributor
Solution

Re: Blue Hardware Malfunction Screen on LC2000r server

Hi Glen,

The first thing you want to do when you get a hardware malfunction is to check the Hardware Event Log. The easiest way to do that is to create a boot floppy from the following download:
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=50440&prodNameId=19399&swEnvOID=54&swLang=13&mode=2&taskId=135&swItem=ns-11635-1

Boot the server to the floppy and see if it shows any hardware events.

Try that and let us know what you find,

Sean T. Craig Sr. C.E.T., M.C.P.

I am King...of my apartment.
Glen Kelly
Occasional Advisor

Re: Blue Hardware Malfunction Screen on LC2000r server

Sean,

Thanks for that advice. I see the following error multiple times. However, the date doesn't seem to jibe. The system date shows correctly while in Windows. It should say today's date, as I got the error just now. Maybe my CMOS battery is dead? I do have it updating the time from another server...

I have 4 PC133 256MB SDRAM chips filling the four slots that are recommended for the server. I have already tried removing the RAM from the slot labeled "1", but still got the error. I don't have any RAM options in BIOS, such as error scrubbing. Apparently they are not ECC. I read that non-ECC chips can ignore multiple-bit errors, but in this case that doesn't seem to be happening.

511 01201 08/05/06 08:09:18 Multiple-bit error in memory slot 1 on bank/board 1
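For anyone comparing several of these entries, a line like the one above can be split into fields with a short script. This is only a sketch: the layout is inferred from this single line, the field names `seq` and `code` are guesses at what the first two numbers mean, and the date is assumed to be MM/DD/YY. None of that comes from HP documentation.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one line quoted in this thread:
#   <number> <number> <MM/DD/YY> <HH:MM:SS> <message>
EVENT_LINE = re.compile(
    r"^(\d+)\s+(\d+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(.*)$"
)


def parse_event(line):
    """Split one event log line into fields, or return None if it does not match."""
    match = EVENT_LINE.match(line.strip())
    if match is None:
        return None
    seq, code, date, time, message = match.groups()
    # Assumption: the date is month/day/year, as is usual on US-configured systems.
    stamp = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
    return {"seq": int(seq), "code": code, "timestamp": stamp, "message": message}


def is_multibit_memory_error(event):
    """True if the message text reports a multiple-bit memory error."""
    return "multiple-bit error" in event["message"].lower()


event = parse_event(
    "511 01201 08/05/06 08:09:18 Multiple-bit error in memory slot 1 on bank/board 1"
)
print(event["timestamp"], is_multibit_memory_error(event))
```

Comparing `event["timestamp"]` with the current date is a quick way to see how far off the logged clock is.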

Is there software or a patch I can install to just scrub these errors?

Thanks Glen
Sean T. Craig
Honored Contributor

Re: Blue Hardware Malfunction Screen on LC2000r server

Hi Glen,

Check the part numbers on the RAM. The 256MB sticks for this server should have a factory part number of D8266A. If you use non-ECC RAM, the error-checking circuitry on the system board will attempt to correct any memory errors but fail because the RAM doesn't support it. Unfortunately, there is no option of disabling the ECC functionality of the server. Once the RAM issue is corrected, you can clear the hardware event log using the same utility or, if you get back into Windows, you can use Instant TopTools to do that.
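To illustrate why a multiple-bit error is fatal while a single-bit error is not, here is a toy version of the kind of code ECC memory relies on. It is a sketch under assumptions: it uses a Hamming(7,4) code plus one overall parity bit (single-error-correct, double-error-detect) on 4 data bits, whereas real ECC modules protect much wider words, and nothing here is taken from the LC2000's actual chipset. The function names are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit word: Hamming(7,4) plus an overall parity bit."""
    data = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8  # index 0 = overall parity, indexes 1..7 = Hamming positions
    word[3], word[5], word[6], word[7] = data
    word[1] = word[3] ^ word[5] ^ word[7]  # covers positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]  # covers positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]  # covers positions 4, 5, 6, 7
    for bit in word[1:]:
        word[0] ^= bit  # makes the parity of the whole 8-bit word even
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position  # XOR of set-bit positions is 0 for a valid word
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single-bit error and repair it.
        word[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        status = "corrected"
    else:
        # Non-zero syndrome but even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


good = encode(0b1011)
one_flip = list(good)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[6] ^= 1
print(decode(good), decode(one_flip), decode(two_flips))
```

The two-flip case is the toy equivalent of the "Multiple-bit error" entry: the hardware can tell the data is wrong but cannot say which bits to fix, so halting is the safe response.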

Let us know what you find out,

Sean T. Craig Sr. C.E.T., M.C.P.

P.S. Please encourage participation in the forums by assigning points rating the value of the responses.
I am King...of my apartment.
Glen Kelly
Occasional Advisor

Re: Blue Hardware Malfunction Screen on LC2000r server

Sean,

Ok, right on, I understand now how to assign points and I'll take care of that. We had one Micron ECC registered, synchronous, 128MB module lying around. I put it in, and the server has been fine since, albeit slow as molasses. I had bought the other RAM with the part number you mentioned, but they sent third-party clones. Apparently they were ECC and registered, but they still did not work. So the company is switching them out for us at little or no cost.

Now I've got another problem: in all the hardware testing to resolve the blue screens, I lost my initial RAID 1 configuration on my RAID adapter card, and am now using the embedded SCSI A controller. I have six drives, and I once had them configured as three logical drives of two physical drives each. Now I've got mismatched drives, and the mirroring is out of sync. Two of the drives are not even recognized as online by the OS.
For example, on drive 0 (first on the left when facing the rack-mount server), I have the OS. I'd like to get it mirrored on drive 1, next to it, and similarly for the application logical drive and the admin logical drive. The last time I tried to do a "rebuild" in NetRAID of the out-of-sync drive with the OS drive, it wouldn't boot and I had to do an ASR. So I'm gun-shy. Any advice? If I install the RAID adapter and try to do an ASR that was built on the SCSI A configuration, to the logical drive, it won't continue. To be sure, RAID 1 is not possible using the embedded RAID, correct?

Another question: is NIC teaming possible on this server to speed up throughput?
Sean T. Craig
Honored Contributor

Re: Blue Hardware Malfunction Screen on LC2000r server

Poor Glen,

You've got all kinds of problems, don't you? Here's what you need to do:

1. You need to determine which drive has what information on it.

2. You then need to re-create your RAID-1 arrays.

So, to start, use the NetRAID Express Tools to create a single RAID-1 with 2 drives (any 2 will do), but fail one of the drives and try to boot the O/S. It is critically important that you do NOT initialize the array as that will wipe everything. If it boots to the O/S, you've found one of the O/S drives.

Then go back into NetRAID Express Tools and fail the online drive and force the failed one online. Restart the server and see if that is the other drive in the O/S Mirror.

Once you have found out which drives have the O/S, leave one of them online and rebuild the other so that both halves of the mirrors are identical.

Next, clear your config and recreate your O/S RAID-1 AND one other RAID-1. Repeat the process of failing one of the drives in the 2nd RAID-1 and booting to the O/S. This way you can determine what is on that drive.

Carry on in this manner until you have figured out which drives are part of which mirror and re-create your config.

You will want to make sure that you DO NOT try to boot up with both halves of any mirror online until one has been rebuilt to match the other.
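Since the procedure above involves a lot of single-drive boot tests, it can help to write down what each test showed and let a few lines of code group the drives into mirrors. This is just a bookkeeping sketch: the slot numbers and the content labels ("os", "app", "admin") are hypothetical examples that you fill in by hand, and the script does not talk to the NetRAID controller in any way.

```python
from collections import defaultdict


def group_mirrors(observations):
    """Group drive slots by what was found when only that drive was online."""
    groups = defaultdict(list)
    for slot, content in sorted(observations.items()):
        groups[content].append(slot)
    return dict(groups)


def incomplete_mirrors(groups):
    """Return the groups that do not have exactly two drives (not a clean RAID-1 pair)."""
    return {content: slots for content, slots in groups.items() if len(slots) != 2}


# Hypothetical notes from six single-drive tests: slot number -> what it held.
observations = {0: "os", 1: "os", 2: "app", 3: "app", 4: "admin", 5: "admin"}

groups = group_mirrors(observations)
print(groups)
print(incomplete_mirrors(groups))
```

Any content label that ends up with one drive or with three tells you a test result was misread or a drive still needs checking before you re-create the config.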

Let me know if there's anything that seems unclear and I'll try to elaborate.

Good luck,

Sean T. Craig Sr. C.E.T., M.C.P.

P.S. There is no embedded NetRAID in an LC2000. The onboard controller is just a standard SCSI controller with no RAID capability.
I am King...of my apartment.
Sean T. Craig
Honored Contributor

Re: Blue Hardware Malfunction Screen on LC2000r server

Oops, missed a question...

Re: Adapter Teaming. HP recommends an Intel Pro 100 chipset NIC. That's what your embedded NIC is, which will certainly make teaming easier. Next you need the Intel ProSet Utility. You can download the latest copy from Intel at this link:

http://downloadfinder.intel.com/scripts-df-external/Detail_Desc.aspx?agr=Y&ProductID=62&DwnldID=4275&strOSs=92&OSFullName=Windows*%202000%20Server&lang=eng

Hope this helps,

Sean T. Craig Sr. C.E.T., M.C.P.

I am King...of my apartment.
Sean T. Craig
Honored Contributor

Re: Blue Hardware Malfunction Screen on LC2000r server

One other thing: if you get stuck, HP still offers free phone support for this model of Netserver and you can call them at 1-800-474-6836. When you get the automated attendant, say "Netserver" and then "LC2000". One of the friendly and knowledgeable staff will be happy to walk you through the process. (Do I sound like an advertisement?)

Take care,

Sean T. Craig Sr. C.E.T., M.C.P.

I am King...of my apartment.