DL585 PCI Bus errors

 
Vanvoorden Jo
Occasional Visitor

Hello,

I'm trying to install a DL585 (quad Opteron 880, 16 GB memory, 4x 146 GB 15k rpm SCSI disks) with RHAS4 U4 AMD64.
The Smart Array 5i is disabled and I have installed two HP single-port U320 SCSI HBAs, each in a 133 MHz PCI-X slot. I attached the SCSI cables to the separate channels on the SCSI backplane and split the backplane.
After installation (if it even succeeds), the drivers for those SCSI HBAs are loaded and the software RAID starts to rebuild. As soon as the drivers are loaded, the PCI bus LEDs on top of the server for the slots holding the SCSI cards turn amber. At this point the server still continues to work. A few moments later, the kernel panics with an NMI exception.
I've tried loading the kernel with the following options:
- pci=nommconf
- numa=off maxcpus=1 nosmp
- I also enabled and disabled node interleaving

None of the above options seems to help.
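When testing boot options like these, it is worth confirming that the running kernel really received them, since a typo in the boot loader entry goes unnoticed otherwise. Below is a minimal Python sketch, not an HP or Red Hat tool: the function names are made up for illustration, and it assumes a Linux system where the active command line can be read from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value or None}.

    If an option appears more than once, the last occurrence wins here.
    """
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else None
    return opts


def missing_options(cmdline, wanted):
    """Return the wanted options that are absent or set to a different value."""
    active = parse_cmdline(cmdline)
    missing = []
    for token in wanted:
        key, sep, value = token.partition("=")
        if key not in active or (sep and active[key] != value):
            missing.append(token)
    return missing


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

On the server itself, `missing_options(read_running_cmdline(), ["pci=nommconf", "numa=off", "maxcpus=1", "nosmp"])` would list any option from the attempts above that did not reach the kernel; an empty list means they were all active.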

I tried attaching the scsi-controllers to an external storage box, disabled the smartarray controller and installed the OS on the external drives. No changes there, same errors occur.

I found this document on the AMD Rev F Opteron processors, claiming I should enable the linux x86_64 option in bios, but that option is not available in the latest bios for my DL585.
The processor boards are of the same revision, as far as I can tell. The memory is hp-branded and is from the same reseller.

A little side note: I bought 2 of these servers and the other server isn't giving me any problems.
I swapped the SCSI cards with those from the other server, and even replaced them with new ones.
I tried placing them in the 100 MHz PCI-X slots; the same issue arises.

Is anyone experiencing, or has anyone experienced, the same issue? Is it possible that the mainboard or PCI bus is broken? Any help or tips would be greatly appreciated; I have been struggling with this server for 2 weeks already.

Jo
5 REPLIES
sandeep_raman
Honored Contributor

Re: DL585 PCI Bus errors

chongkan
Trusted Contributor

Re: DL585 PCI Bus errors

Hi, Jo

Here are the steps to find defective hardware when having NMI errors:

0- Check the memory configuration in RBSU and that your OS version supports it.

1- Get compatible memory from another compatible server and swap it in sets; test both servers and identify a failing DIMM or set.

2- If no failing DIMMs are identified, proceed in the same way with the CPUs.

3- If no failing CPUs are identified, replace the System Board.
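The swap procedure above boils down to checking whether the fault follows the parts or stays with the chassis. Here is a small Python sketch of that bookkeeping, offered only as an illustration: the function name and the part and server labels are hypothetical, and it assumes each test run is recorded as (part set, server, failed).

```python
from collections import defaultdict


def classify_swap_tests(observations):
    """Return (suspect_parts, suspect_servers) from swap-test results.

    A part set is suspect if it failed in every server it was tried in
    (at least two). A server is suspect if it failed with every part set
    tried in it (at least two).
    """
    by_part = defaultdict(list)
    by_server = defaultdict(list)
    for part, server, failed in observations:
        by_part[part].append((server, failed))
        by_server[server].append((part, failed))

    suspect_parts = sorted(
        part for part, runs in by_part.items()
        if len({server for server, _ in runs}) >= 2
        and all(failed for _, failed in runs)
    )
    suspect_servers = sorted(
        server for server, runs in by_server.items()
        if len({part for part, _ in runs}) >= 2
        and all(failed for _, failed in runs)
    )
    return suspect_parts, suspect_servers
```

If the fault stays with one server whichever DIMM or CPU set is installed, the result points at that server (step 3); if one set fails in both servers, the result points at that set.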

Microsoft Link:

http://support.microsoft.com/kb/101272/en-us

Hope it helps.

Regards
Vanvoorden Jo
Occasional Visitor

Re: DL585 PCI Bus errors

Hello again,

I had a system board replacement done yesterday, hoping that would solve the problem (apparently we have a 4-hour support contract).

The technicians told me it was due to a system board failure (because the PCI-X bus was giving errors).
The problem isn't solved, though :(
So I continue my search.
I'll try booting the server with only 2 CPUs and memory on one node, hoping it won't fail during setup.

I'll keep you informed.

Jo
Vanvoorden Jo
Occasional Visitor

Re: DL585 PCI Bus errors

Hello again,

I'm just posting this message to inform you all that the errors still exist with the new mainboard (I have already had a second replacement too).
The case has been escalated to a higher level and I'm waiting for a response from them.

While I was waiting, I installed one of my other systems (which has dual-channel RAID controllers instead of the single-channel ones), and this system does seem to work.

I then placed the dual-channel cards in my broken system and, miraculously, the other system worked perfectly.

All this is just to let you know that the single-channel cards don't really seem to work in the DL585 G1 series.

I hope HP will come up with a solution.
sandeep_raman
Honored Contributor

Re: DL585 PCI Bus errors

Thanks for sharing the information.
Eager to know the final result.

SRH