LH 6000 U3, memory and pci problems

 
Dmitry Melekhov_2
Frequent Advisor

LH 6000 U3, memory and pci problems

Hello!

We have two HP LH 6000 U3 servers, each with four 700 MHz 2 MB Xeons, 4 GB of RAM, and 12 hard drives on the built-in RAID controller.
Neither server is under high load, but the OS (SuSE Linux Enterprise Server 7)
hangs on one of them, and I also sometimes get
Oracle file corruption on that server.
SuSE support analyzed the crash logs and guesses that I have a RAM problem.
I tried to test the RAM with the latest memtest-x86.
But with the cache enabled, it reports that almost all RAM is
bad in test 2, when it
copies 0000000 over all RAM. Is this normal for
this hardware configuration?

I also tried to test the server with the HP-supplied tools (on the Navigator CD).
This tool says that I have a PCI problem.
I see the following in the server event log:
PCI system error: DataByte2=04 DataByte3=18
This appears during the test.
I removed all PCI cards (LAN and TopTools),
but the problem persists.
I get the same diagnostics on the other server.
Are these servers broken?
Greg Carlson
Honored Contributor

Re: LH 6000 U3, memory and pci problems

Dmitry,

Boot into the Navigator and check your Hardware Event Log. Do you have any memory errors in there? Either Single Bit or Multi Bit errors? Do you have any PCI Parity errors in the Event log also?

Next, try running your tests with only a single CPU, base memory of two sticks (one in each bank), and no PCI cards. NICs are a very common cause of PCI parity errors. Also, which version of DiagTools are you running? Make sure you are running: http://h20004.www2.hp.com/keeper_rnotes/bsdmatrix/matrix70270.html

Ciao,
Greg
Lets Roll!
Dmitry Melekhov_2
Frequent Advisor

Re: LH 6000 U3, memory and pci problems

I have no memory-related errors in the event log.
But there are errors reporting PCI problems.

I use the built-in RAID controller and the built-in network card. Maybe they produce the errors.
How can I check this?
Thank you!
Greg Carlson
Honored Contributor

Re: LH 6000 U3, memory and pci problems

Dmitry,

What PCI errors do you have in the Hardware Event Log (HEL)? Do they appear on every boot?

Test this the way I recommended in my first post: remove any and all PCI cards, and run with a single CPU and base memory. Do the PCI entries still appear on every boot? If they do, boot into F2, disable NetRaid, and remove the i/o cache. Do the errors disappear? If they disappear with the i/o cache removed and NetRaid disabled, re-enable NetRaid and try a different i/o cache from another one of your LH6000 servers. That will show whether the error comes from the i/o board or the i/o cache, so you can replace the appropriate piece of hardware.

Ciao,
Greg
Lets Roll!
Dmitry Melekhov_2
Frequent Advisor

Re: LH 6000 U3, memory and pci problems

Thank you!

I downloaded the latest DiagTools and
I get the following problem with the PCI test:
PCI Bus transfer using Bus Master Cycles failed.

The error code is 008A.

When I disable the built-in SCSI controller all is OK, i.e. this test is skipped :-)

It looks like I have a problem with the SCSI adapter: the server usually hangs while writing to tape....

The RAM test is OK.

Thank you!
Dmitry Melekhov_2
Frequent Advisor

Re: LH 6000 U3, memory and pci problems

OK. Now I know that if the Ultrium 215i is
connected to the SE SCSI, then the
PCI test fails; if it is disconnected, the test runs OK.
Does this mean that I have a problem with the tape drive or not?

Anyway, I now suspect that the problem is with RAID 5 on the built-in controller.
I wrote another message about this...
Greg Carlson
Honored Contributor

Re: LH 6000 U3, memory and pci problems

Dmitry,

I think you need the Ultra160 card P3413A, which is supported in the LH6000/r, for the Ultrium tape drive you have.

Ciao,
Greg
Lets Roll!
Dmitry Melekhov_2
Frequent Advisor

Re: LH 6000 U3, memory and pci problems

Maybe :-)
I contacted HP support with a question about
SE SCSI and Ultrium 215 compatibility.
But this drive works OK with this SE SCSI
and usually reads and writes data without any problems...
It was only the server crashes that forced me to test the hardware :-)
I now think they are caused by RAID 5, because I just had a crash with
the tape drive disconnected, and RAID 5 is the only difference between the 2 servers.

Thank you!