HP 9000 and HP e3000 Servers

D390 - Boot fault

SOLVED
Vinod Tandon
Occasional Advisor

Help!

We have an old HP 9000 D-Class machine (a D390 running HP-UX 10.2). We use it as a development server for Oracle E-Business Suite. It recently started reporting single-bit errors. We were considering replacing the faulty pair of memory modules when I read on the HP forum that, for single-bit errors, we could reset the PDT (Page Deallocation Table). I went ahead and reset the PDT. Right after that, the machine started giving the following error:
>>>>>>>>>>>>>>>>
WARN 3004
FLT 11BF
>>>>>>>>>>>>>>>>

I went back to the memory error log summary that I had captured with cstm before resetting the PDT. It shows:

>>>>>>>>>>>>>>>>
Memory Error Log Summary

CAB/CELL  DIMM   Error Address       Error Type  Page       Count
--------  -----  ------------------  ----------  ---------  ------
EXT0      3a     0x0000000036c13e80  Single-Bit  0x0036c13  6090
EXT0      4b     0x0000000046508060  Single-Bit  0x0046508  210876
EXT0      4a     0x0000000010aa93e0  Single-Bit  0x0010aa9  21318
EXT0      4a/4b  0x0000000048eaf000  Multi-Bit   0x0048eaf  0
EXT0      4a/4b  0x00000000490ee000  Multi-Bit   0x00490ee  0
EXT0      4a/4b  0x0000000045a27000  Multi-Bit   0x0045a27  0
EXT0      5a/5b  0x0000000040310000  Multi-Bit   0x0040310  0
>>>>>>>>>>>>>>>>

Please advise on how to fix this. I'm very concerned about this screw-up.

Thanks in advance.

-Vinod
7 REPLIES
Patrick Wallek
Honored Contributor
Solution

Re: D390 - Boot fault

If you have repeated single-bit errors you really do need to replace the DIMM. Only then should you clear the PDT.

In your case you appear to have some single-bit and multi-bit errors.

At this point I would replace the DIMMs in slots 3a, 4a, 4b, 5a and 5b.
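
For anyone who wants to derive that slot list from the log rather than by eye, here is a minimal Python sketch. It assumes the summary rows look exactly like the ones posted above (six whitespace-separated fields); the `summarize` function and the `ROW` pattern are hypothetical helpers, not part of cstm. Because the Count column is 0 for the multi-bit rows, the sketch counts multi-bit log entries per slot instead of summing that column.

```python
import re
from collections import defaultdict

# Rows copied from the memory error log summary posted in this thread.
LOG = """\
EXT0 3a 0x0000000036c13e80 Single-Bit 0x0036c13 6090
EXT0 4b 0x0000000046508060 Single-Bit 0x0046508 210876
EXT0 4a 0x0000000010aa93e0 Single-Bit 0x0010aa9 21318
EXT0 4a/4b 0x0000000048eaf000 Multi-Bit 0x0048eaf 0
EXT0 4a/4b 0x00000000490ee000 Multi-Bit 0x00490ee 0
EXT0 4a/4b 0x0000000045a27000 Multi-Bit 0x0045a27 0
EXT0 5a/5b 0x0000000040310000 Multi-Bit 0x0040310 0
"""

# Assumed row layout: cab/cell, DIMM, address, error type, page, count.
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(Single-Bit|Multi-Bit)"
    r"\s+(0x[0-9a-fA-F]+)\s+(\d+)$"
)

def summarize(log_text):
    """Return {slot: {"Single-Bit": summed count, "Multi-Bit": entry count}}."""
    per_slot = defaultdict(lambda: {"Single-Bit": 0, "Multi-Bit": 0})
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, rulers, and blank lines
        _cab, dimm, _addr, kind, _page, count = match.groups()
        # A multi-bit row names a DIMM pair such as "4a/4b"; charge both slots.
        for slot in dimm.split("/"):
            if kind == "Single-Bit":
                per_slot[slot][kind] += int(count)
            else:
                per_slot[slot][kind] += 1
    return dict(per_slot)

if __name__ == "__main__":
    for slot, counts in sorted(summarize(LOG).items()):
        print(slot, counts)
```

Every slot that appears in the result has at least one logged error, which is how the five slots above fall out of the log.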
Vinod Tandon
Occasional Advisor

Re: D390 - Boot fault

Thanks Patrick.

So is there no way to boot-up this server without replacing the memory modules?

How can so many memory modules go bad at the same time? It is difficult to understand.

I should not have reset the PDT, and now I'll have to wait at least 4 days until the memory gets replaced.

James R. Ferguson
Acclaimed Contributor

Re: D390 - Boot fault

Hi Vinod:

> now I'll have to wait at least 4 days until the memory gets replaced.

You could try using an alcohol wipe (like a nurse would use) to clean each DIMM's contacts after removing the DIMMs from their carrier slots. I have successfully rectified single-bit errors on old K-class machines this way in the past. Given time and enough dirty air, particles accumulate on the contacts, leading (sometimes) to what you have seen.

This may or may not help. Your mileage may vary.

Regards!

...JRF...
Torsten.
Acclaimed Contributor

Re: D390 - Boot fault

Single-bit errors are normally not a problem; a single corrected error may generate a lot of related messages, especially with hardware and software this old. Multi-bit errors, however, are a problem.
Try re-seating the DIMMs. With a bit of bad luck, you have some bad DIMMs; if you are really out of luck, you may have a bad processor board. If possible, swap the DIMMs between different slots and watch the results.

Hope this helps!
Regards
Torsten.

Bill Hassell
Honored Contributor

Re: D390 - Boot fault

The PDT is a list of pages that the kernel found to be too unreliable to use, based on errors. Clearing the PDT is like turning off the fire alarm. The errors are real (slot 4b reports over 200 thousand errors) and have probably gotten worse with age (it is very rare that electronic errors get better). The only reason to reset the PDT is that you have replaced the bad memory and you want the counts to start at zero for the new memory.


Bill Hassell, sysadmin
tkc
Esteemed Contributor

Re: D390 - Boot fault

If you are not planning to replace the DIMM soon, swap the DIMM in slot 4b with another one, for example the one in slot 5b. Clear the PDT and start the count from zero again. If the single-bit error appears again, this time on slot 5b, by all means replace that DIMM (the one that was previously in slot 4b). If the single-bit error is still reported on slot 4b, replace the memory carrier.
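
The swap test above boils down to a small decision rule, sketched here in Python. The function name and the returned strings are made up for illustration, and the "inconclusive" outcome (errors on both slots, or on neither) is an assumption added for completeness; the advice above only covers the two clear-cut cases.

```python
def diagnose_after_swap(suspect_slot, swap_slot, slots_with_errors):
    """Interpret new single-bit errors after swapping two DIMMs.

    suspect_slot: slot that originally reported errors (e.g. "4b")
    swap_slot: slot the suspect DIMM was moved to (e.g. "5b")
    slots_with_errors: slots reporting single-bit errors after the
        PDT was cleared and the system ran for a while
    """
    errors = set(slots_with_errors)
    followed_dimm = swap_slot in errors
    stayed_with_slot = suspect_slot in errors

    if followed_dimm and not stayed_with_slot:
        # The error moved with the module, so the module is at fault.
        return "replace the DIMM"
    if stayed_with_slot and not followed_dimm:
        # The error stayed with the slot, so suspect the carrier.
        return "replace the memory carrier"
    # Assumption: both or neither slot erroring is not covered above.
    return "inconclusive"


if __name__ == "__main__":
    print(diagnose_after_swap("4b", "5b", {"5b"}))
    print(diagnose_after_swap("4b", "5b", {"4b"}))
```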
Vinod Tandon
Occasional Advisor

Re: D390 - Boot fault

I tried replacing the bad modules with good ones taken from the bottom slots, taking care to fill the slots in pairs and in sequence. This did not help either. I still got the WARN 3005, FLT 11BF errors.
--
Then we put the memory modules back as they were. Still the same error.
--
Then we removed the console and it worked!!!
--
We can still see the single-bit errors (SBE) in memory, but that is okay as the system works normally.
--
Thanks guys for supporting me in this issue.