
rp74/84## HPMC error

 
Guvnor
Occasional Visitor

I have been working on a lot of rp7410s and rp8400s lately and have seen some strange behaviour that appears on both types of server. It sometimes takes a few attempts for the servers to accept "boot lan install", and once the OS is loaded and the server reboots, it just hangs at the first stage of the boot process. I have checked the system logs and the same error messages are repeated throughout.

120 PDC 0,1,1 *2 0x2000552071266dc2 0x00ffff01ffffff93 ERR_DNA_MHG_FEWPJ__LOC
120 PDC 0,1,1 *2 0x58005d0000006dc0 0x00ffc80803022f25 09/03/67380 02:47:37
119 PDC 0,1,1 *2 0x7000542071266dc2 0x0000000000b90200 ERR_DNA_MHG_FEWPJ__DET
119 PDC 0,1,1 *2 0x58005c0000006dc0 0x00ffc80803022f25 09/03/67380 02:47:37
118 PDC 0,1,1 *2 0x20005320710b6de3 0x00ffff01ffffff93 ERR_DNA_MPD_FEW__LOC
118 PDC 0,1,1 *2 0x58005b0000006de0 0x00ffc80803022f25 09/03/67380 02:47:37
117 PDC 0,1,1 *2 0x70005220710b6de3 0x0000000000b90200 ERR_DNA_MPD_FEW__DET
117 PDC 0,1,1 *2 0x58005a0000006de0 0x00ffc80803022f25 09/03/67380 02:47:37
116 PDC 0,1,1 *2 0x2000512393096e92 0x00ffff01ffffff93 ERR_DNA_ROUT_RIN2ROUTFE__LOC
116 PDC 0,1,1 *2 0x5800590000006e90 0x00ffc80803022f25 09/03/67380 02:47:37
115 PDC 0,1,1 *2 0x7000572393096e92 0x0000000000990200 ERR_DNA_ROUT_RIN2ROUTFE__DET
115 PDC 0,1,1 *2 0x58005f0000006e90 0x00ffc80803022f25 09/03/67380 02:47:37
114 PDC 0,1,1 *2 0x2000562393086e51 0x00ffff01ffffff93 ERR_DNA_RIN_FECMPNOHI


Firmware on both cells is 17.009.

Another problem I have come across is that the cells won't accept a mixture of memory,

i.e. A6098-60001 and A6098-60101.

Can anyone help with this? I have searched endlessly for an explanation.

Thanks
Guvnor
Occasional Visitor

Re: rp74/84## HPMC error

Sorry, I forgot to post this as well:

PARTITION STATUS: E indicates error since last boot
Partition 0 state Activity
------------------ --------
E HPMC processing I/O system bus adapter configurat 397 Logs

# Cell state Activity
- ---------- --------
1 Cell has joined partition
Stefan Stechemesser
Honored Contributor

Re: rp74/84## HPMC error

Hi,

I'm not aware of any problems mixing the two types of memory DIMMs with the newest firmware.
With firmware prior to 16.011 you may see the error "MEM_NOT_SYSTEM_DIMM", but the memory will be used by the system anyway.

Regarding your fault: it looks like an HPMC (High Priority Machine Check) has happened. The cause can be hardware (most probable) but also software (e.g. a bad or outdated I/O driver in the Ignite kernel, access to invalid addresses, etc.).
You should call HP support and give them the following information for analysis:

1.) In the BCH Service menu, the output of "pim" (processor internal memory) and "el" (Error Log).
2.) The error and activity log from the MP ("sl").

It is not possible to determine the cause of the problems from the information you have posted.

If the same error happens on more than one server, I think you should update your Ignite server with a more recent boot kernel.

best regards

Stefan
Guvnor
Occasional Visitor

Re: rp74/84## HPMC error

Thanks for that. There is definitely an issue with mixing 5x5 PN's on the servers. I have spoken to others about this fault and they have mentioned the PN's of the memory, so this may be a fault that is not recognised by HP. As for our Ignite, it is all up to date, including the kernel. We also received a couple of servers from an HP reseller which displayed the same fault. I have exhausted all my research on this one, which is why I am posting it now.
Sameer_Nirmal
Honored Contributor

Re: rp74/84## HPMC error

Hi,

The errors shown by PDC are occurring during the nPar bootup. I guess these errors are related to the HPMCs on the cell board. The cause of these HPMCs could be software or hardware, but with HPMCs a hardware issue is more probable.

Are you using memory modules from the same OEM? It is always recommended to install modules of the same make in a pair or bank. I am not sure what you mean by "5x5 PN's".

As I understand it, the part numbers A6098-60001 and A6098-60101 belong to the same memory module; they are equivalent, just with new part numbers. I am not sure about their OEMs, though.

You need to provide more details before the cause of the problem can be determined.
Besides the information Stefan asked for, you need to provide the following to get more insight into the system/cell board:
BCH > IN > ALL

By the way, which Ignite-UX version are you using?
Andrew Rutter
Honored Contributor

Re: rp74/84## HPMC error

Hi Guvnor,

If you are experiencing the same error on many systems, and especially on newer ones from HP, then it would indicate to me that it is more likely a configuration mismatch somewhere.

What version of HP-UX are you trying to ignite onto these servers, and what date is the OS install? If it is an older version there could be patch/install issues.

Also you need to ensure the cell boards are configured correctly before you start.

As for the memory issue, it is not something that I have personally come across, but I have had warnings about counterfeit memory supposedly being distributed.
It could be worth checking.
The memory modules should not only look the same but also have the same style of part number sticker on them; on counterfeit ones the stickers are slightly differently shaped. They should also all have the "Pass turn on" sticker applied; on some it is missing. In addition, the artwork revision may not match what the memory is labelled as (A4/A3). This can be checked in the MP or STM.
Note: I have never seen any myself and do not know whether this is entirely true, but it is worth noting.

Failing this, I would contact HP. In the past HP has released memory with the same product numbers but slightly different spec DIMMs, and then had to release a newer firmware version to accommodate the mix of refresh rates.

Andy
Guvnor
Occasional Visitor

Re: rp74/84## HPMC error

What about the DNA errors I have been getting?
Stefan Stechemesser
Honored Contributor

Re: rp74/84## HPMC error

Hi,

Maybe you mixed up "DNA" and "DMA". DNA is a chip on the cell board.
During an HPMC the firmware logs all relevant registers from the CPUs, chipset and PCI buses to NVRAM for later analysis. For some of these registers the firmware generates a chassis log for the MP logs while reading the register (to provide some more information in case the HPMC routine fails due to a very severe hardware problem).
This is NOT an error, only an informational log (Alert Level 2).
Each ERR_DNA_XXX log indicates that the register named XXX on the specific chip has been read successfully, and in many cases the data portion of the chassis code is the register contents.
Only the higher-level HP support people and HP labs know which register is really meant, and the registers and names are different for each chipset and firmware revision. Only some register reads produce a chassis code.
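
To make the pairing of keyword lines and timestamp lines concrete, here is a minimal Python sketch that groups log lines like the ones pasted above by log number and reads the alert level from the `*2` column. The field layout and all names (`LINE_RE`, `TIMESTAMP_RE`, `parse_chassis_log`) are assumptions inferred from the excerpt in this thread, not a documented MP log format, and the sketch does not decode the hex words.

```python
import re

# Assumed layout, inferred only from the excerpt in this thread:
# <log#> <source> <location> *<alert> <hex word> <hex word> <keyword | timestamp>
LINE_RE = re.compile(
    r"^(?P<num>\d+)\s+(?P<source>\S+)\s+(?P<loc>[\d,]+)\s+\*(?P<alert>\d+)\s+"
    r"(?P<word0>0x[0-9a-fA-F]{16})\s+(?P<word1>0x[0-9a-fA-F]{16})\s+(?P<tail>\S.*)$"
)
TIMESTAMP_RE = re.compile(r"^\d{2}/\d{2}/\d+\s+\d{2}:\d{2}:\d{2}$")


def parse_chassis_log(text):
    """Group keyword and timestamp lines by log number (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that does not fit the assumed layout
        entry = entries.setdefault(
            int(m["num"]),
            {"alert": int(m["alert"]), "keyword": None, "timestamp": None},
        )
        tail = m["tail"].strip()
        if TIMESTAMP_RE.match(tail):
            entry["timestamp"] = tail
        else:
            entry["keyword"] = tail
            # Raw hex words from the keyword line, kept undecoded.
            entry["words"] = (m["word0"], m["word1"])
    return entries


SAMPLE = """\
120 PDC 0,1,1 *2 0x2000552071266dc2 0x00ffff01ffffff93 ERR_DNA_MHG_FEWPJ__LOC
120 PDC 0,1,1 *2 0x58005d0000006dc0 0x00ffc80803022f25 09/03/67380 02:47:37
119 PDC 0,1,1 *2 0x7000542071266dc2 0x0000000000b90200 ERR_DNA_MHG_FEWPJ__DET
119 PDC 0,1,1 *2 0x58005c0000006dc0 0x00ffc80803022f25 09/03/67380 02:47:37
"""

for num, entry in sorted(parse_chassis_log(SAMPLE).items(), reverse=True):
    print(num, entry["alert"], entry["keyword"], entry["timestamp"])
```

Filtering the result for entries whose alert level is above 2 would be one way to separate these informational register reads from anything more serious, assuming the `*N` column really is the alert level.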

In general, the chassis codes are not sufficient to analyze the HPMC. It is better to get the saved data from NVRAM. It is copied to the directory /var/tombstones during every reboot, or you can get it offline with the "pim" and "el" commands in the BCH Service menu.
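
As a rough illustration of picking up the saved data after a reboot, this Python sketch returns the most recently modified file in a directory. Only the `/var/tombstones` path comes from this post; the helper name `newest_file` is hypothetical, and nothing is assumed about how the files in that directory are named.

```python
from pathlib import Path


def newest_file(directory):
    """Return the most recently modified regular file in directory, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    files = [p for p in root.iterdir() if p.is_file()]
    # default=None covers an existing but empty directory
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    latest = newest_file("/var/tombstones")
    print(latest if latest else "no saved data found")
```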

In addition, it is sometimes helpful to look at the activity log (FPL on newer systems) on the MP ("sl") and check whether there was an error directly before the HPMC processing began (e.g. MEM_MBE_IN_RANK => multibit memory error).

You cannot analyze an HPMC dump from a cellbased system without help from HP support.

best regards

Stefan
Guvnor
Occasional Visitor

Re: rp74/84## HPMC error

Thanks all. I think we have an HP engineer coming in now, so I'll hopefully find out what the cause was.