HPE 9000 and HPE e3000 Servers

rp7410 boot failed, and HPMC occurred

 
SOLVED
stephen peng
Valued Contributor

rp7410 boot failed, and HPMC occurred

Dear all,
I have a problem booting an rp7410 that has two cells. During boot, after the CPU and memory self-tests and right at the beginning of I/O discovery, cell 0 fails with an HPMC. The server can still boot with cell 1, but once the OS is up, half of the adapters have disappeared: nothing with a hardware path starting with "0/" can be seen in the OS. I swapped the locations of cell 0 and cell 1, and the HPMC still occurs on cell 0 (that is, whichever cell is in slot 0). What could the problem be?
The attached file is the MP's SL output.
Thanks a lot!
20 REPLIES
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

The attached file is the latest SL output.
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Stephen,

You mentioned earlier that you could get to BCH. Can you still do this now that you have swapped the cells over? If not, swap them back to their original locations.

If you can get to BCH, then do:
sr off >>Scroll [ON|OFF]

BCH> SER

Now capture the following outputs:

SER> PIM

SER> EL

...and post the results here, please.

Phil
P.S. The MP> PS output looks good, apart from the MP low-battery warning.
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi,
I have been able to boot the system all along, though I cannot make use of cell 0. Besides that, the MP's battery is LOW and its status is FAULT DETECTED.
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Stephen,

Can you do the following?

If you can get to BCH, then do:
sr off >>Scroll [ON|OFF]

BCH> SER

Now capture the following outputs:

SER> PIM

SER> EL

...and post the results here, please.


It may help to pinpoint the problem.
Phil
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
I can't quite follow your instructions. How do I get a SEA> prompt, and how do I run the SEA> PIM command?
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Stephen,

I'm referring to the SERVICE menu from BCH,
which is SER, not SEA.
Phil
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
See the attached file for the SERVICE menu output.
Thanks a lot.
Phil uk
Honored Contributor
Solution

Re: rp7410 boot failed, and HPMC occurred

Hi Stephen,

Thanks for doing that.
I have looked at the information, and it is currently inconclusive, but we may be able to overcome this.

To explain: the PIM information has no valid timestamps, but the EL information does, so the two sets of logs can't be matched up.

Therefore, we need to get this information again, but "in sync".

So, if you can put the machine back to its original config (which I think you have done), then let the machine try to boot HP-UX again.

Hopefully it will HPMC when booting HP-UX and generate a new set of PIM and EL information.

At this point, don't move any of the cells, and re-post the PIM / EL information as you did earlier today.
(Remember to do BCH> sr off before capturing the PIM & EL data.)

Hopefully this will generate PIM and EL information with valid and synchronised timestamps.

Then I'll have a look at the new info.
I hope this makes sense.

Regards,
Phil
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
Thank you for your help. I will post the output tomorrow.
We are now considering replacing the backplane or the PCI midplane of this rp7410, in the hope that it will fix the problem. We have concluded that both cells are OK, since each of them could boot the OS after we swapped their slots. Do you think that reasoning is correct? We also removed the I/O adapters from I/O chassis 0, which cell 0 is connected to, and the HPMC still occurred at boot, so we think the problem may be the PCI midplane or the backplane. One thing confuses me: there is only one PCI midplane, so does it function as two separate parts?

Anyway, thank you for your reply!
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi,

My suspicion is also the backplane.
If I remember correctly, both PCI chassis "plug" into the same PCI backplane, so if that were at fault I would expect the other I/O chassis to have problems as well.
Yes, swapping the cells logically suggests that the cells are OK.

The logs do suggest DNA errors. These are basically fabric errors on the crossbars through which the cells communicate with their respective I/O chassis (via the PCI backplane).

So my initial feeling is that the backplane the cells plug into has a problem, but obviously I can't be 100% sure.
This is the part that I would change first.

(Before you change it, make a note of all of your boot paths etc.)

Regards,
Phil

Do post and let us know how it all works out, and what the culprit component was if not the backplane.
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
I will post the SER output later today.
Also, will the boot path change when I replace the backplane? Or will any other hardware path change? If that happens, it would be a disaster.
We plan to replace the backplane the day after tomorrow.

Thanks a lot
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Stephen,
Complex profiles are stored on the MP, the backplane and the cells.
As you have a "bad" battery on your MP, the cells are being removed, and your backplane is (possibly) being replaced, there is a possibility that config info could be lost, but it IS recoverable.

You may want to make a note of your MP information as well (power is going to be removed from it and the battery is no good).
Press CTRL-B to get to the MP, then make a note of the output of:

CM> LS
CM> ID
CM> SYSREV (you may need to update/downgrade firmware after the backplane replacement)
CM> CP
CM> IO

If you can get to BCH:

BCH> IN (information menu)
BCH>IN> ALL

Capture everything for your records and for recovery.


Hopefully, with all of the above, you can recover your system OK.

Good luck,
Phil
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
I hope the attached file helps.
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Stephen,

Because cell 0 (or rather slot 0, as you proved by swapping the cells) isn't reporting any information, the HPMC data is incomplete.

A brief summary of my findings:

HPMC data may be invalid
Earliest timestamp: 2007/10/19 12:41:05
Latest timestamp: 2007/10/19 15:30:19
Time Span: 0 day(s), 02 hour(s), 49 minute(s), 14 second(s)

So Stephen, this can be caused by a cell failure or a fabric failure. As you've swapped the cells, I think it is probably a fabric/DNA failure (crossbars etc.).

Regarding the timestamps: the gap between the earliest and latest entries exceeds the maximum allowed time difference, because cell 0 isn't reporting any errors within the allowed time span.
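
For reference, the time span quoted in the summary can be reproduced from the two timestamps. This is only a minimal sketch in Python, assuming the log timestamps always use the YYYY/MM/DD HH:MM:SS layout shown above; the variable names are illustrative and not part of any HP tool:

```python
from datetime import datetime

# Timestamp layout as printed in the summary (assumed to be consistent).
FMT = "%Y/%m/%d %H:%M:%S"

earliest = datetime.strptime("2007/10/19 12:41:05", FMT)
latest = datetime.strptime("2007/10/19 15:30:19", FMT)

# Difference between the latest and earliest log entries.
span = latest - earliest
hours, rest = divmod(span.seconds, 3600)
minutes, seconds = divmod(rest, 60)

print(f"{span.days} day(s), {hours:02d} hour(s), "
      f"{minutes:02d} minute(s), {seconds:02d} second(s)")
```

The maximum difference the analysis allows is not stated in this thread, so the sketch only computes the span and does not compare it against a limit.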

So, I would still suggest changing the backplane that the cell boards plug into.

Regards,
Phil


Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

...and remember to make a note of the info I specified earlier from the MP (and BCH if possible) before you change the backplane.
Cheers,
Phil
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
We plan to replace the backplane tonight, and if that does not work, we will replace the PCI midplane. I will post the outcome tomorrow.

Thank you
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
Can you tell me exactly what we need to upgrade after replacing the backplane? Is a firmware upgrade necessary? I have also heard of some "SS upgrade"; what is that?

Thanks a lot
Phil uk
Honored Contributor

Re: rp7410 boot failed, and HPMC occurred

Stephen,

If you haven't done this before, then I recommend that you get HP onsite to do it.

If you are OK with this, then see below.

Firstly, firmware.

READ THE READMEs on how to upgrade the FW.

From your MP, run
CM>SYSREV
and cross-reference the output with the links below to find out which version you currently have.

When you change the BP, you may find that when you start the machine and do another SYSREV, the output is different from before (particularly GPM).
If so, then you need to get it all back in sync to a defined "recipe".

You will use the appropriate "recipe" that you have downloaded from the links below.
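
If you save the SYSREV output to a text file before and after the backplane change, the differences can be picked out mechanically. The following is just a sketch in Python under that assumption; the function name `changed_lines` is made up for illustration, and the sample text is a placeholder, not real SYSREV output:

```python
import difflib

def changed_lines(before: str, after: str) -> list[str]:
    """Return lines removed ("- ") or added ("+ ") between two captures."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Placeholder captures only: real SYSREV output looks different.
before = "component A : rev 1.0\ncomponent B : rev 2.0"
after = "component A : rev 1.0\ncomponent B : rev 2.1"

for line in changed_lines(before, after):
    print(line)
```

Any line reported here is a revision that no longer matches the earlier capture and would need checking against the recipe.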

You will (most likely) use a laptop connected to the MP with a crossover cable, and transfer the files with FTP.

Ensure all firewall/antivirus services on the laptop are stopped.
Ensure the laptop is plugged into mains power; don't run it on battery.

When you extract the "zipped" file (with WinZip perhaps, on your laptop), ENSURE that under
Options > Configuration > Miscellaneous
the "TAR file smart CR/LF conversion" setting is UNCHECKED (if it is checked, it will corrupt the firmware file).
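
To see why a newline conversion damages a binary file, here is a small illustration in Python. It assumes the conversion rewrites every LF byte as CR LF, which is the general idea of such a setting and not a statement about WinZip's exact behaviour; the byte string is a stand-in, not a real firmware image:

```python
import hashlib

def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for a binary image: the 0x0A bytes are data, not line endings.
original = bytes([0x7F, 0x0A, 0x00, 0x0A, 0xFF])

# What a text-style LF -> CR LF conversion would do to those bytes (assumption).
converted = original.replace(b"\n", b"\r\n")

print(len(original), len(converted))          # the sizes differ
print(sha256(original) == sha256(converted))  # the checksums no longer match
```

Comparing the size or checksum of the extracted file against a known-good value, where one is published, is a quick way to catch this kind of damage before flashing.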

You will basically use the MP commands OSP and FW to update the firmware, as applicable to the differences that you find (using FTP from the MP to your laptop).

Latest: PF_CKEYMAT0605
Found at
ftp://ftp.itrc.hp.com/firmware_patches/hp/cpu/

Older versions PF_CKEYMAT060x.tar.gz
Found at:
ftp://ftp.itrc.hp.com/superseded_patches/firmware_patches/hp/cpu/

If you aren't happy about all of this, and perhaps haven't done it before, then I do recommend you get HP to do it for you.

ALWAYS read the release notes before attempting the firmware update.

Secondly,
The MP>CM>ID command has all of the complex profile info in it.
This is stored on your MP card and cell boards.
As long as it is not corrupt, you won't need to make any updates.
The main concern with this is your MP and its low battery.
The MP is the main source of the complex profile.

You may want to change the MP first, while the main 48 volts is still running in the machine.
But ensure that the MP you install doesn't have any "old" complex profile info on it, or it will update your machine with this info. NOT good.

Worst case, it is all recoverable, but you would need HP to do it onsite.

So, if you have ANY doubts about the tasks, ask HP to do this work for you.


Regards,
Phil

stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Hi Phil,
It turned out to be a problem with the PCI midplane, even though some experienced engineers said it should have been caused by the backplane. There was no major extra trouble: we just had to reset the SCSI defaults from the SERVICE menu of BCH when the search would not proceed smoothly, and deal with a memory deconfiguration problem.
Anyway, thank you sincerely for your support all along.
stephen peng
Valued Contributor

Re: rp7410 boot failed, and HPMC occurred

Problem solved after replacing the PCI midplane.