HPE 9000 and HPE e3000 Servers

rp 5450 boot problem

 
meekrob
Super Advisor

Dear Gurus,

 

I'm having a boot problem with one of our rp5450 servers (HP 9000 L2000). On startup it checks all components and passes the POST stage without any error message. However, while proceeding to boot from the primary boot path at 0/0/2/0.2.0, I get the following:

alloc_pdc_pages: Relocating PDC from 0xffff800000 to 0x3fb00000.

The system loops for a while and then prints out the following message:

 

******SYSTEM ALERT*********

SYSTEM  NAME:    GSP

DATE: 22/08/2012        TIME: .....

ALERT LEVEL: 12  =  software failure

REASON FOR ALERT:

SOURCE:  1 = processor

SOURCE DETAIL:  1 = processor  general   SOURCE ID: 0

PROBLEM DETAIL: 0 = no problem detail

LED state: unexpected reboot. Running non-os code. Non critical error detected.

 

0xA0E000C01100B000      00000000     000005E9 - type 20 = major change in system state.

 

I really need to know what's going on, so I've tried replacing its motherboard and its PCI backplane with spares from another running server, without success.

 

Any help / suggestion is much appreciated.

 

Regards

Robert_Jewell
Honored Contributor

Re: rp 5450 boot problem

This alert is just telling you that the OS crashed.  In order to see the actual crash event message (that may help you determine what is wrong) you can disable alerts.

 

From the GSP use the AC command.  Disable alerts permanently (you can change it back later).

Reset the server and monitor the console when the crash occurs.

 

You could also try viewing the console log (CL command) to see if you can view that info.

 

-Bob

meekrob
Super Advisor

Re: rp 5450 boot problem

I deactivated alerts within the GSP, and now we're getting the attached output, which contains many of those "Resetting SCSI -- lbolt" messages. In your opinion, is it a boot disk issue or a SCSI ribbon cable problem? Or even the disk backplane or the core I/O card? I even tried installing the OS on a new disk and always get the same output. Any suggestion is much appreciated, as I really want to know what is going on.

And BTW, is there a chassis code decoder for these server models? I searched the HP website without success.

 

Regards

Robert_Jewell
Honored Contributor

Re: rp 5450 boot problem

Looks like there are multiple devices being called out in these messages:

 

cb010002  --> c1t0d0

cb011002  --> c1t1d0
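
For anyone wanting to repeat this decoding by hand, here is a minimal Python sketch. It assumes a field layout inferred only from the two mappings above: top byte is the driver major number, the next byte is the controller instance, followed by a 4-bit target and a 4-bit LUN. The function name decode_dev is made up for illustration, and the layout should be checked against the HP-UX documentation before relying on it.

```python
def decode_dev(dev: int) -> str:
    """Turn a 32-bit device number into a cXtYdZ-style name.

    ASSUMED layout (inferred from the two mappings in this thread):
      bits 31-24  driver major number (not used in the name)
      bits 23-16  controller instance -> the "c" number
      bits 15-12  SCSI target ID      -> the "t" number
      bits 11-8   LUN                 -> the "d" number
      bits 7-0    driver-specific option bits (ignored here)
    """
    instance = (dev >> 16) & 0xFF
    target = (dev >> 12) & 0xF
    lun = (dev >> 8) & 0xF
    return f"c{instance}t{target}d{lun}"


# The two codes called out in the console messages
for code in (0xCB010002, 0xCB011002):
    print(f"{code:08x}  --> {decode_dev(code)}")
```

This only reproduces the cXtYdZ naming. It does not tell you the hardware path behind controller instance 1, which still needs ioscan output from a running system.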

 

and then there is this line:

 

I/O hardware probe timed out at path 0/0/1/1.7

 

Given this, and the fact that you say you tried a new boot disk, I would suspect the controller (bottom slot). Perhaps the SCSI cable if it looks suspect, but these don't move around much normally, so it's an unlikely cause. Another thought is to check external terminators if you have any. These can and do go bad.

 

 

I am not aware of a "decoder" for these chassis codes, but I think they can be manually figured out. Search for "rp5400 service manual". I think this info is even in the User's Guide.

 

 

-Bob

meekrob
Super Advisor

Re: rp 5450 boot problem

Thanks for your reply.

Now, taking into consideration everything that was said, and proceeding by elimination, I replaced the 3 SCSI ribbon cables connecting the disk backplane to the core I/O card at the bottom, and still got the same result. So I guess I should opt to replace this bottom card.

In addition, when you mention cb010002  --> c1t0d0, how can I tell which device it relates to, given that I cannot boot the OS? Is it possible to do this via BCH?

No external terminators are present, so I'm left with these 2 possibilities: either the disk backplane or the core I/O card, and I think I should replace the core I/O card.

What do you think?

 

Many Thanks in advance

Robert_Jewell
Honored Contributor

Re: rp 5450 boot problem

 

> when you mention cb010002  --> c1t0d0, how can I tell which device it relates to, given that I cannot boot the OS? Is it possible to do this via BCH?

 

There is no way to view the device file mapping from the BCH since this is an OS kernel function. Most of the time when these errors are reported, the OS is stable enough to at least log in and run "ioscan" to determine the mapping.

 

This is a good reason to have a copy of such configuration files and outputs available for reference. The commonly referenced "NICKEL" script is a good tool for this.

 

-Bob
