panic cpu:0

 
Enrique Carballido
Occasional Contributor

panic cpu:0

Hi,
I had a problem that crashed my Alpha/Tru64/TruCluster 5.1 system. This is the output from the uerf command. Does anyone know what happened? Can anyone help me?
I can send more info if needed.
Thanks for the help.

********************************* ENTRY 7. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 110. MACHINE STATE
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Thu Nov 13 18:57:54 2003
OCCURRED ON SYSTEM db1
SYSTEM ID x00080022
SYSTYPE x00000000
SYSTEM STATE x0003 CONFIGURATION

********************************* ENTRY 8. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 26418.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Thu Nov 13 18:53:59 2003
OCCURRED ON SYSTEM db1
SYSTEM ID x00080022
SYSTYPE x00000000
MESSAGE panic (cpu 0): What happened here ?

********************************* ENTRY 9. *********************************

----- EVENT INFORMATION -----

EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 250. ASCII MSG
SEQUENCE NUMBER 26417.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Thu Nov 13 18:53:23 2003
OCCURRED ON SYSTEM db1
SYSTEM ID x00080022
SYSTYPE x00000000
MESSAGE mchan0: Node 1 is going offline

********************************* ENTRY 10. *********************************
Joris Denayer
Respected Contributor

Re: panic cpu:0

Your system panicked. With the information provided so far, we cannot tell why.

I don't know if this forum is the place for crash analysis, but let's give it a try.

First, the official way to handle this:
Run "sys_check -escalate".
Open a call with HP Services.
On request, send the files escalate.tar and escalate.vmzcore to HP.

Second, we can also have a quick look at your crash.
So, please attach the most recent crash-data.* file from the directory /var/adm/crash.

This already gives an idea of what happened.
As the information in the crash-data file is limited, this comes without guarantee.



Rgrds


Joris
To err is human, but to really foul things up requires a computer
Michael Schulte zur Sur
Honored Contributor

Re: panic cpu:0

Hi,

cd /var/adm/crash
See if you have a crash-data file, then run
grep _panic_string crash-data.version
and post the output.

greetings,

Michael
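
The steps above can be sketched as a small shell function. This is only an illustration: it assumes a POSIX shell (for example ksh), that the crash files are named crash-data.* as in this thread, and the function name latest_panic_string is made up for the example.

```shell
# Hypothetical helper: print the _panic_string line(s) from the newest
# crash-data.* file in a crash directory (default: /var/adm/crash).
latest_panic_string() {
    dir=${1:-/var/adm/crash}
    # ls -t sorts by modification time, newest first; head -1 keeps that one
    latest=$(ls -t "$dir"/crash-data.* 2>/dev/null | head -1)
    if [ -z "$latest" ]; then
        echo "no crash-data file found in $dir" >&2
        return 1
    fi
    echo "file: $latest"
    grep _panic_string "$latest"
}
```

Called without an argument it looks in /var/adm/crash; pass another directory if the crash files were copied elsewhere.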
Joris Denayer
Respected Contributor

Re: panic cpu:0

It would also be useful to know which patch kits are installed.
Please post the output of the following command:

# dupatch -track -type kit

Joris
To err is human, but to really foul things up requires a computer
Ralf Puchner
Honored Contributor

Re: panic cpu:0

From the information given we know the panic string:

panic (cpu 0): What happened here ?

This indicates some kind of corruption or a wrong parameter, but the actual cause depends on the stack trace of the crash.

Please post your crash-data file for further analysis.
Help() { FirstReadManual(urgently); Go_to_it;; }
Joris Denayer
Respected Contributor

Re: panic cpu:0

Ralf,

I checked the V5.1 and TruCluster kernel sources.

The panic string "What happened here" doesn't exist there.

I think this is a comment from the original poster.

Joris
To err is human, but to really foul things up requires a computer
Ralf Puchner
Honored Contributor

Re: panic cpu:0

Joris,

Have a look at the last line of entry 8. This is not a comment; it is a well-known panic string, but the root cause needs further analysis.
Help() { FirstReadManual(urgently); Go_to_it;; }
Joris Denayer
Respected Contributor

Re: panic cpu:0

Ralf,

Indeed, this is a panic string in the V4 release stream (it occurs in fifo_vnops.c).

But it no longer exists in V5.

The first post states that this is Alpha/Tru64/TruCluster 5.1.

Hmm, this sounds interesting.

Joris

To err is human, but to really foul things up requires a computer
Enrique Carballido
Occasional Contributor

Re: panic cpu:0

I am sending the crash file.

Joris Denayer
Respected Contributor

Re: panic cpu:0

Enrique,

This is a tricky one, and I do not understand it. I cannot find the panic string in the V5.1 sources. Was this system upgraded around 15 May 2003? That is the boot time before the crash.

Can you also send the output of
# dupatch -track -type kit
# dupatch -track -type patches
# sizer -v
# what /vmunix | grep fifo_vnops

Maybe this will shed some more light on it.

Joris
To err is human, but to really foul things up requires a computer
Joris Denayer
Respected Contributor

Re: panic cpu:0

OK,

I have to apologize.
The mystery is solved. It's my fault.
This panic message changed with the installation of patch kit #5.
Sorry for wasting entries.

From patch kit #5 onwards, the message reads
"NULL fifo_bufhdr append pointer"

Joris

To err is human, but to really foul things up requires a computer
Michael Schulte zur Sur
Honored Contributor

Re: panic cpu:0

Hi Enrique,

How many members does your cluster have? Have you had any trouble with the quorum disk lately?

greetings,

Michael
Michael Schulte zur Sur
Honored Contributor

Re: panic cpu:0

Hi Enrique,

The messages below appear quite often in the crash file. I wonder if the Memory Channel is damaged.

regards

Michael

rm_state_change: mchan0 slot 1 offline
rm_lrail_remove_node: logical_rail 0 hubslot 1
CNX MGR: communication error detected for node 2
CNX MGR: delay 1 secs 0 usecs
CNX QDISK: Cluster transition, releasing claim to 1 quorum disk vote.
CNX MGR: quorum lost, suspending cluster operations.
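
To pull only this kind of message out of a crash-data file, a filter along the following lines may help. It is a rough sketch: the function name cluster_events is hypothetical, and the pattern only covers the prefixes visible in the excerpt above (rm_, CNX MGR:, CNX QDISK:).

```shell
# Hypothetical helper: list Memory Channel and connection manager messages
# from a crash-data file (or any saved console log) given as $1.
cluster_events() {
    # Match lines starting with rm_ (rail/hub state changes)
    # or with CNX MGR: / CNX QDISK: (cluster membership and quorum)
    grep -E '^(rm_|CNX (MGR|QDISK):)' "$1"
}
```

For example, cluster_events crash-data.version prints the matching lines in file order, which makes it easier to see what preceded the loss of quorum.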
Joris Denayer
Respected Contributor

Re: panic cpu:0

Enrique,

About 30 seconds before the crash, the other member went offline.
There are some known issues with cluster members crashing when another member shuts down or powers down.
Most of these issues are fixed in the patch kits.
Indeed, as Michael said, problems with the cluster interconnect will not help to keep a cluster stable. Everything should be correct in terms of cabling, firmware version, strap settings, etc.
However, given the uptime of the system, I believe a software problem is more likely.

Therefore, I would plan
1) the installation of the latest patch kit (kit #6)
or, even better,
2) an upgrade to V5.1B

This is only free advice. If you want to know exactly what happened to your system and you have a valid contract, you can open a case with HP Services.

Joris
To err is human, but to really foul things up requires a computer
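
The "about 30 seconds" figure can be checked against the uerf timestamps in the first post: entry 9 (mchan0: Node 1 is going offline) is logged at 18:53:23 and entry 8 (the panic) at 18:53:59. A small sketch of the arithmetic, with to_secs as a made-up helper name and both events assumed to fall on the same day:

```shell
# Hypothetical helper: convert HH:MM:SS to seconds since midnight
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

offline=$(to_secs 18:53:23)   # entry 9: mchan0 reports node 1 going offline
panic=$(to_secs 18:53:59)     # entry 8: panic on cpu 0

echo "gap: $((panic - offline)) seconds"   # prints "gap: 36 seconds"
```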
Michael Schulte zur Sur
Honored Contributor

Re: panic cpu:0

Hi Enrique,

Are you still with us?

greetings,

Michael
Enrique Carballido
Occasional Contributor

Re: panic cpu:0

I am sending more info.
Enrique Carballido
Occasional Contributor

Re: panic cpu:0

More info.
Joris Denayer
Respected Contributor

Re: panic cpu:0

Enrique,

The version of fifo_vnops.c is not the one delivered with patch kit #6.
I see that you installed patch kit #6, but it must have been installed only partially.
The vfs.mod module, which contains fifo_vnops.c, is part of OSF510-1163.
This is one of the most important patches (patch 1163). Many other patches depend on it.
As it is not in the patch list, it must have been rejected during the dupatch run.

I suggest running a dupatch baseline. This will show why it was not installed.

Joris

To err is human, but to really foul things up requires a computer
Michael Schulte zur Sur
Honored Contributor

Re: panic cpu:0

Hi Enrique,

Is the problem solved?

Michael