
VMS 7.3-1 Crash at IPL 31

 
SOLVED
Jan van den Ende
Honored Contributor

VMS 7.3-1 Crash at IPL 31

Hi,

A colleague asked for my assistance with this crash.
I do have some suspicions, but my knowledge in this area is insufficient to give a definitive verdict.

Help much appreciated.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
8 REPLIES
Uwe Zessin
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

> CPU 00 -- MACHINECHK, Machine check while in kernel mode

A machine check is almost always a hardware problem - I suggest a look at the error log.
.
Kris Clippeleyr
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Jan,
Looking at the info, I must agree with Uwe. The crash happens in routine EXE$SYSTEM_CORRECTED_ERROR_C, which resides in module SYS$CPU_ROUTINES. So I think the system was trying to correct an error and hit another error while doing so (heavy speculation here). My guess is that CPU 0 has to be replaced (possibly even both CPUs). Also, have someone look at the error log, and/or have a hardware engineer run some diagnostics from the console.
Regards,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Jan van den Ende
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Uwe,

those were roughly my thoughts as well.

Errorlog info:
$ DIAG /sin=9-dec-2005
ENTRY 6768
... Timestamp 01:44:19 ... Time since reboot 233 days ...
ENTRY 6769
... Timestamp 01:54:05 ... Time since reboot 0 days ...

Not much new info.

Proost.

Have one on me.

jpe
Uwe Zessin
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Jan,
do you know what the reason for the last reboot was? I am surprised that the time is nearly identical.

Many, many years ago I was managing a VAX 8650 that went down every 14 days on a Friday afternoon; it turned out to be bad memory.

Many years ago I had a cluster of two MicroVAX 3400s that went down after a certain number of days. It turned out to be a bug in VMS: a 10 msec counter was wrapping and the CPU-specific module did not handle this properly, which resulted in a solid freeze. In that case, even a [Reset] did not work.
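To illustrate the kind of counter wrap described above, here is a small Python sketch. The post does not state the counter width; the unsigned 32-bit width used below is purely an assumption for illustration, and the names `TICK_SECONDS`, `COUNTER_BITS`, `days_until_wrap`, and `elapsed_ticks` are hypothetical. It shows how long such a counter would run before wrapping, and why a naive "is now later than start" comparison fails at the wrap while modular subtraction does not.

```python
TICK_SECONDS = 0.010          # 10 msec per tick, as mentioned in the post
COUNTER_BITS = 32             # ASSUMPTION: the post does not give the width
MODULUS = 1 << COUNTER_BITS   # number of distinct counter values


def days_until_wrap() -> float:
    """Days of uptime before the counter returns to zero."""
    return MODULUS * TICK_SECONDS / 86_400


def elapsed_ticks(start: int, now: int) -> int:
    """Ticks elapsed from start to now, correct across a single wrap."""
    return (now - start) % MODULUS


# Simulate a reading taken 5 ticks before the wrap and one taken 10 ticks later.
start = MODULUS - 5
now = (start + 10) % MODULUS

print(round(days_until_wrap(), 1))   # 497.1
print(now > start)                   # False: the naive comparison is fooled
print(elapsed_ticks(start, now))     # 10
```

Under that assumed width the wrap would occur after roughly 497 days of uptime; a different width or tick size changes the figure but not the failure pattern.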
.
Jan van den Ende
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Uwe,


> I am surprised that the time is nearly identical.


I think there is some misunderstanding.

Look at the entry numbers and uptime:
These are consecutive entries, one before and one after the (automatic) reboot.
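The reasoning can be sketched in a few lines of Python, using only the values quoted from the error log earlier in the thread. The `Entry` type and the `reboot_between` and `gap_seconds` helpers are hypothetical names, and treating both timestamps as falling on the same day is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    number: int        # error log entry number
    timestamp: str     # time of day as shown in the log, HH:MM:SS
    uptime_days: int   # "Time since reboot" in days


def reboot_between(first: Entry, second: Entry) -> bool:
    """True if the entries are consecutive and the uptime dropped between them."""
    consecutive = second.number == first.number + 1
    return consecutive and second.uptime_days < first.uptime_days


def gap_seconds(first: Entry, second: Entry) -> int:
    """Seconds between the two timestamps (ASSUMPTION: same calendar day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(second.timestamp, fmt) - datetime.strptime(first.timestamp, fmt)
    return int(delta.total_seconds())


before = Entry(6768, "01:44:19", 233)
after = Entry(6769, "01:54:05", 0)

print(reboot_between(before, after))   # True
print(gap_seconds(before, after))      # 586
```

So the two entries are just under ten minutes apart, with the uptime resetting from 233 days to 0 in between.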

A rather sensitive batch job happened to be running, and when the database was not available the next morning and the data exchange with other systems was only partly done, the users and application manager were "not amused" (translation of: !@#$%^&*).
So, it needs to be explained satisfactorily.

Proost.

Have one on me.

jpe
Jim_McKinney
Honored Contributor
Solution

Re: VMS 7.3-1 Crash at IPL 31

You might try going back into SDA and extracting any resident error log buffers for examination:

$ ANAL/CRAS SYS$SYSTEM:SYSDUMP.DMP
SDA> CLUE ERRLOG
SDA> EXIT
$ DIAG/INCL=(CPU,MEM,MACH) CLUE$ERRLOG ! {.SYS}
Jan van den Ende
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Jim,

Full bingo!
.
.
.
Machine Check Reason x0098 Fatal Alpha Chip Detected Hard Error
.
.
.

Well, that system WAS scheduled to be replaced "any time soon" (meaning: perhaps this year, if schedules are met).

Good reason to speed things up.

Thanks.

Proost.

Have one on me.

jpe
Jan van den Ende
Honored Contributor

Re: VMS 7.3-1 Crash at IPL 31

Proost.

Have one on me.

jpe