Operating System - OpenVMS

Seeking help in Crash Dump analysis.

 
SOLVED
Anjan Ganguly
Frequent Advisor


I have an Alpha DS10 machine running OpenVMS V7.2-1. The system crashed this morning at around 10:45 AM. I have the dump file, but I am not able to work out the reason for the crash from it. I am attaching the dump file. Can somebody help me analyse the cause of the crash from the file?
11 REPLIES
Kris Clippeleyr
Honored Contributor

Re: Seeking help in Crash Dump analysis.

Although I'm not an expert in crash dump analysis (contact Volker for that), I guess your Alpha crashed due to a hardware problem. It might be CPU or memory related.
Greetz,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Ian Miller.
Honored Contributor

Re: Seeking help in Crash Dump analysis.

Have a look in your hardware error log too.
____________________
Purely Personal Opinion
Anjan Ganguly
Frequent Advisor

Re: Seeking help in Crash Dump analysis.

Can you tell me how to see the hardware error log, and where it indicates in the crash dump that it is a hardware fault? I am asking because I am not able to decode the register contents (program counter and so on) in the CLUE file. Can you enlighten me in this regard?
Kris Clippeleyr
Honored Contributor
Solution

Re: Seeking help in Crash Dump analysis.

Hi,

You might want to use WEBES to analyze the file SYS$ERRORLOG:ERRLOG.SYS.
And...
> and where it indicates in the crash dump that it is an hardware fault.
I've seen too many "MACHINECHK, Machine check while in kernel mode" crashes that were all related to hardware failures.
Greetz,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Richard Brodie_1
Honored Contributor

Re: Seeking help in Crash Dump analysis.

Try analyze/err/elv translate/since
Anjan Ganguly
Frequent Advisor

Re: Seeking help in Crash Dump analysis.

Richard,
When I try analyze/err, it says "New header format found. Install Decevent and run conversion utility."

My OpenVMS version is V7.2-1.
I am not able to run the command you gave.
Ian Miller.
Honored Contributor

Re: Seeking help in Crash Dump analysis.

Unfortunately you need System Event Analyzer, which is part of WEBES.
Current WEBES runs on Windows only. Older versions were available for OpenVMS.

There is also DECevent - no longer supported but still available for download
http://h18023.www1.hp.com/support/svctools/decevent/index.html#VMS


____________________
Purely Personal Opinion
Jim_McKinney
Honored Contributor

Re: Seeking help in Crash Dump analysis.

> There is also DECevent

However, although DECevent is wonderful for decoding device errors and issues with old AlphaServers, the DS10 was introduced after DECevent support had already been abandoned, so decoding core failures of the DS10 with DECevent will be unsuccessful.
Bob Blunt
Respected Contributor

Re: Seeking help in Crash Dump analysis.

Regardless of the age of the machine in Alphaland, DECevent should decode with enough intelligence to give you at least an indication of whether the problem is CPU, BCACHE, physical memory, etc.

Much would depend on the age, CPU type and speed of the DS10, too. I'd try to run ERRLOG.SYS through DECevent before I went to the pain and trouble of finding and installing WEBES and SEA for that. You may also have trouble finding a WEBES/SEA kit that works on V7.2-1, whereas the DECevent that's available will work on that version.

The system routine in use at the time of the crash is an indicator in this case. That "correctable error" should be your first hint there. BUT keep in mind that some correctable errors don't really point to a hard failure of the hardware. Several correctable BCACHE errors can cause a crash, yet they only need attention if you're getting them several times within one week. While this can be annoying in a highly critical environment, only you and your management can decide whether the frequency exceeds your pain threshold, and then get your maintenance provider (if any) to investigate and fix the problem.
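
As a rough illustration of the "several times in one week" rule of thumb, here is a minimal Python sketch that checks whether a list of error timestamps ever reaches a chosen count inside a rolling seven-day window. It does not read ERRLOG.SYS. It assumes you have already copied the timestamps of the correctable errors by hand from whatever decoded report you have. The function name, the default threshold of 3 and the sample dates are all invented for the example and are not taken from this system.

```python
from datetime import datetime, timedelta


def exceeds_weekly_threshold(timestamps, threshold=3, window=timedelta(days=7)):
    """Return True if `threshold` or more events fall inside any span shorter than `window`.

    `threshold=3` is an arbitrary stand-in for "several"; choose your own value.
    """
    events = sorted(timestamps)
    start = 0
    for end, current in enumerate(events):
        # Move the left edge forward until events[start] is less than
        # one window older than the current event.
        while current - events[start] >= window:
            start += 1
        # Number of events between the left edge and the current event, inclusive.
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical timestamps, typed in by hand; not taken from any real log.
sample = [
    datetime(2010, 3, 1, 10, 45),
    datetime(2010, 3, 3, 2, 10),
    datetime(2010, 3, 6, 18, 30),
    datetime(2010, 3, 20, 9, 0),
]

if __name__ == "__main__":
    print(exceeds_weekly_threshold(sample))
```

The sliding window walks the sorted list once, so the check stays cheap however many entries you paste in. What counts as "several" remains a judgment call for you and your management.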

Of course the alternate solution or idea would be to contact HP and make a copy of the dump available for their perusal. If, of course, they handle your maintenance... Otherwise the work would be per-call for the repair.

bob
Hoff
Honored Contributor

Re: Seeking help in Crash Dump analysis.

This box isn't worth fixing.

If your management is willing to waste more money (by attempting to save money) then start building up spare parts.

If your management is willing to waste even larger buckets of money (by not upgrading software), you're going to end up figuring out how to get those error logs decoded across versions and boxes, and you're in the range where four or more tools can be required, depending on the error: ELV (which arrived at V7.3), ANALYZE /ERROR (which was very problematic in this range), DECevent DIAGNOSE, and the replacement for DECevent known as SEA / WEBES / WEBEM / WBEM or whatever it is now known as. (Downloads for the older OpenVMS-based versions are/were available. The current stuff requires a client and installing some software over on Windows; the version doesn't usually matter here, as the hardware being diagnosed is old enough.)

Move to emulation, or move to newer (new or used) hardware (probably Itanium, as all of the Alpha stuff is getting pretty old), get your organization a formal support and escalation path, or start accumulating parts and skills to maintain this box within your organization.

As for your immediate requirements, re-seat everything, and expect to need to swap memory and replace disks, and quite possibly more parts.

In summation, your management needs a replacement path for this configuration.
Anjan Ganguly
Frequent Advisor

Re: Seeking help in Crash Dump analysis.

I have replaced the motherboard of the machine.