
Re: Sysdump Analysis

 
Rajarshi Gupta
Frequent Advisor

Sysdump Analysis

Hi,

We recently had a strange system crash, with no errors in the operator log or on the application side. It looks like an internal system problem, and the system dump needs to be analysed. Could you please let me know how to send the system dump for analysis?

Hardware: VA 4100
O/S: OpenVMS 7.1

Thanks in advance for an early response.
23 Replies
Volker Halle
Honored Contributor

Re: Sysdump Analysis

Rajarshi,

if you have a service contract with HP (or any other service provider), you should first log a call. With the call, you should supply the following ASCII text file from your crashed system (assuming VA 4100 means VAX 4000-100): CLUE$OUTPUT:CLUE$LAST_nodename.LIS (this is the so-called CLUE file and contains important information from the system dump). Your service provider will then tell you whether the dump file is needed and how to deliver the binary dump file (SYSDUMP.DMP).

Feel free to provide the CLUE file as a text attachment in your reply to this thread and I'll have a look - I'm somewhat experienced with crash analysis ;-)

Volker.
Ian Miller.
Honored Contributor

Re: Sysdump Analysis

Rajarshi,
as well as posting the CLUE file as requested, perhaps you can find the time to assign some points. See
http://forums1.itrc.hp.com/service/forums/helptips.do?#33

You can use the following link to find your previous messages.
http://forums1.itrc.hp.com/service/forums/pageList.do?userId=CA1255548&listType=unassigned&forumId=1
____________________
Purely Personal Opinion
John Travell
Valued Contributor

Re: Sysdump Analysis

Rajarshi,
If you do NOT have a service supplier capable of providing you with crash analysis, please say so as there are a few people in this group who have the skills.
Is this machine an AlphaServer 4100 or a VAX 4000-100? The difference is quite critical ...
Today I can offer (chargeable) analysis of Alpha crashdumps on versions no-longer supported by HP. I cannot offer VAX crash analysis as I don't have a VAX.

Please follow the advice from Volker, attach the CLUE file to a reply and we will be able to tell you a lot more about the crash than we currently know, maybe even a complete solution.

John Travell.
comarow
Trusted Contributor

Re: Sysdump Analysis

Please provide information.
At least send the output of
anal/crash sysdump.dmp

sda>clue crash


Give people a chance to help you.

Bob
Daniel Fernandez Illan
Trusted Contributor

Re: Sysdump Analysis

Hi
You can also use ANAL/ERROR/ELV to check some crashes.
Saludos.
Daniel.
John Travell
Valued Contributor

Re: Sysdump Analysis

Daniel,
maybe, but only to a very limited extent, and even then it is really only of value if the crash was caused by a hardware detected error.
Remember, ANAL/ERROR/ELV only has access to the errorlog, not the crashdump.
Bugcheck entries in the errorlog rarely have any intrinsic value as such, and really are little more than a note that the event occurred.
If the crash WAS caused by a hardware problem, there are sometimes useful entries preceding the bugcheck entry.
The only time in 16 years of VMS crash analysis that I found the bugcheck errorlog entries to be truly useful was when looking at a collected set of about 50 such entries. I was able to show that there were 3 different crash patterns, and that the solution for the current crash was unlikely to fix all 3. It didn't.
Rajarshi Gupta
Frequent Advisor

Re: Sysdump Analysis

Hi Volker
Please find the attached CLUE file as requested. Please let me know if you find anything once you have analysed it.

Thanks
Volker Halle
Honored Contributor

Re: Sysdump Analysis

Rajarshi,

this is an OpenVMS VAX V7.1 HALT restart crash, so most of the tips given in previous replies do not apply. (SDA> CLUE ... only works on OpenVMS Alpha, ANAL/ERR/ELV as well).

The current image SYSTSQ.EXE has executed a HALT instruction in KERNEL mode - this caused a HALT system crash, as the console halt variable is set to RESTART.

You need to examine the instruction stream in the dump to determine, WHY this may have happened:

$ ANAL/CRASH SYS$SYSTEM
SDA> EXA/INS 7E08-40;50

Please post the results of the above examine command so that we can confirm that the crash happened in a valid instruction stream. The other possibility is that the PC was incorrectly pointing into a data area. The HALT instruction is a binary ZERO, so a zero byte in a data area would execute as a HALT.
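As a quick check of which addresses that examine command covers, here is a minimal Python sketch. It assumes SDA reads both numbers as hexadecimal and that `start;length` means `length` bytes beginning at `start`; the helper name `sda_range` is made up for illustration and is not part of SDA.

```python
def sda_range(pc: int, back: int, length: int) -> tuple[int, int]:
    """Return (first, last) address covered by 'EXA/INS pc-back;length'.

    Assumption: SDA treats the numbers as hex and 'start;length' as a
    byte count starting at 'start'.
    """
    start = pc - back
    end = start + length - 1
    return start, end


# EXA/INS 7E08-40;50
first, last = sda_range(0x7E08, 0x40, 0x50)
print(f"{first:08X}..{last:08X}")  # prints 00007DC8..00007E17
```

Under these assumptions the window runs from 00007DC8 to 00007E17, so it includes both the code leading up to the reported PC and the PC itself.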

Please find out who is supporting this application image, as you will probably need both source listings and the linker map to find out why the application may have issued a HALT instruction in kernel mode.

Volker.
John Travell
Valued Contributor

Re: Sysdump Analysis

Rajarshi,
Can you also do an 'SDA> show process/image' and return the results ? While the PC is listed as being in image SYSTSQ.EXE;56, and will be at offset 1808 from the image base, the results from 'show process/image' will tell us more about which module the failing PC is located in.
Of course, this all presumes that the PC is in legitimate code.
A thought: the image is version 56. When was this image linked? Recently?
JT:
Rajarshi Gupta
Frequent Advisor

Re: Sysdump Analysis

Volker,
Please find the output of the command
$ analyse/crash sys$system
SDA> EXA/INS 7E08-40;50

OpenVMS (TM) VAX System dump analyzer

Dump taken on 29-JUN-2005 21:20:01.80
HALT, Halt instruction restart

%SDA-W-INSKIPPED, unreasonable instruction stream - 1 bytes skipped
00007DC9: ADDL2 R4,R2
00007DCC: REMQUE @00(R2),-(SP)
00007DD0: BVS 00007DDB
00007DD2: CALLS #01,00007200
00007DD9: BRB 00007DCC
00007DDB: PUSHL R3
00007DDD: CALLS #01,00007200
00007DE4: MOVL #09808001,R0
00007DEB: RET
00007DEC: MOVZWL #00,@04AC(R0)
00007DF1: TSTF @-2BAB(R4)
%SDA-E-NOINSTRAN, cannot translate instruction
Process index: 00AB Name: SYSTSQ Extended PID: 000002AB
-----------------------------------------------------------

John,
Please find the output of the SDA> Show Process/image command below.


Process activated images
------------------------

ICB      Start    End      Type           Image Name  Major ID,Minor ID
-------- -------- -------- -------------- -----------------------------
7FFBA148 00006600 000109FF MAIN           SYSTSQ      0,0
7FFBA1B8 00000200 000065FF GLOBAL PRT SHR DFILES      0,0
7FFBAD00 0008F600 0013B1FF GLOBAL SHR     MCLIB       1,1
7FFBAD70 0007C400 0007C9FF GLOBAL PRT SHR MCIOS       1,1
7FFBA228 00085E00 0008F5FF GLOBAL PRT SHR MCPRIV      1,1
7FFBA308 0007CA00 00085DFF GLOBAL SHR     PASRTL      1,103
7FFBADE0 00073400 0007C3FF GLOBAL SHR     FORRTL      1,100
7FFBAE50 00049000 000733FF GLOBAL SHR     MTHRTL      129,32781
7FFBAEC0 0003C000 00048FFF GLOBAL SHR     SORTSHR     2,29
7FFBAF30 00010A00 000191FF GLOBAL SHR     LIBRTL2     1,12
7FFBA298 00019200 0003BFFF GLOBAL SHR     LIBRTL      1,14
7FFBB400 014E6000 014E77FF MERGED         SORTMSG     0,0


Press RETURN for more.
Process index: 00AB Name: SYSTSQ Extended PID: 000002AB
-----------------------------------------------------------

ICB      Start    End      Type           Image Name  Major ID,Minor ID
-------- -------- -------- -------------- -----------------------------
7FFBB390 014E7800 014E9DFF MERGED SHR     PASMSG      0,0
7FFBB320 014E9E00 014EB1FF MERGED SHR     TRACE       0,0
7FFBB2B0 014EB600 014F7BFF MERGED SHR     DBGTBKMSG   0,0

Total images = 15 Pages allocated = 2660

Please let me know your analysis. I am attaching the file also.


Volker Halle
Honored Contributor

Re: Sysdump Analysis

Rajarshi,

instruction address 7E08 is in MAIN SYSTSQ image.
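For anyone who wants to repeat that lookup for other addresses, here is a minimal Python sketch. It uses a few Start/End ranges copied from the Show Process/image output posted above; the names `IMAGES` and `locate` are hypothetical and only for illustration.

```python
# (image name, start, end) taken from the SHOW PROCESS/IMAGE listing
IMAGES = [
    ("SYSTSQ", 0x00006600, 0x000109FF),
    ("DFILES", 0x00000200, 0x000065FF),
    ("LIBRTL2", 0x00010A00, 0x000191FF),
    ("LIBRTL", 0x00019200, 0x0003BFFF),
    ("MCLIB", 0x0008F600, 0x0013B1FF),
]


def locate(pc: int):
    """Return (image name, offset from image base) or None if no match."""
    for name, start, end in IMAGES:
        if start <= pc <= end:
            return name, pc - start
    return None


name, offset = locate(0x7E08)
print(name, f"{offset:X}")  # prints SYSTSQ 1808
```

The offset 1808 is the value to look up in the linker map for SYSTSQ, and the same call with 7FA1 (the saved return PC) also lands in SYSTSQ.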

But the instruction stream decode failed to decode an instruction and gave up...

You need to try:

SDA> EXA/INS 7E08
SDA> EXA/INS 7E08-1;10
SDA> EXA/INS 7E08-2;10
...

and so on until you get a valid instruction stream. Instructions on VAX are variable length, so sometimes you have to try some different starting offsets until you get a valid instruction stream.
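To see why the starting offset matters, here is a deliberately tiny Python sketch of a decoder. It is only an illustration, not how SDA works: it knows six real VAX opcodes (HALT 00, NOP 01, RET 04, ADDL2 C0, MOVL D0, PUSHL DD) and only register-mode operands (specifier byte 5n), and the byte string is made up for the example.

```python
# opcode -> (mnemonic, number of operand specifiers)
OPCODES = {
    0x00: ("HALT", 0), 0x01: ("NOP", 0), 0x04: ("RET", 0),
    0xC0: ("ADDL2", 2), 0xD0: ("MOVL", 2), 0xDD: ("PUSHL", 1),
}


def decode(code: bytes, start: int) -> list[str]:
    """Decode from 'start' until the end or the first undecodable byte."""
    out = []
    i = start
    while i < len(code):
        if code[i] not in OPCODES:
            out.append(f"?? cannot translate byte {code[i]:02X}")
            break
        name, operand_count = OPCODES[code[i]]
        i += 1
        regs = []
        for _ in range(operand_count):
            # only register mode (high nibble 5) is handled in this toy
            if i >= len(code) or code[i] >> 4 != 5:
                out.append(f"?? {name}: unsupported operand")
                return out
            regs.append(f"R{code[i] & 0x0F}")
            i += 1
        out.append(f"{name} {','.join(regs)}".strip())
    return out


CODE = bytes([0xC0, 0x54, 0x52, 0xDD, 0x53, 0x04])
print(decode(CODE, 0))  # ['ADDL2 R4,R2', 'PUSHL R3', 'RET']
print(decode(CODE, 1))  # ['?? cannot translate byte 54']
```

Starting one byte late lands in the middle of the ADDL2 operands, and the decode falls apart immediately. Starting at offset 3 happens to hit an instruction boundary again and gives PUSHL R3, RET, which is the effect you are looking for when you vary the offset in EXA/INS.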

Also try:

SDA> EXA/INS 7DEE;30

You can also do

SDA> EXA 7DEC:7E08

to dump the hex contents of memory preceding 7E08 - it's then more work, as you have to decode the instructions manually...

To look at the code calling the current i-stream try:

SDA> EXA/INS 7FA1-20;30

Volker.
John Travell
Valued Contributor

Re: Sysdump Analysis

Rajarshi,
Your profile says you are in the UK, can you contact me offline ? Look in my profile for human readable version of my Email address.

Also, could you assign points to the responses to your various questions. It is the only way that those of us able to respond get any credit for doing so. If you have lost the links to old questions, look in your own profile for them.
Rajarshi Gupta
Frequent Advisor

Re: Sysdump Analysis

Volker,

Your advice is going over my head; I am not able to understand it. Could you please elaborate? I tried all of the instructions but didn't get any clue.
John,

Sorry for not assigning ratings earlier. I will rate the replies to all my previous questions. As you asked me to contact you, could you please let me know your email ID?

Thanks all of you for help.
Rajarshi Gupta
Frequent Advisor

Re: Sysdump Analysis

John,

I am unable to find any link to my earlier questions in my profile. Please suggest where I can see my previous questions and assign ratings to them.
Volker Halle
Honored Contributor

Re: Sysdump Analysis

Rajarshi,

you'll find John's email address by clicking on the link from his name in ITRC.

If you click on your user name, you'll find the following under basic information:

I have assigned points to 0 of 30 responses to my questions.

Click on my questions.


The SYSTSQ.EXE image has a routine running in kernel mode. This routine has executed a HALT instruction (binary opcode 0), which has caused the system to crash.

The Program Counter reported on the stack in the CLUE file is:

7FFE7760 00007E08 <= Exception PC
7FFE7764 00C00000 <= Exception PSL (cur mode: kernel)
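For reference, the mode fields in that PSL value can be pulled apart as in the following minimal Python sketch. It assumes the standard VAX PSL layout (current mode in bits 25:24, previous mode in bits 23:22, IPL in bits 20:16, interrupt stack flag in bit 26); the function name `decode_psl` is made up for illustration.

```python
MODES = ("kernel", "executive", "supervisor", "user")


def decode_psl(psl: int) -> dict:
    """Extract a few fields from a VAX PSL longword."""
    return {
        "current_mode": MODES[(psl >> 24) & 0x3],
        "previous_mode": MODES[(psl >> 22) & 0x3],
        "ipl": (psl >> 16) & 0x1F,
        "interrupt_stack": bool((psl >> 26) & 1),
    }


fields = decode_psl(0x00C00000)
print(fields["current_mode"], fields["previous_mode"], fields["ipl"])
# prints: kernel user 0
```

So 00C00000 decodes as current mode kernel, previous mode user, at IPL 0, which fits a process that changed mode to kernel from user mode.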

SDA> EXA/INS 7E08
should report a HALT instruction.

The next question to answer is: HOW did this routine get to PC = 7E08 ?

The last return PC on the stack is:

7FFE7778 00007FA1 <= Saved PC

The program has issued a CALLS #1,address to the current routine and while executing this routine, has arrived at the HALT instruction at PC=00007E08

SDA> EXA/INS 7FA1-10;10

should show the CALLS instruction and the address of the routine called. You could work from there to PC=00007E08

As VAX instructions are variable length, when working backwards, EXA/INS pc-n;n does not always allow SDA to decode the instruction stream, so you sometimes have to vary n (try n+1,n+2 etc.) until you get a meaningful instruction stream.

Just trust me, I've done this often enough ;-)

Doing crashdump analysis via a forum like this, may not be most efficient, but it works - step by step.

Please try to execute the instructions given and post the results, then we'll see further.

Volker.
Ian Miller.
Honored Contributor

Re: Sysdump Analysis

for help on points see
http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Use the following link to display your questions with replies with unassigned points
http://forums1.itrc.hp.com/service/forums/pageList.do?userId=CA1255548&listType=unassigned&forumId=1
____________________
Purely Personal Opinion
Ian Miller.
Honored Contributor

Re: Sysdump Analysis

John's email address is

john at jomatech dot com

he has a web site at
www dot jomatech dot com
____________________
Purely Personal Opinion
Richard White_5
Advisor

Re: Sysdump Analysis

Good Morning Rajarshi...

Looks like you have received excellent suggestions from both Volker and John. There is certainly the possibility that your "Halt-Instruction" could be caused from recent code modifications to your main SYSTSQ image. From the small PC-Trace sample, we can see that you are using your Stack-Region to contain data-structures or elements in a doubly-linked list/queue. The code does make a check for an empty-queue via the "BVS" (branch-on-overflow-set) instruction after the REMQUE instruction.
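As a rough illustration of the REMQUE / BVS loop at 7DCC to 7DD9, here is a small Python model. It is a sketch only: it assumes the usual VAX absolute queue layout (forward link in the first longword, backward link in the second), that REMQUE sets V when the addressed entry is just the header of an empty queue, and the addresses and the names `remque` and `drain` are invented for the example.

```python
def remque(mem: dict, entry: int):
    """Unlink 'entry' from the queue; return (entry, v_flag).

    mem maps an address to [forward link, backward link].
    v_flag models the V condition code: nothing was there to remove.
    """
    flink, blink = mem[entry]
    v_flag = entry == blink      # an empty header points at itself
    mem[blink][0] = flink        # predecessor now points forward past entry
    mem[flink][1] = blink        # successor now points back past entry
    return entry, v_flag


def drain(mem: dict, header: int, handler) -> None:
    """Model of: REMQUE @0(R2),-(SP) / BVS done / CALLS #1,routine / BRB loop."""
    while True:
        entry, v_flag = remque(mem, mem[header][0])
        if v_flag:               # BVS: queue was empty, leave the loop
            break
        handler(entry)           # CALLS #1 with the removed entry as argument


HEADER = 0x1000
mem = {
    HEADER: [0x2000, 0x3000],
    0x2000: [0x3000, HEADER],
    0x3000: [HEADER, 0x2000],
}
seen = []
drain(mem, HEADER, seen.append)
print([hex(a) for a in seen])  # prints ['0x2000', '0x3000']
```

In this model the address of each removed entry is what gets handed to the called routine, and the loop ends once the header points back at itself.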

Unfortunately though, it is not evident whether or not the code-stream from 7DC9 to 7DEB is code that is executed in Kernel Mode or User-Mode.

I am always skeptical of manipulating data on the stack, in particular the K-Stack, because the CALLS/G and Ret instructions require consistency on the respective stack for the AP/FP registers. Certainly seen my share of incorrect number of PUSH/POP (or MOVL/Q) combinations to/from the Stack in the past, which in turn modify the contents of a Saved-FP...

It may prove beneficial if, when you are able to supply the source listings, you supply the "Macro-32/Assembly-Code" along with the Fortran and/or Pascal code, as I notice that the Fortran and Pascal RTLs are linked to your SYSTSQ image. With the exact Macro-32 listings, it will be easier to translate the "word" that precedes the actual instruction-stream code. For example, I suspect that there is an entry mask at offset 7DEC for 2 bytes, immediately following the RET instruction at offset 7DEB.

It looks like your Called-Procedures require a parameter (which looks like R3, maybe an IRP/CDRP) and have saved the contents of R4 (PCB) as part of the entry mask. But we might be able to ascertain more info with the listings and map files.

But if there have not been any changes recently to your source code, (even if you are up to version 56), then it is possible that you might have a hardware problem. I have seen ALU/Shift-Registers and IR/IBUF failures causing the auto-increment logic to yield incorrect results, which in turn has caused the RET instruction to POP the FLW (Frame-Longword) off the stack, instead of the Return-PC.

If I remember correctly, the VAX architecture will execute the "00" op-code (Halt) only while in Kernel-Mode. Which means that the Failing-PC (7E08) is an updated PC. The actual Halt-PC may very well be 7E07. If the system was in user/super/exec mode, then the system would get a "rsvd-opcode" fault (on a 00 opcode), and vector thru the SCB, and the Failing-PC would not be updated.

If you are able to execute the commands SDA> exam/inst 7E07 and SDA> exam/inst 7E08, they may decode as Halts, but in reality they could be part of an entry mask field. Hope this helps, and is not too confusing...

Thanx,
whynot3k
comarow
Trusted Contributor

Re: Sysdump Analysis

Because this is a VAX, it will create a clue_last_node.lis file in the sys$manager: directory.

That's a great place to start.

It would help if you would clarify VAX or alpha.

Good Luck.
John Travell
Valued Contributor

Re: Sysdump Analysis

Bob, this is a VAX, with a Halt restart bugcheck.
CPU Type = VAX 4000-105A
VMS Version = V7.1
Unless policy has changed within the last couple of years HP will not look at this UNLESS the customer already has a PVS contract. If not, there is (was?) no HP option for chargeable analysis.
While with enough persistence this group may well find a solution, it would take a fair bit of time. If the customer wants a fix quickly I can supply the chargeable option that is not available from HP.
comarow
Trusted Contributor

Re: Sysdump Analysis

If it has a recognizable footprint we will be glad to look at it. We have a database of crashes we can compare against.
The clue file can be used against the database.

I'll check to see if it's recognizable.
Volker Halle
Honored Contributor

Re: Sysdump Analysis

Rajarshi,

depending on the type of system, the HALT PC (and other registers) may be wiped out by the console during the restart processing. In that case, the PC information will not be available from the dump, but ONLY from the console terminal.

The correct HALT PC value is needed to correctly diagnose the circumstances of the HALT. You may need to connect a printer (better: a notebook with a terminal emulator in data capture mode) to the console line to capture the HALT message and PC.

Volker.
Rajarshi Gupta
Frequent Advisor

Re: Sysdump Analysis

Some other issues have taken precedence over this; I will look into it later.