Operating System - OpenVMS

berlioz
Regular Advisor

BUGCHECK after reboot

Hello,

I'm currently running an Alpha DS25 under VMS 7.3-2.

Since I installed the patch "DEC-AXPVMS-TCPIP-ECO-V0504-152.1", I sometimes (not always) see the following problem after a reboot:

The startup procedure stops right after reporting that the EIA0 port has been set to 100 Mb Full by the console.

Then the system reboots and a bugcheck is reported (process: STARTUP, image: SYSMAN.EXE).

After that, the system reboots again and starts normally.

What exactly happened? What is the problem?

I need help.
19 REPLIES
Willem Grooters
Honored Contributor

Re: BUGCHECK after reboot

One thing that comes to mind: although you set your port to 100 Mb full duplex, what is your switch doing? IIRC there are issues with certain combinations of NIC and switch that cause trouble. I cannot imagine why this should be a problem here, though, unless you access the network during your startup (DECnet!) and TCPIP is doing something nasty.

The most annoying part of your question is:

... sometimes (not always) ...


Does this also happen if you cycle power instead of rebooting without a power cycle? If so, it _might_ be an idea to force a bus reset prior to booting. There is an SRM variable for that, but I cannot recall its name offhand.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Ian Miller.
Honored Contributor

Re: BUGCHECK after reboot

If you have a support contract for this system, send the CLUE crash file (it should be in SYS$ERRORLOG) to HP, where it can be looked at. It may be due to boot_reset issues, as Willem said, or many other things.
____________________
Purely Personal Opinion
labadie_1
Honored Contributor

Re: BUGCHECK after reboot

And about SRM variables: memory_test should be set to full and nothing else. Partial can cause subtle and annoying problems.
Willem Grooters
Honored Contributor

Re: BUGCHECK after reboot

Ian,

two remarks:

Given the location, it's quite possible that there is no crash dump (SYSDUMP.DMP) at all, an incomplete (or even invalid) one, or no valid data in SYS$ERRORLOG.DMP. (Or am I mistaken here?) I had similar problems just a month or so ago and got NO data at all, except on the (graphics) console.

Even supposing there is a crash dump, it would be of no use if there is no support contract...

Gerard,
Good suggestion. I'll do it myself (I had memory problems); new thread on this.
Berlioz: follow that recommendation, given the intermittent occurrence.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: BUGCHECK after reboot

Without the output of at least the 1st page of SDA> CLUE CRASH from the dumpfile (which is part of the CLUE file, as mentioned by Ian - see CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS) everything is speculation...

SYSMAN may just be executing IO AUTO and touching the LAN interface, which may have been in a strange state after shutdown.

Volker.
berlioz
Regular Advisor

Re: BUGCHECK after reboot

Thanks to all (and I apologize for my poor English).

I have no support contract at this time (I've asked my sales representative for one, but...).

> "sometimes": it seems to work better when the speed is fixed on the switch (100 Full) and when I start the computer from the power button.

> Here's the first page of clue**.lis:


OpenVMS (TM) Operating System, Version V7.3-2 -- System Dump Analysis 3-SEP-2004 13:59:25.33 Page 1
Crashdump Summary Information:



Crash Time: 3-SEP-2004 13:59:25.33
Bugcheck Type: MACHINECHK, Machine check while in kernel mode
Node: PC1CA2 (Standalone)
CPU Type: AlphaServer DS25
VMS Version: V7.3-2
Current Process: STARTUP
Current Image: PC1CA2$DKA0:[SYS0.SYSCOMMON.][SYSEXE]SYSMAN.EXE
Failing PC: FFFFFFFF.80018088 EXE$SYSTEM_CORRECTED_ERROR_C+00768
Failing PS: 30000000.00001F04
Module: SYS$CPU_ROUTINES_2608 (Link Date/Time: 1-OCT-2003 21:19:12.40)
Offset: 00008088

Boot Time: 3-SEP-2004 13:59:21.00
System Uptime: 0 00:00:04.33
Crash/Primary CPU: 00/00
System/CPU Type: 2608
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 1024 MByte (4194304 PFNs, discontiguous memory)
Dumpfile Pagelets: 30367 blocks
Dump Flags: writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 1 Pagefile and 1 Swapfile installed

Stack Pointers:
KSP = 00000000.7FF87870 ESP = 00000000.7FF8BA00 SSP = 00000000.7FF9CD00
USP = 00000000.7AE7D370

General Registers:
R0 = 00000000.00000210 R1 = 00F00000.00000080 R2 = FFFFFFFF.8103D010
R3 = 00000000.00000000 R4 = 00000000.000000B0 R5 = 80000000.00000000
R6 = 00000000.00000042 R7 = 00000000.00000038 R8 = FFFFFFFF.811F8BB4
R9 = 00000000.00000005 R10 = FFFFFFFF.8387C088 R11 = FFFFFFFF.8100A7A0
R12 = FFFFFFFF.810414D0 R13 = FFFFFFFF.8104A930 R14 = FFFFFFFF.810419F0
R15 = FFFFFFFF.8104AAB0 R16 = 00000000.00000215 R17 = 00F00000.00000080
R18 = 00700000.00000080 R19 = 00000000.00000000 R20 = FFFFFFFF.80018054
R21 = 00000000.0000000D R22 = FFFFFFFF.00000000 R23 = 00000000.00000000
R24 = FFFFFFFF.8140E000 AI = 00000000.00000001 RA = FFFFFFFF.80004030
PV = FFFFFFFF.81008000 R28 = FFFFFFFF.80018054 FP = 00000000.7FF87870
PC = FFFFFFFF.8001808C PS = 30000000.00001F04

System Registers:
Page Table Base Register (PTBR) 00000000.00002128
Processor Base Register (PRBR) FFFFFFFF.8140E000
Privileged Context Block Base (PCBB) 00000000.0424E080
System Control Block Base (SCBB) 00000000.00000392
Software Interrupt Summary Register (SISR) 00000000.00000000
Address Space Number (ASN) 00000000.000000FD
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.0000000F
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.0000001F
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) FFFFFEFC.00000000

OpenVMS (TM) Operating System, Version V7.3-2 -- System Dump Analysis 3-SEP-2004 13:59:25.33 Page 2
Crashdump Summary Information:



Failing Instruction:
EXE$SYSTEM_CORRECTED_ERROR_C+00768: BUGCHK

Instruction Stream (last 20 instructions):
EXE$SYSTEM_CORRECTED_ERROR_C+00718: LDQ R26,#XFF60(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+0071C: BIS R31,#X21,R16
EXE$SYSTEM_CORRECTED_ERROR_C+00720: BIS R31,#X01,R25
EXE$SYSTEM_CORRECTED_ERROR_C+00724: LDBU R3,(R3)
EXE$SYSTEM_CORRECTED_ERROR_C+00728: BLBC R3,#X000002
EXE$SYSTEM_CORRECTED_ERROR_C+0072C: LDQ R27,#XFF68(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+00730: JSR R26,(R26)
EXE$SYSTEM_CORRECTED_ERROR_C+00734: LDL R3,#X0010(FP)
EXE$SYSTEM_CORRECTED_ERROR_C+00738: LDQ R26,#XFFE0(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+0073C: BIS R31,#X07,R16
EXE$SYSTEM_CORRECTED_ERROR_C+00740: BIS R31,#X01,R25
EXE$SYSTEM_CORRECTED_ERROR_C+00744: BEQ R3,#X000006
EXE$SYSTEM_CORRECTED_ERROR_C+00748: LDL R3,#XFE98(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+0074C: LDQ R27,#XFFE8(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+00750: JSR R26,(R26)
EXE$SYSTEM_CORRECTED_ERROR_C+00754: BIS R3,#X05,R16
EXE$SYSTEM_CORRECTED_ERROR_C+00758: BUGCHK
EXE$SYSTEM_CORRECTED_ERROR_C+0075C: BR R31,#X000003
EXE$SYSTEM_CORRECTED_ERROR_C+00760: LDL R0,#XFE98(R2)
EXE$SYSTEM_CORRECTED_ERROR_C+00764: BIS R0,#X05,R16
EXE$SYSTEM_CORRECTED_ERROR_C+00768: BUGCHK
EXE$SYSTEM_CORRECTED_ERROR_C+0076C: BIS R31,FP,SP
EXE$SYSTEM_CORRECTED_ERROR_C+00770: LDQ R26,#X0018(FP)
EXE$SYSTEM_CORRECTED_ERROR_C+00774: LDQ R2,#X0020(FP)
EXE$SYSTEM_CORRECTED_ERROR_C+00778: LDQ R3,#X0028(FP)

OpenVMS (TM) Operating System, Version V7.3-2 -- System Dump Analysis 3-SEP-2004 13:59:25.33 Page 3
Current Registers: Process index: 0003 Process name: STARTUP PCB: 814F8FC0 (CPU 0)



R0 = 00000000.00000210 %SYSTEM-W-RESULTOVF, resultant string overflow
R1 = 00F00000.00000080
R2 = FFFFFFFF.8103D010 SMP_STD$EXTENDED_HW_SETUP+00260
R3 = 00000000.00000000
R4 = 00000000.000000B0
R5 = 80000000.00000000
R6 = 00000000.00000042
R7 = 00000000.00000038
R8 = FFFFFFFF.811F8BB4 SYS$GHDRIVER+017B4
R9 = 00000000.00000005
R10 = FFFFFFFF.8387C088
R11 = FFFFFFFF.8100A7A0 SMP$GQ_DEBUG
R12 = FFFFFFFF.810414D0 SYS$OPDRIVER+100D0
R13 = FFFFFFFF.8104A930 SCS$GA_LOCALSB+00230
R14 = FFFFFFFF.810419F0 CON$INIT_CTY+001A0
R15 = FFFFFFFF.8104AAB0 SCS$GA_LOCALSB+003B0
R16 = 00000000.00000215
R17 = 00F00000.00000080
R18 = 00700000.00000080
R19 = 00000000.00000000
R20 = FFFFFFFF.80018054 EXE$SYSTEM_CORRECTED_ERROR_C+00734
R21 = 00000000.0000000D
R22 = FFFFFFFF.00000000
R23 = 00000000.00000000
R24 = FFFFFFFF.8140E000 MP_CPU (CPU Id 0)
AI = 00000000.00000001
RA = FFFFFFFF.80004030 IOC_STD$ZERO_LOCAL_BITMAP_C
PV = FFFFFFFF.81008000 EXE$GR_SYSTEM_DATA_CELLS
R28 = FFFFFFFF.80018054 EXE$SYSTEM_CORRECTED_ERROR_C+00734
FP = 00000000.7FF87870
PC = FFFFFFFF.8001808C EXE$SYSTEM_CORRECTED_ERROR_C+0076C
PS = 30000000.00001F04 Kernel Mode, IPL 31, Interrupt

OpenVMS (TM) Operating System, Version V7.3-2 -- System Dump Analysis 3-SEP-2004 13:59:25.33 Page 4
Stack Decoder:


Volker Halle
Honored Contributor

Re: BUGCHECK after reboot

So it's a MACHINECHK (whenever you talk about a system crash, try to at least mention the bugcheck type in the problem description), which is more likely to be caused by hardware, but can also be due to drivers incorrectly accessing/loading device registers.
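As an aside, here is a minimal sketch (not part of the original reply) of how the fields worth quoting in a problem description could be pulled out of a CLUE listing. It assumes the .LIS file has been copied off the VMS system as plain text and that it uses the "Label: value" layout seen in the listing posted above. The function name summarize_clue and the choice of fields are illustrative assumptions only.

```python
# Labels as they appear on the "Crashdump Summary Information" page
# of the CLUE listing posted earlier in this thread.
FIELDS = (
    "Bugcheck Type",
    "Current Process",
    "Current Image",
    "Failing PC",
    "System Uptime",
)

# A few lines copied from that listing, used here as sample input.
SAMPLE = """\
Crash Time: 3-SEP-2004 13:59:25.33
Bugcheck Type: MACHINECHK, Machine check while in kernel mode
Current Process: STARTUP
Current Image: PC1CA2$DKA0:[SYS0.SYSCOMMON.][SYSEXE]SYSMAN.EXE
Failing PC: FFFFFFFF.80018088 EXE$SYSTEM_CORRECTED_ERROR_C+00768
System Uptime: 0 00:00:04.33
"""


def summarize_clue(text):
    """Return the selected summary fields found in a CLUE listing as a dict."""
    summary = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons
        # (device names, times) are kept whole.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in FIELDS and label not in summary:
            summary[label] = value.strip()
    return summary


for name, value in summarize_clue(SAMPLE).items():
    print(f"{name}: {value}")
```

Only the first occurrence of each label is kept, since the summary page comes first in the listing and later pages repeat some of the same words in other contexts.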

For a MACHINECHK crash, start with SDA> CLUE ERRLOG to extract the errlog information from the dumpfile and analyse CLUE$ERRLOG.SYS with your favourite ERRLOG analyzer tool (DECevent, SEA/WSEA, ELV).

Starting with V7.3-1, you can also find the Interrupted PC/PS correctly decoded on the stack. Look in the CLUE file for Saved PC/Saved PS below Interrupt/Exception Frame: and check the symbolic address (execlet/driver) of the saved PC. It can be a hint where the problem may be coming from.

Volker.
berlioz
Regular Advisor

Re: BUGCHECK after reboot

Volker,

How do I get to the SDA> prompt?

I don't have a CLUE$ERRLOG.SYS file either...
Volker Halle
Honored Contributor

Re: BUGCHECK after reboot

Some crashdump basics:

$ ANAL/CRASH name-of-system-dump-file (by default: SYS$SYSTEM:SYSDUMP.DMP, but can be on another disk when using DOSD = Dump Off System Disk).

The System Dump Analyzer (SDA) will prompt you with SDA>

SDA> CLUE ERRLOG will show you the errlog buffers from the dumpfile and extract them into CLUE$ERRLOG.SYS file in your current default directory.

Volker.