
Re: Disk File Optimizer Error

 
SOLVED
Zeni B. Schleter
Regular Advisor

Re: Disk File Optimizer Error

I left out that I was talking about the SYSGEN parameter of SMP_SPINWAIT.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

That's OK, I figured it was in sysgen. Currently that parameter is set to 100000 and it does not seem to be a dynamic parameter. Looks like another reboot is in order.

Another question some of you may be able to advise on: should I be concerned about running DFO on a very active disk? The server I am dealing with is an email server, so even if I run the optimizing software at night, certain disks will be fairly active (for example, one disk runs the anti-virus software, another runs the anti-spam software, etc.). In my reading, I have not seen that it is necessary to shut down certain services before running DFO, but I thought that those of you with experience using DFO might have an opinion on this.

Thanks, Lisa
Phillip Thayer
Esteemed Contributor

Re: Disk File Optimizer Error

Lisa,

Zeni is correct. There was a bug with V7.3-1 where the spinwait was set too low, and you had to set it to 300000 or you would suffer from performance problems and system crashes. I have run into this as well.

Set it using MIN_SMP_SPINWAIT in MODPARAMS.DAT, then run AUTOGEN with a reboot.

Phil
Once it's in production it's all bugs after that.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

Also, this document

http://mvb.saic.com/disk$vmsdoc0731/000000/731final/6652/6652pro_009.html

refers to the SMP_LNGSPINWAIT system parameter.
Phillip Thayer
Esteemed Contributor

Re: Disk File Optimizer Error

Lisa,

On the second part of your post, about running DFO on a busy disk: you can do this, but it will take longer to defragment the disk. Once you get DFO working correctly, you can set it up to run all the time with I/O throttled back so that it will not interfere with the applications running on your disk. I'm not sure whether DFO has this, but I know Diskeeper does, so I'm guessing that DFO has I/O throttling too.

Phil
Once it's in production it's all bugs after that.
Jim_McKinney
Honored Contributor

Re: Disk File Optimizer Error

Before concluding that the spinwait timeout was caused by software, you should verify that CPU 1 is currently functional and also inspect the error log for possible failures of CPU 1. Failed CPUs often result in machine checks such as this.

$ show cpu ! are there 2 present?
$ diag/sinc=yest/incl=cpu ! any failure?
$ anal/cras sys$system:sysdump.dmp
SDA> clue errlog
SDA> exit
$ diag/incl=cpu clue$errlog.sys
Phillip Thayer
Esteemed Contributor

Re: Disk File Optimizer Error

Oops, you're right. It should be SMP_LNGSPINWAIT, not SMP_SPINWAIT. Sorry.

Phil
Once it's in production it's all bugs after that.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

System: NEOMN2, AlphaServer DS20 500 MHz

CPU ownership sets:
Active 0,1
Configure 0,1

CPU state sets:
Potential 0,1
Autostart 0,1
Powered Down None
Failover None

I don't have DIAGNOSE installed. I was able to create CLUE$ERRLOG.SYS, but I get a lot of binary output when I use plain ANALYZE. At the end I see this:

Analyze Object File 3-MAR-2006 12:51:25.82 Page 20
D1:[COLLINS]CLUE$ERRLOG.SYS;1
ANALYZ A07-04

This is an OpenVMS VAX object file

3. MODULE HEADER (OBJ$C_HDR_MHD), 1720 bytes
*** End of module record is missing from previous module.

structure level: 0
maximum record size: 0
module name: ""
*** Length of symbol is not within valid range of 1 to 39.
module version: ""
*** Length of symbol is not within valid range of 1 to 31.
creation date/time:
*** Date is not in the correct 17-byte format (dd-mmm-yyyy hh:mm).
*** Object record is longer than maximum size in module header (0 bytes).


*** End of module record is missing from previous module.
*** No psects were defined in this module.


Analyze Object File 3-MAR-2006 12:51:25.82 Page 21
D1:[COLLINS]CLUE$ERRLOG.SYS;1
ANALYZ A07-04

SUMMARY STATISTICS:

Record Type Count Total Bytes

OBJ$C_HDR 2 9912
OBJ$C_GSD 0 0
OBJ$C_TIR 0 0
OBJ$C_EOM 0 0
OBJ$C_DBG 0 0
OBJ$C_TBT 0 0
OBJ$C_LNK 0 0
OBJ$C_EOMW 0 0

Totals 2 9912


The analysis uncovered 13 errors.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

If it is SMP_LNGSPINWAIT, then I am already set to 3000000:

SMP_LNGSPINWAIT 3000000 3000000 1 8388607 10 usec.
Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

Lisa,

When you did an SDA> CLUE ERRLOG to create CLUE$ERRLOG.SYS, can you post the SDA output? If a HW problem is causing the CPUSPINWAIT crash, an errlog entry should immediately precede the crash entry.

The CPUSPINWAIT crash was caused by CPU 00 executing code in system space and holding the SCHED spinlock for too long. We need to find out which code CPU 00 was executing:

$ ANAL/CRASH SYS$SYSTEM:
SDA> READ/EXEC/NOLOG
SDA> SET CPU 00
SDA> EXA/INS 8FA229AC-30;50
SDA> SHOW PROC
SDA> CLUE CALL

You need DECevent (the $ DIAGNOSE command) or even SEA to analyse CLUE$ERRLOG.SYS. There is no need to do this if there were no HW errors immediately preceding the crash (as shown by SDA> CLUE ERRLOG).

Volker.
Jim_McKinney
Honored Contributor

Re: Disk File Optimizer Error

> when I use just analyze

Try

$ ANALYZE/ERROR/ELV TRANS CLUE$ERRORLOG.SYS

rather than plain old ANALYZE (which defaults to /OBJECT). (I'm not sure of that syntax, as I am able to use DECevent on all of my systems.) I don't know whether you'll be able to decode DS20 error packets with that utility, but give it a try.
Zeni B. Schleter
Regular Advisor

Re: Disk File Optimizer Error

SMP_LNGSPINWAIT is 3 million by default. It was SMP_SPINWAIT that was too low and needed to be raised. This doesn't mean that the crash was not caused by a real problem; it could just be that one processor was busy for an unusually long time and the spinwait timed out because the value was too low for your purposes. Also, this was a problem encountered in 7.1, and I don't know whether it persisted in another form later on. A wildcard search through the 7.1 release notes for patches got a lot of hits.

Here is a bit from the ALPPORTS01_071

o Multiprocessor Systems with CIPCAs: SMP_SPINWAIT Restriction

If your system uses a CIPCA adapter and you operate with
MULTIPROCESSING set to a non-zero value, you must reset the
value of the SMP_SPINWAIT parameter to 300000 (3 seconds)
instead of the default 100000 (1 second).

If you do not change the value of SMP_SPINWAIT, a CIPCA adapter
error could generate a CPUSPINWAIT system bugcheck similar to
the following:

**** OpenVMS (TM) Alpha Operating System V7.1 - BUGCHECK ****
** Code=0000078C: CPUSPINWAIT, CPU spinwait timer expired

This restriction will be removed in a future OpenVMS release.

I looked at the patch release notes on a VMS 7.3-2 system, and there were still mentions of CPUSPINWAIT bugchecks.
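
For anyone comparing the numbers in this thread: the SYSGEN display quoted earlier lists the unit for SMP_LNGSPINWAIT as 10 usec, and the release note above equates 100000 with 1 second and 300000 with 3 seconds, which is consistent with the same unit for SMP_SPINWAIT. A minimal Python sketch of that conversion follows. The function name is made up for illustration, and the 10-microsecond unit for SMP_SPINWAIT is inferred from the release note rather than taken from a SYSGEN listing.

```python
# Spinwait parameters are expressed in units of 10 microseconds
# (stated for SMP_LNGSPINWAIT in the SYSGEN output, inferred for SMP_SPINWAIT).
UNIT_MICROSECONDS = 10

def spinwait_seconds(value):
    """Convert a spinwait parameter value (10-usec units) to seconds."""
    return value * UNIT_MICROSECONDS / 1_000_000

for name, value in [
    ("SMP_SPINWAIT (default)", 100000),
    ("SMP_SPINWAIT (release-note value)", 300000),
    ("SMP_LNGSPINWAIT (current setting)", 3000000),
]:
    print(f"{name}: {value} -> {spinwait_seconds(value):g} s")
```

So the 3000000 shown for SMP_LNGSPINWAIT works out to 30 seconds, ten times the 3 seconds the release note asks for on SMP_SPINWAIT.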

Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

All,

another 'speculation' warning !

A CPUSPINWAIT crash can have many different causes. It is possible to extract some more information from the dump to make a more 'educated' guess about the problem. See my previous request for more info...

Lisa,

Please make sure that you enable a log file for the DFG$SCRIPT32 script. Without the error message from the log file (/NOLOG is the default), you open up room for even more speculation:

$ DEFRAGMENT MODIFY DFG$SCRIPT32/LOG=SCRIPT32.LOG

Volker.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

Volker,

Here is the output from creating CLUE$ERRLOG.SYS:

Dumpfile Errorlog Entry Information:
------------------------------------
Sequence  Date         Time         Error Message Type
--------  -----------  -----------  --------------------------------
   38069   3-MAR-2006  07:21:19.81  Timestamp Entry
   38061   3-MAR-2006  07:02:01.03  Volume Mount
   38070   3-MAR-2006  07:26:28.09  * Crash Entry

Config Entry and Errlog Entries written to CLUE$ERRLOG.SYS file, use COMPAQ Analyze or DECevent to analyze.



Here is the output of the additional dump analysis that you outlined.

$ ANAL/CRASH SYS$SYSTEM:

OpenVMS (TM) Alpha system dump analyzer
...analyzing a compressed selective memory dump...

Dump taken on 3-MAR-2006 07:26:28.09
CPUSPINWAIT, CPU spinwait timer expired

SDA> READ/EXEC/NOLOG
%SDA-W-OPENIN, error opening SDA$READ_DIR:RCDDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:RMTDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:LAVDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:PWIPDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:UCXDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:NTYDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:INETDRIVER as input
-RMS-W-FNF, file not found
SDA> SET CPU 00
SDA> EXA/INS 8FA229AC-30;50
FFFFFFFF.8FA2297C: ZAPNOT R20,#XFD,R20
FFFFFFFF.8FA22980: BIS R31,R31,R17
FFFFFFFF.8FA22984: BIS R18,R20,R18
FFFFFFFF.8FA22988: STL R18,#X000C(R16)
FFFFFFFF.8FA2298C: LDA R27,#XFD88(R2)
FFFFFFFF.8FA22990: LDL R22,#X003C(R21)
FFFFFFFF.8FA22994: ADDL R22,#X01,R22
FFFFFFFF.8FA22998: STL R22,#X003C(R21)
FFFFFFFF.8FA2299C: BSR R26,#XFFF70C
FFFFFFFF.8FA229A0: LDL R6,(R6)
FFFFFFFF.8FA229A4: XOR R6,R5,R0
FFFFFFFF.8FA229A8: BNE R0,#XFFFFE9
FFFFFFFF.8FA229AC: LDL R3,(R3)
FFFFFFFF.8FA229B0: BIS R31,R4,R16
FFFFFFFF.8FA229B4: BLBS R3,#X000002
FFFFFFFF.8FA229B8: MTPR IPL
FFFFFFFF.8FA229BC: BR R31,#X000004
FFFFFFFF.8FA229C0: BIS R31,R31,R31
FFFFFFFF.8FA229C4: BIS R31,R4,R0
FFFFFFFF.8FA229C8: LDA R27,#X5730(R2)
FFFFFFFF.8FA229CC: BSR R26,#XFF3FB2
SDA> sh proc

Process index: 013E Name: PMDFFB pmas Extended PID: 0000053E
--------------------------------------------------------------------
Process status: 00040001 RES,PHDRES
status2: 00000001 QUANTUM_RESCHED

PCB address 816D0B00 JIB address 817F2240
PHD address 85C5A000 Swapfile disk address 00000000
KTB vector address 816D0DEC HWPCB address FFFFFFFF.85C5A080
Callback vector address 00000000 Termination mailbox 001D
Master internal PID 0001013E Subprocess count 0
Creator extended PID 00000000 Creator internal PID 00000000
Previous CPU Id 00000000 Current CPU Id 00000000
Previous ASNSEQ 000000000000034A Previous ASN 0000000000000013
Initial process priority 4 # open files remaining 189/200
Delete pending count 0 Direct I/O count/limit 200/200
UIC [00001,000004] Buffered I/O count/limit 10000/10000
Abs time of last event 000251E3 BUFIO byte count/limit 97824/97824
# of threads 1 ASTs remaining 248/250
Swapped copy of LEFC0 00000000 Timer entries remaining 199/200
Swapped copy of LEFC1 00000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 729

Press RETURN for more.
SDA>

Process index: 013E Name: PMDFFB pmas Extended PID: 0000053E
--------------------------------------------------------------------
Global cluster 3 pointer 00000000 Global WS page count 336
PCB Specific Spinlock 816D3A00

Press RETURN for more.
SDA>

Process index: 013E Name: PMDFFB pmas Extended PID: 0000053E
--------------------------------------------------------------------

Thread index: 0000
------------------
Current capabilities: System: 0000000C QUORUM,RUN
User: 00000000
Permanent capabilities: System: 0000000C QUORUM,RUN
User: 00000000
Current affinities: 00000000
Permanent affinities: 00000000
Thread status: 00040001
status2: 00000001

KTB address 816D0B00 HWPCB address FFFFFFFF.85C5A080
PKTA address 7FFEFF98 Callback vector address 00000000
Internal PID 0001013E Callback error 00000000
Extended PID 0000053E Current CPU id 00000000
State CUR 00 Flags 00000000
Base priority 4 Current priority 5
Waiting EF cluster 4 Event flag wait mask FFFFFFFE

Press RETURN for more.
SDA>

Process index: 013E Name: PMDFFB pmas Extended PID: 0000053E
--------------------------------------------------------------------
CPU since last quantum FFF7 Mutex count 0
ASTs active NONE

SDA>
SDA> clue call


Call Chain: Process index: 013E Process name: PMDFFB pmas PCB: 816D0B00 (CPU 0)
---------------------------------------------------------------------------------------
Procedure Frame Procedure Entry Return Address
------------------ ---------------------------------------------- ------------------------------------------------
7FFA1DE8 Null 800B82B0 SMP$ACQUIREL_C
7FFA1E40 Stack 8FA228D8 8FA2342C
7FFA1E80 Stack 8FA233E0 8F9EF994
7AE27DD0 Stack 0005FD00 PMAS_MASTER+0005FD00 000632BC PMAS_MASTER+000632BC
7AE27F80 Stack 0005FD00 PMAS_MASTER+0005FD00 000632BC PMAS_MASTER+000632BC
7AE28130 Stack 0005FD00 PMAS_MASTER+0005FD00 00062D88 PMAS_MASTER+00062D88
7AE282E0 Stack 0005FD00 PMAS_MASTER+0005FD00 000621DC PMAS_MASTER+000621DC

Press RETURN for more.
SDA>

Call Chain: Process index: 013E Process name: PMDFFB pmas PCB: 816D0B00 (CPU 0)
---------------------------------------------------------------------------------------
Procedure Frame Procedure Entry Return Address
------------------ ---------------------------------------------- ------------------------------------------------
7AE28490 Stack 0005FD00 PMAS_MASTER+0005FD00 0006441C PMAS_MASTER+0006441C
7AE28640 Stack 00063B60 PMAS_MASTER+00063B60 00056BF8 PMAS_MASTER+00056BF8
7AE28760 Stack 00056B60 PMAS_MASTER+00056B60 00043DA4 PMAS_MASTER+00043DA4
7AE287C0 Stack 00043670 PMAS_MASTER+00043670 00046408 PMAS_MASTER+00046408
7AE28A10 Stack 00045400 PMAS_MASTER+00045400 00041930 PMAS_MASTER+00041930
7AE28F90 Stack 00040F00 PMAS_MASTER+00040F00 00040068 PMAS_MASTER+00040068


Press RETURN for more.
SDA>

Call Chain: Process index: 013E Process name: PMDFFB pmas PCB: 816D0B00 (CPU 0)
---------------------------------------------------------------------------------------
Procedure Frame Procedure Entry Return Address
------------------ ---------------------------------------------- ------------------------------------------------
7AE29800 Stack 00040000 PMAS_MASTER+00040000 7BCE05B0 PTHREAD$RTL+525B0
7AE29850 Stack 7BCE0320 PTHREAD$RTL+52320 7BCBE31C PTHREAD$RTL+3031C
7AE29A50 Stack 7BCBE140 PTHREAD$RTL+30140 8028D63C SYS$IMGSTA_C+0015C
7AE2DB30 Stack 8028D4E0 SYS$IMGSTA_C 7AF75DD8
7AE2DBB0 Stack 7AF75C2C



Zeni,
You are saying that I should still change my SMP_SPINWAIT from 100,000 to 300,000, correct? Do you think I should also change my SMP_LNGSPINWAIT from 3 million to 9 million, as the 7.3-1 release notes (the link I posted) suggest? They mention modifying this parameter to reduce the likelihood of encountering a problem.


Jim,
The command ANALYZE/ERROR/ELV TRANS CLUE$ERRORLOG.SYS did not work.

Jim_McKinney
Honored Contributor

Re: Disk File Optimizer Error

> ANALYZE/ERROR/ELV TRANS CLUE$ERRORLOG.SYS did not work.

> 38061 3-MAR-2006 07:02:01.03 Volume Mount
> 38070 3-MAR-2006 07:26:28.09 * Crash Entry

There is no need to examine the errorlog entries. As Volker previously stated, if you didn't find an entry that was almost simultaneous with the crash, then there'll be nothing to look at. Let Volker (or whoever) follow the trail through SDA and see if they can find the fault. Again, heed Volker's advice and don't jump to the conclusion that you need only increase the spinwait timeout value to cure your problem. There are many things that can produce failures such as this.
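
To put numbers on "almost simultaneous", here is a rough Python sketch that measures how long before the crash each of the posted errlog entries was written. The timestamps are copied from the CLUE ERRLOG output above; the helper names are made up for illustration, and the `%b` month parsing assumes an English/C locale.

```python
from datetime import datetime

# OpenVMS-style absolute time, e.g. "3-MAR-2006 07:26:28.09".
VMS_TIME_FORMAT = "%d-%b-%Y %H:%M:%S.%f"

def parse_vms_time(stamp):
    """Parse an OpenVMS absolute time string into a datetime."""
    return datetime.strptime(stamp, VMS_TIME_FORMAT)

CRASH = parse_vms_time("3-MAR-2006 07:26:28.09")

# Errlog entries listed by SDA> CLUE ERRLOG (crash entry excluded).
ENTRIES = {
    "Timestamp Entry": parse_vms_time("3-MAR-2006 07:21:19.81"),
    "Volume Mount": parse_vms_time("3-MAR-2006 07:02:01.03"),
}

def seconds_before_crash(entry_time, crash_time=CRASH):
    """Seconds between an errlog entry and the crash entry."""
    return (crash_time - entry_time).total_seconds()

for label, when in ENTRIES.items():
    print(f"{label}: {seconds_before_crash(when):.2f} s before the crash")
```

Both entries are minutes away from the crash, and neither is a hardware error entry, which fits the point that there is nothing in the errlog to chase here.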

Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

Lisa,

The code currently executing on CPU 00 is very likely NOT standard OpenVMS code. SDA should be able to correctly map all OpenVMS drivers to symbolic addresses.

Your system is running MultiNet, and it's very likely that the code running at PC = 8FA229AC is MultiNet code.

The CLUE CALL output also shows return addresses 8FA2342C and 8F9EF994, which SDA did not symbolize; this is another strong indication of non-OpenVMS code.

Please scan the output of SDA> SHOW EXEC to find the loaded executive images that span the three address values above. Each address must fall between the start and end address of an image. Then tell us the name(s) of those image(s).
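
The range check itself is simple enough to sketch in Python. The three addresses are the ones from this thread; the image names and base/end values below are purely hypothetical placeholders (no SHOW EXEC output has been posted), so substitute the rows from your own SHOW EXEC listing.

```python
# Addresses of interest from the dump: the interrupted PC and the two
# return addresses that SDA did not symbolize.
ADDRESSES = [0x8FA229AC, 0x8FA2342C, 0x8F9EF994]

# HYPOTHETICAL (name, base, end) rows standing in for SHOW EXEC output.
# Replace these with the real image names and address ranges.
IMAGES = [
    ("HYPOTHETICAL_IMAGE_A", 0x80000000, 0x8000FFFF),
    ("HYPOTHETICAL_IMAGE_B", 0x8F9E0000, 0x8FA6FFFF),
]

def owning_images(address, images):
    """Return the names of all images whose [base, end] range holds address."""
    return [name for name, base, end in images if base <= address <= end]

for address in ADDRESSES:
    matches = owning_images(address, IMAGES)
    print(f"{address:08X}: {', '.join(matches) if matches else 'no match'}")
```

An address with no match means the list of images is incomplete or the code was loaded some other way, which is itself worth reporting.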

ANAL/ERR/ELV first showed up in V7.3-2, you're running V7.3-1.

Please don't play with your SPINWAIT SYSGEN parameters until you understand the scenario that led to this system crash.

Volker.
Jim_McKinney
Honored Contributor

Re: Disk File Optimizer Error

MultiNet ships with an STB. Might not be terribly useful to load it without knowing MultiNet internals - but at least it may tell you that you are in MultiNet's code.

Insert a

SDA> read multinet:multinet

into Volker's recipe adjacent to the "READ/EXEC" directive and see if more of the address values are symbolized by SDA.
Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

Lisa,

Here are some instructions for obtaining crash info if MultiNet may be involved:

http://www.process.com/tcpip/2005Newsletters/10-2.html

$ def mu$sda multinet:multinet$sda
$ ANAL/CRASH sys$system:
SDA> read/exec
SDA> mu check

then

SDA> CLUE CRASH
SDA> SET CPU 00
SDA> EXA/INS 8FA229AC
SDA> SHOW STACK
SDA> CLUE CALL

Volker.
Zeni B. Schleter
Regular Advisor

Re: Disk File Optimizer Error

According to the release notes, changing SMP_LNGSPINWAIT applies only if you are doing special debugging of drivers and the like. Personally, I don't have occasion (thankfully) to work at that level of coding. The notes also referred to other system parameters that are best left alone under normal operating conditions. You aren't "using one of these special modes (for example, to debug a device driver or other complex application)...", are you? If so, you are way beyond anything I have tackled.

In any case, I have not had a need to change SMP_LNGSPINWAIT.
Lisa Collins
Advisor

Re: Disk File Optimizer Error

Volker,

Process Software did go over similar output with me, and their determination was that I should have HP or someone experienced in VMS also take a look. They said PMAS (PreciseMail antispam, which is also their product and is referenced in the dump) was not the culprit. They did not seem to suspect MultiNet either.

$ define mu$sda multinet:multinet$sda
$ ANAL/CRASH sys$system:

OpenVMS (TM) Alpha system dump analyzer
...analyzing a compressed selective memory dump...

Dump taken on 3-MAR-2006 07:26:28.09
CPUSPINWAIT, CPU spinwait timer expired

SDA> read/exe
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$WSDRIVER.EXE;1
%SDA-W-OPENIN, error opening SDA$READ_DIR:RCDDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:RMTDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:LAVDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:PWIPDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:UCXDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:NTYDRIVER as input
-RMS-W-FNF, file not found
%SDA-W-OPENIN, error opening SDA$READ_DIR:INETDRIVER as input
-RMS-W-FNF, file not found
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$SMDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]NET$LOOP_APPLICATION.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$RTTDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$CTDRIVER.EXE;1
%SDA-I-READSYM, 523 symbols read from SYS$COMMON:[SYS$LDR]NET$OSDRIVER.STB;1
%SDA-I-READSYM, 645 symbols read from SYS$COMMON:[SYS$LDR]NET$OSVCM.STB;1
%SDA-I-READSYM, 500 symbols read from SYS$COMMON:[SYS$LDR]NET$DRIVER.STB;1
%SDA-I-READSYM, 1105 symbols read from SYS$COMMON:[SYS$LDR]NET$TRANSPORT_OSI.STB;1
%SDA-I-READSYM, 652 symbols read from SYS$COMMON:[SYS$LDR]NET$TPCONS.STB;1
%SDA-I-READSYM, 810 symbols read from SYS$COMMON:[SYS$LDR]NET$TRANSPORT_NSP.STB;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]NET$ROUTING_VCM.EXE;1
%SDA-I-READSYM, 1205 symbols read from SYS$COMMON:[SYS$LDR]NET$ROUTING_ES.STB;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]LES$NETMANLDR.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]LES$NETMAN.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]NET$CSMACD.EXE;1
%SDA-I-READSYM, 723 symbols read from SYS$COMMON:[SYS$LDR]SYS$NAME_SERVICES.STB;1
%SDA-I-READSYM, 1244 symbols read from SYS$COMMON:[SYS$LDR]NET$SESSION_CONTROL.STB;1
%SDA-I-READSYM, 92 symbols read from SYS$COMMON:[SYS$LDR]NET$ALIAS.STB;1
%SDA-I-READSYM, 180 symbols read from SYS$COMMON:[SYS$LDR]LES$LES_V30.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$IIDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$DVDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$LRDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$YSDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$DQDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$EWDRIVER_DE500BA.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$GZDRIVER.EXE;1
%SDA-I-READSYM, 30 symbols read from SYS$COMMON:[SYS$LDR]SYS$LAN_CSMACD.EXE;1
%SDA-I-READSYM, 126 symbols read from SYS$COMMON:[SYS$LDR]SYS$LAN.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$EWDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$MKDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$DKDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$PKWDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$PIPEDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$FTDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]NET$MESSAGE.EXE;1
%SDA-I-READSYM, 357 symbols read from SYS$COMMON:[SYS$LDR]NT_EXTENSION.STB;1
%SDA-I-READSYM, 357 symbols read from SYS$COMMON:[SYS$LDR]VMS_EXTENSION.STB;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]DDIF$RMS_EXTENSION.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]RECOVERY_UNIT_SERVICES.STB;1
%SDA-I-READSYM, 108 symbols read from SYS$COMMON:[SYS$LDR]RMS.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$DRDRIVER.EXE;1
%SDA-I-READSYM, 0 symbols read from SYS$COMMON:[SYS$LDR]SYS$TTDRIVER.EXE;1
%SDA-I-READSYM, 4 symbols read from SYS$COMMON:[SYS$LDR]ACME.EXE;1
%SDA-I-READSYM, 30 symbols read from SYS$COMMON:[SYS$LDR]SSPI.EXE;1
%SDA-I-READSYM, 1479 symbols read from SYS$COMMON:[SYS$LDR]SYS$NTA.STB;1
%SDA-I-READSYM, 8 symbols read from SYS$COMMON:[SYS$LDR]SYS$MME_SERVICES.STB;1
%SDA-I-READSYM, 444 symbols read from SYS$COMMON:[SYS$LDR]SYSLDR_DYN.STB;1
%SDA-I-READSYM, 403 symbols read from SYS$COMMON:[SYS$LDR]MULTIPATH.STB;1
%SDA-I-READSYM, 14 symbols read from SYS$COMMON:[SYS$LDR]SYS$IPC_SERVICES.EXE;1
%SDA-I-READSYM, 514 symbols read from SYS$COMMON:[SYS$LDR]SYS$XFCACHE.STB;1
%SDA-I-READSYM, 707 symbols read from SYS$COMMON:[SYS$LDR]SYS$NETWORK_SERVICES.STB;1
%SDA-I-READSYM, 16 symbols read from SYS$COMMON:[SYS$LDR]SYS$UTC_SERVICES.EXE;1
%SDA-I-READSYM, 40 symbols read from SYS$COMMON:[SYS$LDR]SYS$TRANSACTION_SERVICES.EXE;1
%SDA-I-READSYM, 2 symbols read from SYS$COMMON:[SYS$LDR]LMF$GROUP_TABLE.EXE;1
%SDA-I-READSYM, 361 symbols read from SYS$COMMON:[SYS$LDR]SYSGETSYI.STB;1
%SDA-I-READSYM, 580 symbols read from SYS$COMMON:[SYS$LDR]SECURITY.STB;1
%SDA-I-READSYM, 404 symbols read from SYS$COMMON:[SYS$LDR]IMAGE_MANAGEMENT.STB;1
%SDA-I-READSYM, 8 symbols read from SYS$COMMON:[SYS$LDR]CPULOA.EXE;1
%SDA-I-READSYM, 368 symbols read from SYS$COMMON:[SYS$LDR]SYSLICENSE.STB;1
%SDA-I-READSYM, 939 symbols read from SYS$COMMON:[SYS$LDR]F11BXQP.STB;1
%SDA-I-READSYM, 429 symbols read from SYS$COMMON:[SYS$LDR]LOGICAL_NAMES.STB;1
%SDA-I-READSYM, 390 symbols read from SYS$COMMON:[SYS$LDR]MESSAGE_ROUTINES.STB;1
%SDA-I-READSYM, 456 symbols read from SYS$COMMON:[SYS$LDR]LOCKING.STB;1
%SDA-I-READSYM, 349 symbols read from SYS$COMMON:[SYS$LDR]SHELL8K.STB;1
%SDA-I-READSYM, 671 symbols read from SYS$COMMON:[SYS$LDR]SYS$VM.STB;1
%SDA-I-READSYM, 722 symbols read from SYS$COMMON:[SYS$LDR]PROCESS_MANAGEMENT.STB;1
%SDA-I-READSYM, 374 symbols read from SYS$COMMON:[SYS$LDR]SYSDEVICE.STB;1
%SDA-I-READSYM, 901 symbols read from SYS$COMMON:[SYS$LDR]IO_ROUTINES.STB;1
%SDA-I-READSYM, 582 symbols read from SYS$COMMON:[SYS$LDR]EXCEPTION.STB;1
%SDA-I-READSYM, 382 symbols read from SYS$COMMON:[SYS$LDR]ERRORLOG.STB;1
%SDA-I-READSYM, 587 symbols read from SYS$COMMON:[SYS$LDR]SYSTEM_SYNCHRONIZATION_MIN.STB;1
%SDA-I-READSYM, 1654 symbols read from SYS$COMMON:[SYS$LDR]SYSTEM_PRIMITIVES_MIN.STB;1
%SDA-I-READSYM, 78 symbols read from SYS$COMMON:[SYS$LDR]SYS$OPDRIVER.EXE;1
%SDA-I-READSYM, 6 symbols read from SYS$COMMON:[SYS$LDR]SYS$CNBTDRIVER.EXE;1
%SDA-I-READSYM, 98 symbols read from SYS$COMMON:[SYS$LDR]SYS$CPU_ROUTINES_2208.EXE;1
%SDA-I-READSYM, 4622 symbols read from SYS$COMMON:[SYS$LDR]SYS$BASE_IMAGE.EXE;1
%SDA-I-READSYM, 511 symbols read from SYS$COMMON:[SYSLIB]SYS$PUBLIC_VECTORS.EXE;1
SDA> mu check
%SDA-I-READSYM, 1314 symbols read from MULTINET_COMMON_ROOT:[MULTINET]MULTINET.STB;2
%MNET-I-READSYM, 0 symbols read from MULTINET:INETDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:UCXDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:PWIPDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:NTYDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:RCDDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:RMTDRIVER.EXE
%MNET-I-READSYM, 0 symbols read from MULTINET:LAVDRIVER.EXE
%MNET-I-READSYM, 1629 symbols read from MULTINET_COMMON_ROOT:[MULTINET]MULTINET.EXE;3
%MNET-I-READSYM, 230 symbols read from MULTINET_COMMON_ROOT:[MULTINET.LOADABLE_IMAGES]IF_SE.EXE;2

%MNET-S-OK, Magic numbers Ok
%MNET-S-OK, NMBCLUSTERS Ok
%MNET-S-OKFREE, 32 byte free memory chain Ok, 0 buffers
%MNET-S-OKFREE, 64 byte free memory chain Ok, 0 buffers
%MNET-S-OKFREE, 128 byte free memory chain Ok, 301 buffers
%MNET-S-OKFREE, 256 byte free memory chain Ok, 44 buffers
%MNET-S-OKFREE, 512 byte free memory chain Ok, 0 buffers
%MNET-S-OKFREE, 1024 byte free memory chain Ok, 0 buffers
%MNET-S-OKFREE, 2048 byte free memory chain Ok, 8 buffers
%MNET-S-OK, Number of free mbufs Ok
%MNET-S-OK, Number of free miniprocs Ok
%MNET-S-OK, System didn't panic(); Ok
%MNET-I-QDMSGS, 0 queued console messages
%MNET-I-ERRORS, 0 errors found
%MNET-I-WARNINGS, 0 warnings found

SDA> clue crash

Crashdump Summary Information:
------------------------------
Crash Time: 3-MAR-2006 07:26:28.09
Bugcheck Type: CPUSPINWAIT, CPU spinwait timer expired
Node: NEOMN2 (Standalone)
CPU Type: AlphaServer DS20 500 MHz
VMS Version: V7.3-1
Current Process: PMDFF9 pmas
Current Image: DRA2:[PMAS.][ALPHA_EXE]PMAS_MASTER.EXE
Failing PC: FFFFFFFF.8008C3A4 SMP$TIMEOUT_C+00064
Failing PS: 08000000.00000804
Module: SYSTEM_SYNCHRONIZATION_MIN (Link Date/Time: 18-JUL-2002 20:08:27.64)
Offset: 000003A4

Boot Time: 3-MAR-2006 07:00:44.00
System Uptime: 0 00:25:44.09
Crash/Primary CPU: 01/00
System/CPU Type: 2208
Saved Processes: 66
Pagesize: 8 KByte (8192 bytes)

Press RETURN for more.

SDA>

Crashdump Summary Information:
------------------------------
Physical Memory: 1024 MByte (131072 PFNs, contiguous memory)
Dumpfile Pagelets: 187183 blocks
Dump Flags: olddump,writecomp,errlogcomp,dump_style
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 2 Pagefiles and 1 Swapfile installed

Stack Pointers:
KSP = 00000000.7FFA1E88 ESP = 00000000.7FFA6000 SSP = 00000000.7FFAC100
USP = 00000000.7AE28690

General Registers:
R0 = 00000000.00000000 R1 = FFFFFFFF.81461F00 R2 = FFFFFFFF.80154060
R3 = 00000000.00679FA7 R4 = 00000000.0000001C R5 = 00000000.00000000
R6 = FFFFFFFF.81461F00 R7 = 00000000.011F0950 R8 = 00000000.011E8EAF
R9 = 00000000.00000041 R10 = 00000000.00000061 R11 = FFFFFFFF.81461F00
R12 = FFFFFFFF.810B1820 R13 = FFFFFFFF.810B3660 R14 = FFFFFFFF.810B6700
R15 = 00000000.7FFA1EC8 R16 = 00000000.0000078C R17 = 00000000.00000000
R18 = FFFFFFFF.8103C1C0 R19 = FFFFFFFF.81008000 R20 = FFFFFFFF.801540C4

Press RETURN for more.
SDA>

Crashdump Summary Information:
------------------------------
R21 = 00000000.00679FA7 R22 = FFFFFFFF.00000000 R23 = 00000000.00000000
R24 = 00000000.7FFA1E20 AI = FFFFFFFF.81008000 RA = 00000000.00000000
PV = 00000000.00000000 R28 = FFFFFFFF.81010698 FP = 00000000.7AE286A0
PC = FFFFFFFF.8008C3A8 PS = 08000000.00000804

CPUSPINWAIT Bugcheck:
Cause: timeout processing IPINT and/or acquiring spinlock
Spinlock name: SCHED
Spinlock address: 810B6700
Spinlock owner CPU Id: none
Crash CPU Id: 01

CPU Id CPUDB BugCode State WorkReq Interrupted PC
------ -------- --------------- -------- ------------------------ ---------------------------------------
00 81412000 CPUEXIT Run 8FA229AC SPP_FASTTIMO_C+000D4


Press RETURN for more.
SDA>

Crashdump Summary Information:
------------------------------
01 81461F00 CPUSPINWAIT Stopped resched


System Registers:
Page Table Base Register (PTBR) 00000000.00018709
Processor Base Register (PRBR) FFFFFFFF.81461F00
Privileged Context Block Base (PCBB) 00000000.2B998080
System Control Block Base (SCBB) 00000000.00000E3B
Software Interrupt Summary Register (SISR) 00000000.00000008
Address Space Number (ASN) 00000000.0000008F
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.0000000F
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.00000008
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) FFFFFEFC.00000000





Press RETURN for more.
SDA>

Crashdump Summary Information:
------------------------------
Failing Instruction:
SMP$TIMEOUT_C+00064: BUGCHK

Instruction Stream (last 20 instructions):
SMP$TIMEOUT_C+00014: LDA R27,#X0550(R13)
SMP$TIMEOUT_C+00018: BSR R26,#X00104D
SMP$TIMEOUT_C+0001C: BNE R0,#X000004
SMP$TIMEOUT_C+00020: LDQ R28,#X0018(R13)
SMP$TIMEOUT_C+00024: LDL R17,(R28)
SMP$TIMEOUT_C+00028: AND R17,#X04,R27
SMP$TIMEOUT_C+0002C: BEQ R27,#X000005
SMP$TIMEOUT_C+00030: LDQ R28,#X0008(SP)
SMP$TIMEOUT_C+00034: LDQ R0,#X0010(SP)
SMP$TIMEOUT_C+00038: LDQ R13,#X0018(SP)
SMP$TIMEOUT_C+0003C: ADDQ SP,#X20,SP
SMP$TIMEOUT_C+00040: RET R31,(R28)
SMP$TIMEOUT_C+00044: SUBQ SP,#X10,SP
SMP$TIMEOUT_C+00048: STQ R16,#X0008(SP)
SMP$TIMEOUT_C+0004C: STQ R17,(SP)

Press RETURN for more.
SDA>

Crashdump Summary Information:
------------------------------
SMP$TIMEOUT_C+00050: LDQ R17,#X0010(R13)
SMP$TIMEOUT_C+00054: BIS R17,#X04,R17
SMP$TIMEOUT_C+00058: BIS R31,R17,R16
SMP$TIMEOUT_C+0005C: LDQ R17,(SP)
SMP$TIMEOUT_C+00060: ADDQ SP,#X08,SP
SMP$TIMEOUT_C+00064: BUGCHK
SMP$TIMEOUT_C+00068: HALT
SMP$TIMEOUT_C+0006C: BIS R31,R31,R31
SMP$CALC_THRESHOLD_C: LDQ R16,#XFF88(R27)
SMP$CALC_THRESHOLD_C+00004: BIS R31,#X20,R23
SDA> set cpu 00
SDA> EXA/INS 8FA229AC
SPP_FASTTIMO_C+000D4: LDL R3,(R3)
SDA> show stack








Process Stacks (on CPU 00)
--------------------------
Current Operating Stack (KERNEL):
00000000.7FFA1D38 18000000.00001F04 PTE$C_URKW+00004
00000000.7FFA1D40 FFFFFFFF.8104AE10 MMG$ALLOC_ZERO_COLOR_64
00000000.7FFA1D48 00000000.00000000
00000000.7FFA1D50 00000000.00000047
SP => 00000000.7FFA1D58 00000300.00000004 CTL$C_KRP_SIZE
00000000.7FFA1D60 FFFFFFFF.811EA070 SYS$EWDRIVER+0E070
00000000.7FFA1D68 00000000.00010008 LNM$C_DEL_OVERLAY+00004
00000000.7FFA1D70 00000000.00000000
00000000.7FFA1D78 FFFFFFFF.8FA5AC58 VMS_DISABLE_FORKS+00098
00000000.7FFA1D80 00000000.0005FD00 PMAS_MASTER+5FD00
00000000.7FFA1D88 FFFFFFFF.81008000 EXE$GR_SYSTEM_DATA_CELLS
00000000.7FFA1D90 00000300.00000004 CTL$C_KRP_SIZE
00000000.7FFA1D98 00000000.00000000

Press RETURN for more.
SDA>

Process Stacks (on CPU 00)
--------------------------
00000000.7FFA1DA0 FFFFFFFF.8F9EE3C0 FAST_TQE
00000000.7FFA1DA8 00000000.00000000
00000000.7FFA1DB0 FFFFFFFF.8F9F2760 VMS_SMP_SPLNET_C+000D8
00000000.7FFA1DB8 00000000.00000003
00000000.7FFA1DC0 00000000.00000000
00000000.7FFA1DC8 00000001.00000000
00000000.7FFA1DD0 00000000.00000000
00000000.7FFA1DD8 00000000.00000000
00000000.7FFA1DE0 FFFFFFFF.8F9F2760 VMS_SMP_SPLNET_C+000D8
00000000.7FFA1DE8 FFFFFFFF.810B5A48 SMP$ACQUIREL
00000000.7FFA1DF0 FFFFFFFF.8FA22934 SPP_FASTTIMO_C+0005C
00000000.7FFA1DF8 00000000.7FFA1E40
00000000.7FFA1E00 FFFFFFFF.8FA55F60 SPP_FASTTIMO
00000000.7FFA1E08 FFFFFFFF.81008AC8 SMP$GL_FLAGS
00000000.7FFA1E10 00000000.00010008 LNM$C_DEL_OVERLAY
+00004

00000000.7FFA1E18 FFFFFFFF.8FA63870 NSPCB
00000000.7FFA1E20 FFFFFFFF.8FA63870 NSPCB
00000000.7FFA1E28 00000000.00000038
00000000.7FFA1E30 FFFFFFFF.8FA229AC SPP_FASTTIMO_C+000D4
00000000.7FFA1E38 00000000.00000804 PDT$L_PORT_IDLE
00000000.7FFA1E40 FFFFFFFF.8FA55F60 SPP_FASTTIMO
00000000.7FFA1E48 FFFFFFFF.8FA2342C PFFASTTIMO_C+0004C
00000000.7FFA1E50 FFFFFFFF.8FA63760 NSDOMAIN
00000000.7FFA1E58 FFFFFFFF.8FA639B0 NSSW+00060
00000000.7FFA1E60 00000000.00000000
00000000.7FFA1E68 FFFFFFFF.8F9EE3C0 FAST_TQE
00000000.7FFA1E70 FFFFFFFF.81412038 CPUDB+00038
00000000.7FFA1E78 00000000.7FFA1E80
00000000.7FFA1E80 FFFFFFFF.8FA561B8 PFFASTTIMO
00000000.7FFA1E88 FFFFFFFF.8F9EF994 VMS_DISABLE_FORKS_C+006DC
00000000.7FFA1E90 00000000.00000000
00000000.7FFA1E98 00000000.00000000
00000000.7FFA1EA0 00000000.7AE27DD0
00000000.7FFA1EA8 00000000.7FFA1EB0
00000000.7FFA1EB0 FFFFFFFF.81049938 EXE_STD$TQEIDX_REM_NEXT_TQE
00000000.7FFA1EB8 8F9EE3C0.8004B3A8 EXE$SWTIMER_FORK_C+00078
00000000.7FFA1EC0 FFFFFFFF.8004B944 EXE$SWTIMER_FORK_C+00614
00000000.7FFA1EC8 00000000.00000000
00000000.7FFA1ED0 00000000.00000000
00000000.7FFA1ED8 00000000.00000000
00000000.7FFA1EE0 FFFFFFFF.8F9EE3C0 FAST_TQE
00000000.7FFA1EE8 FFFFFFFF.810495B0 EXE$SWTIMER_FORK
00000000.7FFA1EF0 00000000.7AE27DD0
00000000.7FFA1EF8 8F9EE3C0.8F9EE3C0 FAST_TQE
00000000.7FFA1F00 00000000.00070CC0 DAP$K_VALID_F2R+00CDA
00000000.7FFA1F08 FFFFFFFF.8004AAD4 EXE_STD$IOFORK_CPU_C+003D4
00000000.7FFA1F10 FFFFFFFF.81049318 EXE_STD$PRIMITIVE_FORK+00020
00000000.7FFA1F18 00000000.7AE28670
00000000.7FFA1F20 00000000.000812F0 PMAS_MASTER+812F0
00000000.7FFA1F28 00000000.00000012
00000000.7FFA1F30 00000000.00000009
00000000.7FFA1F38 00000000.00000000
00000000.7FFA1F40 00000000.7FFFFFFF CIXCD_PQBBR_M_QBBASE
00000000.7FFA1F48 00000000.0002B920 PMAS_MASTER+2B920
00000000.7FFA1F50 00000000.00000000
00000000.7FFA1F58 00000000.7AE28670
00000000.7FFA1F60 00000000.0005FD00 PMAS_MASTER+5FD00
00000000.7FFA1F68 00000000.00000001
00000000.7FFA1F70 00000000.0000065B REAL_Q_REC+00003
00000000.7FFA1F78 00000000.00000000
00000000.7FFA1F80 6E750B12.0B270800
00000000.7FFA1F88 00000000.7FFFFFFF CIXCD_PQBBR_M_QBBASE
00000000.7FFA1F90 77767574.73727170
00000000.7FFA1F98 00000000.00000000
00000000.7FFA1FA0 00000000.000632BC PMAS_MASTER+632BC
00000000.7FFA1FA8 00000000.00014038 PMAS_MASTER+14038
00000000.7FFA1FB0 00000000.00060F10 SCDRP$M_FLAG_MODE_SENSE+00F10
00000000.7FFA1FB8 00000000.7AE27DD0
00000000.7FFA1FC0 00000000.00014038 PMAS_MASTER+14038
00000000.7FFA1FC8 00000000.0133DBA6
00000000.7FFA1FD0 00000000.0110D9C4
00000000.7FFA1FD8 00000000.00000006
00000000.7FFA1FE0 00000000.0000001C
00000000.7FFA1FE8 00000000.00000003
00000000.7FFA1FF0 00000000.00060F10 SCDRP$M_FLAG_MODE_SENSE+00F10
00000000.7FFA1FF8 00000000.0000001B

SDA> clue call



Call Chain: Process index: 013E Process name: PMDFFB pmas PCB: 816D0B00 (CPU 0)
---------------------------------------------------------------------------------------
Procedure Frame Procedure Entry Return Address
------------------ ---------------------------------------------- ------------------------------------------------
7FFA1DE8 Null 800B82B0 SMP$ACQUIREL_C
7FFA1E40 Stack 8FA228D8 SPP_FASTTIMO_C 8FA2342C PFFASTTIMO_C+0004C
7FFA1E80 Stack 8FA233E0 PFFASTTIMO_C 8F9EF994 VMS_DISABLE_FORKS_C+006DC
7AE27DD0 Stack 0005FD00 PMAS_MASTER+0005FD00 000632BC PMAS_MASTER+000632BC
7AE27F80 Stack 0005FD00 PMAS_MASTER+0005FD00 000632BC PMAS_MASTER+000632BC
7AE28130 Stack 0005FD00 PMAS_MASTER+0005FD00 00062D88 PMAS_MASTER+00062D88
7AE282E0 Stack 0005FD00 PMAS_MASTER+0005FD00 000621DC PMAS_MASTER+000621DC
7AE28490 Stack 0005FD00 PMAS_MASTER+0005FD00 0006441C PMAS_MASTER+0006441C
7AE28640 Stack 00063B60 PMAS_MASTER+00063B60 00056BF8 PMAS_MASTER+00056BF8
7AE28760 Stack 00056B60 PMAS_MASTER+00056B60 00043DA4 PMAS_MASTER+00043DA4
7AE287C0 Stack 00043670 PMAS_MASTER+00043670 00046408 PMAS_MASTER+00046408
7AE28A10 Stack 00045400 PMAS_MASTER+00045400 00041930 PMAS_MASTER+00041930
7AE28F90 Stack 00040F00 PMAS_MASTER+00040F00 00040068 PMAS_MASTER+00040068
7AE29800 Stack 00040000 PMAS_MASTER+00040000 7BCE05B0 PTHREAD$RTL+525B0
7AE29850 Stack 7BCE0320 PTHREAD$RTL+52320 7BCBE31C PTHREAD$RTL+3031C
7AE29A50 Stack 7BCBE140 PTHREAD$RTL+30140 8028D63C SYS$IMGSTA_C+0015C
7AE2DB30 Stack 8028D4E0 SYS$IMGSTA_C 7AF75DD8
7AE2DBB0 Stack 7AF75C2C 7AF75C18

SDA>

Analyzing these files is not something I've had training or experience with. I appreciate everyone's time and assistance with this.
Thanks!!
Ian Miller.
Honored Contributor

Re: Disk File Optimizer Error

Any sign of non-paged pool fragmentation?
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

Lisa,

CPU 0 was definitely executing Multinet code (routine named SPP_FASTTIMO_C has been symbolized for the interrupted PC after reading the Multinet symbol table files).

At the time of the crash, the SCHED spinlock was no longer owned, but there are signs on the stack of CPU 0 that it had also recently executed the spinlock routine SMP$ACQUIREL.

Could you provide the crash history of this system ($ TYPE CLUE$HISTORY:) so we can see whether there have been other unusual crashes in the past?

I would also strongly suggest upgrading your firmware. You're running V6.2-1 and the current DS20 FW version is V7.1. There was an obscure HW/FW problem on these types of systems, which was solved in V6.6 (or higher).


Process Software did go over similar output with me and their determination was that I should have HP or someone experienced in VMS also take a look


Consider this done ;-)

Ian,

nonpaged pool fragmentation related problems would typically show up in relation to the POOL spinlock and pool allocation/deallocation routines on the stack of the CPU holding that spinlock.

Lisa,

to check for pool fragmentation, please issue the following command against the crash:

SDA> SHOW MEM/POOL/FULL

and provide the output of this command.

Volker.
Allan Bowman
Respected Contributor

Re: Disk File Optimizer Error

Lisa,

Just curious - from your original ANA/DISK result, it appears that you recently performed a MULTINET upgrade. Did the upgrade not require a reboot afterwards? If you did perform a reboot, then there was something that went wrong and left the deleted file headers hanging around. If you did not reboot, then doing the /REPAIR to clean up the headers may have done something to confuse MULTINET.

In either case, has your system been stable since it came up after the crash/reboot?

Allan in Atlanta
Lisa Collins
Advisor

Re: Disk File Optimizer Error

Here is the Clue file

$ TYPE CLUE$HISTORY
Date Version System/CPU Node Bugcheck Process PC Module Offset
----------------- -------- ------------------- ------ ------------ --------------- -------- ----------------------- --------
10-SEP-2004 11:45 V7.3-1 AlphaServer DS20 50 NEOMN2 OPERCRASH NULL 8675A4C8 00000000
25-DEC-2005 13:39 V7.3-1 AlphaServer DS20 50 NEOMN2 UNXSIGNAL PMDF55 popstore 816B6F4C 00000000
17-JAN-2006 09:34 V7.3-1 AlphaServer DS20 50 NEOMN2 UNXSIGNAL BATCH_849 80261B28 F11BXQP 0003DB28
3-MAR-2006 07:26 V7.3-1 AlphaServer DS20 50 NEOMN2 CPUSPINWAIT PMDFF9 pmas 8008C3A4 SYSTEM_SYNCHRONIZATION_ 000003A4

On Feb 8th, we changed the nonpaged dynamic memory settings to enhance performance. This was done on the recommendation of a Sr VMS specialist within our organization. After the reboot on the 8th, the current and initial size settings were the same number. I monitored those settings throughout the month and saw that the current size had grown to 24. After the system bugchecked, it went back to 12.39, and the initial and current sizes once again match. Here's the output from the crash:


SDA> SHOW MEM/POOL/FULL

System Memory Resources from Crashdump on 3-MAR-2006 07:26:28.09
-----------------------------------------------------------------

Nonpaged Dynamic Memory (Lists + Variable)
Current Size (MB) 12.39 Current Size (Pagelets) 25376
Initial Size (MB) 12.39 Initial Size (Pagelets) 25376
Maximum Size (MB) 61.98 Maximum Size (Pagelets) 126944
Free Space (MB) 8.72 Space in Use (MB) 3.66
Largest Var Block (MB) 8.40 Smallest Var Block (KB) 2.37
Number of Free Blocks 658 Free Blocks LEQU 64 bytes 0
Free Blocks on Lookasides 652 Lookaside Space (KB) 315.68

(No Bus Addressable Memory allocated)

Paged Dynamic Memory
Current Size (MB) 5.08 Current Size (Pagelets) 10416
Free Space (MB) 2.85 Space in Use (MB) 2.23
Largest Var Block (MB) 2.84 Smallest Var Block (By) 16.00
Number of Free Blocks 122 Free Blocks LEQU 64 bytes 105

Lock Manager Dynamic Memory
Current Size (MB) 1.02 Current Size (Pages) 131
Free Space (MB) 0.04 Hits 12987
Space in Use (MB) 0.98 Misses 69
Number of Empty Pages 0 Expansions 131
Number of Free Packets 164 Packet Size (By) 256.00


Allan,

It has been a while since we touched the Multinet software. In April 2004 we installed an ECO. Reviewing that install log, it is interesting that one part of the log says you must reboot (meaning the system?) after installing:


*********** NOTE ************
KERNEL-UPDATE-020_A044 and later REQUIRES that you be running ucxdriver
eco UCXDRIVER-020_A044 or later.
*****************************

You must reboot after installing this ECO kit for the changes to
take effect.


In another part of the install log, I see that a restart is required rather than a reboot:


You do not have to reboot after installing this kit.
New sessions will automatically use the new images.

SMTP should be restarted by @MULTINET:START_SMTP,

The MultiNet master_server should be restarted by @MULTINET:START_SERVER.

1) MULTINET (new image)

....................................

The log ends before I can see whether a restart or reboot was done.


Volker Halle
Honored Contributor

Re: Disk File Optimizer Error

Lisa,

the 17-JAN-2006 UNXSIGNAL crash shows the footprint of the HW/FW problem I mentioned in my previous update. Please upgrade this system to the latest FW V7.1. If you want a second opinion before the FW upgrade, please log a call with HP and send them the CLUE$NEOMN2_170106_0934.LIS file.

There was no problem with fragmented pool in the CPUSPINWAIT dump. Only 6 packets are in the variable nonpaged pool list (658 free blocks, of which 652 are on the lookaside lists).

In the CPUSPINWAIT crash, pool had not yet expanded beyond the initial size, but the system had only been up for 25 minutes. If nonpaged pool had grown to 24 MB before, then it may be time to add MIN_NPAGEDYN=24000000 to MODPARAMS.DAT and run AUTOGEN.
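
For anyone who wants to check the arithmetic behind these numbers, here is a rough Python sketch. It is not part of SDA or AUTOGEN, and the function names are made up for illustration. It assumes a pagelet is 512 bytes, that the MB figures in the SHOW MEM/POOL/FULL output are binary megabytes (which matches the pagelet counts shown there), and that the NPAGEDYN value is given in bytes.

```python
# Illustrative helpers only; these names are hypothetical, not a VMS API.
PAGELET_BYTES = 512        # assumption: one OpenVMS pagelet is 512 bytes
MB = 1024 * 1024           # assumption: SDA reports binary megabytes


def pagelets_to_mb(pagelets: int) -> float:
    """Convert a pagelet count from SHOW MEM/POOL/FULL into megabytes."""
    return pagelets * PAGELET_BYTES / MB


def mb_to_bytes(megabytes: float) -> int:
    """Convert megabytes into a byte count suitable for a MIN_NPAGEDYN value."""
    return int(megabytes * MB)


def modparams_line(name: str, value: int) -> str:
    """Format one parameter assignment as it would be typed into MODPARAMS.DAT."""
    return f"{name} = {value}"


if __name__ == "__main__":
    # Pagelet counts taken from the dump output posted earlier in the thread.
    print("Initial size (MB):", round(pagelets_to_mb(25376), 2))
    print("Maximum size (MB):", round(pagelets_to_mb(126944), 2))
    # The suggested round figure versus an exact 24 MB.
    print("24000000 bytes in MB:", round(24000000 / MB, 2))
    print(modparams_line("MIN_NPAGEDYN", mb_to_bytes(24)))
```

Under those assumptions, the suggested 24000000 bytes works out to a little under 24 MB, while an exact 24 MB would be 25165824 bytes. Either is a reasonable floor if the observed growth really was to about 24 MB.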

What is the news on the %DFG-W-UNRECOVERR problem? ;-)

Volker.