Ruslan R. Laishev
Super Advisor

AS4100/VMS 8.2 + ECOs => BUGCHECK

Hello All!

I installed a set of ECOs (see the list below) on my test AS4100. This AS4100 is not covered by a support contract, so is there anyone who has the same problem (Bugcheck Type: SPLACQERR) and can officially ask HP support about it?

The application (RADIUS Server for OpenVMS) has not changed in the past year, and it worked fine under 7.3-2 and 8.2 ... up to today.

VMS82A_ACRTL-V0100.PCSI-DCX_AXPEXE
VMS82A_F11X-V0200.PCSI-DCX_AXPEXE
VMS82A_INSTAL-V0100.PCSI-DCX_AXPEXE
VMS82A_LMF-V0200.PCSI-DCX_AXPEXE
VMS82A_LOADSS-V0200.PCSI-DCX_AXPEXE
VMS82A_MONTOR-V0200.PCSI-DCX_AXPEXE
VMS82A_PTHREAD-V0100.PCSI-DCX_AXPEXE
VMS82A_SYS-V0300.PCSI-DCX_AXPEXE
VMS82A_TDF-V0100.PCSI-DCX_AXPEXE
VMS82A_UPDATE-V0200.PCSI-DCX_AXPEXE


Crashdump Summary Information:
------------------------------
Crash Time: 30-MAY-2006 16:14:47.96
Bugcheck Type: SPLACQERR, Spinlock(s) of higher rank already owned by CPU
Node: STRBCK (Cluster)
CPU Type: AlphaServer 4100 5/533 4MB
VMS Version: V8.2
Current Process: RADIUS Server
Current Image: $1$DUA1160:[RADIUS.ALPHA_EXE]RADIUS_SERVER.EXE;2
Failing PC: FFFFFFFF.80165CAC EXE$WAKE_BLOCKED_C+001AC
Failing PS: 10000000.00000800
Module: PROCESS_MANAGEMENT (Link Date/Time: 12-APR-2006 14:15:55.15)
Offset: 00041CAC

Boot Time: 30-MAY-2006 15:49:34.00
System Uptime: 0 00:25:13.96
Crash/Primary CPU: 01/00
System/CPU Type: 1605
Saved Processes: 49
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 2048 MByte (262144 PFNs, contiguous memory)
Dumpfile Pagelets: 205082 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 1 Pagefile and 0 Swapfiles installed

Stack Pointers:
KSP = 00000000.7FF87D50 ESP = 00000000.7FF8C000 SSP = 00000000.7FF9CC80
USP = 00000000.7AE4B840

General Registers:
R0 = 00000000.00000001 R1 = 00000000.000006B0 R2 = FFFFFFFF.810E55F8
R3 = FFFFFFFF.81A2B580 R4 = FFFFFFFF.81A2B580 R5 = FFFFFFFF.81008B48
R6 = 00000000.00000001 R7 = FFFFFFFF.81A2B580 R8 = FFFFFFFF.81A2B580
R9 = 00000000.7FF9DDF0 R10 = FFFFFEFC.000000C0 R11 = 00000000.00008000
R12 = FFFFFFFF.8196B240 R13 = 00000000.00036000 R14 = FFFFFFFF.810EA0C8
R15 = FFFFFEFE.00189E08 R16 = 00000000.000006B5 R17 = FFFFFFFF.81A2B580
R18 = 00000000.00000000 R19 = 00000000.00000043 R20 = 00000000.00000043
R21 = FFFFFFFF.81711A80 R22 = FFFFFFFF.00000000 R23 = FFFFFFFF.81A2A300
R24 = FFFFFFFF.81711A80 AI = 00000000.00000002 RA = 00001000.00000000
PV = FFFFFFFF.810E55F8 R28 = FFFFFFFF.801598F0 FP = 00000000.7FF87D50
PC = FFFFFFFF.80165CB0 PS = 10000000.00000800

System Registers:
Page Table Base Register (PTBR) 00000000.00006359
Processor Base Register (PRBR) FFFFFFFF.81711A80
Privileged Context Block Base (PCBB) 00000000.0C6B0080
System Control Block Base (SCBB) 00000000.000014CD
Software Interrupt Summary Register (SISR) 00000000.00000000
Address Space Number (ASN) 00000000.00000076
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.0000008F
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.00000008
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) FFFFFEFC.00000000


Failing Instruction:
EXE$WAKE_BLOCKED_C+001AC: BUGCHK

Instruction Stream (last 20 instructions):
EXE$WAKE_BLOCKED_C+0015C: LDL R21,(R21)
EXE$WAKE_BLOCKED_C+00160: EXTBL R19,R5,R19
EXE$WAKE_BLOCKED_C+00164: BLBC R19,#X00003E
EXE$WAKE_BLOCKED_C+00168: LDL R23,#X03DC(R3)
EXE$WAKE_BLOCKED_C+0016C: LDL R21,#X00C8(R21)
EXE$WAKE_BLOCKED_C+00170: LDL R24,(R23)
EXE$WAKE_BLOCKED_C+00174: BIS R31,R19,R20
EXE$WAKE_BLOCKED_C+00178: XOR R0,R24,R24
EXE$WAKE_BLOCKED_C+0017C: LDL R21,(R21)
EXE$WAKE_BLOCKED_C+00180: CMPEQ R0,R21,R0
EXE$WAKE_BLOCKED_C+00184: BIS R31,R0,R6
EXE$WAKE_BLOCKED_C+00188: BNE R24,#X000005
EXE$WAKE_BLOCKED_C+0018C: LDQ R25,#X02E8(R4)
EXE$WAKE_BLOCKED_C+00190: BIS R25,R26,R25
EXE$WAKE_BLOCKED_C+00194: STQ R25,#X02E8(R4)
EXE$WAKE_BLOCKED_C+00198: BR R31,#X000018
EXE$WAKE_BLOCKED_C+0019C: LDQ_U R31,(SP)
EXE$WAKE_BLOCKED_C+001A0: LDL R1,#X0078(R2)
EXE$WAKE_BLOCKED_C+001A4: BEQ R6,#X000002
EXE$WAKE_BLOCKED_C+001A8: BIS R1,#X05,R16
EXE$WAKE_BLOCKED_C+001AC: BUGCHK
EXE$WAKE_BLOCKED_C+001B0: LDQ R26,#X0038(R2)
EXE$WAKE_BLOCKED_C+001B4: BIS R31,R23,R16
EXE$WAKE_BLOCKED_C+001B8: BIS R31,#X01,R25
EXE$WAKE_BLOCKED_C+001BC: LDQ R27,#X0040(R2)


The bugcheck takes place when the control program signals the RADIUS server with an $ENQ request. The RADIUS server uses a blocking AST routine to accept the request and wakes up the main thread (with sys$wake()). Perhaps this is relevant.
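The wake-up pattern described above (a blocking AST hands work to the main thread and calls sys$wake()) can be illustrated with a small Python model. This is only an analogy under stated assumptions: Python threads stand in for the AST and for the main thread, and the class imitates the documented $HIBER/$WAKE behavior that a wake issued while the target is not hibernating is remembered, so the next $HIBER returns at once, and several pending wakes collapse into one. HiberWake, blocking_ast and main_loop are hypothetical names; this is not the RADIUS server's code and not an OpenVMS API.

```python
import queue
import threading


class HiberWake:
    """Toy model of $HIBER/$WAKE semantics (not the OpenVMS system services)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._wake_pending = False

    def wake(self):
        # Like $WAKE: if the target is not hibernating, the request is remembered.
        with self._cond:
            self._wake_pending = True
            self._cond.notify()

    def hiber(self, timeout=None):
        # Like $HIBER: return at once if a wake is pending, otherwise block.
        # The pending wake is consumed under the lock, so none is lost, and
        # several wakes issued before this call collapse into one.
        with self._cond:
            woke = self._cond.wait_for(lambda: self._wake_pending, timeout)
            if woke:
                self._wake_pending = False
            return woke


def blocking_ast(requests, waker, request):
    # Hypothetical stand-in for the blocking AST routine: hand over the
    # request first, then wake the main thread.
    requests.put(request)
    waker.wake()


def main_loop(requests, waker, expected):
    # Hypothetical main thread: hibernate, then drain every queued request.
    handled = []
    while len(handled) < expected:
        if not waker.hiber(timeout=2.0):
            break  # safety net for the demo only
        while True:
            try:
                handled.append(requests.get_nowait())
            except queue.Empty:
                break
    return handled


if __name__ == "__main__":
    reqs, waker = queue.Queue(), HiberWake()
    producer = threading.Thread(
        target=lambda: [blocking_ast(reqs, waker, n) for n in range(3)])
    producer.start()
    print(main_loop(reqs, waker, 3))  # the three requests, in order
    producer.join()
```

Because the request is queued before the wake, and the main thread drains the whole queue after each wake, a collapsed wake never strands a request.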

Thanks!
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

this is almost certainly a software problem. The crash happens in process context (int=0) in the RADIUS_SERVER process, which is most likely involved here (that's your code, right?). It could also be a problem in an underlying OpenVMS routine...

Please also post

SDA> CLUE REGISTER
SDA> CLUE CALL
SDA> SHOW SPIN/BRIEF/STATIC

Is this reproducible? If so, you could probably use spinlock tracing to find out what's going on:

$ ANAL/SYS
SDA> SPL LOAD
SDA> SPL START TRACE

then run whatever makes the SPLACQERR happen...

In the dump, look at SDA> SPL SHOW TRACE

Volker.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

the routine EXE$WAKE_BLOCKED does not exist in OpenVMS Alpha V8.2 (without patches) - so it seems to be 'new code' ...

Have a look at the VMS82A_SYS patches (this code is in PROCESS_MANAGEMENT), but be aware that the problem may come from other code as well.

If this is reproducible, you may be able to back out VMS82A_SYS-V0300 and try to reproduce it.

Spinlock tracing would probably be the best tool to narrow down this one, if this is a real spinlock acquisition problem.

Volker.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

EXE$WAKE_BLOCKED is definitely new code, which comes with PROCESS_MANAGEMENT.EXE linked 12-APR-2006 14:15:55.15 in VMS82A_SYS-V0300

This code was NOT present in VMS82A_SYS-V0200.

Try to back out this patch and see what happens.

Volker.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

this may be closely related to a fix described in VMS732_SYS-V0900, the only other reference to the routine EXE$WAKE_BLOCKED I could find so far.

Volker.
Ruslan R. Laishev
Super Advisor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hello, Volker!

Thanks for the answers. Some additional information:

1) The piece of code that causes the BUGCHECK has worked under VAX/VMS 6.x, Alpha/VMS 7.x and Alpha/VMS 8.2 up to the latest ECOs (I think the PTHREAD ECO is the reason for the problem).

2) SDA> CLUE REGISTER
Current Registers: Process index: 0026 Process name: RADIUS Server PCB: 8196B700 (CPU 0)
------------------------------------------------------------------------------------------------
R0 = 00000000.00000001 %SYSTEM-S-NORMAL, normal successful completion
R1 = 00000000.000006B0
R2 = FFFFFFFF.810E55F8 EXE$WAKE_BLOCKED
R3 = FFFFFFFF.8196B700 PCB (Username SYSTEM, Procnam RADIUS Server)
R4 = FFFFFFFF.8196B700 PCB (Username SYSTEM, Procnam RADIUS Server)
R5 = FFFFFFFF.81008B48 SMP$GL_FLAGS
R6 = 00000000.00000001
R7 = FFFFFFFF.8196B700 PCB (Username SYSTEM, Procnam RADIUS Server)
R8 = FFFFFFFF.8196B700 PCB (Username SYSTEM, Procnam RADIUS Server)
R9 = 00000000.7FF9DDF0
R10 = FFFFFEFC.000000C0
R11 = 00000000.00008000
R12 = FFFFFFFF.81970F80 IRP (Device DUA1160:, UCB FFFFFFFF.81710FC0)
R13 = 00000000.00036000
R14 = FFFFFFFF.810EA0C8 MMG$PAGEFAULT
R15 = FFFFFEFE.00162560
R16 = 00000000.000006B5
R17 = FFFFFFFF.8196B700 PCB (Username SYSTEM, Procnam RADIUS Server)
R18 = 00000000.00000000
R19 = 00000000.00000043
R20 = 00000000.00000043
R21 = FFFFFFFF.81630000 MP_CPU (CPU Id 0)
R22 = FFFFFFFF.00000000
R23 = FFFFFFFF.818CA400 SPL
R24 = FFFFFFFF.81630000 MP_CPU (CPU Id 0)
AI = 00000000.00000002
RA = 00001000.00000000
PV = FFFFFFFF.810E55F8 EXE$WAKE_BLOCKED
R28 = FFFFFFFF.801598F0 EXE$INFORM_TM_AST_C+00430
FP = 00000000.7FF87D50
PC = FFFFFFFF.80165CB0 EXE$WAKE_BLOCKED_C+001B0
PS = 10000000.00000800 Kernel Mode, IPL 8


SDA> CLUE CALL
Call Chain: Process index: 0026 Process name: RADIUS Server PCB: 8196B700 (CPU 0)
-----------------------------------------------------------------------------------------
Procedure Frame Procedure Entry Return Address
------------------ ---------------------------------------------- ------------------------------------------------
7FF87D50 Stack 80165B00 EXE$WAKE_BLOCKED_C 8015994C EXE$INFORM_TM_AST_C+0048C
7FF87D80 Null 8015DEC0 EXE$PFW_AST_C
7FF87D90 Stack 801594C0 EXE$INFORM_TM_AST_C 8017B9C0 SYS$VM+0B9C0
7FF87E60 Stack 801794E0 MMG$PAGEFAULT_C 80150930 SCH$PAGEFAULT+00070
7AE4B850 Stack 80A50EB0 PTHREAD$RTL+56EB0 80A2A444 PTHREAD$RTL+30444
7AE4B8E0 Null 7AF758B0 DCL+798B0
7AE4BA50 Stack 80A2A260 PTHREAD$RTL+30260 8032FF94 SYS$IMGSTA_C+00154
7AE4FB30 Stack 8032FE40 SYS$IMGSTA_C 7AF7BA64 DCL+7FA64
7AE4FBB0 Stack 7AF7B8B4 DCL+7F8B4 7AF7B8A0 DCL+7F8A0

SDA> SHOW SPIN/BRIEF/STATIC
System static spinlock structures
---------------------------------

Spinlock Owner
Address Name IPL Rank Depth CPU
-------- ------------ ---- -------- -------- --------
810B9100 EMB 001F 00000000 FFFFFFFF None
810B9100 MCHECK 001F 00000000 FFFFFFFF None
810B9200 MEGA 001F 00000001 FFFFFFFF None
810B9300 HWCLK 0016 00000002 FFFFFFFF None
810B9400 INVALIDATE 0015 00000003 FFFFFFFF None
810B9500 PERFMON 000F 00000004 FFFFFFFF None
810B9600 POOL 000B 00000005 FFFFFFFF None
810B9700 MAILBOX 000B 00000006 FFFFFFFF None
810B9800 IOLOCK11 000B 00000007 FFFFFFFF None
810B9900 IOLOCK10 000A 00000008 FFFFFFFF None
810B9A00 IOLOCK9 0009 00000009 FFFFFFFF None
810B9B00 SCHED 0008 0000000A 00000000 00000000
810B9C00 MMG 0008 0000000B FFFFFFFF None
810B9D00 IO_MISC 0008 0000000C FFFFFFFF None
810B9F00 PORT 0008 0000000E FFFFFFFF None
810B9E00 TIMER 0008 0000000D FFFFFFFF None
810BA000 TX_SYNCH 0008 0000000F FFFFFFFF None
810BA100 SCS 0008 00000010 FFFFFFFF None
810BA200 LCKMGR 0008 00000011 FFFFFFFF None
810BA300 FILSYS 0008 00000012 FFFFFFFF None
810BA400 QUEUEAST 0006 00000013 FFFFFFFF None
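When comparing several dumps, it can help to pull the owned spinlocks out of this listing mechanically. A minimal Python sketch, assuming the whitespace-separated layout shown above (Address, Name, IPL, Rank, Depth, Owner CPU, with "None" for an unowned lock); parse_spinlocks and owned are hypothetical helper names, and the sample rows are copied from the listing.

```python
import re

# A few rows copied from the SHOW SPIN/BRIEF/STATIC output above.
SAMPLE = """\
810B9600 POOL 000B 00000005 FFFFFFFF None
810B9B00 SCHED 0008 0000000A 00000000 00000000
810B9C00 MMG 0008 0000000B FFFFFFFF None
810BA200 LCKMGR 0008 00000011 FFFFFFFF None
"""

# Address, Name, IPL, Rank, Depth, Owner CPU ("None" when unowned).
ROW = re.compile(
    r"^(?P<addr>[0-9A-F]{8})\s+(?P<name>\S+)\s+(?P<ipl>[0-9A-F]{4})\s+"
    r"(?P<rank>[0-9A-F]{8})\s+(?P<depth>[0-9A-F]{8})\s+"
    r"(?P<cpu>None|[0-9A-F]{8})$"
)


def parse_spinlocks(text):
    """Return one dict per spinlock row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        cpu = None if m["cpu"] == "None" else int(m["cpu"], 16)
        rows.append({
            "name": m["name"],
            "ipl": int(m["ipl"], 16),
            "rank": int(m["rank"], 16),
            "owner_cpu": cpu,
        })
    return rows


def owned(rows):
    """(name, owner CPU) for every spinlock that currently has an owner."""
    return [(r["name"], r["owner_cpu"]) for r in rows if r["owner_cpu"] is not None]


if __name__ == "__main__":
    print(owned(parse_spinlocks(SAMPLE)))
```

Applied to the full static listing above, the only owned lock it reports is SCHED, on CPU 0.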

3) A list of patches (everything worked fine before 30-May)
----------------------------------- ----------- ----------- --------------------
PRODUCT KIT TYPE OPERATION DATE AND TIME
----------------------------------- ----------- ----------- --------------------
DEC AXPVMS VMS82A_UPDATE V2.0 Patch Install 30-MAY-2006 18:43:20
DEC AXPVMS VMS82A_TDF V1.0 Patch Install 30-MAY-2006 18:39:05
DEC AXPVMS VMS82A_SYS V3.0 Patch Install 30-MAY-2006 18:38:24
DEC AXPVMS VMS82A_PTHREAD V1.0 Patch Install 30-MAY-2006 18:37:19
DEC AXPVMS VMS82A_MONTOR V2.0 Patch Install 30-MAY-2006 18:36:14
DEC AXPVMS VMS82A_LOADSS V2.0 Patch Install 30-MAY-2006 18:35:32
DEC AXPVMS VMS82A_LMF V2.0 Patch Install 30-MAY-2006 18:34:48
DEC AXPVMS VMS82A_INSTAL V1.0 Patch Install 30-MAY-2006 18:34:04
DEC AXPVMS VMS82A_F11X V2.0 Patch Install 30-MAY-2006 18:32:49
DEC AXPVMS VMS82A_ACRTL V1.0 Patch Install 30-MAY-2006 18:32:01
DEC AXPVMS VMS82A_ACRTL V1.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_BASRTL V1.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_F11X V2.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_INSTAL V1.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_LMF V2.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_MONTOR V2.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_PTHREAD V1.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_SYS V3.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_TDF V1.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_UPDATE V2.0 Patch Install 30-MAY-2006 13:26:04
DEC AXPVMS VMS82A_KITTING V1.0 Patch Install 30-MAY-2006 13:16:54
DEC AXPVMS VMS82A_PCSI V1.0 Patch Install 30-MAY-2006 13:16:22




DEC AXPVMS VMS82A_AMATHRTL V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_BACKUP V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_CMATIS V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_CPU270F V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_DDTM V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_DRIVER V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_FIBRE_SCSI V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_IOGEN V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_IPC V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_KITTING V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_LAT V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_MONTOR V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_MUP V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_RTPAD V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_SHADOWING V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_SYS V2.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_SYSLOA V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_UPDATE V1.0 Patch Install 26-JAN-2006 14:35:07
DEC AXPVMS VMS82A_XFC V1.0 Patch Install 26-JAN-2006 14:35:07
CPQ AXPVMS CDSA V2.1-331 Full LP Install 26-JAN-2006 13:36:09
DEC AXPVMS OPENVMS V8.2 Platform Install 26-JAN-2006 13:36:09
DEC AXPVMS VMS V8.2 Oper System Install 26-JAN-2006 13:36:09

4) $ dir sys$loadable_images:PROCESS_MANAGEMENT.EXE /date

Directory SYS$COMMON:[SYS$LDR]

PROCESS_MANAGEMENT.EXE;1
12-APR-2006 14:15:55.29

Total of 1 file.
$



Thanks for your help!
Ruslan R. Laishev
Super Advisor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hello Volker!


Spinlock Trace Information:
---------------------------
Timestamp CPU Spin/Forklock/IPL Caller's/Fork PC EPID Operation Trace Buffer
---------------------- --- --------------------- -------------------------------------- -------- ----------------- -----------------
31-MAY 13:29:46.599325 00 81799740 81799740 804F5378 LAN$TRANSMIT_FDT_CSMACD_C+008 202000A9 Restorel FFFFFFFF.56D4CBC0
31-MAY 13:29:46.599285 00 81799740 81799740 804F5340 LAN$TRANSMIT_FDT_CSMACD_C+008 202000A9 Acqnoipl FFFFFFFF.56D4CBA0
31-MAY 13:29:46.599278 00 8177C040 8177C040 804F5378 LAN$TRANSMIT_FDT_CSMACD_C+008 202000A9 Restorel FFFFFFFF.56D4CB80
31-MAY 13:29:46.599257 00 8177C040 8177C040 804F5340 LAN$TRANSMIT_FDT_CSMACD_C+008 202000A9 Acqnoipl FFFFFFFF.56D4CB60
31-MAY 13:29:44.636864 00 818E4900 TCP_MALLOC 805F641C TCPIP$INTERNET_SERVICES+2041C 00000000 Acquirel FFFFFFFF.56D4CB40
31-MAY 13:29:46.599028 00 818E4280 PCB$202000A9 8016825C SCH$QAST_C+0020C 202000A9 Restorel FFFFFFFF.56D4CB20
31-MAY 13:29:46.599026 00 810B9B00 SCHED 80168240 SCH$QAST_C+001F0 202000A9 Restore FFFFFFFF.56D4CB00
31-MAY 13:29:46.599023 00 810B9B00 SCHED 80168560 SCH$QAST_C+00510 202000A9 Acqnoipl (own) FFFFFFFF.56D4CAE0
31-MAY 13:29:46.599020 00 818E4280 PCB$202000A9 801680F8 SCH$QAST_C+000A8 202000A9 Acquire (nospin) FFFFFFFF.56D4CAC0
31-MAY 13:29:46.598992 00 810B9B00 SCHED 8017B45C SYS$VM+0B45C 202000A9 Acqnoipl FFFFFFFF.56D4CAA0
31-MAY 13:29:46.599446 01 810B9B00 SCHED 8014FFD4 PROCESS_MANAGEMENT+2BFD4 20201ABC Release FFFFFFFF.56D4CA80
31-MAY 13:29:46.598967 00 8165A140 XFC 803B03D4 SYS$XFCACHE+223D4 202000A9 Releasel FFFFFFFF.56D4CA60
31-MAY 13:29:46.598950 00 810B9C00 MMG 8005C014 LDR_STD$DEALLOC_S0S1_VA_C+001 202000A9 Restore FFFFFFFF.56D4CA40
31-MAY 13:29:46.599420 01 810B9B00 SCHED 80155D8C EXE$SYNCH_LOOP_C+0065C 202000AF Acquire FFFFFFFF.56D4CA20
31-MAY 13:29:46.598940 00 810B9C00 MMG 8005BFAC LDR_STD$DEALLOC_S0S1_VA_C+000 202000A9 Acquire FFFFFFFF.56D4CA00
31-MAY 13:29:46.598937 00 810B9C00 MMG 803AA6FC SYS$XFCACHE+1C6FC 202000A9 Restore FFFFFFFF.56D4C9E0
31-MAY 13:29:46.598935 00 810B9400 INVALIDATE 801BF048 MMG$TBI_DATA_64_THREADS_C+004 202000A9 Restore FFFFFFFF.56D4C9C0
31-MAY 13:29:46.598929 00 810B9400 INVALIDATE 801BEC3C MMG$TBI_DATA_64_THREADS_C+000 202000A9 Acquire FFFFFFFF.56D4C9A0
31-MAY 13:29:46.598927 00 810B9400 INVALIDATE 801BF048 MMG$TBI_DATA_64_THREADS_C+004 202000A9 Restore FFFFFFFF.56D4C980
31-MAY 13:29:46.599397 01 810B9B00 SCHED 8033C908 NSA$STORE_PERSONA_C+001A8 202000AF Restore FFFFFFFF.56D4C960
31-MAY 13:29:46.598919 00 810B9400 INVALIDATE 801BEC3C MMG$TBI_DATA_64_THREADS_C+000 202000A9 Acquire FFFFFFFF.56D4C940
31-MAY 13:29:46.598917 00 810B9400 INVALIDATE 801BF048 MMG$TBI_DATA_64_THREADS_C+004 202000A9 Restore FFFFFFFF.56D4C920
31-MAY 13:29:46.599392 01 810B9B00 SCHED 8033C848 NSA$STORE_PERSONA_C+000E8 202000AF Acquire FFFFFFFF.56D4C900
31-MAY 13:29:46.598909 00 810B9400 INVALIDATE 801BEC3C MMG$TBI_DATA_64_THREADS_C+000 202000A9 Acquire FFFFFFFF.56D4C8E0
31-MAY 13:29:46.598907 00 810B9400 INVALIDATE 801BF048 MMG$TBI_DATA_64_THREADS_C+004 202000A9 Restore FFFFFFFF.56D4C8C0
31-MAY 13:29:46.598891 00 810B9400 INVALIDATE 801BEC3C MMG$TBI_DATA_64_THREADS_C+000 202000A9 Acquire FFFFFFFF.56D4C8A0
31-MAY 13:29:46.599366 01 810BA000 TX_SYNCH 8037CF94 EXE$GENERATE_UID_C+00294 202000AF Release FFFFFFFF.56D4C880
31-MAY 13:29:46.599362 01 810BA000 TX_SYNCH 8037CD68 EXE$GENERATE_UID_C+00068 202000AF Acquire FFFFFFFF.56D4C860
31-MAY 13:29:46.598885 00 810B9C00 MMG 803AA60C SYS$XFCACHE+1C60C 202000A9 Acquire FFFFFFFF.56D4C840
31-MAY 13:29:46.599328 01 810BA200 LCKMGR 801CD900 LOCKING+01900 202000AF Release FFFFFFFF.56D4C820
31-MAY 13:29:46.598838 00 8165A140 XFC 803AF48C SYS$XFCACHE+2148C 202000A9 Acquirel FFFFFFFF.56D4C800
31-MAY 13:29:46.599306 01 810BA200 LCKMGR 801CE3DC LOCKING+023DC 202000AF Acquire FFFFFFFF.56D4C7E0
31-MAY 13:29:46.599276 01 810BA300 FILSYS 802090BC F11BXQP+050BC 202000AF Release FFFFFFFF.56D4C7C0
31-MAY 13:29:46.599274 01 810BA300 FILSYS 8020906C F11BXQP+0506C 202000AF Acquire FFFFFFFF.56D4C7A0
31-MAY 13:29:46.599222 01 8195E600 PCB$202000AF 80166CC0 EXE$SHOW_MEMBER_IDS_C+00B20 202000AF Releasel FFFFFFFF.56D4C780
31-MAY 13:29:46.599220 01 8195E600 PCB$202000AF 80167724 PROCESS_MANAGEMENT+43724 202000AF Acquirel FFFFFFFF.56D4C760
31-MAY 13:29:46.599215 01 8195E600 PCB$202000AF 8016825C SCH$QAST_C+0020C 202000AF Restorel FFFFFFFF.56D4C740
31-MAY 13:29:46.599212 01 8195E600 PCB$202000AF 801680F8 SCH$QAST_C+000A8 202000AF Acquire (nospin) FFFFFFFF.56D4C720
31-MAY 13:29:46.598735 00 810B9300 HWCLK 8004EA64 EXE$INIT_HWCLOCK_C+004F4 00000000 Release FFFFFFFF.56D4C700
31-MAY 13:29:46.599204 01 8165A140 XFC 803B7A04 CACHE$DEACCESS_CHECK_C+001B4 202000AF Releasel FFFFFFFF.56D4C6E0
31-MAY 13:29:46.598728 00 810B9300 HWCLK 8004E934 EXE$INIT_HWCLOCK_C+003C4 00000000 Acqnoipl FFFFFFFF.56D4C6C0
31-MAY 13:29:46.599198 01 8165A140 XFC 803B78C4 CACHE$DEACCESS_CHECK_C+00074 202000AF Acquirel FFFFFFFF.56D4C6A0
31-MAY 13:29:46.599191 01 810BA100 IOLOCK8 800FC5D0 ACP_STD$MOUNT_C+00440 202000AF Release FFFFFFFF.56D4C680
31-MAY 13:29:46.599188 01 810BA100 IOLOCK8 800FCCB0 ACP_STD$MOUNT_C+00B20 202000AF Acquire FFFFFFFF.56D4C660
...
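In this trace the entries from the two CPUs are interleaved and not in strict timestamp order, so following one spinlock across CPUs is easier after filtering and sorting. A minimal Python sketch, assuming the column layout shown above (the operation can be two tokens, e.g. "Acqnoipl (own)") and a trace that does not cross midnight; parse_trace and history are hypothetical helper names, and the sample lines are copied from the trace.

```python
# Lines copied verbatim from the SPL SHOW TRACE output above.
TRACE = """\
31-MAY 13:29:46.599026 00 810B9B00 SCHED 80168240 SCH$QAST_C+001F0 202000A9 Restore FFFFFFFF.56D4CB00
31-MAY 13:29:46.599023 00 810B9B00 SCHED 80168560 SCH$QAST_C+00510 202000A9 Acqnoipl (own) FFFFFFFF.56D4CAE0
31-MAY 13:29:46.599020 00 818E4280 PCB$202000A9 801680F8 SCH$QAST_C+000A8 202000A9 Acquire (nospin) FFFFFFFF.56D4CAC0
31-MAY 13:29:46.598992 00 810B9B00 SCHED 8017B45C SYS$VM+0B45C 202000A9 Acqnoipl FFFFFFFF.56D4CAA0
31-MAY 13:29:46.599446 01 810B9B00 SCHED 8014FFD4 PROCESS_MANAGEMENT+2BFD4 20201ABC Release FFFFFFFF.56D4CA80
31-MAY 13:29:46.599420 01 810B9B00 SCHED 80155D8C EXE$SYNCH_LOOP_C+0065C 202000AF Acquire FFFFFFFF.56D4CA20
"""


def parse_trace(text):
    """Split each trace line into fields; the operation may span two tokens."""
    entries = []
    for line in text.splitlines():
        t = line.split()
        if len(t) < 10:
            continue  # header, separator and "..." lines
        entries.append({
            "time": t[1],            # HH:MM:SS.ffffff, fixed width
            "cpu": t[2],
            "lock": t[4],            # name, or the address when unnamed
            "caller": t[6],
            "epid": t[7],
            "op": " ".join(t[8:-1]), # everything between EPID and trace buffer
        })
    return entries


def history(entries, lock_name):
    """Entries for one lock in ascending time order (same-day traces only)."""
    return sorted((e for e in entries if e["lock"] == lock_name),
                  key=lambda e: e["time"])


if __name__ == "__main__":
    for e in history(parse_trace(TRACE), "SCHED"):
        print(e["time"], e["cpu"], e["op"], e["caller"])
```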
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

thanks for providing the additional information. As you've also provided the spinlock trace information, I assume that you can reproduce the problem at will.

This information seems to support my theory that this crash has something to do with the new EXE$WAKE_BLOCKED code and the fix mentioned in VMS732_SYS-V0900. That fix also mentions EXE$INFORM_TM_AST, which is involved here as well.

PTHREAD is most likely triggering the problem (your RADIUS_SERVER is most likely multi-threaded), but I doubt that PTHREADS itself is at fault. I would not expect PTHREADS to use spinlocks directly.

I believe your program is driving OpenVMS down a path into EXE$WAKE_BLOCKED that OpenVMS does not expect. CPU 01 (the crashing CPU) is not even holding any static spinlock (SCHED is being held by CPU 0). Your program is almost certainly not at fault; it's an OpenVMS problem!

Could you please format the spinlock (SPL) address in R23:

SDA> SHOW SPIN/ADDR=818CA400

Also please provide the brief listing of all spinlocks:

SDA> SHOW SPIN/BR

Please provide that information in an attached .TXT file, which would be more easily readable.

I still bet on the VMS82A_SYS-V0300 patch (maybe in combination with the VMS82A_PTHREAD-V0100 patch), so please UNDO the most recent patches: remove VMS82A_SYS-V0300 and try to reproduce the crash before also removing the PTHREAD patch.

It also may be that this SPLACQERR bugcheck is not really a spinlock acquisition ranking problem, but some other problem just using that bugcheck name. I would expect SPLACQERR bugchecks to be only seen/used in the SYSTEM_SYNCHRONIZATION execlet.

Keep in mind that the SYS-V0300 patch is quite new, and there may not be many sites running V8.2 that have already installed it and are running a heavily multi-threaded image.

Volker.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

we're getting there...

before you format the spinlock, issue an SDA> READ SYSDEF command.

00168: LDL R23,#X03DC(R3) ! load PCB$L_SPINLOCK(R3)
- R3 = PCB of RADIUS_SERVER
- R23 = address of PCB spinlock
0016C: LDL R21,#X00C8(R21) ! load SPL? into R21
00170: LDL R24,(R23) ! load SPL$L_OWN_CPU into R24 (owner CPU addr)
00174: BIS R31,R19,R20 ! copy R19 to R20 (=43)
00178: XOR R0,R24,R24 ! R0 xor R24 -> R24
0017C: LDL R21,(R21) ! R21 is loaded with MP_CPU (CPU 0)
00180: CMPEQ R0,R21,R0 ! R0=R21 ? YES: R0=1
- test whether R0 is the same CPUDB addr as R21
00184: BIS R31,R0,R6 ! copy R0 -> R6 (=1)
00188: BNE R24,#X000005 ! R24 .NE.0 -> branch to 001A0

...
0019C: LDQ_U R31,(SP) ! dummy instruction

001A0: LDL R1,#X0078(R2) ! load bugcheck code
001A4: BEQ R6,#X000002 ! R6 = 1 - fall through
001A8: BIS R1,#X05,R16 ! bugcheck code into R16
001AC: BUGCHK ! crash

The code seems to test whether the PCB spinlock of the current process is owned by the same CPU as an 'other' spinlock, or whether one of the spinlocks is owned and the other is not (?), and if so, causes a SPLACQERR crash. This is as much as I can guess from the Alpha instruction stream...
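The branch logic annotated above can be replayed in a few lines of Python. This is a sketch of one reading of the disassembly, not OpenVMS source: it assumes, as the annotations suggest, that R0 holds the current CPU database address before CMPEQ and that R21 ends up holding the owner of another spinlock (SCHED, going by the later summary in this thread). Plugging in the first crash summary, where R21 = R24 = FFFFFFFF.81711A80 (equal to PRBR) and R6 = 1, shows that the longword loaded from the PCB spinlock must have been 0, i.e. the PCB spinlock was not owned; Michail's dump below (R21 = R24 = FFFFFFFF.82874A00 = PRBR) gives the same result. The bugcheck code also checks out: 0x6B0 OR 0x05 = 0x6B5, the values of R1 and R16. All function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def ldl(longword):
    """LDL loads a 32-bit longword and sign-extends it to 64 bits."""
    longword &= 0xFFFFFFFF
    return longword | 0xFFFFFFFF00000000 if longword & 0x80000000 else longword


def wake_blocked_check(cur_cpu, pcb_lock_owner, r21_owner):
    """Replay EXE$WAKE_BLOCKED_C+00170..+001AC as annotated above.

    cur_cpu        -- R0 before CMPEQ (assumed: current CPU database address)
    pcb_lock_owner -- value loaded by LDL R24,(R23) from the PCB spinlock
    r21_owner      -- R21 after LDL R21,(R21) (assumed: owner of SCHED)
    Returns (crashes, final R24, final R6).
    """
    r24 = (cur_cpu ^ pcb_lock_owner) & MASK64  # XOR R0,R24,R24
    r6 = 1 if cur_cpu == r21_owner else 0      # CMPEQ R0,R21,R0 ; BIS R31,R0,R6
    if r24 == 0:
        # BNE R24 not taken: PCB spinlock owned here; BR at +00198 skips BUGCHK
        return False, r24, r6
    # BNE taken to +001A0; BEQ R6 branches past BUGCHK when R6 = 0
    return r6 == 1, r24, r6


def bugcheck_code(r1):
    """BIS R1,#X05,R16: the code loaded from #X0078(R2) with 5 ORed in."""
    return r1 | 0x05


def infer_pcb_owner(r21_final, r24_final):
    """Recover the loaded PCB-spinlock value; valid only when R6 = 1 (R0 == R21)."""
    return r21_final ^ r24_final


if __name__ == "__main__":
    cur = ldl(0x81711A80)  # PRBR in the first crash summary
    print(infer_pcb_owner(cur, cur))          # PCB spinlock longword at crash
    print(wake_blocked_check(cur, 0, cur)[0]) # would this state bugcheck?
    print(hex(bugcheck_code(0x6B0)))
```

The model only captures the test itself; who called EXE$WAKE_BLOCKED in that state is a separate question that the spinlock trace would have to answer.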

Volker.

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi all,

I have the same problem after upgrading from V7.3-2 to V8.2. I also have a test machine (a GS60E) without official support from HP. In my case the problem occurs with a Java application.
OpenVMS Operating System, Version V8.2 -- System Dump Analysis 30-MAY-2006 13:26:10.02 Page 1
Crashdump Summary Information:



Crash Time: 30-MAY-2006 13:26:10.02
Bugcheck Type: SPLACQERR, Spinlock(s) of higher rank already owned by CPU
Node: OMNI41 (Standalone)
CPU Type: Compaq AlphaServer GS60E 6/700
VMS Version: V8.2
Current Process: TAMARA_192
Current Image: $1$DKA0:[SYS0.SYSCOMMON.][JAVA$142.BIN]JAVA$JAVA.EXE;1
Failing PC: FFFFFFFF.80173CAC EXE$WAKE_BLOCKED_C+001AC
Failing PS: 10000000.00000800
Module: PROCESS_MANAGEMENT (Link Date/Time: 12-APR-2006 14:15:55.15)
Offset: 00041CAC

Boot Time: 24-MAY-2006 13:52:49.00
System Uptime: 5 23:33:21.02
Crash/Primary CPU: 11/00
System/CPU Type: 0C08
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 5120 MByte (655360 PFNs, contiguous memory)
Dumpfile Pagelets: 322037 blocks
Dump Flags: writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 1 Pagefile and 1 Swapfile installed

Stack Pointers:
KSP = 00000000.7FF87D50 ESP = 00000000.7FF8C000 SSP = 00000000.7FF9CC80
USP = 00000000.7AD57B20

General Registers:
R0 = 00000000.00000001 R1 = 00000000.000006B0 R2 = FFFFFFFF.820E81F8
R3 = FFFFFFFF.82D0AAC0 R4 = FFFFFFFF.82D0AAC0 R5 = FFFFFFFF.82008B48
R6 = 00000000.00000001 R7 = FFFFFFFF.82D0AAC0 R8 = FFFFFFFF.82D0AAC0
R9 = 00000000.7FF9DDF0 R10 = FFFFFEFC.00000700 R11 = 00000000.00007800
R12 = FFFFFFFF.831331C0 R13 = 00000000.001C0000 R14 = FFFFFFFF.820ECCC8
R15 = FFFFFEFE.003A9D78 R16 = 00000000.000006B5 R17 = FFFFFFFF.82D0AAC0
R18 = 00000000.00000000 R19 = 00000000.00000043 R20 = 00000000.00000043
R21 = FFFFFFFF.82874A00 R22 = FFFFFFFF.00000000 R23 = FFFFFFFF.82D0B080
R24 = FFFFFFFF.82874A00 AI = 00000000.00000002 RA = 00001000.00000000
PV = FFFFFFFF.820E81F8 R28 = FFFFFFFF.801678F0 FP = 00000000.7FF87D50
PC = FFFFFFFF.80173CB0 PS = 10000000.00000800

System Registers:
Page Table Base Register (PTBR) 00000000.0003F06E
Processor Base Register (PRBR) FFFFFFFF.82874A00
Privileged Context Block Base (PCBB) 00000000.58336080
System Control Block Base (SCBB) 00000000.0000226B
Software Interrupt Summary Register (SISR) 00000000.00000000
Address Space Number (ASN) 00000000.0000005C
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.0000008F
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.00000008
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) FFFFFEFC.00000000

Failing Instruction:
EXE$WAKE_BLOCKED_C+001AC: BUGCHK

Instruction Stream (last 20 instructions):
EXE$WAKE_BLOCKED_C+0015C: LDL R21,(R21)
EXE$WAKE_BLOCKED_C+00160: EXTBL R19,R5,R19
EXE$WAKE_BLOCKED_C+00164: BLBC R19,#X00003E
EXE$WAKE_BLOCKED_C+00168: LDL R23,#X03DC(R3)
EXE$WAKE_BLOCKED_C+0016C: LDL R21,#X00C8(R21)
EXE$WAKE_BLOCKED_C+00170: LDL R24,(R23)
EXE$WAKE_BLOCKED_C+00174: BIS R31,R19,R20
EXE$WAKE_BLOCKED_C+00178: XOR R0,R24,R24
EXE$WAKE_BLOCKED_C+0017C: LDL R21,(R21)
EXE$WAKE_BLOCKED_C+00180: CMPEQ R0,R21,R0
EXE$WAKE_BLOCKED_C+00184: BIS R31,R0,R6
EXE$WAKE_BLOCKED_C+00188: BNE R24,#X000005
EXE$WAKE_BLOCKED_C+0018C: LDQ R25,#X02E8(R4)
EXE$WAKE_BLOCKED_C+00190: BIS R25,R26,R25
EXE$WAKE_BLOCKED_C+00194: STQ R25,#X02E8(R4)
EXE$WAKE_BLOCKED_C+00198: BR R31,#X000018
EXE$WAKE_BLOCKED_C+0019C: LDQ_U R31,(SP)
EXE$WAKE_BLOCKED_C+001A0: LDL R1,#X0078(R2)
EXE$WAKE_BLOCKED_C+001A4: BEQ R6,#X000002
EXE$WAKE_BLOCKED_C+001A8: BIS R1,#X05,R16
EXE$WAKE_BLOCKED_C+001AC: BUGCHK
EXE$WAKE_BLOCKED_C+001B0: LDQ R26,#X0038(R2)
EXE$WAKE_BLOCKED_C+001B4: BIS R31,R23,R16
EXE$WAKE_BLOCKED_C+001B8: BIS R31,#X01,R25
EXE$WAKE_BLOCKED_C+001BC: LDQ R27,#X0040(R2)
----------------------------------- ----------- ----------- --------------------
PRODUCT KIT TYPE OPERATION DATE AND TIME
----------------------------------- ----------- ----------- --------------------
DEC AXPVMS VMS82A_LOADSS V2.0 Patch Install 30-MAY-2006 15:59:52
DEC AXPVMS VMS82A_FIBRE_SCSI V2.0 Patch Install 30-MAY-2006 15:56:10
DEC AXPVMS JAVA142 V1.4-25 Full LP Install 24-MAY-2006 14:56:18
DEC AXPVMS JAVA142 V1.4-24P2 Full LP Remove 24-MAY-2006 14:56:18
DEC AXPVMS VMS82A_MONTOR V2.0 Patch Install 18-MAY-2006 23:34:58
DEC AXPVMS VMS82A_BASRTL V1.0 Patch Install 18-MAY-2006 23:34:28
DEC AXPVMS VMS82A_LMF V2.0 Patch Install 18-MAY-2006 23:33:58
DEC AXPVMS VMS82A_PTHREAD V1.0 Patch Install 18-MAY-2006 23:33:01
DEC AXPVMS VMS82A_SYS V3.0 Patch Install 18-MAY-2006 23:32:05
DEC AXPVMS DWMOTIF_ECO01 V1.5 Patch Install 25-APR-2006 10:47:44
DEC AXPVMS TCPIP V5.5-11ECO1 Full LP Install 25-APR-2006 10:47:07
DEC AXPVMS TCPIP V5.5-11 Full LP Remove 25-APR-2006 10:47:07
DEC AXPVMS DNVOSIECO01 V8.2 Patch Install 25-APR-2006 10:45:27
DEC AXPVMS VMSI18N V8.2-0E1 Full LP Install 25-APR-2006 10:42:21
DEC AXPVMS VMS82A_ACRTL V1.0 Patch Install 25-APR-2006 10:41:02
DEC AXPVMS VMS82A_LOADSS V1.0 Patch Install 25-APR-2006 10:38:41
DEC AXPVMS VMS82A_F11X V2.0 Patch Install 25-APR-2006 10:38:06
DEC AXPVMS VMS82A_INSTAL V1.0 Patch Install 25-APR-2006 10:37:29
DEC AXPVMS VMS82A_LMF V1.0 Patch Install 25-APR-2006 10:37:01
DEC AXPVMS VMS82A_TDF V1.0 Patch Install 25-APR-2006 10:36:26
DEC AXPVMS VMS82A_UPDATE V2.0 Patch Install 21-APR-2006 15:44:56
DEC AXPVMS VMS82A_PCSI V1.0 Patch Install 21-APR-2006 13:46:54
CPQ AXPVMS CDSA V2.1-331 Full LP Install 21-APR-2006 09:41:58
DEC AXPVMS DECNET_OSI V8.2 Full LP Install 21-APR-2006 09:41:58
DEC AXPVMS DWMOTIF V1.5 Full LP Install 21-APR-2006 09:41:58
DEC AXPVMS OPENVMS V8.2 Platform Install 21-APR-2006 09:41:58
DEC AXPVMS TCPIP V5.5-11 Full LP Install 21-APR-2006 09:41:58
DEC AXPVMS VMS V8.2 Oper System Install 21-APR-2006 09:41:58
HP AXPVMS AVAIL_MAN_BASE V8.2 Full LP Install 21-APR-2006 09:41:58
HP AXPVMS KERBEROS V2.1-72 Full LP Install 21-APR-2006 09:41:58
HP AXPVMS TDC_RT V2.1-69 Full LP Install 21-APR-2006 09:41:58


After going back to V7.3-2, the application works fine.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Michail,

this certainly IS the same problem. JAVA is also multi-threaded and you're also running it on an SMP system.

Try to back out VMS82A_SYS-V0300 and try again - no guarantee, but it may be the best 'workaround' and will also CONFIRM, where the problem is coming from...

Volker.

Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Michail,

have you been (heavily) using JAVA between 18-MAY 23:46 (patch installation) and 30-MAY 13:26 (crash time) ?

In your case, only 5 patches were installed on 18-MAY (before the crash), and the ONLY one or two that may be relevant are:

VMS82A_PTHREAD-V0100
VMS82A_SYS-V0300

According to the patch release notes, there is no explicit dependency between them, so back out the SYS patch and try again. This is my best guess, and I've been right before ;-)
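The narrowing-down step above (which patches went in after the last known-good state and before the crash) is easy to repeat mechanically on a PRODUCT SHOW HISTORY listing. A minimal Python sketch, assuming the column layout shown in Michail's post (the TYPE column can be one or two words, e.g. "Patch" or "Full LP") and English month abbreviations; parse_history and patches_installed_between are hypothetical helper names, and the rows are copied from that listing.

```python
from datetime import datetime

# Rows copied from the PRODUCT SHOW HISTORY listing in Michail's post.
HISTORY = """\
DEC AXPVMS VMS82A_LOADSS V2.0 Patch Install 30-MAY-2006 15:59:52
DEC AXPVMS VMS82A_FIBRE_SCSI V2.0 Patch Install 30-MAY-2006 15:56:10
DEC AXPVMS JAVA142 V1.4-25 Full LP Install 24-MAY-2006 14:56:18
DEC AXPVMS JAVA142 V1.4-24P2 Full LP Remove 24-MAY-2006 14:56:18
DEC AXPVMS VMS82A_MONTOR V2.0 Patch Install 18-MAY-2006 23:34:58
DEC AXPVMS VMS82A_BASRTL V1.0 Patch Install 18-MAY-2006 23:34:28
DEC AXPVMS VMS82A_LMF V2.0 Patch Install 18-MAY-2006 23:33:58
DEC AXPVMS VMS82A_PTHREAD V1.0 Patch Install 18-MAY-2006 23:33:01
DEC AXPVMS VMS82A_SYS V3.0 Patch Install 18-MAY-2006 23:32:05
DEC AXPVMS DWMOTIF_ECO01 V1.5 Patch Install 25-APR-2006 10:47:44
"""


def parse_history(text):
    """Parse rows from the right: DATE TIME are last, OPERATION before them."""
    entries = []
    for line in text.splitlines():
        t = line.split()
        if len(t) < 8:
            continue  # header and separator lines
        try:
            # strptime matches month names case-insensitively ("MAY").
            when = datetime.strptime(" ".join(t[-2:]), "%d-%b-%Y %H:%M:%S")
        except ValueError:
            continue
        entries.append({"product": t[2], "version": t[3],
                        "type": " ".join(t[4:-3]), "operation": t[-3],
                        "when": when})
    return entries


def patches_installed_between(entries, start, end):
    """Patch installs strictly after start and strictly before end."""
    return [(e["product"], e["version"]) for e in entries
            if e["type"] == "Patch" and e["operation"] == "Install"
            and start < e["when"] < end]


if __name__ == "__main__":
    last_change = datetime(2006, 4, 25, 10, 47, 44)  # DWMOTIF_ECO01 install
    crash = datetime(2006, 5, 30, 13, 26, 10)        # crash time from the dump
    for name, version in patches_installed_between(
            parse_history(HISTORY), last_change, crash):
        print(name, version)
```

With those two timestamps it yields the five 18-MAY patches; LOADSS and FIBRE_SCSI drop out because they were installed after the crash.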

Volker.

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi Volker!

PRODUCT UNDO PATCH does not work. I need to restore the system from backup. I think I have a backup from before the SYS-V0300 patch was installed.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan, Michail,

what is the setting of the SYSGEN parameter MULTITHREAD on your systems?

You might get away without these crashes if setting MULTITHREAD = 0 or 1 - to prevent multiple kernel threads in your multi-threaded processes.

Volker.

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi Volker,

In MODPARAMS.DAT I found this entry:
MIN_MULTITHREAD = 16 ! compaq for webes.
This value was recommended for WEBES.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Michail,

MULTITHREAD > 1 allows pthreaded processes to run their threads on multiple CPUs (via kernel threads) at the SAME time. This may make certain operations within the application faster (by spreading the CPU load across multiple physical CPUs), but it ALSO increases the likelihood of synchronization problems within the operating system, as is probably the case with this SPLACQERR crash.
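The effect of the setting can be put as simple arithmetic. A sketch under an assumption taken from this thread: MULTITHREAD = 0 or 1 gives a process no more than one kernel thread, and a value N > 1 allows up to N. Since each thread that is executing at a given instant needs both a kernel thread and a CPU, the bound is the smaller of the two. Function names are hypothetical.

```python
def kernel_thread_limit(multithread):
    """Kernel threads a process may use, per the reading in this thread:
    0 or 1 means no multiple kernel threads; N > 1 allows up to N."""
    if multithread < 0:
        raise ValueError("MULTITHREAD cannot be negative")
    return max(1, multithread)


def max_simultaneous(multithread, active_cpus):
    """Upper bound on threads of one process executing at the same instant:
    each running thread needs both a kernel thread and a CPU."""
    if active_cpus < 1:
        raise ValueError("need at least one CPU")
    return min(kernel_thread_limit(multithread), active_cpus)


if __name__ == "__main__":
    for mt in (0, 1, 2, 16):
        print(mt, max_simultaneous(mt, 2))  # a dual-CPU system such as the AS4100
```

On a dual-CPU machine, MULTITHREAD = 2 already reaches the bound, and any setting of 2 or more is the case in which two threads of one process can be inside the executive on different CPUs at once.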

As a workaround, I would suggest setting MULTITHREAD = 1 if you want to prevent further crashes on V8.2 (with the current set of patches applied) until the underlying problem has been diagnosed and fixed.

Volker.

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi Volker,

If I set MULTITHREAD to 0 or 1, it works without problems. But this setting decreases the performance of the server. This evening I will try to restore the system to its state before the SYS-V0300 patch. I also have a question: how will HP support learn about this problem if I cannot open a case for a test server?
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Michail,

so you tested with MULTITHREAD < 2, ran your JAVA application, and the system did NOT crash with SPLACQERR in EXE$WAKE_BLOCKED? Please confirm this explicitly, as this would be the quickest workaround.

If you don't have a service contract, it will be hard to raise this call with HP. But as soon as others install that patch and run these kinds of applications, the problem will certainly surface. Until then, let's see how the unofficial 'network' works ;-)

I believe I've done everything necessary to analyse this problem and document it so that OpenVMS engineering can pick it up easily. And anyone seeing this problem and doing some basic research on Google for 'SPLACQERR' should find this information.

Volker.

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Volker,

I confirm that with MULTITHREAD < 2 the Java application works without a CRASH.
I have HP support on 4 Alpha servers, but this system currently runs V7.3-2, and because of this problem I cannot upgrade it to V8.2. This system runs 24x7x365. But support cannot open a case for the old test system :-(. Their advice: upgrade the live system, have some crashes, and then open a case...
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Michail,

if this problem prevents you from upgrading your SUPPORTED systems, you should certainly (with all the analysis I've already done) be able to open a call with HP ! Refer to this ITRC entry and my name and ask for immediate escalation to OpenVMS engineering.

Volker.
Ruslan R. Laishev
Super Advisor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi Volker!

I can provide SYSDUMP.DMP to anyone interested in investigating the problem.
My AS4100 is a dual-CPU system, so MULTITHREAD = 2; otherwise performance degrades.

I also have a production cluster with two DS15s (AAA-A12/CDMA-450i) running OpenVMS 8.2, so I cannot install this set of ECOs until the cause of the CRASH has been eliminated.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan,

I would be interested in looking at the crash, although only HP OpenVMS Engineering will be able to fix the problem.

If you can put the dump on a public FTP server, you could send me mail (look at my ITRC profile) with access information.

May I suggest that we continue to use this topic for further exchanges and status information about this crash.

Thanks,

Volker.
Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

OpenVMS engineering would need a dump with SPINLOCK tracing enabled to be able to figure out, who called EXE$WAKE_BLOCKED with the incorrect spinlock setup (owning SCHED, but not owning PCB spinlock).

Volker.
Ruslan R. Laishev
Super Advisor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Hi Volker!

I just opened a case with HP Support. SYSDUMP.ZIP (~120 MB) is ready for download.

FTP Host:StarLet.DeltaTelecom.RU
User:Anonymous
Pass:field

Volker Halle
Honored Contributor

Re: AS4100/VMS 8.2 + ECOs => BUGCHECK

Ruslan, Michail,

we are at crash time + 48 hours - the problem has already been identified by OpenVMS engineering ;-)

Stay tuned,

Volker.