Operating System - OpenVMS

Re: CPU Spin wait CPU Spinwait timer expired

 
Mohd Zahazan Zainon
New Member

CPU Spin wait CPU Spinwait timer expired

Hi, I was just wondering how this CPU spinwait happens and what causes it.

Also, how can this be resolved? I have attached the crash details below.

OpenVMS (TM) system dump analyzer
...analyzing an Alpha compressed selective memory dump...

Dump taken on 29-JUN-2010 15:11:44.89
CPUSPINWAIT, CPU spinwait timer expired


Bitmask of CPUs active/available: 000000FF/000000FF


CPU bugcheck codes:
CPU 02 -- CPUSPINWAIT, CPU spinwait timer expired
CPU 04 -- CPUSPINWAIT, CPU spinwait timer expired
6 others -- CPUEXIT, Shutdown requested by another CPU


CPU 04 reason for Bugcheck: CPUSPINWAIT, CPU spinwait timer expired


Process currently executing on this CPU: SRV1356_02_0


Current image file: DSA119:[CERNER.W_STANDARD.WH2002_02.][VMSALPHA]SRV_DRVR.EXE


Current IPL: 8 (decimal)


CPU database address: 821B0980


CPUs Capabilities: QUORUM,RUN

General registers:

R0 = 00000000.00000000 R1 = FFFFFFFF.821B0980 R2 = 00000000.00000000
R3 = 00000000.B8FF62DA R4 = 00000000.00000023 R5 = 00000000.00000090
R6 = FFFFFFFF.80580368 R7 = 00000000.7FF87D80 R8 = 00000000.0437FD18
R9 = 00000000.00000000 R10 = 00000000.00000000 R11 = FFFFFFFF.821B0980
R12 = FFFFFFFF.818B9000 R13 = FFFFFFFF.818B5700 R14 = FFFFFFFF.818B9000
R15 = 00000000.7FF87BE8 R16 = 00000000.0000078C R17 = 00000000.00000000
R18 = FFFFFFFF.8183F9C0 R19 = FFFFFFFF.81808000 R20 = FFFFFFFF.801D13D0
R21 = 00000000.B8FF62DA R22 = FFFFFFFF.00000000 R23 = FFFFFFFF.FFFFFFFD
R24 = 00000000.7FF87B40 AI = FFFFFFFF.81808000 RA = 00000000.00000000
PV = 00000000.00000000 R28 = FFFFFFFF.818128A0 FP = 00000000.7FF87CB0
PC = FFFFFFFF.8007A388 PS = 28000000.00000800




Processor Internal Registers:


ASN = 00000000.000000C4 ASTSR/ASTEN = 0000000F
IPL = 00000008 PCBB = 00000000.6627C080 PRBR = FFFFFFFF.821B0980
PTBR = 00000000.0003313F SCBB = 00000000.00000A99 SISR = 00000000.00000180
VPTB = FFFFFEFA.00000000 FPCR = 0C000000.00000000 MCES = 00000000.00000000



Press RETURN for more.




Clue Crash – Info

Crashdump Summary Information:
------------------------------
Crash Time: 29-JUN-2010 15:11:44.89
Bugcheck Type: CPUSPINWAIT, CPU spinwait timer expired
Node: SELSV2 (Cluster)
CPU Type: hp AlphaServer GS1280 7/1150
VMS Version: V7.3-2
Current Process: SRV1356_02_0
Current Image: DSA119:[CERNER.W_STANDARD.WH2002_02.][VMSALPHA]SRV_DRVR.EXE
Failing PC: FFFFFFFF.8007A384 SMP$TIMEOUT_C+00064
Failing PS: 28000000.00000800
Module: SYSTEM_SYNCHRONIZATION_MIN (Link Date/Time: 10-AUG-2005 11:31:10.29)
Offset: 00000384

Boot Time: 24-JUN-2010 01:45:21.00
System Uptime: 5 13:26:23.89
Crash/Primary CPU: 04/00
System/CPU Type: 270F
Saved Processes: 153
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 16384 MByte (268435456 PFNs, discontiguous memory)
Dumpfile Pagelets: 2765901 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump
Paging Files: 1 Pagefile and 1 Swapfile installed

Stack Pointers:
KSP = 00000000.7FF87BA8 ESP = 00000000.7FF8C000 SSP = 00000000.7FF9CD00
USP = 00000000.017D39A0

General Registers:
R0 = 00000000.00000000 R1 = FFFFFFFF.821B0980 R2 = 00000000.00000000
R3 = 00000000.B8FF62DA R4 = 00000000.00000023 R5 = 00000000.00000090
R6 = FFFFFFFF.80580368 R7 = 00000000.7FF87D80 R8 = 00000000.0437FD18
R9 = 00000000.00000000 R10 = 00000000.00000000 R11 = FFFFFFFF.821B0980
R12 = FFFFFFFF.818B9000 R13 = FFFFFFFF.818B5700 R14 = FFFFFFFF.818B9000
R15 = 00000000.7FF87BE8 R16 = 00000000.0000078C R17 = 00000000.00000000
R18 = FFFFFFFF.8183F9C0 R19 = FFFFFFFF.81808000 R20 = FFFFFFFF.801D13D0
R21 = 00000000.B8FF62DA R22 = FFFFFFFF.00000000 R23 = FFFFFFFF.FFFFFFFD
R24 = 00000000.7FF87B40 AI = FFFFFFFF.81808000 RA = 00000000.00000000
PV = 00000000.00000000 R28 = FFFFFFFF.818128A0 FP = 00000000.7FF87CB0
PC = FFFFFFFF.8007A388 PS = 28000000.00000800

CPUSPINWAIT Bugcheck:
Cause: timeout processing IPINT and/or acquiring spinlock
Spinlock name: LCKMGR

Press RETURN for more.
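
For readers following along, the CPU summary at the top of the dump above can be decoded mechanically. The following is a minimal Python sketch, not an SDA feature: the function names are hypothetical, and it assumes SDA prints CPU IDs in hexadecimal (for this dump, decimal and hexadecimal give the same answer). It lists which CPUs are set in the active bitmask and which ones reported CPUSPINWAIT.

```python
import re

def active_cpus(mask_hex: str) -> list[int]:
    """Return the CPU IDs whose bit is set in a hexadecimal bitmask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def spinwait_cpus(dump_text: str) -> list[int]:
    """Return the CPU IDs listed with a CPUSPINWAIT bugcheck code.

    Assumption: CPU IDs are printed in hexadecimal.
    """
    pattern = re.compile(r"^CPU ([0-9A-Fa-f]+) -- CPUSPINWAIT\b", re.MULTILINE)
    return [int(cpu_id, 16) for cpu_id in pattern.findall(dump_text)]

# Lines copied from the dump in the original post.
dump = """\
Bitmask of CPUs active/available: 000000FF/000000FF
CPU bugcheck codes:
CPU 02 -- CPUSPINWAIT, CPU spinwait timer expired
CPU 04 -- CPUSPINWAIT, CPU spinwait timer expired
6 others -- CPUEXIT, Shutdown requested by another CPU
"""

print("active CPUs:", active_cpus("000000FF"))
print("CPUs that timed out spinning:", spinwait_cpus(dump))
```

The CPUs reported here are the ones that gave up waiting. The summary alone does not say which CPU was holding the LCKMGR spinlock.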

9 REPLIES
P Muralidhar Kini
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

Hi Zahazan,

Welcome to the ITRC forum.

I am not sure the ITRC forum is the right place for a crash dump analysis to
find the root cause of the problem.

I would recommend getting in touch with your local HP customer support for
further assistance.

Hope this helps.

Regards,
Murali
Let There Be Rock - AC/DC
Volker Halle
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

Hi,

you've left out the MOST important information in a CPUSPINWAIT crash: which CPU owned the spinlock for too long and what code was executing on its kernel stack...

This data follows the CPUSPINWAIT Bugcheck: header (and was added to the default CLUE CRASH output at my request some years ago).

Consider adding the full CLUE file as an ASCII attachment to your next reply.

Volker.
Mohd Zahazan Zainon
New Member

Re: CPU Spin wait CPU Spinwait timer expired

I have contacted HP support and they will send a representative to work on a solution.
Andy Bustamante
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

Looking at Current image file: DSA119:[CERNER.W_STANDARD.WH2002_02.][VMSALPHA]SRV_DRVR.EXE

HP may recommend that you contact Cerner to review their product. Open a parallel case with Cerner and get started on that. There are SYSGEN parameters that configure this timer, and some third-party vendors will recommend increasing these values.

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
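
To put a number on the timer mentioned above: the thread does not name the parameters, but the usual candidates are SMP_SPINWAIT and SMP_LNGSPINWAIT, which the OpenVMS documentation describes in units of 10 microseconds. The parameter names, the unit, and the SMP_SPINWAIT default of 100000 used below are recalled from documentation rather than taken from this thread, so check them with SYSGEN on your own version before relying on them. Under those assumptions, a small Python sketch of the unit conversion looks like this:

```python
# Assumption (not stated in the thread): the spinwait timers are
# expressed in units of 10 microseconds.
MICROSECONDS_PER_TICK = 10

def spinwait_seconds(ticks: int) -> float:
    """Convert a spinwait parameter value to seconds."""
    return ticks * MICROSECONDS_PER_TICK / 1_000_000

def spinwait_ticks(seconds: float) -> int:
    """Convert a desired timeout in seconds to a parameter value."""
    return round(seconds * 1_000_000 / MICROSECONDS_PER_TICK)

# Assumed default for SMP_SPINWAIT; verify on your own system.
ASSUMED_SMP_SPINWAIT_DEFAULT = 100_000

print(spinwait_seconds(ASSUMED_SMP_SPINWAIT_DEFAULT), "seconds")
print(spinwait_ticks(3.0), "ticks for a 3 second timeout")
```

Raising the value only gives the spinlock holder more time before the waiting CPUs bugcheck. It does not explain why the lock was held that long.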
Mohd Zahazan Zainon
New Member

Re: CPU Spin wait CPU Spinwait timer expired

Hi Andy,

Thank you for your response, but why do I need to contact Cerner to resolve this issue?

Please explain further.

Thanks & Regards
Hoff
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

Andy is suggesting the "shotgun" approach toward support. Specifically, opening a third front in your effort, with a call into Cerner. (Third? ITRC, HP, and now Cerner.)
Hein van den Heuvel
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

>> Thank you for your response, but why do I need to contact Cerner to resolve this issue?

Because most folks do not get this crash and most folks do not run Cerner apps. Ergo... :-).

I'd focus on it being the lockmanager spinlock.
Open up your T4 performance data repository and poke at all the lock manager stats, notably in the LCK73 group, and the lock tree migration.
Note, the LCK73 numbers are only there when using a dedicated lock manager. Do you? Should you?

The next investigation should quite possibly be the spinlock trace, notably at the times of day where T4 suggests potentially high lock manager activity. Just use @SYS$EXAMPLES:SPL.COM to get going.

Does the system use some lock scan tools which might run GETLKI often and for a long time?

Hope this helps some,
Hein van den Heuvel
HvdH Performance Consulting
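
One way to find the times of day worth tracing is to scan an exported T4 CSV file for samples above a threshold. The sketch below is only an illustration under assumptions: it expects a CSV with a single plain header row, and the column names are passed in by the caller because the exact T4 column headings are not given in this thread. The function name and the file name in the usage comment are hypothetical.

```python
import csv

def busy_samples(csv_file, time_col, value_col, threshold):
    """Yield (timestamp, value) for rows whose value exceeds threshold.

    csv_file is any open text file object. Rows with a missing or
    non-numeric value are skipped rather than treated as zero.
    """
    for row in csv.DictReader(csv_file):
        try:
            timestamp = row[time_col]
            value = float(row[value_col])
        except (KeyError, TypeError, ValueError):
            continue
        if value > threshold:
            yield timestamp, value

# Usage sketch; the file and column names below are placeholders:
# with open("t4_export.csv", newline="") as f:
#     for when, rate in busy_samples(f, "Sample Time", "Lock Rate", 10000):
#         print(when, rate)
```

The timestamps it returns are candidates for running the spinlock trace, not a diagnosis in themselves.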

Andy Bustamante
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

Hi Hoff,

My own preference would have been to start with Cerner as the database vendor. Assuming (yes, I know all about where that leads) that a Cerner application is the primary use of this system, Cerner should have recommendations on tuning the operating system. HP may well come back with a generic "you can change these settings until the problem goes away" response. Cerner should have a better idea of what the application is doing and have specific recommendations. I made aggressive changes to these knobs when I was with a vendor to avoid this.

Nice thing about shotguns, you have more than one round heading to the target(s).

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Volker Halle
Honored Contributor

Re: CPU Spin wait CPU Spinwait timer expired

If you would supply the CPUSPINWAIT Bugcheck: data from the CLUE CRASH output, which includes the state of all the other CPUs and what they were doing at that time, and also the CLUE CONFIG output, we could probably tell in much more detail what the problem was! In a CPUSPINWAIT bugcheck, the current CPU/image is ALWAYS only a victim, NOT the problem!

To reduce further speculation about the problem, please provide this data!

Volker.