Operating System - OpenVMS

Re: CPUSPINWAIT, CPU spinwait timer expired

 
Gregory Githens
Occasional Advisor

CPUSPINWAIT, CPU spinwait timer expired

Yesterday morning I applied the following patches to our OpenVMS 7.3-2 system:
DEC AXPVMS VMS732_GRAPHICS V5.0
DEC AXPVMS VMS732_GRAPHICS V4.0
DEC AXPVMS VMS732_FIBRE_SCSI V11.0
DEC AXPVMS VMS732_DCL V8.0
DEC AXPVMS VMS732_SYS V13.0
DEC AXPVMS VMS732_RMS V4.0
DEC AXPVMS VMS732_AUDSRV V4.0
DEC AXPVMS TCPIP_ECO V5.4-156
I applied them in order from the bottom up in the early morning hours. At around 1:30 pm our system crashed with CPUSPINWAIT, CPU spinwait timer expired. I am attaching the output of ANALYZE/CRASH_DUMP and the clue file.

I would greatly appreciate any help in figuring out what happened and what I can do to prevent this in the future.

Could this be from the patches applied or is that just a coincidence?

As a little bit of background, our system usually has 100-200 users logged in using ssh2 via public key authentication. The user shown in the clue file and the ANALYZE/CRASH_DUMP output is an unprivileged user, and the executable DEBTOR.EXE is a custom app compiled from Basic that we have been running for years and years without this problem.

Thanks for the help.
Greg Githens
17 REPLIES
Gregory Githens
Occasional Advisor

Re: CPUSPINWAIT, CPU spinwait timer expired

Here is the clue file.
Volker Halle
Honored Contributor

Re: CPUSPINWAIT, CPU spinwait timer expired

Gregory,

CPU 01 incurred a HALT instruction in kernel mode. The HALT PC reported is 8044F1FC.
CPU 0 tried to send an interprocessor interrupt to CPU 01, but the operation timed out, so CPU 0 took down the system with a CPUSPINWAIT crash.
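
To illustrate the mechanism described above, here is a toy model in Python. It is only a sketch, not OpenVMS code: the names `spinwait_for_ack`, `SpinwaitTimeout` and `demo` are hypothetical, and the timeout value is arbitrary. One side busy-waits for an acknowledgement from the other and gives up once a timer expires, which is roughly what CPU 0 did when CPU 01 stopped responding.

```python
import threading
import time


class SpinwaitTimeout(Exception):
    """Raised when the target never acknowledges within the timeout."""


def spinwait_for_ack(ack: threading.Event, timeout_s: float) -> float:
    """Busy-wait until `ack` is set; raise SpinwaitTimeout once timeout_s elapses.

    Returns the time spent waiting, in seconds.
    """
    start = time.monotonic()
    while not ack.is_set():
        if time.monotonic() - start >= timeout_s:
            raise SpinwaitTimeout(f"no acknowledgement after {timeout_s} s")
    return time.monotonic() - start


def demo(target_responds: bool, timeout_s: float = 0.2) -> str:
    """Model one request: 'ok' if the target acknowledged, else 'CPUSPINWAIT'."""
    ack = threading.Event()
    if target_responds:
        # A healthy target acknowledges shortly after the request.
        threading.Timer(0.01, ack.set).start()
    # A halted target never sets `ack`, so the requester times out.
    try:
        spinwait_for_ack(ack, timeout_s)
        return "ok"
    except SpinwaitTimeout:
        return "CPUSPINWAIT"
```

In this model the requester is the one that reports the failure, even though the fault lies with the side that stopped responding. That matches the point made here: the crash was taken by CPU 0, but the cause is whatever happened on CPU 1.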

Your AUTO_ACTION console environment variable is most likely NOT set to RESTART but to HALT. Otherwise you would have gotten a HALT restart bugcheck.

The problem is caused by whatever code was executing on CPU 1.

SDA> EXA/INS 8044F1FC

If this shows a HALT instruction, continue with:

SDA> EXA/INS 8044F1FC-20;30

Consider setting AUTO_ACTION RESTART and try to capture the console output, so you have some more data, if this problem happens again.

Also try SDA> CLUE ERRLOG to check whether any errors were reported immediately preceding the crash.

Volker.
Duncan Morris
Honored Contributor

Re: CPUSPINWAIT, CPU spinwait timer expired

Gregory,

just an aside, your console firmware version is very old (6.6 - current = 7.3).

Your KGPSA and Gig-E adapters would probably benefit from the console update as well! See the release notes for the DS20 firmware.

The firmware page is here....

ftp://ftp.digital.com/pub/DEC/Alpha/firmware/index.html

Regards,

Duncan
Dean McGorrill
Valued Contributor

Re: CPUSPINWAIT, CPU spinwait timer expired

Hi Greg,
Volker's right on. I seem to remember you get a cpuspinlock timeout crash if it's not the primary CPU that issues a halt. I'd doubt your Basic app would be at fault unless you're doing something tricky, i.e. stack swapping. Curious what you find.
Gregory Githens
Occasional Advisor

Re: CPUSPINWAIT, CPU spinwait timer expired

Volker,
Thanks for the help. I ran the commands you're talking about but I really didn't understand the output. I am attaching the output.

Also, we have a dumb terminal connected to the console. How can I capture the output? I was thinking of maybe setting up a PC with a terminal emulation program to capture the output, but I am kind of leery of doing that in case the PC hangs.

Any further assistance or info you can give would be greatly appreciated.


Duncan,
Thanks for the information about the console firmware version. I will look into updating it.
Volker Halle
Honored Contributor

Re: CPUSPINWAIT, CPU spinwait timer expired

Greg,

the instruction stream is inside RMS, but there is no HALT instruction in that instruction stream. The HALT PS = 0000000A is somehow consistent with being in RMS (current mode = EXEC).
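
For anyone wanting to check the mode reading themselves, here is a small Python sketch that splits a PS value into fields. It assumes the OpenVMS Alpha processor status layout as I understand it from the Alpha architecture documentation (current mode in bits 4:3, IPL in bits 12:8, interrupt pending in bit 2, software bits in 1:0); the function name `decode_ps` is made up for illustration.

```python
MODES = ("KERNEL", "EXEC", "SUPER", "USER")


def decode_ps(ps: int) -> dict:
    """Split an OpenVMS Alpha PS value into the fields relevant here.

    The bit positions are an assumption based on the documented PS layout.
    """
    return {
        "current_mode": MODES[(ps >> 3) & 0x3],    # PS<4:3>
        "ipl": (ps >> 8) & 0x1F,                   # PS<12:8>
        "interrupt_pending": bool((ps >> 2) & 1),  # PS<2>
        "software_bits": ps & 0x3,                 # PS<1:0>
    }


if __name__ == "__main__":
    # The HALT PS value quoted in this thread.
    print(decode_ps(0x0000000A))
```

Under that assumed layout, 0000000A decodes to executive mode at IPL 0, which fits code running inside RMS.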

When the crash footprint is not explainable, one starts to think about possible HW (CPU) problems. CPU 1 is an older EV6 Pass 2.3 module.

Consider connecting a PC with a terminal emulator and enable session logging to capture the console output. Make a copy of the crash (SDA> COPY dev:filename) for further reference.
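
If a terminal emulator's own session logging is not trusted, the capture side can be as simple as the following Python sketch. This is only an illustration under assumptions: `log_console` is a hypothetical helper, and it expects a byte stream that has already been opened and configured (for example a serial port set up by the operating system). It writes each line with a timestamp and flushes immediately, so whatever has been received is already on disk if the capture program later dies.

```python
import io
import time
from typing import Callable


def log_console(
    src: io.IOBase,
    dst: io.IOBase,
    clock: Callable[[], str] = lambda: time.strftime("%Y-%m-%d %H:%M:%S"),
) -> int:
    """Copy lines from src to dst, prefixing each with a timestamp.

    Flushes after every line so already-received output is not lost
    if the logging program stops. Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" at end of stream, which ends the loop.
    for raw in iter(src.readline, b""):
        line = raw.rstrip(b"\r\n")
        dst.write(clock().encode("ascii") + b" " + line + b"\n")
        dst.flush()
        count += 1
    return count
```

A possible use, with the device name and log file name being assumptions, would be to open the serial device with `open("/dev/ttyS0", "rb", buffering=0)` and the log with `open("console.log", "ab")`, then pass both to `log_console`. Line speed and parity still have to be set outside the script.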

Volker.
Dean McGorrill
Valued Contributor

Re: CPUSPINWAIT, CPU spinwait timer expired

Unless I missed it (I'm rusty), what is the spinlock and who owned it? You can get that from SDA with SHOW SPINLOCK/FULL on the dump. It might help. We (decnet+) used to get a few iolock8 spinwait timeout crashes until we lightened our heavy-handed use of it.
Gregory Githens
Occasional Advisor

Re: CPUSPINWAIT, CPU spinwait timer expired

Volker,
Thanks for the info about the older CPU. My hardware supplier thinks there may be an issue with having a pass 2.3 CPU and a pass 2.5 CPU in the same system, so we are going to look at upgrading the 2.3 one.

Dean,
I am attaching the output of the command you requested.

Thanks,
Greg Githens
Gregory Githens
Occasional Advisor

Re: CPUSPINWAIT, CPU spinwait timer expired

Ooops, I didn't notice the Press return for more. Attached is the full output.