Re: SWIS_INCONSTATE bugcheck, vms v8.2

 
Galen Tackett
Valued Contributor

SWIS_INCONSTATE bugcheck, vms v8.2

I got my first OpenVMS V8.2 I64 bugcheck today. Unfortunately our govt. customer has allowed our software support to lapse so I can't submit this problem via official means.

First, this is an rx2600 with unpatched V8.2 and TCP/IP V5.5 also unpatched. For discussion's sake let's call it I64VMS (not its real name.)

At a V7.3-2 Alpha system I said:

---
$ ssh i64vms show sys/noproc
[Usual first-time messages about host key not found]
system's password: ...
authentication successful
disconnected; connection lost (connection closed)
---
The disconnect came immediately and i64vms gave an opcom message about host not in proxy cache. (I checked the proxies on i64 and the incoming host and user SYSTEM did have a proxy.)

$ ssh i64vms show sys/noproc

This went as above until right after the "authentication successful" message. Then I saw that the I64 system was bugchecking.

I can't interpret much from the CLUE listing beyond this:

Current process: TCPIP$SSH_BG983
Current Image: DKA0:[SYS0.SYSCOMMON.][SYSEXE]TCPIP$SSH_SSHD2.EXE
Failing PC: FFFFFFFF.800CA460 SWIS$RAISE_IPL_C+00100
Failing PS: 00000000.00000200
Module: SYSTEM_PRIMITIVES_MIN
Offset: 000BA460
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

a SWIS_INCONSTATE crash is an inline bugcheck declared in the new SWIS (Software Interrupt Services) code. On OpenVMS I64, SWIS performs inside the operating system the actions that were previously performed by the Alpha firmware.

If this crash happens in conjunction with TCPIP activities, it may be an OpenVMS I64 or a TCPIP problem.

Could you attach the full CLUE file (or mail it to me) ? SDA> SHOW CALL/SUMM will give you a concise listing of the current call stack.

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Volker,

Unfortunately I can't easily get a CLUE listing to you except perhaps by fax, due to security restrictions.

I will try SHOW CALL/SUM, though I did notice this in the CLUE output:

"Stack Decoding not available on I64."
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

SDA> SHOW CALL/SUMM should work on OpenVMS V8.2 - both Alpha and I64.

There is also a SDA> SHOW SWIS command, which should display the SWIS log (RING_BUFFER) in the crash (SDA> HELP SHOW SWIS). It might require the system parameter SYSTEM_CHECK = 1 to be set.

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Volker,

I am going to set SYSTEM_CHECK = 1 but could you verify that this is the correct setting to enable SWIS?

The V8.2 documentation seems to have overlooked this. SYSTEM_CHECK=1 is documented (see http://h71000.www7.hp.com/doc/82FINAL/6048/6048pro_096.html) as being the same as BUGCHECKFATAL=1, which hardly implies anything like SWIS. And the System Analysis Tools manual's description of SHOW SWIS (see http://h71000.www7.hp.com/doc/82FINAL/6549/6549pro_025.html#command_77) says nothing about SYSTEM_CHECK.
Ian Miller.
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

I thought SWIS logging was enabled by VMSD2:
bit 0 = 1 enables SWIS logging
bit 1 = 1 disables logging of clock interrupts

____________________
Purely Personal Opinion
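
Ian's recollection of the VMSD2 bits can be written out as a small decoder. This is only a sketch in Python of the two bits named above; the constant and function names are made up for illustration, and the bit meanings are taken from the post, not from any VMS documentation.

```python
# Bit meanings as recalled in the post above; all names here are hypothetical.
SWIS_LOG_ENABLE = 1 << 0     # bit 0 = 1 enables SWIS logging
SWIS_LOG_NO_CLOCK = 1 << 1   # bit 1 = 1 disables logging of clock interrupts


def describe_vmsd2(value):
    """Return a dict describing the two VMSD2 bits discussed in this thread."""
    logging_on = bool(value & SWIS_LOG_ENABLE)
    clock_suppressed = bool(value & SWIS_LOG_NO_CLOCK)
    return {
        "swis_logging": logging_on,
        # Clock interrupts can only be logged if logging is on at all.
        "clock_interrupts_logged": logging_on and not clock_suppressed,
    }


if __name__ == "__main__":
    for value in range(4):
        print(value, describe_vmsd2(value))
```

On this reading, VMSD2 = 1 would log everything including clock interrupts, and VMSD2 = 3 would log everything except clock interrupts.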
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Ian,

Your memory chimes with my notes from the Itanium Developers Forum back in September, which I hadn't found at first.

So I've set VMSD2 and will see if I can get a crash again tomorrow.
Ian Miller.
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Check data is being collected with SHOW SWIS.
Setting SYSTEM_CHECK may also help.
____________________
Purely Personal Opinion
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

I've attached the output from SHOW SWIS/RING and SHOW CALL/SUM. I had to retype it all so I've shortened it up. I hope I haven't left out any of the important fields.

Let me know if there's more that you want to see.
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

to me it looks like SYSFTDRIVER wants to raise IPL and may be using an incorrect IPL parameter value.

The crash happens in process context, in kernel mode at IPL 2, so the current process is certainly involved. This also matches your description of what you were doing when the crash occurred.

Whether the error is in TCPIP SSH or in FTDRIVER or due to some memory/pool corruption cannot be diagnosed from the available data.

In the SWIS RING buffer lines, there are also columns called Data1 and Data2. In the IPL-related lines (Ident including the string 'IPL'), what are the values of Data1 and Data2 (these may be the current and new IPL) ?

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Attached is a revised version of the original listing.

Also, for immediate reference here's the whole SWIS info with formatting squished out courtesy of the ITRC:

SDA>SHOW SWIS/RING (output edited for brevity)

Data1 Data2 Ident    Symbolized value 'a'      Symbolized value 'b'&'c'
----- ----- -------- ------------------------- --------------------------
            SWPCXout EXCEPTION+12E000
            SWPCTXin FRED
    3    1F RaisIPL  EXCEPTION+1B5E0
    2     3 RaisIPL  EXCEPTION+1AF60
            ExcpDisp Bugcheck Breakpoint Trap  SWIS$RAISE_IPL_C+100
    2   208 RSetIPL  SYS$FTDRIVER+17920
    2     2 NCSetIPL EXE_STD$ALLOCBUF_C+160
    0     2 RSetIPL  PTD$SERVICES_SHR+12C20
            EntKSrvc TCPIP$SSH_SSHD2+630E3
            SSSwRet  TCPIP$SSH_SSHD2+6BC73
            RetKSrvc %SYSTEM-S-NORMAL
            EntKSRVC SYS$$SETIMR_C+1E0         TCPIP$SSH_SSHD2+6BC73
            SSSwRet  TCPIP$SSH_SSHD2+6BC33
            RetKSrvc %SYSTEM-S-NORMAL
            ASTRET
    8     2 LSetIPL  EXE$ALL_ASTS_DONE_C+9A0
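
For anyone who wants to scan a longer listing, here is a rough Python sketch that picks out the IPL-related entries (Ident containing 'IPL', as Volker suggests) and flags any whose values fall outside the normal OpenVMS IPL range of 0 to 31. It assumes the shortened, hand-retyped layout above (two hex values, then the Ident, then the symbol), not the full SDA output format, and the function name is made up for illustration.

```python
MAX_IPL = 0x1F  # OpenVMS IPLs run from 0 to 31


def suspicious_ipl_entries(ring_text):
    """Return (data1, data2, ident, symbol) for IPL entries with an out-of-range value."""
    hits = []
    for line in ring_text.splitlines():
        fields = line.split()
        # IPL entries in the shortened listing look like: data1 data2 Ident symbol
        if len(fields) < 3 or "IPL" not in fields[2]:
            continue
        try:
            data1, data2 = int(fields[0], 16), int(fields[1], 16)
        except ValueError:
            continue  # first two fields are not hex values; not an IPL entry
        if data1 > MAX_IPL or data2 > MAX_IPL:
            hits.append((data1, data2, fields[2], " ".join(fields[3:])))
    return hits


if __name__ == "__main__":
    sample = """\
3 1F RaisIPL EXCEPTION+1B5E0
ExcpDisp Bugcheck Breakpoint Trap SWIS$RAISE_IPL_C+100
2 208 RSetIPL SYS$FTDRIVER+17920
2 2 NCSetIPL EXE_STD$ALLOCBUF_C+160
8 2 LSetIPL EXE$ALL_ASTS_DONE_C+9A0
"""
    for data1, data2, ident, symbol in suspicious_ipl_entries(sample):
        print(f"{ident}: {data1:X} -> {data2:X} at {symbol}")
```

Run against the listing above, the only entry this flags is the RSetIPL from SYS$FTDRIVER+17920 with the value 208.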
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Did SYS$FTDRIVER try to raise the IPL to 208?? Or are the upper bits of that value used for something else besides the IPL?
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,


Did SYS$FTDRIVER try to raise the IPL to 208?? Or are the upper bits of that value used for something else besides the IPL?


I don't know - don't have access to the I64 sources or any SWIS specs - but it certainly looks VERY SUSPICIOUS and a good reason for a SWIS_INCONSTATE crash...

All the other examples I've seen so far (in the SDA manual), seem to just use the IPL value as parameters to any xxxIPL SWIS calls.

Now the next question is, where did SYS$FTDRIVER pick up that 'bad' IPL value from ?

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Volker,


Now the next question is, where did SYS$FTDRIVER pick up that 'bad' IPL value from ?


I'll play with SDA and see if I can tell anything, but being unfamiliar with I64 internals I may not glean much.
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

just a little program with a kernel-mode routine doing:

MTPR #^x208,#PR$_IPL

will crash with INVEXCEPTN on an E8.2 rx2600, but the SWIS ring buffer will show:

99D58779 7FF43BD0 00000042a FFFFFFFF.800BA2A0b 00 ExcpDisp Illegal instruction trap SWIS$RAISE_IPL_C+00220
99D5827E 00000000 00000208 00000000.000201C0a 00 RSetIPL SYS$K_VERSION_08+001A0

so our assumptions regarding the parameters logged in the Data1 (current IPL) and Data2 (IPL to be set) columns seem to be correct.

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Volker,

So far I can't understand enough of the I64 code to know where the 208 came from. I did see it in one of the registers (R32? R35?) displayed by SHOW CPU but maybe that tells us nothing.

The SHOW CALL/SUM shows that SYS$FTDRIVER has called SYS$PAL_MTPL_IPL_C. How are the arguments passed to this routine in I64-land?

If you can offer any clues on this or anything else I might look for, perhaps I can discover something.

Galen
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

you can try to look at the registers for the call frame to SYS$FTDRIVER

SDA> READ SYSDEF
SDA> SHOW CALL 7FF2E198

then try SDA> FORMAT value for every register that seems to point to a data structure in nonpaged pool (not those that point to system routine names).

Look for 0208 in those data structures.

The SWIS_INCONSTATE bugcheck code did not seem to exist in the E8.2 FT code of routine SWIS$RAISE_IPL (it's causing an INVEXCEPTN on E8.2, if I feed it an IPL of 0x0208).

Volker.
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Volker,

I did as you suggested. Here's the little bit that I learned after examining the output from SHOW CALL 7FF2E198:

It shows R3 pointing to an FTRD structure. I figured out that this is a SYS$FTDRIVER Read Data structure and saw that it was associated with TCPIP$SSH_BG694. I take it that's the process that would receive the AST associated with this FTRD. I didn't see the value 208 anywhere in the FTRD.

R4 points to the PCB of TCPIP$SSH_BG694 where I also don't see a 208.

R5 points to the UCB of FTA14. No 208 seen, but it might be possible to miss it in such a long structure. I'll do a SEARCH on the output and reply again if it turns out I did miss it, otherwise assume I didn't miss it.

R44 points to the same FTRD as R3.

The remaining registers did not contain values that could have pointed to nonpaged pool.

Let me know if you'd like to see the data from any of those structures, or if there's anything else you'd like to look at.

Galen
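
A search for the value in long captured output, such as the UCB listing, could be done along these lines. This is a rough Python sketch, not an SDA feature: it assumes the FORMAT output has been saved as plain text, treats every run of hex digits as a hexadecimal number (so a decimal 208 would also match), and accepts the dotted XXXXXXXX.XXXXXXXX quadword form. The function name is made up for illustration.

```python
import re

# A standalone run of hex digits, optionally in dotted quadword form.
HEX_TOKEN = re.compile(r"\b[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]+)?\b")


def find_hex_value(text, target=0x208):
    """Return (line_number, line) pairs for lines holding a hex token equal to target."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for token in HEX_TOKEN.findall(line):
            # Compare numerically so 208, 0208 and 00000208 all match.
            if int(token.replace(".", ""), 16) == target:
                hits.append((number, line))
                break  # one hit per line is enough
    return hits


if __name__ == "__main__":
    captured = "first field 00000010\nsecond field 00000208\nthird field 00002080\n"
    for number, line in find_hex_value(captured):
        print(number, line)
```

Comparing numerically rather than as text avoids false hits on values such as 2080 that merely contain the digits 208.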
Volker Halle
Honored Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

once I've upgraded our rx2600 to V8.2, I'll try to reproduce this. With the current information, I'm assuming some pool corruption problem causing an invalid IPL to be picked up from a data structure in nonpaged pool.

Volker.
Volker Halle
Honored Contributor
Solution

Re: SWIS_INCONSTATE bugcheck, vms v8.2

Galen,

looks like this type of problem has been solved in VMS82I_SYS-V0100

5.2.12 SWIS_INCONSTATE Bugcheck

5.2.12.1 Problem Description:

A system can crash with a SWIS_INCONSTATE bugcheck. The most common scenario is that the system has the SSH server enabled and the crash occurs when an SSH connection is made. However, the crash might occur at other times as well. In all cases, R32 contains a value that is greater than 32 (or negative), and the crash happens in either SWIS$LOWER_IPL_INT or SWIS$RAISE_IPL_INT.

Volker.
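
To compare a dump against the R32 condition in that problem description, a register value in SDA's XXXXXXXX.XXXXXXXX notation first has to be read as a signed 64-bit number. The Python sketch below does that and then applies the wording of the release note literally ("greater than 32 (or negative)"). The function names are made up for illustration, and it checks only the R32 condition, not which routine the crash occurred in.

```python
def parse_sda_quadword(text):
    """Convert an SDA-style 'XXXXXXXX.XXXXXXXX' value to a signed 64-bit integer."""
    raw = int(text.replace(".", ""), 16)
    if raw >= 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    # Two's complement: values with the top bit set are negative.
    return raw - (1 << 64) if raw & (1 << 63) else raw


def r32_matches_description(r32_text):
    """True if R32 is greater than 32 or negative, per the wording quoted above."""
    value = parse_sda_quadword(r32_text)
    return value > 32 or value < 0


if __name__ == "__main__":
    for text in ("00000000.00000208", "FFFFFFFF.FFFFFFFF", "00000000.00000008"):
        print(text, parse_sda_quadword(text), r32_matches_description(text))
```

The value 208 (hex) seen in this crash is 520 decimal, so it satisfies the R32 condition as quoted.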
Galen Tackett
Valued Contributor

Re: SWIS_INCONSTATE bugcheck, vms v8.2

.