
Re: Troubleshooting DELPEN process

 
Richard Brodie_1
Honored Contributor

Troubleshooting DELPEN process

I've got an unusual DELETE pending process; it's delete pending but stuck in LEF state, with no channels busy (OpenVMS V8.3/I64). Any thoughts on how to troubleshoot it? It could possibly be related to ICC communication, since it uses those calls.

 

 

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

there is PWAIT$SDA, an SDA extension provided by Ian Miller, which helps to diagnose process wait states.

 

http://www.encompasserve.org/~miller/ has the most recent version, but there are also versions on the OpenVMS Freeware CDs.

 

I've also done a DECUS presentation some years ago about Analyzing Process Hangs, which includes a DELPEN example (sorry, it's in German, but you will get the idea ;-)

 

http://www.decus.de/slides/sy2007/19_04/3g03.pdf

 

Volker.

John Gillings
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

   I'm interested in the possible ICC connection. Do you know what's happening with the ICC association? Could it be clogged up with messages? Is this process a reader, writer, or both? If you trace the stack from SDA, are there ICC service frames present?

 

  One "golden rule" in diagnosing DELPEN processes is to NOT attempt to STOP/ID the process again (but usually that advice is only ever given too late!).

A crucible of informative mistakes
Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

The program shouldn't have more than one ICC operation at once. It runs a fairly straightforward asynchronous request/response protocol. I couldn't see any sign of ICC operations on the stack, just stuff like:

EXE$EXIT_INT_C+002B0, NEXTASTMODE+0023C, THREAD_SWITCH_KPB_C+000F0, SYS$SYNCH_INT_C

 

This is what PWAIT makes of it:

 

Thread 0: state LEF AST pending SU active (none) blocked (none)
Process has been waiting for 00:00:02.81
Process thread resource wait is ENABLED
Process is marked for deletion
Process exec mode rundown is active
Analyzing process locks
        Process owns no locks
Event Flag Wait Mask EFWM 0000000D Wait Event Flag Cluster WEFC 4
Local Event flags 32-63 C0000000 31-0 C0000000
waiting for event flag 128 (EFN$C_ENF). EFWM is not relevant
process has 53 channels 0 of which are busy
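
A side note on the "WEFC 4" line: event flags are grouped 32 to a cluster, so flag number 128 (EFN$C_ENF) lands in cluster 4, which is consistent with what PWAIT prints. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration and is not part of PWAIT or SDA.

```python
# Event flags come in clusters of 32; EFN$C_ENF is flag number 128.
EFN_C_ENF = 128  # value reported in the PWAIT output above


def event_flag_cluster(efn: int) -> tuple[int, int]:
    """Return (cluster number, bit position inside that cluster)."""
    return efn // 32, efn % 32


cluster, bit = event_flag_cluster(EFN_C_ENF)
print(f"flag {EFN_C_ENF}: cluster {cluster}, bit {bit}")
```

Since EFN$C_ENF means "no event flag", the wait mask (EFWM) carries no useful information here, as PWAIT already notes.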

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

the key information is: 'Process exec mode rundown is active'

 

Is the process completely hung, or is it still consuming CPU time? Use SET TERM/WID=132, ANALYZE/SYS, set context to that process and report the output of SDA> CLUE CALL

 

Volker.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

SDA> clue call
Not available on I64  -  Use SHOW CALL or SHOW CALL/SUMMARY
SDA> show call
Cannot display call frame (error)
SDA>
SDA> show call/summary
%SDA-W-LOSTPROCESS, cleaning up: SDA's current process no longer exists; context now set to process running SDA

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

does the process still consume CPU?

 

Finding the exec mode rundown handlers for this process probably needs a thorough read through the IDSM (Internals and Data Structures Manual).

 

Consider posting the output of SDA> SHOW STACK and/or SDA> SHOW STACK/EXEC, or attach a .TXT file with that output.

 

Volker.

 

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Volker, it doesn't consume CPU when you're not looking at it. When I'm poking around in SDA it clocks a few centiseconds.

 

Stack details attached.

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

the exec mode rundown handler routine addresses should be in the APLD, pointed to by ctl$gl_usrundwn_exec.

 

What does SDA> exa @ctl$gl_usrundwn_exec;^d42*4 report (in the context of the hung process)? If it reports any S0 space addresses, try to do an SDA> EXA/INS @value on each of them. These are the per-process exec mode rundown handlers.

 

Also, what does SDA> EXA @EXE$GL_USRUNDWN_EXEC;10 report (system wide exec rundown handler vector)?

 

Just trying to determine which exec mode handler might have been called...

 

Good luck,

 

Volker.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Volker, I think this is what you were asking for: it would be an S0 address, if sign-extended.

 

SDA> exa @ctl$gl_usrundwn_exec;^d42*4
00000000 00000000 00000000 0030E060  `à0.............     00000000.7FFB7CB0

Zeros suppressed from 00000000.7FFB7CC0 through 00000000.7FFB7D4F

0030E060 8EFA11D0 00000000 00000000  ........Ð.ú.`à0.     00000000.7FFB7D50
SDA> exa/ins @8EFA11D0
                        { .mii
SYS$IPC_SERVICES+67380:             alloc       r43 = ar.pfs, 10, 01, 00
                                    add         r9 = 200F68, r1
                                    add         r12 = 3FF0, r12

SDA> EXA @EXE$GL_USRUNDWN_EXEC;10

Virtual locations 00000000.00000000 through 00000000.0000000F are not accessible
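
For anyone reading the dump above: SDA prints the four longwords of each line right to left, with the address of the rightmost longword at the end of the line. Here is a minimal Python sketch (my own helper names, nothing that ships with SDA) that pairs each longword with its address and shows the 32-to-64-bit sign extension. The ASCII column in the sample line is replaced by dots.

```python
def longwords_from_dump_line(line: str) -> dict[int, int]:
    """Map address -> longword value for one SDA EXAMINE dump line.

    Assumes the layout seen in this thread: four hex longwords printed
    right to left, an ASCII column, then the address of the rightmost
    longword as the last field.
    """
    fields = line.split()
    values = [int(f, 16) for f in fields[:4]]
    base = int(fields[-1].replace(".", ""), 16)
    # The rightmost longword sits at the lowest address.
    return {base + 4 * i: v for i, v in enumerate(reversed(values))}


def sign_extend_32(value: int) -> int:
    """Sign-extend a 32-bit value to a 64-bit address."""
    if value & 0x8000_0000:
        return value | 0xFFFF_FFFF_0000_0000
    return value


line = "0030E060 8EFA11D0 00000000 00000000  ................     00000000.7FFB7D50"
for addr, value in longwords_from_dump_line(line).items():
    print(f"{addr:08X}: {value:08X} -> {sign_extend_32(value):016X}")
```

Read this way, 8EFA11D0 sits at 7FFB7D58 and 0030E060 at 7FFB7D5C, and only the former sign-extends into system space.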

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

and what is at SDA> EXA/INS 0030E060? Could this be in your image in P0 space?
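
As a quick check of which region an address belongs to, here is a small Python sketch. It assumes the usual OpenVMS Alpha/I64 layout (P0 below 40000000, P1 from 40000000 to 7FFFFFFF, S0/S1 in the top 2 GB of the 64-bit space) and expects S0-style values to be sign-extended already; the function name is purely illustrative.

```python
def address_region(addr64: int) -> str:
    """Classify a 64-bit virtual address by region.

    Boundaries are the conventional ones for OpenVMS Alpha/I64; the
    P2/S2 split is system dependent, so it is reported as one bucket.
    """
    if addr64 < 0x4000_0000:
        return "P0"
    if addr64 < 0x8000_0000:
        return "P1"
    if addr64 >= 0xFFFF_FFFF_8000_0000:
        return "S0/S1"
    return "P2/S2"


for addr in (0x0030_E060, 0x7FFB_7CB0, 0xFFFF_FFFF_8EFA_11D0):
    print(f"{addr:016X}: {address_region(addr)}")
```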

 

Volker.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

It looks like it's in a protected shareable image, from a third party supplier.

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

now you know whom you could try to ask next...

 

Let's make this troubleshooting example a bit more detailed. PWAIT$SDA could be amended to provide the addresses and symbolization of the exec (and kernel) mode rundown handlers for the given process, if ERDACT is set (as indicated by the PWAIT$SDA message: Process exec mode rundown is active).

 

During exec mode rundown, the SYS$ERNDWN system service (in module [SYS]SYSRUNDWN) invokes the exec mode rundown services in exec mode after setting PCB$V_ERDACT. Note that ERDACT will only be CLEARED at the end of kernel mode rundown, so the problem here may lie in either exec mode rundown or kernel mode rundown!

 

The APLD (Activated Privileged Library Dispatch vector) can be formatted in SDA like this:

 

$ CREATE APLDDEF.MAR

; assemble with $ mac/migr aplddef+sys$library:lib/libr
; link with     $ link/sym aplddef/noexe
        $aplddef GLOBAL
        .end

<CTRL-Z>
$ mac/migr aplddef+sys$library:lib/libr
$ link/noexe/sym aplddef
%ILINK-W-USRTFR, image NL:[].EXE; has no user transfer address

$ ANALYZE/SYS

SDA> SET PROC/ind=<pid-of-hung-process>

SDA> READ APLDDEF

SDA> FORMAT @ctl$a_dispvec/type=apld           ! format the APLD

...

00000000.7FFB7CB0   APLD$PS_EXEC_RUNDOWN_VECTOR   0030E060                                       <<< from your case...
00000000.7FFB7CB4                                 00000000
00000000.7FFB7CB8                                 00000000.00000000
       ...                                               ...
00000000.7FFB7D58   APLD$PS_KERN_RUNDOWN_VECTOR   8EFA11D0            SYS$IPC_SERVICES+67380
00000000.7FFB7D5C                                 0030E060                                       <<< from your case...
00000000.7FFB7D60                                 00000000.00000000
       ...                                               ...

 

This indicates that there is an exec AND kernel mode rundown handler within the P0 address space of your image. Use SDA> SHOW PROC/IMAGE to find out in which image or library that address resides.

 

If you can find out in which mode the process is hanging, this could provide further evidence as to which rundown handler may be executing. Try

 

SDA> SHOW EXCEPTION

 

Volker.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Morning, Volker, and thanks for your help so far,

 

So, it has the same routine as both an exec and a kernel mode rundown handler, plus SYS$IPC_SERVICES+67380 (which I presume is ICC related). 

 

SDA> show exc

Exception Frame Summary
-----------------------

 Exception Frame    Type              Stack          IIP / Ret_Addr     Trap_Type / Service_Number
-----------------   ----              -----         -----------------   --------------------------
00000000.7FF43B30   SSENTRY           Kernel        FFFFFFFF.806016B0   0100015D  SYS$SYNCH_INT
00000000.7FF43D50   SSENTRY           Kernel        FFFFFFFF.80B6A370   01000028  SYS$DELPRC
00000000.7FF43F40   SSENTRY           Kernel        FFFFFFFF.80B68BE0   0100018E  SYS$EXIT_INT
%SDA-W-NOREAD, unable to access location 00000000.7FF43FE0
00000000.7FF67F40   SSENTRY           Executive     FFFFFFFF.805D4110   01000197  SYS$HIBER_INT

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

try SDA> CLUE REGISTER  ! trying to determine the current execution context...

 

Try SDA> SHOW EXCEPTION 7FF43B30 ! to display the register values

 

The exception frames can be located o.k. on the current kernel stack.

 

If you somehow have to reboot the system, please force a crash to document this process state for later analysis.

 

Volker.

Richard J Maher
Trusted Contributor

Re: Troubleshooting DELPEN process

Is it not time for a support call to your third-party UWSS vendor? Or look at the source code if you have it?

 

Maybe their rundown handler is doing a send/receive/transceive that could somehow be completed, avoiding a reboot.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

I could try that: they do provide good support but they aren't really kernel gurus.

 

Also, the ICC calls are from my code, so if the problem is in there, it's not their responsibility anyway. I presume the system hooks its own rundown handler for that: it's not my doing.

 

 

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Unfortunately, it doesn't seem to want to give up registers on the live system:

 

SDA> set proc/in=177
SDA> clue register
SDA> clue register

 

System Service Entry Frame at 00000000.7FF43B30
-----------------------------------------------

    IPL              =                 00
    SERVICE_NUMBER   =           0100015D      SYS$SYNCH_INT
    RET_ADDR         =  FFFFFFFF.806016B0      EXE_STD$SYNCH_LOOP_C+001A0

    PREVSTACK        =                 00
    BSP              =  00000000.7FF2E7C8
    BSPSTORE         =  00000000.7FF2E6E8
    BSPBASE          =  00000000.00000000
    RNAT             =  00000000.00000000

    RSC              =  00000000.00000000      LOADRS   BE   PL   MODE
                                               0000     0    0    Enforced lazy

    PFS              =  00000000.00000E24      PPL    PEC    RRB.PR   RRB.FR   RRB.GR     SOR       SOL           SOF
                                               0       0.       0.       0.       0.       0.    28. (32-59)   36. (32-67)

    FLAGS            =                 00
    STKALIGN         =           00000160

    PPREVMODE        =                 00
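
The PFS line in that frame can be cross-checked by hand. Below is a minimal Python sketch that decodes the value using the Itanium AR.PFS field layout as I understand it (sof in bits 0-6, sol in bits 7-13, sor in bits 14-17, the three rrb fields above that, pec in bits 52-57, ppl in bits 62-63). The function and table names are my own.

```python
# (shift, width) of each AR.PFS field; layout per the Itanium architecture.
PFS_FIELDS = {
    "sof": (0, 7),      # size of frame
    "sol": (7, 7),      # size of locals
    "sor": (14, 4),     # size of rotating portion (raw field value)
    "rrb_gr": (18, 7),
    "rrb_fr": (25, 7),
    "rrb_pr": (32, 6),
    "pec": (52, 6),     # previous epilog count
    "ppl": (62, 2),     # previous privilege level
}


def decode_pfs(pfs: int) -> dict[str, int]:
    """Split a 64-bit PFS value into its named bit fields."""
    return {
        name: (pfs >> shift) & ((1 << width) - 1)
        for name, (shift, width) in PFS_FIELDS.items()
    }


fields = decode_pfs(0x0000_0000_0000_0E24)
# Stacked registers start at r32.
print(f"SOL {fields['sol']} (r32-r{31 + fields['sol']}), "
      f"SOF {fields['sof']} (r32-r{31 + fields['sof']})")
```

For 00000E24 this gives SOL 28 and SOF 36, matching the "(32-59)" and "(32-67)" ranges SDA shows.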

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

the $ICC Kernel-Mode Rundown Routine (ICC_KMODE_RUNDWN in module [IPC]ICC_SUBS) will try to disconnect all connections and close all associations and clean up the ICC data structures for the current process/image. If successful, it will clear the process entry in the ICCPDB_VECTOR.

 

SDA> EXAM @ICC$GL_ICC_PDB_VECTOR;4*(@SGN$GW_MAXPRCCT&ffff)

 

There should be one entry (longword) for each process on the system, indexed by PID. If you find a non-zero entry for the current process, it will indicate that ICC has not (yet) been completely run down for this process.
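
To pick out the one longword of interest from that dump, the offset arithmetic can be sketched in Python as below. This assumes the index is the process index SDA accepts in SET PROC/INDEX (177 hex for the hung process in this thread) and that entries are 4 bytes, as stated above. The base address in the example is purely hypothetical; substitute the contents of ICC$GL_ICC_PDB_VECTOR on your system.

```python
LONGWORD_BYTES = 4


def icc_pdb_entry_address(vector_base: int, process_index: int) -> int:
    """Address of the vector entry for a given process index."""
    return vector_base + LONGWORD_BYTES * process_index


# Hypothetical base address, for illustration only.
HYPOTHETICAL_BASE = 0xFFFF_FFFF_8100_0000
PROCESS_INDEX = 0x177  # from "set proc/in=177"

offset = LONGWORD_BYTES * PROCESS_INDEX
entry = icc_pdb_entry_address(HYPOTHETICAL_BASE, PROCESS_INDEX)
print(f"offset {offset:X}, entry at {entry:016X}")
```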

 

Volker.

Richard Brodie_1
Honored Contributor

Re: Troubleshooting DELPEN process

Volker,

 

Yes, there is still an entry in the ICCPDB_VECTOR.

Volker Halle
Honored Contributor

Re: Troubleshooting DELPEN process

Richard,

 

this indicates that either the 3rd party rundown routine is blocking the process or the ICC rundown routine is the culprit. If you can find out from your 3rd party supplier what they do in their rundown routine, and check in the running system whether those steps have been performed, you can pretty much pinpoint this problem. Only HP should be able to help you with ICC/IPC.

 

Does the IPC$SDA (SDA> IPC) extension help to look at some of the data structures? Probably not, but an ICC$SDA extension is supposed to be available in V8.4!

 

Volker. 

Richard J Maher
Trusted Contributor

Re: Troubleshooting DELPEN process

Hi Richard,

 

But they do know what their EXEC rundown handler does, right? (Trying to output with RMS is always a good one :-)

 

I don't imagine their rundown handler to be thousands of MACRO-32 lines long? (Or perhaps it's not linked /PROTECT and they're calling out?)

 

Maybe it is nothing to do with their UWSS and something peculiar to $ICC that is only seen in your configuration, but it would be unusual.

 

Cheers Richard