Operating System - OpenVMS

LOGOUT puts 2 processes in infinite loops

 
Jess Goodman
Esteemed Contributor

LOGOUT puts 2 processes in infinite loops

On Friday I logged off all my Telnet sessions to VMS and turned off my computer. Today I noticed that two of my processes, on two different Alphas (V7.3-2 and V6.2-1H3), are in kernel-mode infinite loops with the DELPEN process bit set.

- CPU constantly accumulating in kernel mode
- No I/Os going on
- Can't suspend them or change base priority
- no images active
- no locks taken out
- no timers
- only channel 0010 shows (system disk, no file)
- CLUE PROCESS/RECALL says last command is LOG
- TNA terminal devices do not exist
- no sockets associated with the processes

Attached is output from PCS trace.

Any possibilities other than rebooting both systems?
I have one, but it's personal.
19 REPLIES
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Sorry, here's the attachment
I have one, but it's personal.
Hoff
Honored Contributor

Re: LOGOUT puts 2 processes in infinite loops

The attachment didn't unpack quite right locally, though I'm not working on an OpenVMS box right now and it may well be a VFC-ism.

Take a look at the process quotas. (You may have to FORMAT the PCB and JIB structures directly, if SDA won't let you at the pieces through its normal commands.) Look specifically at the I/O counts in the PCBs and over in the JIB. See if an I/O is missing -- a lost I/O somewhere, or some lost quota.

If this is a lost quota count, you can choose to bump the quotas to fake the rundown into completion (patching from the console, or via some kernel-mode hackery), or you can use this opportunity to apply the ECOs and then reboot.

Where the I/O went -- if this is a lost I/O -- is a whole 'nother discussion.

As for ECOs to apply: the usual mandatory and current UPDATE kits, the current IP kit, and toss in ECOs for any kernel-mode code you may be using on these two systems. For grins, check for ECOs to the disk drivers and any device firmware.


John Gillings
Honored Contributor

Re: LOGOUT puts 2 processes in infinite loops

Jess,

Another instance of a system that needs a reboot to recover? :-(

You may need to further symbolize the PC samples. In particular the TCPIP$INTERNET_SERVICES calls.

Guessing, the trace looks like you're stuck walking the open channels list, trying to deallocate devices. For some reason one or more devices haven't been removed. Check the sources for IOC$SCAN_IODB and see what structure it scans. Examine it from SDA to try to determine why there's junk left on it.

If (when?) you reboot, make sure you force a crash and submit the dump to HP for analysis.
A crucible of informative mistakes
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Thanks for the responses so far. Here is SDA>SHOW PROCESS from the VMS 7.3-2 (fully patched) system:

Direct I/O count/limit 189/199
Buffered I/O count/limit 209/199
BUFIO byte count/limit 512000/512000
ASTs remaining 199/199
Timer entries remaining 199/199

Process index: 03F0 Name: _TNA213: Extended PID: 41A7BFF0

Process active channels
Channel  CCB       Window    Status  Device accessed
0010     7FF60000  00000000          AX38$DKA0:
Total number of open channels : 1.

So, subtracting count from limit, VMS believes that there are 10 active DIOs and -10 active BIOs. The Availability Manager process quota display is DIO: 10/199, BIO: -10/199.

But there are no active I/O channels. No job quotas are in use other than a small amount of page file.

On the V6.2-1H3 system it is much the same except only 1 DIO in use and only(!) -1 BIO in use.
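The subtraction described above can be written out as a small sketch. It assumes, as the post does, that the first number in each SDA pair is the remaining count and the second is the limit; the function and dictionary names are made up for illustration.

```python
def in_use(remaining: int, limit: int) -> int:
    """Active I/Os implied by a remaining/limit quota pair."""
    return limit - remaining

# (remaining, limit) pairs copied from the V7.3-2 SDA display above.
quotas = {"DIO": (189, 199), "BIO": (209, 199)}

for name, (remaining, limit) in quotas.items():
    active = in_use(remaining, limit)
    # A negative result means the counter was credited more often than debited.
    note = "  <-- negative, counter out of step" if active < 0 else ""
    print(f"{name}: {active}/{limit}{note}")
```

This gives 10 for DIO and -10 for BIO, matching the Availability Manager figures quoted above.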

Is there some kernel mode code hack I can use to zero the active I/O counters?
I have one, but it's personal.
Dean McGorrill
Valued Contributor

Re: LOGOUT puts 2 processes in infinite loops

Hi Jess,
At a guess, it looks like it's trying to tear down the connection. Interestingly, we see a call to LAN$COMPLETE_XMT_CSMACD_C, which might imply it doesn't think the connection is closed. May I ask what you used to connect to the VMS boxes? Dean
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

I use PowerTerm to connect via TELNET from my Windows 2000Pro system.
I have one, but it's personal.
Dean McGorrill
Valued Contributor

Re: LOGOUT puts 2 processes in infinite loops

Ok, well, you issued a logout and didn't just abort PowerTerm. I use PowerTerm daily, no problems. From your trace, I see the process attempting to deallocate a device, acquiring and releasing spinlocks, probably really just the IOLOCK8 spinlock. I'm curious how much SMP time shows up in a MONITOR MODES display. At the very least, gather as much info as you can and, as suggested, get a crash dump of the system for HP to look at. Those PC snapshots should help them see where the code is traversing. Dean
Hoff
Honored Contributor

Re: LOGOUT puts 2 processes in infinite loops

>>>But there are no active I/O channels. No job quotas are in use other than a small amount of page file.<<<

And therein lies the problem.

Assuming you have current TCP/IP Services ECOs (and assuming the IP stack is the HP stack) and you have a support contract, force a crash and send it along to HP.

Something definitely looks to be leaking.

>>>Is there some kernel mode code hack I can use to zero the active I/O counters?<<<

I usually roll a specific case, and a little code behind a $cmkrnl call. There's a variation of what you need to do here posted over in a process deletion thread; I tossed a pointer to some kernel-mode code that clears the NODELET bit in the PCB somewhere here in ITRC within the last couple of days. Try a google search with the site:forums1.itrc.hp.com keyword. Look for that, and NODELET and RBH, or such.

I also prefer to test such code somewhere else-node, before setting it loose on a production server. If you're fast and cluster timers are set tolerant and you're on a "continue-able" system, you can halt and bomb core and continue from the console prompt.

Patching the value doesn't mean this problem goes away; if that I/O is eventually "found" or if extra and otherwise lost I/O is built up somewhere, a Bad Thing could happen to the process(es) involved, or to OpenVMS itself.
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Ok, thanks again for the replies. Since I couldn't find any code to help I wrote my own kernel mode program that clears the DELPEN bit of the PCB$L_STS longword of a given process. I have attached it here in case anyone else might find it useful.

After running this program on my systems I could then use SHOW PROCESS commands for the problem processes and eventually I did a SET PROCESS/SUSPEND=KERNEL, which worked; so the processes are no longer burning CPU cycles.

My program should work on VAX/ALPHA VMS 6.2 and above and I would guess only a minor change is necessary for Integrity.
I have one, but it's personal.
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Sorry, accidentally posted old version of code. Here's the new version, although both work.
I have one, but it's personal.
Jur van der Burg
Respected Contributor

Re: LOGOUT puts 2 processes in infinite loops

Watch out with this code. It references memory at IPL 8 without locking it, and it never unlocks the spinlock in the error path! In other words, it could crash your system.
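The error-path point can be illustrated outside the kernel. What follows is only a user-mode Python analogy, not VMS code: `threading.Lock` stands in for the spinlock, and `sched_lock`, `table`, and `update_entry` are hypothetical names. The `finally` clause releases the lock on the failure return as well as on the success return.

```python
import threading

# Stand-in for the spinlock; a real spinlock is acquired at elevated IPL.
sched_lock = threading.Lock()

def update_entry(table: dict, pid: int, new_value: int) -> bool:
    """Update table[pid] under the lock; return False if pid is unknown."""
    sched_lock.acquire()
    try:
        if pid not in table:
            return False          # error path: 'finally' still releases the lock
        table[pid] = new_value
        return True
    finally:
        sched_lock.release()

table = {0x3F0: 1}
print(update_entry(table, 0x3F0, 0))   # known entry
print(update_entry(table, 0x999, 0))   # unknown entry, takes the error path
print(sched_lock.locked())             # lock state after both calls
```

If the early return skipped the release, the next caller would block forever, which at spinlock level means a hung or crashed system.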

Besides that, you really have to know what you're doing when clearing this bit in the pcb. There might be a good reason that it's disabled.

Jur.
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Ok, thanks for the tip. A new version of my code is attached. I don't normally bother with locking down one page of high-IPL code (I like working without a net. What are the odds?). But since I did release this code, I have been a good boy and followed the rules in this latest version. I also fixed the problem in my error return path.

The bit I'm clearing is the process delete-pending bit. The reason the bit is set is because the process was supposed to be killed via LOGOUT, STOP/ID, etc. But if you're running this program that obviously didn't happen for some reason. Clearing the DELPEN bit lets you get some control of the process back.
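Clearing a single bit in a 32-bit status longword is a mask operation, sketched below. This is an illustration only: the bit number and the sample status value are made up, and the real position of DELPEN should be taken from the system's own PCB definitions. It also does none of the synchronization the kernel-mode program needs.

```python
DELPEN_BIT = 1  # hypothetical bit number, for illustration only

def bit_is_set(longword: int, bit: int) -> bool:
    """True if the given bit is set in the longword."""
    return bool(longword & (1 << bit))

def clear_bit(longword: int, bit: int) -> int:
    """Return the longword with one bit cleared, kept to 32 bits."""
    return longword & ~(1 << bit) & 0xFFFFFFFF

sts = 0x00040003                      # made-up status value
cleared = clear_bit(sts, DELPEN_BIT)
print(f"{sts:08X} -> {cleared:08X}")  # only the chosen bit changes
```

All other status bits are left as they were, which is why the process becomes controllable again without otherwise changing state.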

Of course like any kernel mode hack a crash is possible, but since in most cases when you might want to use this program the alternative is a reboot (or as recommended above, forcing a crash dump) that may not really be a downside.
I have one, but it's personal.
Dean McGorrill
Valued Contributor

Re: LOGOUT puts 2 processes in infinite loops

Jess,
Maybe you need the IOLOCK8 spinlock. Anyway, keep the versions coming; I keep grabbing them, I like tools like that. At least you have the processes quiet. Actually, if you got that process to crash the system, the footprint might be useful for the TCP/IP folks. Dean
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Since you asked...here comes another version.

Minor fix so the locked-down code doesn't reference an unlocked longword of data.
I have one, but it's personal.
Jur van der Burg
Respected Contributor

Re: LOGOUT puts 2 processes in infinite loops

That's better. You need SCHED to synchronize, not IOLOCK8 as mentioned before, so your code is correct.

One small nit: the conditionals use '.IF DF ALPHA', meaning that if you ever want to run this on Itanium, the VAX path will be chosen. It's better to distinguish between 'VAX' and 'others', as in '.IF DF VAX', because Alpha and Itanium are (most of the time) the same.
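The effect of the two tests can be modeled with a small sketch. It is a Python model of the assembler conditionals, not MACRO-32, and it assumes each build defines exactly one architecture symbol; the symbol set used for Itanium (`IA64`) is a placeholder, since the thread does not say what that build would define.

```python
def if_df_alpha(defined: set) -> str:
    """Models '.IF DF ALPHA': anything without ALPHA falls into the VAX code."""
    return "Alpha-style path" if "ALPHA" in defined else "VAX path"

def if_df_vax(defined: set) -> str:
    """Models '.IF DF VAX': anything without VAX gets the Alpha-style code."""
    return "VAX path" if "VAX" in defined else "Alpha-style path"

# Hypothetical symbol sets, one per architecture.
builds = {"VAX": {"VAX"}, "Alpha": {"ALPHA"}, "Itanium": {"IA64"}}

for arch, symbols in builds.items():
    print(f"{arch:8} DF ALPHA -> {if_df_alpha(symbols):16} DF VAX -> {if_df_vax(symbols)}")
```

The two tests agree for VAX and Alpha and differ only for a third architecture, which is the case the nit is about.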

Jur.
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

Yes, I know that that usually works. But since in this case I did not know if any other changes were necessary for Itanium (thought there might be differences in how pages are locked) I decided not to do it so that it didn't look like I was saying that my code was Itanium ready.

Can I just reverse the R31 test to NE to define VAX, use .IF NDF VAX and the code will work on all 3 architectures?
I have one, but it's personal.
Dean McGorrill
Valued Contributor

Re: LOGOUT puts 2 processes in infinite loops

Jess,
Is this reproducible? Have you reconnected and logged out again? I tried reproducing it here, aborting PowerTerm etc., with no problems.
Jess Goodman
Esteemed Contributor

Re: LOGOUT puts 2 processes in infinite loops

I've been using PowerTerm for years and I think this problem only happened once before. What was really weird is that it happened to two of my sessions at once. I probably had 8 going when I logged them all out.
I have one, but it's personal.
Dean McGorrill
Valued Contributor

Re: LOGOUT puts 2 processes in infinite loops

Interesting. Maybe try killing 8 again; I will try it too. Is your system a single CPU? It would be nice to be able to reproduce this. I worked with the guys in TCP/IP, and I know they'd like to see and fix this.