Operating System - OpenVMS

Paul Jerrom
Valued Contributor

OPCOM cannot be stopped - KILL needed?

Howdy all,

IA64 cluster of 2xRX2620s, running VMS V8.3. I haven't found out why yet, but OPCOM is running in a tight CPU loop. I cannot stop it with STOP/ID or STOP/ID/EXIT=, or even kill it using a bit of Macro that does a $FORCEX. There are no reads outstanding and no I/Os being clocked up; the process is not reading its mailbox (so I've had to write a DCL routine to clear it out, otherwise other processes trying to communicate with OPCOM get a mailbox full error).
I HAVE managed to set the priority down to 0!!
Anyone know how I can kill this process? [Short of running OPCCRASH - I have a steel works attached to this cluster so really don't want to shutdown if I can help it, and next scheduled downtime is a week or so away!!]

Thanks,
PJ
Have fun,

Peejay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it can't be done with a VT220, who needs it?
John Gillings
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

PJ,

Can you see what it's doing? Or even what it thinks it's doing? If STOP/ID doesn't help, the process is most likely in an inner mode, or at AST level (which is blocking the $FORCEX AST).

Does SET PROCESS/SUSPEND help? Otherwise, take some CPU samples and examine the instruction streams (though that's not exactly easy on an Integrity). If you're really desperate, you may be able to find something in memory you can change to break out of the CPU loop; otherwise it's reboot time!

On the other hand, if you can SUSPEND the process, or can tolerate it running at priority 0, you may be able to start up another OPCOM process to service the mailbox (that will probably take a manual RUN command to change the process name, and it depends on what, if any, exclusive resources OPCOM is holding).
A crucible of informative mistakes
Paul Jerrom
Valued Contributor

Re: OPCOM cannot be stopped - KILL needed?

Hi John,

No, cannot suspend process, and if I try to create another OPCOM manually it stack dumps with a 'device allocated to another user' error.

Ho hum.
Volker Halle
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

Paul,

Consider escalating this problem to HP. As far as I remember, there may still be a problem causing an OPCOM loop, and OpenVMS engineering was working on it the last time I heard.

You can easily obtain PC samples with the PCS$SDA extension:

$ ANAL/SYS
SDA> PCS ! for help
SDA> PCS LOAD
SDA> PCS START TRACE/PID=
...
SDA> PCS STOP TRACE
SDA> PCS SHOW TRACE
SDA> PCS UNLOAD

If you can't stop OPCOM, the loop must be in the image/process rundown code in the operating system itself and may therefore also affect other processes...

Are you up to the current patch level?

Volker.
Paul Jerrom
Valued Contributor

Re: OPCOM cannot be stopped - KILL needed?

Howdy Volker,

As far as I am aware I am up to date, but will check...

Attached is PCS log, will attempt to log a call tomorrow (it's been too long a day to struggle with logging a support call now...).

Cheers,

PJ
Volker Halle
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

Paul,

It's looping in LIBRTL!

This is the instruction reported most of the time in your PCS trace:

{ .mib
LIBRTL+001C8740:
cmp4.lt p6, p0 = r8, r0
mov r1 = r51
(p6) br.cond.dptk.few 1FFFFE0 ;;
}

So it 'looks' like a branch !!!

SDA> SET PROC OPCOM
SDA> SHOW CALL/SUMM

would report the call stack.

As far as I remember, this matches the symptoms engineering is/was working on...

Volker.
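Picking out "the instruction reported most of the time" from a PC-sample trace is just a frequency count. Below is a minimal Python sketch of that idea. It assumes the sampled PCs have already been extracted from the PCS output into a list of strings; the layout of the actual PCS SHOW TRACE output is not shown in this thread, so no parsing is attempted. Only LIBRTL+001C8740 comes from the post above; the other sample values are made up for illustration.

```python
from collections import Counter


def hottest_pcs(samples, top=3):
    """Return the `top` most frequently sampled PCs as (pc, count) pairs."""
    return Counter(samples).most_common(top)


# Hypothetical sample list: only LIBRTL+001C8740 is taken from the thread,
# the neighbouring addresses and all counts are invented for illustration.
samples = (
    ["LIBRTL+001C8740"] * 5
    + ["LIBRTL+001C8730"] * 2
    + ["LIBRTL+001C8750"]
)

for pc, count in hottest_pcs(samples):
    print(f"{count:4d}  {pc}")
```

A PC that dominates the count like this is a reasonable place to start reading the instruction stream in SDA.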
Volker Halle
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

Paul,

Did you try STOP/ID=.../EXIT=mode?

Start with USER, then SUPER, then EXEC, then KERNEL.

Volker.
Hoff
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

Use SDA, and take a look at the loop.

There are kernel-mode tools around which allow clearing the NODELET flag, after which the process can be nuked.

nb: I'm not where I can check an existing OpenVMS OPCOM process PCB right now, to see if this PCB$V_NODELET flag is set for this process.

If the bit _is_ set, here's an example Really Big Hammer for this task:

http://mvb.saic.com/freeware/vmslt00b/vu/stop-i-mean-it-src.txt

This is kernel-mode code and it writes to the process PCB, with all the risks inherent.

Personally, I'd tend to let this process mimic the null process for a week or two, assuming this is a production server and it can be held together, pending a reboot or input from HP. If you need to use the RBH approach, I'd first test it on an OpenVMS I64 box off to the side.

Stephen Hoffman
HoffmanLabs LLC

Volker Halle
Honored Contributor

Re: OPCOM cannot be stopped - KILL needed?

OPCOM does not have the NODELET bit set.

Process index: 0011 Name: OPCOM Extended PID: 22000411
--------------------------------------------------------------------
Process status: 00140001 RES,PHDRES,LOGIN
status2: 00000111 QUANTUM_RESCHED,TCB

Volker.
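The check above can be reproduced by decoding the status longword by hand. Here is a small Python sketch that does so. The bit positions for RES, PHDRES and LOGIN (0, 18, 20) are chosen because they reproduce the SDA output quoted above for 00140001; the position used for NODELET (bit 23) is an assumption and should be verified against the PCB definitions for your OpenVMS version before relying on it.

```python
# Partial map of process status bits. RES/PHDRES/LOGIN positions are inferred
# from the SDA output in this thread; NODELET = bit 23 is an ASSUMPTION.
PCB_STS_BITS = {0: "RES", 18: "PHDRES", 20: "LOGIN", 23: "NODELET"}


def decode_status(sts):
    """Return the names of the known bits that are set in `sts`."""
    return [name for bit, name in sorted(PCB_STS_BITS.items()) if sts & (1 << bit)]


def has_nodelet(sts):
    """True if the (assumed) NODELET bit is set in `sts`."""
    return "NODELET" in decode_status(sts)


status = 0x00140001  # "Process status" value from the SDA display above
print(decode_status(status))
print(has_nodelet(status))
```

With the value from the SDA display, only RES, PHDRES and LOGIN decode as set, which agrees with the conclusion that NODELET is not what is keeping OPCOM alive.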
Dean McGorrill
Valued Contributor

Re: OPCOM cannot be stopped - KILL needed?

Thanks for the hammer pointer, Hoff.

Someone in here had some code to set quotas. One could kick the quotas down and hope the process goes into RWAST, but if it's really in a tight loop that might not work.

Dean