Madgoat Watcher

 
SOLVED
James T Horn
Frequent Advisor

Has anyone seen any strange behavior when running Watcher V3.2-1 on the I64 platform?
We have run Watcher for years with no issues, until recently. When we moved from AXP to I64, Watcher on our development system acted as if it ignored the configuration file we had set up, while our production system worked fine. Then recently the production system also started ignoring the configuration file.
17 REPLIES
Craig A Berry
Honored Contributor

Re: Madgoat Watcher

Have you enabled tracing to see if it gives you any clues about what's going on? Does the control program show you the rules that you think you've entered in the configuration file, e.g.,

$ mcr watcher_dir:wcp show watch

?

Note that WATCHER_STARTUP.COM does a RUN/DETACHED with a number of quotas explicitly set. The values that were appropriate on Alpha might well be inappropriate on Itanium.

I've only dabbled with Watcher but haven't rolled it out because I couldn't get the following very simple rule working. Basically this should allow everyone to stay logged in during the day but kick them out at night:

WATCH *
EXCLUDE */DURING=(PRIMARY:6-20,SECONDARY:6-20)

After enabling tracing and reading the code I was pretty sure that UIC wildcarding doesn't work for the EXCLUDE directive (and probably never has). But I don't know BLISS so I'm not at all sure I've understood the code correctly.

But in answer to your question, no, I have not seen a general inability to read the configuration file.
James T Horn
Frequent Advisor

Re: Madgoat Watcher

I've used "SET DEBUG=31", and according to the log it does set up my exclusions:

Example:
EXCLUDE HORN/DURING=(PRIMARY:0-23,SECONDARY=8-17)

Here I believe it is checking to see if HORN is excluded and finds the exclude record:
( 8-NOV-2010 13:58:04.15) Process: PID=000886BF, user=HORN, term=ERP01$FTA397:, accpor=
( 8-NOV-2010 13:58:04.15) -- Searching exclude list...
( 8-NOV-2010 13:58:04.15) -- Username: exclude=SYSTEM, process=HORN
( 8-NOV-2010 13:58:04.15) -- Username: exclude=FIELD, process=HORN
( 8-NOV-2010 13:58:04.18) -- Username: exclude=HORN, process=HORN
( 8-NOV-2010 13:58:04.18) -- UIC: exclude=[0,0], process=[2,5]
( 8-NOV-2010 13:58:04.24) -- Exhausted list: no match.
( 8-NOV-2010 13:58:04.24) -- Process found on count list
( 8-NOV-2010 13:58:04.24) -- Searching override list...
( 8-NOV-2010 13:58:04.25) -- Username: exclude=SAMINFO, process=HORN
( 8-NOV-2010 13:58:04.25) -- Username: exclude=NGL_SAMINFO, process=HORN
( 8-NOV-2010 13:58:04.25) -- Exhausted list: no match.
( 8-NOV-2010 13:58:04.25) -- Queueing count record for checking



From this, I believe HORN has been idle for 54 minutes and will be terminated.

( 8-NOV-2010 13:58:08.71) Check for warn/force: PID=000886BF, user=HORN
( 8-NOV-2010 13:58:08.71) -- Force check: Is 00:54:55.68 GTR 01:10:00.00?
( 8-NOV-2010 13:58:08.71) -- Warn check: Is 00:54:55.68 GTR 01:00:00.00?
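
The two comparisons in that trace can be replayed by hand. What follows is only a minimal Python sketch, not Watcher's own code: the helper name `parse_delta` is made up for illustration, and the thresholds are simply the values printed in the log above.

```python
from datetime import timedelta

def parse_delta(text: str) -> timedelta:
    """Parse an 'HH:MM:SS.cc' value as printed in the Watcher trace."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

# Values copied from the trace lines for PID 000886BF.
idle = parse_delta("00:54:55.68")
force_after = parse_delta("01:10:00.00")
warn_after = parse_delta("01:00:00.00")

# GTR is the BLISS "greater than" operator, so each check is a plain comparison.
force_now = idle > force_after
warn_now = idle > warn_after

# Time remaining before the warn threshold would be crossed.
until_warn = warn_after - idle
```

With the values from the trace, both `force_now` and `warn_now` come out false and `until_warn` is a little over five minutes, so on this pass the process is still short of both limits.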
Jan van den Ende
Honored Contributor

Re: Madgoat Watcher

James,

from your Forum Profile:


I have assigned points to 53 of 112 responses to my questions.


Maybe you can find some time to do some assigning?

http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Mind, I do NOT say you necessarily need to give lots of points. It is fully up to _YOU_ to decide how many. If you consider an answer is not deserving any points, you can also assign 0 ( = zero ) points, and then that answer will no longer be counted as unassigned.
Consider, that every poster took at least the trouble of posting for you!

To easily find your streams with unassigned points, click your own name somewhere.
This will bring up your profile.
Near the bottom of that page, under the caption "My Question(s)", you will find "questions or topics with unassigned points". Clicking that will list all, and only, your questions that still have unassigned postings.
If you have closed some of those streams, you must "Reopen" them to "Submit points". (After which you can "Close" again)

Do not forget to explicitly activate "Submit points", or your effort gets lost again!!

Thanks on behalf of your Forum colleagues.

PS. - nothing personal in this. I try to post it to everyone with this kind of assignment ratio in this forum. If you have received a posting like this before - please do not take offence - none is intended!

PPS. - Zero points for this.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Craig A Berry
Honored Contributor

Re: Madgoat Watcher

James, I think your reading of the trace makes sense. Which means, I *think*, that it's not ignoring the config file. You're not using wildcard UICs in the exclusion list, so there's no reason to believe your problem is related to mine.

Does anything appear in the log file? Does it say that it logged out processes that are in fact still there?
James T Horn
Frequent Advisor

Re: Madgoat Watcher

I changed the config to use your example (exclude * /during=(primary:6-20,secondary:8-18)) and it is still "ignoring" the exclusions:


( 9-NOV-2010 10:28:02.25) Process: PID=0005E987, user=HORN, term=ERP01$FTA403:, accpor=
( 9-NOV-2010 10:28:02.25) -- Searching exclude list...
( 9-NOV-2010 10:28:02.25) -- Username: exclude=SYSTEM, process=HORN
( 9-NOV-2010 10:28:02.25) -- Username: exclude=*, process=HORN
( 9-NOV-2010 10:28:02.25) -- UIC: exclude=[0,0], process=[2,5]
( 9-NOV-2010 10:28:02.25) -- Exhausted list: no match.
( 9-NOV-2010 10:28:02.25) -- Process found on count list
( 9-NOV-2010 10:28:02.25) -- Searching override list...
( 9-NOV-2010 10:28:02.25) -- Username: exclude=SAMINFO, process=HORN
( 9-NOV-2010 10:28:02.25) -- Username: exclude=NGL_SAMINFO, process=HORN
( 9-NOV-2010 10:28:02.25) -- Exhausted list: no match.
( 9-NOV-2010 10:28:02.25) -- Queueing count record for checking
( 9-NOV-2010 10:28:02.51) Check for warn/force: PID=0005E987, user=HORN
( 9-NOV-2010 10:28:02.51) -- Force check: Is 00:07:13.17 GTR 01:10:00.00?
( 9-NOV-2010 10:28:02.51) -- Warn check: Is 00:07:13.17 GTR 01:00:00.00?

I'm currently using "SET NOACTION" so it does not actually stop processes, but the log seems to say it would stop all of them.


Craig A Berry
Honored Contributor
Solution

Re: Madgoat Watcher

I can now reproduce your problem with ignored EXCLUDE directives using both wildcarded and explicitly specified usernames. And I think I'm starting to understand what the trace means. It matches on username, but fails to match on UIC. Since we're not specifying a UIC, it should default to a wildcarded UIC which matches everything. However, if you look at what it thinks it has like so:

$ mcr watcher_dir:wcp show exclude

You'll see records like

Username: BACKUP, UIC: [0,200]
Device: *, Port name: *
Running image: *
Times: MONDAY:(0-23),TUESDAY:(0-23),WEDNESDAY:(0-23),THURSDAY:(0-23),FRIDAY:(0-23),SATURDAY:(0-23),SUNDAY:(0-23)


This derives from the record that looks like:

EXCLUDE BACKUP

in the sample configuration file. The design appears to be that all the fields not specified explicitly default to wildcard values. However, a UIC of [0,200] is definitely not a wildcard UIC. Not sure if it's supposed to display as [*,*] or [0,0], but [0,200] is definitely not right. Any attempt to specify a UIC explicitly looks like this:

WCP> exclude backup/uic=[1,1]
%WCP-W-UICERR, error translating UIC "[1,1]"
-LIB-F-SYNTAXERR, string syntax error detected by LIB$TPARSE
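
To make the suspected failure concrete, here is a small Python sketch of the matching behaviour the traces earlier in the thread suggest: a record applies only when both the username and the UIC match. It is not a translation of the BLISS source. The record layout, the function name `record_matches`, and the use of `None` as the "wildcard UIC" marker are all assumptions for illustration, since it is not clear how Watcher really represents a wildcard UIC.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional, Tuple

Uic = Tuple[int, int]  # (group, member)

class ExcludeRecord(NamedTuple):
    username: str        # may contain wildcards, e.g. "*"
    uic: Optional[Uic]   # None stands for "match any UIC" (assumed representation)

def record_matches(rec: ExcludeRecord, username: str, uic: Uic) -> bool:
    """A record applies only if both the username and the UIC match."""
    if not fnmatchcase(username, rec.username):
        return False
    return rec.uic is None or rec.uic == uic

# Process from the trace: user HORN, UIC [2,5].
process = ("HORN", (2, 5))

# What the design seems to intend: an unspecified UIC acts as a wildcard.
intended = ExcludeRecord("*", None)
# What the trace shows being compared: a literal [0,0] against [2,5].
observed = ExcludeRecord("*", (0, 0))

intended_hit = record_matches(intended, *process)
observed_hit = record_matches(observed, *process)
```

Under these assumptions `intended_hit` is true and `observed_hit` is false, which mirrors the "Exhausted list: no match" line that follows the UIC comparison in the trace.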

The use of LIB$TPARSE might be a clue here, as I believe it's translated VAX code and probably ought to be replaced with LIB$TABLE_PARSE. I thought I'd have a go at doing that, so I installed this:

$ prod show hist *bliss*
------------------------------------ ----------- ----------- --- -----------
PRODUCT                              KIT TYPE    OPERATION   VAL DATE
------------------------------------ ----------- ----------- --- -----------
HP I64VMS BLISSI64 V1.12-72          Full LP     Install     (U) 11-NOV-2010
------------------------------------ ----------- ----------- --- -----------

1 item found

Is that really the latest? It's dated 2006 and it's hard to believe there haven't been any tweaks to the Itanium BLISS compiler since then, but this was the latest I could find; I got it from

ftp://ftp.hp.com/pub/openvms/freeware/bliss

Then I grabbed the sources for Watcher v4.0 (which appears identical to v3.2-1 except for the license text). I got it from:

http://vms.process.com/scripts/fileserv/fileserv.com?WATCHER

Before making any code changes, I tried compiling like so:

$ @[.source]compile
$ BLISS/LIBR=FIELDS.L32I FIELDS.R32
$ BLISS/LIBR=WATCHER.L32I WATCHER.R32
; %MESSAGE: Structure GBLDEF size: 296 bytes
; %MESSAGE: Structure TRMDEF size: 206 bytes
; %MESSAGE: Structure EXCDEF size: 507 bytes
; %MESSAGE: Structure IDDEF size: 72 bytes
$ BLISS/LIBR=WATCHER_PRIVATE.L32I WATCHER_PRIVATE.R32
; %MESSAGE: Structure CTRDEF size: 535 bytes
; %MESSAGE: Structure PRCDEF size: 469 bytes
; %MESSAGE: Structure PGBLDEF size: 10 bytes
; %MESSAGE: Structure CHKDEF size: 12 bytes
; %MESSAGE: Structure MIOSBDEF size: 8 bytes
$ BLISS/OBJECT=[-.BIN-IA64]WATCHER.OBJ WATCHER.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]WATCHER.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]COLLECT.OBJ COLLECT.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]COLLECT.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]LOG.OBJ LOG.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]LOG.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]FORCE.OBJ FORCE.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]FORCE.OBJ
$ BLISS/LIBR=WCP.L32I WCP.R32
; %MESSAGE: Structure DFLTDEF size: 41 bytes
$ BLISS/OBJECT=[-.BIN-IA64]CONFIG.OBJ CONFIG.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]CONFIG.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]DECW_DISPLAY.OBJ DECW_DISPLAY.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]DECW_DISPLAY.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]MEM.OBJ MEM.B32
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]MEM.OBJ
$ MESSAGE /OBJECT=[-.BIN-IA64]WATCHER_MSG.OBJ WATCHER_MSG.MSG
$ LIBRARY/REPLACE [-.BIN-IA64]WATCHER.OLB [-.BIN-IA64]WATCHER_MSG.OBJ
$ BLISS/OBJECT=[-.BIN-IA64]PERFORM_DISCONNECT.OBJ PERFORM_DISCONNECT.B32

PRESERVE=NO);
.............................^
%BLS32-W-TEXT, Illegal register number 0 in LINKAGE declaration
at line number 183 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1

PRESERVE=NO);
.............................^
%BLS32-W-TEXT, Illegal register number 1 in LINKAGE declaration
at line number 183 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1

PRESERVE=NO);
.............................^
%BLS32-W-TEXT, Illegal register number 2 in LINKAGE declaration
at line number 183 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1

NEWIPL=.TMP, PRESERVE=NO);
..........................................^
%BLS32-W-TEXT, Illegal register number 0 in LINKAGE declaration
at line number 190 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1

NEWIPL=.TMP, PRESERVE=NO);
..........................................^
%BLS32-W-TEXT, Illegal register number 1 in LINKAGE declaration
at line number 190 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1

NEWIPL=.TMP, PRESERVE=NO);
..........................................^
%BLS32-W-TEXT, Illegal register number 2 in LINKAGE declaration
at line number 190 in file D0:[craig.WATCHER.source]perform_disconnect.b32;1
$

All I can find on this is that you can't use registers 0, 1, and 2 in BLISS on IA64, but I have no idea why it thinks this code is doing that.

Clearly someone has built Watcher on Itanium, but it seems unlikely they did so using the released sources with the released BLISS compiler. I'm a bit stuck here unless some BLISS gurus can spot something I'm doing wrong.
James T Horn
Frequent Advisor

Re: Madgoat Watcher

I have it down to two rules:

wcp show exclu
%WCP-I-READCFG, read configuration from file SYS2:[WATCHER]WATCHER_CONFIG.WCFG;146

Exclude records:

Username: SYSTEM, UIC: [0,0]
Device: *, Port name: *
Running image: *
Times: MONDAY:(8-23),TUESDAY:(8-23),WEDNESDAY:(8-23),THURSDAY:(8-23),FRIDAY:(8-23),SATURDAY:(8-17),SUNDAY:(8-17)

Username: *, UIC: [0,0]
Device: *, Port name: *
Running image: *
Times: MONDAY:(6-20),TUESDAY:(6-20),WEDNESDAY:(6-20),THURSDAY:(6-20),FRIDAY:(6-20),SATURDAY:(8-18),SUNDAY:(8-18)
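
The listing above shows how PRIMARY and SECONDARY expand: PRIMARY becomes Monday through Friday and SECONDARY becomes Saturday and Sunday. As a rough Python sketch of what the second record is meant to cover (not Watcher's code; the function name `excluded_at` is hypothetical, and treating the end hour as inclusive is an assumption):

```python
from datetime import datetime

# Monday..Friday are primary days, per the SHOW EXCLUDE expansion above.
PRIMARY_DAYS = range(0, 5)

def excluded_at(when: datetime, primary=(6, 20), secondary=(8, 18)) -> bool:
    """Return True if 'when' falls inside the record's hour range for that day."""
    start, end = primary if when.weekday() in PRIMARY_DAYS else secondary
    # Assumption: both the start and end hours are inside the window.
    return start <= when.hour <= end

# Timestamp of the earlier trace for user HORN (a Tuesday morning).
trace_time = datetime(2010, 11, 9, 10, 28)
in_window = excluded_at(trace_time)
```

Here `in_window` is true, so if the record had matched on username and UIC, the process in that trace should have been excluded at the time it was checked.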

I figured it had something to do with matching USERNAME and UIC, and I was willing to code the exclusions with /UIC=... but you can't specify the UIC without getting that error.

Given our environment, I am basically running Watcher only between 7pm and 5am so idle processes get cleaned up, and leaving processes alone during the day, with no exclusions configured.

I'm hoping someone with Bliss knowledge can assist.
Hoff
Honored Contributor

Re: Madgoat Watcher

I'm not inclined to power up and boot an Itanium box to go poke at this, but (from a cursory look) it appears that either the syntax of the $devicelock and $deviceunlock macros has changed, or possibly that the macros aren't being found during the compilation.

If you've not rebuilt the Bliss system require libraries after the most recent OpenVMS upgrade or since installing Bliss, here are the 32-bit library builds...

$ BLISS /LIBRARY=SYS$COMMON:[SYSLIB]LIB.L32 -
SYS$LIBRARY:LIB.REQ+SYS$LIBRARY:STARLET.REQ
$ BLISS /LIBRARY=SYS$COMMON:[SYSLIB]STARLET.L32 -
SYS$LIBRARY:STARLET.REQ

Have a look at the macro declarations within the LIB.REQ file (IIRC, though if it's not there go look in STARLET.REQ), as the REQ files are directly readable; they're source code.
James T Horn
Frequent Advisor

Re: Madgoat Watcher

OK, I am a little excited about the help on this issue, so I might have been a little too generous with points.