Operating System - OpenVMS

Robert Manning_2
Valued Contributor

Anyone remember DECInspect?

Hi,

I suddenly have a problem with DECInspect 3.2 and I was wondering whether anyone out there may have any ideas?

I've got a 2-member AlphaCluster (AS4100s with OVMS 6.2-1H3), both members running DECInspect although using a common resource disk and license database.

Each node has its own INSPECT$PORTAL and INSPECT$EXEC processes, however the EXEC process on one node has, for some reason, quit.

If I stop and restart Inspect, the processes start up, there appears to be some I/O on the EXEC process, then a process called INSPECT$Esnd_3 runs for a couple of seconds and quits, taking INSPECT$EXEC with it.

We can find no reason for the problem - it didn't happen immediately following a reboot or any infrastructure changes - my licenses are current, and the security team who use the software would have no reason to make any changes to it.

The error log, curiously, makes reference to a device and files that do not exist on our servers or indeed, anywhere in our enterprise, and so I'm left wondering whether we've somehow reverted to default or example settings of some kind.

I'm sorry if this seems a little vague - to be honest, DECInspect is not something I've used often, and it's been years since I needed to have anything to do with it. I've been through the manual, but the Troubleshooting section didn't appear to offer anything I could use.

I've attached a file showing extracts from logs - if anyone has any advice to offer, I'd appreciate it.

Thanks,

Bob
Volker Halle
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

there is no attachment - please try again.

Volker.
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Sorry, Volker - I'll try again...
Karl Rohwedder
Honored Contributor

Re: Anyone remember DECInspect?

Robert,
I have no knowledge of DECinspect, but could it be that after a reboot some logical names concerning configuration data were missing or wrong, and so the unknown devices were introduced?

regards Kalle
Chinraj Rajasekaran
Frequent Advisor

Re: Anyone remember DECInspect?

Hi,

Even if the disk doesn't exist, your system may still be trying to reach a disk through the logical name mentioned in the error log file!

You should check whether a logical name for the device mentioned in the error log is missing.

>>>If I stop and restart Inspect, the processes start up...

Maybe you should look for an initialization file that defines all the symbols and logicals before you run the Inspect startup procedure?


You can check the files in SYS$STARTUP, or look for entries in the SYSMAN startup database.
-----------------------------------------



regards
Raj
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Hi, and thanks for the replies...

Kalle: I thought that, but from what I can tell, the software was working fine up to 60 days after the most recent reboot.

Raj: The logicals are set from the SYS$STARTUP:INSPECT$STARTUP.COM routine. I've compared it with the one that runs on the other cluster member and everything is as it should be. I believe the disks and images my system is looking at are part of some example routine - the disk labels are named after fictional detectives (Columbo, Bergerac) - I just don't know how or why.

I'll keep going through available command scripts and see if anything looks strange or unusual...

Thanks,

Bob
Volker Halle
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

no knowledge of DECinspect either, but...

The errors in Inspect$Exec_Error.log appear to show the source code module and line number in the DECinspect code from which the errors are being signaled. If so, the errors on your system would be:

%RMS-E-FND, ACP file or directory lookup failed
%SYSTEM-F-VOLINV, volume is not software enabled

being reported when executing the test called: 'Checking Candidate Test - Public System Dirs(2) 1 [OpenVMSFileProtection]'
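Volker's reading relies on the standard OpenVMS message format, %FACILITY-L-IDENT, text, where L is a one-letter severity and a leading "-" instead of "%" marks a follow-on message. As a minimal sketch (parse_message and VmsMessage are hypothetical names, purely for illustration), this is one way to split such lines when scanning a long error log for the first E- or F-level message:

```python
import re
from typing import NamedTuple, Optional

# Severity letters used in OpenVMS message text: %FACILITY-L-IDENT, text
SEVERITY = {
    "S": "success",
    "I": "informational",
    "W": "warning",
    "E": "error",
    "F": "fatal (severe)",
}

# "%" starts a primary message; "-" starts a secondary (follow-on) message.
MSG_RE = re.compile(r"^([%-])([A-Z0-9$_]+)-([SIWEF])-([A-Z0-9$_]+),\s*(.*)$")


class VmsMessage(NamedTuple):
    primary: bool   # True for "%...", False for "-..."
    facility: str   # e.g. RMS, SYSTEM
    severity: str   # long form of the severity letter
    ident: str      # e.g. FND, VOLINV
    text: str       # message text after the comma


def parse_message(line: str) -> Optional[VmsMessage]:
    """Split one OpenVMS message line into its parts; None if it doesn't match."""
    m = MSG_RE.match(line.strip())
    if m is None:
        return None
    prefix, facility, sev, ident, text = m.groups()
    return VmsMessage(prefix == "%", facility, SEVERITY[sev], ident, text)


if __name__ == "__main__":
    for line in (
        "%RMS-E-FND, ACP file or directory lookup failed",
        "%SYSTEM-F-VOLINV, volume is not software enabled",
    ):
        print(parse_message(line))
```

Filtering the parsed results on severity "error" or "fatal (severe)" would pick out lines like the two above from the surrounding informational noise.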

Have a look at the SHOW DEV D output on your node where DECinspect is failing. Are any disks in an unusual mount state?

Volker.
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Volker,

Thanks - no, all the disks are where they're supposed to be. There's a two-member shadowset that acts as the INSPECT$ROOT location, and it's not showing any errors or anything.

I've noticed as well that RID isn't running, and I believe this may be connected to the Inspect issue. I'm looking into that now to see whether there are any clues as to what's going on...

Cheers,

Bob
Volker Halle
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

%RMS-E-FND and %SYSTEM-F-VOLINV could relate to some RMS operation on a mounted disk. This would make sense in the context of the failing test name: Public System Dirs(2) 1 [OpenVMSFileProtection]

Try a DIR/SEC [000000] on all of your mounted disks from that system. On an unmounted disk, you would get an %RMS-E-DNR (device not ready) error.

There is also the option of enabling $ SET WATCH FILE/CLASS=MAJ, provided you can edit a .COM file that is executed when Inspect$Exec starts.

Volker.
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Volker,

Thanks - I'll give it a try.

I'm not familiar with SET WATCH, though - what does it do and where should I enable it?

Bob
Volker Halle
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

$ SET WATCH FILE/CLASS=MAJ is an undocumented but well-known debugging tool for XQP (file system) operations. You can try it interactively in your process. Then issue some DIR command and you'll see XQP debug output. This command needs CMKRNL privilege.

You turn off the debug output with $ SET WATCH FILE/CLASS=NOALL

If you find a DCL procedure that is called during INSPECT$EXEC startup, you could add this command to it; you should then get at least the file names or file IDs (FIDs) of all the files that INSPECT$EXEC tries to access.

Volker.
John Gillings
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

I run DECinspect on numerous systems. Although it's mostly reliable, I've occasionally found that it stops working for no apparent reason. The inner workings are deliberately obscured, so methods of diagnosis are limited. First thing to try is to shut it down on all nodes across the cluster, then restart them all.

Very occasionally it needs a "MS" solution. If you have the original kit, try reinstalling.

If that seems too drastic, you can experiment by creating your own inspectors to see if they work. If necessary try each subsystem individually to see which ones work. Check the files in the [DATABASE] directory for corruption.
A crucible of informative mistakes
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Hi,

Apologies for not replying sooner - I've been away for a few weeks...

Volker,

I used that debug tool as you suggested, adding it at both ends of the script INSPECT$START_EXEC.COM. I then used the Inspect GUI to start the executor, which was not running, and it started, apparently kicking off an Inspector.

The logfiles now show normal operation, which looks encouraging, and the INSPECT$EXEC process is in a HIB state nearly 90 minutes later, which is even better.

However I'm uncertain whether the act of running the SET WATCH command could have been responsible for this. I had the impression it was merely a tool to show where a problem might lie, rather than to actively fix it. Is this the case?

If so, then perhaps my problem is solved, and thank you kindly for your assistance.

If not, then further research is indicated - I hate it when problems 'fix themselves'...

John,

Thanks. I take your point about the obscurity of the system - I don't think I've encountered a package that was more troublesome to diagnose. But then I haven't had much cause to go near it in 10 years so there you go...

As I mentioned above, it seems to be working again after I started the Executor with Volker's SET WATCH utility.

But I'll try setting up some custom Inspectors as you suggest and see what happens. If it's still running after a week I figure it'll be okay...

Bob
Jan van den Ende
Honored Contributor

Re: Anyone remember DECInspect?

Bob,

the only way I can think of that SET WATCH could have cured your problem is if there is a timing problem, and the slight delays introduced by writing the SET WATCH output cause you to just miss the error window.
Even then, it's _NOT_ a real solution!

BTW, have you used ACCOUNTNG to check the final status of the disappearing processes? There MIGHT be interesting information there too.
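A process's final status is a 32-bit OpenVMS condition value. Assuming the standard condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27, control bits 28-31), a short Python sketch (decode_status is a hypothetical helper) shows how a hex status splits into its fields, e.g. whether a vanished process ended with an error and whether it came from SYSTEM (facility 0) or RMS (facility 1):

```python
from typing import NamedTuple

# Severity codes in bits 0-2 of a condition value; 5-7 are reserved.
SEVERITY_NAMES = {
    0: "warning",
    1: "success",
    2: "error",
    3: "informational",
    4: "severe (fatal)",
}


class Condition(NamedTuple):
    severity: str
    success: bool        # low bit set means success or informational
    message_number: int  # bits 3-15
    facility: int        # bits 16-27 (0 = SYSTEM, 1 = RMS)
    control: int         # bits 28-31 (e.g. bit 28 inhibits message display)


def decode_status(value: int) -> Condition:
    """Split a 32-bit final-status value into its condition-value fields."""
    value &= 0xFFFFFFFF                      # treat as unsigned 32-bit
    sev = value & 0x7
    return Condition(
        severity=SEVERITY_NAMES.get(sev, f"reserved ({sev})"),
        success=bool(value & 0x1),
        message_number=(value >> 3) & 0x1FFF,
        facility=(value >> 16) & 0xFFF,
        control=(value >> 28) & 0xF,
    )


if __name__ == "__main__":
    # A status printed in hex, e.g. %X0000000C (SS$_ACCVIO)
    print(decode_status(int("0000000C", 16)))
```

On the VMS system itself, the DCL lexical F$MESSAGE(status) translates such a value straight into its message text; the sketch only illustrates what the fields mean.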

hth

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Robert Manning_2
Valued Contributor

Re: Anyone remember DECInspect?

Jan,

Thanks - I'm not sure about the idea of timing errors, if by that you mean how the inspector is scheduled. The error/fault/whatever would happen whether the process was run at its scheduled time or interactively, with similar results.

If you're suggesting that perhaps there's something happening within the process that's causing it to trip over itself, then maybe...

At the moment, however, what I've done is to remove the SET WATCH commands from INSPECT$START_EXEC.COM and run the processes again, both scheduled and interactively, and things remain okay.

We'll be keeping tabs on it for a while to see whether this happens again, and if so I'll use ACCOUNTNG to see if anything shows up. I'll also be running some test jobs to see if the problem repeats.

Cheers,

Bob