Operating System - OpenVMS

Re: Phantom interactive login reset

 
Richard W Hunt
Valued Contributor

Phantom interactive login reset

Config: Alpha ES40 mod 2, reasonably up to date in firmware; OpenVMS 7.3-2, reasonably up to date in patches. My last patch window was about 3 months ago; the next one is at the end of the month.

We've had a problem in the past where something resets interactive logins to zero. Happens maybe once every several months, often more than a year apart. So rare that I cannot point to any job that I run in batch. But I've searched for whatever might do that with some pretty far-reaching SEARCH verbs. No joy in the searches so far.

After a couple of times when this happened in the past, I put a fragment in another job that runs every 15 minutes so that I could monitor login resets. It checks the current setting for interactive logins, writes it to a file.
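For anyone wanting a similar monitor, a minimal sketch of such a fragment follows. It is not the actual code from this system. It assumes that F$GETSYI("IJOBLIM") reports the active interactive limit that SET LOGINS/INTERACTIVE changes, and the log file name is hypothetical.

```dcl
$! Sketch only: record the current interactive login limit with a timestamp.
$! SYS$MANAGER:IJOBLIM_MONITOR.LOG is a hypothetical file name.
$ limit = F$GETSYI("IJOBLIM")
$ OPEN/APPEND/ERROR=create_log logfile SYS$MANAGER:IJOBLIM_MONITOR.LOG
$ GOTO write_rec
$create_log:
$! First run: the file does not exist yet, so create it.
$ OPEN/WRITE logfile SYS$MANAGER:IJOBLIM_MONITOR.LOG
$write_rec:
$ WRITE logfile F$TIME(), " IJOBLIM=", limit
$ CLOSE logfile
$! Raise an operator message as soon as the limit is seen at zero.
$ IF limit .EQ. 0 THEN REQUEST/TO=CENTRAL "Interactive login limit is zero"
```

Run from a job that repeats every 15 minutes, this narrows a reset to a 15-minute window, which is how the 16:15 to 16:30 estimate below is possible.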

System had been up over 80 days, no problems. Yesterday according to the logs, between 16:15 and 16:30, something or someone reset interactive logins to zero. I had walked out the door at 15:00 so wasn't me, and my operators know better than to futz around like that. The production support crew also claims ignorance of the matter. (I know there's a straight line in there somewhere, but this isn't funny so I'll skip it.)

I keep the binary audit logs for processing, so I looked in there with a /FULL for everything from 15:00 to 18:00. No joy. Nothing showed up as changing anything. I am set up to audit all of the following (bear with me):

ACL, Mount, Authorization, Install, SYSGEN, Breakin:(dialup,local,remote,network,detached,server), Logfailure:(batch,dialup,local,remote,network,subprocess,detached,server), privilege use (Security), Privilege failure (Security), File Access Bypass:(write,delete,control), Queue access: Other (Create)

I also referred to my operator.log file from the same time frame. No joy there.

The only code that I have to zero out the interactive logins like that is something that only runs during my customized, multi-threaded system startup. The synchronizer that controls the threads sets logins to zero and won't enable them until all of my other startup threads have completely exited. The code works fine and has worked for literally ten years without causing this kind of problem. The particular code segment that has the SET LOGIN commands in it has long since run (AND exited) and won't run again until I reboot. So I don't think my culprit is that startup module.

Question 1: Has anyone else seen a phantom reset of login/interactive like this?

Question 2: I cannot find ANY reference to this in either the audit logs or the operator log. Is there some other parameter I should set so I can trap this event in the logs?
Sr. Systems Janitor
20 REPLIES
Thomas Ritter
Respected Contributor

Re: Phantom interactive login reset

We used to run LAT-based load-balancing software for juggling users across the cluster. It would set logins to zero to move new connections to another node. It was written in-house.
My guess is that the command is located in some command procedure. Have you tried a big search across all your directories? It may be a buggy ops menu.
Bets are there is a $ set login/inter=0 somewhere on the system.
Hoff
Honored Contributor

Re: Phantom interactive login reset

Scan your audits for changes to the IJOBLIM system parameter; that's the core knob here.
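A scan of that kind might look like the sketch below. The date is a placeholder for the window of interest, and the journal file shown is the usual default location, which may differ on a given system.

```dcl
$! Sketch only: list SYSGEN-class audit records in a time window.
$! 01-JAN-2009 is a placeholder date, not the date of the incident.
$ ANALYZE/AUDIT/FULL/EVENT_TYPE=SYSGEN -
    /SINCE="01-JAN-2009:15:00" /BEFORE="01-JAN-2009:18:00" -
    SYS$MANAGER:SECURITY.AUDIT$JOURNAL
```

If this returns nothing while the limit is known to have changed inside the window, the change probably did not come through a path that generates a SYSGEN audit record.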

Please modify the local IJOBLIM-related code to report (or to audit or to log) its own activation and its status, using syslog or whatever site-local means is in use here to track activity.

The relative age of the code does not point to the absence of bugs; I've encountered bugs that were latent and lurking for twenty or thirty years. That was in heavily-used code, too.

Trust, but verify.
Ian Miller.
Honored Contributor

Re: Phantom interactive login reset

Also look for something setting the IJOBLIM system parameter. Although I see that you have SYSGEN auditing so it should have shown up there.
____________________
Purely Personal Opinion
Richard W Hunt
Valued Contributor

Re: Phantom interactive login reset

To Thomas Ritter and Hoff: Thanks for your comments. However, I think I'm ahead of you there.

I have searched my personal directory, all operator-class directories, the system directory, and my production-support team's directories. I have searched my special SYS$TOOLS folder where I keep home-grown useful programs. I have searched the user account support tools. I have searched the startup tools. Nobody has any code that contains the sequence "/intera" (looking for set login/interactive, of course). I found a few /inter that were other keywords. Nothing juxtaposed with login.

We are no longer a cluster. My nodes are now standalone. We do not run LAT support any more, either. Our security guys block LAT protocols at the closest "smart" switch. So it wouldn't be anything LAT based that I can see.

As to logging when my specialty startup code runs: That code does a $ REQUEST/TO=CENTRAL "message" any time it does anything. That REQUEST message shows up in the operator.log file among other places.

I have reviewed the operator logs, audit logs, and the accounts of everyone who has sufficient privilege to do this. I cannot find anything using

$ SEARCH domain "login","/inte"/match=AND

The domains I have used span at least 50-75 user home directories and their sub-directories. Not all of the users have the required privileges, but they are on the same disks as a few users who DO have that level of privilege, and I didn't constrain the search. I even checked my batch job list for that time. Nothing obvious.
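One way the search above could still miss something: DCL accepts abbreviated verbs and qualifiers, and the limit can also be changed by parameter name rather than through SET LOGINS. The sketch below widens the net under those assumptions. The device name is hypothetical, and whether a given abbreviation is accepted should be checked on the system itself.

```dcl
$! Sketch only: DKA100 is a hypothetical device; adjust to the local disks.
$! /WINDOW=0 lists only the names of files that contain a match.
$! Shorter strings catch abbreviated forms that "login" and "/inte" would miss.
$ SEARCH DKA100:[000000...]*.COM "set log","/int" /MATCH=AND /WINDOW=0
$! Also look for procedures that touch the parameter by name.
$ SEARCH DKA100:[000000...]*.COM "IJOBLIM" /WINDOW=0
```

SEARCH ignores case unless /EXACT is given, and /MATCH=AND requires both strings in the same record, so a command split across continuation lines would still slip through.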

This has happened before, perhaps a couple of years ago, and a couple of years before that. It's why I built the interactive login monitor feature. BUT... the last code fragment I found that could do that to me has been fixed long ago (and no, a backup restoration hasn't occurred since then).

While I freely admit there could be an elusive snippet of code somewhere that could do this, I've searched maybe a couple of hundred directories in total and so far, no joy. Given my system's disk architecture, I'm running out of places to look.
Sr. Systems Janitor
Ian Miller.
Honored Contributor

Re: Phantom interactive login reset

What privileged code do you run?
____________________
Purely Personal Opinion
Richard W Hunt
Valued Contributor

Re: Phantom interactive login reset

Ian, as to privileged code:

Most significant of what we run is ORACLE client to a back-end server. I searched their .COM and .LOG files anyway, nothing in their directories seems relevant.

We have a product called SmartStar that is a piece of middleware that speaks ORACLE-ese. The account is privileged enough to disable logins, but I looked both at the files that SmartStar runs and the last couple of days of log files. No joy in that search.

We run normal things like TCPIP Services for OpenVMS (v 5.4 ECO 7 at the moment). I searched that, too. Nada.

The only other site-specific things we run all run in user context and have only the permissions and privileges of the individual application users; that is, they are not installed with privileges.

This problem occurred in the past but I can't find any reference to it subsequent to about summer, 2006 when we were still on a lower version of OpenVMS, v 7.2-something. My notes don't tell me when it last happened, and my (admittedly far from perfect) memory says it is VERY infrequent. Once every couple of years seems right. In fact, if I didn't have to account for it to the government supervisors, I would have blown it off and just enabled logins again. Logins ARE enabled, but I can't stop searching for an explanation just yet.

I'm a bit frustrated because I am audit-logging SYSGEN events, which should be sensitive to this. I have other events in the audit log, including when one of my operators had to reset a password to something pre-expired for a new operator, and then again when the new operator reset his own password. Password resets showed up, a couple of security queries showed up (using SECURITY privilege to look at something), but no reset of interactive login counts.
Sr. Systems Janitor
Hoff
Honored Contributor

Re: Phantom interactive login reset

If you're not seeing any IJOBLIM parameter audits but are seeing the value zeroed, then the next obvious step is a look at the kernel-mode code here and for CMEXEC or CMKRNL audits around the time of the error; privileged-mode code executing on this server would appear to be writing to the SCH$GW_IJOBLIM cell.

Richard W Hunt
Valued Contributor

Re: Phantom interactive login reset

Hoff, where would I look for this CMKRNL or CMEXEC level event if not in the AUDIT logs? Surely you don't mean the error logs? I thought that kernel mode exceptions would crash the system and that exec mode exceptions would crash a service. I'm looking at over 82 days uptime at the moment and have no service outages. Also, SHOW ERROR says only 1 count each from DQB0 and DQB1, both of them boot-up events.

I don't have a sources kit so can't search for anything in the executive code anyway. I'm willing to do some research but I've started to run out of options.
Sr. Systems Janitor
Ian Miller.
Honored Contributor

Re: Phantom interactive login reset

You can enable audits for the use of CMEXEC and CMKRNL.
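A sketch of what that might look like, assuming the PRIVILEGE keyword of SET AUDIT takes a SUCCESS list in this form; check HELP SET AUDIT on the target version before relying on it.

```dcl
$! Sketch only: audit successful use of CMKRNL and CMEXEC.
$ SET AUDIT/AUDIT/ENABLE=PRIVILEGE=(SUCCESS:(CMKRNL,CMEXEC))
$! Confirm what is now enabled.
$ SHOW AUDIT
```

Expect this to be noisy, since plenty of legitimate software uses these privileges, so it is probably best paired with a narrow ANALYZE/AUDIT time window once the next reset is caught.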
____________________
Purely Personal Opinion