Operating System - OpenVMS

Re: Detecting restart in SYSTARTUP_VMS

 
Jack Trachtman
Super Advisor

I remember reading years ago that when the ANALYZE/CRASH command executes and finds a valid dump file, the first thing it does is change something in the first block of the dump file, so that a subsequent crash will not cause a reanalysis of the file.

If I'm recalling this correctly, does anyone know what is being changed in the dump file, so I can use it as a crash-restart indication?
Volker Halle
Honored Contributor

Jack,

Sure: the OLDDUMP bit is set once SDA has first opened a system dump file.

From SYS$LIBRARY:LIB.REQ:

macro DMP$V_OLDDUMP = 4,0,1,0 %; ! SET IF DUMP ALREADY ANALYZED

This is the BLISS field definition (byte offset 4, bit position 0, size 1 bit, no sign extension); it maps to bit 0 of the second longword in SYSDUMP.DMP.
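To make the bit position concrete, here is a minimal Python sketch that tests the OLDDUMP bit in a copy of the dump header. It assumes the layout given by the macro above (byte offset 4, bit 0, one bit wide) and the little-endian byte order used by OpenVMS. The function names are hypothetical, and this is only meant for inspecting a copy of block 1 off-line, not as a replacement for SDA or CLUE.

```python
"""Check the OLDDUMP flag in a copy of a system dump file header (sketch)."""

BLOCK_SIZE = 512          # OpenVMS disk block size in bytes
OLDDUMP_BYTE_OFFSET = 4   # from DMP$V_OLDDUMP = 4,0,1,0 (byte offset 4)
OLDDUMP_BIT = 0           # bit position within that longword


def is_old_dump(header: bytes) -> bool:
    """Return True if the header says the dump has already been analyzed."""
    if len(header) < OLDDUMP_BYTE_OFFSET + 4:
        raise ValueError("header is too short to contain the second longword")
    # Decode the second longword of the header as little-endian.
    raw = header[OLDDUMP_BYTE_OFFSET:OLDDUMP_BYTE_OFFSET + 4]
    longword = int.from_bytes(raw, "little")
    return bool((longword >> OLDDUMP_BIT) & 1)


def read_first_block(path: str) -> bytes:
    """Read block 1 (the first 512 bytes) of a copied dump file."""
    with open(path, "rb") as f:
        return f.read(BLOCK_SIZE)


if __name__ == "__main__":
    fresh = bytes(BLOCK_SIZE)                  # all zeros: bit clear
    analyzed = bytearray(BLOCK_SIZE)
    analyzed[OLDDUMP_BYTE_OFFSET] |= 1 << OLDDUMP_BIT
    print(is_old_dump(fresh), is_old_dump(bytes(analyzed)))  # False True
```

Only bit 0 of byte 4 is tested; other bits in the same longword are ignored, matching the one-bit field width.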

On OpenVMS Alpha, CLUE$SDA evaluates this bit and responds with a

%CLUE-I-ALRDYANA, dumpfile has already been analyzed

message (see SYS$MANAGER:CLUE$STARTUP_node.LOG) when it executes the CLUE HISTORY command during startup.

Volker.
Jan van den Ende
Honored Contributor

Jack,

Volker's explanation IS correct in one direction:
_IF_ you have a valid new dump, _THEN_ this is the first reboot after a crash.
But you can absolutely _NOT_ conclude the reverse: a previously-read SYSDUMP does _NOT_ imply that the previous shutdown was operator-requested. It might well be, but it can also mean that, for whatever reason, a dump was not or could not be written.

^P immediately comes to mind, but consider this scenario (we were bitten by it):
Two nodes, connected by one SCSI bus to each other and to HSZ40 controllers.
In hindsight, one of the SCSI connector cables to one node was broken, but normally the broken ends touched. At irregular intervals (probably through vibration or temperature changes) the connection was interrupted.
That node crashed, but... it had no connection to the disks. And that is a fairly strong reason for not writing a dump to disk!

As a thought experiment, it is rather easy to construct configurations and/or power issues that somehow break at a point or a moment that _WILL_ prevent writing the dump.

So _NO_ valid dump does _NOT_ imply an operator-requested shutdown!

hth

Proost.

Have one on me.

Jan

Don't rust yours pelled jacker to fine doll missed aches.
Jack Trachtman
Super Advisor

I appreciate everyone reminding me that I can't cover all possible scenarios, but at least catching a crash-restart is helpful.

Volker,

I took your info:

...maps to bit 0 of the 2nd longword in SYSDUMP.DMP

and tried to confirm this by adding the following line to SYSTARTUP_VMS both before and after the ANA/CRASH statements:

$ dump/blocks=end:1 sysdump.dmp

but the output looked exactly the same! Shouldn't I have been able to see the bit in the longword being toggled? Am I doing something simple wrong in my test?
Jan van den Ende
Honored Contributor

Jack,

All in all, I tend toward the view that David's suggestion comes closest to consistency.

There are two issues I can think of (one a matter of policy, the other of rather low probability):

1. What about an operator-requested shutdown that declines to execute the site-specific shutdown?
Either you treat that as a crash, OR you fiddle around with SHUTDOWN.COM and are prepared for some consequences at upgrades.

2. The system COULD potentially crash AFTER writing your "shutdown requested" file.
This would present a crash during shutdown as an "operator-requested shutdown". But in that case, the operator performing the shutdown might well have noticed "something"! And here the "fresh dump" check might still help.

Still, by far the best consistency as far as I can reckon.
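The reasoning above can be condensed into a small decision table, sketched here in Python. It assumes that David's suggestion is a marker file written by the site-specific shutdown procedure (as point 2 and later replies imply) and that both facts have already been determined earlier in the boot; `RestartCause` and `classify_restart` are hypothetical names. A fresh dump wins over the marker (point 2), and the absence of both is deliberately reported as unknown rather than as an operator shutdown.

```python
from enum import Enum


class RestartCause(Enum):
    CRASH = "crash"                          # fresh dump found
    OPERATOR_SHUTDOWN = "operator shutdown"  # marker written, no fresh dump
    UNKNOWN = "unknown"                      # no evidence either way


def classify_restart(fresh_dump: bool, shutdown_marker: bool) -> RestartCause:
    """Classify the previous stop from two independent pieces of evidence.

    fresh_dump: a valid, not-yet-analyzed dump was present at boot.
    shutdown_marker: the site-specific shutdown left a marker file
    (hypothetical mechanism, as discussed in the thread).
    """
    if fresh_dump:
        # A fresh dump proves a crash, even one that happened during
        # shutdown after the marker had already been written.
        return RestartCause.CRASH
    if shutdown_marker:
        return RestartCause.OPERATOR_SHUTDOWN
    # No dump does NOT imply an operator shutdown: the dump may not have
    # been written (^P, lost disk path, power failure, hang), or the
    # operator declined the site-specific shutdown.
    return RestartCause.UNKNOWN
```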

hth,

Proost.

Have one on me.

Jan
Lawrence Czlapinski
Trusted Contributor

Jack, unfortunately we have seen cases where the system crashed without a crash dump.
Is this for a cluster?
For our clusters, we use a CLUSTER_MONITOR.COM procedure, which can trigger a pager through DECtalk. However, this doesn't tell you whether there was a crash or not.

You may have to implement something knowing that it won't work every time.
Writing a file at the end of the site-specific operator shutdown would identify a large share of the shutdowns. Occasionally a system could crash after the file is written, but that would normally have a low probability. Over time you would try to cover more kinds of shutdowns. Depending on a system dump being written is riskier: as others have stated, there are assorted reasons why a dump might not be written after a crash.
Lawrence
Volker Halle
Honored Contributor

Jack,

CLUE$STARTUP.COM runs earlier than SYSTARTUP_VMS.COM and automatically analyzes the crash for you (creating a CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS file), so the OLDDUMP bit is already set by the time your commands run.

So if you really want to see the OLDDUMP bit clear, you need to add your DUMP command to SYLOGICALS.COM. In SYSTARTUP_VMS.COM, it will be too late...

May I repeat my suggestion to look in OPERATOR.LOG;-1 for the 'node' shutdown string (to see what I'm referring to, just try TYPE/TAIL SYS$MANAGER:OPERATOR.LOG;-1 right after the system has been rebooted). If you want to know whether the system was shut down normally, this is the place to look. You need to insert the local nodename into the SEARCH string so that you don't match shutdown messages from other nodes, e.g.:

$ SEARCH SYS$MANAGER:OPERATOR.LOG;-1 -
" ''F$GETSYI("NODENAME")' shutdown was requested by the operator."

If you include /WINDOW=(2,0), you'll also see when the shutdown happened and which user requested it.
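For readers who want to post-process a copy of the log elsewhere, the same check can be sketched in Python. This is an illustration, not a replacement for SEARCH: it assumes the log text has already been read into a string, matches case-insensitively (as SEARCH does without /EXACT), and returns each hit with up to `window` preceding lines, roughly like /WINDOW=(2,0). The leading space in the search string presumably keeps a node such as "NODEA" from also matching "BNODEA". The function name is hypothetical.

```python
from typing import List


def find_operator_shutdowns(log_text: str, nodename: str,
                            window: int = 2) -> List[List[str]]:
    """Return each matching line plus up to `window` preceding lines."""
    # Leading space anchors the nodename; casefold gives case-insensitivity.
    needle = f" {nodename} shutdown was requested by the operator.".casefold()
    lines = log_text.splitlines()
    matches = []
    for i, line in enumerate(lines):
        if needle in line.casefold():
            matches.append(lines[max(0, i - window):i + 1])
    return matches
```

An empty result for the local nodename means no operator-requested shutdown of this node was logged in that file, which, as noted earlier in the thread, is not by itself proof of a crash.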

Volker.
Wim Van den Wyngaert
Honored Contributor

Also note that, e.g., memory corruption may make your node hang instead of crashing.

At first I only checked whether CLUE had created an analysis file shortly before the reboot; if so, it was a crash. But I had multiple cases where the system simply hung. So now I have a site manager node pinging the important parts (node, hubs, SAN switch) and generating alarms when they are not seen. We also have a permanent DECnet link between the node and the site manager node; if that link drops, we also get an alarm.

But of course these alarms are also raised during a normal shutdown, in which case the operations guys are warned.

Wim
Robert_Boyd
Respected Contributor

I checked out your suggestion, Volker; apparently it won't work on every system. I checked several Alphas running V7.2-1 and V7.3-2 here, and the ones I checked do NOT have the "shutdown requested" string in the OPERATOR.LOG file. This seems a bit odd to me, but in any case, the important thing is that the "Logfile was closed by operator" message is there, and you can search for that instead. However, that message does not guarantee that the file was closed because of a shutdown. If you're checking the end of the ;-1 version during a boot, though, I think that would be a reliable check.
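The "check the end of the ;-1 version" idea can be sketched in Python as well, assuming the previous log's text is already available as a string. The `tail_lines` default of 5 is an arbitrary, hypothetical choice, and the function name is illustrative only; the point is that the message is only trusted when it is among the last non-blank lines, not anywhere in the file.

```python
def closed_by_operator(log_text: str, tail_lines: int = 5) -> bool:
    """True if "Logfile was closed by operator" is near the end of the log.

    tail_lines is an assumed window size, not a value from OpenVMS.
    """
    if tail_lines <= 0:
        return False
    # Ignore blank lines so trailing whitespace doesn't shrink the window.
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    tail = lines[-tail_lines:]
    return any("logfile was closed by operator" in ln.casefold() for ln in tail)
```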

Another way I have handled a similar requirement in the past was to create a detached process that runs all the time in the background. It wakes up every few seconds and does two things: it updates a timestamp in a file, and it checks for the appearance of the logical name SHUTDOWN$TIME, which SYS$SYSTEM:SHUTDOWN.COM defines in the system table when a shutdown is in progress. The detached process was set up to check how close to the shutdown time it was and, once it got under 1 minute, to do whatever final steps were needed to mark the shutdown and then exit. On system startup, the code checked how much time had elapsed and whether a shutdown had been marked, then wrote a log record recording the downtime. If the downtime was not associated with a shutdown, a crash or system hang was assumed and logged accordingly (with an accompanying email message to the system managers).
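The two halves of that heartbeat scheme can be sketched in Python. This is a hypothetical illustration of the logic only: it assumes the SHUTDOWN$TIME value has already been translated and parsed into a `datetime` (its format is not covered here), that the last heartbeat timestamp has been read back at boot, and it leaves out the file I/O, logging and email. The 1-minute threshold mirrors the description above.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Mirrors "under 1 minute" from the description above.
MARK_THRESHOLD = timedelta(minutes=1)


def should_mark_shutdown(now: datetime, shutdown_time: Optional[datetime]) -> bool:
    """Detached-process side: mark the shutdown once it is under 1 minute away.

    shutdown_time is None while SHUTDOWN$TIME is not defined.
    """
    return shutdown_time is not None and shutdown_time - now < MARK_THRESHOLD


def record_downtime(last_heartbeat: datetime, boot_time: datetime,
                    shutdown_marked: bool) -> Tuple[timedelta, str]:
    """Startup side: downtime since the last heartbeat, plus its likely cause."""
    downtime = boot_time - last_heartbeat
    if shutdown_marked:
        return downtime, "planned shutdown"
    # No mark: assume a crash or hang and notify the system managers.
    return downtime, "crash or hang"
```

Because the heartbeat is only written every few seconds, the computed downtime overstates the real outage by up to one heartbeat interval.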

Robert
Master you were right about 1 thing -- the negotiations were SHORT!
Dale A. Marcy
Trusted Contributor

Robert,

Please check again. I just tried the command on an AlphaServer 4100 5/400 running VMS V7.3-2. At first it didn't work; then I typed the tail of the log and noticed that I had left the word "the" out of the search string. I repeated the command with the missing "the" added, and it worked as advertised.