VMS Puzzle (Boot Time) for your Christmas enjoyment.

Jon Pinkley
Honored Contributor

A Boottime puzzle for your Christmas enjoyment.

We moved our datacenter last weekend, and after the move we rebooted the two-member cluster.

The puzzle is to explain the apparent inconsistency between the times at which the systems formed and joined the cluster and the times reported by boottime and show system/noprocess (uptime).

View of Cluster from system ID 1046 node: OMEGA 25-DEC-2007 02:56:52
+-----------------------+-----------------------------+
| SYSTEMS | MEMBERS |
+--------+--------------+---------+-------------------+
| NODE | SOFTWARE | STATUS | TRANSITION_TIME |
+--------+--------------+---------+-------------------+
| OMEGA | VMS V7.3-2 | MEMBER | 21-DEC-2007 22:46 |
| SIGMA | VMS V7.3-2 | MEMBER | 21-DEC-2007 22:36 |
+--------+--------------+---------+-------------------+
+----------------------------------------------------------------------------------------------+
| CLUSTER |
+--------+-----------+----------+---------+------------+-------------------+-------------------+
| CL_EXP | CL_QUORUM | CL_VOTES | QF_VOTE | CL_MEMBERS | FORMED | LAST_TRANSITION |
+--------+-----------+----------+---------+------------+-------------------+-------------------+
| 3 | 2 | 3 | YES | 2 | 21-DEC-2007 22:36 | 21-DEC-2007 22:46 |
+--------+-----------+----------+---------+------------+-------------------+-------------------+

From the above, it appears that SIGMA formed the cluster as its founding member and OMEGA joined 10 minutes later. That is in fact what happened: SIGMA was the first node booted into the cluster, and OMEGA was booted approximately 10 minutes later, after we did some preliminary checks that all disks were present.

Now the puzzle:

$ sysman set env/cluster
%SYSMAN-I-ENV, current command environment:
Clusterwide on local cluster
Username JON will be used on nonlocal nodes

SYSMAN> do show time
%SYSMAN-I-OUTPUT, command execution on node SIGMA
25-DEC-2007 03:01:34
%SYSMAN-I-OUTPUT, command execution on node OMEGA
25-DEC-2007 03:01:36
SYSMAN> do write sys$output f$getsyi("boottime")
%SYSMAN-I-OUTPUT, command execution on node SIGMA
21-DEC-2007 22:35:11.73
%SYSMAN-I-OUTPUT, command execution on node OMEGA
22-DEC-2007 05:59:16.00
SYSMAN> do show system/noprocess
%SYSMAN-I-OUTPUT, command execution on node SIGMA
OpenVMS V7.3-2 on node SIGMA 25-DEC-2007 03:02:12.10 Uptime 2 19:15:37
%SYSMAN-I-OUTPUT, command execution on node OMEGA
OpenVMS V7.3-2 on node OMEGA 25-DEC-2007 03:02:14.75 Uptime 2 19:03:40
SYSMAN> Exit
$ write sys$output f$cvtime("25-DEC-2007:03:02:12.10-2-19:15:37","ABSOLUTE") ! SIGMA current time minus uptime
22-DEC-2007 07:46:35.10
$ write sys$output f$cvtime("25-DEC-2007:03:02:14.75-2-19:03:40","ABSOLUTE") ! OMEGA current time minus uptime
22-DEC-2007 07:58:34.75
$

Normally the boottimes will be close to the times reported by show cluster, and also close to the "current time" minus the uptime of the node. However, if you look at the above reported values you will see they do not correspond.
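
For readers without a VMS system at hand, the "current time minus uptime" arithmetic done above with f$cvtime can be cross-checked with a short Python sketch. The helper names parse_vms_time and parse_uptime are made up for this illustration; the input strings are copied from the SHOW SYSTEM header lines above.

```python
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def parse_vms_time(stamp: str) -> datetime:
    """Parse an absolute time such as '25-DEC-2007 03:02:12.10' (hypothetical helper)."""
    date_part, time_part = stamp.split()
    day, month, year = date_part.split("-")
    hours, minutes, seconds = time_part.split(":")
    whole, _, fraction = seconds.partition(".")
    # VMS prints hundredths of a second; convert them to microseconds.
    hundredths = int(fraction.ljust(2, "0")[:2]) if fraction else 0
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    int(hours), int(minutes), int(whole), hundredths * 10_000)

def parse_uptime(uptime: str) -> timedelta:
    """Parse a SHOW SYSTEM uptime such as '2 19:15:37' (days hh:mm:ss)."""
    days, clock = uptime.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)

# Current time minus uptime, per node, using the values shown above.
sigma = parse_vms_time("25-DEC-2007 03:02:12.10") - parse_uptime("2 19:15:37")
omega = parse_vms_time("25-DEC-2007 03:02:14.75") - parse_uptime("2 19:03:40")
print("SIGMA:", sigma)
print("OMEGA:", omega)
```

Both results fall on the morning of 22-DEC-2007, matching the f$cvtime output, whereas SHOW CLUSTER reports the nodes joining on the evening of 21-DEC-2007.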

No exec or kernel mode software was used to write into the system time (EXE$GQ_SYSTIME), boottime (EXE$GQ_BOOTTIME), or uptime (EXE$GL_ABSTIM*) cells, with the exception of the normal updating done by the VMS operating system. There was something "special" done using software from the freeware disk, but it did not write to any of the above data cells. The time was not changed with $ SET TIME.

The question is, what could have been done to get the apparently inconsistent result reported above?

Those are the clues I will give at the start. If no one has any guesses, I will give more clues.

Have fun,

Jon
it depends
10 REPLIES
Volker Halle
Honored Contributor
Solution

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

Jon,

one way to get into this state is:

boot OMEGA with SETTIME=1 and manually enter a time in the future during boot.

EXE$GQ_BOOTTIME will be set when the hardware clock is initialized and this happens before becoming a cluster member. Once OMEGA joins the cluster, the clock will automatically be updated from the time value of an existing cluster member.

This scenario may also happen if the BBW clock on OMEGA had been set to a future date by a previous boot before booting into the cluster. A failure of the BBW chip may also cause this behaviour.

Volker.
DECxchange
Regular Advisor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

Hello,
Do you have any other servers on your intranet that are used for time synchronization? Is your VMS system time synchronized with other servers?

Merry Christmas and Happy New Year
Jon Pinkley
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

Volker is on the correct path, however it was SIGMA that had its time changed at boottime. When we booted SIGMA, we used a conversational boot and SETTIME was set to 1. When we booted OMEGA, we did not use a conversational boot and SETTIME was set to 0. This explains the following output:

SYSMAN> do write sys$output f$getsyi("boottime")
%SYSMAN-I-OUTPUT, command execution on node SIGMA
21-DEC-2007 22:35:11.73
%SYSMAN-I-OUTPUT, command execution on node OMEGA
22-DEC-2007 05:59:16.00

However, it does not explain the discrepancy between cluster transition times and the times reported by show system/noprocess, specifically why is there such a discrepancy between SIGMA's 21-DEC-2007 22:36 (cluster transition) and 22-DEC-2007 07:46:35.10 (current time minus uptime), and OMEGA's 21-DEC-2007 22:46 (cluster transition) and 22-DEC-2007 07:58:34.75 (current time minus uptime)?

In tabular form with another clue (NODE_SWINCARN):

Node   Cluster Transition  NODE_SWINCARN         BOOTTIME                 Current - uptime
SIGMA  21-DEC-2007 22:36   22-DEC-2007 05:47:00  21-DEC-2007 22:35:11.73  22-DEC-2007 07:46:35.10
OMEGA  21-DEC-2007 22:46   22-DEC-2007 05:59:01  22-DEC-2007 05:59:16.00  22-DEC-2007 07:58:34.75


RE: "Do you have any other servers on your intranet that are used for time synchronization? Is your VMS system time synchronized with other servers?"

No, there is no time synchronization with other servers, other than that at boot time the node joining the cluster has its software clock set to the value from another node in the cluster. (In this case there will only be one other node, so immediately after booting the second node into the cluster, the clocks are very close to being synchronized.)

PS. I will award points after the scenario has been described.

PPS. The systems have not been rebooted yet, so if you have any other questions related to this puzzle that you want me to display output for, let me know.
it depends
Andy Bustamante
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

What are the votes on each system? Was quorum adjusted?
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Zeni B. Schleter
Regular Advisor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

The only thing I can think of for a two-hour shift would be something amiss with the daylight saving time setting that was tweaked. I have seen the offset shifted in the wrong direction, causing some reporting to be wrong, but I didn't think that would affect the system clock as such.

Is TCPIP in use or another IP provider?
Wim Van den Wyngaert
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

If you have settime enabled in audit, you can check whether something unexpected happened during the lifetime of the cluster.

Wim
Jon Pinkley
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

Sorry for the delay in response, we had to move the rest of the datacenter.

Just to be clear, I KNOW why there is a discrepancy; by "puzzle" I meant "brain teaser".

RE: "What are the votes on each system? Was quorum adjusted?" Each ES47 has 1 vote, and the quorum disk has one vote. Expected votes is 3. Quorum was not adjusted.

RE: "Daylight savings time" Not related, and AUTO_DLIGHT_SAV is 0

RE: "Is TCPIP in use or another IP provider?" TCPIP

RE: "If you have settime enabled in audit you can check if something unexpected happened during the lifetime of the cluster." Good thing to check, we have time auditing enabled. The only entries were from the shutdown where it does a

$ SET TIME="''f$time()'"

Which results in audit alarms like this:

Security audit (SECURITY) on SIGMA, system id: 1045
Auditable event: System time set
Event time: 21-DEC-2007 21:33:19.73
PID: 2040264B
Process name: SHUTDOWN
Username: JON
Process owner: [SYSTEM]
Image name: DSA1407:[SYS0.SYSCOMMON.][SYSEXE]SET.EXE
New system time: 21-DEC-2007 21:33:19.72
Old system time: 21-DEC-2007 21:33:19.72
Posix UID: -2
Posix GID: -2 (%XFFFFFFFE)




Ok, here's another clue:

Before moving anything, we shut down the cluster; it was down for about two hours. While it was down, we unpresented units from the ES40 and presented them to the ES47s. Still before moving anything, we rebooted the cluster with only the two processors being moved in phase I, and set the time to 5 minutes after we had shut the cluster down (there were many scheduled batch jobs that we did not want to begin processing). After verifying that all devices were present and that the applications were working, we shut the cluster down one last time in the original location. At that point the time from the cluster's point of view was 21-DEC-2007 22:30, while wall clock time was around 22-DEC-2007 00:30. When the systems were shut down, the Battery Backed-up Watch (BBW) chip was updated to 22:30, about two hours behind wall clock time. This accounts for the difference between the NODE_SWINCARN times and the (current time minus uptime) values. It does not explain the discrepancy between the cluster transition times and the (current time minus uptime) values, a difference of over 9 hours and 10 minutes.
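
The two gaps described in this clue can be checked against the table posted earlier in the thread. The Python sketch below is only an arithmetic check; the dictionary layout and key names are invented for the illustration, and the cluster transition times have minute resolution only.

```python
from datetime import datetime, timedelta

# Values copied from the table earlier in this thread.
nodes = {
    "SIGMA": {
        "transition": datetime(2007, 12, 21, 22, 36),
        "swincarn": datetime(2007, 12, 22, 5, 47, 0),
        "now_minus_uptime": datetime(2007, 12, 22, 7, 46, 35, 100000),
    },
    "OMEGA": {
        "transition": datetime(2007, 12, 21, 22, 46),
        "swincarn": datetime(2007, 12, 22, 5, 59, 1),
        "now_minus_uptime": datetime(2007, 12, 22, 7, 58, 34, 750000),
    },
}

gaps = {}
for name, t in nodes.items():
    # Gap attributed to the BBW chip running about two hours behind wall clock time.
    bbw_gap = t["now_minus_uptime"] - t["swincarn"]
    # Gap still unexplained at this point in the thread.
    cluster_gap = t["now_minus_uptime"] - t["transition"]
    gaps[name] = (bbw_gap, cluster_gap)
    print(f"{name}: NODE_SWINCARN gap {bbw_gap}, cluster transition gap {cluster_gap}")
```

On both nodes the first gap is just under two hours and the second is a little over 9 hours 10 minutes, consistent with the figures quoted above.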

Here is another clue: after the equipment was physically moved and reconnected, we booted SIGMA first, using a conversational boot with SETTIME set to 1, and set the time to 21-DEC-2007 22:35, so the cluster believed it had been down for only 5 minutes. After the cluster booted, I did something to cause the time to be correct by Monday morning, but I did not use the SET TIME command.

Jon
it depends
Jim_McKinney
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

> I did something to cause the time to be correct by Monday morning


Change the values of EXE$GL_TICKLENGTH and
EXE$GL_TIMEADJUST perhaps?
Volker Halle
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

Jon,

did you use TBO (Freeware V6) ?

Volker.
Jon Pinkley
Honored Contributor

Re: VMS Puzzle (Boot Time) for your Christmas enjoyment.

RE: "Change the values of EXE$GL_TICKLENGTH and EXE$GL_TIMEADJUST perhaps?" Yes, that's the VMS mechanism that was employed.

RE: "did you use TBO (Freeware V6) ?" Yes, Specifically the command was:

$ tbo /direction=forward /range=99360 /delta=33120 /info ! drift the clock forward by 33120 seconds (9 hours 12 minutes)
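
As a sanity check on the numbers in that command, the Python sketch below converts the /delta value to hours and minutes and adds it to SIGMA's reported boottime. Only the two numeric values and the timestamps come from this thread; the variable names are illustrative, and nothing here models how TBO itself works.

```python
from datetime import datetime, timedelta

DELTA_SECONDS = 33120   # /delta value from the TBO command above
RANGE_SECONDS = 99360   # /range value from the TBO command above

hours, remainder = divmod(DELTA_SECONDS, 3600)
minutes = remainder // 60
ratio = RANGE_SECONDS / DELTA_SECONDS
print(f"delta = {hours} h {minutes} min, range/delta = {ratio}")

# SIGMA's BOOTTIME (set from the time entered at the SETTIME prompt),
# shifted forward by the full delta.
sigma_boottime = datetime(2007, 12, 21, 22, 35, 11, 730000)
shifted = sigma_boottime + timedelta(seconds=DELTA_SECONDS)

# SIGMA's "current time minus uptime" value from earlier in the thread.
observed = datetime(2007, 12, 22, 7, 46, 35, 100000)
residual = shifted - observed
print("boottime + delta :", shifted)
print("current - uptime :", observed)
print("residual         :", residual)
```

The shifted boottime lands within about 37 seconds of the "current time minus uptime" value. The thread does not explain that small residual.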

EXE$GL_ABSTIM_TICS is the number of 1/100th-second "soft ticks" since system initialization; it is incremented based on the number of hardware clock interrupts needed to average 0.01 second. The system uptime displayed in the header line of SHOW SYSTEM is based on EXE$GL_ABSTIM_TICS (for the local system), so it remains correct even if the system time is changed.
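
A toy model may help show why the uptime stays correct while the system time drifts. The Python sketch below is not VMS code and not TBO's algorithm. It assumes, purely for illustration, that the drift is spread evenly over the /range interval by lengthening each 10 ms soft tick slightly for a fixed number of ticks, while the tick counter itself advances by exactly one per tick. All names are hypothetical.

```python
from fractions import Fraction

TICK = Fraction(1, 100)  # nominal soft tick length in seconds (simplified model)

def clocks_after(elapsed_ticks, adjust_ticks, adjusted_tick):
    """Return (system_time_advance, uptime) in seconds after elapsed_ticks soft ticks.

    For the first adjust_ticks ticks the system time advances by adjusted_tick
    per tick; afterwards it advances by the nominal TICK. The uptime is derived
    from the tick count alone, so the adjustment never touches it.
    """
    adjusted = min(elapsed_ticks, adjust_ticks)
    system_seconds = adjusted * adjusted_tick + (elapsed_ticks - adjusted) * TICK
    uptime_seconds = elapsed_ticks * TICK
    return system_seconds, uptime_seconds

# Assumption: 33120 s of drift spread evenly over 99360 s worth of soft ticks.
adjust_ticks = 99360 * 100
adjusted_tick = TICK + Fraction(33120, adjust_ticks)

# SIGMA's uptime of 2 19:15:37, expressed in soft ticks.
elapsed = (2 * 86400 + 19 * 3600 + 15 * 60 + 37) * 100

system_seconds, uptime_seconds = clocks_after(elapsed, adjust_ticks, adjusted_tick)
gain = system_seconds - uptime_seconds
print("uptime (s):", uptime_seconds)
print("system time gained over uptime (s):", gain)
```

Under these assumptions the system time ends up exactly 33120 seconds ahead of what the tick count implies, which is why subtracting the uptime from the current time no longer reproduces the boottime.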

The reason we did this was that we had batch jobs that needed to run at the "correct" time. There were no external dependencies, so this was the easiest way to achieve the goal. NTP or any other software that adjusts the clock must be disabled while this is being done.

For those interested in how all this works, if you have access to the VMS source listings, see [.SYS.LIS]INTERVAL_TIMER.LIS. In the Alpha VMS 7.3-2 listing kit it is on CD 4/5:

DISK$AXPVMS732LS4:[V732.SYS.LIS]INTERVAL_TIMER.LIS

The description in the AXP V1.5 IDSM Chapter 12, Section 12.3 System Timekeeping is still basically correct as well.

Jon
it depends