
ntp big offset

 
1221
Advisor


Hi,

 

There is a big time difference between the NTP servers and two HP-UX servers that are in an Oracle RAC cluster.

 

NTP Server1: c1-dc10.kc.med

NTP Server2: g1-dc12.kc.med

RAC Node1: mhs01.kc.med

RAC Node2: mhs02.kc.med

 

RAC Node1 "ntp.conf" file configuration:

 

server c1-dc10.kc.med version 3 prefer
server g1-dc12.kc.med version 3
peer mhs02.kc.med version 3
server 127.127.1.1 stratum 10
authenticate no
statsdir /var/tmp/ntp/
statistics loopstats
filegen loopstats file loopstats type day enable
filegen loopstats file loopstats type day link enable
driftfile /etc/ntp.drift

 

RAC Node2 "ntp.conf" file configuration:

 

server c1-dc10.kc.med version 3 prefer
server g1-dc12.kc.med version 3
peer mhs01.kc.med version 3
server 127.127.1.1 stratum 10
authenticate no
statsdir /var/tmp/ntp/
statistics loopstats
filegen loopstats file loopstats type day enable
filegen loopstats file loopstats type day link enable
driftfile /etc/ntp.drift

 

 

# ntpdc -p (on Node1 "mhs01.kc.med")

remote       local         st  poll  reach  delay    offset      disp
=======================================================================
*mhs02.kc.m  10.244.2.11   6   64    377    0.00014  0.000061    0.04012
=LOCAL(1)    127.0.0.1     5   64    0      0.00000  0.000000    3.99217
=g1-dc12.kc  10.244.2.11   2   256   377    0.00099  939.26737   0.15222
=c1-dc10.kc  10.244.2.11   1   256   377    0.00099  939.18571   0.16798

 

# ntpdc -p (on Node2 "mhs02.kc.med")

remote       local         st  poll  reach  delay    offset      disp
=======================================================================
+mhs01.kc.m  10.244.2.12   7   64    376    0.00014  0.000080    0.04518
*LOCAL(1)    127.0.0.1     5   64    377    0.00000  0.000000    0.03044
=g1-dc12.kc  10.244.2.12   2   64    377    0.00073  938.72240   0.13562
=c1-dc10.kc  10.244.2.12   1   64    377    0.00073  938.71480   0.13440

 

Both servers are in sync with each other, but they are around 15 minutes ahead of the NTP servers.

We need to synchronize both servers with the NTP servers.

If we set the time manually, Oracle Clusterware will reboot the server, and we don't want the servers to be rebooted.

How can the servers be synchronized?

 

P.S. This thread has been moved from HP-UX>System Administration to HP-UX > networking. -HP Forum Moderator

 

Patrick Wallek
Honored Contributor

Re: ntp big offset

The only safe way to adjust the time with such a big discrepancy is to shut down your DB cluster, adjust the time, make sure that NTP is running, and then restart the DB.

 

 

Matti_Kurkela
Honored Contributor

Re: ntp big offset

Your "server 127.127.1.1 stratum 10" makes the xntpd daemon see the local system clock as a pseudo NTP server. As a result, you have four time sources in your configuration: the two real NTP servers, the other database server (with a "peer" type association) and the local system clock.

 

Your mhs02.kc.med server has synchronized with LOCAL(1), i.e. its local system clock, so it is actually free to drift (comparing a clock to itself tells you nothing about how accurate it is). And mhs01.kc.med has synchronized with mhs02.kc.med, so it ends up trusting the false time of mhs02 rather than the correct time from the real NTP servers.

 

Since there are four NTP time sources configured, the 2 real NTP servers cannot overrule the 2 sources of false time, and NTP basically goes with whatever seems to be the least disruptive. As a result, your current configuration allows the database servers to go into "we're right, everyone else is wrong" mode as they are confirming each other.
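
To make the "2 vs. 2" problem concrete, here is a toy sketch in Python. It is NOT the real NTP selection algorithm (which works with error intervals, not a simple headcount); it only groups sources whose offsets agree within a tolerance and asks whether any group holds a strict majority. The function names and the 1-second tolerance are my own assumptions; the offsets are the ones from the ntpdc output of mhs02.

```python
def agreeing_groups(offsets, tolerance=1.0):
    """Group sources whose offsets (in seconds) lie within `tolerance` of a neighbour."""
    groups = []
    for name, offset in sorted(offsets.items(), key=lambda kv: kv[1]):
        if groups and abs(offset - groups[-1][-1][1]) <= tolerance:
            groups[-1].append((name, offset))
        else:
            groups.append([(name, offset)])
    return [[name for name, _ in group] for group in groups]


def majority_group(offsets, tolerance=1.0):
    """Return the group holding a strict majority of all sources, or None."""
    for group in agreeing_groups(offsets, tolerance):
        if 2 * len(group) > len(offsets):
            return group
    return None


# Offsets as seen by mhs02 (toy input, hypothetical dictionary layout).
node2 = {
    "mhs01": 0.000080,
    "LOCAL(1)": 0.0,
    "g1-dc12": 938.72240,
    "c1-dc10": 938.71480,
}
without_local = {k: v for k, v in node2.items() if k != "LOCAL(1)"}

print(majority_group(node2))          # four sources: two groups of two
print(majority_group(without_local))  # three sources: the real servers win
```

With the local clock removed, the two real NTP servers form a majority of two out of three, which is the situation the steps below try to reach.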

 

Since the difference between the NTP servers and the database servers is still less than 1000 seconds, there may be a way to solve this without stopping the database.
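
As a quick sanity check of that margin, the offsets from the ntpdc output can be compared against the 1000-second limit. This is only arithmetic on the numbers posted above; the constant and function names are my own.

```python
LIMIT_S = 1000.0  # the 1000-second limit mentioned above


def within_limit(offsets_s, limit=LIMIT_S):
    """True if every offset (in seconds) is smaller than the limit in magnitude."""
    return all(abs(offset) < limit for offset in offsets_s)


# Offsets to the real NTP servers, taken from both ntpdc listings.
observed = [939.26737, 939.18571, 938.72240, 938.71480]

print(within_limit(observed))
print(round(max(observed) / 60, 2), "minutes")
print(round(LIMIT_S - max(observed), 2), "seconds of margin left")
```

The margin is only about a minute, so if the clocks keep drifting it would be better not to wait too long.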

 

First, stop xntpd on both database servers. ("sh /sbin/init.d/xntpd stop")

Then, comment out the "server 127.127.1.1 stratum 10" lines in /etc/ntp.conf on both database servers.

Finally, restart xntpd using the -x option ("xntpd -x"). With the -x option, it will make even big adjustments slowly without causing the time to jump. Then wait for hours as the system clocks will slowly converge to the time of the NTP servers.
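
If you prefer to script the edit of /etc/ntp.conf rather than do it by hand, something like the following Python sketch could do it. It only transforms text; reading and writing the real file (and keeping a backup) is left to you. The function name and the sample text are my own, and I am assuming the local clock is always addressed as 127.127.1.x.

```python
import re

# Matches active "server" or "fudge" lines that refer to the local clock driver.
LOCAL_CLOCK = re.compile(r"^\s*(server|fudge)\s+127\.127\.1\.\d+\b")


def comment_out_local_clock(conf_text):
    """Return conf_text with local clock lines prefixed by '#'; other lines unchanged."""
    result = []
    for line in conf_text.splitlines():
        result.append("#" + line if LOCAL_CLOCK.match(line) else line)
    return "\n".join(result) + "\n"


sample = """server c1-dc10.kc.med version 3 prefer
server g1-dc12.kc.med version 3
peer mhs02.kc.med version 3
server 127.127.1.1 stratum 10
driftfile /etc/ntp.drift
"""

print(comment_out_local_clock(sample))
```

Lines that are already commented out are left alone, so running it twice does no harm.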

 

During this time, the servers won't be actively synchronized with each other (as the time value of the peer database server will be voted out by 2 vs. 1), but since the database servers are initially in perfect sync with each other and the speed of convergence should be the same, the difference in system clocks between the database servers should not become too great.
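
To get a rough idea of how long the convergence takes, you can divide the offset by the slew rate. The sketch below is plain arithmetic; the rates in the loop are assumptions for illustration only. 500 ppm is the maximum slew rate documented for the reference ntpd; I do not know the exact rate that xntpd -x achieves on your HP-UX release, so check that before planning around any of these numbers.

```python
def slew_duration_s(offset_s, slew_rate_ppm):
    """Seconds needed to remove offset_s at a constant slew rate given in ppm."""
    return abs(offset_s) / (slew_rate_ppm * 1e-6)


offset_s = 939.27  # largest offset in the ntpdc output above

# Hypothetical slew rates; only 500 ppm comes from the reference ntpd documentation.
for rate_ppm in (500, 5000, 50000):
    hours = slew_duration_s(offset_s, rate_ppm) / 3600
    print(f"{rate_ppm:>6} ppm -> {hours:8.1f} hours")
```

If the effective rate turns out to be close to 500 ppm, expect the convergence to take days rather than hours.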

 

I would recommend that you leave the "server 127.127.1.1" lines permanently commented out. You will need them only if you need to start the xntpd daemons on the database servers while the real NTP servers are not reachable - and in that situation, only one database server should have the "server 127.127.1.1" line, not both of them. In such a major failure situation, it might be better to actively decide which database server is more stable (regarding the potential for further failures) and uncomment the "server 127.127.1.1" line on that server only when required.

 

The "stratum 10" part of the "server 127.127.1.1 ..." line is obviously not working: your ntpdc output indicates the LOCAL(1) has stratum 5 on both database servers, not stratum 10.
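
If you want to check this automatically after each configuration change, a small parser for the "ntpdc -p" listing is enough. The sketch below assumes the eight-column layout shown in your post and that the first character of the "remote" column is the status flag; the function and key names are my own.

```python
def parse_ntpdc_peers(text):
    """Parse 'ntpdc -p' output into a list of dicts (assumes eight columns per peer)."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] == "remote":
            continue  # skip the header, the separator and blank lines
        remote = fields[0]
        flag = remote[0] if remote[0] in "*+=-^~." else " "
        peers.append({
            "flag": flag,
            "name": remote[1:] if flag != " " else remote,
            "stratum": int(fields[2]),
            "reach": fields[4],
            "offset": float(fields[6]),
        })
    return peers


listing = """remote local st poll reach delay offset disp
=======================================================================
+mhs01.kc.m 10.244.2.12 7 64 376 0.00014 0.000080 0.04518
*LOCAL(1) 127.0.0.1 5 64 377 0.00000 0.000000 0.03044
=g1-dc12.kc 10.244.2.12 2 64 377 0.00073 938.72240 0.13562
=c1-dc10.kc 10.244.2.12 1 64 377 0.00073 938.71480 0.13440
"""

for peer in parse_ntpdc_peers(listing):
    if peer["name"].startswith("LOCAL"):
        print(peer["name"], "stratum", peer["stratum"], "flag", repr(peer["flag"]))
```

On the listing from mhs02 this reports LOCAL(1) at stratum 5 with the "*" flag, i.e. the local clock is the selected source.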

 

Actually, the "... stratum 10" part looks like the syntax of the newer ntpd daemon, which is still not the default version on HP-UX. When using the HP-UX default xntpd, the stratum adjustment should be done with the fudge keyword on a separate configuration line:

# NOTE: enable only when real NTP servers are not reachable, and only on one host at a time.
#server 127.127.1.1
#fudge 127.127.1.1 stratum 10

 

MK