
A5800 & NTP Configuration

 
L1nklight
Valued Contributor

Re: A5800 & NTP Configuration

I get what you are saying with this, Paul, but if these are really supposed to be datacenter/core grade switches competitive with the rest of the market, my expectation is that they do at minimum what their competitors do. I've never had to go back to my Juniper or Cisco gear and suss out why NTP isn't working as expected...

paulgear
Esteemed Contributor

Re: A5800 & NTP Configuration

All platforms have their quirks, but I agree this isn't something we should have to struggle with.

 

All I can think is that HP/H3C/3Com never intended for these switches to be used as NTP servers.

Regards,
Paul
MDella
Advisor

Re: A5800 & NTP Configuration


@paulgear wrote:

 

I know this is probably not helpful to your situation, but if you have a data centre full of machines, you probably should have two other important things:

  1. A standard installation process which includes the necessary notes about using NTPv3 only.
  2. A configuration management system like Puppet, Chef, or cfengine, which allows you to make bulk changes to configurations like this one.

So just to set the record straight on this... We use a multi-phased approach to build, launch, deploy, etc machines.  When you are dealing with numbers from 1000-20,000 machines, you have a process.  Since deployments are automated, there are LOTS of things that are thought of, dealt with, etc that service the basic functions of these machines.  Where do logs go, how are disks of varying sizes partitioned and deployed, what auditing and automated reporting systems are in place, etc.

 

When building a machine from "raw metal", you have many steps to overcome.  One of the most basic is getting time correct across your overall infrastructure.

 

For us, we start from raw metal, booting the iLO of our machines off of DHCP servers.  This of course uses the SNTP function of iLO as distributed by DHCP.  Some interesting facts however... The iLO system uses GMT as its base system and does not necessarily have the "brains" to deal with DHCP 112 to allow for the changing of timezones.  Whether you knew this or not, GMT is *not* the same as UTC or (more currently) Etc/UTC timezones.  Modern OSs have to deal with the ever-changing politics of time and whatnot, but somehow time has to be universal.

 

Fortunately SNTP is v3 based. So we get the time.  iLO then puts the time in the RTC for you (nothing you can do about it) during a reboot of the machine (a bug that was reported and later fixed; they now put it in the RTC upon physical power cycle, not on reboots).  Why is this distinction important?!?  Because it turns out the default Europe/London time zone is 3600 seconds off from UTC for half the year.

 

When iLO populates the RTC, half the year it puts the wrong time in for Etc/UTC.  When the machine boots, the first thing that happens (time wise) is that ntpd starts to run, however half the year it recognizes a time slew of 3600 seconds.

 

Unfortunately, before ntpd runs AND syncs with its stratum servers, puppet starts... and of course the time checks are 3600 seconds off, so puppet refuses to work with the puppet master.... and the entire process falls apart.

 

So time is a HUGE issue.  Network devices in a data center are considered (on scale) your backbone of operations and relatively immutable.  When you have 200+ access switches, multiple cores, etc, you want a framework that is synced across the board, etc.

 

So the real issue is that at scale, all the intricate parts affect each other in ways not always understood.  In case you hadn't guessed, we have spent a HUGE amount of time (and consequently man hours * $150/hr) trying to understand the problem and how to deal with it.  Easily enough money spent to buy a few core switches.  Our understanding of the problem is much better (and better documented throughout the organization), but it's the frustration of "going backwards" on an area we thought was done that prompted the above posting.

 

Don't get me wrong, we're using HP switches now throughout our backbone system (for reasons beyond the fact that they cost a quarter of Cisco gear), but that doesn't change the $$$ spent in understanding what we lost in transition.

 

My point above was not to necessarily complain, but to help document what we found and our frustrations.  Also to maybe (through a forum) get someone at HP/H3C to notice and think maybe someone out there does actually care about NTP4 as an industry standard, not as something to "add on later".

 

Marcos

 

paulgear
Esteemed Contributor

Re: A5800 & NTP Configuration

Hi Marcos,

This is probably getting a bit off-topic now, but I am curious about your daylight time issues. Am I understanding you correctly? You seem to be implying that the hardware clock and NTP are both affected by daylight time. Most of the systems I've worked with expect the hardware clock to remain in UTC all the time, and handle daylight time in userspace.
Regards,
Paul
MDella
Advisor

Re: A5800 & NTP Configuration

So the server has only one time, whatever is in the RTC, and it counts regardless of interpretation.  The issue isn't the clock itself (nor NTP, which also does it this way).  The problem is in how other programs "interpret" time and what they do with it.

 

For instance, iLO has an SNTP client that reads configuration items out of DHCP.  In our case, we point to our primary two NTP servers via the DHCP system.  Unfortunately, iLO seems to set the timezone (based on using DHCP) to Europe/London (which I suppose used to be GMT, but GMT doesn't shift like London does).  Since iLO "thinks" it's really on London time, it applies its "changes" (i.e., the 1 hour difference) and writes that to the RTC.

 

Now the onboard RTC has been set one hour off (the RTC has no comprehension of timezone, it just has numbers that rotate).  When the Linux OS boots, it reads the time off of the RTC and applies that to ITS timezone (in this case UTC), which means it takes the 1 hour shifted RTC value, believes that it is actually UTC time, and applies it to the OS.
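The chain of events described above can be sketched as follows. This is only an illustration under stated assumptions: a fixed +1 hour offset stands in for London summer time, the date is arbitrary, and the function names are hypothetical rather than anything iLO or Linux actually exposes.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: a fixed +1 h offset stands in for London summer time (BST).
LONDON_SUMMER = timezone(timedelta(hours=1), "BST")


def rtc_written_by_ilo(true_utc: datetime) -> datetime:
    """Return the bare wall-clock digits stored in the RTC (no timezone attached)."""
    return true_utc.astimezone(LONDON_SUMMER).replace(tzinfo=None)


def os_time_from_rtc(rtc_digits: datetime) -> datetime:
    """Interpret the RTC digits as UTC, as the Linux host in this setup does."""
    return rtc_digits.replace(tzinfo=UTC)


def boot_error_seconds(true_utc: datetime) -> float:
    """Seconds by which the OS clock is wrong right after boot."""
    os_time = os_time_from_rtc(rtc_written_by_ilo(true_utc))
    return (os_time - true_utc).total_seconds()


# Arbitrary summer date, chosen only for illustration.
print(boot_error_seconds(datetime(2012, 7, 1, 12, 0, tzinfo=UTC)))  # 3600.0
```

In winter the London offset is zero, so the same chain produces no error, which matches the "half the year" behaviour.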

 

A little bit later, ntpd loads and starts doing its thing to sync time....  On the first run there is a parameter to allow ntpd to skew > 3000 seconds (in this case it's 3600 seconds), so ntpd moves the clock BACK to the correct time for UTC.  If you're smart, you also set SYNC_HWCLOCK=yes in /etc/sysconfig/clock to at least push this back into the RTC.
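A rough sketch of the decision ntpd makes at this point, for anyone following along. It assumes the parameter in question is ntpd's `-g` option and uses ntpd's documented defaults as commonly described (128 ms step threshold, 1000 s panic threshold); treat those numbers as assumptions, since they differ from the figure quoted above. The function itself is hypothetical and heavily simplified.

```python
STEP_THRESHOLD_S = 0.128    # assumed ntpd default: above this, step instead of slew
PANIC_THRESHOLD_S = 1000.0  # assumed ntpd default: above this, refuse and exit


def ntpd_action(offset_s: float, allow_first_big_step: bool, first_sync: bool) -> str:
    """Simplified model of how ntpd reacts to a measured clock offset.

    allow_first_big_step models the -g option, which waives the panic
    threshold for the first adjustment only.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S and not (allow_first_big_step and first_sync):
        return "panic-exit"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


print(ntpd_action(3600.0, allow_first_big_step=True, first_sync=True))   # step
print(ntpd_action(3600.0, allow_first_big_step=False, first_sync=True))  # panic-exit
```

Under this model the 3600 second error is only corrected because the large first step is allowed; without it, the daemon would give up rather than fix the clock.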

 

HOWEVER, if you are running (dl series, gen7) iLO3 version 1.55 or earlier, the iLO will reset your RTC every time the machine is reset.  If you are running 1.57 or later, iLO will ONLY overwrite the RTC upon power cycle.

 

We are still working on getting iLO to recognize the DHCP 122 field to set the timezone to Etc/UTC rather than the default of Europe/London (that, or getting HP to fix their iLO to put in the correct default setting, as London is NOT the same as UTC).

 

Marcos

 

P.S., another plug for getting Comware5 fixed... DHCP on all Linux boxes does read the NTP fields, HOWEVER there is no automatic provision for changing the default from version 4 to version 3 of NTP.
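For reference, the version a client speaks is carried in the first byte of every NTP packet (2 bits leap indicator, 3 bits version, 3 bits mode). A minimal sketch of building a client request for a given version, with hypothetical function names, looks like this:

```python
import struct

NTP_PACKET_LEN = 48  # size of a basic NTP header without extensions
MODE_CLIENT = 3      # mode value for a client request


def build_client_request(version: int = 3) -> bytes:
    """Build a bare NTP client request carrying the given protocol version."""
    if not 1 <= version <= 7:
        raise ValueError("NTP version must be non-zero and fit in 3 bits")
    leap_indicator = 0
    first_byte = (leap_indicator << 6) | (version << 3) | MODE_CLIENT
    # Remaining header fields are left zeroed in this sketch.
    return struct.pack("!B", first_byte) + bytes(NTP_PACKET_LEN - 1)


def packet_version(packet: bytes) -> int:
    """Extract the 3-bit version field from an NTP packet."""
    return (packet[0] >> 3) & 0x07


print(hex(build_client_request(3)[0]))  # 0x1b
print(hex(build_client_request(4)[0]))  # 0x23
```

On the ntpd side, the `version 3` option on a `server` line in ntp.conf is the usual way to request the older version, but that is exactly the setting DHCP-driven configuration does not add for you.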

paulgear
Esteemed Contributor

Re: A5800 & NTP Configuration

Hi Marcos,

That's definitely a bit of a pickle. Are your iLOs in the same DHCP scope as your Linux host frontend LANs? If not, what I would probably do in that case is simply remove NTP from the DHCP parameters for the iLO VLAN. That way you at least avoid the iLO TZ bug, and you can put in the correct manual NTP configuration with puppet.
Regards,
Paul
MDella
Advisor

Re: A5800 & NTP Configuration

Chicken and egg problem. We already solved this; this is more documentation for those who follow in our footsteps in trying to use COMWARE5 devices as their NTP sources.

 

Puppet doesn't fix your problem if it detects your clocks are off. So you can't use puppet to update things if things are wrong to begin with :-)

 

 

L1nklight
Valued Contributor

Re: A5800 & NTP Configuration

Hey since we are all talking about it, I am having an issue with my ntp stuff upon review. It seems as if my offset has climbed insanely high. How would I go about correcting it? Currently it appears that my offset is at 226 seconds. 

 

I think the issue may be that I am misinterpreting what the ntp-server unicast-peer functionality is for. I was under the impression that it was used for configuring the devices that will use the A-Series as an NTP server. The unicast-server I have set up has an offset of -226237.7; all the unicast-peers I have set up have a drift of less than -2.

paulgear
Esteemed Contributor

Re: A5800 & NTP Configuration

Hi L1nklight,

Given how far this thread has gone, I think you would be best to start a new one. :-)

MDella, I'm keen to hear a follow-up if you come across an elegant solution.
Regards,
Paul
jfnz
New Member

Re: A5800 & NTP Configuration


@Justin_Goldberg wrote:

I'd use 3.country.pool.ntp.org, and only update once a week or more (or else you risk your subnet being banned)


That's probably about the most inaccurate statement possible.

 

NTP syncing once a week? A ban on a whole subnet? Unlikely.

 

The NTP pool uses DNS to load-balance between public NTP servers which are run by various organisations.

 

The only way you'd be looked at would be if you were to start throwing several hundred/thousand requests a second at a single NTP server.
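To see the DNS load-balancing for yourself, a small sketch like the one below lists the addresses a pool name resolves to. The function name is hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out for testing; by default it uses the standard library's `socket.getaddrinfo`.

```python
import socket


def pool_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the distinct addresses a pool hostname currently resolves to."""
    # Port 123/UDP is the NTP port; each result ends with a sockaddr tuple
    # whose first element is the address.
    infos = resolver(hostname, 123, type=socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})


# Example call (needs network access, so it is left commented out):
# print(pool_addresses("pool.ntp.org"))
```

Running the call a few minutes apart should generally return a different set of servers, which is why a single well-behaved client spreads its requests across many operators.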