05-25-2004 05:58 AM
Measuring system uptime
After spending some time trying to understand the question, the waters got deeper.
Some say:
1. If the system is running, it's up.
2. If the system and network are OK, it's up.
3. If the system, network, and application are OK, it's up.
What did I miss?
I like door #3, btw.
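The three "doors" above can be sketched as layered checks, where each stricter definition requires all the lower layers to pass. A minimal sketch; the boolean inputs are hypothetical stand-ins for real probes (a ping, a TCP connect, an application transaction):

```python
# Layered uptime definitions: each "door" adds a stricter requirement.
# The probe results are hypothetical stand-ins for real health checks.

def is_up(system_ok: bool, network_ok: bool, app_ok: bool, definition: int) -> bool:
    """Return True if the system counts as 'up' under the given definition.

    definition 1: system running
    definition 2: system + network OK
    definition 3: system + network + application OK
    """
    layers = [system_ok, network_ok, app_ok]
    return all(layers[:definition])

# Door #3 is the strictest: a dead app means "down" even if the box runs.
```

Under door #1 a hung application still counts as "up"; under door #3 it does not, which is why the stricter definitions are the contentious ones.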
05-25-2004 06:30 AM
Re: Measuring system uptime
What if you go through door number 3, but find out that everything is awfully slow?
05-25-2004 06:45 AM
Re: Measuring system uptime
That's yet another whole ballgame and the waters get deeper still.
05-25-2004 07:26 AM
Re: Measuring system uptime
- Outage.
- Performance degradation.
- Incomplete functionality.
Sometimes a really slow system gets classified as an "outage", but it's really an application/network problem, not an OS/hardware problem.
So, if a network switch dies and your system is still up, is this an outage of the VMS system? I say no. The end user says yes. If I have to reboot the VMS system before the stupid Cisco switch will talk to it again, then it's a VMS problem.
Or...
If there is a power outage on the East coast, and the hosted system in Missouri is still up and running, is this an outage? Not from where I am sitting, but definitely an outage to the end user! :-)
05-25-2004 09:39 AM
Re: Measuring system uptime
We tend to take a more system-centric view. If the system and the application on it are up, we're up. It doesn't matter whether users can get to the application or not.
Our users don't always like this idea, but it is tied to a contract with our vendor that contains performance and availability guarantees.
I guess this is one of those "your mileage may vary" questions. :-)
Dave Harrold
05-25-2004 06:41 PM
Re: Measuring system uptime
You have brought up a really cool topic that can be debated on and on. It really varies from manager to manager and from company to company how they define uptime for a system.
I have worked for various companies in my career, and the definition of system uptime differed from one to another:
1. System uptime (Definition 1)
Some companies define this as the time for which your system has been up.
2. System uptime (Definition 2)
Some companies define this as the time for which the systems were up and so were the applications running on them.
3. System uptime (Definition 3)
Some companies define this as the time for which the systems were up and so were the applications, as per the SLA.
We can debate this for long. As a systems manager, I always used to find issues with item #2 and part of item #3 (not comfortable with the application portion).
The majority of systems folks understand that sometimes the applications will have their own issues, and it would be unfair to say the system was unavailable during that time, even though your servers were up and running.
Regards
Mobeen
05-25-2004 09:32 PM
Re: Measuring system uptime
In our uptime talk at ENSA@WORK we went somewhat deeper into this topic.
Our story started with the hardware configuration:
-- two-site, 4-node cluster with CLUSTER uptime >7 years.
-- 11 major applications.
-- users spread over 58 locations, about 10 of them redundantly connected (and the number of redundancies is growing).
-- MOST users on WBTs, connected to a Citrix farm, from which, among other things, sessions to VMS apps are run.
-- several (typically 'heavier') users on PCs, some with connections to VMS apps.
-- a VERY VITAL app, using VTs on terminal servers.
-- a VERY VITAL app, connected via a terminal server to a radio transmitter, for Mobile Data Terminal (MDT) connections.
-- several of the apps have (sometimes MAJOR) connectivity with external systems via a firewall and a (private, countrywide) network.
Then we were challenged with: "If the user cannot run his/her app, then the system is down, so how can you claim 7 years up?"
So we spent some time breaking this down:
- If a WBT is down, ONE user is down, but he/she can take the WBT at the next desk and work on.
Does this mean SYSTEM IS DOWN? Not many takers here.
- If the network segment to a location is down, NO services are offered in that location.
Is the system down? YES to those in that location; NO to everybody else.
- Citrix is down or malfunctioning.
Is the system down? YES to WBT workers, NO to all others, and NO to those WBT workers who took the trouble to learn about short-circuiting from the WBT straight to VMS or UNIX (but at user level: non-trivial).
- If the network connectivity to the cluster fails?
Well, we DID lose connectivity to ONE site, but until now VMS has remained reachable along at least one route. It would, however, mean that system availability for MANY users would cease.
- If a VT breaks down? A nuisance to have to move to another desk.
- A terminal server breaks down? Half of the VT-based workplaces will fail, but application functionality can continue. The desks in one room are spread over (at least) two servers.
- The radio system, or the terminal server it is connected to, fails?
NO MDT communication until manually failed over to the cold standby; somebody will have to go onsite.
System down? YES for that app, NO for all other apps.
- Then, an application can have its own issues. Some applications (the DBMS & the RMS ones) support rolling upgrades, and in principle they show uptimes comparable with VMS itself. Some use Unix-ported DBMSes.
The DB engine inherently runs single-instance. Any upgrade or node failover implies application downtime.
If one application is down, is the system down? Not to those using other applications.
- If the (connectivity with one of the) remote systems is down, the app runs in a reduced-functionality mode. Is this to be considered UP or DOWN? Depends on what functionality our target user needs at that moment...
- And we had a real special case in January 2002, with the Euro introduction.
One app has a rather large financial aspect.
And in the Euro version a prohibitive bug was detected mid-December.
So the Euro version wasn't available till the end of January.
First but... the Guilders version was. Only, it could not be used for any new transaction.
Second but... it HAD to be available for statistics and accounting. So, at the request of application management, the app WAS made available, but ONLY to application management, the Statistics department, and Auditing (totalling about a dozen people). The app was blocked for the normal users (about 3000). Now, do you consider the app UP or DOWN?
To us, the app was running according to specs set by app management, but not many users agreed...
We have come to the conclusion that WE will consider "the system" to be "up" when at least "some" application is available to "some" application users, because that seems to be the only thing measurable.
The only other option would be to measure each individual application (or even each application function) for each individual user. And even in that approach one purist remarked: "How do you define an app that WOULD be NOT available IF the user tried, at a time he does not try?"
Then again, if your system is running only ONE app, which either IS or IS NOT running, and every user accesses it via one and the same path, maybe THEN you can define application availability as system uptime.
Not every system has that monocultural approach.
So in the end, back at square one: "it depends".
It really starts to look like some philosophic essay, eh? Hoping this holds some useful points for some.
Jan
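Jan's measurable criterion, "the system is up when at least some application is available to some application user", can be sketched as a tiny check. The data model here (a hypothetical mapping from application name to the set of users who can currently reach it) is an illustrative assumption, not how the ENSA@WORK setup actually measured it:

```python
# Jan's rule: "the system" counts as up when at least one application
# is reachable by at least one of its users.
# `reachable` maps app name -> set of users who can currently reach it
# (a hypothetical model for illustration).

def system_up(reachable: dict) -> bool:
    """True if some application is reachable by some user."""
    return any(users for users in reachable.values())

# Example: Citrix is down (no WBT users reach the finance app),
# but the MDT app still serves one dispatcher -> system is "up".
state = {"finance_app": set(), "mdt_app": {"dispatcher1"}}
```

The purist's objection survives the sketch: the mapping only records users who *tried*, so an app that would fail if attempted still shows as reachable.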
05-26-2004 04:56 AM
Re: Measuring system uptime
Thanks for posting. Your post reminded me of the application/system component dependencies we were trying to define at one of my workplaces to address BCP and DR.
rgds
Mobeen
05-26-2004 02:53 PM
Re: Measuring system uptime
Well, the simple answer is: uptime is whatever you have sold your "customer" as such.
Let me give an example. I work for an exchange. For us the system is up if banks etc. can trade. In our case this also includes network connectivity, in most cases. If that is interrupted, there is a (pro-rated) effect on system availability. This is "fair" since we are selling a full service, including network management up to the banks' sites. OTOH, if a bank decides to use the Internet to connect to the exchange, any outage beyond our routers is not counted, since we have no influence on the Internet at large ;-)
All this is obviously part of SLAs.
I do recommend having SLAs, by the way. While they might appear a pain from a technical point of view at first, they are really important tools for expectation management and justification of effort.
Greetings, Martin
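Martin's pro-rated, scope-aware accounting can be sketched roughly as follows. The field names and figures are illustrative assumptions, not the exchange's actual SLA formula:

```python
# Pro-rated availability in Martin's model: outages inside the sold
# scope (up to the bank's site) count against availability; outages
# beyond the exchange's routers (the Internet at large) are excluded.
# Field names are illustrative assumptions.

def availability(period_minutes: int, outages: list) -> float:
    """Fraction of the period available, counting only in-scope outages."""
    counted = sum(o["minutes"] for o in outages if o["in_scope"])
    return 1.0 - counted / period_minutes

week = 7 * 24 * 60  # one week in minutes
outs = [
    {"minutes": 30, "in_scope": True},    # managed-network failure: counts
    {"minutes": 120, "in_scope": False},  # Internet backbone issue: excluded
]
# availability(week, outs) charges only the 30 in-scope minutes
```

The design point is that the `in_scope` flag encodes exactly what was sold: change the contract, and the same outage log yields a different availability number.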
05-31-2004 09:40 PM
Re: Measuring system uptime
An SLA is only fun for managers. That is not really uptime. To the system manager, uptime means the system is up and the apps are running. If the network isn't functioning, that is downtime for the user but (internally) uptime for the system managers. If the SLA allows you 12 hours of downtime a day (e.g. a stock exchange front end that is only up from 9 to 9), that is in my opinion 50% uptime, not 100%, even though the SLA counts it as such. For the manager, YES, this is 100% SLA!
AvR
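AvR's 50%-versus-100% arithmetic can be put in a tiny, purely illustrative sketch:

```python
# AvR's point in numbers: a 12-hour/day service window met in full is
# 100% "SLA uptime" but only 50% wall-clock uptime.

def sla_uptime(up_hours: float, window_hours: float) -> float:
    """Uptime measured against the contracted service window."""
    return up_hours / window_hours

def wall_clock_uptime(up_hours: float, total_hours: float = 24.0) -> float:
    """Uptime measured against the full day."""
    return up_hours / total_hours

# 12 hours up, 12-hour window: the SLA says 100%; the clock says 50%.
```

Both functions divide the same numerator; only the denominator, i.e. the definition, changes, which is the whole thread in one line.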