Re: Uptime

Daniel Fernandez Illan · ‎04-13-2005

Jan
Congrats from Spain, too :-)
I think that next year you will post the same thread, but with the number of years increased!!
Regards
Daniel.

Anton van Ruitenbeek · ‎04-13-2005

Martin,

We tried to go by HTTP as well, but this is also shielded off. HTTP traffic is only allowed from the Citrix servers, and these are listed in the Cisco Catalyst and the firewalls.
What I can't get around is: 'We may not spend any time on it or change the configuration for it, because it isn't our core business'. So we have to do it in stolen moments ...
Yes, the firewalls and network are not under our 'command'. So this must be done with some tactful politeness ;-0

AvR

NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!

Willem Grooters · ‎04-14-2005

Jan/Anton/Frank,

It is also on OpenVMS.org; that seemed appropriate to me. I'll put it on my (standalone, 1-node (for now) cluster ;)) as well.

Have you thought of advertising it on the Business Continuity Forum too?

Willem (0 points, got mine already)

Willem Grooters
OpenVMS Developer & System Manager

Jan van den Ende · ‎04-14-2005

Willem,

you are not the first to point me to OpenVMS.ORG !!!
You _ARE_ on Sue's mailing list, me thinks?

Which Business Continuity Forum? I only know the Business Recovery Forum. It seems you did not check that.

PS. you must do real strange things (or find me in a real bad mood) to get zero points from me. At least one for the effort.

Proost.

Have one on me.

Jan

Don't rust yours pelled jacker to fine doll missed aches.

Garry Fruth · ‎04-14-2005

Congratulations. It will take a long time for my current client to catch up. We are at 14 months on two of the VMS clusters.

Evert Jan van Ramselaar · ‎04-14-2005

Congrats Frank, Anton & Jan!

I much enjoyed the coffee with "gebak"... :D

Great achievement on the cluster uptime, but how about application availability? ;)
(running and hiding)

Cheers!

Your Unix peer,
EJ

Contrary to popular belief, Unix is userfriendly. It just happens to be selective about who it makes friends with.

labadie_1 · ‎04-14-2005

Congratulations !

I have gone several times to more than 1000 days on standalone nodes.

Uwe Zessin · ‎04-14-2005

That isn't bad, either. Best I could do when I was system manager was about 2x 497.5 days in a row. There was a bug in VMS for the MicroVAX 3400 systems that caused a hard system hang after a 32-bit 10ms counter rolled over (not even a reset worked) :-(
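
As a rough check of that figure (a back-of-the-envelope sketch, not taken from the VMS sources): an unsigned 32-bit counter incremented every 10 ms wraps after 2^32 ticks, which comes to roughly 497 days. The constant and function names below are purely illustrative.

```python
# Back-of-the-envelope check: when does a 32-bit counter of 10 ms ticks wrap?
TICK_SECONDS = 0.010           # one tick every 10 ms (figure from the post)
COUNTER_BITS = 32              # counter width (figure from the post)
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_rollover(bits: int = COUNTER_BITS, tick: float = TICK_SECONDS) -> float:
    """Return how many days pass before an unsigned counter of `bits` bits wraps."""
    return (2 ** bits) * tick / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{days_until_rollover():.1f} days")   # prints "497.1 days"
```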

.

Jan van den Ende · ‎04-15-2005

Gerard,

That is also quite impressive! And it also definitely means you did not upgrade >> NOR PATCH << your system!
Tell that to an M$ Administrator, and you are forever branded a liar, and a bad one, because "everybody" can "know" your story can absolutely not be true! :-)

Evert Jan,
the discussion came up here before.
I have entered some aspects of it in

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=800734

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=793109

and especially in:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=602529

I rather enjoyed reading them back once again. :-)

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Antoniov. · ‎04-15-2005

I heard some American hospitals use VMS. Just out of curiosity, I'd like to know their uptime.
The Dutch police are using a hard 24x7 VMS system.

Antonio Vigliotti

Antonio Maria Vigliotti

Kris Clippeleyr · ‎04-17-2005

Jan,

Better late than never.
Congratulations on this great achievement.

Kris (aka Qkcl)

I'm gonna hit the highway like a battering ram on a silver-black phantom bike...

Craig McGill_2 · ‎04-24-2005

This is an achievement, but not what you might think. I don't want to rain on your parade, but the time displayed in the original post is the cluster FOUNDING time (cluster_ftime). This is the time that the cluster was founded. As I said, this IS an achievement, but it is not the achievement you might think. It is the time that the first member of the cluster booted. It does NOT represent how long any member of the cluster has been up. Every single member of the cluster could have booted once, a dozen times, a million times, since then, as long as at least ONE member of the cluster is up at any one time. The cluster_ftime is only reset when ALL members are down and the first one boots again.

I don't mean to be cruel. I say a third time that this is an achievement. But any member of the cluster may have booted any number of times since then. It unfortunately doesn't indicate how long a VMS system has been up. They all may have been up for just a few minutes when the command was entered!
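
The distinction can be sketched with a toy model. This is an illustration in Python, not how VMS actually maintains cluster_ftime; the class and method names are hypothetical. The formation time is set when the first member boots into an empty cluster and is cleared only when the last member leaves, while each node's own boot time resets on every reboot.

```python
from typing import Dict, Optional


class ToyCluster:
    """Toy model: the formation time survives as long as one member stays up."""

    def __init__(self) -> None:
        self.formed_at: Optional[int] = None   # plays the role of cluster_ftime
        self.booted_at: Dict[str, int] = {}    # per-node boot time

    def boot(self, node: str, now: int) -> None:
        if not self.booted_at:                 # first member founds the cluster
            self.formed_at = now
        self.booted_at[node] = now

    def shutdown(self, node: str) -> None:
        self.booted_at.pop(node, None)
        if not self.booted_at:                 # all members down: cluster is gone
            self.formed_at = None

    def cluster_uptime(self, now: int) -> Optional[int]:
        return None if self.formed_at is None else now - self.formed_at

    def node_uptime(self, node: str, now: int) -> Optional[int]:
        start = self.booted_at.get(node)
        return None if start is None else now - start


if __name__ == "__main__":
    c = ToyCluster()
    c.boot("A", now=0)
    c.boot("B", now=1)
    c.shutdown("A")            # reboot A while B carries on
    c.boot("A", now=100)
    c.shutdown("B")            # then reboot B while A carries on
    c.boot("B", now=101)
    # Cluster uptime keeps counting from 0; both node uptimes are short.
    print(c.cluster_uptime(now=110), c.node_uptime("A", now=110), c.node_uptime("B", now=110))
```

With the time units above this prints `110 10 9`: a long cluster uptime alongside two freshly booted nodes.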

I'm surprised no-one else noticed this.

Craig McGill.
Canberra Australia.

Wim Van den Wyngaert · ‎04-24-2005

Craig,

There are 2 usages for clusters.

The first is that x nodes form a cluster and each runs its own application. The advantage is sharing the disks. On such a cluster, your comments would be right : the cluster itself doesn't mean much.

On most clusters however, a clustered service is offered. This service can be offered as long as 1 of the nodes is running. The service can be an Oracle Parallel Server, a Sybase server with its companion, a stock exchange application with almost-real-time failover, etc. On such systems your comment is wrong. One can even upgrade certain nodes while others continue running the application.

I guess Jan is in the 2nd category.

Wim

Wim

David B Sneddon · ‎04-24-2005

Craig,

IIRC none of the "original" members exist. The cluster
members have been upgraded (rebooted) and the entire
cluster is now in a different location... all of this
with the "cluster" still being available.
Is that not what clustering is all about?

Dave

Uwe Zessin · ‎04-24-2005

Craig,

according to your member profile, this is your first day @ ITRC forums. Welcome!

If you had joined us earlier you would have a better background about the history of this cluster as it has been discussed a few times. Of course we can tell the difference between a node and a cluster uptime. Jan clearly wrote:
"=> cluster uptime:"

And yes, it is raining here :-(

.

Jan van den Ende · ‎04-25-2005

Craig,

"But any member of the cluster may have booted any number of times since then. It unfortunately doesn't indicate how long a VMS system has been up."

Yes, very true. Even more than you mention.
For one thing, the cluster has been kept more or less up to date on hardware and VMS (a new version only after it has been available for at least half a year; patches only if there is one we think warrants the reboot cycle, and it has been available for more than 2 weeks). It started at 6.2, and is now at 7.3-2.
It started on AS2000 & VAX 4000, and is now ES40, ES45, & DS15. Half of it was relocated 7 km.
HSZ50 connected drives were replaced by SAN.

But, uptime in itself is not the real issue:
CONTINUOUS SERVICE OF APPLICATIONS IS.

From your profile I have no info on your background (and apologies beforehand if I am wrong in this!), but your statements seem to reflect a mindset like *UX or M$ type of cooperating systems undeservedly termed "cluster". (you know, the failover type).
In a homogeneous VMS cluster every application accesses every disk _SHARED_. No need for dedicated Raw Devices. No need for a single Database Server that has _SINGLE PROCESS_ access to the database, to funnel _ALL_ db IO for _ALL_ processes.

Yes, this implies you use VMS-aware database systems. RDB, DBMS, RMS. (Oracle used to do it right, until they decided to become *UX software, sometime during ORA V7. Ingres did it ok, I do not know if they still do.)

But, for applications built on RMS, RDB and DBMS, it is entirely possible to have applications having the same uptime as the VMS _CLUSTER_, ie, uninterrupted service over rolling bootstraps.

For various reasons, several nodes have been part of this cluster for shorter or longer periods, and the oldest current member was added 9 nov 2001.
The current longest _NODE_ uptime is 270 days, but the current longest _APPLIC_ uptime is 2934 days. It became available about 10 minutes after the cluster_ftime.
It is a DBMS application. It has undergone 2 major applic upgrades, and 3 DBMS upgrades.
_THAT_ has also been done using Rolling Upgrades! Yes, it takes some planning.

So, I claim that we can fully agree with Wim: in a _HOMOGENEOUS_ VMS cluster, _NODE_ uptime is unimportant, as long as the _APPLICATIONS_ are well-behaved VMS applications. Application uptime is what counts. (and with over a dozen applics, not all of the same behavior, that is a complete discussion on its own).

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Wim Van den Wyngaert · ‎04-25-2005

Jan,

Just a few questions.
1) How did you do rolling converts of the RMS files (i.e. without stopping access to them)?
2) Did none of the RDB upgrades need system table changes for which you have to stop the db clusterwide?

Wim
(thanks for reminding me to have one on you : I need to buy a new stock of Duvel)

Wim

Jan van den Ende · ‎04-25-2005

Wim,

1) by knowing the applic, AKA cheating :-).
The nature of the RMS applications is such, that updates are done:
a) interactively during 2-shift office hours
b) nightly batch-mode info exchange with external parties.
The data must remain available for queries 24*365
After (verified) end-of-office, we use CONVERT/SHARE (see various postings by Hein) to a temporary file name. When done, rename to the original name. Any new user after that uses the refreshed file. And since that is done during a no-write period, old and refreshed file contain the same info.
Cost: some worktime during the 'read-only' window.

2) The DBMS application is somewhat similar.
The mutations are done by a single department (on 24*365 service), and by regular batchjobs.
Querying is heavy, both by the entire corps and by all other Dutch police corps.
It is this department that owns the applic, and can request updates. THEY then schedule a non-update period.
The database is set read-only, and from one node access to it is removed.
Working on that node, we create a new Concealed Device for the new version of the applic, and/or the DBMS, and/or the database.
If the update is successful (according to the owning department) any new sessions are set to the node that has the new version. Since no interactive session is allowed to last longer than 10 hours, thereafter the other nodes are also set to the new version, and re-enabled.
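
The convert-then-rename step in 1) follows a general pattern: build the refreshed file under a temporary name during the no-write window, then rename it over the original in one step. A rough Python analogue is sketched below. It is not DCL and does not reproduce CONVERT/SHARE or RMS behaviour; it assumes a POSIX-like filesystem where a rename within one directory is atomic, and the names `refresh_file` and `rebuild` are hypothetical.

```python
import os
import tempfile
from typing import Callable


def refresh_file(path: str, rebuild: Callable[[bytes], bytes]) -> None:
    """Write a rebuilt copy under a temporary name, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        data = src.read()
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(rebuild(data))
            tmp.flush()
            os.fsync(tmp.fileno())
        # Anyone opening `path` after this point gets the refreshed copy.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)    # clean up the half-built copy, keep the original
        raise
```

As in the procedure described above, this is only safe while nobody writes to the original: updates made between the read and the rename would be lost.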

So, strictly speaking, the apps are NOT continuously full-function. But, since they ARE continuously providing the service requested by their functional application managers, we (and more important, Management) consider them Continuously Available.

Proost.

Have one on me. (yes, Wim, my Duvel is already waiting in the 'fridge!)

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Wim Van den Wyngaert · ‎04-25-2005

Jan,

Lucky you to have a read-only application.

We have applications that auto-reconnect. So, you can stop the database and restart it without doing anything on the application site (60 stations and 2 servers). But of course during that time, the application isn't able to function.

This can only be used for small changes/resets.

And of course, sometimes it goes wrong, mainly reporting things in batch.

Wim

Wim

Craig McGill_2 · ‎04-30-2005

Apologies if I got anyone's nose out of joint. And apologies about my lack of profile info. I'll attempt to correct that now: I'm a VMS specialist and have been since about 1987 (first used VMS at University in 1985). I'm a sysprog (I especially like kernel mode programming and Macro-32, although my 3GL of choice is VMS Pascal and I'll fight anyone to the death who argues that C is better). I'm also a sysadmin, and security specialist. I've written device drivers and privileged shareable images. I've been a past chair of the OpenVMS Special Interest Group (SIG) for DECUS Australia.

Where I came from, lengthy NODE uptimes were an achievement - being sysprogs and sysadmins we don't care about the apps running on our systems ;-) We would always get quite pleased when we found a VAX sitting in the corner somewhere that people had (mostly) forgotten about, had entered SHOW SYSTEM and found a huge number of up days.

That's the background I was coming from. But I do understand your emphasis on APPLICATION uptime. I understand how that is important to customers (but not me!). So sorry about the misunderstanding.

But since this is a technical forum for VMS gurus and VMS interested persons, I must admit that I'm surprised there isn't much more interest in VMS uptime than application uptime. What's an application anyway? Aren't they those pesky things that run on our systems? Our boxes would run much better without those application things on them!

Can someone please post some stats about long VMS uptimes? MUCH more interesting than stupid application uptimes.

Jan van den Ende · ‎04-30-2005

Craig,
sounds like you would agree with my definition of "userfriendly":

The users MUST be friendly to the system manager!

But, in the world I have to live in, my main sponsor is NOT paying for system availability "an sich", but for the way we are able to allow the users to use the service we are supposed to supply. That translates into applic availability at our site.

And in the end, it is that sponsor's monthly tribute that pays the mortgage...

Proost.

Have one on me. (my brother-in-law just served a load of Grolsch on occasion of his daughter's birthday, join me virtually!).

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Uwe Zessin · ‎04-30-2005

It's hard to achieve a great uptime if your boss asks you to reboot, because some system processes (like ERRFMT and JOB_CONTROL) have accumulated a noticeable CPU time :-(

I was able to resist, but then a VMS bug hit me/ my systems and my boss got what he wanted. Scroll back to my response from Apr 15, 2005 10:40:56 GMT for details.

How do you achieve uptime when doing kernel mode programming? You must be a genius ;-)

.

Jan van den Ende · ‎04-30-2005

Uwe,

that is one advantage if only _YOU_ are the savvy people: Give 'm what they demand, but perform a _ROLLING_ reboot.

Then again, our current middle management _ARE_ ex-VMS people.
They (not too) secretly take some pride in our achievements, but formally, they HAVE to endorse the formal ( *UX) policy.
A little bit painful (and undeserved pain at that) that the UX flavor is Tru64...

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.

Anton van Ruitenbeek · ‎05-02-2005

For rolling-reboot _YOU NEED_ a minimum of two machines !
Not everybody has this.
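
A minimal sketch of why that is so, in Python; the function name is hypothetical and the model ignores quorum and everything else that matters in a real cluster. Nodes are rebooted one at a time, and the check is simply whether at least one other node is up while each one is down.

```python
from typing import List


def rolling_reboot_keeps_service(nodes: List[str]) -> bool:
    """Reboot nodes one at a time; report whether some node was always up."""
    up = set(nodes)
    for node in nodes:
        up.discard(node)       # take this node down for its reboot
        if not up:             # nobody left to carry the service
            return False
        up.add(node)           # node rejoins before the next one goes down
    return bool(nodes)


if __name__ == "__main__":
    print(rolling_reboot_keeps_service(["A"]))        # False: single node, service interrupted
    print(rolling_reboot_keeps_service(["A", "B"]))   # True: one node always carries on
```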

AvR

NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!

Willem Grooters · ‎05-03-2005

If you have just one machine, an uptime like this cluster's is even harder to achieve. I don't think any stand-alone Alpha will ever reach this.
I've heard tell that some stand-alone VAXen have even longer uptimes than those 7 years. I don't know what's true on these claims, but given the technical structure and robustness of the VAX, I'm not too surprised.
There are reports of PDP's that run longer than any VAX (and in far more hazardous environments), but these are definitely exceptions.
Are they?

Willem Grooters
OpenVMS Developer & System Manager
