Achieving high availability

Brian Bientz
Advisor

Achieving high availability

We are looking to bid on a contract that requires 99.999% availability. How does one achieve this level of availability? I understand the use of clustering and Oracle RAC to handle unplanned downtime. But how do you handle the software upgrades? Can software upgrades really be performed without incurring downtime? Can this level of availability only be obtained if you never touch the application?

Any experience, advice or warnings would be greatly appreciated.
8 REPLIES
Ken Hubnik_2
Honored Contributor

Re: Achieving high availability

Upgrades are done by failing a system over to its backup, upgrading the primary, then failing back so the backup can be upgraded. Serviceguard is the HP package that handles the switching.
Christopher McCray_1
Honored Contributor

Re: Achieving high availability

Hello,

The method is referred to as "Rolling Upgrades". For example, if you have a 2-node cluster, a database package and you are upgrading the version of Oracle, you would upgrade the Oracle application on the one server, then move the package over to the other node. Then you would upgrade Oracle on the other node and then move the package back, if desired. However, the 99.999% availability mentioned is not due to merely installing MCSG and creating a cluster. There are many hardware redundancy steps that are taken and also the type of hardware is taken into account. Mirroring the OS and data is also a factor, LAN redundancy, etc. 99.999% == $$$
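To make the ordering of a rolling upgrade concrete, here is a minimal sketch in Python. It only models the sequence of steps described above; it is not Serviceguard or Oracle tooling, and the function name, node names and version labels are all made up for illustration. It assumes at least two nodes and that the idle node is upgraded before the package is moved onto it.

```python
def rolling_upgrade(nodes, active, new_version, fail_back=True):
    """Return (steps, active_node) for a rolling upgrade.

    nodes: dict mapping node name -> installed version (updated in place)
    active: name of the node currently running the package
    Hypothetical model only; assumes the cluster has at least two nodes.
    """
    steps = []
    original = active

    # Upgrade every idle node first, while the package keeps running.
    for name in nodes:
        if name != active:
            nodes[name] = new_version
            steps.append(f"upgrade {name}")

    # Move the package to an upgraded node so the original node is idle.
    standby = next(name for name in nodes if name != active)
    steps.append(f"move package {active} -> {standby}")
    active = standby

    # Upgrade the node that was running the package.
    nodes[original] = new_version
    steps.append(f"upgrade {original}")

    # Optionally move the package back to where it started.
    if fail_back:
        steps.append(f"move package {active} -> {original}")
        active = original

    return steps, active


cluster = {"node1": "old", "node2": "old"}
plan, running_on = rolling_upgrade(cluster, "node1", "new")
for step in plan:
    print(step)
```

Note that each "move package" step is an outage for the application, so the number of moves matters as much as the upgrade itself.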

Hope this helps

Chris
It wasn't me!!!!
Byron Myers
Trusted Contributor

Re: Achieving high availability

Brian, a key question is whether the 99.999% requirement covers unplanned downtime only, or planned AND unplanned downtime. At 99.999% availability you can incur about 5.26 minutes of downtime per year. BTW, 99.995% availability equates to about 26.3 minutes of downtime per year - a whopper of a difference.
Service Guard is great for 99.995% but is pushing its limits for ORACLE with 99.999%. It takes my Service Guarded Oracle systems from 5 to 10 minutes to reform after a primary node failure. Obviously your big up-front cost will be a robust HW infrastructure - cluster, redundant everything, LAN, storage subsystem, etc. If you plan on doing online maintenance of OS and HW, check out Stratus products (stratus.com). These are fully redundant servers with redundant everything, running CPUs in lockstep - they allow online maintenance of HW and OS and run native HP-UX. Whatever your solution, maintaining Oracle WILL require down time - much more than 5 minutes for activities like version upgrades. I have one Service Guard cluster with two N4000's that has been running for three years without a failover. But we schedule monthly maintenance to perform HW, OS, database, or any other type of maintenance that can be scheduled. Another selling point of Service Guard clustering - during scheduled maintenance, you can move packages (ORACLE) to the standby node (5-10 minutes down time) and perform maintenance on the primary node. When done, you can move Oracle back to its original node (another 5-10 minutes down time). Hope I have helped.
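For anyone who wants to check the downtime figures, the arithmetic is just the unavailable fraction multiplied by the minutes in a year. A quick sketch in Python, assuming a 365-day year (the function and constant names are only illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Yearly downtime, in minutes, allowed by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


# Print the yearly budget for a few common availability targets.
for pct in (99.999, 99.995, 99.99, 99.95):
    print(f"{pct}% -> {downtime_budget_minutes(pct):.2f} minutes/year")
```

Each extra "nine" cuts the budget by a factor of ten, which is why the gap between 99.995% and 99.999% matters so much.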
If you can focus your eyes far and straight enough ahead of yourself, you can see the back of your head.
A. Clay Stephenson
Acclaimed Contributor

Re: Achieving high availability

You understand that what you are trying to achieve is 5.26 minutes of total downtime per year; I doubt that is possible, but 5.26 minutes of unplanned downtime probably is. I am currently well over three years with zero unplanned downtime in my production environment. The problem with achieving your levels of uptime is that even if everything works perfectly, the time required for package switches to occur will probably exceed your limits. You almost certainly will have to do some package switching for maintenance.

Very few applications require no 'touching'; bugs typically turn up under unusual circumstances. You will almost certainly have to apply critical OS and database patches, so some planned downtime will be required.

If you can get the problem changed to no more than 5 minutes of unplanned downtime per year then that becomes a much more achievable target.
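To see why package switches alone can break the target, here is a rough sketch in Python that turns a number of switches per year into an achieved availability. It assumes a 365-day year and that every switch counts fully as downtime; the function names are only illustrative, and the 5 minute figure is the low end of the switch times reported earlier in this thread.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # leap years ignored


def availability_pct(downtime_minutes: float) -> float:
    """Availability percentage achieved for a given yearly downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


def fits_budget(switches_per_year: int, minutes_per_switch: float,
                target_pct: float) -> bool:
    """True if the switches alone still leave availability at or above target."""
    total_downtime = switches_per_year * minutes_per_switch
    return availability_pct(total_downtime) >= target_pct


# One maintenance window with a move out and a move back, 5 minutes each.
print(fits_budget(2, 5, 99.999))  # False: 10 minutes exceeds the yearly budget
print(fits_budget(2, 5, 99.995))  # True: 10 minutes is within about 26 minutes
```

Under these assumptions a single move-out and move-back already uses up more than a year's budget at five nines, before any unplanned failure is counted.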
If it ain't broke, I can fix that.
Ashwani Kashyap
Honored Contributor

Re: Achieving high availability

HP is the only company, I think, that offers this kind of availability. They call it the concept of 5 nines, or a fault tolerant system with a total downtime of 5 min per year, planned or unplanned. But it costs millions of dollars and it is only implemented by HP consultants. Basically it's a total revamp of your infrastructure and software with HP proprietary stuff. The solution is so costly, though, that only a few financial companies that need that kind of availability could afford it.

The next level of availability is 99.99%, which is a downtime of about 53 min per year.

The next level of availability is 99.95%, which is a downtime of about 4.4 hours per year, which you can achieve through normal Service Guard implementations.
David Storrie
Occasional Contributor

Re: Achieving high availability

Brian, as the others have stated, the switchover times associated with MCSG might be prohibitive for your 99.999%. If your application allows it, I would suggest using a load balancing server (e.g. Alteon) and whatever native replication Oracle offers. This should allow switchover with only the loss of in-progress transactions in the event of failure.
Tim Middleton
Occasional Contributor

Re: Achieving high availability

Brian, Oracle replication can be a complete pig to set up and configure. Try looking at other products such as Quest Shareplex. This software has similar functionality to Data Guard in Oracle 9i release 2.
John Dixon_1
Advisor

Re: Achieving high availability

Hi Brian,
You may want to check out www.stratus.com
They do some Continuum servers which are designed specifically for high availability. They have systems that run HP-UX, so no need to get new OS skills.

cheers,
John