
Regarding PM

nanan
Trusted Contributor

Hi everyone,
I would like to ask about PM (proactive maintenance).
Please explain in as much detail as you can.

These are my questions.
I think PM is necessary; if you don't agree, please let me know your opinion.

1. Why do we have to do PM?

2. What kinds of PM tasks can we do?
e.g. online diagnostics, offline diagnostics, patching...

3. As an example, suppose you are operating a system (a DB) that serves real-time data to customers and must never stop.
The system is built as a two-node RAC cluster,
and you need to apply a patch.
In that case you don't need to stop the business, because you can apply the patch one node at a time, halting only the node being patched.
But if you have to upgrade the DB software, you need to stop all instances on both nodes, don't you?
And the software upgrade takes about 30 minutes to 1 hour, depending of course on what you apply to the DB.
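The one-node-at-a-time patching described in item 3 can be sketched in Python as below. This is only an illustration under stated assumptions: `patch_node` is a hypothetical callback standing in for the real per-node halt/patch/restart procedure, and the node names are made up. The sketch models only the ordering and the rule that some node stays in service.

```python
from typing import Callable, List


def rolling_patch(nodes: List[str], patch_node: Callable[[str], None]) -> List[str]:
    """Patch cluster nodes one at a time so the remaining nodes keep serving.

    `patch_node` is a hypothetical callback that halts, patches and restarts
    a single node; it stands in for the real cluster procedure.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling patch needs at least two nodes")
    log: List[str] = []
    for node in nodes:
        # Every other node stays up while this one is out of service.
        serving = [n for n in nodes if n != node]
        log.append(f"halt {node}; serving from {', '.join(serving)}")
        patch_node(node)
        log.append(f"restart {node}")
    return log


if __name__ == "__main__":
    patched: List[str] = []
    for line in rolling_patch(["node1", "node2"], patched.append):
        print(line)
```

This ordering does not cover the full software upgrade case, which, as the question notes, may still require stopping all instances.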

Anyway, I want to know how to do PM without stopping the service, or how to reduce the downtime.

If you are doing PM, please show me your task list
and how you avoid downtime on a real-time system.

Regards
nanan
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Regarding PM

There are two kinds of downtime, planned and unplanned, and if you don't have planned downtime you are certain to have unplanned downtime. First, there is no need to periodically reboot, clean the machines, or do anything like periodic hardware maintenance, if and only if you have a very clean, cool, humidity-controlled environment with good, clean power. There is a great need to apply patches, although generally just installing the Gold QPK releases is good enough. You should always peruse recent patch release notes for phrases like "critical" and "possible data corruption". The very last thing you want is a problem that corrupts small quantities of data here and there; it might be months before that kind of error is actually identified. The worst possible scheme to adopt is "it ain't broke so I won't patch it", because how do you know it ain't broke?
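As a rough triage aid for reading release notes, a keyword scan like the Python sketch below could flag patches worth a closer look. The patch IDs and note texts are invented for illustration, and the pattern list simply encodes the two phrases mentioned above. Keyword matching is crude ("not critical" also matches), so it supplements reading the notes rather than replacing it.

```python
import re
from typing import Dict, List

# Phrases suggested in the post; extend to suit your own policy.
URGENT_PATTERNS = [r"\bcritical\b", r"data\s+corruption"]


def flag_urgent(release_notes: Dict[str, str]) -> List[str]:
    """Return IDs of patches whose notes contain an urgent phrase (case-insensitive)."""
    flagged = []
    for patch_id, text in release_notes.items():
        if any(re.search(p, text, re.IGNORECASE) for p in URGENT_PATTERNS):
            flagged.append(patch_id)
    return sorted(flagged)


if __name__ == "__main__":
    notes = {  # hypothetical patches
        "PATCH_A": "Fixes a cosmetic message in the boot log.",
        "PATCH_B": "Possible data corruption under heavy I/O load.",
        "PATCH_C": "CRITICAL: kernel panic on boot.",
    }
    print(flag_urgent(notes))  # ['PATCH_B', 'PATCH_C']
```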

Your specific example does require some planned downtime. Even if you install the newest Oracle patches in a separate location and can simply redefine ORACLE_HOME to point to the new location, there is almost certainly a script that must be run to update the data itself.

My experience has been that getting planned downtime is not that difficult if: 1) it is announced well in advance and with the concurrence of the business, so that disruption is minimal; 2) it is scheduled for minimal impact; and 3) you ALWAYS stay within the timeframe of your maintenance window. If you say 3 hours, that does not mean 3 hours and 5 minutes.

Now, the only way to meet condition 3 is to have tested the patches in a Sandbox so that you know exactly how long it will take and you know about any unexpected problems before they occur.
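One way to apply the sandbox timings to condition 3 is a simple check that the slowest measured run, plus a buffer, fits the announced window. The Python sketch below assumes a 25% margin; that figure is an illustrative assumption, not something from the post.

```python
from datetime import timedelta
from typing import List


def fits_window(sandbox_runs: List[timedelta], window: timedelta,
                margin: float = 0.25) -> bool:
    """Return True if the slowest sandbox run, plus a safety margin, fits the window.

    The 25% default margin is an assumed buffer for surprises and back-out,
    not a figure from the original post.
    """
    if not sandbox_runs:
        raise ValueError("time the procedure in the sandbox first")
    worst = max(sandbox_runs)
    return worst * (1 + margin) <= window


if __name__ == "__main__":
    runs = [timedelta(hours=2, minutes=10), timedelta(hours=2, minutes=25)]
    print(fits_window(runs, timedelta(hours=3)))              # False
    print(fits_window(runs, timedelta(hours=3, minutes=30)))  # True
```

With a worst run of 2h25m and the assumed 25% margin, the plan needs about 3h01m, so a 3-hour window is rejected rather than overrun.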

My deployment scheme is 3-tiered: 1) Sandbox, 2) Test environment, 3) Production. By the time I get to the Production environment I know all the "gotchas", and the patches have been deployed long enough in Test for any unexpected behavior to manifest itself. Generally, at least 3 weeks pass between Sandbox and Production. The exceptions are patches whose release notes mention "data corruption"; those are deployed on an accelerated schedule if I determine that the conditions might occur on my boxes.
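The 3-tier promotion rule could be encoded roughly as in the Python sketch below. The 3-week soak comes from the post; the 3-day accelerated soak is a made-up placeholder (the post gives no number), and the `Patch` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

TIERS = ("sandbox", "test", "production")
NORMAL_SOAK = timedelta(weeks=3)      # "at least 3 weeks" Sandbox -> Production
ACCELERATED_SOAK = timedelta(days=3)  # assumed value; the post gives no number


@dataclass
class Patch:
    patch_id: str
    tier: str                            # current tier, one of TIERS
    sandbox_date: date                   # day the patch entered the Sandbox
    urgent_corruption_fix: bool = False  # notes mention data corruption AND it could occur here


def may_enter_production(p: Patch, today: date) -> bool:
    """Production only after Test, and only after the required soak time."""
    if p.tier not in TIERS:
        raise ValueError(f"unknown tier: {p.tier}")
    if p.tier != "test":
        return False
    soak = ACCELERATED_SOAK if p.urgent_corruption_fix else NORMAL_SOAK
    return today - p.sandbox_date >= soak
```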
If it ain't broke, I can fix that.
nanan
Trusted Contributor

Re: Regarding PM

One way to reduce downtime might be to run the DB application on the DR system,
although we would still need some downtime to synchronize data between the storage arrays.

If you know another way,
please let me know.
Wim Van den Wyngaert
Honored Contributor

Re: Regarding PM

I must disagree with Clay. I'm on OpenVMS.

We have very old software running on our nodes and do not patch a system once it is in production. If we modify something, complete qualification must be done and all the applications on the system must give their green light before the patch or upgrade is applied. There are simply too many problems with patches (on all OSes). "If it ain't broke, why break it?"

Also: never install a patch in production until it has been released for a while (e.g. more than 3 months).

To give you an idea:
Firmware problems: http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=958746

Patch problems:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=655880
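Wim's two rules (a minimum patch age and a green light from every application on the system) could be combined into one gate, sketched in Python below. "3 months" is approximated as 90 days, and the application names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict

# "more than 3 months", approximated here as 90 days (an assumption)
MIN_AGE = timedelta(days=90)


def may_apply(release_date: date, signoffs: Dict[str, bool], today: date) -> bool:
    """Allow a production patch only if it is old enough and every application
    owner on the system has given a green light."""
    old_enough = today - release_date > MIN_AGE
    all_green = bool(signoffs) and all(signoffs.values())
    return old_enough and all_green


if __name__ == "__main__":
    owners = {"billing": True, "reporting": True}  # hypothetical applications
    print(may_apply(date(2024, 1, 1), owners, date(2024, 5, 1)))  # True
```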


Fwiw

Wim
nanan
Trusted Contributor

Re: Regarding PM

Hi Wim,

Thanks for your opinion!
My case is HP-UX.
VMS is known as a very stable and mature OS, so you may not need to install patches as often as we do.
But as you know, HP-UX is an open system with very many applications running on it.
Our production systems change constantly as newly developed programs are ported to them,
and HP releases new patches every day to make the OS more robust.

Taking measures only after a problem has occurred is not acceptable to my internal customers who use the IT infrastructure.

In conclusion, we have to do something proactive to keep business services uninterrupted.

Regards
Allan Bowman
Respected Contributor

Re: Regarding PM

Hi Nanan,

I think Wim's point was that you don't need to apply every patch that gets released. If you are having a problem that is addressed by a patch, you should certainly install it (after testing on a non-production system). Or if a patch is released that appears likely to have an impact on your system, further investigation is certainly warranted.

Yes, VMS is very stable, but not just because it is "old" (or as we prefer to say: mature), but because it has one of the best OS engineering teams around. And just like HP-UX, VMS is also an open system (hence the full name of OpenVMS) and has many applications running on it and has patches coming out between releases.

I recently came from a mixed environment (OpenVMS, Linux, and SCO Unix) where we performed monthly PMs on all systems, mainly to ensure software stability and to keep memory leaks from growing (these occurred on only a handful of the Unix systems, but they caused us to distrust the other systems). We had numerous identical systems connected via WAN and could repoint traffic from one system to another, so the PM procedures had zero impact on production and were done on a rotating schedule (1/4 of the systems each week).
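A rotating schedule like this could be planned with a small Python sketch like the one below: systems are assigned round-robin to four weekly groups, and each system under PM is paired with a peer that stays in service to take its traffic. The system names are hypothetical, and the sketch assumes, as in the post, that all systems are identical and interchangeable.

```python
from typing import Dict, List


def rotation(systems: List[str], weeks: int = 4) -> Dict[int, List[str]]:
    """Assign systems round-robin so roughly 1/weeks of them get PM each week."""
    schedule: Dict[int, List[str]] = {w: [] for w in range(1, weeks + 1)}
    for i, name in enumerate(systems):
        schedule[i % weeks + 1].append(name)
    return schedule


def redirect_plan(schedule: Dict[int, List[str]], week: int) -> Dict[str, str]:
    """Map each system under PM this week to a peer that stays in service."""
    down = schedule[week]
    up = [s for w, group in schedule.items() if w != week for s in group]
    if not up:
        raise ValueError("no systems left in service to take the traffic")
    return {s: up[i % len(up)] for i, s in enumerate(down)}


if __name__ == "__main__":
    sched = rotation([f"s{i}" for i in range(1, 9)])
    print(sched)                    # {1: ['s1', 's5'], 2: ['s2', 's6'], 3: ['s3', 's7'], 4: ['s4', 's8']}
    print(redirect_plan(sched, 1))  # {'s1': 's2', 's5': 's6'}
```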

Hope this gives you a little insight.

Allan in Atlanta