Disaster Recovery - Need Help and Advice Desperately (Operating System - HP-UX forum)
01-08-2002 07:59 AM
First of all - Happy New Year to everyone.
I really need the opinions of my fellow admins and peers.
My question is regarding disaster recovery. In previous jobs we would test the validity of our tapes by restoring them to test servers, restoring individual files, doing Ignite restores, etc. This was deemed sufficient, and I was never in a situation where I was unable to recover.
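Restoring individual files to a test server only proves something if the restored copies are checked against the originals. A minimal sketch of that check, assuming Python 3 is available on the test box and that the original files have not changed since the backup was taken (otherwise legitimate changes show up as mismatches): it walks both directory trees and compares SHA-256 checksums file by file. The function names are made up for illustration, and the check covers file contents only, not ownership or permissions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original: Path, restored: Path) -> dict:
    """Compare every regular file under `original` with its copy under `restored`.

    Returns lists of relative paths that are missing from the restore,
    differ in content, or exist only in the restore.
    """
    orig_files = {p.relative_to(original) for p in original.rglob("*") if p.is_file()}
    rest_files = {p.relative_to(restored) for p in restored.rglob("*") if p.is_file()}

    missing = sorted(str(p) for p in orig_files - rest_files)
    extra = sorted(str(p) for p in rest_files - orig_files)
    # Only files present on both sides can be compared by checksum.
    mismatched = sorted(
        str(p)
        for p in orig_files & rest_files
        if sha256_of(original / p) != sha256_of(restored / p)
    )
    return {"missing": missing, "mismatched": mismatched, "extra": extra}
```

For example, `compare_trees(Path("/home/app"), Path("/restore_test/home/app"))` (hypothetical paths) returns three empty lists when the test restore matches the originals.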
Well the company that I work for now seems to have a different interpretation of disaster recovery. This is what they are proposing:
They want to simulate hardware failures on the servers: they want me to disable processors, power supplies, memory, cards, etc., and then see if the box still works. I think this is beyond insane, but is this something that is done in other shops? I am not at all comfortable "breaking things" on a running production system. We are not a 24x7 operation, and if we were, we would need a cluster, which they don't want to hear about. What do other companies do for disaster recovery? Any and all help would be appreciated, and points will be assigned as always.
Regards,
Rob Smith
01-08-2002 08:14 AM
Re: Disaster Recovery - Need Help and Advice Desperately
Well, it sounds like they want you to test this on the production box?
If you had a test box like in the past, maybe; but some things are obvious, like pulling the power, unless you have a UPS.
Here, I try to keep good documentation, along with the print_manifest output, in a "hot" folder in case of a failure that requires me to intervene, like restoring a drive or volume groups: the things that we, as administrators, have to handle ourselves. We have service on all the hardware, so if a power supply craps out, I am at the mercy of support; I keep all the info I would need to call them in a folder too.
If you have a test or unused system, then it's nice to "play", but we haven't been told to just yank the power or cards, or kill things to see what happens, because you can get different results each time. I don't think you can "predict" failures; knowing what to do, and how to isolate and resolve a problem, either yourself or through your service contract, is what will make you ready.
We also have a contract under which I send DRP material to a hot site, so if something goes belly-up, like the whole building, they can restore us at that site and have us work from there.
I think you're correct in your post; I wouldn't want to just kill things to see what happens.
scott
01-08-2002 08:19 AM
Re: Disaster Recovery - Need Help and Advice Desperately
In our company we have a two-node cluster for a 24x7 production call center application and one server for the production data warehouse. In development we have one box shared by all the developers across all the different environments, but we don't have a test environment.
This is a problem for us, because we cannot test our BRS system: we take backups with OmniBack and make_recovery, but we cannot verify them.
You are right: the best way to check your BRS environment is with a test box, but that is usually very expensive for the company.
In my company we don't check our BRS system at all, so when a server fails we will have to work it out then.
Regards,
Justo.
01-08-2002 08:25 AM
Re: Disaster Recovery - Need Help and Advice Desperately
It sounds like they want to create a real 'disaster' scenario by breaking a production system??
Well, at many sites we create a setup similar to the production system and test things in a "test environment".
Connecting running systems to multiple UPSs and power supplies as backup resources is the standard way of setting them up. In fact, some new boxes have multiple power supply units built in.
HP's ServiceGuard is built on the same idea.
In a production environment there should be a "perfect pair": if one server goes down for any reason, the other must take over the entire workload.
There are some good docs and forum contributions on Disaster Recovery Process.
Good luck,
-USA.
01-08-2002 08:26 AM
Re: Disaster Recovery - Need Help and Advice Desperately
I wouldn't take a production system and deliberately try to cripple it. That's why one has a test server.
If your management is serious about disaster recovery, they should contract for a disaster recovery service with an appropriate provider.
In that way, depending on your contract, you can actually travel off-site with Ignite and backup tapes and restore (or attempt to do so) your servers from scratch.
Development of a true disaster recovery plan is a feat in and of itself.
Regards!
...JRF...
01-08-2002 08:36 AM
Re: Disaster Recovery - Need Help and Advice Desperately
If you pull a system board to disable a CPU, what happens if you bend a pin, or fry the board when you are reinstalling? Can you say HP time & materials for repairs! That will get expensive in a hurry.
Can you tell that I recommend against doing this? :)
01-08-2002 08:42 AM
Re: Disaster Recovery - Need Help and Advice Desperately
I am attaching the disaster recovery plan followed in our organisation. It may be of help.
Regds
Ramesh
01-08-2002 08:46 AM
Re: Disaster Recovery - Need Help and Advice Desperately
I think the request to disable hardware on a running, production system is way overboard. On a test box, anything goes. But production? No way. How often is the power turned off to half the building just to see if the company still runs? How about disabling a couple of managers? Hmm, things might run better.
We have a contract with a DR site, and we run DR tests there twice a year to make sure we can recover. I use Ignite and OmniBack, make regularly scheduled backups, and send tapes offsite.
Document, document, document. I run scripts regularly to gather system documentation, transfer it to my PC, and send copies of it offsite also.
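The documentation scripts mentioned above are site-specific, but the idea can be sketched as follows, assuming Python 3.7 or later on the server. The command list is hypothetical: the HP-UX paths shown (print_manifest, ioscan, bdf, vgdisplay, lanscan) should be checked against your own systems before you rely on them. Each command's output goes into one timestamped report per host, and a missing command or a timeout is recorded in the report rather than stopping the run. Copying the report to a PC and offsite is left to whatever transfer method you already use.

```python
import datetime
import socket
import subprocess
from pathlib import Path

# Hypothetical command list: common HP-UX inventory commands, but verify the
# paths and options on your own systems first.
DEFAULT_COMMANDS = [
    ["/opt/ignite/bin/print_manifest"],
    ["/usr/sbin/ioscan", "-fn"],
    ["/usr/bin/bdf"],
    ["/usr/sbin/vgdisplay", "-v"],
    ["/usr/sbin/lanscan"],
]


def gather_docs(commands, out_dir: Path, timeout: int = 300) -> Path:
    """Run each command and write all output to one timestamped report file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    report = out_dir / f"{socket.gethostname()}-{stamp}.txt"
    with report.open("w") as fh:
        for cmd in commands:
            fh.write(f"===== {' '.join(cmd)} =====\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=timeout
                )
                fh.write(result.stdout)
                if result.returncode != 0:
                    fh.write(f"[exit status {result.returncode}] {result.stderr}\n")
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Record the failure instead of aborting the whole run.
                fh.write(f"[not collected: {exc}]\n")
            fh.write("\n")
    return report
```

It could be run from cron as `gather_docs(DEFAULT_COMMANDS, Path("/var/adm/sysdocs"))`, where the output directory is a hypothetical choice.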
Sounds like they may be confusing DR with High Availability. That's a different animal but still, I'd be very hesitant to disable a production system.
Darrell
01-08-2002 08:49 AM
Accepted Solution
These guys are confusing fault tolerance with disaster recovery; the two are not the same and in fact are only barely related.
Your past solution was Disaster Recovery - what do I do if the building and I are no longer there. If you have planned well enough, someone else can follow your procedures at a recovery site and get a replacement system back up and running in a timely manner.
Fault tolerance is what your guys are trying to explore. This is really, really dumb on a production box. I would only do this on a test box/cluster or, better still, a sandbox. Unless you are willing to set up MC/ServiceGuard, you can only go so far. Certainly mirror disks or use arrays with multiple data paths. You can also add Auto Port Aggregation (APA) for network resiliency. I do test sandbox configurations very heavily by removing SCSI cables and network cables, killing power supplies, etc.
You don't mention what your equipment is, but perhaps the best way to approach this situation is to persuade your powers that be to purchase a test box on the used market. This would give you spare parts and would allow a rigorous test program.
Regards, Clay
01-08-2002 09:00 AM
Re: Disaster Recovery - Need Help and Advice Desperately
Just my thoughts,
Craig
01-08-2002 09:26 AM
Re: Disaster Recovery - Need Help and Advice Desperately
Other than that, you could pull CPUs between reboots, or maybe even disable them via the GSP? I have only seen a CPU being enabled after a bad one was replaced. You're not going to be able to pull memory without shutting down the box, I think.
What they are really testing is your argument. If you are worried about hardware failure, build in as much redundancy as possible: clusters, redundant fans and power supplies, UPSs, backup generators, disk mirroring, etc.
It just depends on how many 9s they want; keep in mind that the more 9s, the more $.
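To put rough numbers on "the more 9s, the more $": each extra 9 cuts the allowed downtime by a factor of ten. 99.9% availability still allows about 8.8 hours of downtime a year, while 99.99% allows about 53 minutes. The sketch below assumes a 365-day year and, for the redundancy function, that the redundant units fail independently, which shared power, cooling, or a shared building can easily violate. The function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction (0.999 = 99.9%)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(a: float, n: int = 2) -> float:
    """Availability of n redundant units when any one of them is enough.

    Assumes the units fail independently, which is an idealisation.
    """
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    for nines in range(2, 6):
        a = 1.0 - 10.0 ** -nines
        minutes = downtime_hours_per_year(a) * 60
        print(f"{a:.3%} available: about {minutes:,.1f} minutes of downtime per year")
    print(f"two 99% boxes in parallel: {parallel_availability(0.99):.4%}")
```

Under those assumptions, two boxes that are each 99% available give 1 - 0.01^2 = 99.99% as a pair, which is the arithmetic behind clustering; the cost is the second box plus the failover software and the effort to run it.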
01-08-2002 09:59 AM
Re: Disaster Recovery - Need Help and Advice Desperately
01-08-2002 10:34 AM
Re: Disaster Recovery - Need Help and Advice Desperately
This would be a great test if you were testing the resiliency of these servers (i.e. stress testing and fault tolerance), but for what you apparently are running in your environment, this is OVERKILL.
Testing like that takes place when testing VCS clusters or MCSG clusters, not when you have standalone production boxes.
Most importantly, anything can happen when you start deliberately playing around with the hardware components; your company may find itself in a true down situation when all it wants to do is test.
I would strongly recommend against this level of testing, and I am quite sure that if they spoke to your HP ASE, they would confirm exactly what everyone here has expressed.
Good luck.
FG.