"Standby" Node Management Advice
10-20-2010 08:11 AM
I will be configuring a stand-alone node, but with a dedicated, cold-standby backup system. Some quick info:
1) 2 ES45s
2) All disks (including system disks) will be on an HP SAN
3) All disks will be presented to both nodes
4) The BOOTDEF_DEV of each node will point to the same system disks
5) If the active node fails, the recovery procedure will be to confirm that it is "down", then boot the "cold" node.
6) We expect to do a monthly test of shutting down the active node and booting the "cold" node.
My request: what processes/procedures can I put in place to reduce (if not eliminate) the possibility that the "cold" node will be booted while the active node is running (which, I expect, would immediately corrupt the system disk, etc.)?
TIA
10-20-2010 08:32 AM
Solution
I will first make the somewhat obligatory observation that this is one of the most basic uses of OpenVMS clusters.
That said, I would not use the same system disk for both systems. I would also make sure that the procedures ensure that the "dead" CPU is permanently disabled; an accidental boot could be a serious problem.
There are few safeguards that one can construct without using OpenVMS clusters. Off the cuff, one could do something with multi-homed IP and PING, with each of the systems having a unique IP address and trying to reach the other node. If the other node responds, STOP. However, this is a bit of a hack, and it only guards against startup while the other node is running and reachable. If network connectivity is lost but the SAN remains operational, one is back where one started.
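The PING interlock described above could be sketched roughly as follows. This is only an illustration in Python, not DCL and not anything from the original post: `safe_to_boot`, the `probe` callable and the addresses are all hypothetical names, and the probe would have to be wired to whatever PING command the site actually uses.

```python
from typing import Callable, Iterable


def safe_to_boot(peer_addresses: Iterable[str],
                 probe: Callable[[str], bool]) -> bool:
    """Return False as soon as any peer address answers the probe.

    `probe` is a hypothetical callable (for example a wrapper around the
    site's PING command) that returns True when the address responds.
    """
    for address in peer_addresses:
        if probe(address):
            # The other node is alive: STOP, do not continue the boot.
            return False
    # No answer. This does NOT prove the peer is down; a network outage
    # with a working SAN produces exactly the same result.
    return True


# Usage with stand-in probes (the addresses are made up for illustration).
peer = ["192.0.2.10", "192.0.2.11"]
print(safe_to_boot(peer, lambda addr: addr == "192.0.2.11"))  # False
print(safe_to_boot(peer, lambda addr: False))                  # True
```

Note that the second call returns True both when the peer is really down and when only the network is down, which is exactly the weakness pointed out above.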
On the OpenVMS cluster front, there was a presentation at the 2007/2008/2009 HP Techforum on a simple little package, for use with OpenVMS clusters, that would allow one to easily migrate single node applications to the other node in the event of a failure. The intent was to provide a way to provision non-cluster aware applications in a cluster.
- Bob Gezelter, http://www.rlgsc.com
10-20-2010 09:21 AM
Re: "Standby" Node Management Advice
10-20-2010 09:34 AM
Re: "Standby" Node Management Advice
That said, I would configure a unique system disk for each ES45. The failover server starts and mounts no secondary disks. Document your failover procedure to change the boot device on BOTH systems to place the failover server into production. Having the server running will let you know if it suffers a hardware problem. You add the risk of having secondary disks mounted.
Make sure the risks are documented and management understands the trade-off. Short-term savings now can lead to downtime and lost-data costs in the future.
You can do this. Is it a good idea? "It depends."
10-20-2010 10:24 AM
Re: "Standby" Node Management Advice
I agree with this.
I presume that the two nodes will also use the same value for BOOT_OSFLAGS as well as the same BOOTDEF_DEV?
If so,
and you configure this system as a cluster member,
and these systems share a common network that will pass SCS traffic,
then the VMS clustering software would force a second node to CLUEXIT bugcheck very early on in the boot process as it would be attempting to boot with the same SCSNODE and SCSSYSTEMID (regardless of votes, etc).
A cluster can't have two nodes with the same identity and won't let it happen...
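As a way of picturing that rule, here is a toy model in Python. It is not how the VMS connection manager is implemented, only an illustration of "same SCSNODE or SCSSYSTEMID means the second boot is refused"; `ToyCluster`, `ClusterExit` and the node name and ID are made up.

```python
class ClusterExit(Exception):
    """Stands in for the CLUEXIT bugcheck in this toy model."""


class ToyCluster:
    """Tracks members by SCSNODE and SCSSYSTEMID and refuses duplicates."""

    def __init__(self):
        self._members = {}  # SCSNODE -> SCSSYSTEMID

    def join(self, scsnode, scssystemid):
        # A node whose name OR system ID is already present is rejected,
        # regardless of votes or anything else.
        if scsnode in self._members or scssystemid in self._members.values():
            raise ClusterExit(f"{scsnode}/{scssystemid} is already a member")
        self._members[scsnode] = scssystemid

    def leave(self, scsnode):
        self._members.pop(scsnode, None)


cluster = ToyCluster()
cluster.join("PROD01", 1025)          # active node boots
try:
    cluster.join("PROD01", 1025)      # cold node booted by mistake
    second_boot = "joined"
except ClusterExit:
    second_boot = "rejected"
print(second_boot)                    # rejected

cluster.leave("PROD01")               # active node is really down
cluster.join("PROD01", 1025)          # cold node may now take over
```

The point of the model is that the check needs no operator discipline: the second boot with the same identity fails on its own.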
10-20-2010 10:30 AM
Re: "Standby" Node Management Advice
If that's the case, that implies you might also want to look at what a different operating system might offer you here. Or at re-working the application to work around the lack of a cluster PAK, though duplicating what clustering provides is going to cost some money. (On the other hand, a good cluster design might also mean your code can be more platform-portable, too.)
Easiest? Install an operating system platform that's suited for this particular design. That's not VMS. Sure, you can hack VMS to work here, particularly with the mess that is host names and - if the host and peripheral devices aren't identical - the hardware configuration. And the risk of collisions. But you'll be fighting the OS. VMS really wants cluster-aware designs, and wants to run both boxes in parallel, and wants to use both of the boxes rather than letting one sit cold.
What do you need to look at to force-fit VMS into this design, from the OS level? Careful management of host-specific device names. Maintenance of the NIC addresses and the rest of the networking set-up. A decision on whether you're eventually going to move to clustering or stay with this scheme. And interlocks to ensure both boxes aren't booted in parallel (and that the applications themselves aren't operating in parallel), as others have mentioned.
Another available approach here can be a way to do a fast install or a fast recovery scheme from your own archives (and using InfoServer or such), and involving your own system disk images and maintenance of your code and your data. This requires some maintenance of the existing system and application environment; eliminating the ad-hoc operations of the platform from the environment may or may not be feasible here.
And FWIW, if you have solved the sequential-boot cold-standby scheme that you're aiming for, you've largely also implemented the fast load scheme, too.
Most of what's here is an application programming question, really. Designing a cold-start, single-host, non-server-parallel application. And either ignoring, or explicitly bypassing most of what VMS provides here. Probably the best way to do that is to ignore as much or all of VMS; as much of it as you can. (Which also then leads you to look at whether you even need or want VMS underneath there, too.)
10-20-2010 11:06 AM
Re: "Standby" Node Management Advice
10-20-2010 12:07 PM
Re: "Standby" Node Management Advice
Really, if this is what is needed, your safest (and in the end probably cheapest as well) solution is a cluster license. Set up each node to boot from the same root of the same device.
You will essentially be running a one-node cluster, but most importantly, in one single pass you will be GUARANTEED never to run both nodes simultaneously (as already mentioned, that would lead to a CLUEXIT before damage can be done). That includes safeguarding against ANY potential erroneous OR MALICIOUS or plain stupid wrong actions.
Consider the cost of trying to think of, write, and implement something that TRIES to reach something similar, and you will find the license is actually the cheapest option.
just my EUR 0.02...
Proost.
Have one on me.
jpe
10-20-2010 12:27 PM
Re: "Standby" Node Management Advice
Your basic assumption here seems to be that any system failure will be of the ES45 box, rather than (say) the disks, SAN, power, aircon, network, etc... You may want to look at other failure scenarios to make sure you're not putting all your recovery eggs in just one (unlikely?) basket.
Without clustering you have no way to detect another node booting from the same disk, and no protection against corruption.
Perhaps you should present the disks to only ONE node at a time? Part of your recovery procedure includes switching the presentation at the SAN level. If you can't see the disk, you can't boot from it, or corrupt it.
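To make the "one node at a time" idea concrete, here is a small Python model. It says nothing about how any particular SAN is actually administered; `ExclusivePresentation`, its methods and the host names are hypothetical, and the real switch would be done with the storage array's own management tools.

```python
class ExclusivePresentation:
    """Toy model: the disks are presented to exactly one host at a time."""

    def __init__(self, hosts, active):
        if active not in hosts:
            raise ValueError(f"unknown host: {active}")
        self._hosts = set(hosts)
        self._presented_to = active  # a single value, so never two hosts

    def can_see(self, host):
        # A host that cannot see the disk can neither boot from it
        # nor corrupt it.
        return host == self._presented_to

    def switch_to(self, host, old_host_confirmed_down):
        if host not in self._hosts:
            raise ValueError(f"unknown host: {host}")
        if not old_host_confirmed_down:
            # The recovery procedure requires confirming the failure first.
            raise RuntimeError("confirm the active node is down first")
        self._presented_to = host


san = ExclusivePresentation({"ES45A", "ES45B"}, active="ES45A")
print(san.can_see("ES45B"))   # False: the cold node cannot boot by accident
san.switch_to("ES45B", old_host_confirmed_down=True)
print(san.can_see("ES45A"), san.can_see("ES45B"))  # False True
```

The design choice here is that the interlock lives outside both hosts, so a mistaken boot command on the cold node simply finds no disk.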
10-20-2010 05:54 PM
Re: "Standby" Node Management Advice
"2) All disks (including system disks) will be on an HP SAN"
Was that intentional or a typo? Do you plan to have some sort of CA (Continuous Access) going on to another site? If you bite the bullet on Cluster licenses, while you're begging you should ask for Shadowing licenses and make that bit of the DR recovery "easier" too.
Cheers,
Art