05-13-2008 01:45 AM
hard stopping( failover test) hpux cluster package
In a few weeks I want to test failover on an HP-UX MCSG (MC/ServiceGuard) HA cluster, and I would like to know whether the following is a good failover test.
We have two Itanium rx4640 servers, hpux01 and hpux02, which are part of a production cluster. Both run HP-UX B.11.23.
hpux01 runs the hpdb01 cluster package, which contains 20 Oracle databases.
hpux02 runs no cluster packages and is configured as the failover node for hpux01.
hpux01 is configured to switch only to hpux02.
The hard failover test we have in mind is: pull both power cables out of hpux01. As a result, the hpdb01 cluster package running on hpux01 should fail over to hpux02.
- Is this a good test?
- What happens if the power cables are pulled without shutting the system down cleanly?
- Could this result in filesystem corruption?
- Could it also corrupt the HP-UX 11.23 OS on hpux01?
If this is not a good failover test, please let me know what alternatives there are for doing a hard failover test.
- Would killing the cmcld process be a good failover test?
root 13137 13118 0 Apr 23 ? 64:09 /usr/lbin/cmcld -j
Thanks for your help.
Kind Regards,
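Before and after any such test, it helps to record where the package is actually running. A minimal sketch: the `package_node` helper and the sample lines below are hypothetical (a simplified `cmviewcl`-style layout, not output captured from a real cluster); on a live system the input would come from `cmviewcl -v`.

```shell
package_node() {
    # In a real cluster this would parse: cmviewcl -v -p "$1"
    # Here we parse a captured sample so the logic is visible.
    pkg="$1"; output="$2"
    echo "$output" | awk -v p="$pkg" '$1 == p { print $4 }'
}

# Hypothetical "PACKAGE STATUS STATE NODE" lines as cmviewcl might show them:
before="hpdb01 up running hpux01"
after="hpdb01 up running hpux02"

echo "before test: $(package_node hpdb01 "$before")"
echo "after test:  $(package_node hpdb01 "$after")"
```

Comparing the two recorded locations confirms that the package really moved, rather than merely that the surviving node is up.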
05-13-2008 03:32 AM
Re: hard stopping( failover test) hpux cluster package
Killing cmcld will force a system to take a memory dump and reboot, which is also a legitimate test, although it will take longer for the server to go through the TOC/reboot cycle (and you should clean up the resulting dump in /var/adm/crash).
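Cleaning up those dumps can be scripted. A minimal sketch, assuming the usual `savecrash` layout of `crash.N` directories under `/var/adm/crash`; the helper name and the `CRASHDIR` override are illustrative:

```shell
# Default to the standard savecrash location; override for testing.
CRASHDIR=${CRASHDIR:-/var/adm/crash}

clean_crashes() {
    dir="$1"
    for d in "$dir"/crash.*; do
        # Skip the unexpanded glob when no crash.N directories exist.
        [ -d "$d" ] || continue
        echo "removing $d"
        rm -rf "$d"
    done
}

# clean_crashes "$CRASHDIR"   # uncomment to actually remove old dumps
```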
A question comes to my mind though, why run all of the databases on one server, leaving the other one totally idle? Wouldn't it be better to distribute packages such that each server were equally loaded?
05-13-2008 04:36 AM
Re: hard stopping( failover test) hpux cluster package
Pull one power cable first. Is the box still holding up? Then put that plug back and drop the other side of your power. Finally, if you want to do a full power loss, drop the remaining power connection as well.
Another, less forceful test you may wish to add to your list is to drop the network connections and check the results.
First ensure that lan0 fails over to lan1 (or whatever your second failover NIC is), and verify that it did.
Then add the heartbeat network for hpux01 to your network disruption.
You might also test your PV links, or whatever utility you use for I/O path redundancy: drop one link and check that everything is still running OK, put the link back, then drop the other link and check again.
If you're going to test failover, test every feature you can think of.
Rgrds,
Rita
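The drop/check/restore pattern above is the same whether you are pulling power, LAN, or I/O links, so it can be sketched as one generic loop. The function name and the stub commands are hypothetical; in a real test the drop/restore commands would pull a cable or disable a NIC, and the check would be `cmviewcl` plus an application-level probe:

```shell
# Run the "drop one side, check, restore, drop the other" sequence for a
# redundant pair. $2/$3/$4 name commands (or shell functions) to call.
test_redundant_pair() {
    name="$1"; drop="$2"; restore="$3"; check="$4"
    for side in A B; do
        echo "dropping $name side $side"
        $drop "$side"
        if $check; then
            echo "$name side $side: still OK"
        else
            echo "$name side $side: FAILED"
        fi
        $restore "$side"
    done
}
```

Driving every redundant pair (power, LAN, PV links) through the same loop makes it easy to keep the written test log Armin and the others recommend.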
05-13-2008 04:43 AM
Re: hard stopping( failover test) hpux cluster package
I totally agree. I am amazed at how many places out there set up two-node clusters and totally waste the second node. You could distribute your packages across the two nodes, giving failover to each side, and possibly improve application performance.
Or, another suggestion: make one node your production node and the other your test/dev node. If the production node goes down, it fails over to the test node. Do not give the failover option to your test/dev packages. When your production packages come up on the test box, at the very start of each production package, put a simple one-liner to halt its development package FIRST. And voila: dev goes down, production comes up, both boxes are fully utilized, and management sees a "bang for the buck!"
The one-liner:
/usr/sbin/cmhaltpkg -n $(/usr/bin/uname -n)
This says: if the dev package is running on this box, halt it before starting this production package.
Rgrds,
Rita
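The one-liner could be wrapped into the start of the production package's control script. A sketch only: the package name `dev_pkg`, the `start_production` helper, and the `CMHALTPKG` override are all hypothetical, and note that `cmhaltpkg` typically also takes the package name to halt as an argument:

```shell
DEV_PKG=${DEV_PKG:-dev_pkg}                    # hypothetical dev package name
CMHALTPKG=${CMHALTPKG:-/usr/sbin/cmhaltpkg}    # overridable for dry runs

start_production() {
    node=$(/usr/bin/uname -n)
    # Halt the dev package on this node before the production package starts.
    echo "halting $DEV_PKG on $node before production start"
    "$CMHALTPKG" -n "$node" "$DEV_PKG"
    echo "starting production services"
    # ... real application startup would follow here
}

# start_production   # would be invoked from the package control script
```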
05-14-2008 12:30 AM
Re: hard stopping( failover test) hpux cluster package
But this is not the only test case! Most of the time the power test works fine, while everything gets stuck when network errors occur. For example, the package halt script might hang forever because the application depends on a working network connection while shutting down. Look for potential software problems too, e.g. client-side issues!
And be critical of your environment: pull any(!!!) cable if in doubt.
The other thing is the empty failover node.
As the others state, it's always a good idea to split the application into two packages, or to run more than one package on every node.
But don't build asymmetric clusters (e.g. configuring the test package to run on one node only):
- You'll get different configurations on every cluster node, which makes troubleshooting and recovery difficult.
- Administration (e.g. creating users) is harder, and directory/user ID collisions occur more often.
- You'll increase the overall downtime of your production package, because after every failure of the production package you must switch it back to bring the test package up again.
So, from my point of view, always create a cluster with identical configurations, patch levels, etc., and configure packages to be able to run on all nodes. It'll make life easier in case of an incident, and you'll find that out in the very second you're responsible for bringing things up again after a crash/failure.
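The configuration drift described above can be checked mechanically, for example by diffing key files collected from both nodes. A sketch under the assumption that copies of each node's files have already been gathered into local directories (the helper name and the file list are illustrative; in practice the copies would come via remsh/ssh):

```shell
# Compare the same set of files between two per-node directories and
# report which ones differ.
check_drift() {
    node1="$1"; node2="$2"; shift 2
    for f in "$@"; do
        if diff -q "$node1/$f" "$node2/$f" >/dev/null 2>&1; then
            echo "$f: identical"
        else
            echo "$f: DIFFERS"
        fi
    done
}

# Example: check_drift /tmp/hpux01 /tmp/hpux02 etc/passwd etc/hosts
```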
My 2 cents,
Armin
PS: Please assign points if you find answers useful!
05-14-2008 01:57 AM
Re: hard stopping( failover test) hpux cluster package
Please note you are not just testing Serviceguard to see if it works. You are also testing it so that you can learn about the expected behaviour and be prepared for every eventuality. Make a list of every test and document what happens. I can assure you that someone in the future will ask you what will happen under certain circumstances.
The other replies were excellent; I completely agree that you must test gradual failure of network interfaces and of power.
Serviceguard is only designed to protect against a single point of failure (SPOF), so you should also know what happens under various multiple-point-of-failure (MPOF) conditions and how it would react - will it TOC, will it stay up, etc. Know the limits of your config.
You must also test failure of a network switch, then several.
Also, know the limits of your serviceguard protection, for example a single link to an external system.
It is also worth testing to see what happens to the client software - can it cope with a package switchover?
The HA documentation under docs.hp.com gives more detailed information about this vital part of Serviceguard implementations.
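Client-side behaviour during a switchover can be probed with a simple retry loop around whatever the real client call is. A sketch: `retry_connect` and its probe command are hypothetical stand-ins for, e.g., a sqlplus ping against the package's relocatable IP address.

```shell
# Try a connection command up to $1 times, pausing between attempts,
# and report which attempt (if any) succeeded.
retry_connect() {
    attempts="$1"; shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            echo "connected on attempt $i"
            return 0
        fi
        echo "attempt $i failed, retrying"
        sleep 1
        i=$((i + 1))
    done
    echo "gave up after $attempts attempts"
    return 1
}

# Example: retry_connect 10 ping_database_via_package_ip
```

If the client reconnects within the package's failover time, the switchover is effectively transparent; if it gives up, that is a client-side limit worth documenting alongside the cluster tests.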