08-23-2012 08:28 PM
Availability Manager Analyzer on Windows 7 could not fix quorum
We have a cluster of Itanium blades across 2 sites (4 on primary and 3 on secondary) running OpenVMS 8.3-1H1. Availability Manager V3.0-2A runs on all nodes and the server is also running on one of the nodes in the secondary site. We are running Availability Manager Analyzer 3.1-2 on our laptop running Windows 7. We used the IP of the node running the Availability Manager Server to monitor the cluster.
Using the Availability Manager Analyzer, we can monitor the VMS nodes and crash any of them. We simulated a primary site crash by crashing the primary site nodes, leaving the secondary site nodes in a hung state. We could see each node's icon in the Analyzer being greyed out as it was crashed; when all 4 primary nodes had crashed, all of the icons had turned grey. However, we are not able to fix the quorum from any of the remaining secondary site nodes, because the FIX option is greyed out.
We are using the triplet *\1DECAMDS\c on our VMS nodes. We were forced to use the iLO to fix the quorum.
Any suggestions to solve this issue are welcome.
Thanks.
Noel
08-24-2012 12:12 AM
Re: Availability Manager Analyzer on Windows 7 could not fix quorum
You should not run the Availability Manager Server on a managed node, particularly a clustered one. It is a normal user mode application, and will hang if the cluster loses quorum.
08-24-2012 09:22 AM
Re: Availability Manager Analyzer on Windows 7 could not fix quorum
>>> It is a normal user mode application, and will hang if the cluster loses quorum.
As Richard points out, if you run the Availability Manager Server on a node in the cluster, then whenever the cluster isn't available, the Availability Manager Server won't be available either. Your options are to bring up either an emulated VMS system with the Availability Manager Server installed outside the cluster, or a Windows-based system; I'd consider doing that at each site. I used to address this situation by running AM on a local Windows server and configuring VPN access, one AM system at each site. That was before the AM relay server option was released.
08-26-2012 02:30 PM
Re: Availability Manager Analyzer on Windows 7 could not fix quorum
Slightly off topic...
>We simulated a primary site crash by crashing the primary site nodes leaving the secondary site nodes in hang state
Please be sure you understand exactly what you're simulating. Using AM to crash nodes serially does NOT simulate losing a site, because you can't synchronise the crashes. What you actually get is two crashes in quick succession. Look at the logs to see the true sequencing of events.
This may be adequate for your purposes, but if you need to test a more realistic site failure, you have to be a bit more inventive. Here are some DANGEROUS programs which can be used to simulate a site failure by synchronising the target nodes so they crash as simultaneously as possible.
        .TITLE  gate
;
; implements a "starter gate" using locks
;
        $LCKDEF

        .PSECT  data,rd,wrt,noexe,quad
lksb:   .BLKL   2
res:    .ASCID  /StarterGate/

        .PSECT  code,rd,nowrt,exe
        .ENTRY  start,^M<>
        $ENQW_S efn=#0 lkmode=#LCK$K_EXMODE -
                lksb=lksb flags=#LCK$M_NODLCKWT -
                resnam=res
        $HIBER_S
        RET
        .END    start
        .TITLE  SiteFailover
;
; Deliberate crash of a system, synchronized across multiple nodes
; using a "starter gate" lock
;
        $LCKDEF

        .PSECT  data,rd,wrt,noexe,quad
lksb:   .BLKL   2
res:    .ASCID  /StarterGate/

        .PSECT  code,rd,nowrt,exe
        .ENTRY  start,^M<>
        $ENQW_S efn=#0 lkmode=#LCK$K_CRMODE -
                lksb=lksb flags=#LCK$M_NODLCKWT -
                resnam=res
;       $CMKRNL_S die           ; commented out for safety
        MOVL    #40,r0
        RET

        .ENTRY  die,^M<>
        CLRL    R0
;       MOVL    (R0),R0         ; commented out for safety
        RET
        .END    start
Start by running the "starter gate" program as a subprocess. It takes out an exclusive lock on the "StarterGate" resource, and then hibernates. Now run the SiteFailover program on each node that you want to crash. (Realise that you'll need to remove the safety comments for it to work, and the processes will need CMKRNL privilege). They will all request the lock being held by the starter gate. When you're ready, kill the starter gate process. It will drop the lock, which will release all the killer processes, crashing all the systems before they have time to detect each other. This will give a more realistic site failure crash than serially crashing nodes.
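For readers without a Macro-32 toolchain, the synchronization pattern above (not the crash itself) can be sketched in Python: a shared event plays the role of the StarterGate lock, every "killer" thread parks on it, and releasing the gate wakes all of them as close to simultaneously as possible. This is a hypothetical illustration of the barrier technique only; the node names and timing are invented.

```python
import threading
import time

# The "starter gate": a single shared event that every worker blocks on,
# analogous to the exclusive lock held by the hibernating gate process.
gate = threading.Event()
release_times = []
times_lock = threading.Lock()

def worker(node_name):
    # Each "node" queues up here, like a killer process waiting
    # behind the StarterGate lock.
    gate.wait()
    with times_lock:
        release_times.append((node_name, time.monotonic()))

threads = [threading.Thread(target=worker, args=(f"NODE{i}",)) for i in range(4)]
for t in threads:
    t.start()

time.sleep(0.1)   # let all workers park on the gate
gate.set()        # "kill the starter gate": all waiters are released at once
for t in threads:
    t.join()

spread = max(ts for _, ts in release_times) - min(ts for _, ts in release_times)
print(f"{len(release_times)} workers released within {spread * 1000:.1f} ms")
```

The point of the design, in both versions, is that the expensive setup (queuing the waiters) happens serially in advance, while the release is a single cheap operation, so the actions fire before the participants can react to one another.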
I call our killer program "DELIBERATE_SYSTEM_CRASH" and build it only when needed (indeed, the image is deleted before releasing the StarterGate). This helps prevent accidents, and it means the crash dumps carry a clear indication that the crash was intentional.