07-08-2004 04:00 AM
A lot of processes in state RWCAP
Today we analysed a dump from one node of a two-node cluster that seemed to be frozen.
A lot of processes were in state RWCAP, including the CLUSTER_SERVER and OPCOM processes.
We saw that every process requires the QUORUM and RUN capabilities, but both CPUs had only RUN (plus PRIMARY on the primary CPU, of course). That explains why so many processes are in state RWCAP.
Now my question is: if CLUSTER_SERVER is in state RWCAP, will the QUORUM capability of the CPUs ever be set again in the CPU database?
07-08-2004 04:12 PM
Re: A lot of processes in state RWCAP
If this is a CLUEXIT bugcheck, then the RWCAP processes are "normal". When quorum is lost, the QUORUM capability is removed from the CPUs; that is how OpenVMS prevents processes from running until quorum has been regained.
Changes to CPU capabilities are made in high-IPL interrupt service routines rather than in the process context of CLUSTER_SERVER, so the fact that CLUSTER_SERVER is in RWCAP is not an issue.
Perhaps there is a communications problem that is causing quorum to be lost?
07-08-2004 06:11 PM
Re: A lot of processes in state RWCAP
The system froze, so it was decided to press the HALT button and crash the system from the console prompt.
I doubt that it is a CLUEXIT.
The strange thing is that we did not see any errors in ERRLOG.SYS or in OPERATOR.LOG.
Is that probably because the processes that should log them are in state RWCAP?
07-08-2004 07:20 PM
Re: A lot of processes in state RWCAP
If you were seeing excessive remastering activity, you'd likely spot processes in RWCAP state.
07-08-2004 07:23 PM
Re: A lot of processes in state RWCAP
Can you explain a bit more about what you mean?
07-08-2004 08:01 PM
Re: A lot of processes in state RWCAP
07-09-2004 04:14 AM
Re: A lot of processes in state RWCAP
RWCAP here indicates a loss of quorum. We know this because the CPUs lack the QUORUM capability bit, and every process requires both QUORUM and RUN in order to be scheduled.
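To make that scheduling rule concrete, here is a minimal Python sketch of the capability check: a CPU can run a process only if the CPU holds every capability the process requires. The `Cap` flag values and the `can_run_on` and `schedulable_cpus` helpers are hypothetical illustrations, not the real OpenVMS CPB$ definitions or scheduler code.

```python
from enum import IntFlag


class Cap(IntFlag):
    """Illustrative capability bits (values are arbitrary, NOT the OpenVMS CPB$ constants)."""
    PRIMARY = 0x1
    RUN = 0x2
    QUORUM = 0x4


def can_run_on(process_required: Cap, cpu_caps: Cap) -> bool:
    # A CPU is eligible only if it has every capability the process requires.
    return (process_required & cpu_caps) == process_required


def schedulable_cpus(process_required: Cap, cpus: dict) -> list:
    # Names of the CPUs this process could be scheduled on.
    return [name for name, caps in cpus.items() if can_run_on(process_required, caps)]


# A normal process requires both RUN and QUORUM.
normal_process = Cap.RUN | Cap.QUORUM

# CPU state as seen in the dump: QUORUM has been removed from both CPUs.
cpus_after_quorum_loss = {"CPU0": Cap.PRIMARY | Cap.RUN, "CPU1": Cap.RUN}

# Same CPUs once quorum is regained and QUORUM is set again.
cpus_with_quorum = {name: caps | Cap.QUORUM for name, caps in cpus_after_quorum_loss.items()}

print(schedulable_cpus(normal_process, cpus_after_quorum_loss))  # [] -> process waits in RWCAP
print(schedulable_cpus(normal_process, cpus_with_quorum))  # ['CPU0', 'CPU1']
```

With no eligible CPU the process simply cannot be scheduled, which is what the RWCAP wait state represents; it becomes runnable again as soon as some CPU regains the missing bit.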
The node could lose quorum if it lost communications with the other node.
What type of cluster interconnects are involved?
07-09-2004 04:35 AM
Re: A lot of processes in state RWCAP
Things that might conceivably cause a hang and quorum loss include:
o Bad hardware generating a steady stream of interrupts, so that the CPU is stuck at hardware-interrupt IPL
o Software problem or overload that keeps the Primary CPU saturated at or above IPL 8, so things like PEDRIVER Hello messages don't get sent out and communications links look like they're broken as a result. (And if this occurred on the other node, making it uncommunicative, that might cause this node to lose quorum, if it didn't have enough votes by itself.)
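The vote arithmetic behind "if it didn't have enough votes by itself" can be sketched as follows. This assumes the OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2, truncated, and a hypothetical two-node cluster where each node has VOTES=1 and there is no quorum disk. It is a simplification: `cluster_quorum` and `has_quorum` are illustrative helpers, and the real connection manager's calculation has more inputs than shown here.

```python
def cluster_quorum(expected_votes: int) -> int:
    # Assumed OpenVMS rule: quorum = (EXPECTED_VOTES + 2) / 2, truncated.
    return (expected_votes + 2) // 2


def has_quorum(reachable_votes: int, expected_votes: int) -> bool:
    # A node keeps quorum only while the votes it can reach meet the quorum value.
    return reachable_votes >= cluster_quorum(expected_votes)


# Hypothetical configuration: two nodes with VOTES=1 each, no quorum disk.
expected = 2
print(cluster_quorum(expected))  # 2
print(has_quorum(2, expected))  # True: both nodes can talk to each other
print(has_quorum(1, expected))  # False: one node on its own loses quorum

# Variant: add a hypothetical 1-vote quorum disk, so EXPECTED_VOTES = 3.
print(has_quorum(1 + 1, 3))  # True: one node plus the quorum disk still has quorum
```

Under these assumptions, a two-node cluster without a quorum disk loses quorum on whichever node stops hearing from its partner, which matches the scenario described in the second bullet.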
It might help to look at the console output (if you have a console printer or a console management system that captures it), and at the console output or the OPERATOR.LOG file on the other node in the cluster. I'd be looking for messages from the Connection Manager about connection loss, quorum loss or state transitions, or from PEDRIVER (if you use the LAN as your cluster interconnect) about excessive packet loss.
Did you have a performance management data collector (like DECps or ECP or T4) running at the time? Sometimes those can give you clues as to what happened (especially just before the time of the hang -- because during the hang itself, you will probably be missing data).
07-11-2004 05:47 PM
Re: A lot of processes in state RWCAP
It turns out I was misinformed when I was given access to the dump file. It is from the second node of a two-node cluster, which was crashed manually just after the first one.
So I have been following the wrong leads.
I had already been wondering why I could not see anything in OPERATOR.LOG or ERRLOG.SYS.
Thanks anyway.