06-18-2008 06:09 AM
Quorum disk lost connection every two hours
Anyway, on to the problem.
From time to time, the system reports "Lost connection to quorum disk", followed a few seconds later by "Quorum regained...". The interesting thing is that this occurs at two-hour intervals, but not at every two-hour interval:
06/17/08 00:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 00:07:48: %CNXMAN, Quorum regained, resuming activity
06/17/08 02:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 02:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 04:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 04:07:52: %CNXMAN, Quorum regained, resuming activity
06/17/08 08:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 08:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 10:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 10:07:53: %CNXMAN, Quorum regained, resuming activity
06/17/08 14:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 14:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 16:07:41: %CNXMAN, Lost "connection" to quorum disk
06/17/08 16:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 22:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 22:08:15: %CNXMAN, Quorum regained, resuming activity
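One way to see the pattern in the log above is to pair each "Lost" line with the "Quorum regained" line that follows it. The Python sketch below is only an illustration: the function names (`parse_events`, `outages`, `quiet_slots`) are made up for this example, the log text is copied from the post, and it assumes all entries fall on a single day.

```python
from datetime import datetime

# Log lines copied from the post above.
LOG = """\
06/17/08 00:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 00:07:48: %CNXMAN, Quorum regained, resuming activity
06/17/08 02:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 02:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 04:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 04:07:52: %CNXMAN, Quorum regained, resuming activity
06/17/08 08:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 08:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 10:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 10:07:53: %CNXMAN, Quorum regained, resuming activity
06/17/08 14:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 14:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 16:07:41: %CNXMAN, Lost "connection" to quorum disk
06/17/08 16:08:15: %CNXMAN, Quorum regained, resuming activity
06/17/08 22:07:45: %CNXMAN, Lost "connection" to quorum disk
06/17/08 22:08:15: %CNXMAN, Quorum regained, resuming activity
"""


def parse_events(text):
    """Return a list of (timestamp, is_lost) tuples, one per log line."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The timestamp ends at the first ": " (colon followed by a space).
        stamp, _, message = line.partition(": ")
        when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        events.append((when, "Lost" in message))
    return events


def outages(events):
    """Pair each 'Lost' event with the next 'regained' event.

    Returns a list of (lost_at, duration_in_seconds) tuples.
    """
    result = []
    lost_at = None
    for when, is_lost in events:
        if is_lost:
            lost_at = when
        elif lost_at is not None:
            result.append((lost_at, int((when - lost_at).total_seconds())))
            lost_at = None
    return result


def quiet_slots(outage_list, period_hours=2):
    """Hours on the same period grid (single day assumed) with no outage."""
    seen = {start.hour for start, _ in outage_list}
    first = min(seen) % period_hours
    return [h for h in range(first, 24, period_hours) if h not in seen]


found = outages(parse_events(LOG))
print([seconds for _, seconds in found])  # [3, 30, 7, 30, 8, 30, 34, 30]
print(quiet_slots(found))                 # [6, 12, 18, 20]
```

Every loss starts at about seven minutes past an even hour, the outage lasts between 3 and 34 seconds, and the 06:07, 12:07, 18:07 and 20:07 slots pass without a logged loss.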
No disk errors are reported, and the system is not busy at the times indicated. In fact, it is not very busy at all.
The system is an ES40 with 4 CPUs and 4GB of memory, connected by CIPCA to HSJ50 controllers; all disks are RAID5. It has VMS83A_UPDATE V5.0 installed (yes, I see that there is a V6.0).
Ideas, suggestions?
06-18-2008 08:00 AM
Re: Quorum disk lost connection every two hours
Please post the cluster system parameters.
SYSMAN> param show /cluster
Please also post the SHOW DEVICE /FULL output for the quorum disk. This disk is typically mounted with MOUNT /SYSTEM.
Please also check for errors, restarts, or other faults out at the HSJ controllers, whether disk-related or CI-related, that might be logged on the controller or elsewhere in the configuration.
Also check the network and other cluster communications controllers that might be present.
FWIW, RAID5 has an enormous I/O load during rebuilds, too. IMHO with modern disk prices, RAID10 is often a better choice. And when you get rid of the quorum disk, I'd take a look at the whole of the CI storage connection, too, as that's old kit. Direct-attached SCSI might be a better choice for a one-node configuration, with a PCI RAID controller.
And yes, do get rid of the quorum disk.
06-18-2008 09:09 AM
Re: Quorum disk lost connection every two hours
If you have T4 running, zoom in on the 7th minute.
In particular, I would check the minute at 06:07 and 12:07 on 6/17, because it might show something happening without the lost-quorum noise.
I would also run a SHOW SYSTEM at 6 minutes past the hour and again at 8 minutes past, then 'subtract' the two for insight into process activity during those minutes.
Of course this is not unlikely to influence the problem ... it might even make it go away :-).
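The 'subtract' step could be sketched roughly as follows in Python. This does not parse real SHOW SYSTEM output: it assumes each snapshot has already been reduced to a dictionary keyed by PID holding the process name, the cumulative I/O count and the cumulative CPU seconds. The function name `activity_delta`, the PIDs, the process names and all the numbers are hypothetical, made up purely for illustration.

```python
def activity_delta(before, after):
    """Return (name, io_increase, cpu_increase) per process, busiest first.

    Both arguments map PID -> (process_name, io_count, cpu_seconds),
    where the counters are cumulative since process creation.
    """
    deltas = []
    for pid, (name, io, cpu) in after.items():
        if pid in before:
            _, io_before, cpu_before = before[pid]
            d_io, d_cpu = io - io_before, cpu - cpu_before
        else:
            # Process was created between the two snapshots.
            d_io, d_cpu = io, cpu
        if d_io or d_cpu:
            deltas.append((name, d_io, d_cpu))
    # Sort by I/O increase, then CPU increase, largest first.
    return sorted(deltas, key=lambda d: (d[1], d[2]), reverse=True)


# Hypothetical snapshots taken at :06 and :08 (all values invented).
at_06 = {
    "0000041A": ("JOB_A", 1000, 12),
    "0000041B": ("JOB_B", 500, 3),
}
at_08 = {
    "0000041A": ("JOB_A", 1000, 12),
    "0000041B": ("JOB_B", 4500, 9),
    "0000041C": ("JOB_C", 20, 1),
}

for name, d_io, d_cpu in activity_delta(at_06, at_08):
    print(f"{name}: +{d_io} I/O, +{d_cpu}s CPU")
```

Processes whose counters did not move drop out of the result, so whatever woke up around the seventh minute stands out.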
Finally, has it been behaving like this 'for ever'? When did it start? What had changed around that time?
Hein.
06-18-2008 12:11 PM
Re: Quorum disk lost connection every two hours
Parameter Name   Current        Default  Minimum  Maximum     Unit         Dynamic
--------------   -------        -------  -------  -------     ----         -------
VAXCLUSTER       2              1        0        2           Coded-value
EXPECTED_VOTES   2              1        1        127         Votes
VOTES            1              1        0        127         Votes
DISK_QUORUM      "$1$DUA182 "   " "      " "      "ZZZZ"      Ascii
QDSKVOTES        1              1        0        127         Votes
QDSKINTERVAL     3              3        1        32767       Seconds
ALLOCLASS        1              0        0        255         Pure-number
LOCKDIRWT        1              0        0        255         Pure-number
CLUSTER_CREDITS  32             32       10       128         Credits
NISCS_CONV_BOOT  0              0        0        1           Boolean
NISCS_LOAD_PEA0  1              0        0        1           Boolean
MSCP_LOAD        1              0        0        16384       Coded-value
TMSCP_LOAD       0              0        0        3           Coded-value
MSCP_SERVE_ALL   1              4        0        -1          Bit-Encoded
TMSCP_SERVE_ALL  0              0        0        -1          Bit-Encoded
MSCP_BUFFER      1024           1024     256      -1          Coded-value
MSCP_CREDITS     32             32       2        1024        Coded-value
TAPE_ALLOCLASS   1              0        0        255         Pure-number
NISCS_MAX_PKTSZ  8192           8192     576      9180        Bytes
CWCREPRC_ENABLE  1              1        0        1           Bitmask      D
RECNXINTERVAL    20             20       1        32767       Seconds      D
NISCS_PORT_SERV  0              0        0        256         Bitmask      D
MSCP_CMD_TMO     0              0        0        2147483647  Seconds      D
LOCKRMWT         5              5        0        10          Pure-number  D
Disk $1$DUA182: (HSJ004), device type MSCP served SCSI disk array, is online,
    mounted, file-oriented device, shareable, served to cluster via MSCP Server,
    error logging is enabled.

    Error count               0               Operations completed            12140682
    Owner process             ""              Owner UIC                       [SYSTEM]
    Owner process ID          00000000        Dev Prot                        S:RWPL,O:RWPL,G:R,W
    Reference count           1722            Default buffer size             512
    Current preferred CPU Id  0               Fastpath                        1
    Total blocks              17763835        Sectors per track               64
    Total cylinders           6939            Tracks per cylinder             40
    Logical Volume Size       17763835        Expansion Size Limit            18505728
    Host name                 "HSJ004"        Host type, avail                HSJ5, yes
    Alternate host name       "HSJ005"        Alt. type, avail                HSJ5, yes
    Allocation class          1

    Volume label              "CL1_RD09_182"  Relative volume number          0
    Cluster size              18              Transaction count               896
    Free blocks               5740218         Maximum files allowed           467469
    Extend quantity           5               Mount count                     1
    Mount status              System          Cache name                      "_$1$DUA182:XQPCACHE"
    Extent cache size         64              Maximum blocks in extent cache  574021
    File ID cache size        64              Blocks in extent cache          573444
    Quota cache size          0               Maximum buffers in FCP cache    4240
    Volume owner UIC          [1,1]           Vol Prot                        S:RWCD,O:RWCD,G:RWCD,W:RWCD

  Volume Status: ODS-2, subject to mount verification, protected subsystems
      enabled, write-through caching enabled.

No activity on the HSJ50 consoles. No unusual network activity.
This appears to have started around the time that we upgraded from V7.3-2 to V8.3.
The machine is scheduled for a reboot tomorrow evening to remove the quorum disk, and for other changes, so the matter will be, as Spock would say, rendered academic.
06-18-2008 12:49 PM
Re: Quorum disk lost connection every two hours
I don't see anything obvious in the settings.
The usual shotgun approach for weirdness like this: check the HSJ firmware, the SRM firmware, and the OpenVMS ECOs.
But then if you're removing the quorum disk, set your votes and expected votes and disk quorum values appropriately, and be done with it.
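To make the vote arithmetic concrete, here is a small Python sketch. It assumes the quorum formula given in the OpenVMS Cluster documentation, quorum = (EXPECTED_VOTES + 2) / 2 rounded down, and a single voting node as in this thread. The "with quorum disk" values are taken from the parameter listing posted above; the "without quorum disk" values (EXPECTED_VOTES = 1, VOTES = 1) are an assumed target for a one-node configuration, not settings confirmed by the poster. The helper names are hypothetical.

```python
def quorum(expected_votes):
    """Quorum derived from EXPECTED_VOTES, using integer (truncating) division."""
    return (expected_votes + 2) // 2


def has_quorum(node_votes, qdsk_votes, expected_votes, qdsk_reachable=True):
    """True if the votes currently present meet the quorum value."""
    present = node_votes + (qdsk_votes if qdsk_reachable else 0)
    return present >= quorum(expected_votes)


# Current settings from the listing: VOTES=1, QDSKVOTES=1, EXPECTED_VOTES=2.
print(quorum(2))                                    # 2
print(has_quorum(1, 1, 2, qdsk_reachable=True))     # True
print(has_quorum(1, 1, 2, qdsk_reachable=False))    # False: activity stalls

# Assumed settings after removing the quorum disk: VOTES=1, EXPECTED_VOTES=1.
print(quorum(1))                                    # 1
print(has_quorum(1, 0, 1, qdsk_reachable=False))    # True
```

With the posted settings, the node's single vote is not enough on its own, which is consistent with activity pausing whenever the quorum disk connection is reported lost. Under the assumed post-removal settings, the node alone satisfies quorum.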
06-25-2008 12:52 AM
Re: Quorum disk lost connection every two hours
How many nodes are in the cluster?
Do all nodes run the same VMS version?
AvR
06-25-2008 05:43 AM
Re: Quorum disk lost connection every two hours