Troubleshooting Mount Verification
02-19-2008 09:05 AM
The mount verification messages relate to one volume only, and appear only when that volume is under load. I understand that mount verification happens when there is an I/O delay somewhere along the path.
The mount verifications are nearly instant, but there are many. Below is an example of a typical OPCOM message.
%%%%%%%%%%% OPCOM 18-FEB-2008 08:56:10.17 %%%%%%%%%%%
Device $5$DKA3: (SCORPO PKD) is offline.
Mount verification is in progress.
%%%%%%%%%%% OPCOM 18-FEB-2008 08:56:10.18 %%%%%%%%%%%
Mount verification has completed for device $5$DKA3: (SCORPO PKD)
The fact that the mount verifications only happen on one volume (there are similar load characteristics on other volumes hosted on the same HSZ80) suggests that there is a problem with a disk or possibly a shelf.
Does anyone have any suggestions on how I might troubleshoot this? The HSZ80 doesn't show any obvious errors when I log in.
Apart from the mount verification messages there appear to be no additional impacts, though I haven't analyzed the performance of this volume against the others.
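To put numbers on "nearly instant, but there are many", the OPCOM messages can be paired up and timed. The following is a minimal Python sketch, assuming the operator log looks exactly like the two messages quoted above; the function names and the regular expressions are illustrative and may need adjusting for other message layouts.

```python
import re
from datetime import datetime

# Sample text copied from the OPCOM messages quoted in the post above.
SAMPLE = """\
%%%%%%%%%%% OPCOM 18-FEB-2008 08:56:10.17 %%%%%%%%%%%
Device $5$DKA3: (SCORPO PKD) is offline.
Mount verification is in progress.
%%%%%%%%%%% OPCOM 18-FEB-2008 08:56:10.18 %%%%%%%%%%%
Mount verification has completed for device $5$DKA3: (SCORPO PKD)
"""

MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()
STAMP = r"(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{2})"
HEADER = re.compile(r"^%+\s+OPCOM\s+" + STAMP + r"\s+%+$")
OFFLINE = re.compile(r"^Device (\S+): .* is offline\.")
COMPLETED = re.compile(r"^Mount verification has completed for device (\S+):")


def parse_header(match):
    """Convert the captured OPCOM timestamp fields into a datetime."""
    day, mon, year, hh, mm, ss, cc = match.groups()
    return datetime(int(year), MONTHS.index(mon) + 1, int(day),
                    int(hh), int(mm), int(ss), int(cc) * 10000)


def mount_verifications(log_text):
    """Return (device, start, duration_seconds) per completed verification."""
    pending = {}  # device name -> time its "offline" message was seen
    events = []
    stamp = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            stamp = parse_header(header)
            continue
        if stamp is None:
            continue
        offline = OFFLINE.match(line)
        if offline:
            pending[offline.group(1)] = stamp
            continue
        done = COMPLETED.match(line)
        if done and done.group(1) in pending:
            start = pending.pop(done.group(1))
            events.append((done.group(1), start,
                           (stamp - start).total_seconds()))
    return events


for device, start, seconds in mount_verifications(SAMPLE):
    print(device, start, seconds)
```

Counting the returned events per device and per hour would show whether the verifications really do track load on this one volume.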
02-19-2008 10:42 AM
Re: Troubleshooting Mount Verification
A disk enters mount verification if an I/O completes with one of a certain class of retryable I/O error status values. Are any errors being logged against the SCSI adapter (PKD) or the disk itself?
Volker.
02-19-2008 12:11 PM
Re: Troubleshooting Mount Verification
After I made the original post I started trying to dig up more information on the disk errors. Unfortunately I found that my SEA web interface wouldn't connect correctly to the server to bring up any details. I thought I had verified the WEBES toolset after our 8.3 upgrade, but perhaps I need to move to the newest version of WEBES. I'm too easily sidetracked, it appears.
Using analyze/error/elv to translate the errorlog I see the errors. Nothing from the actual report jumps out at me, but in a full translate there's a lot of untranslatable binary data. Within that data I see references to $5$DKA, HSZ80, and what might be serial numbers. I'll have to see if those match up with anything in the storage system.
I've attached a text file with an example error.
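Picking readable fragments out of an untranslated binary event body can be done mechanically, much like the Unix `strings` utility. The sketch below is an illustration only: the function name is hypothetical, and the sample bytes are invented around the `$5$DKA3` and `HSZ80` strings mentioned above rather than taken from the attached error log.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Invented sample: binary padding around two strings named in the post.
sample = b"\x00\x01$5$DKA3\x00\xff\xfeHSZ80\x00\x00ab\x00"
print(printable_strings(sample))
```

Any candidate serial numbers found this way still have to be matched against the controller and drive inventory before they mean anything.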
02-19-2008 01:25 PM
Solution
As for the posted error log entry, the "Dump untranslatable event body" message means that one of the other error-reporting tools will be needed here. This is usually one of ANALYZE /ERROR /ELV, SEA, or the DIAGNOSE (DECevent) tools.
The device type code in the posted dump is that of a 36 GB Ultra(3) SCSI disk. Old. Probably 180726-003 or related Universal brick, in a 4314R or 4354R series or similar shelf, IIRC. At somewhere between about US$25 and US$75 for a spare on the used-disk market (plus shipping), I'd preemptively swap it.
I might well look to replace the shelf, and retire the whole lot of bricks with something slightly newer.
FWIW, here's a page of links to disk MTBF patterns found in various large-scale device surveys:
http://64.223.189.234/node/93
Stephen Hoffman
HoffmanLabs LLC
02-19-2008 01:54 PM
Re: Troubleshooting Mount Verification
There appears to be a serial or drive #, but it's not unique to drives in the raidset.
Perhaps when I get SEA back up and running it will provide further information.
As for replacements, there are a great many shelves and bricks to be replaced. The replacement for this particular storage is already spec'd, and will come as FATA disks in an EVA.
02-19-2008 02:23 PM
Re: Troubleshooting Mount Verification
Do these mount verifications come at a regular interval, maybe every 5 minutes? Do you have the latest fibre-SCSI ECO installed?
I seem to recall seeing periodic mount verifications on either HSx devices or on devices attached to certain models of SCSI controllers (maybe the shared differential controllers). I was thinking this was a V8.2 with/without ECO kind of "feature".
Does this sound familiar to anyone else?
Bill
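Whether the events arrive on a fixed period can be checked from the OPCOM timestamps alone. This is a rough Python sketch; the function name, the 15-second tolerance, and the sample timestamps are all made up for illustration, and only the 5-minute figure comes from the question above.

```python
from datetime import datetime
from statistics import median


def interval_summary(timestamps, expected_s=300.0, tolerance_s=15.0):
    """Summarise gaps between consecutive events and how many sit near expected_s."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return {"gaps": [], "median_s": None, "near_expected": 0.0}
    near = sum(1 for g in gaps if abs(g - expected_s) <= tolerance_s)
    return {"gaps": gaps, "median_s": median(gaps),
            "near_expected": near / len(gaps)}


# Hypothetical event times, not taken from the poster's system.
times = [
    datetime(2008, 2, 18, 8, 56, 10),
    datetime(2008, 2, 18, 9, 1, 11),
    datetime(2008, 2, 18, 9, 6, 9),
    datetime(2008, 2, 18, 9, 30, 0),
]
print(interval_summary(times))
```

A fraction close to 1.0 would point toward a timer-driven cause such as the ECO-related behaviour described above, while scattered gaps fit the load-related pattern from the original post better.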
02-19-2008 02:36 PM
Re: Troubleshooting Mount Verification
Some substring of that ZG91400368V83Z string looks good as a serial number.
Also see if the HSZ80 error log and device configuration data have anything relevant, including any related error messages. (SWCC stuff is linked here: http://64.223.189.234/node/564)
I've also seen bad slots in shelves, flaky firmware, and bad controllers, and I know of a failing (failed?) disk that was so far out of balance, and shook so badly, that it caused seek problems across other disks in the same mounting. But an old 36 GB disk brick looks like the best of the usual suspects here.
Or get somebody in to sort this out for you.
Stephen Hoffman
HoffmanLabs LLC
02-19-2008 05:53 PM
Re: Troubleshooting Mount Verification
If you are getting hardware errors, they should be generating event logs on the console port of the controller that is the "master" for the RAID storage set. Those event logs will include the PTL (port, target, LUN) of the device that is getting errors, if that is the cause.
Is SCORPO part of a cluster? Is $5$DKA3: the quorum disk? When we had our quorum disk on the HSZ70, it was not uncommon to get mount verification messages when the quorum disk was backed up using the "old" recommendation to give BACKUP as many resources as possible, so the disk queues could get quite long while backup was in use.
Jon
02-19-2008 08:06 PM
Re: Troubleshooting Mount Verification
An OpenVMS engineer somewhere back in the mists of time had decided that the quorum I/O would be queued with the lowest priority, which meant it was politely queued up behind the other typical I/O flying around.
This was a system-level cluster coordination function that involved one I/O every three seconds or so, and it led to badness when sequential quorum I/Os were missed or otherwise delayed during a BACKUP or other I/O storm. That degree of I/O queue deference didn't seem particularly sensible given the high cost and repercussions of missed quorum I/O and the infrequency of the quorum I/O, and that then led to a conversation with the then-maintainer of the quorum watcher.
Off-hand, I don't know if the priority in the IRP has changed.
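The effect of queueing the quorum I/O at the lowest priority can be pictured with a toy priority queue. This Python sketch is not a model of the OpenVMS I/O subsystem: the priority numbers, the request names, and the one-request-at-a-time service order are assumptions made purely to illustrate the deference described above.

```python
import heapq
from itertools import count


def service_order(requests):
    """requests: iterable of (name, priority). Higher priority is served first;
    equal priorities are served in arrival order. Returns names as served."""
    arrival = count()
    heap = [(-priority, next(arrival), name) for name, priority in requests]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical queue contents: a quorum read arriving among BACKUP reads.
queue = [("backup-1", 4), ("quorum", 0), ("backup-2", 4), ("backup-3", 4)]
order = service_order(queue)
print(order, "quorum served at position", order.index("quorum"))
```

With a deep enough queue of higher-priority requests, the quorum request in this toy model is always served last, which is the kind of delay that can add up to a missed quorum poll.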
02-19-2008 11:57 PM
Re: Troubleshooting Mount Verification
DECevent V3.4 is still the tool of choice for translating SCSI-related errors. Or install WEBES (SEA) on your laptop or PC and copy over ERRLOG.SYS for analysis.
ANAL/ERR/ELV is useless in most cases, as it does not translate the most interesting part of most errlog entries.
The 'ZGxxx' serial number is most likely from the HSZ80 itself.
The reason for the mount verifications seems to have been explained.
Volker.