Operating System - OpenVMS

Mount verification problem

 
SOLVED
Fredrik.eriksson
Valued Contributor

Hi :)

I have a little issue that doesn't really disturb day-to-day usage but is quite annoying for those who monitor the systems.

Every day around 16:00 GMT+1 I get between 4 and 10 errors in my dsa0:[sys0.sysmgr]operator.log saying that a bunch of disks are offline. This happens just once for each disk, and directly afterwards the log says mount completed and then that mount verification is in progress (see the attached log snippet). Every time this happens the disks add 1 to their error counts, so in our PSW it shows up as a high-priority error.
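To see exactly when each disk was reported offline, the messages can be pulled out of a copy of operator.log with a short script. The following is a minimal Python sketch, not a tested tool: the banner layout and the "is offline" wording it matches are assumptions about how OPCOM writes these entries, and the function name `offline_events` is hypothetical. Adjust both patterns to match what your log actually contains.

```python
import re
from datetime import datetime

MONTHS = {m: i + 1 for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split())}

# ASSUMED banner layout: "... OPCOM  dd-MMM-yyyy hh:mm:ss.cc ..."
BANNER = re.compile(
    r"OPCOM\s+(\d{1,2})-([A-Z]{3})-(\d{4})\s+(\d{2}):(\d{2}):(\d{2})")
# ASSUMED message wording: "Device <name>: (...) is offline."
OFFLINE = re.compile(r"Device\s+(\S+)\s.*is offline", re.IGNORECASE)


def offline_events(lines):
    """Yield (timestamp, device) for every 'is offline' message.

    Each message is stamped with the most recent banner seen before it.
    """
    stamp = None
    for line in lines:
        banner = BANNER.search(line)
        if banner:
            day, mon, year, hh, mm, ss = banner.groups()
            stamp = datetime(int(year), MONTHS[mon], int(day),
                             int(hh), int(mm), int(ss))
            continue
        hit = OFFLINE.search(line)
        if hit and stamp is not None:
            # Drop the trailing colon of the device name, if present.
            yield stamp, hit.group(1).rstrip(":")
```

Called as `list(offline_events(open("operator.log")))` on a copy of the log, this returns one (timestamp, device) pair per offline report, which makes it easier to check whether the events really cluster around 16:00.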

Is this something anyone has seen before?

The machines in question are clustered and connected to a shared SAN, and they both run OpenVMS 7.3-2.

Best regards
Fredrik Eriksson
8 REPLIES
Phil.Howell
Honored Contributor

Re: Mount verification problem

With StorageWorks controllers, this was an indication that the controller's battery needed replacing.
Phil
Kumar_Sanjay
Regular Advisor

Re: Mount verification problem

Mount verification is triggered by certain classes of disk I/O errors; not every I/O error or fatal error results in mount verification. OpenVMS tries to re-establish or verify the I/O path to, and the contents (volume label, etc.) of, the disks that incurred the I/O error. Generally, a transient disruption in the SAN is a frequent cause of mount verifications that are immediately resolved. If you are only seeing a few of these a month per device, I wouldn't consider it serious.

If you want to analyze these mount verifications further and find the underlying reason, concentrate on all components in the I/O path to the disks.

Thanks.
Sanjay
Fredrik.eriksson
Valued Contributor

Re: Mount verification problem

Thank you both for your answers.

First, it is at least a Compaq StorageWorks RAID Array. How do you easily check the battery state, and where is this battery located?

Second, this does not occur a couple of times a month. It occurs daily, around 16:00. I've talked to our datacenter guys and they say they aren't doing anything around that time or with these machines.

I've tried to find a consistent pattern in when this recurs, but the timing varies too much. Some days it's pushed forward by about 25 seconds per disk per error, and sometimes more or less.
The only thing that really recurs is that it comes back every day around 16:00.
There are no batch jobs running around that time, and I can't seem to find any process doing anything special either.

I concluded a while back that these errors are harmless, but as I said, they're an annoyance for our monitoring group since they show up all the time.

I'll attach my statistics file, in which I totaled the number of errors per day and how many seconds earlier or later each device reported its error compared to the day before.
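The kind of tally described above (errors per day, plus the shift in seconds relative to the previous day) could be computed roughly as follows. This is a small Python sketch under stated assumptions: the function name `daily_drift` is hypothetical, the timestamps are assumed to have been extracted from the log already, and the drift is measured on the first error of each day, which may not be how the attached statistics file was built.

```python
from datetime import datetime


def seconds_into_day(ts):
    """Time of day expressed as seconds since midnight."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def daily_drift(timestamps):
    """Return a list of (date, error_count, drift_seconds) rows.

    drift_seconds is the time of day of that day's first error minus the
    time of day of the first error on the previous day that had errors.
    It is None for the first day, which has nothing to compare against.
    """
    first = {}   # date -> earliest timestamp seen that day
    count = {}   # date -> number of errors that day
    for ts in sorted(timestamps):
        day = ts.date()
        count[day] = count.get(day, 0) + 1
        first.setdefault(day, ts)

    rows = []
    previous = None
    for day in sorted(first):
        tod = seconds_into_day(first[day])
        drift = None if previous is None else tod - previous
        rows.append((day.isoformat(), count[day], drift))
        previous = tod
    return rows
```

A positive drift means the errors started later in the day than on the previous day with errors. A roughly constant positive drift would point at a timer that runs slightly longer than 24 hours rather than at a scheduled job on the host.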

Best regards
Fredrik Eriksson
Phil.Howell
Honored Contributor
Solution

Re: Mount verification problem

For HSZ controllers, you connect a VT terminal to the controller's serial port and type "show this" and "show other".
There is also a utility called HSZTERM that can be used in the same way.
More recent controllers may have a web interface.
Phil

Fredrik.eriksson
Valued Contributor

Re: Mount verification problem

Okay, I do have the hszterm$scsipad.exe, but I'm unsure how to use it and there doesn't seem to be much documentation around for it.
When I run set host/scsi it asks me for a device, and when I've supplied one it just returns "error activating image HSZTERM$SCSIPAD" and "Image file not found $1$DKA0:[SYS0.SYSCOMMON.][SYSEXE]HSZTERM$SCSIPAD.EXE;"

Am I giving it the wrong device name?
I found a "console template" named hsz10, so I was assuming that this is the name of the device?

Sorry if I'm not doing everything right, I'm quite the newbie at OpenVMS system management.

Best regards
Fredrik Eriksson
Ian Miller.
Honored Contributor

Re: Mount verification problem

HSZTERM was used for the older SCSI-connected controllers such as the HSZ70. If you have one of those, put the .exe in SYS$SYSTEM, and then SET HOST/SCSI DKxxx should work. It's not particularly reliable, nor is it supported, but it usually works. Then, at the array command prompt:
RUN FMU
FMU> SHOW LAST_FAILURE MOST_RECENT


If you have an FC-connected array, you need different software.
____________________
Purely Personal Opinion
Fredrik.eriksson
Valued Contributor

Re: Mount verification problem

Thank you guys :)

I got the terminal to work against my StorageWorks RA, and it told me what I needed to know:

Cache battery is near its end of life, it should be replaced SOON. Run frutil to replace.
Mirror cache battery is near its end of life, it should be replaced SOON. Run frutil to replace.

This should be easily solved by just replacing the batteries :)

Best regards
Fredrik Eriksson
Fredrik.eriksson
Valued Contributor

Re: Mount verification problem

This solved my problem; at least, I'm quite sure it did.

I had an HP guy here who helped me change the batteries in my ESA10000/RA8000 (I found out that the HSZ80 is just the controller for the unit).

Anyway, thank you all for the help and the suggestions :)

Best regards
Fredrik Eriksson