Operating System - OpenVMS

Disk Mount Verification during a backup

 
SOLVED
AnnieLabFish
Occasional Advisor

Hi,

 

I'm going to start by saying that my experience with VMS is limited to doing backups and a few other things that I've picked up along the way while troubleshooting. 

 

I perform a backup of our VMS servers every 6 months.  The first 2 servers were fine, but the 3rd server keeps reporting that a disk is not mounted, that a mount is in progress, and then that the mount is complete, starting about 25-30 minutes into the backup.

 

We have been doing the backups the same way for 8 years. 

 

I perform the backup as follows:

 

Boot from the VMS CD

Init the tape drive (mka600)

 

mount/for mka600:hrmbd1

 

mount/over=id dka100:

 

backup/rewind/init/image/verify  dka100: mka600:hrmbd1

 

It runs perfectly on DKA0: but when I do it on DKA100 on this server it's a problem.

 

The only way out of it is to halt the system.  When I boot from the CD again it tells me that DKA100 was improperly dismounted and is repairing.  I have seen that before on other servers and have not had this issue.

 

I did do an analyze/disk_structure and I got a cryptic message about dir;2 is not dir;1 (or something like that) and when I tried to repair it, I couldn't. 

 

I also checked the ERRORLOG and the only stuff in there was from 25 AUG 2011 and then something from 2009, 2007 and 2004 when the system was installed.

 

When I do SHOW DEVICE I see that it's mounted. 

 

I'm not having any issues booting or running the system;  it's just an issue when I back up DKA100:.

 

I have replacement hard drives on the way just in case it's hardware related. 

 

 

Has anyone seen this?  Any help is appreciated.

13 REPLIES
GuentherF
Trusted Contributor

Re: Disk Mount Verification during a backup

Next time do a "$ DISMOUNT/ABORT/NOCHECK DKA100:" so you don't have to reboot the system.

 

This could be caused by a connectivity problem. Is this FC or direct SCSI? What kind of storage subsystem or controller?

 

Does the same backup work when you booted off the system disk?

 

You can try a "stress test" on this disk using "ANALYZE/DISK/READ DKA100:" which reads ALL files on this disk.
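
A minimal sketch of those checks, assuming the disk is DKA100: and you are at a DCL prompt on the running system with the disk mounted.  The device name comes from the original post; the comments are only a reading of what each command reports.

$ SHOW DEVICE/FULL DKA100:                      ! look at the "Error count" field for this disk
$ SHOW ERROR                                    ! lists devices whose error count is greater than zero
$ ANALYZE/DISK_STRUCTURE/READ_CHECK DKA100:     ! full spelling of ANALYZE/DISK/READ; reads all allocated blocks

If the error count on DKA100: climbs while the read check runs, that points at the disk or the path to it rather than at BACKUP.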

 

/Guenther

AnnieLabFish
Occasional Advisor

Re: Disk Mount Verification during a backup

Guenther,

 

It's SCSI.

 

Thank you.  I will try your suggestions.

 

I can try running the backup differently.  I will let you know.  I won't be able to work on the system until 6/11 so I will update after that.

Thanks!

Annie

Steven Schweda
Honored Contributor

Re: Disk Mount Verification during a backup

 
Bob Blunt
Respected Contributor

Re: Disk Mount Verification during a backup

The error message you got about .DIR;2 means that you have two versions of a file with the same name, and FILES-11 will NOT allow two versions of .DIR files.  I would check the two files to see if they're both really directory files.  You might first try just doing a $ DIR/FULL for both files and checking the file attributes.  Sometimes users will create an output file of a directory listing and name it .DIR.  ANA/DISK doesn't understand this because it believes that all files with a .DIR extension MUST be directory files, and it knows there can't be two versions.

IF the .DIR;2 file is NOT a directory then you can rename it with extreme prejudice to anything you desire (except .DIR).  IF, however, it IS a directory file then it will need to keep the .DIR extension but you'll need to give it a new filename.

Another utility that you might use to check the contents (if the .DIR;2 isn't obviously a text file or something non-directory) is $ DUMP/DIRECTORY filename.dir;2.  This might generate some errors if the file isn't a directory, but that way you will know better what to do.
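
As a rough sketch, the two checks described above might look like this.  The path DKA100:[SOMEDIR]NAME.DIR is a made-up placeholder, not a file from this system; substitute whatever name ANALYZE/DISK actually reports.

$ DIRECTORY/FULL DKA100:[SOMEDIR]NAME.DIR;*     ! hypothetical path; a real directory lists "Directory file" under File attributes
$ DUMP/DIRECTORY DKA100:[SOMEDIR]NAME.DIR;2     ! hypothetical path; formats the file contents as directory entries

Both commands only read the files, so they should be safe to run before deciding whether to rename anything.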

 

As far as issues with your disk in MV?  You won't have the benefit of any patches when booting from the installation CD so IF there were any issues with mount verification on a configuration like yours you won't have the luxury of working around it if there's an inherent problem that a patch might fix.  But while we're at it...could you please give some more details about your configuration and what O/S version you're using?  There will also be an issue in your hardware configuration with some contention on the SCSI bus.  It looks like all or most all of your disks are connected to the A SCSI bus as is your tapedrive.  When you issue your BACKUP command you're, AT LEAST, doubling your I/O onto that single SCSI bus and sometimes that can overwhelm a sensitive device on the bus.  DKA100 might be on the borderline about to expire and when you load it down doing BACKUP that might be enough stress that it starts flaking out.

 

bob

AnnieLabFish
Occasional Advisor

Re: Disk Mount Verification during a backup

 Steven,

Thanks for the info.  Sorry about the lack of detail-- I hadn't planned on posting anything about this until after I had left the server and it's not an easy place to get to.

 

I'll look into the information you gave me about the directory and try to get you more details to munch on.

 

Annie

 

 

AnnieLabFish
Occasional Advisor

Re: Disk Mount Verification during a backup

Bob,

Your explanation about the .DIR;2 helps me make some sense of things.  The only time someone worked on the server since the last backup (November) was a few weeks ago--and it was the vendor.  The error message wasn't providing me with a "filename.dir;2", just ".dir;2" so I am going to have to do some digging around to figure out where it's coming from. 

 

Some background and additional info...

 

It is my understanding that I boot to the CD just to get to the option that allows me to enter DCL commands, which is where I do my backup.  This allows me to do a standalone backup.  (I hope I don't sound stupid--I was given this system a few years ago and I rarely work on the servers in this way.)  The instructions for the backups were given to me from the system owner before me and those came from the system's vendor. 

 

The system communicates with monitoring devices in the field and then sends it to workstations in 5 locations where the data is viewed.  All of the data/software/programming that handles that communication is located on DKA100.  The server was running flawlessly prior to failing over and running the backup.

 

Here's what I know:

The server is an Alpha DS15

It's running HP Open VMS Alpha Operating System Version 7.3-2

 

Once the backup has been running for about 25 minutes I see this:

 

%system-I-mountver, DKA100: is offline, mount verification in progress

%system-I-mountver, DKA100: has completed mount verification

 

And it repeats...until I press the Halt button on the server, which then returns it to the SRM(?) console.  I then start over. 

 

I let it run for almost 2 hours (the backup usually takes 1:20-1:30) to see if it was still trying to back up but just giving me that message while it was backing up.  The tape drive light also stops flashing and the hard disk activity light stops flashing constantly, which is a pretty good indication that it's not backing up.

 

My thoughts are that I have a disk issue or a SCSI controller issue but I don't know if I can find that out in VMS and if I can, how?

 

I work in an environment where, due to the sensitive nature of the data and infrastructure, I'm unable to be super descriptive.  Since I'm not sure what I'm allowed to copy/paste, I'm being cautious.  I hope this is enough information for now. 

 

Annie

 

 

H.Becker
Honored Contributor

Re: Disk Mount Verification during a backup

The message from ANALYZE/DISK looks like
%ANALDISK-W-DIRNAME, directory file [000000]X.DIR;2 is not named '.DIR;1'
or
%ANALDISK-W-DIRNAME, directory file [000000]EGON.LIS;4711 is not named '.DIR;1'
 
As others said, it usually helps to show the actual message, at least to avoid some confusion.
 
It is not easy to create a directory without a name, that is with a full name like [000000].DIR;1. Neither $ CREATE/DIR nor $ RENAME of an existing directory file will do this.
 
Anyway, it is a warning and it is not related to any MOUNTVER message/problem. The MOUNTVER indicates a problem with accessing the disk. That may be a hardware problem with the disk itself or the path to the disk.
 
On the other hand, FILES-11 doesn't really care whether there are more versions of a directory file with type .DIR. It even doesn't care about the file type being .DIR. It just maintains different headers with different file IDs for different versions. By default you can only access files in the directory with the directory file named .DIR;1 - that is with the usual [directory_name] specification. But after a $ SET PROC/PARSE=EXTENDED you can use DIDs and get to both .DIR versions or any other .TYPE without any significant problem.
 
And to repeat what was already said, if you experience disk problems, you don't want to manipulate any data, not even with ANALYZE/DISK/REPAIR, before you are sure that you have saved as much as you can of the precious data from that disk.

AnnieLabFish
Occasional Advisor

Re: Disk Mount Verification during a backup

I have an update to my original post.

 

I have to change some file names, etc so if it seems a little generic...

 

The message received about dir;2, when using ANALYZE/DISK DKA100: is:

 

Directory File [ABC] ORANGES.DIR;2 is not named DIR;1

 

Today when I ran it, I also got:

 

File (165, 1, 0)  [ABC.ORANGES.COMMAND] Get_Tran_files.com;1

      revision date is in the future

 

There are quite a few of these messages, all with different numbers in the () and different file names.

 

I ran the backup from the CD again and after 22 minutes I started receiving the:

 

%system-I-mountver, DKA100: is offline, mount verification in progress
%system-I-mountver, DKA100: has completed mount verification

 

I ran the backup again, this time from the system, and after 23 minutes, the tape drive stopped blinking and the hard drive light, which had been blinking steadily, stopped blinking (or the light was blinking but it was dim).

 

I have the new hard drive in hand and will be heading up to work on it again on 6/13.

 

I am not as concerned about the DIR issue if it's not related to my backup issue.  I need to get this drive backed up and then I can troubleshoot the DIR issue.  If it is related, then obviously I want to fix that first.

 

Thanks,

Annie

abrsvc
Respected Contributor

Re: Disk Mount Verification during a backup

There appears to be either a problem with reading the disk itself or there is corruption.  Since the problem shows up with backup, perhaps a physical backup will help here.  At least that will read the blocks WITHOUT interpreting them at all.  This should make an exact copy of the drive.  While this will not address a reading problem, it should allow you to make a copy.  This will give you a back out option by at least having a duplicate disk.  Please note that the copy target drive needs to be an exact match to the source in model and size.
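
A minimal sketch of such a physical copy, assuming the spare drive shows up as DKA200: (a hypothetical device name; check SHOW DEVICE for the real one) and mounting both volumes /FOREIGN, which is the usual practice for BACKUP/PHYSICAL:

$ MOUNT/FOREIGN DKA100:                 ! source disk, not mounted as a Files-11 volume
$ MOUNT/FOREIGN DKA200:                 ! hypothetical target; its existing contents are overwritten
$ BACKUP/PHYSICAL DKA100: DKA200:       ! block-by-block copy, no file structure interpretation

Double-check which device is the source and which is the target before running the last command, since the copy replaces everything on the target.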

 

If the error messages are changing each time you attempt this, I would suspect a failing drive rather than file corruption.

 

Dan

Steven Schweda
Honored Contributor

Re: Disk Mount Verification during a backup

 
Jeremy Begg
Trusted Contributor

Re: Disk Mount Verification during a backup

Hi Annie,

 

I'd say with some confidence that your DKA100 is faulty.  If you don't see any errors on this disk when VMS is running normally, but only when running your backup procedure, I'd guess that the failed disk blocks are not being touched by your application in normal operation.

 

An easy way to test this is to try running an "online" backup of that disk, i.e. while VMS is running (not booted from the CD-ROM) ...

 

$ MOUNT/FOR MKA600:

$ BACKUP/IMAGE/VERIFY DKA100: MKA600:TEST.BCK/REWIND/LABEL=TEST/IGNORE=(LABEL,INTERLOCK)

 

You will probably see messages about "file is open for write by another user" and maybe "verification error in block ..." but these are expected when you back up a disk which is in active use.  It will be interesting to see if DKA100 goes into mount verification while this backup is running.

 

As has been said earlier, a .DIR;2 file is a problem.  The filesystem is quite happy to create a .DIR;2 file but it will never be treated as a directory: a directory must be a filename ending in .DIR;1.  (There are other attributes also which I won't go into here.)

 

There are all sorts of ways this might come about; I most often see it when an application or software installation creates a temporary or work directory and then tries to rename the new directory to its final name - without first checking to see if another directory of that name already exists.

 

Rather than trying to repair this error using ANALYZE I suggest you simply rename the .DIR;2 file, e.g.

 

$ SET PROC/PRIV=BYPASS

$ RENAME DKA100:[ABC]ORANGES.DIR;2 FREDNURK.DIR;1

 

then see what it contains ...

 

$ DIR/DATE/SIZ=ALL/WID=FIL:30 DKA100:[ABC.FREDNURK]

 

(assuming of course there isn't already a [ABC.FREDNURK] directory!)  You can then decide what needs to be done with the directory and its files.

 

Regards,

Jeremy Begg

 

AnnieLabFish
Occasional Advisor
Solution

Re: Disk Mount Verification during a backup

Thank you all for your help.  The problem is resolved.

 

I tried running an online backup and about 23 minutes into the backup, the tape drive stopped responding and the hard drive light stopped flashing. 

 

I spent yesterday installing the new hard drive.  I was able to restore the drive from a successful backup from 6 months ago.  The only changes on the system were performed a month ago and the files that were changed resided on our development server for the system. 

 

I backed up the server after verifying that it was up to date and the backup performed flawlessly.  The server has been running, in control of the system, for 16 hours with no issue.

 

I looked at the DIR;2 directory and it is an exact copy of the DIR;1 version of itself.  I will be in contact with our vendor as it is a directory that their system uses and the other server in the system has only one copy of that directory.  

 

Thanks again.  All of your troubleshooting suggestions helped immensely and helped me get to the solution.

 

Annie