odwillia
Frequent Advisor

Backup Issues

We are getting the attached errors on our backups. Can anyone tell me what they mean?

Thanks in advance.

40 REPLIES
Volker Halle
Honored Contributor
Solution

Re: Backup Issues

Hi,

I assume you're talking about the following error:

%BACKUP-F-PROCINDEX, error processing index file on DISK$DEVSCRATCH1:, RVN 1
-SYSTEM-F-VOLINV, volume is not software enabled

BACKUP seems to have had a problem reading INDEXF.SYS on the disk DISK$DEVSCRATCH1.

VOLINV indicates that the 'volume valid bit' is not set for that disk. Please check the status of that disk as seen from the node running your backup procedure:

$ SHOW DEVICE DISK$DEVSCRATCH1

If this error is intermittent, include the SHOW DEVICE command in your backup procedure so that the disk status is logged with every run.

Are there any errors on that disk? Any mount-verification messages regarding this disk?

Volker.
Robert Gezelter
Honored Contributor

Re: Backup Issues

odwillia,

I agree with Volker. I would highly recommend checking the error log for errors on this device.

- Bob Gezelter, http://www.rlgsc.com
odwillia
Frequent Advisor

Re: Backup Issues

For the device status I received the following message: MntVerifyTimeout. What steps are needed to fix this? I do not know much about VMS.
Volker Halle
Honored Contributor

Re: Backup Issues

MntVerifyTimeout

The path to the device has been lost for more than MVTIMEOUT seconds (3600 seconds, i.e. 1 hour, by default). The disk may have failed.

Check whether there are any files open on that disk. Does SHOW DEV DISK$DEVSCRATCH1 show a transaction count greater than 1 in the Trans Count column ?

If TransCount=1, try $ DISM/ABORT DISK$DEVSCRATCH1

Then try to mount the disk again with:

$ MOUNT/SYSTEM/NOASSIST disk-name DEVSCRATCH1

Be prepared for the mount to fail, if the disk has gone bad.
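
For anyone who wants to automate the transaction-count check described above on captured SHOW DEVICE output, here is a minimal sketch in Python. It assumes the standard column order for a mounted disk (device, optional host in parentheses, status, error count, volume label, free blocks, trans count, mount count) and a single-word status. The names `DiskLine`, `parse_show_device_line` and `has_open_files` are made up for this illustration, and the sample line is the DPA300: line posted further down this thread.

```python
from typing import NamedTuple


class DiskLine(NamedTuple):
    device: str
    status: str
    error_count: int
    label: str
    free_blocks: int
    trans_count: int
    mount_count: int


def parse_show_device_line(line: str) -> DiskLine:
    """Parse one mounted-disk line of SHOW DEVICE output.

    Assumes a single-word status (e.g. Mounted, MntVerifyTimeout);
    a two-word status such as "Mounted alloc" is not handled here.
    """
    fields = line.split()
    # Drop the optional "(NODE)" host field that follows the device name.
    if len(fields) > 1 and fields[1].startswith("("):
        del fields[1]
    if len(fields) != 7:
        raise ValueError(f"unexpected SHOW DEVICE line: {line!r}")
    device, status, errors, label, free, trans, mounts = fields
    return DiskLine(device, status, int(errors), label,
                    int(free), int(trans), int(mounts))


def has_open_files(disk: DiskLine) -> bool:
    # A transaction count greater than 1 suggests files are still open.
    return disk.trans_count > 1


sample = "DPA300: (NRCAVB) MntVerifyTimeout 3598 DEVDISK1 3758040 39 1"
disk = parse_show_device_line(sample)
print(disk.device, disk.status, "errors:", disk.error_count,
      "trans:", disk.trans_count)
print("files open:", has_open_files(disk))
```

This only reads the numbers; it does not decide whether a DISMOUNT/ABORT is appropriate on a production system.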

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

I guess the disk is bad, since I was not able to remount it after I dismounted it. I kept getting MOUNT-F-IVDEVNAM, invalid device name. Any other suggestions?
Volker Halle
Honored Contributor

Re: Backup Issues

Please show the full command used to mount the disk. 'Invalid device name' seems to imply that you somehow mistyped the name of the disk device!

Volker.
Robert Gezelter
Honored Contributor

Re: Backup Issues

odwillia,

It can be dangerous to diagnose problems like this over the phone. There are many possible causes, from a failed disk at one extreme to something as simple as a loose cable at the other.

If there are no obvious problems (e.g., flashing red lights), I would recommend getting experienced assistance. I have seen many easily recoverable situations become far worse when incorrect action is taken.

- Bob Gezelter, http://www.rlgsc.com
odwillia
Frequent Advisor

Re: Backup Issues

OK, so it was a typo. When I run the mount command it asks for the _label: prompt. The label should be the name under Volume Label, right? And what should the log name be?
Volker Halle
Honored Contributor

Re: Backup Issues

Search through your startup procedures for the actual MOUNT command used during startup.

Try $ SEARCH SYS$STARTUP:*.COM MOUNT,DEVSCRATCH1/MATCH=AND

Then use the mount command found.
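
To make clear what /MATCH=AND does: a line is reported only if it contains every search string, and the comparison is by substring and (by default) case-insensitive. Here is a small Python sketch of that behaviour, not of SEARCH itself. The function name `match_and` is made up, and the sample lines are commands quoted elsewhere in this thread.

```python
from typing import Iterable, List


def match_and(lines: Iterable[str], *terms: str) -> List[str]:
    """Return the lines that contain ALL terms, ignoring case."""
    wanted = [t.upper() for t in terms]
    return [ln for ln in lines if all(t in ln.upper() for t in wanted)]


lines = [
    "$ Mountxx/noass/sys/rebuild DPA302: DEVSCRATCH1 DEVSCRATCH1",
    "$ dismountxx/abort/over=check DISK$DEVSCRATCH1 !dpa302:",
    "$ SHOW DEVICE DISK$DEVSCRATCH1",
]

for hit in match_and(lines, "MOUNT", "DEVSCRATCH1"):
    print(hit)
```

Note that the dismount line also matches, because "dismountxx" contains the substring "mount". Expect the real search to return DISMOUNT commands as well as MOUNT commands.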

The label should be DEVSCRATCH1. Whether there had been an additional logical name (this is optional), I don't know.

If there had been a logical, you may be able to find it via:

$ PIPE SHOW LOG * | SEARCH SYS$PIPE

Use the physical disk name (e.g. DKA200:) as the search string.

OpenVMS has an extensive HELP system built in. If you see an OpenVMS error message (like SYSTEM-F-VOLINV), you can always obtain help for this error message with:

$ HELP/MESSAGE VOLINV
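
OpenVMS messages follow the pattern %FACILITY-S-IDENT, text, where S is the severity (S, I, W, E or F) and IDENT is the name to give to HELP/MESSAGE. Secondary messages start with a hyphen instead of a percent sign. If you are collecting messages from logs, a sketch like the following (Python, with made-up names `parse_vms_message` and `help_command`) splits them into their parts. The two sample messages are the ones from the start of this thread.

```python
import re
from typing import Dict, Optional

SEVERITY = {
    "S": "success",
    "I": "informational",
    "W": "warning",
    "E": "error",
    "F": "fatal",
}

MESSAGE_RE = re.compile(
    r"^[%-](?P<facility>[A-Z0-9$_]+)-(?P<severity>[SIWEF])-"
    r"(?P<ident>[A-Z0-9$_]+), (?P<text>.*)$"
)


def parse_vms_message(line: str) -> Optional[Dict[str, str]]:
    """Split a message line into facility, severity, ident and text."""
    m = MESSAGE_RE.match(line.strip())
    if m is None:
        return None
    parts = m.groupdict()
    parts["severity"] = SEVERITY[parts["severity"]]
    return parts


def help_command(ident: str) -> str:
    # The DCL command that looks up the message by its ident.
    return f"HELP/MESSAGE {ident}"


samples = [
    "%BACKUP-F-PROCINDEX, error processing index file on DISK$DEVSCRATCH1:, RVN 1",
    "-SYSTEM-F-VOLINV, volume is not software enabled",
]

for line in samples:
    msg = parse_vms_message(line)
    print(msg["facility"], msg["severity"], msg["ident"], "->",
          help_command(msg["ident"]))
```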

Hope this helps ;-)

Volker.
Hoff
Honored Contributor

Re: Backup Issues

An errant MOUNT or BACKUP command risks either not making a proper backup or overwriting existing disk data.

This is one of the most arcane and cryptic areas of OpenVMS, and there are very few "blade guards" here; data can get clobbered. BACKUPs can get trashed.

For purposes of a hand-entered BACKUP command, you need not use the logical name on the MOUNT command.

The logical name parameter on MOUNT is useful within a command procedure however, as it can be used as the target for all subsequent device references within the procedure. MYCORP_TARGET and MYCORP_SOURCE could be used as the logical names for the output and input devices, for instance.

The volume label must match the label assigned to the volume; otherwise you must use MOUNT /OVERRIDE or (as is often the case) MOUNT /FOREIGN.

Are there any bound-volume sets here?

Again, this area is data-hazardous. Please take the time to read and understand the DCL command syntax, and please consider practicing in a testing configuration where an errant command can trash data without harm to production. Do also look at the BACKUP command examples in the back of the BACKUP manual, as these provide many examples of the various sequences with BACKUP, and you can find and choose the particular command associated with what you want to do.

There are example BACKUP command procedures around that can be used as starting points, as well.

As for another potential hazard here, I see Rdb referenced. (Rdb itself isn't hazardous, but there are specific RMU commands needed to perform a successful and restore-able backup of an Rdb database. You can't use OpenVMS BACKUP directly on an Rdb database and expect to restore the database.)

And FWIW, the existing backup archives here potentially (probably?) contain silent data corruptions, too. (Those file interlocks that are being overridden were implemented for a reason, after all. Not because the engineers wanted to force folks to use another qualifier keyword on BACKUP.)

This whole area is comparatively ancient technology -- and not all that much past what RSX11M+ implemented. The UI and the tools are such that it is accordingly very easy to unintentionally corrupt critical data.

Stephen Hoffman
HoffmanLabs LLC
odwillia
Frequent Advisor

Re: Backup Issues

This is what I get when I do a show dev d:

DPA300: (NRCAVB) MntVerifyTimeout 3598 DEVDISK1 3758040 39 1
DPA301: (NRCAVB) MntVerifyTimeout 3598 DEVDISK2 646806 15 1

Volker Halle
Honored Contributor

Re: Backup Issues

So you're running 'HP RAID Software for OpenVMS'. DPA devices are software RAID devices.

You need to use appropriate $ RAID SHOW commands to find out about the structure of your RAID sets and the physical disks involved.

You can use $ RAID ANALYZE/ERROR_LOG to find RAID-related errlog entries.

There should also be a SYS$MANAGER:RAID$DIAGNOSTICS_nodename.LOG file with diagnostic messages.

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

I have attached the show raid results.
Volker Halle
Honored Contributor

Re: Backup Issues

This looks like a RAID 0 array configured over 6 disks of a Mylex controller, which has been partitioned into 3 virtual units.

All the units have had a lot of errors reported against them.

Please look at the diagnostics file mentioned earlier and try to find out what happened when.

The HP RAID Software for OpenVMS - Guide to Operations can be found here:

http://h30266.www3.hp.com/odl/vax/sysman/raidv30/raid_ops_guide.pdf

You will at least need to dismount the DPA: devices, which are in MntVerifyTimeout, then re-mount them with the MOUNT commands to be found somewhere in your system startup procedures. But first try to find out what happened.

Consider obtaining qualified help to prevent damage to your data if you're uncertain what to do...

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

$ SEARCH SYS$STARTUP:*.COM MOUNT,DEVSCRATCH1/MATCH=and
%SEARCH-I-NULLFILE, file SYS$SYSROOT:[SYSMGR]ADDOPER.COM;2 contains no records
%SEARCH-I-NULLFILE, file SYS$SYSROOT:[SYSMGR]ADDSYS.COM;2 contains no records

I got this error when I attempted to remount.

******************************
SYS$SYSROOT:[SYSMGR]NRC_MOUNT_DISKS.COM;21

$ Mountxx/noass/sys/rebuild DPA302: DEVSCRATCH1 DEVSCRATCH1

******************************
SYS$COMMON:[SYSMGR]SYSHUTDWN.COM;18

$ dismountxx/abort/over=check DISK$DEVSCRATCH1 !dpa302:
$ Mountxx/noass/sys/rebuild DPA302: DEVSCRATCH1 DEVSCRATCH1
%MOUNT-F-MEDOFL, medium is offline
Volker Halle
Honored Contributor

Re: Backup Issues

This may be a software or hardware problem. Use RAID ANAL/ERR, OPERATOR.LOG and SYS$MANAGER:RAID$DIAGNOSTICS_nodename.LOG to find out when and how this problem started. This may tell you what has failed and when.

Can you currently access DPA300: and DPA301: without problems ? Is only DPA302: giving you problems ?

Did you look at the drives behind the Mylex controller? Any yellow or red lights?

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

No yellow or red lights. I'm looking for the error logs now.
Volker Halle
Honored Contributor

Re: Backup Issues

When re-reading your previous replies, I see that all 3 DPA devices are/were in MntVerifyTimeout. This most likely indicates a problem with the Mylex controller itself or maybe the shelf the disks are in (power failure?).

As you are using a partitioned RAID 0 stripeset, the failure of ANY physical DRA disk will cause the whole array to become inoperative !

First make sure to check for your last GOOD backup of these 3 DPA: devices !
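
To illustrate why a single failed member takes down a whole RAID 0 set: consecutive chunks of the virtual unit are spread round-robin over all members, so almost any file ends up with pieces on every disk. The Python sketch below shows generic striping arithmetic only. It is not taken from the HP RAID Software documentation, the chunk size of 128 blocks is a hypothetical value, and the function names are made up. The member count of 6 comes from the RAID SHOW output discussed above.

```python
from typing import List, Set


def member_for_block(lbn: int, chunk_blocks: int, members: int) -> int:
    """Index of the member disk that holds logical block `lbn`."""
    return (lbn // chunk_blocks) % members


def members_touched(start_lbn: int, block_count: int,
                    chunk_blocks: int, members: int) -> Set[int]:
    """All member disks that hold some part of a contiguous block range."""
    return {
        member_for_block(lbn, chunk_blocks, members)
        for lbn in range(start_lbn, start_lbn + block_count)
    }


def array_usable(member_ok: List[bool]) -> bool:
    # RAID 0 has no redundancy: every member must be healthy.
    return all(member_ok)


MEMBERS = 6          # from the RAID SHOW output in this thread
CHUNK_BLOCKS = 128   # hypothetical chunk size, for illustration only

touched = members_touched(0, MEMBERS * CHUNK_BLOCKS, CHUNK_BLOCKS, MEMBERS)
print("members holding part of the range:", sorted(touched))
print("usable with one failed member:", array_usable([True] * 5 + [False]))
```

With these assumed numbers, a contiguous range of only 768 blocks already has a piece on each of the six members, which is why a good backup is the only real protection here.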

Volker.
Guenther Froehlin
Valued Contributor

Re: Backup Issues

I recommend doing a DISMOUNT/CLUSTER/ABORT for all three DPA devices.

If that succeeds, do a RAID UNBIND of the array.

Did all DRA devices dismount? If not, issue DISMOUNTs for the DRA devices that are still mounted.

Mount all DRA devices with MOUNT/OVER=ID/NOASSIST. If that fails, fix the underlying DRA problem.

If that works for all DRA devices, dismount them and do the RAID BIND command (the parameters are somewhere in SYS$STARTUP:*.COM, hopefully).

/Guenther
odwillia
Frequent Advisor

Re: Backup Issues

Trying to find a current error_log.

$ raid analyze/units dpa300

Processing _DPA300:[000000]RAID$CONFIGURATION_MANAGEMENT.SYS
%RAID-I-OPENERR, error opening _DPA300:[000000]RAID$CONFIGURATION_MANAGEMENT.SYS
-RMS-E-DNF, directory not found
-SYSTEM-F-VOLINV, volume is not software enabled


Processing _DPA300:[000000]RAID$BC1.SYS
%RAID-I-OPENERR, error opening _DPA300:[000000]RAID$BC1.SYS
-RMS-E-DNF, directory not found
-SYSTEM-F-VOLINV, volume is not software enabled
%RAID-F-ANERR, check analyze report
$
$ raid analyze/units dpa301

Processing _DPA301:[000000]RAID$CONFIGURATION_MANAGEMENT.SYS
%RAID-I-OPENERR, error opening _DPA301:[000000]RAID$CONFIGURATION_MANAGEMENT.SYS
-RMS-E-DNF, directory not found
-SYSTEM-F-VOLINV, volume is not software enabled


Processing _DPA301:[000000]RAID$BC1.SYS
%RAID-I-OPENERR, error opening _DPA301:[000000]RAID$BC1.SYS
-RMS-E-DNF, directory not found
-SYSTEM-F-VOLINV, volume is not software enabled
%RAID-F-ANERR, check analyze report
$
$ raid analyze/units dpa302
$
Volker Halle
Honored Contributor

Re: Backup Issues

You can't access any files on those disks as they have timed-out mount verification and are considered 'software invalid'.

Try to look for RAID related messages in SYS$MANAGER:OPERATOR.LOG

Look for SYS$MANAGER:RAID$DIAGNOSTICS_*.LOG. Any error messages in there ?

The command to decode the RAID errorlog entries is:

$ RAID ANAL/ERROR SYS$ERRORLOG:ERRLOG.SYS

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

I found the RAID$DIAGNOSTICS_nodename.log, but the last entries are from over a year ago (January 2007). Any suggestions?

Volker Halle
Honored Contributor

Re: Backup Issues

How long have your backups been failing, then?

The DPA disks are showing a very high error count. Did you check $ RAID ANAL/ERR ERRLOG.SYS? Maybe you were overwhelmed by the amount of output?

If nothing else helps, follow the advice given by Guenther. He knows the RAID software stuff!

First find the RAID BIND commands in your startup files then proceed as suggested...

Volker.
odwillia
Frequent Advisor

Re: Backup Issues

Ok, I will give that a try.

How can I tell if that is our local disk array or the fabric SAN?