Operating System - OpenVMS
Backup Errors

SOLVED
byron moore
Advisor

Backup Errors

Hi,
We have been having some backup issues which we thought were hardware related. Can someone tell me what the errors in the attachment mean and give some advice on how to handle them? Thanks in advance.
10 REPLIES
Steven Schweda
Honored Contributor

Re: Backup Errors

There's a lot of junk in there. Which of the
errors were bothering you?

> %DCL-W-IVQUAL, unrecognized qualifier - check validity, spelling, and placement

Doesn't look to me like hardware.

> %BACKUP-I-SOFTWERRS, 9 recoverable media errors occurred [...]

> %BACKUP-F-LABELERR, error in tape label processing [...]
> -SYSTEM-F-PARITY,

Those look more like hardware.


What, exactly, is _NRCAVA$MKB500:?

How old and tired are the tapes?
byron moore
Advisor

Re: Backup Errors

These are the errors I'm referring to. Sorry for the clutter.

%BACKUP-I-SOFTWERRS, 9 recoverable media errors occurred [...]

> %BACKUP-F-LABELERR, error in tape label processing [...]
> -SYSTEM-F-PARITY,

Hoff
Honored Contributor
Solution

Re: Backup Errors

Whoever wrote this procedure coded some stuff I would not have coded. The following disables the BACKUP mechanisms used to recover from tape errors, and also allows silent data corruption:

BACKUP/IMAGE/NOALIAS/NOCRC -
/GROUP=0/BLOCK_SIZE=65024 -
/MEDIA_FORMAT=COMPACTION -
/NOREW/NOREC/IGN=(LABEL,INTERLOCK,NOBACKUP)-
/LIST=DISK$WORKCOBOLFTP:[BACKUP_LISTINGS.NRCAVA]DGA241_200909111707.LIS -
DISK$RMADSK01: -
mt:DGA241.BCK

Whoever coded that "trusted the drive" completely, and explicitly disregarded the collisions with active (open) files.

I'd be surprised if the Rdb database stuff on these disks was successfully archived.

Your tape media here is spotty, but (in the case of the SOFTWERRS errors) these particular errors were detected, processed and recovered using the data redundancy added into the saveset on the media by BACKUP.

The parity error indicates an error that was not recovered.

Typically you'll want to read the media and get rid of this cartridge, and clean the drive heads on whatever device this is, and look to replace the device if problems persist.

If this is nine-track media and an old tape drive, then the head(s) may be out of alignment relative to the alignment used to write the media.

If this is DDS/DAT, get a DLT or SDLT or better. DDS/DAT media and drives wear out, and this behavior is consistent with a worn cartridge or a worn drive.

If this is DLT or SDLT or Ultrium, then look to clean the drive and look to replace the cartridge or potentially the drive.

Stephen Hoffman
HoffmanLabs LLC
Shriniketan Bhagwat
Trusted Contributor

Re: Backup Errors

Hi,

SOFTWERRS indicates that BACKUP encountered write errors while writing the saveset to tape, and that it recovered from them successfully. The number is how many times BACKUP had to rewrite a block. If the number is too high, the suggestion is to change the media.

PARITY indicates some issues with the hardware. Please check the hardware.

With the /GROUP=0 qualifier, no redundancy groups are created in the save set, so a block that becomes corrupt after the save set was written cannot be recovered. Hence it's better to give the /GROUP qualifier a nonzero value (up to 100) instead of 0.

Regards,
Ketan
byron moore
Advisor

Re: Backup Errors

Hoff and Shriniketan thanks for your answers. A couple of questions:

> Whoever wrote this procedure coded some stuff I would not have coded. The following disables the BACKUP mechanisms used to recover from tape errors, and also allows silent data corruption:

> BACKUP/IMAGE/NOALIAS/NOCRC -
> /GROUP=0/BLOCK_SIZE=65024 -
> /MEDIA_FORMAT=COMPACTION -
> /NOREW/NOREC/IGN=(LABEL,INTERLOCK,NOBACKUP)-
> /LIST=DISK$WORKCOBOLFTP:[BACKUP_LISTINGS.NRCAVA]DGA241_200909111707.LIS -
> DISK$RMADSK01: -
> mt:DGA241.BCK

> Whoever coded that "trusted the drive" completely, and explicitly disregarded the collisions with active (open) files.

Which lines of code do that? How should I change it?

Shriniketan

What would you suggest for the /GROUP qualifier?


Jon Pinkley
Honored Contributor

Re: Backup Errors

"Trusted the drive completely":

/NOCRC This tells BACKUP not to compute a checksum to be written with the data, so there is nothing to compare against when the data is read back.

/GROUP=0 This tells BACKUP not to save extra RAID-type redundancy data to the tape. By default an XOR block is written for every 10 data blocks written, so it is similar to RAID 5 with 10+1 redundancy. Note that this is an obsolete feature with modern tape drives that have Reed-Solomon ECC built into the drives. If your tape device is a 9-track reel, then /GROUP=10 is a good thing. If the drive is a DLT then, in my opinion, it doesn't help: if the drive can't read the block, it will return a parity error and BACKUP will probably not be able to recover anyway. At least that is my experience.
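
To make the 10+1 idea concrete, here is a minimal Python sketch of XOR parity over a group of blocks. It is a toy model only: the block size, the function names `xor_blocks` and `recover_missing`, and the in-memory layout are assumptions for illustration and say nothing about how BACKUP actually lays out a saveset.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_missing(group, parity):
    """Rebuild the single missing block (marked None) in a redundancy group."""
    missing = [i for i, block in enumerate(group) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost block per group")
    survivors = [block for block in group if block is not None]
    # XOR of all surviving data blocks plus the parity block yields the lost block.
    return missing[0], xor_blocks(survivors + [parity])

# Ten toy data blocks of 8 bytes each, plus one parity block (10+1).
data = [bytes([i] * 8) for i in range(1, 11)]
parity = xor_blocks(data)

damaged = list(data)
damaged[3] = None  # pretend block 3 could not be read
index, rebuilt = recover_missing(damaged, parity)
```

The sketch also shows the limit of the scheme: with two unreadable blocks in the same group, `recover_missing` has nothing to work with and raises an error.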

"disregarded the collisions with active (open) files":

/IGNORE=INTERLOCK

Note that if you do not specify /IGNORE=INTERLOCK, then any file opened for write access anywhere in the cluster will not be saved to tape. If you do use /IGNORE=INTERLOCK you will get whatever happens to be on the disk at the time the blocks are read, which is not necessarily consistent. You should not be backing up your RdB database files with something other than backup, I believe RMU is the RdB provided backup solution.
it depends
Jon Pinkley
Honored Contributor

Re: Backup Errors

I wrote "You should not be backing up your RdB database files with something other than backup, I believe RMU is the RdB provided backup solution."

I meant you should use RMU or some other Oracle-provided utility to back up your Rdb database files. BACKUP is not the correct tool unless the database is down and the database files are static.
Hoff
Honored Contributor

Re: Backup Errors

Jon's got the basics.

Whoever coded this effectively implemented the archives to look nice and to occupy storage media and to keep a tape library busy, but I'd be willing to bet that the recovery was not particularly tested, and I'd be further willing to bet that there will be a project to recover the Rdb data from those archives.

RMU /BACKUP is the central tool for an Rdb database. Use it. Once you get a reliable RMU backup, then you can toss that backup out to your primary archives using BACKUP. (I've tended to have BACKUP ignore the disks with the Rdb databases, and had an RMU sequence that archived the databases over to disks that the "typical" nightly BACKUP did look at. That way, I had a couple of recent copies on disk, and only needed to pull a tape from the archives on rare occasions. In the "typical" processing, the RMU stuff runs before the BACKUP stuff and thus allows the BACKUP to capture the freshly-generated RMU archives, obviously.)

I've put together a non-RMU backup from somebody that had a similar on-line backup regimen, and that wasn't an inexpensive undertaking.

Here are some articles on on-line data archiving and BACKUP and (featuring the use of backup /ignore=interlock) on OpenVMS "worst practices":

http://labs.hoffmanlabs.com/node/1314
http://labs.hoffmanlabs.com/node/1284
http://labs.hoffmanlabs.com/node/772
http://labs.hoffmanlabs.com/node/1077
http://labs.hoffmanlabs.com/node/801
Shriniketan Bhagwat
Trusted Contributor

Re: Backup Errors

A little explanation about the /CRC, /GROUP_SIZE and /IGNORE=INTERLOCK qualifiers.

The /CRC qualifier makes BACKUP calculate a CRC value and include it in each saveset block. The CRC value allows detection of corrupt saveset blocks during a restore operation. Once a saveset block has been read into memory, BACKUP recalculates the CRC and compares it with the CRC value stored in the saveset block. A mismatch between the two indicates the saveset block is bad. Once BACKUP knows the block is bad, it may be able to recover it using the XOR mechanism, which can rebuild a corrupt saveset block if it is the only bad block in its XOR redundancy group. The size of the XOR redundancy group is controlled by the /GROUP_SIZE qualifier.
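
The store-then-recompute check described above can be sketched in a few lines of Python. This is an illustration only: it uses the CRC-32 from Python's `zlib` module and a made-up block layout (payload followed by a 4-byte CRC), and the helper names `seal` and `is_intact` are hypothetical. None of this is meant to describe BACKUP's actual block format.

```python
import zlib

def seal(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (toy block layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def is_intact(block: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    return zlib.crc32(payload) == stored

block = seal(b"saveset block contents")

# Flip one bit in the first byte to simulate damage on the media.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]

good_ok = is_intact(block)      # stored and recomputed CRC agree
bad_ok = is_intact(corrupted)   # they disagree, so the block is flagged bad
```

Without the stored CRC (the /NOCRC case), the reader has nothing to compare against, so a damaged block that the drive returns without error would pass through unnoticed.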

I am not sure what value for the /GROUP qualifier suits your environment. The default value, 10, should be a moderate choice.

/IGNORE=INTERLOCK allows BACKUP to save a file that is currently open for write or exclusive access. BACKUP only saves the data that actually exists in the file on disk at the time of processing. Data that is still in user or system buffers in memory will not be saved. Also, data that is being modified while BACKUP is reading the file may not be saved in a consistent state.
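
As a rough illustration of the "not in a consistent state" point, the following Python sketch simulates a copy taken while an application is halfway through a two-step update. Everything in it (the `accounts` dictionary, the `transfer` and `copy_while_writing` names) is invented for the example; it models the timing problem, not BACKUP or any particular database.

```python
def transfer(accounts, src, dst, amount):
    """A two-step update that pauses between the steps, like an application mid-write."""
    accounts[src] -= amount
    yield  # a reader may see the data at this point
    accounts[dst] += amount

def copy_while_writing(accounts, writer):
    """Take a copy after the writer's first step but before its second."""
    next(writer)               # writer has completed step one only
    snapshot = dict(accounts)  # the copy is taken now
    for _ in writer:           # writer finishes afterwards
        pass
    return snapshot

accounts = {"A": 100, "B": 100}
snapshot = copy_while_writing(accounts, transfer(accounts, "A", "B", 30))

live_total = sum(accounts.values())    # both steps applied
saved_total = sum(snapshot.values())   # only the debit was captured
```

The live data ends up balanced, but the copy holds the debit without the matching credit. Nothing in the copy itself signals that it is wrong, which is why this kind of damage can stay unnoticed until a restore.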

It's recommended to use the /CRC, /GROUP_SIZE and /IGNORE=INTERLOCK qualifiers with BACKUP.

Regards,
Ketan
Hoff
Honored Contributor

Re: Backup Errors

> It's recommended to use the /CRC, /GROUP_SIZE and /IGNORE=INTERLOCK qualifiers with BACKUP.

So you're recommending using a BACKUP qualifier that produces inconsistent and potentially corrupt and potentially silently corrupt results in the output saveset?