
Re: problem restoring data

 
João Luís Marques Pinto
Occasional Advisor

problem restoring data

Hello,
We recently switched from DDS2 to DDS3 tapes on a DDS3-capable drive.
I don't believe it is related to the problem, but lately we have found that after starting the Oracle database with the restored datafiles, it reports that some files are corrupted.
There are no errors from tar when writing or reading the tapes. If there were an I/O error, shouldn't it cause a checksum error? Can I safely assume this is a logical data error, meaning the backup was taken while the files were in an inconsistent state from Oracle's point of view?

Thanks
16 REPLIES
Antonio Cardoso_1
Trusted Contributor

Re: problem restoring data

Hello João,

If the tar command reports no errors, you should look at the consistency of the source data (i.e. the Oracle data in your case).

antonio.
Borislav Perkov
Respected Contributor

Re: problem restoring data

Hi,
This is an Oracle-related issue. One possible cause is that the database was not shut down properly.
Regards,
Borislav
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

According to the DBA, the database was properly shut down before the backup.
The new information is that repeating the procedure from the same tape results in different files being reported as invalid each time. I can't make much sense of the problem. Even if the tapes are damaged or the device is not working properly, I should get checksum errors, not just corrupted data after extraction.
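
One way to pin down the "different files each time" symptom is to extract the same tape twice into separate directories and compare per-file digests. The following is a minimal Python sketch under that assumption; the function names are illustrative and not part of tar or any HP-UX tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(root_a, root_b):
    """List relative paths that are missing from one tree or differ in content."""
    a, b = digest_tree(root_a), digest_tree(root_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `differing_files()` on two restores of the same tape (the directory names are up to you) should return an empty list if the drive reads consistently. A non-empty list that changes between runs points at the read path rather than at Oracle.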
JASH_2
Trusted Contributor

Re: problem restoring data

Joao,

I have seen this problem arise for several reasons:-

1) The backup was run in several parts and in hot backup mode.
2) The backup was run in hot backup mode and not all of the redo logs were backed up.
3) The restore was run in several parts.

Oracle databases like to be backed up and recovered "all in one", so if the restore was done in several parts, there may be different dates on the restored files, which Oracle may complain about.
If it is backed up in several parts, in hot backup mode, it may again have differing dates on the files once restored.
If the backup takes a long time in hot backup mode, it can sometimes miss a new redo log, and Oracle will complain about this when you try to bring the database up.
If none of the above applies, then it could be data corruption either on the tape or in the database prior to backup. Has the DBA tried a database recovery command within Oracle?

Hope any of this helps.

Regards,

JASH
If I can, I will!
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

JASH,
The problems are with "cold" backups.
Database recovery was attempted. The database is open but it is still corrupted (at least it returns some unexpected results for some queries).
If there is a problem with the media or the device, shouldn't the tape archive checksum fail?

JASH_2
Trusted Contributor

Re: problem restoring data

Joao,

What are you using to back it up?
If I can, I will!
JASH_2
Trusted Contributor

Re: problem restoring data

Ignore my last question, I have just read your first question again.

If tar fails, then yes, it does usually complain about a checksum, but as you have said, it has not failed or reported any errors.

To be honest, all I can think of is some kind of Oracle error, but that is not very helpful.

I will have a look around and see if I can find anything else.

Regards,

JASH
If I can, I will!
JASH_2
Trusted Contributor

Re: problem restoring data

Joao,

Do you have an earlier DDS3 tape with the same kind of backup on it, so that you can do a "control" restore and see if you get the same errors when starting the database? This would help prove whether it is a backup or a database problem. Also, have you successfully restored from a DDS2 backup?

Regards,

JASH
If I can, I will!
JASH_2
Trusted Contributor

Re: problem restoring data

Joao,

When you did your restores, did you make sure that all of the files were overwritten? If not, you will have a mix of files with different dates and states. This will show up as corruption when trying to start the database.
All files need to be overwritten.
That could explain why you are getting different files reported at different times, as per your answer above.

BTW-You don't have to keep giving me points, until we get this sorted, as it might look as if I keep making posts just to get points.

Regards, JASH
If I can, I will!
Bill Hassell
Honored Contributor
Solution

Re: problem restoring data

tar (and cpio and pax and dump, etc.) are archaic tools and have virtually no data integrity features. Use fbackup instead. When fbackup writes to the tape, an additional checksum is appended to each record. This is in the data record and not related to the tape drive's checksum values. Now you are correct that if there are no I/O errors reported from tar, then the tape drive thinks all is well. Of course, the drive may have an internal problem that is corrupting the data, and that's one of the many reasons to use fbackup.

When fbackup runs, it computes a checksum for the data block in memory. Then the data block with the checksum is written to the tape. If data corruption occurs between memory and the tape, the checksum will not match and fbackup will report the error.
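
To illustrate the general idea of a per-record software checksum, here is a rough Python sketch. It is not fbackup's actual on-tape format: the record size, the length prefix, and the use of CRC32 are all assumptions made for the example.

```python
import struct
import zlib

RECORD_SIZE = 1024  # illustrative record size, not fbackup's real value


def write_records(data, record_size=RECORD_SIZE):
    """Split data into records: 4-byte length, payload, 4-byte CRC32 of payload."""
    out = bytearray()
    for offset in range(0, len(data), record_size):
        payload = data[offset:offset + record_size]
        out += struct.pack(">I", len(payload)) + payload
        out += struct.pack(">I", zlib.crc32(payload))
    return bytes(out)


def verify_records(stream):
    """Return the indexes of records whose stored CRC32 does not match the payload."""
    bad, pos, index = [], 0, 0
    while pos < len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        payload = stream[pos + 4:pos + 4 + length]
        (stored,) = struct.unpack_from(">I", stream, pos + 4 + length)
        if zlib.crc32(payload) != stored:
            bad.append(index)
        pos += 8 + length
        index += 1
    return bad
```

Because the checksum is computed from the block in memory and travels with the data, any change made later by the drive, cable, or media shows up when the record is read back and re-checked. The sketch assumes the length fields themselves arrive intact.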

You will find that fbackup can be significantly faster than tar or cpio and is designed to properly handle multi-tape backups. Be sure you include a config file to set up maximum performance. The file should look like this:

blocksperrecord 4096
records 64
checkpointfreq 4096
readerprocesses 6
maxretries 5
retrylimit 5000000
maxvoluses 200
filesperfsm 2000

To check a tape after writing it, use frecover with the -N option. The tape is read, a checksum is computed for each incoming block, and the result is compared with the checksum recorded on the tape. A failure means that the data has become corrupted and you'll need to replace the tape drive. The majority of data problems on tapes are related to dirty or worn heads, but the drive is a firmware-driven electronic device, so a failure can occur anywhere. fbackup will verify the hardware (and improve backup speed and reliability).



Bill Hassell, sysadmin
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

JASH,
I am extracting the data as root and I have checked that there were no errors on extraction (so I can safely assume all the files were overwritten). Yes, we did some tests with the DDS2 backups and the restore worked just fine. That is why I am still pursuing the hardware/tape failure theory, although the logical problem (database not properly shut down) seemed the most probable.

Bill,
I had the wrong idea that the tar checksum was calculated in software from the data in memory before it was written to the tape. I will now run some tests with fbackup, since it gives me a more reliable way to verify data integrity.

Thanks
Bill Hassell
Honored Contributor

Re: problem restoring data

Here's a reference on the actual format of a tar backup:

http://en.wikipedia.org/wiki/Tar_(file_format)#File_format_details

When tar indicates a checksum error has occurred, it really means that the checksum stored in a file's header did not match the header contents. While that does provide protection for the header, there is nothing protecting the data blocks.
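
The difference can be seen with a small experiment. This is a sketch using Python's standard tarfile module rather than HP-UX tar, with a made-up file name and payload, so treat it as an illustration of the format and not of any particular tar implementation.

```python
import io
import tarfile


def build_tar(name, payload):
    """Create an in-memory ustar archive holding a single file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return bytearray(buffer.getvalue())


def read_member(raw, name):
    """Extract one member's bytes from an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(bytes(raw)), mode="r:") as archive:
        return archive.extractfile(name).read()


original = b"A" * 2048  # hypothetical datafile contents

# Corrupt one byte in the data area (the header occupies bytes 0-511).
damaged_data = build_tar("datafile.dbf", original)
damaged_data[512 + 100] ^= 0xFF

# Corrupt one byte inside the header itself.
damaged_header = build_tar("datafile.dbf", original)
damaged_header[0] ^= 0xFF

# Reading damaged_data succeeds and silently returns altered bytes;
# reading damaged_header raises tarfile.ReadError (bad header checksum).
```

Here `read_member(damaged_data, "datafile.dbf")` returns 2048 bytes that no longer equal `original`, with no error raised, while the same call on `damaged_header` fails on the header checksum.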


Bill Hassell, sysadmin
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

I did check the tar format specification. Since the header is itself data on the tape, I would assume the chances of having "real" data corrupted without any headers being corrupted would be minimal, especially because the current backup script backs up one file per tar archive. I hope to reach some conclusion about the tape reliability with the fbackup tests.
I will post the results.
Bill Hassell
Honored Contributor

Re: problem restoring data

Yes, if the internal header checksum is OK, the drive probably would not make errors in other blocks. But that is based (like tar) on vertical format multi-track tape drives using 1/2" reel-to-reel tape. The data is written in parallel with separate amplifiers and drivers for each data bit in a single byte. Modern tape drives write data in a single track with heads spinning at an angle, just like video tape. So the data is written serially but in large drive-defined blocks (about 128k) at a time. Naturally, there are a lot of electronics involved with the buffering, serializing and drive-created checksums.

One additional feature of fbackup is the -N option in frecover, which reads all the data, computes the internal checksum, and then compares it to the checksum recorded by fbackup. It is called the no-restore option but it checks everything on the tape.

Another area to look at is the nature of the corruption. Are the Oracle files mostly empty space (no records yet)? There may be a problem with compression on your tape drive. Since the DDS2 tapes work and the DDS3 tapes do not, and there is a significant difference in the recording method used in the DDS3 format, perhaps a bug has been there all along and just does not show up with the DDS2 format. You may want to download the latest firmware for your tape drive just to make sure it is up to date.


Bill Hassell, sysadmin
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

After trying the same backup tape on another drive, which worked fine, I have concluded that the problem was with the DDS drive used for reading (particularly with the DDS3 tapes). It is possibly a firmware-related problem, as suggested by Bill.

Thanks all for your support.
João Luís Marques Pinto
Occasional Advisor

Re: problem restoring data

Explained in my last reply.