BACKUP Operations

 
Mario Abruzzi
Contributor

BACKUP Operations

When defragmenting disks using a full image backup and restore operation, are there certain process and system parameters that can be set to optimize the overall process? Digital once published a DSNlink article that showed formulas and parameter relationships that would optimize VAX backups. Do these same parameter values apply to Alpha?
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

faris_3
Valued Contributor
Solution

Re: BACKUP Operations

and this one
http://h71000.www7.hp.com/doc/732FINAL/aa-pv5mh-tk/aa-pv5mh-tk.HTMl

(Setting Process Quotas for Efficient Backups)

hth,
HF
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

And this one
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=658938

/group=0 increased our backup speed by about 20%.
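For illustration only (device and save-set names here are made up; /GROUP_SIZE is the full spelling of /group, and it belongs on the output save-set specification):

$ BACKUP/IMAGE/LOG DKA100: MKA500:NIGHTLY.BCK/SAVE_SET/GROUP_SIZE=0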

Wim
Ian Miller.
Honored Contributor

Re: BACKUP Operations

Consider the implications of turning off redundancy groups with /group=0 in your environment before doing it.
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

As far as I know, all DLT drives have some kind of RAID (or /group=5) check built in.

Wim
John Gillings
Honored Contributor

Re: BACKUP Operations

Mario,

If you mean you're taking a disk, doing an image BACKUP to tape and then back to the same disk then PLEASE DON'T.

The risk of losing your data is FAR higher than any possible performance benefit from defragging the disk. Consider, if the tape breaks while restoring, you have already destroyed the source disk, and the tape is lost. You have no data!

If you really need to defrag, take a spare disk, do a disk to disk image BACKUP and swap the drives around. That way you have a fallback, and you don't burn any bridges.

Please don't tell me you don't have spare disks! The cost of disk drives today is so far below the value of your data, any accountant that suggests you can't afford them should be treated in the same manner as an accountant who suggests your company should never purchase insurance policies.

There are really only 2 "knobs" to tune BACKUP - WSQUOTA (controlling memory consumption) and FILLM (controlling the number of files open). All other quotas should be relatively infinite.
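As a sketch only: both quotas live in the UAF record of the account that runs the backups. The account name and numbers below are invented; size them using the "Setting Process Quotas for Efficient Backups" chapter linked earlier in this thread.

$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> MODIFY BACKUP$ACCT/WSQUOTA=16384/WSEXTENT=32768/FILLM=128
UAF> EXIT
$ ! the new quotas apply from the account's next login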

If you're suffering fragmentation then make this one the LAST time you need to go through this process - tune your files to avoid fragmentation. It's quite simple, just make sure the allocation and extension sizes are large.
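A minimal sketch of what that can look like; the numbers, device and file names are only examples:

$ SET RMS_DEFAULT/EXTEND_QUANTITY=2000              ! process default extent, in blocks
$ SET VOLUME/EXTENSION=2000 DKA100:                 ! volume default extension size
$ SET FILE/EXTENSION=2000 DKA100:[DATA]HOTFILE.DAT  ! extension size of one known-growing file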

A crucible of informative mistakes
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

Mario,

I can do nothing but heavily agree with John about your risks!

And, EVEN if you have moderately to heavily (but not REALLY VERY HEAVILY) fragmented files, you can have VMS cope with --most-- of the bad effects.

If you set the SYSGEN param ACP_WINDOW to 255, then file headers will be loaded into memory entirely (so-called "cathedral windows"), so whenever a portion of a file is required, the info about where the relevant data is located on disk is directly available.
Of course, two successive reads for two relatively close parts which sit on different segments of a fragmented file will still require two IOs, while in a less fragmented file the second part may well have been read in with the chunk that got the first, and that would save a physical IO. But the main issue:
with cathedral windows it is NEVER necessary to do a window turn, which usually means an extra IO to read another part of the file header (maybe only to find you need STILL another...).

Of course it comes at some cost: it uses memory. Then again, not really very much, and at today's mem prices...

Oh yeah, ACP_WINDOW _IS_ a dynamic param, meaning that you do not have to reboot for it.
But it only takes effect for disks mounted AFTER you change it,
and it DOES influence your memory usage, so it might be wise to let AUTOGEN re-size your memory allocation params.
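A sketch of the sequence (disk name and volume label invented), keeping in mind that only disks mounted afterwards pick up the new value:

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SET ACP_WINDOW 255
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$ ! WRITE ACTIVE is enough because the param is dynamic; no reboot needed
$ DISMOUNT DKA100:
$ MOUNT/SYSTEM DKA100: USERDISK
$ ! also add ACP_WINDOW = 255 to SYS$SYSTEM:MODPARAMS.DAT so AUTOGEN keeps it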


Hope this helps.

Cheers.

Have one on me.

Jan

Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

Even backup to a single disk is a risk. You need protection against bad blocks. So, better use a shadowed disk.

(I have several single disks that gave problems)

Wim
Robert_Boyd
Respected Contributor

Re: BACKUP Operations

Someone suggested using /GROUP=0 -- the problem with this is that if a single block of your backup saveset is destroyed due to random radiation/scratch/etc., you lose everything in that block on the tape -- regardless of any safety features provided by the drive mechanism. If you want to maintain a high probability of being able to recover from a single bad block on the tape, use a large number for your redundancy group size -- I have used numbers like 25 or 35 or 40. This way if you lose one block out of, say, 35, you still have a good chance of reconstructing the lost block.

Also, another good way to save time is to increase the blocking factor when writing to tape. Because of the better reliability of the DLT tapes/drives, you might as well write the maximum allowable block size of 65024. If you aren't doing this already, you'll find that this reduces the backup time lost to frequent repositioning with smaller blocks.

Also, some people will make noises about using /NOCRC to save time on backups. The problem is that if you don't use CRC when writing to tape, you lose the ability to be fairly certain that the data you're reading back is the same data that was written to the tape. The longer that you plan to keep your tapes in storage, the more critical it is to use EVERY data safety feature in the chain.
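Purely as an illustration of the above (device and save-set names are made up; /CRC is the default and is shown only for emphasis):

$ BACKUP/IMAGE/LOG DKA100: -
  MKA500:FULL.BCK/SAVE_SET/GROUP_SIZE=35/BLOCK_SIZE=65024/CRC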

How much is it worth to save a few minutes versus being able to cleanly restore your data?
Master you were right about 1 thing -- the negotiations were SHORT!
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

Robert,

yes, BACKUP is really fantastic in that respect, I have seen real miracles!
(Tape 12 years of age, reading back with 35000+ Recoverable Errors), but... those were the days.

Today, if you are using SCSI drives you are Fu**ed (anyone still have DSA compliant drives? Do they even exist in capacities somewhat in agreement with today's storage amounts?)

On SCSI, encountering just ONE SINGLE parity error, you are in "Fatal Drive Error - Position Lost", and as yet I have not found any way to continue reading and let BACKUP perform its magic.

So, formally you are right, but until someone ever corrects/augments the SCSI protocol, in practice it is USELESS. Alas... :-(

Still:

Proost.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.

Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

Robert,

As I said, all modern DLT drives also have redundancy built in. If I remember correctly, about 20% overhead is used, so this corresponds to /group=5. If the drive has the correction, why do it at the VMS level too?
Why do host-based RAID on top of tape-based RAID?

Wim
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

Jan,

I could be wrong, but it is the tape drive itself that is doing the error recovery via a built-in /group=5. So, if you get a parity error, it would mean that there are too many blocks with errors to do the recovery.

Wim
(a Duvel a day keeps the doctor away)
(and 2 is even better)
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

Wim,

Can that "too many" threshold be found out? Can it be monitored as it is approached? Can it be manipulated by any settings?

Anyway, in view of the frequency WE see them, it is DEFINITELY way worse than in the DSA days!

Concerning Duvel: they were on special offer here last week (just over half the normal price), so I bought some crates. _Your_ doctor will have no need to visit me shortly!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Robert_Boyd
Respected Contributor

Re: BACKUP Operations

Wim,

I wonder what processors you're running on where writing redundancy blocks adds 20% CPU overhead. In my experience that was probably the case in the VAX environment, but I haven't had that experience on Alpha systems. On most systems I've worked with, the worst overhead has generally been I/O bottlenecks reading the data from the disks and/or writing it to tape.

Unfortunately the way the data is written on the tape is really not quite equivalent to RAID storage. At least not with the DLT drives. Perhaps there is something different with the newer technology that I don't know about. Over the many years of making backups and restoring from them, I have seen numerous times when restores were hampered by the failure to use the /GROUP and /CRC capabilities. And, there have been plenty of times that I have seen restores complete from old tapes that would have failed miserably if the group encoding had been left off. This happens more frequently in environments like the pharmaceutical industry where retention of data in some cases is expected to be more than 10 years.

The error correction/recovery that is done at the drive level only applies to the data as it is received from the system. If there is any glitch in the data path to/from the system the tape drive will happily run its error correction procedures on that data and write it to the tape. This only covers recovering data from the tape as it was received by the tape drive. If something should have happened to the data on the way to the drive it will be written with the error encoded. If something happens to the data on the way from the drive into the CPU, the error will go undetected in most cases.

The other thing is that if there is sufficient damage to a block on the tape due to age, wear or cosmic rays, the recovery mechanisms of VMS BACKUP are much more extensive than what the drive can do. The recovery that can be done with the redundancy groups covers a much larger chunk of real estate on the media than the redundancy information written by the drive. This means that the probability of the entire redundancy group being trashed is much lower. The whole thing is a matter of probabilities and confidence levels and how those are affected by the reliability of the media and the drives and the retention span.

When you ask VMS BACKUP to employ /GROUP processing and /CRC calculations you are ensuring that when the tape is read, you will have a very high probability of restoring the data correctly. If you only rely on the mechanisms of the tape drive you drop your confidence level immediately. And the confidence level drops substantially as the age of the tape increases beyond the 1st year. It also may be affected by the frequency that you do backups, maintenance on the drive(s), whether or not you do a verification pass on each backup, etc....

When I want full end-to-end data checking and recovery capability I use the data checking that runs on the cpu before the blocks are sent out to the drive. I realize that this may seem like a waste of processor resources to some. I have been in enough restore/recovery situations that I am sure the extra overhead is worth it. If you don't care about getting the data back reliably after a few weeks, perhaps your situation doesn't warrant the extra computation.
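For the verification side of that, two hedged examples (names invented): /VERIFY re-reads and compares the save set as part of the backup run itself, while a separate BACKUP/COMPARE pass can be done later against the tape.

$ BACKUP/IMAGE/VERIFY/LOG DKA100: MKA500:FULL.BCK/SAVE_SET
$ BACKUP/COMPARE/IMAGE MKA500:FULL.BCK/SAVE_SET DKA100: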

To me optimizing backups is about tuning performance to get the best backups possible in the shortest window possible. Making the window as short as possible at all costs without maintaining a high degree of certainty of restorability isn't "optimum" in the environments I've worked in.
Master you were right about 1 thing -- the negotiations were SHORT!
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

Robert

My performance result of 20% comes from a GS160. So, not that old.

My info concerning the error recovery of modern DLT comes from Quantum (the maker of most drives). But they don't say exactly which drives have the feature. Our DLT drive on the GS160 is a modern one, though; on older ones I have not enabled /group=0.

If there is data corruption on the way from the drive to the CPU, then this corruption is also possible for disk data. How do you protect against that?

My guess is that Quantum created the technique to avoid having to implement /group in all Unix-like tools.

Maybe your experiences are outdated? I know you are absolutely right for the old tapes (remember /dens=1600?).

I would be very glad if someone that knows exactly how all the parts work together could answer ...

Wim
Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

John Gillings
Honored Contributor

Re: BACKUP Operations

re: overhead of /CRC

Way, way back (maybe VMS V5.1?) there was an issue in BACKUP where using /CRC had a severe performance impact. It was due to a data structure used in calculating the CRC being misaligned (even on VAXes, alignment sometimes matters!). The calculation used the rare POLYD instruction, so performance on hardware that emulated this instruction (e.g. the VAX-11/750) was particularly bad. This was fixed very quickly, but the mythology that "/CRC is bad" seems to have stuck.

Modern drives have built in redundancy, so some people argue that it's unnecessary to generate redundant redundancy, but hey, we're OpenVMS! We use belts, braces AND double elastic, and maybe throw in an extra pair of braces just to be sure :-)
A crucible of informative mistakes
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

John,

let me re-phrase my question:

In these SCSI times _HOW_ do I:
- get data from the drive after a "Parity error - Position lost" event?
I already know that a simple retry has about a 100% chance of generating the same.
- get info about a tape deteriorating, and nearing the point where the drive will decide it is over the threshold?
- influence that threshold?

From what I understand so far, there is VERY little opportunity to use the redundancy functions of BACKUP if SCSI tape is involved.

Proost.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Chris Davis_9
Advisor

Re: BACKUP Operations

Hi guys!

I'm interested in the "parity error" issue too. I've got 2 ES40s with disks connected through HSG60 controllers. The backup devices are SDLTs attached via KZPCA-AA cards. I've been seeing parity errors on new SDLT tapes which, if I reinitialize them and perform backups to them again, show no errors, so tape quality should not be an issue.

Wim Van den Wyngaert
Honored Contributor

Re: BACKUP Operations

Chris,

And did the operation continue thanks to the recovery technique (/group=10) of backup.exe?

Wim
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

Wim,

if his experiences match mine, then NO!!

.. and that is what my whole point _IS_ about...


Proost.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: BACKUP Operations

Mario,

Robert,

From your Forum Profile:


I have assigned points to 22 of 57 responses to my questions.


Maybe you can find some time to do some assigning?

Mind, I do NOT say you necessarily need to give lots of points. It is fully up to _YOU_ to decide how many. If you consider an answer is not deserving any points, you can also assign 0 ( = zero ) points, and then that answer will no longer be counted as unassigned.
Consider that every poster at least took the trouble of posting for you!

To easily find your streams with unassigned points, click your own name somewhere.
This will bring up your profile.
Near the bottom of that page, under the caption "My Question(s)", you will find "questions or topics with unassigned points". Clicking that will give all, and only, your questions that still have unassigned postings.

Thanks on behalf of your Forum colleagues.

PS. Nothing personal in this. I try to post it to everyone with this kind of assignment ratio in this forum. If you have received a posting like this before, please do not take offence; none is intended!

Proost.

Don't rust yours pelled jacker to fine doll missed aches.