VMS Poor SDLT performance

Uwe Zessin · ‎09-17-2004

Antonio,
please don't pick up single numbers for your arguments. Please read the whole text to get the right context.

>It looks like another case where the documentation has not kept up.
>http://h71000.www7.hp.com/doc/732FINAL/aa-pv5mh-tk/00/01/119-con.html

I feel that the sentence above the URL clearly shows that I do _not_ agree with the values that are written down on that page. The link should rather illustrate that even the latest documentation has not kept up with reality.

'splitting hair' was a metaphor or an analogy - it has nothing to do with your real hair.

.

Antoniov. · ‎09-18-2004

Uwe,
'splitting hairs' sounded like 'stupid' to me, so I wasn't pleased to read it :-(

The funny side of this (sorry if my English is not the best) is that I didn't mean to say you posted any wrong information :-)
Reading the full thread, I suppose you understood that. I only wanted to point out that the documentation has not kept up.

I hope now it's clear.
Cheers
Antonio Vigliotti

Antonio Maria Vigliotti

Uwe Zessin · ‎09-19-2004

'splitting hair' is an analogy for looking after or insisting on details - it says nothing about a person's intelligence.

It sounds like you now have understood my point so we can stop it.

.

Antoniov. · ‎09-19-2004

agree

Antonio Vigliotti

Antonio Maria Vigliotti

Dave Gudewicz · ‎02-23-2005

http://h71000.www7.hp.com/doc/82FINAL/aa-pv5mj-tk/aa-pv5mj-tk.PDF

Section 11.7 starting on page 432 "Setting Software Parameters for Efficient Backups" looks like a new section for v8.2

While in need of some basic editing (spell check, etc) some of the concepts discussed in this thread are there.

Wim Van den Wyngaert · ‎02-23-2005

More precise link: http://h71000.www7.hp.com/doc/82FINAL/aa-pv5mj-tk/00/01/117-con.html#proc-sec

But why don't they change the default for /GROUP?

Wim

Tim Nelson · ‎02-24-2005

Thanks for the additional information, guys.
I have not had time to tweak this lately. My last observation, which may have been obvious, is that I cannot seem to keep the tape drive streaming. The tape stops while BACKUP reads from disk, and the disk queues grow while it writes to tape. A full seesaw effect is obvious. Some of this may just be the way the BACKUP command works: fill buffers from disk, empty buffers to tape, fill buffers from disk, empty buffers to tape, and so on. If BACKUP cannot keep refilling the buffers from behind as they empty to tape, performance will never be optimal, because the tape must keep stopping and repositioning. This could be due to my specific config: one fibre HBA to disk, another for the tape. Buffering between the two may simply be impossible for this version of the OS or hardware. Some tests with smaller buffers seem to give better performance than larger ones, i.e. the seesaw cycles faster, before the buffers in the tape drive can fully empty.
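
As a rough back-of-the-envelope illustration of the seesaw described above, here is a minimal sketch that compares a strictly alternating fill/empty cycle with a fully overlapped one. The rates are made-up example numbers, not measurements from this thread, and the model ignores the extra time a tape drive loses when it stops and repositions, so the real penalty could be larger.

```python
def serial_rate(disk_mb_s: float, tape_mb_s: float) -> float:
    """Effective MB/s when disk reads and tape writes strictly alternate.

    Moving 1 MB costs 1/disk seconds to read plus 1/tape seconds to write.
    """
    return 1.0 / (1.0 / disk_mb_s + 1.0 / tape_mb_s)


def overlapped_rate(disk_mb_s: float, tape_mb_s: float) -> float:
    """Effective MB/s when reads and writes fully overlap.

    The slower of the two devices sets the pace.
    """
    return min(disk_mb_s, tape_mb_s)


if __name__ == "__main__":
    disk, tape = 40.0, 16.0  # hypothetical sustained rates in MB/s
    print(f"alternating: {serial_rate(disk, tape):.1f} MB/s")
    print(f"overlapped:  {overlapped_rate(disk, tape):.1f} MB/s")
```

With these assumed rates the alternating cycle lands well below the tape's own rate, which is exactly the condition under which a streaming drive has to stop and restart.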

Uwe Zessin · ‎02-24-2005

No, I don't believe it's the fibre channel adapters. They are highly efficient. In a special situation - building multiple host-based mirrorsets between two MSA1000 storage arrays - I've seen up to 160 MegaBytes per second running through an adapter in a DS15.

Don't forget that all data has to go through the process running BACKUP.

Apparently you have now confirmed what Guenther and I already said last year:
> In BACKUP's case larger is not always better!

.

Tim Nelson · ‎02-24-2005

That seems to be the case, Uwe.

Thanks again to all for the sharing of knowledge.

Tim

Ian Miller. · ‎02-24-2005

I think the /GROUP default is planned to change. It is curious that the document only has 5000 for an extend size when writing save sets to disk. I would have thought that using 32k would be better.
I'm told that with XP disk arrays a block size that is a multiple of 4 is a good thing, and that the DIOLM quota should not be too big or it causes trouble (too much in a queue on the XP controller, I think it was).

____________________
Purely Personal Opinion

Tom O'Toole · ‎02-24-2005

One thing in this document is confusing. It recommends /group=0, since that's done in the drive, but also recommends /crc. As I have written before, what is the point of having /crc while disabling the ability to correct? Either the tape drive does this, and one should NEVER get a crc error, or one does sometimes get an error (outside of the tape drive). If it IS possible to get errors at the backup level, shouldn't an xor block be written to be able to correct them?
If not, what's the point of /crc?

It's true, with a modern machine it doesn't add that much CPU time, but if they are saying one can saturate a box with multiple backups, it could certainly be a factor in that case.

Can you imagine if we used PCs to manage our enterprise systems? ... oops.

Ian Miller. · ‎02-24-2005

The point of the VMS BACKUP CRC is that it provides a check on the integrity of the data independent of whatever you write the backup to. There is no cost in space, as the place where the CRC is stored is always reserved, and there is a small cost in CPU time to do the calculations.

____________________
Purely Personal Opinion

Wim Van den Wyngaert · ‎02-27-2005

I have never encountered CRC errors. Has anyone had them reported by the BACKUP utility, not by the drive? And in recent years, of course.

Wim

Tom O'Toole · ‎03-09-2005

I used to get errors occasionally with 9 track tape, like:

%backup-xorerrs, N errors recovered by redundancy group

when reading/restoring a tape. I believe that the recovery would have to be governed by the crc field, since how else would backup know which block in the group was in error?
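
The interplay described above, where a per-block CRC says which block is bad and the XOR block of the group rebuilds it, can be sketched as a toy model. This is not BACKUP's actual save-set format; the function names and the (block, crc) layout are assumptions made only for illustration.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_group(data_blocks):
    """Return (block, crc) pairs for the data blocks plus one trailing XOR block."""
    blocks = list(data_blocks) + [xor_blocks(data_blocks)]
    return [(block, zlib.crc32(block)) for block in blocks]


def read_group(stored):
    """Return the data blocks, repairing at most one block whose CRC fails."""
    bad = [i for i, (block, crc) in enumerate(stored) if zlib.crc32(block) != crc]
    if len(bad) > 1:
        raise ValueError("more than one bad block in the group; cannot recover")
    blocks = [block for block, _ in stored]
    if bad:
        i = bad[0]
        # XOR of every other block in the group (data and XOR) yields the lost one.
        blocks[i] = xor_blocks(blocks[:i] + blocks[i + 1:])
    return blocks[:-1]  # drop the XOR block


if __name__ == "__main__":
    group = write_group([b"AAAA", b"BBBB", b"CCCC"])
    damaged = list(group)
    damaged[1] = (b"BxBB", group[1][1])  # corrupt one block, keep its stored CRC
    print(read_group(damaged))
```

Without the CRC, the XOR block alone could show that something in the group is wrong but not which block to rebuild, which is the point made in the post.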

More recently (DLT/SDLT era), we get occasional %SYSTEM-F-PARITY errors while writing tapes. These are (as has been pointed out here) completely unrecoverable from, other than restarting the backup on a different tape. I believe we get these errors more often than we should, given the error rates quoted by the manufacturers. I also suspect that if these drives are not kept 100% streaming, the frequency of these errors goes up a lot.

Can anyone else confirm?

Can you imagine if we used PCs to manage our enterprise systems? ... oops.

Uwe Zessin · ‎03-09-2005

In the old times, an unreadable record could also be reported by the tape drive, but it was often possible to tell the tape drive to skip this error and find the next good tape record unless too much data was destroyed. So BACKUP was able to directly become aware of a bad block.

.

Cass Witkowski · ‎03-09-2005

Tom,

There is an issue with TZ89 drives: if the block size was too small, or if the buffers (WSEXTENT) were too small, you could end up with parity errors. We have seen this recently and corrected it by raising WSEXTENT.

What I would like to get from HP is the final and official word on whether the /CRC and/or /GROUP qualifiers are still needed for DLT/SDLT and LTO drives. If the DLT/SDLT and LTO drives have a lot more error detection than a 9-track tape with word and longitudinal parity, does BACKUP computing the CRC add anything?

If these drives will not allow you to read past a parity error, does having the XOR group blocks buy us anything?

These two questions have been asked for a long time but I haven't seen anyone from HP Storage or the OpenVMS group belly up to provide real answers.

I would also like to see the BACKUP utility improved to use double buffering so that reads from the disks occur at the same time as writes to the tapes. This would have a better chance of keeping these hungry tape drives streaming. Right now they are guaranteed to pause each time backup stops to read more data from the disks.
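
The double buffering asked for here can be sketched generically with a reader thread and a bounded queue. This is only an illustration of the technique, not a description of how BACKUP works internally; the function name is hypothetical and ordinary file-like objects stand in for the disk and the tape.

```python
import io
import queue
import threading


def copy_double_buffered(src, dst, block_size=64 * 1024, depth=2):
    """Copy src to dst while a reader thread refills buffers behind the writer.

    depth=2 is classic double buffering: one buffer is being written
    while the other is being filled.
    """
    buffers = queue.Queue(maxsize=depth)

    def reader():
        while True:
            block = src.read(block_size)
            buffers.put(block)   # blocks here while all buffers are full
            if not block:        # an empty read marks end of input
                return

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        block = buffers.get()    # blocks here only if the reader falls behind
        if not block:
            break
        dst.write(block)
        total += len(block)
    thread.join()
    return total


if __name__ == "__main__":
    source = io.BytesIO(bytes(range(256)) * 1000)
    target = io.BytesIO()
    copied = copy_double_buffered(source, target, block_size=4096)
    print(copied, target.getvalue() == source.getvalue())
```

The writer only stalls when the reader cannot keep up, so whether the tape keeps streaming then depends on the disk side sustaining at least the tape's rate.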

Or maybe BACKUP should be changed to read from disk more efficiently, so that the data being backed up does not have to be in directory and alphabetical order. Perhaps it would be better to back up the data as fast as possible and then, when you restore, actually put the data where it needs to go. So instead of doing the gather-scatter on the backup to tape and a basically sequential restore, perhaps we back up sequentially and do the gather-scatter on the restore. Since we hopefully back up to tape a lot more often than we need to restore, this could lead to better backup performance.

My 5 cents at least.

Tim Nelson · ‎03-09-2005

I wholeheartedly agree with Cass that enhancements need to continue forward.
It is now worth 10 cents!!

Dave Gudewicz · ‎03-09-2005

I agree with Cass also.

And on a *somewhat* related matter; when will we get a definitive statement on an often raised issue.....

For years it was asked why quantum was set by default to 20, which was OK back in VAX (analogous to 9-track tape in this discussion) days but a looooong time when Alpha (SDLT) came along, especially the newer models. Yes, you can tweak it, but how many people #1 remember and/or #2 even know this parameter exists?

Read the VMS 8.2 Release Notes, pg 4-19. Quantum default is now 5.

Perhaps we'll see something in some future Release Notes on the backup questions raised in this thread.

Now up to 15 cents.... how many Euros? :-)

Dave...

Jan van den Ende · ‎03-09-2005

Ok, let's put some more coal on the fire!

Firstly, ever since the intro of SCSI I have been _GALLED_ by the inability to correct a single, simple parity error!

And to demonstrate it is (at least: was) a SCSI-specific thing:

Back in the TK70 era we had a (project) backup tape that suddenly was needed on line again.
It was my first-ever experience with a BACKUP FATAL parity error, used as I was to "nnn Recoverable Errors Encountered".
DEC had a simple solution: read it on a DSSI TK70 drive!
DEC happened to have a site only some 10 km away, with a suitable system.
We went there with a spare blank tape, the DSSI unit encountered _ONE_ recoverable error, and the backup was written to the new tape.

Pfwooee! (then)

Nowadays, NO escape, except ridiculously expensive and time-consuming dedicated recovery shops.....

I keep cataloguing this as a very heavy degradation of VMS resilience, together with the loss of Shadow Mini Merge when moving to SCSI.
At least that second one HAS been addressed, but it took over 10 years of nagging and nagging and nagging...

------------------------

I WANT THE BACKUP ERROR RECOVERY BACK!!!!

------------------------

Cass,
the idea of doing most of the BACKUP hard work only upon Restore is interesting.

Thinking about it, I guess it will probably be very desirable to have two modes of operation. I do not really see any way to get the defragmentation done, nor a selective restore, without first reading the total tape into memory. Restore to a smaller disk also seems impractical.
Then again, your main point: backup/restore for recovery. Whenever you really need it, neither fragmentation nor file placement will rank high on the priority list!!!

--- a very interesting idea.
Come to think of it: BACKUP/PHYSICAL already comes close...

Dave: USD 0,15 about equals EUR 0,10.
I think I will add another EUR 0,10 myself now.

Proost.

Have one on me.

Jan

Don't rust yours pelled jacker to fine doll missed aches.

Cass Witkowski · ‎03-11-2005

Jan,

For doing the defrag on restore, my thought would be that the directory information would be put on the tape first, as well as meta information on which, say, 10 MB chunk or chunks of tape data each file is in. I know that tapes store things in smaller block sizes, but this is a logical chunk.

For restoring a particular file, the directory would be read as well as the needed chunks from tape. Each chunk has an index saying which files and which VBNs are in that chunk. It is just a matter of skipping to the proper chunk and getting the needed blocks.

For an entire disk restore the directory information would have the files and their new location on the disk. So as the chunks were read off the tape drive the data would be written to the proper disk locations. Since most disk controllers have write back cache this may help reduce the disk head movement even more.
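
The chunk-plus-index idea in this post can be sketched as a small data structure. Everything here is hypothetical: the names, the tiny chunk size, and the in-memory lists standing in for tape are assumptions for illustration, and the index is kept as a separate list rather than written at the front of the tape.

```python
from dataclasses import dataclass

CHUNK_BLOCKS = 4  # hypothetical chunk size in disk blocks, tiny for the demo


@dataclass(frozen=True)
class Extent:
    name: str       # file the blocks belong to
    first_vbn: int  # first virtual block number in this extent (1-based)
    count: int      # number of consecutive blocks


def build_save_set(files):
    """Pack files (name -> list of blocks) into chunks in arrival order.

    Returns (chunks, index), where index[i] lists the extents stored in chunks[i].
    """
    chunks, index = [[]], [[]]
    for name, blocks in files.items():
        vbn = 1
        while vbn <= len(blocks):
            if len(chunks[-1]) == CHUNK_BLOCKS:  # current chunk is full
                chunks.append([])
                index.append([])
            room = CHUNK_BLOCKS - len(chunks[-1])
            take = min(room, len(blocks) - vbn + 1)
            chunks[-1].extend(blocks[vbn - 1:vbn - 1 + take])
            index[-1].append(Extent(name, vbn, take))
            vbn += take
    return chunks, index


def restore_file(name, chunks, index):
    """Rebuild one file, touching only chunks whose index mentions it."""
    found = {}
    for chunk, extents in zip(chunks, index):
        if not any(ext.name == name for ext in extents):
            continue  # on a real tape this chunk would be skipped over
        offset = 0
        for ext in extents:
            if ext.name == name:
                for k in range(ext.count):
                    found[ext.first_vbn + k] = chunk[offset + k]
            offset += ext.count
    return [found[vbn] for vbn in sorted(found)]


if __name__ == "__main__":
    files = {
        "A.DAT": [b"a1", b"a2", b"a3"],
        "B.DAT": [b"b1", b"b2", b"b3", b"b4", b"b5", b"b6"],
    }
    chunks, index = build_save_set(files)
    print(len(chunks), restore_file("B.DAT", chunks, index))
```

Because each extent records its starting VBN, the blocks can be written to their final disk locations in whatever order the chunks come off the tape.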

I guess we are up to 2 bits (25 cents) now :)

Jan van den Ende · ‎03-11-2005

Cass,

I think you should sit with Hoff and/or Andy for some time to do some real reconnaissance on this topic!
With only the likes of us this will never grow beyond the philosophical, but with their ilk the hurdles will become apparent, and if they are surmountable, they can say so.
_IF_ that can happen, then it is up to the people in this (and other) VMS fora to see that we get it higher up on the priority list... no problem there, I estimate.
Bootcamp or DECUS Symposium (under whatever name it is known nowadays) are excellent occasions for such.

-- SHOULD -- you come to the bootcamp (I will), then we definitely should try to arrange such, and I would REALLY want to sit at that table too!

Proost.

Have one on me.

Jan

Don't rust yours pelled jacker to fine doll missed aches.

Uwe Zessin · ‎03-11-2005

Sounds like you are talking about /PHYSICAL backup/restore at the moment...

Please realize that a /PHYSICAL backup must be taken from a quiet volume (e.g. a snapshot, split-mirror or from a dismounted disk) - else, a change in meta-data (e.g. overwriting the locations of a former directory file) during the backup can create a corrupted volume on tape.

This is much more critical than a BACKUP /IMAGE /IGNORE= INTERLOCK, which is at least somewhat synchronized with concurrent file system access.

.

Tom O'Toole · ‎03-16-2005

Jan,

I (and I would imagine everyone else here) agree with your sentiments 100%. The DSSI story is amazing. This PROVES that there is recoverability we are losing unnecessarily, and I think it's unacceptable. As you say, the silence from the vendor(s) about this issue is deafening.

Clearly there is a different, more fatal status being returned by the tape driver/controller which backup will not retry. Does anyone have access to the source listings to see exactly what's happening here? It's very nice if modern controllers are more reliable than old 9-track, but I still get tape errors, and I want recovery (we are backing up a lot more data now).

One of the most ridiculous things is getting these -F-PARITY errors on WRITE! I would like to know WHAT are the circumstances where this error is generated. Are we killing a whole backup job because of one bad block on the tape? Backup traditionally inhibited retry on write error, and would just rewrite the block, brilliantly handling minor media defects. Are we now just supposed to trash the tape?

Can you imagine if we used PCs to manage our enterprise systems? ... oops.

Tom O'Toole · ‎03-22-2005

I've just come across sys$etc:mkset.txt, which describes setting a /def_rec_allowed parameter on scsi drives (including fibre attached MG devices). The text implies this would only be necessary on third party drives, but makes some comments about allowing deferred recovery of an error instead of a fatal return. I wonder if this is applicable to this discussion. We get our -f-parity errors on hp/compaq drives.

comments?

Can you imagine if we used PCs to manage our enterprise systems? ... oops.

Uwe Zessin · ‎03-23-2005

I suspect the -F-PARITY is a general error condition returned by the SCSI device drivers to signal an error - I have seen such on SCSI disks as well.

.
