Operating System - OpenVMS

VMS Poor SDLT performance

 
Jan van den Ende
Honored Contributor

Re: VMS Poor SDLT performance

Uwe,

yes, I don't know of any subsystem that could handle 4096 concurrent I/Os either. The setting just gives you the maximum, and that is not so bad. (Unless part of the subsystem can be trashed by it, like the HSG80 with some firmware versions. That constitutes a BUG, but it can hurt.)
Your pointer DID show me that working purely from memory is definitely not perfect (at least in my case :-(
I totally forgot about ASTLM and ENQLM.
Insert into the above:
Each I/O that is issued has to be kept track of, and that is done by declaring an AST that will trigger when the I/O completes. So you need at least one AST for each I/O. If the number of generated I/Os were to exceed ASTLM, then phase 2 above also transfers control to phase 3.
I don't know where exactly ENQLM comes in. It is the number of entries you can have in timer queues, and it is not clear to me how they would be used in Backup.
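For anyone who wants to check whether BACKUP is actually bumping into these limits, a minimal sketch (the account name and the values below are hypothetical placeholders, not recommendations):

```dcl
$ ! Show the quotas of the current process before starting BACKUP
$ SHOW PROCESS/QUOTAS
$ ! Raise the relevant UAF limits for the account that runs BACKUP
$ ! (BACKUP_ACCT and the numbers are examples only)
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY BACKUP_ACCT /ASTLM=600/ENQLM=4000/DIOLM=500/FILLM=300
UAF> EXIT
$ ! The new limits take effect at the account's next login
```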
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: VMS Poor SDLT performance

Maybe the best solution is to do "set working_set" just before the backup command.
Then you can use exactly what you want and you don't depend on PQL parameters etc.
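A minimal sketch of that approach (the values and device names are placeholders, to be tuned per system):

```dcl
$ ! Raise the working set for this process only, then run the backup
$ SET WORKING_SET/QUOTA=60000/EXTENT=120000
$ BACKUP/IMAGE/LOG $1$DGA100: MKA500:full.bck/SAVE_SET
```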

Wim
Robert Atkinson
Respected Contributor

Re: VMS Poor SDLT performance

Tim - we have exactly the same setup as you, and back up around 27GB/hour (99887284 blocks takes 100 minutes).

This is on a loaded system (running Dayend) copying from HSG snap disks, but I wouldn't expect this to rise much above 32GB p/h on a standalone system.

We've removed CRC checking, but not /GROUP and have had no problems (so far).

Rob.
Galen Tackett
Valued Contributor

Re: VMS Poor SDLT performance

Jan wrote:

> I don't know where exactly ENQLM comes in. It is the number of entries you can have in timer queues...

Jan,

Did you mean TQELM or ENQLM? I believe TQELM is the one that governs entries in timer queues.

Galen
Jan van den Ende
Honored Contributor

Re: VMS Poor SDLT performance

Galen,

must have still been sleepy.
Of course I should not have used the description for TQELM to refer to ENQLM.
ENQLM is the number of lock requests a process can have outstanding. Realising that, I can also explain why it is bad not to have it high enough: accessing and de-accessing files requires lock operations, and you will be doing a lot of that.

Sorry for any confusion I generated ;-[


Jan
Don't rust yours pelled jacker to fine doll missed aches.
Tim Nelson
Honored Contributor

Re: VMS Poor SDLT performance

Thanks for all the input.

I am still doing some testing by upping the quotas.

Ran into another problem which I will put in a different thread. ( Run/uic= does not start process as other user ).

Uwe Zessin
Honored Contributor

Re: VMS Poor SDLT performance

Jan,
a HSV100 or HSV110 controller has two ports and can have up to 2048 outstanding I/Os per port. But as far as I can tell OpenVMS will only use one path and the SCSI device driver will not create such a deep I/O queue anyway.
.
Tom O'Toole
Respected Contributor

Re: VMS Poor SDLT performance

Rob,

Unless I'm mistaken, removing /crc will remove BACKUP's ability to determine that a block is in error, thus greatly limiting the usefulness of redundancy groups specified with /group.

As for the original poster's performance problem: if your CPU is close to 100%, and BACKUP is the main user, you're not going to be able to significantly improve performance no matter how you fiddle with quotas.

Specifying /nocrc WILL greatly reduce BACKUP's cpu consumption, and you should test with this just to verify that CPU is your bottleneck (your throughput should go way up with /nocrc if this is the case). However, you should not trust your production backups to /nocrc, and instead consider why so much CPU is being used by backup.

Most VMS systems which support a SAN should be able to drive a single SDLT at full speed without running out of CPU. Please tell us the other details of your configuration. Thanks.
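One way to run that test is to write the save-set to the null device, so tape speed cannot mask the CPU effect (a sketch; the device and save-set names are placeholders):

```dcl
$ SHOW TIME
$ ! Baseline run with the defaults (CRC on)
$ BACKUP/IMAGE $1$DGA100: NLA0:test.sav/SAVE_SET
$ SHOW TIME
$ ! Same run without CRC - for measurement only, not for real backups
$ BACKUP/IMAGE/NOCRC $1$DGA100: NLA0:test.sav/SAVE_SET
$ SHOW TIME
$ ! A large drop in elapsed time on the second run points at the CPU
```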

Can you imagine if we used PCs to manage our enterprise systems? ... oops.
Uwe Zessin
Honored Contributor

Re: VMS Poor SDLT performance

Of course you are right, Tom.

Let me repeat what I have already written above:
""You can try to limit the CPU load by using /NOCRC/GROUP=0, but that is only good for testing and not a serious backup, because it will turn off end-to-end checking and remove redundancies from the save-set.""
.
Jan van den Ende
Honored Contributor

Re: VMS Poor SDLT performance

Tom, Uwe,

In the old days I would have blindly agreed 100%.

But now, in the days of SCSI....

Way back when, at a previous customer, there was the need to restore a backup tape 10 years old.
After locating a tape-unit still able to read 800 bpi reels (!), I read the backup tape.
Surprisingly, it was CPU-bound, and took quite long.
The final message explained why: 33000-some recoverable errors, and one unhappy operator who had to clean most of the tape's magnetite out of the tape unit.
THAT is what Backup with full recovery functionality is intended to do for you, ... if you use DSA tape systems.
Now on SCSI: ONE SINGLE parity error, and SCSI forbids reading on.

Antonio will know exactly what I mean when I state that THAT is the reason SCSI is pronounced "scuzi": that is the reply in Italian if you want any recovery.

In the days of TK50 and TK70, there DID also exist DSSI devices (slow, but DSA compliant), so, IF you had a tape with a parity error, your local DEC office would read the tape for you (usually just ONE recoverable error!) and write it to another one.
I am not aware of the same being possible with any DLT II, III, or IV system.

So Tom, Uwe, anybody else: if I am too pessimistic out of ignorance, please educate me. WHAT is the use of /crc when backing up to tape? It is GREAT, but you cannot use it.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: VMS Poor SDLT performance

It has nothing to do with 'SCSI'. Whether it lets you go past an unrecoverable spot is a function of the tape drive's firmware. You could also have seen this on a DSSI-based DLT tape drive (TF85, TF86, ...).

Recoverability has nothing to do with a tape drive being DSA compliant (you talk about DSSI and CI-attached drives, right?). I have also seen BACKUP recovering errors on a Massbus-attached tape drive (TU77 or TU78, I don't recall). How does BACKUP do it? It creates redundant data similar to RAID-5!


I know that many people say that today's tape drives do lots of self-correction to cover tape errors and do not let you get past an unrecoverable spot. I still suggest using /CRC, because it is an end-to-end check on the whole data path from the CPU to the bits on the tape. What happens if BACKUP creates a corrupted save-set? I have seen BACKUP do that - the CRC detected it.

I am open about using /GROUP=0, but I will continue to use /CRC.
.
Jan van den Ende
Honored Contributor

Re: VMS Poor SDLT performance

Uwe,

I concur.
I never really encountered a situation in which the CRC overhead was a reason NOT to use it, and, maybe just out of old habit, I keep using it.
But my previous statement stands: I was NEVER able to read past even a single parity error on any SCSI tape drive. All the others (Q-bus, CI, etc.; no Massbus experience though) simply transferred it to the OS, kinda "DUNNO, can you make anything of that?", and then of course BACKUP could.
But if you don't get any bits out of the drive past the parity error??
About DSSI TF8x: include them into my list of "non-SCSI drives for the tape".
DO such drives also exist for the current tape generations? If yes, Hallelujah!! Tell me about them, and I will somehow get them past the budget guards.
(That would make me happy about NOT having to do without CRC.)

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Tom O'Toole
Respected Contributor

Re: VMS Poor SDLT performance

Uwe,

If you use /crc/group=0, what does that get you? If /crc detects a bad block, you have no XOR block with which to recover the bad block.
Can you imagine if we used PCs to manage our enterprise systems? ... oops.
Uwe Zessin
Honored Contributor

Re: VMS Poor SDLT performance

Tom,
when I wrote "I am open about using /GROUP=0", I meant somebody has to present some arguments _for_ using it. Thank you for the counter-argument ;-)
.
Antoniov.
Honored Contributor

Re: VMS Poor SDLT performance

I'm with Jan on SCSI; with an old TK50 or TK70 we could restore data - slowly, but we could.
Now, DAT tapes don't make me feel safe.
For my part I prefer the default /CRC/GROUP.

Antonio Vigliotti
P.S.
SCSI is read in Italian as "scusi", which sounds like "sorry"!
Antonio Maria Vigliotti
John Koska
Advisor

Re: VMS Poor SDLT performance

You mentioned EMC BCV meta volumes. Are these logical volumes, in which other systems have a physical portion of the disk?

Shared disk, shared cache, and other shared stuff may lead to shared performance.

Try monitoring the 3930 for a while to find out when it is idle, and then retry the backup - or at least a large enough portion of it to have a benchmark.

:) jck
Wim Van den Wyngaert
Honored Contributor

Re: VMS Poor SDLT performance

Did some testing and found that backup easily uses 80,000 blocks of wsquota. If wsquota is lower, the working set slowly grows above wsquota.
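One way to watch this yourself while a backup runs (the PID is a placeholder; take the real one from SHOW SYSTEM):

```dcl
$ ! Follow the BACKUP process's working set as it grows
$ SHOW PROCESS/CONTINUOUS/ID=2040011C
$ ! or watch page-fault behaviour system-wide
$ MONITOR PROCESSES/TOPFAULT
```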

Wim
Wim Van den Wyngaert
Honored Contributor

Re: VMS Poor SDLT performance

http://www.quantum.com/NR/rdonlyres/68D6FAF0-8A46-49D5-B051-9CAE7D88BBD1/0/CH2e6.pdf

http://www.quantum.com/NR/rdonlyres/A3096946-7A8A-4A97-AC19-C21624335802/0/CH2e5.pdf

It seems that modern DLT drives do ECC themselves. So is /group no longer necessary? And /crc neither?
It corresponds to /group=4, which is better than the VMS default of 10.

Wim
Uwe Zessin
Honored Contributor

Re: VMS Poor SDLT performance

Again:
/CRC is an end-to-end check: from the system's memory/CPU down to the tape media. ECC in the tape drive will not help against corruptions on the SCSI bus or BACKUP corrupting a save-set.
.
Wim Van den Wyngaert
Honored Contributor

Re: VMS Poor SDLT performance

Uwe,

Is such a corruption possible, and where is it documented? Doesn't SCSI have protection of its own?

What if data is corrupted while being transferred from disk to the VMS system ? Are we protected ?

Wim
Wim Van den Wyngaert
Honored Contributor

Re: VMS Poor SDLT performance

http://www.pcguide.com/ref/hdd/if/scsi/protCRC-c.html

SCSI has parity checking and SCSI-3 has a full CRC. So on my GS160 I could use /group=0 and /nocrc. On my old nodes I should be careful.

Wim
Guenther Froehlin
Valued Contributor

Re: VMS Poor SDLT performance

There are 2 basic tests you can/should do when you think BACKUP runs (too) slowly:

$ SHOW TIME
$ BACKUP/all_your_qualifiers disk: NLA0:dummy/SAVE
$ SHOW TIME
$ BACKUP/all_your_qualifiers disk: NLA0:dummy/SAVE/LIST=test.lis

and

$ SHOW TIME
$ BACKUP/PHYSICAL/BLOCK=65024 disk: NLA0:dummy/SAVE
$ SHOW TIME
$ SHOW DEVICE/FULL disk:

From the first test get the total blocks count from the bottom of test.lis and divide it by 2 * elapsed time to get the KB/sec.

For the /PHYSICAL test divide the device's total blocks count by 2 * elapsed time to get the KB/sec.

The two numbers give you an idea how much the current volume layout (file size, location, fragmentation) is impacting your backup speed.
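As a worked example, plugging Rob's earlier figures (99887284 blocks in 100 minutes) into that formula; the symbol names are arbitrary:

```dcl
$ blocks = 99887284          ! total blocks from the bottom of test.lis
$ seconds = 100 * 60         ! elapsed time in seconds
$ kb_sec = blocks / (2 * seconds)
$ WRITE SYS$OUTPUT "Throughput: ''kb_sec' KB/sec"
$ ! 99887284 / 12000 = 8323 KB/sec, in line with Rob's ~27GB/hour
```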

/Guenther
Tim Nelson
Honored Contributor

Re: VMS Poor SDLT performance

Thanks again for the continued responses.

Here are a couple additional items we have noticed:

Giving the process additional system resources, i.e. increasing quotas, has actually made backup times increase. What we noticed was that the disk queue increased dramatically, initially making me believe that the Symm just could not provide enough IO bandwidth. On further investigation it was even weirder: 100% of the data was coming from the Symm cache, so there was no physical disk contention problem.
Later investigation has put us in the current state.
There seems to be some thrashing between the FC HBAs. One HBA goes to the disk, the other HBA to the MDR with the tape drives.
The tape drives cycle between idle and writing. When writing, the disk queues increase; when idle, the disk IO increases. Hence the thrash.
We are not sure yet if there is some limitation on this AS1200 and its ability to transfer data over the bus between two FC HBAs ( they are separated over hose0 and hose1) or if some other limitation exists.
If the SDLT can never stream because the system is fighting over the IO then we will never reach any good backup rates.
If anyone has thoughts or ideas, please add to this lengthy thread.

Thanks again !! Points to all !!
Martin P.J. Zinser
Honored Contributor

Re: VMS Poor SDLT performance

Hello Tim,

everybody has tweaked the backup command so far - how about the mount?

We use

MOUNT/NOASSIST/MEDIA_FORMAT=COMPACTION/FOREIGN-
/CACHE=TAPE_DATA

for our SDLTs.

Greetings, Martin
Antoniov.
Honored Contributor

Re: VMS Poor SDLT performance

Hello Martin,
welcome back to VMS :-)

Antonio Vigliotti
Antonio Maria Vigliotti