Operating System - OpenVMS

Shadow Copy Breaks

SOLVED
Wim Van den Wyngaert
Honored Contributor

Shadow Copy Breaks

I have a problem: the performance of the shadow copy (interbuilding) is too good. As a result, the application takes 18 minutes instead of 3 to do certain things (Sybase dbcc, no other activities on the cluster).

Is there a way to decrease the resource usage of the shadow copy (not merge!!!)?

Max shadow copy is set to 1 (and 0 on the other member of the cluster).

Wim
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

You have several logicals to play with:

SHAD$MERGE_DELAY_FACTOR applies to all shadow sets mounted on a node unless you also use SHAD$MERGE_DELAY_FACTOR_DSAnnnn.
SHAD$MERGE_DELAY_FACTOR_DSAnnnn applies to each shadow set specified by its virtual unit name, DSAnnnn.

If you increase the setting for either logical name, you increase the merge rate and decrease the I/O rate. Conversely, if you decrease the setting for either, you decrease the merge rate and increase the I/O rate.




See this copy of the documentation:
http://www.pi-net.dyndns.org/docs/openvms0731/731final/5423/5423pro_013.html

in particular the section
9.2.1 Improving Performance of Unassisted Merge Operations
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Gerard: see the second paragraph of my question. I mean copy, not merge.

Wim
Willem Grooters
Honored Contributor

Re: Shadow Copy Breaks

Wim,

I'm not familiar with shadow-copy, let alone interbuilding, but I can imagine it is designed to use all available network (fibre, I presume) and disk resources.
If your Sybase application has the same requirements, and knowing Sybase is a relational database and THUS requires a lot of resources as well, it's obvious you have a conflict. A factor of 6, however, is very high.
I would suggest looking for the real bottleneck: disk, controller or interconnect.

Another possibility I can think of is the disk being locked due to the shadow-copy's requirements. That may prevent the application from accessing the disk during shadow-copy operations.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Volker Halle
Honored Contributor

Re: Shadow Copy Breaks

Wim,

during a shadow-copy, SHADOW_SERVER will be doing 127-block reads and writes as fast as it can, to provide the required disk redundancy as quickly as possible.

Try to find out where the bottleneck is between the shadow-copy and your application. CPU usage during a shadow-copy would normally be minimal; interconnect load and disk/channel load may be more significant, but it highly depends on your configuration. Start with a couple of MONITOR commands during the next planned shadow-copy.

I've recently seen a massive CPU-load (high INT-stack) on an ES47 cluster during shadow-copy due to nonpaged pool fragmentation.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Volker,

It is an FDDI interconnect and the disk throughput was about 12 MBytes per second. Sybase hardly got 2 MBytes per second and all the rest went to the shadowing. The interconnect was carrying about 8 MBytes per second.
Seems normal to me.
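These figures can be put in perspective with a little arithmetic. The sketch below assumes FDDI's nominal 100 Mbit/s rate and 1 MByte = 10^6 bytes, and ignores framing and token overhead, so the real headroom on the ring is somewhat smaller than shown.

```python
# Back-of-the-envelope check of the throughput figures reported above.
# Assumptions (not stated in the thread): nominal FDDI rate of 100 Mbit/s,
# 1 MByte = 10**6 bytes, protocol overhead ignored.
FDDI_MBIT_PER_S = 100

def fddi_utilisation(mbytes_per_s: float) -> float:
    """Fraction of the nominal FDDI bandwidth used by a given byte rate."""
    return mbytes_per_s * 8 / FDDI_MBIT_PER_S

disk_total = 12.0              # MByte/s observed on the disk
sybase = 2.0                   # MByte/s that Sybase got
shadow = disk_total - sybase   # the rest went to the shadow copy
interconnect = 8.0             # MByte/s observed on the FDDI

print(f"shadow copy share of disk throughput: {shadow / disk_total:.0%}")
print(f"FDDI utilisation: {fddi_utilisation(interconnect):.0%}")
```

With roughly two thirds of the nominal ring bandwidth taken by the copy, there is little room left for any Sybase I/O that also has to cross the FDDI.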

It seems that the shadowing is simply "gourmand".
Wim
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Volker,

No pool fragmentation. 40% used and largest block is 50%.

Wim
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

If memory serves me, the SHADOW_SERVER process is not allowed to take more than 10% of the CPU, even if nobody else uses the CPU!
Ian Miller.
Honored Contributor

Re: Shadow Copy Breaks

what version of VMS?
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

CPU is not the problem. The total copy took 5 minutes of cpu in 14 hours.
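The CPU figure translates into a very small share of one CPU. A minimal calculation, using only the two numbers given here:

```python
# CPU share of SHADOW_SERVER during the copy:
# 5 minutes of CPU time over a 14-hour copy (figures from the post above).
cpu_minutes = 5
elapsed_minutes = 14 * 60

cpu_fraction = cpu_minutes / elapsed_minutes
print(f"SHADOW_SERVER CPU share: {cpu_fraction:.2%}")
```

At well under 1%, a CPU limit on SHADOW_SERVER such as the 10% recalled earlier in the thread would never come into play; the copy is bound by I/O and interconnect bandwidth, not by CPU.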

It is VMS 7.3, patched up to 03-2003.

Wim
Volker Halle
Honored Contributor

Re: Shadow Copy Breaks

Wim,

I don't think shadow-copy is 'something special'. SHADOW_SERVER just issues huge I/Os (127. blocks) as fast as possible to both disks involved in the shadow-copy, which will consume bandwidth on the FDDI link.

The CPU usage accounted to SHADOW_SERVER is not everything. Do you have MONI MODE data from the shadow-copy ? MONI MSCP on the remote site ? MONI DLOCK ?

You could change the IO-size with DEFINE/SYS SHAD$COPY_BUFFER_SIZE nn (with 31 <= nn <= 127).
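To see what changing SHAD$COPY_BUFFER_SIZE does to the I/O pattern, the sketch below converts a buffer size in blocks into bytes per copy I/O and into the number of copy I/Os per second needed for a given throughput. It assumes 512-byte disk blocks and the 31 to 127 range quoted above; the 10 MByte/s target is only an illustration, and the function name is hypothetical.

```python
BLOCK_BYTES = 512  # one OpenVMS disk block

def copy_io_profile(buffer_blocks: int, target_mbytes_per_s: float) -> tuple[int, float]:
    """Return (bytes per copy I/O, copy I/Os per second) for a given
    SHAD$COPY_BUFFER_SIZE value and a target copy throughput in MByte/s."""
    if not 31 <= buffer_blocks <= 127:
        raise ValueError("SHAD$COPY_BUFFER_SIZE must be between 31 and 127 blocks")
    bytes_per_io = buffer_blocks * BLOCK_BYTES
    ios_per_s = target_mbytes_per_s * 1_000_000 / bytes_per_io
    return bytes_per_io, ios_per_s

for size in (127, 64, 31):
    nbytes, rate = copy_io_profile(size, 10.0)
    print(f"{size:3d} blocks = {nbytes:6d} bytes/I/O, {rate:5.0f} I/Os per second at 10 MByte/s")
```

A smaller buffer does not by itself cap the copy rate; it splits the same traffic into more, smaller transfers. That may let application requests slip in between copy I/Os more often, but whether it helps in a given configuration is something to verify with MONITOR.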

Volker.

PS: I'm currently on vacation, so response time may be slower than usual ;-)
Jan van den Ende
Honored Contributor

Re: Shadow Copy Breaks

Wim,

without pointing at anything specific (yet), I think something _IS_ wrong at your site.
WE also have inter-site FDDI (backed initially by 10 Mb, now by 100 Mb Ethernet),
and we have found that shadow copies have 'little' (< 50%) influence on response times, as long as we keep the number of concurrent copies at no more than three....
unless a process has heavy I/O to a disk currently under copy; THEN performance may drop to 40 - 50 %.

We have had plenty of opportunity to collect those stats:
our backups were done by taking one member out of the set and adding in "the fourth member". Then the third member was backed up to tape and served as a hot-standby backup.
Since MiniCopy (and SAN) were introduced, we reverted to a more conventional scheme, since the 4-disk scheme breaks MiniCopy.

The reason to introduce the 4-disk scheme way back when was a glitch in the way one application handled external communication with a remote system. All smooth in DECnet times, but the (mandated) switch to IP initially required a restore about once a week. We soon learned that swapping the disk plus a shadow copy took minutes, while restoring the tape and then doing a shadow copy took over an hour, and was a hindrance in other ways too. Now the app is better adapted to IP, so the balance has shifted again.

What is causing your performance loss?
No idea yet, but my bets are that something at your site is sub-optimal.
Exactly WHAT? That is the $1000 question.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Extra info for the 1000$ question :

The shadow server was doing one copy at a time, and one disk was MSCP-served (so the copy traffic went over the FDDI).

It was confirmed that this job normally takes 2-3 minutes and now took 18. Sybase is doing I/Os of 60 pages each.

Sybase runs with prio=3, shadow_server with 4.

No logicals *shad* active except shad$merge_delay_factor (not used for copy).

The MSCP params :
MSCP_LOAD 1
MSCP_SERVE_ALL 1
MSCP_BUFFER 16384
MSCP_CREDITS 128
MSCP_CMD_TMO 0
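As an order-of-magnitude check on MSCP_BUFFER: assuming the parameter is expressed in 512-byte pagelets (as the SYSGEN documentation describes it, to the best of my knowledge), the sketch converts the value above to bytes and counts how many maximum-size (127-block) copy transfers would fit in it at once. The server's real buffer management is more involved than this.

```python
PAGELET_BYTES = 512     # assumed unit of the MSCP_BUFFER parameter
BLOCK_BYTES = 512       # one disk block
COPY_IO_BLOCKS = 127    # largest shadow-copy transfer mentioned in the thread

mscp_buffer_pagelets = 16384
buffer_bytes = mscp_buffer_pagelets * PAGELET_BYTES
copy_io_bytes = COPY_IO_BLOCKS * BLOCK_BYTES

print(f"MSCP_BUFFER = {buffer_bytes / 2**20:.0f} MiB")
print(f"max-size copy I/Os that fit at once: {buffer_bytes // copy_io_bytes}")
```

With room for well over a hundred 127-block transfers, buffer space looks unlikely to be the limiting factor here.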
Wim
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

Check that MSCP is tuned properly with

[OpenVMS] MSCP Disk Server, Usage And Performance Tuning


http://h18000.www1.hp.com/support/asktima/operating_systems/009324D2-0E9A6D00-1C01E7.html

or simply
$ mc agen$feedback
$ sea sys$system:agen$feedback.dat mscp

Do you have a high percentage of mscp_frag_io and mscp_wait_io
compared to mscp_total_io?
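The comparison Gerard suggests is just two percentages. A minimal sketch, assuming the three counters have already been read from AGEN$FEEDBACK.DAT into a dictionary (the file layout is not shown in the thread, so parsing is left out). The counter names follow the post; the sample values are invented for illustration.

```python
def mscp_ratios(counters: dict[str, int]) -> dict[str, float]:
    """Percentage of MSCP I/Os that were fragmented or had to wait,
    relative to mscp_total_io."""
    total = counters["mscp_total_io"]
    if total == 0:
        return {"frag_pct": 0.0, "wait_pct": 0.0}
    return {
        "frag_pct": 100.0 * counters["mscp_frag_io"] / total,
        "wait_pct": 100.0 * counters["mscp_wait_io"] / total,
    }

# Hypothetical sample values, not taken from Wim's system.
sample = {"mscp_total_io": 250_000, "mscp_frag_io": 5_000, "mscp_wait_io": 0}
print(mscp_ratios(sample))
```

Non-zero percentages would point at an undersized MSCP_BUFFER; zero for both means the MSCP server's buffering is not the bottleneck.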
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Gerard,

Wait and frag are both zero.

I decreased the priority of the shadow server to 1. I hope this will improve Sybase throughput. In any case, can not do much harm.

Wim
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

Can you take a network analyzer and monitor the protocols on your FDDI, so that you are able to say: from 9 to 10 this morning, on the FDDI link, there was
- 68 % of 60-07 (Cluster protocol)
- 23 % of Tcpip
- 3 % of ...

there was a high (or low) percentage of retransmitted frames

the top ten talkers were alpha1 and alpha2, followed by alpha12 and alpha13...
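If the analyzer can export per-frame records, the summary above boils down to grouping bytes by protocol and by source station. The sketch assumes a hypothetical list of (source, protocol, frame length) tuples; the station names and sizes are invented, and 60-07 is the cluster protocol type mentioned above.

```python
from collections import Counter

def summarise(frames):
    """frames: iterable of (source_station, protocol, frame_bytes).
    Returns (percent of bytes per protocol, top ten talkers by bytes)."""
    by_proto = Counter()
    by_station = Counter()
    for src, proto, nbytes in frames:
        by_proto[proto] += nbytes
        by_station[src] += nbytes
    total = sum(by_proto.values())
    shares = {p: 100.0 * b / total for p, b in by_proto.items()} if total else {}
    return shares, by_station.most_common(10)

# Invented sample capture, for illustration only.
capture = [
    ("alpha1", "60-07", 4000),
    ("alpha2", "60-07", 3000),
    ("alpha1", "TCP/IP", 2000),
    ("alpha12", "other", 1000),
]
shares, talkers = summarise(capture)
print(shares)
print(talkers)
```

Counting bytes rather than frames matters here: shadow-copy traffic consists of few, large frames, so a frame count would understate its share of the link.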

Or ask your network colleague (or yourself) to do it.

Jan van den Ende
Honored Contributor

Re: Shadow Copy Breaks



In any case, can not do much harm.


.. and probably will not help much either.

Since you have no CPU load problems, every process that reaches COM state will not have to wait for the CPU, and can do its thing.
One such thing is launching IO.
And those, once launched, are not influenced by process priority, only by IPL.

OTOH, if it DOES make a difference, THEN we enter interesting territory.

.. and sure, Duvel ranks pretty high on my Preferred List. Dentergems Wit also scores well, and Koninck, and various Tripels, and ... I'm getting thirsty!


Cheers.

Have one on me!

Jan
Volker Halle
Honored Contributor

Re: Shadow Copy Breaks

Wim,

if the Sybase performance 'suffered' through a parallel shadow-copy, you have to ask yourself, which resource might have been the bottleneck:

- SHADOW_SERVER will read from the source disk and write to the target disk. So it needs some CPU cycles on the node where it's running, some on the MSCP server node, bandwidth and IOs on the local and remote disk channel and disks and lots of bandwidth on the FDDI

- Sybase also wants to be doing IOs, so it also needs CPU cycles and bandwidth and IOs to the disks involved. If the IOs are mostly reads, they should come from the local disk, if they are mostly writes, then FDDI bandwidth is also required.

I would start with making a drawing of all the components involved, try to collect some monitor data (MONI MODE, MONI MSCP, MONI SCS, MONI DLOCK) on both nodes during the shadow-copy and then try to interpret the results - looking for unusual load or behaviour.

This is not some kind of 'problem' which can easily be solved by one or two questions and a solution in this kind of forum.

Volker.
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

I would add to Volker's MONITOR items:
$ monitor rlock
labadie_1
Honored Contributor

Re: Shadow Copy Breaks

On both nodes:
$ ana/sys
lck sh active
lck stat/toptree=10
may highlight some things.
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

The Sybase server is doing only reads. The shadow server is doing writes on the disk that Sybase is reading.

The local and remote disks each use controller-based mirroring.

I checked in TNG and found nothing special in SCS, DLOCK, etc.

I wasn't hoping to find the reason, but hoping there was an undocumented feature that would allow me to decrease shadow server activity.

I checked in AMDS whether I could decrease its quotas, but it is hardly using any.

Jan: maybe you have something comparable at your site but nobody noticed it. I only saw it because of an aborted job. I know of other VMS clients that have similar problems (normal operation impossible due to shadow server activity).

Why is there a break available for merge and not for copy ?

Wim
Jan van den Ende
Honored Contributor

Re: Shadow Copy Breaks

Wim!!


The shadow server is doing writes on the disk that Sybase is reading.


Are you sure about that???
AFAIK a copy-target disk is NOT enabled for RMS access. The only shadow set members that do user I/O are the full members.

But if you do write-IO to the set in an area that has already been copied, then THAT will have to be done to the target member as well.

btw, in a multisite SAN config, did you apply the SITE logicals cf. the Shadowing Manual?
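Jan's point turns out to be the key to the thread, so here is a toy model of it. It assumes, as described above, that reads are served only by full members (never by a copy target) and that, with site logicals defined, a full member at the reader's own site is preferred. The class, function and device names are hypothetical; this illustrates the consequence, not how the shadowing driver is implemented.

```python
from dataclasses import dataclass

@dataclass
class Member:
    device: str   # hypothetical device name
    site: int     # site number, as assigned via the site logicals
    full: bool    # False while the member is a shadow-copy target

def read_source(members: list[Member], reader_site: int) -> Member:
    """Pick the member that serves a read in this toy model:
    only full members qualify, and a same-site full member is preferred."""
    full = [m for m in members if m.full]
    if not full:
        raise RuntimeError("no full member available")
    local = [m for m in full if m.site == reader_site]
    return (local or full)[0]

# The local disk (site 1) is being copied; the only full member is at site 2.
shadow_set = [Member("LOCAL_DISK", 1, full=False), Member("REMOTE_DISK", 2, full=True)]
print(read_source(shadow_set, reader_site=1).device)
```

While the local member is a copy target, every read from site 1 has to cross the FDDI, competing with the copy's own traffic. Once the copy completes and the local member is full again, the same call would return the local disk.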


hth

Cheers

Have one on me

jpe
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Good point Jan. I didn't think of that.
So: Sybase is reading remotely, and so is the shadow server. This is slower and explains the delay. Btw, the shadow server I/O is not always visible in TNG 2.4.

And yes, I did define the site logicals.

Jan: couldn't you have the same? A program reading remotely gets delayed? It's a pity I have no play cluster.

Wim
Jan van den Ende
Honored Contributor
Solution

Re: Shadow Copy Breaks

Wim,

our shadow sets, and our manipulation thereof for backups, are such that it is rare for us NOT to have at least one member at each site.
Maybe we latently do have your problem, but then it has never caught our attention.

Cheers.

Have one on me

jpe
Wim Van den Wyngaert
Honored Contributor

Re: Shadow Copy Breaks

Jan,

Should have given the rabbit on your previous answer (but now you have more points).

Have at least one on me (I will this evening)

Wim