Operating System - HP-UX

script performance with gzip, wait and background commands

 
SOLVED
Go to solution
Michael Resnick
Advisor

script performance with gzip, wait and background commands

Hi all,

My DBA has this huge database compress command (see below) that combines multiple sets of gzips running in the background and using the wait command to break the sets up.

My concern is that doing it this way may put a huge strain on the system and there should perhaps be some better way of accomplishing what he needs. (He needs to have all of the database files copied and compressed for subsequent shipment to another server.)

Is there a better, more efficient way to do what he's doing in the following command? (Yes, he does this as a single command.)

I could probably do this single-threaded using find and piping the found files into gzip, but he says he needs to take advantage of the 12 processors so he wants to run multiple gzips at once.

Thanks in advance,

------------------------------


gzip -c /ora6/system1.dbf > /ora6/system1.dbf_coldbackup.gz & gzip -c /ora7/rbs1.dbf > /ora7/rbs1.dbf_coldbackup.gz & gzip -c /ora7/rbs1-2.dbf > /ora7/rbs1-2.dbf_coldbackup.gz & gzip -c /ora8/rbs201.dbf > /ora8/rbs201.dbf_coldbackup.gz & wait gzip -c /ora8/rbs202.dbf > /ora8/rbs202.dbf_coldbackup.gz & gzip -c /ora9/rbs301.dbf > /ora9/rbs301.dbf_coldbackup.gz & gzip -c /ora9/rbs302.dbf > /ora9/rbs302.dbf_coldbackup.gz & gzip -c /ora1/detail_5.dbf > /ora1/detail_5.dbf_coldbackup.gz & gzip -c /ora2/data_601.dbf > /ora2/data_601.dbf_coldbackup.gz & wait gzip -c /ora2/data_602.dbf > /ora2/data_602.dbf_coldbackup.gz & gzip -c /ora6/tools01.dbf > /ora6/tools01.dbf_coldbackup.gz & gzip -c /ora1/users01.dbf > /ora1/users01.dbf_coldbackup.gz & gzip -c /ora1/batchdata101.dbf > /ora1/batchdata101.dbf_coldbackup.gz & gzip -c /ora2/batchindex_1.dbf > /ora2/batchindex_1.dbf_coldbackup.gz & wait gzip -c /ora3/data101.dbf > /ora3/data101.dbf_coldbackup.gz & gzip -c /ora4/data201.dbf > /ora4/data201.dbf_coldbackup.gz & gzip -c /ora5/data301.dbf > /ora5/data301.dbf_coldbackup.gz & gzip -c /ora6/data501.dbf > /ora6/data501.dbf_coldbackup.gz & gzip -c /ora2/data_603.dbf > /ora2/data_603.dbf_coldbackup.gz & wait gzip -c /ora8/datab.dbf > /ora8/datab.dbf_coldbackup.gz & gzip -c /ora9/datac.dbf > /ora9/datac.dbf_coldbackup.gz & gzip -c /ora10/datae.dbf > /ora10/datae.dbf_coldbackup.gz & gzip -c /ora11/index101.dbf > /ora11/index101.dbf_coldbackup.gz & gzip -c /ora12/index201.dbf > /ora12/index201.dbf_coldbackup.gz & wait gzip -c /ora13/index2a.dbf > /ora13index2a.dbf_coldbackup.gz & gzip -c /ora14/index501.dbf > /ora14/index501.dbf_coldbackup.gz & gzip -c /ora1/indexa.dbf > /ora1/indexa.dbf_coldbackup.gz & gzip -c /ora2/indexb.dbf > /ora2/indexb.dbf_coldbackup.gz & gzip -c /ora3/indexc.dbf > /ora3/indexc.dbf_coldbackup.gz & wait gzip -c /ora4/indexd.dbf > /ora4/indexd.dbf_coldbackup.gz & gzip -c /ora5/indexe.dbf > /ora5/indexe.dbf_coldbackup.gz & gzip -c /ora6/onlinedata101.dbf > 
/ora6/onlinedata101.dbf_coldbackup.gz & gzip -c /ora7/onlineindex_1.dbf > /ora7/onlineindex_1.dbf_coldbackup.gz & gzip -c /ora4/data202.dbf > /ora4/data202.dbf_coldbackup.gz & wait gzip -c /ora5/data302.dbf > /ora5/data302.dbf_coldbackup.gz & gzip -c /ora10/rbs401.dbf > /ora10/rbs401.dbf_coldbackup.gz & gzip -c /ora10/rbs402.dbf > /ora10/rbs402.dbf_coldbackup.gz & gzip -c /ora4/data203.dbf > /ora4/data203.dbf_coldbackup.gz & gzip -c /ora2/data_604.dbf > /ora2/data_604.dbf_coldbackup.gz & wait gzip -c /ora2/data_605.dbf > /ora2/data_605.dbf_coldbackup.gz & gzip -c /ora1/users03.dbf > /ora1/users03.dbf_coldbackup.gz & gzip -c /ora1/users02.dbf > /ora1/users02.dbf_coldbackup.gz & gzip -c /ora1/detail_4.dbf > /ora1/detail_4.dbf_coldbackup.gz & gzip -c /ora7/rbs1-3.dbf > /ora7/rbs1-3.dbf_coldbackup.gz & wait gzip -c /ora8/rbs203.dbf > /ora8/rbs203.dbf_coldbackup.gz & gzip -c /ora9/rbs303.dbf > /ora9/rbs303.dbf_coldbackup.gz & gzip -c /ora10/rbs403.dbf > /ora10/rbs403.dbf_coldbackup.gz & gzip -c /ora5/data303.dbf > /ora5/data303.dbf_coldbackup.gz & gzip -c /ora1/detail_1.dbf > /ora1/detail_1.dbf_coldbackup.gz & wait gzip -c /ora1/detail_2.dbf > /ora1/detail_2.dbf_coldbackup.gz & gzip -c /ora1/detail_3.dbf > /ora1/detail_3.dbf_coldbackup.gz & gzip -c /ora1/users04.dbf > /ora1/users04.dbf_coldbackup.gz & gzip -c /ora2/data_606.dbf > /ora2/data_606.dbf_coldbackup.gz & wait gzip -c /ora8/temp11.dbf > /ora8/temp11.dbf_coldbackup.gz & gzip -c /ora8/temp13.dbf > /ora8/temp13.dbf_coldbackup.gz & /gzip -c /ora8/temp14.dbf > /ora8/temp14.dbf_coldbackup.gz & gzip -c /ora8/temp15.dbf > /ora8/temp15.dbf_coldbackup.gz & gzip -c /ora8/temp16.dbf > /ora8/temp16.dbf_coldbackup.gz & gzip -c /ora8/temp17.dbf > /ora8/temp17.dbf_coldbackup.gz & wait gzip -c /ora2/redo1a.log > /ora2/redo1a.log_coldbackup.gz & gzip -c /ora3/redo1b.log > /ora3/redo1b.log_coldbackup.gz & gzip -c /ora4/redo2a.log > /ora4/redo2a.log_coldbackup.gz & gzip -c /ora5/redo2b.log > /ora5/redo2b.log_coldbackup.gz & gzip 
-c /ora2/redo3a.log > /ora2/redo3a.log_coldbackup.gz & gzip -c /ora3/redo3b.log > /ora3/redo3b.log_coldbackup.gz & wait gzip -c /ora4/lans/redo4a.log > /ora4/redo4a.log_coldbackup.gz & gzip -c /ora5/redo4b.log > /ora5/redo4b.log_coldbackup.gz &gzip -c /ora2/redo5a.log > /ora2/redo5a.log_coldbackup.gz &
Steven Schweda
Honored Contributor

Re: script performance with gzip, wait and background commands

> My concern is that doing it this way may
> put a huge strain on the system [...]

Define "huge strain". (Some CPU chip will
pull up lame with a stretched cache address
line?) What happens to the system's
performance when this "job" runs?

Where's the current bottleneck, I/O or CPU?

Knowing nothing, I'd guess that running more
than one gzip process per CPU would be
counter-productive, but it seems as if it'd
be pretty easy to run some tests with the
crazy command broken up in different ways, to
see what helps and what hurts. If the CPUs
are fast and the disks are slow, then
multiple gzip processes per CPU might be
good.

Running some tests with even a little
instrumentation (elapsed time, CPU time)
might be more useful than accumulating
guesses from "experts" (who have insufficient
info on which to base an intelligent
conclusion).
Bill Hassell
Honored Contributor

Re: script performance with gzip, wait and background commands

It sounds like this hasn't been run yet. gzip is somewhat CPU intensive but it will certainly be disk intensive. Running all of these will probably run the CPU usage up to 100% (not a bad thing if nothing else is running) and the disks will be rattling around the floor. I wouldn't run any production database activities during this time if bad performance would be a problem.

Seriously, you can't determine the impact until you have some data. Pick a database metric, some query that takes a minute or two, and run it without any gzip load. Then run 4 or 5 copies at the same time and run the same metric. That should give you an idea what kind of load to expect with several dozen gzips running.
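As a rough sketch of that test (the /tmp files here are throwaway placeholders, not the real data files, and on HP-UX you'd likely use timex(1) rather than date +%s):

```shell
#!/bin/sh
# Hypothetical benchmark: time a representative command alone, then
# again while a few gzips compete for CPU and disk in the background.

# Placeholder data files, standing in for the real .dbf files.
for n in 1 2 3 4; do
    dd if=/dev/zero of=/tmp/bench$n.dat bs=1024 count=512 2>/dev/null
done

# Baseline run, no competing load.
t0=$(date +%s)
gzip -c /tmp/bench1.dat > /tmp/bench1.solo.gz
t1=$(date +%s)
echo "baseline: $((t1 - t0))s"

# Loaded run: the same command with three gzips in the background.
gzip -c /tmp/bench2.dat > /tmp/bench2.gz &
gzip -c /tmp/bench3.dat > /tmp/bench3.gz &
gzip -c /tmp/bench4.dat > /tmp/bench4.gz &
t2=$(date +%s)
gzip -c /tmp/bench1.dat > /tmp/bench1.loaded.gz
t3=$(date +%s)
wait
echo "loaded:   $((t3 - t2))s"
```

Comparing the two elapsed times, repeated with different background counts, gives the data Bill describes.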


Bill Hassell, sysadmin
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

OK, yes, perhaps some additional info would be needed. 8=]

We've been chasing down a problem with the DBA's backup for a while. This process normally takes less than 2 hours to run in the three weekends following a reboot. However, on the fourth weekend it'll take about three hours, and then the subsequent weekend it's killed after about four hours.

Data file sizes are about the same from one week to the next, with a little growth (perhaps a few MB).

SysAdmin says that we do see a disk bottleneck when this runs, but performance is as expected from the array. Memory is not a bottleneck.

Basically, I should be asking if there's a more efficient way to write the script he has above. Again, I could just pipe the output from find into gzip, which would process the files one by one (and perhaps remove some of the disk bottleneck), but is there a more efficient way to accomplish this?

Steven Schweda
Honored Contributor

Re: script performance with gzip, wait and background commands

> [...] following a reboot. [...]

So, as I understand it, this job runs more
slowly the longer the system has been up?
And you think that the problem is with
_this_ job, not the other job(s) on the
system with the memory leak(s)?

> We've been chasing down a problem with the
> DBA's backup for a while. [...]

And if you concentrate exclusively on the job
with the symptom, you may be ignoring the
other job(s) which may be causing the growing
(system-wide) resource exhaustion which is
causing the trouble.
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

Yes, this script runs on Sunday. It finishes in about an hour if run within a week of a reboot, in about 1.5 hours the following weekend, and takes longer with each week after that.

We've been monitoring many other areas and the application itself seems to be running fine all the while. (I'm one of the app guys.) Our processing times, even for our "long running batch" routines (1.5 hours) are fairly consistent even many weeks after a reboot.

Yes, this is the only job where we see the problem, but we're still looking into other areas of the system.

We've had our support vendor come in and perform a system-wide review and they report everything is configured correctly and all hardware is working fine.

I'm not ruling out other areas and we're still looking for them, but the problem only started (or was noticed) within a few weeks of the DBA implementing the above script.

We very well could have a memory leak somewhere: there is 12GB on the machine and we can watch the free memory slowly go from about 8GB to under 1GB after about two weeks. It takes a couple of months to get that low if we do not run the above script. Our sysadmin says this is normal, though.

What tools (if any) are available to help diagnose and find memory leak issues?

Again, thanks in advance for any info.
James R. Ferguson
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

Hi Michael:

> This process normally takes less than 2 hours to run in the three weekends following a reboot. However, on the fourth weekend, it'll take about three hours and then the subsequent weekend it's killed after four weekends.

This certainly does sound like a memory leak is at work.

You don't say much about the actual backup tool used following the compression. If, for example, you were using 'fbackup' and did a 'kill -9' at any time, you would *not* clean up the shared memory segments used by this utility. Then, over time, you would "lose" memory.

Does this, or something like it, fit?

Regards!

...JRF...
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

I believe this is what occurs...

1) DBA shuts down the DB and brings it up in restricted mode
2) Performs a full export
3) Shuts down DB and performs command above
4) Starts up the DB

We don't typically kill things but I'm not sure about others. I guess the best thing to do is find some tool that can identify if memory areas are left hanging around.
George Spencer_4
Frequent Advisor

Re: script performance with gzip, wait and background commands

Hi Michael,

I am amazed that this works, so it proves that you learn something new each day.

I am sure that it is not as efficient as your DBA thinks. I suspect that the "wait" is being used to limit the number of concurrent processes, but it also restricts the efficiency of the gzipping. For example, a group may start with six gzip processes, but by the end only a single process is still running; only once it finishes does the next group proceed.

Surely a better way would be:

Create an array of dbf files to gzip
While there are still unzipped files
if "jobs" shows less than six processes are running
pop a dbf file from the array and gzip in background
end if
sleep 30 seconds
end while

This would mean there would be 6 processes running right up until the end.
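A minimal shell sketch of that top-up loop (the /tmp file names and the limit of 3 are placeholders; this portable variant tracks its own child PIDs with kill -0 rather than parsing "jobs" output):

```shell
#!/bin/sh
# Keep at most $limit gzips in flight; start a new one as soon as
# any slot frees up, so the pool stays full until the very end.

limit=3
# Placeholder data files, standing in for the real .dbf files.
for n in 1 2 3 4 5 6 7 8; do
    dd if=/dev/zero of=/tmp/pool$n.dbf bs=1024 count=256 2>/dev/null
done

pids=""
for f in /tmp/pool*.dbf; do
    # Block until fewer than $limit of our gzip children are alive.
    while :; do
        live="" n=0
        for p in $pids; do
            if kill -0 "$p" 2>/dev/null; then
                live="$live $p"
                n=$((n + 1))
            fi
        done
        pids=$live
        if [ "$n" -lt "$limit" ]; then
            break
        fi
        sleep 1
    done
    gzip -c "$f" > "${f}_coldbackup.gz" &
    pids="$pids $!"
done
wait
echo "compressed: $(ls /tmp/pool*_coldbackup.gz | wc -l)"
```

Unlike the grouped waits, a short file never holds up the rest of its group.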

That's my thoughts anyway. Hope it helps.

George
Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>George: I am amazed that this works

Yes but silly to put them all on one line. Just use:
gzip -c /ora6/system1.dbf > /ora6/system1.dbf_coldbackup.gz &
...

I have no idea what "wait stuff" will do, since stuff isn't a job.

>I suspect that the "wait" is being used to limit the number of concurrent processes, but this would also be restricting the efficiency of the gzipping.

It wouldn't matter if they finished at the same time. Also, it allows other processes to run.

>Surely a better way would be:

It might be simpler to divide the files in 6 parts, invoke 6 script(s) in the background to work on each part. And inside the script, just gzip each file in the foreground.

Until it gets near the end, there will always be 6 gzips.
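A minimal sketch of that partitioning (placeholder /tmp files, and three workers instead of six for brevity): round-robin the file names into one list per worker, then run one sequential gzip loop per list in the background.

```shell
#!/bin/sh
# Divide the file list into fixed parts; one background worker
# compresses each part's files in the foreground, one at a time.

workers=3
rm -f /tmp/part.*
# Placeholder data files, standing in for the real .dbf files.
for n in 1 2 3 4 5 6 7 8 9; do
    dd if=/dev/zero of=/tmp/db$n.dbf bs=1024 count=128 2>/dev/null
done

# Round-robin the file names into $workers list files.
i=0
for f in /tmp/db*.dbf; do
    echo "$f" >> /tmp/part.$((i % workers))
    i=$((i + 1))
done

# One background worker per list.
for p in /tmp/part.*; do
    (
        while read -r f; do
            gzip -c "$f" > "${f}_coldbackup.gz"
        done < "$p"
    ) &
done
wait
echo "compressed: $(ls /tmp/db*_coldbackup.gz | wc -l)"
```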

I suppose you could use a named pipe to divide the filenames, not sure how to handle the EOF on the pipe case?
George Spencer_4
Frequent Advisor

Re: script performance with gzip, wait and background commands

The reason that I am amazed that this works, is that if you attempt to run:

who > /tmp/who.1 & wait who > /tmp/who.2 &

Only the first who command will generate output; the second who command generates an empty file. However, the use of:

who > /tmp/who.3 & wait; who > /tmp/who.4 &

will result in both commands working.

I assume that the use of the wait command, after a group of background gzips, is to wait until all of these have finished before proceeding to the next group. If the sizes of the logs are similar, then the gzip processes might all finish at around the same time. However, if there is a considerable difference between the log sizes, then the concatenated gzips would not be the most efficient technique.

Excuse my previous attempt at structured English; I am a bit out of practice.
Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>George: Only the first who command will generate output; the second who command generates an empty file.

Right, that's what I said. But there is NO second who command, there is only a wait.

>who > /tmp/who.3 & wait; who > /tmp/who.4 &

Why propagate silliness? Put them all on separate lines.
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

Sorry guys... A little communication issue with the DBA... The gzip commands are all on individual lines such as:

gzip... &
gzip... &
gzip... &
gzip... &
wait
gzip...
and so on.

So, this will spawn several gzip processes and then when it hits the wait, the script will not continue until all prior background processes have completed.

Still monitoring the system, but I can see that free memory is decreasing by about 400-500mb per day.
Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>but I can see that free memory is decreasing by about 400-500mb per day.

Have you found a process that is increasing in memory use? Does "swapinfo -tam" show an increase in total swap use?
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

Since I'm a lowly SE I cannot run swapinfo. :(

I do have access to Glance and can get info from there, but am not an expert at reading the data nor how to locate a memory leak issue.
Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>Since I'm a lowly SE I cannot run swapinfo. :(

I'm confused. Only old broken versions of swapinfo fail to let anyone run them. What OS version are you using? What error do you get?

>but am not an expert at reading the data nor how to locate a memory leak issue.

Well, you can look at top and see if the size value is increasing. gpm's process display should have something similar.
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

I guess our SysAdmin doesn't like to play fair:

me--> $ /usr/sbin/swapinfo -tam
ksh: /usr/sbin/swapinfo: cannot execute
me--> $ ll /usr/sbin/swapinfo
-r-xr--r-- 1 bin bin 20480 Nov 9 2000 /usr/sbin/swapinfo

me--> $ model
9000/800/S16K-A

It's an hp-ux 11.11 machine.

From top:
Memory: 2305944K (1189060K) real, 3826496K (2605988K) virtual, 2410436K free

That free memory was at about 2850000K yesterday morning, and at about 3250000k the morning before. After a reboot, it is around 6550000K
James R. Ferguson
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

Hi Michael:

> I guess our SysAdmin doesn't like to play fair

So it would seem, since on an 11.11 system of mine:

# ls -l /usr/sbin/swapinfo
-r-xr-xr-x 1 bin bin 20480 Sep 7 2004 /usr/sbin/swapinfo

At the least, have your sysadmin 'chmod' the binary as above. Then tell him/her and the DBA that an open dialog and some cooperation go a long way toward finding a solution.

Regards!

...JRF...

James R. Ferguson
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

Hi (again) Michael:

...and if you have 'glance' available you can see swap utilization with 'w'. Use '?' to see other available views.

Regards!

...JRF...

Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>I guess our SysAdmin doesn't like to play fair:
-r-xr--r-- 1 bin bin 20480 Nov 9 2000 swapinfo

This may be HP's fault here. Your sysadmin would have to use swverify to determine it, but getting a later patch may just fix it.
You might also make a copy of swapinfo and then add execute permission.

>From top: ... 2410436K free

I'm not sure how much you should be trusting top for this critical info.
Michael Resnick
Advisor

Re: script performance with gzip, wait and background commands

Glance shows

/dev/vg00/lvol2 device,4.0gb avail, 0mb used
pseudo-swap: memory, 9.2gb avail, 6.5gb used

I wrote a little program to display a few things, including dyn_buf.psd_free from pstat_getdynamic and have been capturing that info regularly.

On 12/30, it showed 2624MB free, and it has slowly decreased each day. Today, it shows 1569MB. (We've had little activity over the last two weeks as the plant is shut down for year-end.)

How can I see who's taking up the memory or if there's a leak?
Steven Schweda
Honored Contributor
Solution

Re: script performance with gzip, wait and background commands

> How can I see who's taking up the memory or
> if there's a leak?

Unbounded growth in the virtual memory (with
no good excuse) would suggest a leak.

A command like:
UNIX95=1 ps -e -o 'pid sz vsz args'
might reveal who's eating the memory. With
a bit of effort, its output could be piped
through an appropriate "sort" command, to
make it easier to find the culprit(s).
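For instance (same column list as above; sorting on the vsz column, and the head count is arbitrary):

```shell
#!/bin/sh
# List processes largest-first by virtual size to spot a leaker.
# UNIX95=1 enables the XPG4 -o option on HP-UX ps; Linux ps accepts
# -o anyway, so the variable is harmless there.
UNIX95=1 ps -e -o 'pid sz vsz args' | sort -rnk3 | head -15
```

Run it periodically and watch which vsz values keep growing.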
Dennis Handly
Acclaimed Contributor

Re: script performance with gzip, wait and background commands

>ME: This may be HP's fault here.

Yes. PHCO_27007 fixes it.

>How can I see who's taking up the memory or if there's a leak?

I'm not sure psd_free will give what you want. You could be using that memory for the buffer cache or whatever the kernel needs. It seems that psd_avm would be better.

What you really need to do is look at the size for individual processes and see which is leaking, using ps(1) as mentioned by Steven.

Or you can write another program to look at pstat_getproc and these fields:
pst_vtsize; # virtual pages used for text
pst_vdsize; # virtual pages used for data
pst_vssize; # virtual pages used for stack
pst_vshmsize; # virtual pages used for shared memory
pst_vmmsize; # virtual pages used for mem-mapped files
Fredrik.eriksson
Valued Contributor

Re: script performance with gzip, wait and background commands

Now, I haven't read the complete thread, but it seems like you could do this a lot more easily and a lot more efficiently.
A script like this might do the trick?

#!/bin/bash
num_cpu=12
incl_list="file_with_paths_whitespace_seperated.txt"
output_prefix="file"
binary="/usr/bin/gzip"
options="-c"

x=0
for i in $(cat $incl_list); do
    status=1
    output_file="${output_prefix}_${x}.gz"
    while [ $status -ne 0 ]; do
        # [g]zip keeps the grep from counting itself
        if [ $(ps aux | grep "[g]zip" | wc -l) -le $num_cpu ]; then
            $binary $options $i > $output_file
            status=0
        else
            sleep 10
        fi
    done
    let x=x+1
done

This just makes sure that there are never more than $num_cpu concurrent gzips running.

Hope it gives you some clues :)
Best regards
Fredrik Eriksson
Fredrik.eriksson
Valued Contributor

Re: script performance with gzip, wait and background commands

Oh, sorry... I forgot to add the background ampersand to the gzip command :)