
Concurrent gzips

 
SOLVED
Russell Gould
Advisor

Concurrent gzips

Hi

I'm having difficulty developing a script which ought to be fairly simple, I'd guess (if I knew what I was doing!).

Basically, we currently perform a disk backup of a database which generates some huge files, and then we run gzip *, which only runs one instance of gzip. Our box has 6 processors, so we have 5 more or less idle processors.

What we want to do is have a script, run after the disk backup has completed, that keeps 3 concurrent gzips going until all the files in a particular directory have been compressed.
I am hitting two issues with my script, though!

Firstly, because the files are so big, if 3 gzips are running my script tries to pick up a file which is currently being compressed (gzip creates the new .gz file and leaves the original there while it's compressing), and gzip errors asking 'do you want to overwrite the .gz file?'.

Because of this, I tried checking the status of the file using fuser filename | wc -w to ascertain whether the file was being used or not, but this doesn't seem to work either.

I'm a bit stuck.

Please can someone help

Thanks

Russell
It's not a problem, it's an opportunity !
8 REPLIES
Tom Jackson
Valued Contributor

Re: Concurrent gzips

Hi Russ:

What if you first determine the files that you want to zip, then do the gzip (writing the compressed copies to a separate directory):

for i in /source_dir/*.ext
do
  /usr/contrib/bin/gzip -c ${i} > /dest_dir/`basename ${i}`.gz
done

Tom
Stanimir
Trusted Contributor

Re: Concurrent gzips

Hi!
As far as I understand, you could run 3 concurrent gzips and write them to different files in your backup directory (the trailing & runs each one in the background):

#gzip -c backupfile1 > //bkp1.gz &
#gzip -c backupfile2 > //bkp2.gz &
#gzip -c backupfile3 > //bkp3.gz &

-c redirects the output of the gzip command to whatever file you choose.

If you wish, you can rename the bkp*.gz files afterwards with "mv".

You can also increase the backup speed with "gzip -1 ..." (the fastest, lightest compression level).

Regards.

Robin Wakefield
Honored Contributor
Solution

Re: Concurrent gzips

Hi Russell,

This will do 3 at a time, picking up the next 3 when the 1st 3 have finished, etc...


==========================
#!/usr/bin/ksh
NUM=3
cd yourdirectory                # the directory holding the backup files
files=`echo *`
echo $files
set -- $files                   # load the file list into $1, $2, $3, ...
while test $# -gt 0 ; do
  for i in $1 $2 $3 ; do        # kick off the next batch of 3
    echo Zipping $i...
    gzip $i &
  done
  wait                          # wait for all 3 to finish before starting more
  sleep 1
  if test $# -gt $NUM ; then
    shift $NUM
  else
    shift $#
  fi
done
=============================

rgds, Robin
Shannon Petry
Honored Contributor

Re: Concurrent gzips

The trick to your dilemma is to run gzip in the background, so you can get multiple instances.

If you do it like this...

for FILE in * ; do
  gzip $FILE
done

...each gzip must finish before the loop continues to the next file.

In Unix, there is a nice way to tell your shell to run something in the background: end the command with a "&" (ampersand) character.

for FILE in * ; do
  gzip $FILE &
done

Now each gzip goes to the background of the shell and the loop continues immediately. Of course this starts a gzip for everything in the directory at once, so use caution where and on what you run it.
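If you want to keep it to a few at a time rather than one gzip per file, a simple variation (just a rough, untested sketch along the lines of the batching idea above; MAX and the /backup directory are only examples) is to wait after every batch of background jobs:

#!/usr/bin/ksh
# Sketch only: run at most MAX gzips at once by waiting after each batch.
MAX=3
count=0
cd /backup
for FILE in * ; do
  gzip $FILE &
  count=`expr $count + 1`
  if [ $count -ge $MAX ] ; then
    wait            # let this batch finish before starting the next
    count=0
  fi
done
wait                # catch the final partial batch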

gzip is not multi-threaded, so a single gzip will not benefit from a multi-CPU machine. I thought a multi-threaded GNU zip was in the works at some point, but I'm not sure of its status; check the GNU home page for updated information.

Similarly, pkzip has an HP-UX version (versions for all Unices, for that matter) which may be multi-threaded. Again, check the PKWARE home page for better info.

Regards,
Shannon
Microsoft. When do you want a virus today?
Wilfred Chau_1
Respected Contributor

Re: Concurrent gzips

Try using fuser to make sure the file is not open before you pick it up and try to gzip it.

fuser -f filename

filename: o

The "o" in the output above means the file is open.
Rick Garland
Honored Contributor

Re: Concurrent gzips

You did mention that the files are big. If they are over 2 GB in size, the gzip utility provided with HP-UX is not sufficient. Get GNU gzip, which will handle files over 2 GB.

If you are not using largefiles on your filesystems, no worries.
Rory R Hammond
Trusted Contributor

Re: Concurrent gzips

Russell,

Setting the jobs off in the background will be helpful.

All of the solutions shown pose a potential problem: if all the backups are on the same disk, concurrency might not buy you as much as you'd like because of disk I/O contention. You didn't mention your database type and backup method.

If you are using cpio you could try something like

(cd /database/data
find . -print | cpio -o | gzip > /archive/data.out) &

for each file system. The contention rules still apply.
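For instance, one backgrounded stream per file system might look like this (a sketch only; the /database and /archive paths are made up):

#!/usr/bin/ksh
# Sketch only: one find | cpio | gzip stream per filesystem, all in parallel.
for FS in /database/data1 /database/data2 /database/data3 ; do
  ( cd $FS && find . -print | cpio -o | gzip > /archive/`basename $FS`.cpio.gz ) &
done
wait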


If you back up directly to a tape drive, compression is usually handled by the drive on most systems, so compressing a backup file and then putting it on tape would not buy you anything but overhead.

Don't forget that you can exceed bus bandwidth when backing up to multiple tape drives. Hardware layout and design are important.

If you are doing an export from Oracle, you can do the following (start gzip reading from the named pipe in the background, then run exp writing into it):

mknod /dev/oracle_pipe p
gzip < /dev/oracle_pipe > /archive/oracle_dmp.gz &
exp scott/tiger buffer=1000000 file=/dev/oracle_pipe full=y grants=y log=/tmp/oracle_dmp.log


Rory

There are a 100 ways to do things and 97 of them are right
Jack Werner
Frequent Advisor

Re: Concurrent gzips

Russell,

We use "compress" to shrink our oracle backup .dbf files. Since all instances' .dbf files are backed up to the same filesystem, we do the compress immediately after the .dbf files are created for each instance. Our goal is not to fill up the backup filesystem (or make it larger than it needs to be). I changed our backup script to also do a full export of every instance on the host and then compress the export file. Our backup strategy consists of 1) a cold backup (weekly) and 2) hot backups (daily) that copy the native .dbf files to the backup filesystem. Databases are shut down for(and restarted after) the weekly cold backup.
Jack
i'm retired