<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Concurrent gzips in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901547#M935975</link>
    <description>Hi&lt;BR /&gt;&lt;BR /&gt;I'm having difficulty developing a script which ought to be fairly simple, I'd guess (if I knew what I was doing!).&lt;BR /&gt;&lt;BR /&gt;Currently we perform a disk backup of a database which generates some huge files, and then we run gzip *, which only runs one instance of gzip; our box has 6 processors, so 5 of them sit mostly idle.&lt;BR /&gt;&lt;BR /&gt;What we want is a script that runs after the disk backup has completed and keeps 3 concurrent gzips running until all files in a particular directory have been compressed.&lt;BR /&gt;I am hitting two issues with my script, though.&lt;BR /&gt;&lt;BR /&gt;Firstly, because the files are so big, if 3 gzips are running, my script tries to pick up a file which is currently being compressed (gzip creates the new .gz file and leaves the old one in place while it's compressing) and errors with 'do you want to overwrite the .gz file?'&lt;BR /&gt;&lt;BR /&gt;Because of this, I tried checking the status of the file using fuser filename | wc -w to ascertain whether the file was in use, but this doesn't seem to work either.&lt;BR /&gt;&lt;BR /&gt;I'm a bit stuck.&lt;BR /&gt;&lt;BR /&gt;Please can someone help?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Russell&lt;BR /&gt;</description>
    <pubDate>Tue, 11 Feb 2003 11:27:08 GMT</pubDate>
    <dc:creator>Russell Gould</dc:creator>
    <dc:date>2003-02-11T11:27:08Z</dc:date>
    <item>
      <title>Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901547#M935975</link>
      <description>Hi&lt;BR /&gt;&lt;BR /&gt;I'm having difficulty developing a script which ought to be fairly simple, I'd guess (if I knew what I was doing!).&lt;BR /&gt;&lt;BR /&gt;Currently we perform a disk backup of a database which generates some huge files, and then we run gzip *, which only runs one instance of gzip; our box has 6 processors, so 5 of them sit mostly idle.&lt;BR /&gt;&lt;BR /&gt;What we want is a script that runs after the disk backup has completed and keeps 3 concurrent gzips running until all files in a particular directory have been compressed.&lt;BR /&gt;I am hitting two issues with my script, though.&lt;BR /&gt;&lt;BR /&gt;Firstly, because the files are so big, if 3 gzips are running, my script tries to pick up a file which is currently being compressed (gzip creates the new .gz file and leaves the old one in place while it's compressing) and errors with 'do you want to overwrite the .gz file?'&lt;BR /&gt;&lt;BR /&gt;Because of this, I tried checking the status of the file using fuser filename | wc -w to ascertain whether the file was in use, but this doesn't seem to work either.&lt;BR /&gt;&lt;BR /&gt;I'm a bit stuck.&lt;BR /&gt;&lt;BR /&gt;Please can someone help?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Russell&lt;BR /&gt;</description>
      <pubDate>Tue, 11 Feb 2003 11:27:08 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901547#M935975</guid>
      <dc:creator>Russell Gould</dc:creator>
      <dc:date>2003-02-11T11:27:08Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901548#M935976</link>
      <description>Hi Russ:&lt;BR /&gt;&lt;BR /&gt;What if you first determined the files that you want to zip, then do the gzip:&lt;BR /&gt;&lt;BR /&gt;cd /source_dir&lt;BR /&gt;for i in *.ext&lt;BR /&gt;do&lt;BR /&gt; /usr/contrib/bin/gzip -c ${i} &amp;gt; /dest_dir/${i}.gz&lt;BR /&gt;done&lt;BR /&gt;&lt;BR /&gt;Tom</description>
      <pubDate>Tue, 11 Feb 2003 11:54:59 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901548#M935976</guid>
      <dc:creator>Tom Jackson</dc:creator>
      <dc:date>2003-02-11T11:54:59Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901549#M935977</link>
      <description>Hi!&lt;BR /&gt;As far as I understand, you could run&lt;BR /&gt;3 concurrent gzips in the background and create different files&lt;BR /&gt;in your backup directory &lt;BACKUPDIR&gt;:&lt;BR /&gt;#gzip -c backupfile1 &amp;gt; /&lt;BACKUPDIR&gt;/bkp1.gz &amp;amp;&lt;BR /&gt;#gzip -c backupfile2 &amp;gt; /&lt;BACKUPDIR&gt;/bkp2.gz &amp;amp;&lt;BR /&gt;#gzip -c backupfile3 &amp;gt; /&lt;BACKUPDIR&gt;/bkp3.gz &amp;amp;&lt;BR /&gt;&lt;BR /&gt; -c makes gzip write to standard output,&lt;BR /&gt;    so each command can be redirected to a different file.&lt;BR /&gt;&lt;BR /&gt;If you wish, you can rename the bkp*.gz&lt;BR /&gt;files afterwards with "mv".&lt;BR /&gt;&lt;BR /&gt;You can also increase the backup speed&lt;BR /&gt;using "gzip -1 ...".&lt;BR /&gt;&lt;BR /&gt;Regards.&lt;BR /&gt;</description>
      <pubDate>Tue, 11 Feb 2003 12:03:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901549#M935977</guid>
      <dc:creator>Stanimir</dc:creator>
      <dc:date>2003-02-11T12:03:20Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901550#M935978</link>
      <description>Hi Russell,&lt;BR /&gt;&lt;BR /&gt;This will do 3 at a time, picking up the next 3 when the 1st 3 have finished, etc...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;==========================&lt;BR /&gt;#!/usr/bin/ksh&lt;BR /&gt;NUM=3&lt;BR /&gt;cd yourdirectory&lt;BR /&gt;files=`echo *`&lt;BR /&gt;echo $files&lt;BR /&gt;set -- $files&lt;BR /&gt;while test $# -gt 0 ; do&lt;BR /&gt;        for i in $1 $2 $3 ; do&lt;BR /&gt;         echo Zipping $i...&lt;BR /&gt;         gzip $i &amp;amp;&lt;BR /&gt;        done&lt;BR /&gt;        wait&lt;BR /&gt;        sleep 1&lt;BR /&gt;        if test $# -gt $NUM ; then&lt;BR /&gt;                shift $NUM&lt;BR /&gt;        else&lt;BR /&gt;                shift $#&lt;BR /&gt;        fi&lt;BR /&gt;done&lt;BR /&gt;=============================&lt;BR /&gt;&lt;BR /&gt;rgds, Robin</description>
      <pubDate>Tue, 11 Feb 2003 12:18:09 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901550#M935978</guid>
      <dc:creator>Robin Wakefield</dc:creator>
      <dc:date>2003-02-11T12:18:09Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901551#M935979</link>
      <description>The trick to your dilemma is to run gzip in the background, so you can get multiple instances.&lt;BR /&gt;&lt;BR /&gt;If you do it like this...&lt;BR /&gt;&lt;BR /&gt;for FILE in * ; do&lt;BR /&gt;gzip $FILE&lt;BR /&gt;done&lt;BR /&gt;&lt;BR /&gt;...each command in the loop must finish before the loop continues.&lt;BR /&gt;&lt;BR /&gt;In unix, there is a nice way to tell your shell to run something in the background: end the command with a "&amp;amp;" or ampersand character.&lt;BR /&gt;&lt;BR /&gt;for FILE in * ; do&lt;BR /&gt;gzip $FILE &amp;amp;&lt;BR /&gt;done&lt;BR /&gt;&lt;BR /&gt;Now each gzip goes to the background of the shell and the loop continues immediately. Note that this launches a gzip for every file in the directory at once, so use caution where/what you do this with.&lt;BR /&gt;&lt;BR /&gt;gzip is not multi-threaded, so you will not see a benefit from a multi-CPU machine on a single gzip. I thought a multi-threaded gnu-zip was in the works at some point, but I'm not sure of the status; check the GNU home page for updated information.&lt;BR /&gt;&lt;BR /&gt;Similarly, pkzip has an HP-UX version (all unices, for that matter) which may be multi-threaded. Again, check the pkware home page for better info.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Shannon</description>
      <pubDate>Tue, 11 Feb 2003 13:47:11 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901551#M935979</guid>
      <dc:creator>Shannon Petry</dc:creator>
      <dc:date>2003-02-11T13:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901552#M935980</link>
      <description>Try using fuser to make sure the file is not open before you pick it up and gzip it:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;fuser -f &lt;FILENAME&gt;&lt;BR /&gt;&lt;BR /&gt;filename: &lt;PID&gt;o&lt;BR /&gt;&lt;BR /&gt;The "o" above means the file is open.</description>
      <pubDate>Tue, 11 Feb 2003 22:14:11 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901552#M935980</guid>
      <dc:creator>Wilfred Chau_1</dc:creator>
      <dc:date>2003-02-11T22:14:11Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901553#M935981</link>
      <description>You did mention that the files were big. If they are over 2 GB in size, the gzip utility provided with HP-UX is not sufficient; get GNU gzip, which will handle files over 2 GB.&lt;BR /&gt;&lt;BR /&gt;If you're not using largefiles on your filesystems, no worries.</description>
      <pubDate>Tue, 11 Feb 2003 22:26:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901553#M935981</guid>
      <dc:creator>Rick Garland</dc:creator>
      <dc:date>2003-02-11T22:26:06Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901554#M935982</link>
      <description>Russell,&lt;BR /&gt;&lt;BR /&gt;Setting the jobs off in the background will be helpful.&lt;BR /&gt;&lt;BR /&gt;All of the solutions shown pose one problem: if all the backups are on the same disk, concurrency might not buy you as much as you'd like because of disk I/O contention. You didn't mention your database type and backup method.&lt;BR /&gt;&lt;BR /&gt;If you are using cpio, you could try something like&lt;BR /&gt;(cd /database/data&lt;BR /&gt;find . -print | cpio -o | gzip &amp;gt; /archive/data.out.gz) &amp;amp;&lt;BR /&gt;for each file system. Contention rules still apply.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;If you back up directly to a tape drive, on most systems compression is implied; compressing a backup file and then putting it to tape would buy you nothing but overhead.&lt;BR /&gt;&lt;BR /&gt;Don't forget that you can exceed bus bandwidth when backing up to multiple tape drives. Hardware layout and design are important.&lt;BR /&gt;&lt;BR /&gt;If you are doing an export from Oracle, you can do the following (start gzip reading the named pipe in the background, then run exp writing to it):&lt;BR /&gt;mknod /dev/oracle_pipe p&lt;BR /&gt;gzip &amp;lt; /dev/oracle_pipe &amp;gt; /archive/oracle_dmp.gz &amp;amp;&lt;BR /&gt;exp scott/tiger buffer=1000000 file=/dev/oracle_pipe full=y grants=y log=/tmp/oracle_dmp.log&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Rory&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 12 Feb 2003 18:47:30 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901554#M935982</guid>
      <dc:creator>Rory R Hammond</dc:creator>
      <dc:date>2003-02-12T18:47:30Z</dc:date>
    </item>
    <item>
      <title>Re: Concurrent gzips</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901555#M935983</link>
      <description>Russell,&lt;BR /&gt;&lt;BR /&gt;We use "compress" to shrink our Oracle backup .dbf files. Since all instances' .dbf files are backed up to the same filesystem, we do the compress immediately after the .dbf files are created for each instance. Our goal is not to fill up the backup filesystem (or make it larger than it needs to be). I changed our backup script to also do a full export of every instance on the host and then compress the export file. Our backup strategy consists of 1) a cold backup (weekly) and 2) hot backups (daily) that copy the native .dbf files to the backup filesystem. Databases are shut down for (and restarted after) the weekly cold backup.&lt;BR /&gt;  Jack</description>
      <pubDate>Wed, 12 Feb 2003 21:37:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/concurrent-gzips/m-p/2901555#M935983</guid>
      <dc:creator>Jack Werner</dc:creator>
      <dc:date>2003-02-12T21:37:20Z</dc:date>
    </item>
  </channel>
</rss>

