Operating System - OpenVMS

Compression of very big files

Wim Van den Wyngaert
Honored Contributor

Compression of very big files

We currently keep Sybase database dumps on disk in zip archives. The zip archive is about 20% of the size of the dumps.

Zip, however, consumes lots of CPU (even with /level=1, about 60 seconds per 250 MB).

Does anyone have a solution to compress with (a lot) less CPU consumption?
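
(For reference, the command in use is roughly of this shape; the archive and dump file names below are just placeholders, and the /LEVEL qualifier assumes the VMS-style command interface of the Info-ZIP port:)

$ ! /LEVEL=1 trades compression ratio for speed; /LEVEL=9 is the other extreme
$ ZIP /LEVEL=1 DISK$ARCH:[DUMPS]SYBASE_DUMP.ZIP DISK$DATA:[DUMPS]SYBASE_DUMP.DMP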

Wim
Wim
25 REPLIES
Ian Miller.
Honored Contributor

Re: Compression of very big files

Which version of zip?
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

2.1
Wim
Karl Rohwedder
Honored Contributor

Re: Compression of very big files

ZIP V2.31 is current.

We ZIP RDB backup files, and I found that BZIP2 compresses better and uses fewer resources (no hard data available at the moment).

regards Kalle
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

Correction: 2.3.
Wim
Ian Miller.
Honored Contributor

Re: Compression of very big files

There is a beta version of a later Zip out there somewhere, or bzip2 is on the Freeware:
http://h71000.www7.hp.com/freeware/freeware70/000tools/alpha_images/bzip2.exe
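
(On VMS, bzip2 is run as a foreign command pointing at the downloaded image; a minimal sketch, with the device and directory invented, and "-k" used so the original dump is kept:)

$ bzip2 :== $DISK$TOOLS:[UTILS]BZIP2.EXE        ! foreign command; path is an example
$ bzip2 "-k" "-1" DISK$DATA:[DUMPS]SYBASE_DUMP.DMP   ! quotes keep DCL from upcasing the options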
____________________
Purely Personal Opinion
Karl Rohwedder
Honored Contributor

Re: Compression of very big files

Note that BZIP2 on the Freeware is V1.0.1, whereas the version I use is 1.0.3a.
The site http://antinode.org/dec/sw/bzip2.html
already has a version 1.0.3b, but I haven't used that one.

regards Kalle
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

Just tried 1.0.2 on a variable-record dump of Sybase.

%RMS-F-IRC, illegal record encountered; VBN or record number = !UL
Wim
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

If I change the file to fix,512 it works (bug or feature?). rfm=stm caused "record too large for user's buffer".
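
(For the record, the relabel to fixed-512 can be done like this on VMS versions that have SET FILE /ATTRIBUTES; the file name is a placeholder, and on older systems a third-party tool such as the FILE utility would be needed instead:)

$ SET FILE /ATTRIBUTES=(RFM:FIX, LRL:512, RAT:NONE) DISK$DATA:[DUMPS]SYBASE_DUMP.DMP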

My reference file of 320 MB is compressed in 518 CPU seconds, while it takes 81 seconds with zip /level=1 (178 without /level, which equals /level=5).

So, not what I was hoping for. Or is something wrong?

Wim
Wim
Karl Rohwedder
Honored Contributor

Re: Compression of very big files

Just gave it a try (using a small RDB backup file). BZIP2 uses considerably more CPU but less I/O, and produces a far better result in terms of file size (see attached text file).
I used the 1.0.3b version.

regards Kalle
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

I can confirm that the compression result is best with gzip (almost 10% better than zip/lev=5; maybe level 9 could beat gzip).

But CPU is the problem, so it's a no-go.

Wim
Wim
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

BTW: the CPU-friendly /level=1 results in a 20-25% bigger file than gzip. But since the result is still about 1/5th of the original file, that doesn't matter much.

Wim
Wim
Vladimir Fabecic
Honored Contributor

Re: Compression of very big files

Stupid, but useful idea:
Change the files to fix,512; export the directory via NFS, mount it on a Linux machine and run BZIP2 there. BZIP2 does not use much I/O, and you spend the extra CPU time on the Linux machine, not the CPU of the VMS machine.
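
(A rough sketch of that idea, assuming TCP/IP Services on the VMS side; host, device and path names here are invented:)

$ ! VMS side: map and export the dump directory
$ TCPIP MAP "/dumps" DKA100:
$ TCPIP ADD EXPORT "/dumps" /HOST=("lxhost")

# Linux side: mount the export and burn the Linux CPU instead
mount -t nfs vmshost:/dumps /mnt/dumps
bzip2 -k /mnt/dumps/sybase_dump.dmp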
In vino veritas, in VMS cluster
Steven Schweda
Honored Contributor

Re: Compression of very big files

What's "very big"? 250MB? (Be careful not
to confuse this with "large", which tends to
imply "bigger than 2GB".) (("That's not a
knife [...]"))

Zip 2.31 has some VMS-specific I/O
improvements, but I would not expect it to
differ much in CPU time from Zip 2.3. It
might save some _real_ time, however. For a
variety of reasons, I would not use anything
older than Zip 2.31 (or UnZip 5.52).

I didn't do anything to the bzip2 code to
accommodate any non-UNIX-like file/record
formats, so, as it says on the Web page:
--------
BZIP2 is a UNIX-oriented utility, and as
such, it has little hope of dealing well
with RMS files whose record format is
anything other than Stream_LF.

For a more versatile compressor-archiver
with greater RMS capability, see the
Info-ZIP Home Page.
--------

I suppose that it should say "or fixed-512",
too, but the program is not expecting to
deal with RMS records of any type, and I have
no plans to change this. (Jump right in, if
you wish.) The release notes describe the
difference between my bzip2 1.0.3a and
1.0.3b, and it's I/O-related, not
CPU-related.

When Zip 3.0 arrives (hold your breath), it
is expected to offer bzip2 compression
(optional, instead of the default "deflate"
method) in a Zip archive, but this is not in
the latest beta kit (3.0e). (It'll be using
an external bzip2 object library, so you'll
need something like my bzip2 kit to enable
the feature. Similar for UnZip 6.0, of
course.)

I haven't ever tried it, but it should be
possible to build [Un]Zip with some fancy C
compiler options, like /ARCHITECTURE and
/OPTIMIZE=TUNE, which might help the
CPU-bound parts. In Zip 2.x, you'd probably
need to edit the builder to do this. (In Zip
3.x, it can be done from the command line.)
Test results from some adventurous user would
be received with interest.
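
(Untried here as well, but the sort of change meant would look roughly like this in the VMS build procedure, with the architecture keyword chosen to match the target CPU; the source file named is only an example of the CPU-bound code:)

$ CC /ARCHITECTURE=EV56 /OPTIMIZE=(LEVEL=4, TUNE=EV56) DEFLATE.C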
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

It would be nice if zip had a "very light" mode.

The file I mentioned is just a sample. But we don't go over 2 GB (we stay just under it).

I'm not going to upgrade to win a few %. And it needs to be 6.2 compatible.

Wim
Wim
Steven Schweda
Honored Contributor

Re: Compression of very big files

> It would be nice if zip had a "very light"
> mode.

You mean less compression than "-1"? There
_is_ "-0", but that (no compression) may be
less than you'd like. I don't recall a lot
of demand for this, but I could ask around.

> Not going to upgrade to win a few %.

The I/O improvements since 2.3/5.51 are
pretty big, in my opinion. I can't say how
much they would help you. (Personally, I
like the more non-VMS-compatible "-V"
archives, and command-line case preservation
(non-VAX), too.)

> And need to be 6.2 compatible.

Was that VAX or Alpha?

We're (I'm) still testing as far back as VMS
V5.4 (VAX). I have V6.2 (VAX) on a system
disk here, but I can't remember if I've tried
it lately. (Someone else may test it on
something even older.) Be sure to complain
if you have any problems.
Thomas Ritter
Respected Contributor

Re: Compression of very big files

Wim, dare I ask why CPU consumption is a problem? We perform lots of zipping using gzip, but those activities are scheduled outside of business hours. Sure, CPU is high, but so what? We also use RAM disk techniques which see the CPUs running at a sustained 100% for hours.

Years ago I worked at a site which had a fully integrated chargeback system tied into all processing. Users were billed based on CPU, I/O and other items. System utilization cost users real money and really encouraged good IT.



Hein van den Heuvel
Honored Contributor

Re: Compression of very big files

>> If I change the file to fix,512 it works (bug or feature ?). rfm=stm caused record too large for users' buffer.

Hi Wim,

I think we have been here before, but I'll repeat it nonetheless in defense of the xyz-ZIPs and/or any other non-Sybase tool which may croak on those files.

A file with the attribute rfm=stm is expected to have CR/LF as record terminators, and leading binary zeros in its records may be silently ignored. Hardly a 'flexible' format for supposedly binary files.

If a file's contents do not match that format, yet the file is labelled as such, then applications can and will fall over. Rightly so!

Labelling the file RFM=FIX, MRS=512 is likely to be much more appropriate and 'benign' for most applications.

IMHO those binary files should really be labelled RFM=UDF, but unfortunately that upsets some standard tools.

Met vriendelijke groetjes (with kind regards),

fwiw,
Hein.


http://h71000.www7.hp.com/doc/731FINAL/4523/4523pro_007.html#rms_record_format_field
"FAB$C_STM
Indicates stream record format. Records are delimited by FF, VT, LF, or CR LF, and all leading zeros are ignored. This format applies to sequential files only and cannot be used with the block spanning option. "
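
(For anyone following along, the RMS attributes of a suspect file can be checked before and after relabelling with the standard tools; the file name is a placeholder:)

$ DIRECTORY /FULL DISK$DATA:[DUMPS]SYBASE_DUMP.DMP   ! shows "Record format:"
$ ANALYZE /RMS_FILE DISK$DATA:[DUMPS]SYBASE_DUMP.DMP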
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

Thomas,

CPU is a problem because we have a lot to compress and have no real non-business hours. We have one continent live all the time while the other continents are doing dumps, compression, etc.
To be more correct: CPU is not the problem, wall time is.

Wim
Wim
Steven Schweda
Honored Contributor

Re: Compression of very big files

> [...] cpu is not the problem, wall time is.

If so, I'd definitely look at Zip 2.31, as
its I/O improvements may actually help.

With some SET RMS_DEFAULT action, you can
help Zip 2.3, but you need 2.31 to get the
SQO bit set to make highwater marking less
painful, and to avoid _copying_ the temporary
output file, if your (output) archive is on
a different disk from your current default
device+directory. (An explicit "-b" option
can work around that one in 2.3.)
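
(As a sketch of that tuning, with made-up names and only example buffer values; the "-b" option assumes the UNIX-style command interface:)

$ SET RMS_DEFAULT /SEQUENTIAL /BLOCK_COUNT=127 /BUFFER_COUNT=4
$ ! temporary archive on the same disk as the output archive
$ ZIP "-1" "-b" DISK$ARCH:[ZIPTMP] DISK$ARCH:[DUMPS]SYBASE_DUMP.ZIP DISK$DATA:[DUMPS]SYBASE_DUMP.DMP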

Have I mentioned that I think that Zip 2.31
is generally better than 2.3?
Karl Rohwedder
Honored Contributor

Re: Compression of very big files

I tried ZIP V3.0E with different compiler options on a [small] RDB backup file (25MB) on a DS10/600MHz:

- standard compilation
- opt. for EV56
- opt. for EV67

See attached textfile for results.

regards Kalle
John Abbott_2
Esteemed Contributor

Re: Compression of very big files

Just some background info.
We spent some time about 3 years ago and found that bzip2 used slightly more CPU than zip (2.1), but bzip2 was better at compressing files. All the files selected were of fixed length. For our large compressions we ended up running them with a $ set proc/prio=1 in an attempt not to upset the active system too much. Sorry, I don't have the test results anymore; the jobs ran on an EV68 system.

John.
Don't do what Donny Dont does
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

Test results (on a 4100).

Input: 447 Kblocks

Old Zip 2.3 (level / CPU sec / Kblocks):
1    62   120
2    66   113
3    74   110
4    91   107
5   105   100
6   170    97

New Zip 2.31 (level / CPU sec / Kblocks):
1    62   119
2    64   114
3    74   110
4    89   107
5   110   100
6   165    98

Comparing new with old over the 6 zips:
The wall time for 6 zips was 40 seconds lower
Total CPU time was 6 seconds lower.
*** Direct I/O was almost cut in half ***

BTW: we found level 2 to be the best buy.


Wim
Wim
Steven Schweda
Honored Contributor

Re: Compression of very big files

> Test results (on a 4100).

With the programs built how? (Any fancy
compiler/linker options?)

> *** Direct IO was almost cut in half ***

As I said, it has some I/O improvements.
(But SET RMS_DEFAULT can help the older
version considerably.)

> The wall time for 6 zips was 40 seconds lower

Out of what? 50s v. 40s, or 10240s v. 10200s?

It's good to be careful about file caching
when running this kind of test. Otherwise,
the second program to run has an advantage.

Everything's complicated.
Steven Schweda
Honored Contributor

Re: Compression of very big files

Oops. That should have been "50s v. 10s, or
10240s v. 10200s?".

Sigh. ("Preview >>" doesn't help much
unless one's brain is actually functional.)