Operating System - OpenVMS

Re: Compression of very big files

Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

I can confirm that the compression result is best with gzip (almost 10% better than zip /lev=5; maybe /level=9 could beat gzip).

But CPU is the problem. So it's a no-go.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

BTW: the CPU-friendly /level=1 results in a 20-25% bigger file than gzip. But since the result is still about 1/5th of the original file, that doesn't matter much.
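For the record, the level selection behind those numbers looks roughly like this (file names invented; the /LEVEL qualifier assumes the VMS-style CLI interface is installed):

$ zip /level=1 sample.zip bigfile.dmp  ! fastest deflate, biggest output
$ zip /level=9 sample.zip bigfile.dmp  ! slowest deflate, smallest output
$ gzip "-9" bigfile.dmp                ! gzip port, maximum compression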

Wim
Vladimir Fabecic
Honored Contributor

Re: Compression of very big files

Stupid, but useful idea:
Change the files to fixed,512; export the directory via NFS, mount it on a Linux machine, and run BZIP2 there. BZIP2 does not use much I/O, and you spend the Linux machine's CPU time, not the VMS machine's.
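A sketch of how that could look (device, directory, and host names invented; SET FILE/ATTRIBUTE needs a reasonably recent VMS, and the NFS export must already be set up):

$ ! VMS side: relabel the dump as fixed-512.
$ SET FILE /ATTRIBUTE=(RFM:FIX, LRL:512) DKA100:[DUMPS]BIGFILE.DMP
$ ! Linux side (shell commands, shown here as comments):
$ !   mount -t nfs vmsbox:/dumps /mnt/dumps
$ !   bzip2 -9 /mnt/dumps/bigfile.dmp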
In vino veritas, in VMS cluster
Steven Schweda
Honored Contributor

Re: Compression of very big files

What's "very big"? 250MB? (Be careful not
to confuse this with "large", which tends to
imply "bigger than 2GB".) (("That's not a
knife [...]"))

Zip 2.31 has some VMS-specific I/O
improvements, but I would not expect it to
differ much in CPU time from Zip 2.3. It
might save some _real_ time, however. For a
variety of reasons, I would not use anything
older than Zip 2.31 (or UnZip 5.52).

I didn't do anything to the bzip2 code to
accommodate any non-UNIX-like file/record
formats, so, as it says on the Web page:
--------
BZIP2 is a UNIX-oriented utility, and as
such, it has little hope of dealing well
with RMS files whose record format is
anything other than Stream_LF.

For a more versatile compressor-archiver
with greater RMS capability, see the
Info-ZIP Home Page.
--------

I suppose that it should say "or fixed-512",
too, but the program is not expecting to
deal with RMS records of any type, and I have
no plans to change this. (Jump right in, if
you wish.) The release notes describe the
difference between my bzip2 1.0.3a and
1.0.3b, and it's I/O-related, not
CPU-related.

When Zip 3.0 arrives (hold your breath), it
is expected to offer bzip2 compression
(optional, instead of the default "deflate"
method) in a Zip archive, but this is not in
the latest beta kit (3.0e). (It'll be using
an external bzip2 object library, so you'll
need something like my bzip2 kit to enable
the feature. Similar for UnZip 6.0, of
course.)

I haven't ever tried it, but it should be
possible to build [Un]Zip with some fancy C
compiler options, like /ARCHITECTURE and
/OPTIMIZE=TUNE, which might help the
CPU-bound parts. In Zip 2.x, you'd probably
need to edit the builder to do this. (In Zip
3.x, it can be done from the command line.)
Test results from some adventurous user would
be received with interest.
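To make that concrete, a guess at the sort of
thing involved (untested; DEC C on Alpha
assumed, and the "CCOPTS" parameter is how I'd
expect the 3.x builder to take it):

$ ! Zip 2.x: edit the CC command in [.VMS]BUILD_ZIP.COM to add:
$ !   /ARCHITECTURE = HOST /OPTIMIZE = (LEVEL = 4, TUNE = HOST)
$ ! Zip 3.x: pass it on the command line instead, e.g.:
$ @ [.VMS]BUILD_ZIP.COM "CCOPTS=/ARCHITECTURE=HOST/OPTIMIZE=(LEVEL=4,TUNE=HOST)"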
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

It would be nice if zip had a "very light" mode.

The file I mentioned is just a sample. But we don't go over 2 GB (we stay just under it).

We're not going to upgrade to win a few %. And we need to stay VMS 6.2 compatible.

Wim
Steven Schweda
Honored Contributor

Re: Compression of very big files

> It would be nice if zip had a "very light"
> mode.

You mean less compression than "-1"? There
_is_ "-0", but that (no compression) may be
less than you'd like. I don't recall a lot
of demand for this, but I could ask around.
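For reference, level selection with the
UNIX-style command line looks like this
(file names invented; quoting the options
is the usual habit on VMS):

$ zip "-0" sample.zip bigfile.dmp  ! store only, no compression
$ zip "-1" sample.zip bigfile.dmp  ! lightest real compression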

> Not going to upgrade to win a few %.

The I/O improvements since 2.3/5.51 are
pretty big, in my opinion. I can't say how
much they would help you. (Personally, I
also like the "-V" archives being friendlier
to non-VMS unzips, and the command-line case
preservation (non-VAX), too.)

> And need to be 6.2 compatible.

Was that VAX or Alpha?

We're (I'm) still testing as far back as VMS
V5.4 (VAX). I have V6.2 (VAX) on a system
disk here, but I can't remember if I've tried
it lately. (Someone else may test it on
something even older.) Be sure to complain
if you have any problems.
Thomas Ritter
Respected Contributor

Re: Compression of very big files

Wim, dare I ask why CPU consumption is a problem? We perform lots of zipping using gzip, but those activities are scheduled outside of business hours. Sure, the CPU load is high, but so what? We also use RAM-disk techniques which see the CPUs running at a sustained 100% for hours.

Years ago I worked at a site which had a fully integrated chargeback system tied into all processing. Users were billed based on CPU, I/O, and other items. System utilization cost users real money and really encouraged good IT.



Hein van den Heuvel
Honored Contributor

Re: Compression of very big files

>> If I change the file to fix,512 it works (bug or feature?). rfm=stm caused "record too large for user's buffer".

Hi Wim,

I think we have been here before, but I'll repeat it nonetheless in defense of the xyz-ZIPs and/or any other non-Sybase tool which may croak on those files.

A file with attributes rfm=stm is expected to have CR/LF as record terminators and can silently ignore leading binary zeroes in records. Hardly a 'flexible' format for supposedly binary files.

If a file's contents do not match those attributes, yet it is labelled as such, then applications can and will fall over. Rightly so!

Labelling the file RFM=FIX, MRS=512 is likely to be much more appropriate and 'benign' for most applications.

IMHO those binary files should really be labelled RFM=UDF, but unfortunately that upsets some standard tools.
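For what it's worth, the relabelling can be done in place, something like this (file name invented; check the result with DIRECTORY/FULL):

$ SET FILE /ATTRIBUTE=(RFM:FIX, LRL:512, MRS:512) DKA0:[DATA]DUMP.DAT
$ DIRECTORY /FULL DKA0:[DATA]DUMP.DAT  ! should show "Fixed length 512 byte records"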

With kind regards,

fwiw,
Hein.


http://h71000.www7.hp.com/doc/731FINAL/4523/4523pro_007.html#rms_record_format_field
"FAB$C_STM
Indicates stream record format. Records are delimited by FF, VT, LF, or CR LF, and all leading zeros are ignored. This format applies to sequential files only and cannot be used with the block spanning option. "
Wim Van den Wyngaert
Honored Contributor

Re: Compression of very big files

Thomas,

CPU is a problem because we have a lot to compress and no real non-business hours. We have one continent live at all times while the other continents are doing dumps, compresses, etc.
To be more correct: CPU is not the problem, wall time is.

Wim
Steven Schweda
Honored Contributor

Re: Compression of very big files

> [...] cpu is not the problem, wall time is.

If so, I'd definitely look at Zip 2.31, as
its I/O improvements may actually help.

With some SET RMS_DEFAULT action, you can
help Zip 2.3, but you need 2.31 to get the
SQO bit set to make highwater marking less
painful, and to avoid _copying_ the temporary
output file, if your (output) archive is on
a different disk from your current default
device+directory. (An explicit "-b" option
can work around that one in 2.3.)
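
Something along these lines, that is (the
values are starting points to tune, not
gospel; directory names invented, and "-b"
points Zip's temporary file at the archive's
own disk):

$ SET RMS_DEFAULT /SEQUENTIAL /BLOCK_COUNT = 127 -
        /BUFFER_COUNT = 4 /EXTEND_QUANTITY = 65535
$ zip "-b" DISK2:[TMP] DISK2:[ARCH]BIG.ZIP BIGFILE.DMP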

Have I mentioned that I think that Zip 2.31
is generally better than 2.3?