
ZIP performance

 
Art Wiens
Respected Contributor

ZIP performance

We have an "out of control"/badly thought out process which has been producing thousands (many tens of thousands!) of invoice files (relatively small files, 10 - 20 blocks each, on a disk with a large cluster size) for quite some time. I have been using ZIP_CLI (Zip 2.32) with a /BATCH=listoffiles.txt method to clean them up. It takes a phenomenal amount of time and resources (Alpha 800 5/500 512MB VMS v7.2-2). For example this was for ~80,000 files:

Accounting information:
Buffered I/O count:        1315155    Peak working set size:         82640
Direct I/O count:         49565437    Peak virtual size:            247648
Page faults:                  6110    Mounted volumes:                   0
Charged CPU time:    0 11:34:52.42    Elapsed time:          1 05:31:52.67
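
To put those totals in per-file terms, here is a back-of-the-envelope calculation. It is written in Python purely as a calculator (nothing here was run on the Alpha), and it assumes exactly 80,000 files, whereas the count above is only approximate, so treat the results as rough.

```python
# Rough per-file cost derived from the accounting totals above.
# ASSUMPTION: exactly 80,000 files; the post only says "~80,000".
FILES = 80_000


def to_seconds(days, hours, minutes, seconds):
    """Convert a VMS-style 'D HH:MM:SS.cc' delta time to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


elapsed_s = to_seconds(1, 5, 31, 52.67)   # Elapsed time:     1 05:31:52.67
cpu_s = to_seconds(0, 11, 34, 52.42)      # Charged CPU time: 0 11:34:52.42
direct_io = 49_565_437                    # Direct I/O count
buffered_io = 1_315_155                   # Buffered I/O count

per_file = {
    "elapsed seconds": elapsed_s / FILES,
    "CPU seconds": cpu_s / FILES,
    "direct I/Os": direct_io / FILES,
    "buffered I/Os": buffered_io / FILES,
}

for label, value in per_file.items():
    print(f"{label:>16} per file: {value:8.2f}")
```

On these assumptions that works out to somewhat over a second of elapsed time and several hundred direct I/Os per file, which is a lot for files of 10 - 20 blocks.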

The actual command used was:

$ zip_cli 2007_1.zip /batch=2007_TOZIP.LIS /move/keep/vms/nofull_path

Is there any performance advantage instead of using a /BATCH list file to using wildcards for the input and using /SINCE and /BEFORE to select the input? Or is there any way in general to do this more efficiently / expeditiously?

Steven, I hope I have provided enough details to help your psychic powers along. ;-)

Cheers,
Art
Robert Gezelter
Honored Contributor

Re: ZIP performance

Art,

I presume that the intent is to ZIP and remove the files in one operation.

Are you sure that the problem is ZIP and not the reorganization of the directories as the files are removed? In a similar vein, have you tried different orderings of the files in the "listoffiles"? (Hint: inverse ordering, per the discussions about delete optimization.)

What is the speed of the ZIP without the /MOVE?

- Bob Gezelter, http://www.rlgsc.com

Art Wiens
Respected Contributor

Re: ZIP performance

Of the 29 hours involved in this example, I'd say ~26 hours were related to ZIP and about 3 hours to removing them. No, I haven't tried anything other than a natural list provided by DIR/COL=1 and removing the HEADER and TRAILER info. In the past I have used a list with DIR/NOHEAD/NOTRAIL ... doesn't seem to make any difference, i.e., whether I'm in that directory or not.

The ZIP process seems to take a lot of time up front, about midway through the ordeal it actually creates a temporary ZIP file Zxxxxxxx and starts adding files to it.

Art
Art Wiens
Respected Contributor

Re: ZIP performance

How would I produce an inverse order list of files?
Robert Gezelter
Honored Contributor

Re: ZIP performance

Art,

I was referring to the order of processing files, not which directory you (or the ZIP file) are in.

My thinking was that the problem may not be ZIP, and may be related to the traversals and re-structuring of the directory during the deletes.

For control purposes (although I admittedly am reluctant to suggest it), it would be useful to know the performance of a simple wildcard COPY *.* NL: and a DELETE *.*. If these take similar times, the problem is in the size of the directory, not ZIP.

- Bob Gezelter, http://www.rlgsc.com
Robert Gezelter
Honored Contributor

Re: ZIP performance

Art,

To produce an inverse-ordered list of files (within a directory), produce the list using DIRECTORY/BRIEF/NOHEAD/NOTRAIL and use SORT to sort in DESCENDING order (HELP SORT).
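
The same reversal can be sketched outside DCL. The following Python fragment is only an illustration of the idea; Python itself, the function name `reverse_listing`, and the sample file names are all assumptions and not part of this thread. It takes a one-name-per-line listing and returns it in descending order, using a plain string sort that only approximates the on-disk directory order.

```python
def reverse_listing(lines):
    """Return the non-blank file names from `lines`, sorted descending.

    ASSUMPTION: a plain string sort is close enough to the directory's
    own ordering for the purpose of processing files from the end first.
    """
    names = [line.strip() for line in lines if line.strip()]
    return sorted(names, reverse=True)


# Hypothetical sample listing, one file specification per line.
sample = ["INV_0001.TXT;1", "INV_0003.TXT;1", "", "INV_0002.TXT;1"]

for name in reverse_listing(sample):
    print(name)
```

Written to a file, one name per line, the result would be a candidate list file for the /BATCH method.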

- Bob Gezelter, http://www.rlgsc.com
Art Wiens
Respected Contributor

Re: ZIP performance

"use SORT to sort in DESCENDING order "

Duh! Sorry, I'm still worn out after 29 hours of ZIP'ing ;-)

I have more to do, I'll give that a whirl.

Cheers,
Art
Steven Schweda
Honored Contributor

Re: ZIP performance

> [...] ZIP_CLI (Zip 2.32) [...]

So someone really does use that CLI stuff.
Normally, I'd suggest trying the current
released version, 3.0, but I doubt that it
would help here. (Might be fun, though.)

> [...] /nofull_path

Are all the files in one directory? With
/move telling it to delete the files, if
they're all in one place, then the delete
operation itself might be very slow (and out
of my hands). A test without /move might be
informative. I'd need to think/look, but if
it deletes the files in the same order as it
adds them to the archive, then a
reverse-sorted list might help (less
reshuffling).

> Is there any performance advantage instead
> of using a /BATCH list file to using
> wildcards for the input and using /SINCE
> and /BEFORE to select the input?

Knowing nothing, I wouldn't expect it to
matter. (Only one way to find out...)
Using the list does give you control over
the order, so if that matters, then the list
might be better.

> [...] enough details [...]

It's a good start, but without some
profiling, it's hard to guess where it's
spending its time.

Adding /COMPRESSION = STORE ("-0") would
eliminate any CPU time spent doing
compression, but I doubt that that's the big
problem.

Ah. I'm submitting too slowly.

> The ZIP process seems to take a lot of
> time up front, about midway through the
> ordeal it actually creates a temporary ZIP
> file Zxxxxxxx and starts adding files to
> it.

It does do some research on the files to be
archived before it starts to work. I thought
that it was mostly checking for existence,
but there might be more to it.
Robert Gezelter
Honored Contributor

Re: ZIP performance

Steven,

The processing of a directory holding 80,000 files implies a directory file much larger than the XQP caches.

I did not ask Art, but another thing that comes to mind is ensuring that the XFC is enabled and sized appropriately to the task at hand.

Just the directory searches on a directory of that size would likely be expensive.

- Bob Gezelter, http://www.rlgsc.com
Art Wiens
Respected Contributor

Re: ZIP performance

$ show mem/cache/full
System Memory Resources on 3-SEP-2009 10:57:55.77

Virtual I/O Cache
Total Size (Kbytes)             3200    Read IO Count              1003983095
Free Kbytes                        0    Read Hit Count              176445148
Kbytes in Use                   3200    Read Hit Rate                     17%
Write IO Bypassing Cache    23261386    Write IO Count               38318456
Files Retained                    99    Read IO Bypassing Cache     275267547

$ show sys/noproc
OpenVMS V7.2-2 on node xxxxxx 3-SEP-2009 10:58:17.64 Uptime 85 21:26:23

As I said, there's only 512MB memory in this box.
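
For reference, the hit rate can be checked against the raw counters in that display. This is a small Python calculation, used here only as a calculator; it assumes the displayed rate is simply read hits divided by read I/Os, which is not confirmed by anything in this thread.

```python
# Counters copied from the SHOW MEMORY/CACHE/FULL output above.
read_io_count = 1_003_983_095
read_hit_count = 176_445_148
total_size_kbytes = 3200

# ASSUMPTION: hit rate = read hits / read I/Os.
hit_rate = read_hit_count / read_io_count

# Cache size in megabytes, for comparison with the 512MB of memory.
cache_mb = total_size_kbytes / 1024

print(f"Read hit rate: {hit_rate:.1%}")
print(f"Cache size:    {cache_mb:.3f} MB")
```

The computed rate lands between 17 and 18 percent, in line with the 17% shown, and the cache itself is only a little over 3 MB.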

Art