ZIP limitation?

 
SOLVED
Art Wiens
Respected Contributor

ZIP limitation?

I am using Zip 2.3 (November 29th 1999) on Alpha v7.2-2 . I have been busy zipping unmaintained directories chock full of 3 - 10 block text files. Several directories have had 10,000+ files and went without issue. Next in line is a particular directory with over 45,000 files in it!!

I start ZIP and get no error message or any indication that anything bad is happening ... the process is doing "lots" of I/O, consuming CPU, etc., but is now taking over 3 hours to even start adding files to a new zip file. It has no files open, and no ZIP file has been created yet.

Is 45,000 too many files for ZIP?

Art
17 REPLIES
Arch_Muthiah
Honored Contributor

Re: ZIP limitation?

Please make sure you have the latest version of zip from the following HP link, where you can find all the latest ZIP, UNZIP packages.


http://h71000.www7.hp.com/openvms/freeware/freeware.html
Regards
Archie
Ian Miller.
Honored Contributor

Re: ZIP limitation?

Zip has a 2 GB file size limit - perhaps you are running into that?

Accessing a directory of that size is going to be slow whatever you are doing.
____________________
Purely Personal Opinion
Art Wiens
Respected Contributor

Re: ZIP limitation?

Archunan: The version I'm using is the same as what's on the Freeware v2.3.

Ian: No, it shouldn't be close to 2 GB; there are tons of files, but they're all small.

It finally did "start" after about 4 hours. What's it doing during this initial period? Obviously it must be "reading" the source files, but I don't see any open files for this process other than the executable.

I guess if I had to deal with 45,000 things, I might take a bit of time upfront to plan what I was going to do first ;-)

Cheers,
Art
Ian McKerracher_1
Trusted Contributor

Re: ZIP limitation?

Hello Art,

Have a look at this link. It probably isn't the cause of your problem but it may be of interest.

http://www.techiegroups.com/archive/index.php/t-54389.html


Regards,

Ian

Art Wiens
Respected Contributor

Re: ZIP limitation?

Thanks Ian. I did notice that it's creating the temporary ZI*.* file in my local directory - once it gets going. In my case though, several hours went by before it even opened the ZI file. Another 1.5 hours to add all 45,763 files into it.
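(While Zip is running, the temporary file shows up with a plain $ directory ZI*.* in that directory.)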

Art
Steven Schweda
Honored Contributor
Solution

Re: ZIP limitation?

> Please make sure you have the latest
> version of zip

This is good advice.

> from the following HP link, where you
> can find all the latest ZIP, UNZIP
> packages.

This is no better advice today than it was a
while ago. As I enjoy reminding people,
repeatedly in some cases, the latest
released versions are Zip 2.31 and UnZip
5.52, and the source kits are normally
available at or near:

http://www.info-zip.org/

Zip 2.31, for example, puts the "ZI*."
temporary file in the right place. If your
actual archive will be on a different device,
Zip 2.3 will need to copy the whole thing
after it's done, while Zip 2.31 will simply
(and quickly) rename it.

I wouldn't bet that a newer Zip would work
any faster on a large number of files, but
I'd be interested to learn whether it does.
I doubt that anyone has tried a test like
this, so there could easily be some
previously unnoticed slow code in there.

What's the Zip command used? There could,
for example, be a problem in VMS wildcard
processing. I assume that you're not
explicitly specifying all 45000 file names.

I've forgotten. Is VMS V7.2-2 too old for
the latest directory caching improvement?

It shouldn't be '"reading" the source files',
but it should be looking for them, as it
does need a list. If the directory look-ups
are slow, then Zip may be doomed to a
certain amount of undesirable sloth.

P.S. If you hit the 2GB limit, let me know.
Art Wiens
Respected Contributor

Re: ZIP limitation?

The command was:

$ zip "-Vwj" zipfile.zip $1$dga77:[dir.subdir]*.*;*

I was not in the source directory when I issued the command; the source directory contained 45,763 files. The resulting zip file ended up being 458,398 blocks.
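(458,398 blocks at 512 bytes per block works out to roughly 235 MB.)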

Art
Steven Schweda
Honored Contributor

Re: ZIP limitation?

Ok. It seems that VMS V7.2 was where the
directory cache improvement was made, so I
probably can't blame that. Of course, and I
quote:

[T]he OpenVMS Wizard cannot and does not
recommend storing large numbers of files
in a single (very large) directory.

http://h71000.www7.hp.com/wizard/wiz_5241.html

If you don't specify a non-default device
for the archive file itself
("dev:[dir]zipfile.zip"), then that
particular fix in Zip 2.31 won't affect you.
Other I/O speed improvements might, however,
so I'd still suggest using the newer version.
200 MB is not 2 GB, but it does take some
little while to write it. But in your case,
the prep time seems to swamp the I/O time.
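A non-default device specification would
look something like this (the archive-side
device and directory here are placeholders):

$ zip "-Vwj" $1$dga99:[archives]zipfile.zip $1$dga77:[dir.subdir]*.*;*   ! placeholder device/dir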

When I get really bored, I may have a look at
the wildcard code to see if it does something
especially lame.

Wake me if things get any worse.
Steven Schweda
Honored Contributor

Re: ZIP limitation?

For a good time, I made a short command
procedure to create N similarly named files
("A_nnnnnn.dat", nnnnnn = "000001", ...) in
a directory. On an otherwise idle XP1000
(500MHz):
N = 10000, t = 0:22:41
N = 50000, t = 2:56:04

Looks non-linear to me.
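The procedure was along these lines (a
sketch only; the loop limit and the file
contents here are illustrative):

$ ! Sketch: create N small test files, A_000001.DAT, A_000002.DAT, ...
$ n = 0
$ loop:
$ n = n + 1
$ if n .gt. 50000 then goto done
$ name = "A_" + f$fao( "!6ZL", n ) + ".DAT"
$ create 'name'
A few lines of text standing in for a small file.
$ goto loop
$ done:
$ exit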

Given that it takes three hours just to
create 50000 such small files, I'd bet that
doing any significant directory work on that
mess would be comparably slow, and thus that
there's not much hope of speeding up Zip in
this situation.

See also:

http://groups.google.com/group/comp.os.vms/browse_thread/thread/a57ade02d8aae46e
Willem Grooters
Honored Contributor

Re: ZIP limitation?

Art,

I'm not certain this applies to your problem:
If you do:

1. $ zip "-Vwj" zipfile.zip $1$dga77:[dir.subdir1]*.*;*
2. $ zip "-Vwj" zipfile.zip $1$dga77:[dir.subdir2]*.*;*

and so on, then you are adding files to an existing archive. This takes time, since the EXISTING data needs to be copied first; I don't know how efficiently zip does this. Used this way, each addition will take more and more time.
Since you omit the directory information (-j option), each addition may require checks against what is already in the archive - and that means scanning the already stored information. Again, the more files you have in the archive, the longer it will take.

If the next directory holds 45,000 files, you can expect low performance for several reasons: the size of the directory itself adds to the already heavy processing.

Consider making a separate zip of each directory, and then zipping these all into one.

1. $ zip "-Vwj" zipfile1.zip $1$dga77:[dir.subdir1]*.*;*
2. $ zip "-Vwj" zipfile2.zip $1$dga77:[dir.subdir2]*.*;*

...
last:
$ zip "-V" allzips.zip zipfile*.zip

Willem
Willem Grooters
OpenVMS Developer & System Manager
Peter Barkas
Regular Advisor

Re: ZIP limitation?

I have experience of the performance difference between the two approaches described by Willem.

I was zipping about 40-60 files totalling about 1.5 GB or so. Doing them all at once was fine, but incrementally adding each one separately was much, much slower.
Art Wiens
Respected Contributor

Re: ZIP limitation?

Steven:

"[T]he OpenVMS Wizard cannot and does not
recommend storing large numbers of files
in a single (very large) directory."

I'll try and mention that to the long-gone application programmers! ;-)

That's why we call them "legacy" systems.

Willem:

No that doesn't apply, I'm zipping individual directories to new zip archives.

Cheers,
Art
Steven Schweda
Honored Contributor

Re: ZIP limitation?

> I'll try and mention that [...]

Slap 'im around a little for me, too.

> That's why we call them "legacy" systems.

Some legacy.

I recall some suggestions in comp.os.vms a
while ago, involving dispersing the files
from the one over-full directory into
multiple (less over-full) directories, and
creating a search-list logical name which
points to the list of new directories.
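Something like this, say, where the logical
name and the directory names are all made
up:

$ define /system APP_DIRS DKA0:[DATA1], DKA0:[DATA2], DKA0:[DATA3]   ! placeholder names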

New files are still a problem, as they keep
getting put into the first directory in the
list, but finding the old files could be
faster.

This assumes that the directory name is not
hard-coded into the "legacy", of course.

Thoroughly re-engineering an application
without actually changing it can be tricky.
Of course, if "unmaintained" means "never to
be looked at again", you may be happier just
suffering for a while with slow Zip.

I'm still pushing Zip 2.31, however, as it's
better, even though it won't solve your big
problem.
Art Wiens
Respected Contributor

Re: ZIP limitation?

>> I'll try and mention that [...]

> Slap 'im around a little for me, too.

If they were still around it might be a closed fist! :-O

>> That's why we call them "legacy" systems.

> Some legacy.

Perhaps "inheritance" is a better analogy, but grannie didn't like me too much!

Ah well, like a fellow I used to work with always said:

"It's all pensionable time."

I'll give v2.31 a whirl.

Cheers,
Art
Willem Grooters
Honored Contributor

Re: ZIP limitation?

Legacy/Inheritance, meaning "old-fashioned"?

On a VMS system, it will (obviously) take time to DIR a directory of 10,000 files.
On a Unix system, listing a directory of that size will tell you the command buffer is full and show you nothing else.

What do you prefer?

Willem
Willem Grooters
OpenVMS Developer & System Manager
Steven Schweda
Honored Contributor

Re: ZIP limitation?

If you're foolish enough to say "ls *", it may
fail. Do you think that a plain "ls" will
have any trouble? Have you tried it, or are
you just guessing?

I prefer less inane comments.
Art Wiens
Respected Contributor

Re: ZIP limitation?

Ok, I guess it's time to close this thread ;-)

VMS is a fine system. The way that some things were implemented by my predecessors could have been done a bit "cleaner" ... other things are quite clever and flexible.

Management keeps referring to it as a "sunsetting platform", except the sun is having a hard time rising on *nix, so I continue to try and patch holes and keep the boat afloat.

There is no "best" system/application out there.

Cheers,
Art