Steven Schweda
Honored Contributor

Re: ZIP performance

> It would be somewhat interesting to know
> what 'zip' does in this 'scanning' phase,
> [...]

Perhaps, perhaps not. All I saw on a quick
look was a $PARSE (_not_ SYNTAX_ONLY), but
there could be a non-VMS-specific stat() (or
something) somewhere else.
Hein van den Heuvel
Honored Contributor
Solution

Re: ZIP performance

I realize this is a bit excessive, for a problem which is fixed in the next release, but I do have a workaround/solution for Art, or anyone else ending up in this predicament.

Once you have this mess you can very quickly create a parallel directory structure with chunks of the large directory just using RMS and NOT using SET FILE commands.
Define a search list to point to the fragments and run.

When done, you can blow away the old large, or smaller new directories with
$ SET FILE/NODIR large.DIR.
$ SET PROT large.DIR.
$ DEL large.DIR.
That will take NO time.

Proof...

GEIN $ defi art SYS$DISK:[.ART]
GEIN $ mcr sys$login:tmp 280
ELAPSED: 0 00:00:16.21 CPU: 0:00:00.96 BUFIO: 301 DIRIO: 2078
ELAPSED: 0 00:00:15.57 CPU: 0:00:00.83 BUFIO: 300 DIRIO: 1990

GEIN $ @split
subdirectory 0
subdirectory 1
subdirectory 2
28001 files

( ! runs in seconds !! )

GEIN $ defi art SYS$DISK:[.ART0],[.ART1],[.ART2]
GEIN $ mcr sys$login:tmp 280
ELAPSED: 0 00:00:01.50 CPU: 0:00:00.19 BUFIO: 392 DIRIO: 223
ELAPSED: 0 00:00:01.44 CPU: 0:00:00.14 BUFIO: 392 DIRIO: 98

5 x better... and would be much better still with larger 'old' directories.

You'll find the procedure I used for the split below. The test program, for just 100 file opens, is as posted before.
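A minimal DCL sketch of what such a timing loop might look like (purely illustrative; the real test is a compiled image posted earlier, and ART: here is the search list defined above):

$! Illustrative stand-in for the timing test: open the first 100 files
$! reachable through the ART: search list and report start/end times.
$ count = 0
$ write sys$output "Start: ", f$time()
$open_loop:
$ file = f$search("art:*.*;*")
$ if file .eqs. "" .or. count .ge. 100 then goto open_done
$ open/read/error=open_next chan 'file'
$ close chan
$open_next:
$ count = count + 1
$ goto open_loop
$open_done:
$ write sys$output "Opened ", count, " files.  End: ", f$time()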

Note: I had to pre-allocate the sub-directories not just for the obvious speed improvement, but also to prevent errors:
"%RMS-E-EXT, ACP file extend failed
-SYSTEM-F-BADPARAM, bad parameter value"

Art, you owe me another beer.

Others: if you were interested enough to read this far, you also owe me one :-)

Hein.

$! Split the one huge ART.DIR into [.ARTn] sub-directories of 10000 entries
$! each, by copying the directory records directly with RMS (no SET FILE).
$ open /read old art.dir
$ i = 0
$ list = "sys$disk:"
$loop:
$ sub = i / 10000
$ if i .eq. (sub*10000)
$ then
$    write sys$output "subdirectory ", sub
$    subdir = "[.art''sub']"
$    cre/dir/allo=1000 'subdir'    ! pre-allocate to avoid ACP extend errors
$    list = list + "," + subdir
$    close/nolog new               ! /NOLOG: no error on the first pass, when NEW is not yet open
$    open/append new art'sub'.dir
$ endif
$ read/end=done old rec
$ write new rec
$ i = i + 1
$ goto loop
$done:
$ close old
$ close new
$ write sys$output i, " files"
$ list = list - ","                ! drop the leading comma: "sys$disk:[.art0],[.art1],..."
$ define art 'list'
$ show log art
Jim_McKinney
Honored Contributor

Re: ZIP performance

<< Once you have this mess you can very quickly create a parallel directory structure with chunks of the large directory just using RMS and NOT using SET FILE commands.
Define a search list to point to the fragment and run.

When done, you can blow away the old large, or smaller new directories with
$ SET FILE/NODIR large.DIR.
$ SET PROT large.DIR.
$ DEL large.DIR.
That will take NO time. >>

I presume the intent would be to choose one or the other, the large directory or the new smaller ones (but not both), and use this strategy. With the new smaller directories I see no problem. However, if one were to do this to the original large directory, you'd still not recover the space allocated to the files, nor the file headers in INDEXF.SYS, nor the allocations in the bitmap. To do so you'd need to follow up with an ANALYZE/DISK/REPAIR, in order to complete the work that DELETE would have done, correct?

And I suppose if one chose to use this strategy on the old large directory, rather than the new smaller ones, the INDEXF.SYS header records associated with the resident files would then all have misdirected backlink pointers, as that directory is now gone.
Jan van den Ende
Honored Contributor

Re: ZIP performance

Well,

nice reading so far.

Given that hardware changeover will still take some time, the one thing that can be done to make the current config start behaving better is: implementing SEARCH LISTS.

Art,

it would be trivial for you to determine the time for creating a 'reasonable' number of new files in this dir (say, < 5000; Hein, agree?).

For the sake of example, let us pick 3500 files per period (week, day, or month, whatever).

Create a search list logical name as your problem directory.

Every 'period' create a fresh directory, and add that as the first entry to the search list.
New files will appear there.
Of course, for your ZIPping, pick only one (the last, probably) from the search list.
After it has been emptied, remove it from the search list.

Any action on the 'directory' (the search list) will function as always, but new file creation will be significantly faster, and 'whole directory processing' for just one member of the search list will be incomparably faster (remember, that cost grows as n-squared!).
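A rough DCL sketch of that rotation (the INVDIR logical and the directory/archive names are only illustrations, and ZIP_CLI is assumed to be available as the CLI build of zip, as used elsewhere in this thread):

$! New period: create a fresh directory and put it FIRST in the search
$! list, so new files are created there.
$ create/directory/allocation=1000 [.inv_week03]
$ define invdir [.inv_week03], [.inv_week02], [.inv_week01]
$!
$! For the ZIPping, pick only the oldest (quiet) member of the search list;
$! /MOVE deletes the files once they are safely in the archive.
$ zip_cli week01.zip [.inv_week01]*.*;* /move
$!
$! Once it is empty, drop it from the search list:
$ define invdir [.inv_week03], [.inv_week02]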

hth

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Andy Bustamante
Honored Contributor

Re: ZIP performance

Prior post correction.

>>>Create your Zip archives against the new directory, 5,000 or so files at a time with the delete option.

Should have read
...Create your Zip archives against the new directory, 5,000 or so files at a time withOUT the delete option.
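Illustratively (assuming the CLI build of zip is available as a ZIP_CLI command; the names are made up):

$! Add a batch to the archive but leave the originals on disk
$! (no /MOVE, i.e. no delete after archiving):
$ zip_cli invoices_batch1.zip [.newdir]*.*;*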

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Jan van den Ende
Honored Contributor

Re: ZIP performance

Art,

just a minor addendum:

the new directory of the search list does not need to be on the same physical device. So, if at all possible, choose a different one! That spreads head-movement contention over different spindles. And IF you can INIT a drive with cluster size 16 (or at least any power of 2) and start using that, then you are in for additional performance benefits and disk space savings.
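For instance (device name, label, and directories purely illustrative):

$! A freshly INITialized volume with a small cluster size, mounted and
$! used as the first (newest) member of the search list.
$ initialize/cluster_size=16 $1$dga200: invnew
$ mount/system $1$dga200: invnew invnew$disk
$ define invdir invnew$disk:[invoices.current], dka100:[invoices.old]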

And DO try to put pressure on speeding up the move to the ES system!
(even there, the above advice is still valid!)

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Art Wiens
Respected Contributor

Re: ZIP performance

As usual, Hein crafted a superb solution! ZIPing 10,000 files at a time is WAY more efficient!

per 10,000:

Accounting information:
  Buffered I/O count:         120206      Peak working set size:    17216
  Direct I/O count:            42474      Peak virtual size:       182672
  Page faults:                  1760      Mounted volumes:              0
  Images activated:                5
  Elapsed CPU time:    0 00:03:40.27
  Connect time:        0 00:04:43.92

Once this is done, I will, as also suggested, set the subdirectories /NODIR and delete them, and then use DFU to delete the remaining large directory. Is there anything else required after DFU to fully clean up? SET VOL/REBUILD or ANALYZE/DISK/REPAIR needed?

Thanks again Hein!

With regard to the future of this situation, I think I will try to convince the programmer to go with ZIP archives all the way, i.e. produce an invoice, print it, and ZIP_CLI/MOVE it immediately. There's no need to have thousands of files lying around.

Cheers,
Art
Hein van den Heuvel
Honored Contributor

Re: ZIP performance

>> ZIPing 10,000 files at a time is WAY more efficient!

Once the core problem is understood, it becomes easy. ZIP itself was almost irrelevant. :-).


>> Once this is done, I will, as also suggested, set the subdirectories /NODIR and delete them, and then use DFU to delete the remaining large directory.

Turn that around... in case something goes wrong. The multiple small directories combined are equivalent to the large one.
Blow away the large one first, then nicely delete the smaller ones, one at a time.

>> Is there anything else required after DFU to fully clean up?

No. Shouldn't be.

>> convince the programmer to go with ZIP archives all the way. ie. produce an invoice, print it and ZIP_CLI/MOVE it immediately.

Excellent!

Enjoy,
Hein.
Willem Grooters
Honored Contributor

Re: ZIP performance

>> convince the programmer to go with ZIP archives all the way. ie. produce an invoice, print it and ZIP_CLI/MOVE it immediately.

one at a time? That means (AFAIK) that ZIP has to
1. Determine end of archive
2. Add this file
3. Update the index
4. Write the new file
5. Remove the imported file

Step 1 will take a bit more time with every update, and a lot of small amounts add up to a large one. What would happen if a second ZIP_CLI/MOVE comes along?
I think it's safer (and more efficient) to move the created file to a separate directory, and ZIP the whole directory and delete it, after a certain period or number of files.
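A rough sketch of that (directory and archive names are made up; ZIP_CLI/MOVE as mentioned elsewhere in this thread):

$! As each invoice is produced, just park it in a holding directory:
$ rename invoice_12345.rpt [.to_archive]
$!
$! Periodically, or after a certain number of files, archive the whole
$! batch in one go; /MOVE deletes the originals once they are in the zip.
$ zip_cli invoices_month.zip [.to_archive]*.*;* /move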
Willem Grooters
OpenVMS Developer & System Manager
Art Wiens
Respected Contributor

Re: ZIP performance

You're probably right ... a one-for-one action is most likely not the most efficient way, but I think certainly at the end of an invoice run ... print 'em, ZIP 'em, delete 'em.

Cheers,
Art