ZIP performance
09-03-2009 04:05 PM
Re: ZIP performance
"with a reverse-sorted archive, and if UnZip were used to restore those files, "
Doesn't really matter ... we might have to pull 1 to 5 invoices out of an archive once in a while. The whole thing would never need to be restored.
About spending any money on this old stuff ... it's a strange "political" environment here, to say the least. An extra nickel will not be spent on this hardware. We spent a million bucks three years ago on ES47s, but there's no money for support resources to "port the applications". I have recompiled and linked the apps ... it all works, but there is no time to "bless it". No power users available to do user acceptance testing. Very frustrating!
"We have not been given us any filename examples."
eg:
0020000.TXT;3 11/137 23-MAY-2007 14:09:24.69
0020000.TXT;2 11/137 22-MAY-2007 15:17:50.30
0020000.TXT;1 11/137 18-MAY-2007 13:19:23.16
0020001.TXT;3 11/137 23-MAY-2007 14:09:24.73
0020001.TXT;2 11/137 22-MAY-2007 15:17:50.34
0020001.TXT;1 11/137 18-MAY-2007 13:19:23.20
0020002.TXT;3 11/137 23-MAY-2007 14:09:24.77
0020002.TXT;2 11/137 22-MAY-2007 15:17:50.48
0020002.TXT;1 11/137 18-MAY-2007 13:19:23.23
0020003.TXT;3 11/137 23-MAY-2007 14:09:24.81
0020003.TXT;2 11/137 22-MAY-2007 15:17:50.53
0020003.TXT;1 11/137 18-MAY-2007 13:19:23.27
0020004.TXT;3 11/137 23-MAY-2007 14:09:24.85
0020004.TXT;2 11/137 22-MAY-2007 15:17:50.58
0020004.TXT;1 11/137 18-MAY-2007 13:19:23.31
The file names are relatively sequential invoice numbers. There are some gaps ... not sure why.
And yes, some challenged individual ran billing three times in 5 days!! But just in May '07.
The reverse list may have made some improvement! 5.5 hours in and it has the temp ZIP file open already and has added ~13,000 files to it! "Flying" now!
Cheers,
Art
09-03-2009 11:08 PM
Re: ZIP performance
Yeah, the algorithm is a simple shuffle up or down as needed when a block is filled or emptied.
The shuffle is done ACP_MAXREAD blocks at a time, but only shifting 1 block up or down in total.
Art,
depending on the exact distribution of file names in the first 2 characters, for your order files the system hits a threshold at around 28,000 files.
The actual threshold is a directory size greater than 1440 blocks.
Got the proof.
I created a tiny command file to populate a directory with SET FILE/ENTER commands.
And I created a tiny program (attached) to open 100 files in that directory, incrementing an initial number by a specified step.
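The population file works roughly like this (a sketch only; the names and counts here are made up, the real procedure is in the attachment):
$ create/directory [.test]
$ open/write tmp source.txt
$ write tmp "placeholder"
$ close tmp
$ count = 0
$fill:
$ name = f$fao("!7ZL", count)   ! zero-filled name, e.g. 0020000
$ set file/enter=[.test]'name'.txt source.txt
$ count = count + 1
$ if count .lt. 28000 then goto fill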
Run that against a 5000 file directory as baseline:
GEIN $ mcr sys$login:tmp 50
ELAPSED: 0 00:00:02.10 CPU: 0:00:00.12 BUFIO: 302 DIRIO: 31
ELAPSED: 0 00:00:02.03 CPU: 0:00:00.12 BUFIO: 302 DIRIO: 14
Next, 24,000:
GEIN $ mcr sys$login:tmp 240
ELAPSED: 0 00:00:02.17 CPU: 0:00:00.07 BUFIO: 302 DIRIO: 67
GEIN $ mcr sys$login:tmp 230
ELAPSED: 0 00:00:02.29 CPU: 0:00:00.25 BUFIO: 302 DIRIO: 64
No big changes.
Now for 28,000 (6 * 240 * 19 (files per dir block)).
GEIN $ mcr sys$login:tmp 280
ELAPSED: 0 00:00:31.24 CPU: 0:00:00.99 BUFIO: 302 DIRIO: 1854
ELAPSED: 0 00:00:39.81 CPU: 0:00:00.98 BUFIO: 302 DIRIO: 2017
Ah ... more than 10 times slower!
Mind you, this problem is fixed in 8.3.
System: GEIN, AlphaServer DS10L 466 MHz
For the curious few, the programs and details, such as a directory index block from 8.3, are in the attachment.
Hope this helps,
Hein.
09-03-2009 11:25 PM
Re: ZIP performance
You don't indicate how long you need to keep files for, or at what frequency/volume new files are created. What's the growth per day/week/month in storage requirements? Do you have loads of space available? I hope the files are on a disk other than the system disk too, but you don't say.
There was also a bug that I'm aware of where, when the directory file gets within a few hundred blocks of 32767 blocks, the directory file structure apparently becomes corrupt or otherwise inaccessible. That's painful to get around too, as you need to do ANA/DISK/REPAIR passes to move files into [SYSLOST], then quit part way through so that you don't corrupt [SYSLOST] as well...!
If you have the space and the application is well enough behaved, I'd be tempted to back up the directory/ies that your invoice files are in, and anything else on that disk, reinitialize the volume with more reasonable characteristics, recreate everything else on the volume (directory structures, ownership, other files) and, if necessary, the last invoice file so that the application knows where to go. Then just leave the files to grow, possibly using a search list and several directories so that you can swap new directories in occasionally, giving you a method for binning off old invoices if you can.
You can "back fill" the invoice files from tape if you really want them back on disk again, provided the last invoice file is already restored or the application knows the name of the file it needs to write next.
Steve
09-04-2009 05:40 AM
Re: ZIP performance
You might not have read my reply before you wrote this.
Of course this can be done better.
This was a poor implementation by OpenVMS, and it was fixed. Or rather: mitigated. A real fix would require b-tree directories or some such.
60,000 files at 11 used blocks each is just 322 MB, and they fit within a 1.5 MB directory.
Give each file 1 full disk rotation and 1 max seek on an old disk, round that up to a generous 50 milliseconds, and you still get 20 files/second = 3000 seconds. In reality it will be better than 15 ms, for 60 files/second, or about 20 minutes to read the whole lot. Double that if you insist on accounting for the required file header IO. Note: the files are contiguous, as they fit in a single cluster.
Also, Art mentions 'hours' before data started to be used, and 11 hours of CPU time, later refined to 60% kernel, 10% interrupt. Even if ZIP goes to the file headers for some attributes, that is an unreasonable amount, suggesting a performance aberration in the lookups.
>> You're creating lots of itsy bitsy files in one directory. With hardware like the AlphaServer 800 5/500 it's going to be highly painful.
Beg to differ. It's the specific use of the software that's to blame. If those files were spread over even just 10 directories, you would never have heard about it.
>> There was also a bug that I'm aware of that when the directory file gets within a few hundred blocks of 32767 blocks the directory file structure apparently becomes corrupt or otherwise in accessible.
Correct. Causes a BADIRECTORY error on a good day, a crash on bad days. I actually found and fixed that 2-Sep-2004 ... 5 years to the day. Ken Blaylock released the fix for 7.3-1 and onwards. The SHFDIR code did a simple 16-bit operation on what in reality was a complex 32-bit field (EBK and HBK are word-swapped for historical (hysterical?) reasons).
At that time I wrote in a note: "In testing I was also somewhat surprised to see the (7.1) system degrade significantly adding entries in order to the end of a directory. I had kinda expected that adding to the end of a pre-allocated directory wouldn't hurt too much. It did."
Now we know why. My simple test to load a directory up to 65K blocks into a (DFU) pre-allocated directory never finished. Here is the START of that output:
$ @DIR_TEST
1000000 size: 1 time: 0
1001000 size: 200 time: 1399
1002000 size: 400 time: 1412
:
1007000 size: 1400 time: 1570
:
1027000 size: 5400 time: 20753
1028000 size: 5600 time: 21334
1029000 size: 5800 time: 22016
[To test this expediently, I used a DCL loop to read an (oversized, with many versions) directory record and append it over and over, tweaking the name to be new and in order, to fill the bulk of the file. Proper SET FILE/ENTER for the tail.]
>> - possibly using a search list and several directories so that you can swap new directories in occasionally and giving a method for binning off old invoices if you can.
Yes, that would work. But I get the impression that would have to happen during the SINGLE? run that created the files?
Btw ... the "Directory Index Cache" is documented in Kirby McCoy's little black bible, "VMS File System Internals". It even explains: "A size of 15 bytes was picked because MAIL$800... files that are about a day apart in creation time vary in the fourteenth or fifteenth character."
Art>> Steven, I hope I have provided enough details to help your psychic powers along.
I missed that on early reading. Nice! The man knows his audience. :-)
It would be somewhat interesting to know what 'zip' does in this 'scanning' phase, but not interesting enough to pick up and open up the sources (because I know I would spend way too much time in there).
Also, a wildcard operation instead of a list would possibly help here, as that triggers RMS to use its own directory cache/scan.
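E.g. (archive name made up), letting the wildcard do the work:
$ zip_cli archive.zip *.txt
rather than handing ZIP an explicit list of 60,000 names.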
Cheers all!
Hein.
09-04-2009 06:35 AM
Re: ZIP performance
> what 'zip' does in this 'scanning' phase,
> [...]
Perhaps, perhaps not. All I saw on a quick look was a $PARSE (_not_ SYNTAX_ONLY), but there could be a non-VMS-specific stat() (or something) somewhere else.
09-04-2009 08:01 AM
Solution
Once you have this mess, you can very quickly create a parallel directory structure with chunks of the large directory, using just RMS and NOT using SET FILE commands.
Define a search list to point to the fragment and run.
When done, you can blow away the old large, or smaller new directories with
$ SET FILE/NODIR large.DIR.
$ SET PROT large.DIR.
$ DEL large.DIR.
That will take NO time.
Proof...
GEIN $ defi art SYS$DISK:[.ART]
GEIN $ mcr sys$login:tmp 280
ELAPSED: 0 00:00:16.21 CPU: 0:00:00.96 BUFIO: 301 DIRIO: 2078
ELAPSED: 0 00:00:15.57 CPU: 0:00:00.83 BUFIO: 300 DIRIO: 1990
GEIN $ @split
subdirectory 0
subdirectory 1
subdirectory 2
28001 files
( ! runs in seconds !! )
GEIN $ defi art SYS$DISK:[.ART0],[.ART1],[.ART2]
GEIN $ mcr sys$login:tmp 280
ELAPSED: 0 00:00:01.50 CPU: 0:00:00.19 BUFIO: 392 DIRIO: 223
ELAPSED: 0 00:00:01.44 CPU: 0:00:00.14 BUFIO: 392 DIRIO: 98
5x better ... and it would be much better still with larger 'old' directories.
You'll find the procedure I used for the split below. The test program, doing just 100 file opens, is as posted before.
Note: I had to pre-allocate the sub-directories not just for the obvious speed improvement, but also to prevent errors:
"%RMS-E-EXT, ACP file extend failed
-SYSTEM-F-BADPARAM, bad parameter value"
Art, you owe me another beer.
Others: if you were interested enough to read this far, you also owe me one :-)
Hein.
$! Split the big directory into 10,000-entry chunks by copying the raw
$! directory records with RMS -- no SET FILE/ENTER needed.
$ open/read old art.dir
$ i = 0
$ list = "sys$disk:"
$loop:
$ sub = i / 10000
$ if i .eq. (sub*10000)
$ then
$!  every 10,000 records: start a new pre-allocated sub-directory
$   write sys$output "subdirectory ", sub
$   subdir = "[.art''sub']"
$   cre/dir/allo=1000 'subdir
$   list = list + "," + subdir
$   close/nolog new
$   open/append new art'sub'.dir
$ endif
$ read/end=done old rec
$ write new rec
$ i = i + 1
$ goto loop
$done:
$ close old
$ close new
$ write sys$output i, " files"
$! turn "sys$disk:,[.art0],..." into a search list logical name
$ list = list - ","
$ define art 'list
$ show log art
09-04-2009 09:38 AM
Re: ZIP performance
Define a search list to point to the fragment and run.
When done, you can blow away the old large, or smaller new directories with
$ SET FILE/NODIR large.DIR.
$ SET PROT large.DIR.
$ DEL large.DIR.
That will take NO time. >>
I presume that the intent would be to choose one or the other, the large or the new smaller directories (but not both), and use this strategy. With the new smaller directories I see no problem. However, if one were to do this to the original large directory, you'd still not recover the space allocated to the files, nor the file headers in INDEXF.SYS, nor the allocations from the bitmap. To do so you'd need to follow up with an ANALYZE/DISK/REPAIR in order to complete the work that DELETE would have done, correct?
And I suppose if one chose to use this strategy on the old large directory, rather than the new smaller ones, the INDEXF.SYS header records associated with the resident files would then all have misdirected backlink pointers, as that directory is now gone.
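(For reference, the repair pass I mean would be along these lines; the device name is illustrative:)
$ ANALYZE/DISK_STRUCTURE/REPAIR DISK$DATA: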
09-04-2009 10:25 AM
Re: ZIP performance
Nice reading so far.
Given that the hardware changeover will still take some time, the one thing that can be done to make the current config start behaving better is ... implementing SEARCH LISTS.
Art,
it would be trivial for you to determine the time for creating a 'reasonable' number of new files in this dir (say, < 5000 (Hein, agree?)).
For the sake of example, let us pick 3,500 files a period (week, or day, or month, whatever).
Create a search list logical name as your problem directory.
Every 'period', create a fresh directory and add it as the first entry of the search list.
New files will appear there.
Of course, for your ZIPping, pick only one directory (the last, probably) from the search list.
After it has been emptied, remove it from the search list.
Any action on the 'directory' (search list) will function as always, but new file creation will be significantly faster, and 'whole directory processing' for just one directory out of the search list will be incomparably cheaper (n-squared!); see the sketch below.
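A sketch of the rotation (logical and directory names just for illustration):
$ create/directory disk$data:[inv.2009sep]
$ define/system invoice_dir disk$data:[inv.2009sep],disk$data:[inv.old]
$! next period: the fresh directory goes FIRST in the list
$ create/directory disk$data:[inv.2009oct]
$ define/system invoice_dir disk$data:[inv.2009oct],disk$data:[inv.2009sep],disk$data:[inv.old]
New files are created in the first element; lookups search the whole list.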
hth
Proost.
Have one on me.
jpe
09-04-2009 01:15 PM
Re: ZIP performance
>>>Create your Zip archives against the new directory, 5,000 or so files at a time with the delete option.
Should have read
...Create your Zip archives against the new directory, 5,000 or so files at a time withOUT the delete option.
Andy
09-05-2009 07:35 AM
Re: ZIP performance
Just a minor addendum:
the new directory in the search list does not need to be on the same physical device. So, if at all possible, choose a different one! That spreads head-movement contention over different spindles. And IF you can INIT a drive with cluster size 16 (or at least some power of 2) and start using that, then you are in for additional performance benefits and disk space savings.
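For example (device and label made up):
$ initialize/cluster_size=16 $1$dga200: invoices
$ mount/system $1$dga200: invoices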
And DO try to put pressure on speeding up the move to the ES47 system!
(Even there, the above advice is still valid!)
Proost.
Have one on me.
jpe
09-09-2009 06:50 AM
Re: ZIP performance
per 10,000:
Accounting information:
Buffered I/O count: 120206 Peak working set size: 17216
Direct I/O count: 42474 Peak virtual size: 182672
Page faults: 1760 Mounted volumes: 0
Images activated: 5
Elapsed CPU time: 0 00:03:40.27
Connect time: 0 00:04:43.92
Once this is done, I will, as also suggested, set the subdirectories /NODIR and delete them, and then use DFU to delete the remaining large directory. Is there anything else required after DFU to fully clean up? SET VOL/REBUILD or ANALYZE/DISK/REPAIR needed?
Thanks again Hein!
With regard to the future of this situation, I think I will try to convince the programmer to go with ZIP archives all the way, i.e. produce an invoice, print it, and ZIP_CLI/MOVE it immediately. There's no need to have thousands of files lying around.
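Something like this, I imagine (archive and logical names made up):
$ ZIP_CLI/MOVE ARCHIVE:[ZIPS]INVOICES_2009.ZIP INVOICE_DIR:0020005.TXT
/MOVE (ZIP's "-m") deletes the source file once it is safely in the archive.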
Cheers,
Art
09-09-2009 07:34 AM
Re: ZIP performance
Once the core problem is understood, it becomes easy. ZIP itself was almost irrelevant. :-).
>> Once this is done, I will, as also suggested, set the subdirectories /NODIR and delete them, and then use DFU to delete the remaining large directory.
Turn that around... in case something goes wrong. The multiple small directories combined are equivalent to the large one.
Blow away the large one, nicely delete the smaller ones, one at a time.
>> Is there anything else required after DFU to fully clean up?
No. Shouldn't be.
>> convince the programmer to go with ZIP archives all the way. ie. produce an invoice, print it and ZIP_CLI/MOVE it immediately.
Excellent!
Enjoy,
Hein.
09-09-2009 09:44 PM
Re: ZIP performance
One at a time? That means (AFAIK) that ZIP has to:
1. Determine the end of the archive
2. Add this file
3. Update the index
4. Write the new file
5. Remove the imported file
Step 1 will take a little more time with every update, and a lot of small amounts add up to a large one. What would happen if a second ZIP_CLI/MOVE came along while the first was still running?
I think it's safer (and more efficient) to move each created file to a separate directory, and ZIP that whole directory and delete it after a certain period or number of files; see the sketch below.
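For instance, a hypothetical batch job run once per period (all names made up):
$ stamp = f$cvtime(,"COMPARISON","DATE") - "-" - "-"   ! e.g. 20090910
$ zip_cli/move archive:[zips]inv_'stamp'.zip pending_dir:*.txt;*
One archive per period, and the pending directory never grows past one period's worth of files.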
OpenVMS Developer & System Manager
09-10-2009 04:35 AM
Re: ZIP performance
Cheers,
Art
09-10-2009 04:39 AM
Re: ZIP performance
Don't let hundreds of thousands of files pile up ... especially if you have old, slow hardware!!
Cheers,
Art