Operating System - OpenVMS

Re: Maximum Directory Entries

 
Ian Miller.
Honored Contributor

Re: Maximum Directory Entries

"But I noticed that create fails when someone does a EVE of the directory. Old sickness of VMS ... other threat ..."
probable read lock on the directory file preventing update of the directory to add new entry for created file.
____________________
Purely Personal Opinion
Hein van den Heuvel
Honored Contributor

Re: Maximum Directory Entries

There is no hard limit.
There TENDS to be a performance degradation.
In older VMS versions there was a performance knee at a directory size of 128 blocks when doing wildcard lookups.
If filenames are generated 'in order', with ever-increasing names, there is very little overhead indeed.
If the names are random, then new files will frequently cause the need to 'shuffle' up a good chunk of the directory to make room for the new name.
Keep names short if you can!
bad: [report_directory]adobe_report_for_july_05_2004_00546.pdf
good: [report_directory]2004070500546.pdf

Divide and conquer:
better: [200407_reports]0500546.pdf

hth,
Hein.
Robert Atkinson
Respected Contributor

Re: Maximum Directory Entries

That's odd... got this from Google earlier, which implies random names are better :-

With normal random file naming behavior, directory shuffles are infrequent. However, non-random behavior can cause problems. A classic case is a DELETE *.*;* on a big directory. The file wildcarding of course returns the files front to back, and so files are deleted from the front of the directory - precisely the worst order. So the time to delete all the files in a big directory goes with the square of the number of files. If the directory is really huge, it's well worth building a command procedure to delete the files back to front.
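
For illustration, a minimal sketch of what such a back-to-front delete procedure might look like (DISK2:[BIGDIR] and the scratch file names are assumptions, and it should be run from a different default directory):

$ ! Sketch only: delete files from the back of a large directory first.
$ ! Capture the file list, sort it in descending order so the highest
$ ! names (at the back of the directory) are deleted first.
$ DIRECTORY/COLUMNS=1/NOHEADING/NOTRAILING/OUTPUT=FILES.LIS DISK2:[BIGDIR]*.*;*
$ SORT/KEY=(POSITION:1,SIZE:80,DESCENDING) FILES.LIS FILES_REV.LIS
$ OPEN/READ LIST FILES_REV.LIS
$LOOP:
$ READ/END_OF_FILE=DONE LIST FSPEC
$ DELETE 'FSPEC'
$ GOTO LOOP
$DONE:
$ CLOSE LIST
$ DELETE FILES.LIS;*,FILES_REV.LIS;*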
Willem Grooters
Honored Contributor

Re: Maximum Directory Entries

Rob,

Ian is right:


max number of files in a directory is usually limited by finding enough contiguous space for the directory file.


You can run into "funny" behaviour if there is too little contiguous space to hold an expanded directory file, especially if free space is scattered. Browse down the forum for some examples....

To keep your file collections usable, I guess you wouldn't want 10,000s of files in one directory - not to mention that it _may_ introduce a performance penalty.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Kris Clippeleyr
Honored Contributor

Re: Maximum Directory Entries

Robert,


So the time to delete all the files in a big directory goes with the square of the number of files. If the directory is really huge, it's well worth building a command procedure to delete the files back to front.


For deleting "large" directories, I find that DFU comes in handy.
On the other hand, to prevent an application that writes its files to one directory from "over-filling" that directory, you can set up a search list of logical names that point to different physical directories, and rotate the definition at certain time intervals. Writes will go to the first directory in the list; reads will be tried on all of them.
E.g.:

$ DEFINE LOG DISK2:[LOG1],DISK2:[LOG2],DISK2:[LOG3]
$ CREATE LOG:T.T
ctrl/Z
$ DIREC LOG

will show T.T in DISK2:[LOG1]

hereafter

$ DEFINE LOG DISK2:[LOG2],DISK2:[LOG3],DISK2:[LOG1]
$ CREATE LOG:X.X
ctrl/Z
$ DIREC LOG

will show T.T in DISK2:[LOG1], and X.X in DISK2:[LOG2]

We used this trick once on an application that created a huge number of uniquely named logfiles.

Greetz,

Kris

I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Wim Van den Wyngaert
Honored Contributor

Re: Maximum Directory Entries

First test results on a 2-node 4100 cluster running VMS 7.3.

A DCL loop creates files whose names start with a digit 0-9 followed by a fixed name. I started 2 jobs on each node to fill the directory.
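
A minimal sketch of that kind of creation loop (the fixed name part, file type and count are assumptions, not the actual test procedure):

$ ! Sketch: create many empty files whose names start with a digit 0-9,
$ ! followed by a fixed part.
$ I = 0
$LOOP:
$ NAME = F$STRING(I - (I / 10) * 10) + "FIXEDNAME_" + F$STRING(I) + ".TMP"
$ OPEN/WRITE FILE 'NAME'    ! creates an empty file
$ CLOSE FILE
$ I = I + 1
$ IF I .LT. 100000 THEN GOTO LOOP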

After almost 5 hours:
First hour: 4320 files created
Second hour: 3490 files created
Third hour: 2732 files created
So: big directories are slow for file creation.

The directory file is now 7,000 blocks and contains 22,000 files.

Test continues Wednesday.
Wim
Hein van den Heuvel
Honored Contributor

Re: Maximum Directory Entries


In reply to my reply Robert wrote:

" That's odd...got this from Google earlier, which implies Random names are better :-

With normal random file naming behavior, directory shuffles are infrequent."

Define 'normal'!?

I was assuming (may well be wrong) that the application in question just kept on adding files. Deleting was not mentioned (yet). If you just keep on adding, then adding at the end is best. This will make the directory grow by at least a cluster at a time, and maybe more.

The 'normal' behaviour referred to is probably files coming and going over time, maybe a few more coming than going. Then linear naming is horrible when doing the deletes. The adds will be fine, but the deletes will always be from the beginning (first directory block), and every directory block emptied will cause a shuffle down.

A random delete is unlikely to empty a block, so it will not cause a shuffle. Hopefully it will create enough room for a future random add of a different file targeted at the same block.


Btw... for totally optimal directory packing I forgot to mention dropping 'obvious' file types, like '.pdf' in a report directory. Just give the exception files an extension. And that extension can be ".F" for Fortran source and ".O" for objects... if performance is more important than clarity and ease of use (unlikely!).

Ramblings...

For computer-named/used files, where humans are not reading any meaning into the file name (more or less like VMSmail external files), the optimal packing is of course obtained by using a base-36 (or worse!) character set: 0-9A-Z. Using that, 2 characters can identify 1000+ files, 3 is good for almost 50,000 and 4 can address more than a million, but you have to add 14 bytes of directory data per entry.
10,000 files would then just take 350 blocks of .DIR file. Crazy but possible.
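
Purely as illustration, a sketch of encoding a sequence number as such a base-36 name in DCL (the symbol names and the example value 12345 are assumptions):

$ ! Sketch: encode a sequence number as a base-36 (0-9A-Z) name.
$ ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
$ SEQ = 12345
$ NAME = ""
$LOOP:
$ DIGIT = SEQ - (SEQ / 36) * 36
$ NAME = F$EXTRACT(DIGIT, 1, ALPHABET) + NAME
$ SEQ = SEQ / 36
$ IF SEQ .GT. 0 THEN GOTO LOOP
$ WRITE SYS$OUTPUT NAME     ! 12345 -> 9IX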

Hein.
Andy Bustamante
Honored Contributor

Re: Maximum Directory Entries

As someone already mentioned, the 128-block performance hit for directory size was lifted in VMS 7.2.

You haven't mentioned deletion or long-term storage. Are these files out of date after being displayed, or do you use long-term storage? For short-term use, I use a six-directory search-list logical for our web server's reports and a batch job that redefines it at 10-minute intervals. Reports are available for at least 50 minutes, with the sixth directory having its contents deleted once an hour. For very busy sites, distribute the directories among multiple disks.
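
A hedged sketch of such a rotation job, assuming six directories DISK1:[RPT1] through DISK1:[RPT6] and logical names REPORT_DIR and REPORT_SLOT (all illustrative, not the actual setup described above):

$ ! ROTATE_REPORTS.COM - sketch: rotate a six-directory search list so a
$ ! fresh directory becomes the write target, then resubmit in 10 minutes.
$ SLOT = F$TRNLNM("REPORT_SLOT")
$ IF SLOT .EQS. "" THEN SLOT = "0"
$ SLOT = F$INTEGER(SLOT) + 1
$ IF SLOT .GT. 6 THEN SLOT = 1
$ LIST = ""
$ I = 0
$BUILD:
$ D = SLOT + I
$ IF D .GT. 6 THEN D = D - 6
$ LIST = LIST + ",DISK1:[RPT" + F$STRING(D) + "]"
$ I = I + 1
$ IF I .LT. 6 THEN GOTO BUILD
$ LIST = F$EXTRACT(1, 999, LIST)          ! drop the leading comma
$ DEFINE/SYSTEM REPORT_DIR 'LIST'
$ SLOT_TXT = F$STRING(SLOT)
$ DEFINE/SYSTEM REPORT_SLOT 'SLOT_TXT'
$ ! Deleting the contents of the directory that just dropped to the end of
$ ! the list would go here, once its retention window has passed.
$ SUBMIT/AFTER="+00:10" ROTATE_REPORTS.COM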

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Robert Atkinson
Respected Contributor

Re: Maximum Directory Entries

The reports will be kept for at least 6 months, longer if I can find the disk space.

Rob.
Wim Van den Wyngaert
Honored Contributor

Re: Maximum Directory Entries

Test results continued, but with only 2 jobs instead of 4.

There are now about 35,000 files in my directory, which is itself now 11,000 blocks.

No problems yet but :

1) doing DIR/SIZE TEST.DIR takes between 2 and 10 seconds (why?)

2) doing DIR/TOTAL/SINCE=15:30 takes about 10 minutes (acceptable, old machine)

Wim