John McL
Trusted Contributor

Overheads of large .DIR files?

We have a situation where we create maybe 18,000 log files per day (the actual number varies depending on what's run) and these log files are held for 7 days. The file names contain 28 characters, made up of an unchanging 11-character standard prefix, a 4-digit number (1 to 1999) that might cycle several times in 24 hours, an underscore character, the PID and the file type '.LOG'. (Each filename is unique because these are log files for detached "slave" processes and yes, we have lots of those.) Any node in the cluster (2 to 4 machines) might create these log files.

To date all these files have gone into one directory, making for some .DIR files that exceed 5000 blocks. File creation is spread across the day and deletion, after 7 days, takes place around 4:00am when the system is quiet.

I think we should move to a new log file directory each day (via a logical that rolls over at midnight, driven within the image by a timer AST) and that we should use shorter file names (just "S_.LOG"). I believe that the overheads associated with file creation and deletion will be reduced if we go this way.
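
Something like the following sketch is what I have in mind for the rollover (the logical name, device name and pre-allocation size are illustrative; in practice the image's timer AST would just pick up the new logical after midnight):

$! Sketch only: point a per-day logical at today's directory, creating it if needed.
$ today = f$cvtime(,"COMPARISON","DATE") - "-" - "-"    ! e.g. 20100412
$ if f$search("disk$logs:[000000]''today'.dir") .eqs. "" then -
     create/directory/allocation=2000 disk$logs:['today']
$ define/system LOGDIR disk$logs:['today']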

My knowledge is based on the old performance "knee" at 127 blocks, after which point performance went downhill. I understand performance was improved in VMS v7.3 but I can't find a good description of exactly what was altered and what the implications are for inserting and removing file names. In particular, I can't find information about when disk I/Os are required (cf. cache lookups), or about the splitting of blocks in .DIR files when inserting new file names. My understanding is that a lock is taken out on the entire volume when the .DIR file is being modified, so I would like to minimise this lock time as well as the disk I/O time.

In short ...
Q1 - what performance-related practices do you recommend for large numbers of files being created in a single directory, and why?
Q2 - exactly what changes were made in v7.3 and what, if any, performance "gotchas" still exist?

(And if you say that no changes are necessary to our current scheme then please explain why.)

Hoff
Honored Contributor

Re: Overheads of large .DIR files?

Q1: don't do what is being done here?
Q2: big directories are n^2 data structures. And stuff can fall out of cache.

As for a more detailed answer, read this first:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=625667

Hein and Mark Hopkins in particular describe this stuff in some detail over there.

You could choose to fix this design flaw, or you could reorganize the allocations (incremental additions are cheapest at the end of the directory, i.e. log file names that sort alphabetically in increasing order), or you could investigate what can be done to avoid creating the log files at all; address the environment holistically.
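
Something along these lines (names illustrative) gets new names sorting to the end of the directory:

$! Illustrative: a sortable timestamp prefix keeps inserts in the last block.
$ stamp = f$cvtime(,"COMPARISON") - "-" - "-" - " " - ":" - ":" - "."
$ pid = f$getjpi("","PID")
$ logfile = "LOGDIR:S" + stamp + "_" + pid + ".LOG"
$ create 'logfile'    ! or hand this name to the detached process as its output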

As for removing the files, reverse delete (DCL or DFU or a program) will help with that aspect of performance.
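
A rough DCL sketch of the reverse-order delete (names illustrative; DFU or a small program will do it faster):

$! Illustrative: delete in descending name order so each delete works on the
$! tail of the directory and avoids shuffling the entries that follow it.
$ directory/columns=1/noheading/notrailing/output=files.tmp device:[20100404]*.log;*
$ sort/key=(position:1,size:255,descending) files.tmp files.rev
$ open/read flist files.rev
$loop:
$ read/end_of_file=done flist fname
$ delete 'fname'
$ goto loop
$done:
$ close flist
$ delete files.tmp;*,files.rev;*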

If this is the lowest of the low-hanging fruit for local performance, go for it.

Updating the application design might help, too; I've seen cases where revisiting or rethinking that can be beneficial; where the design is the "dead elephant in the room."
P Muralidhar Kini
Honored Contributor

Re: Overheads of large .DIR files?

Hi John,

A directory file is a logically contiguous file. When a lot of files are added
to a directory, the directory file grows. But for this to happen
the volume must have enough contiguous free space.


>> My understanding is that a lock is taken out on the entire volume when
>> the .DIR file is being modified, so I would like to minimize this lock
>> time as well as the disk I/O time.

A directory is a file whose contents are filenames along with some
attributes such as version limit, version, FID, SYMLINK information and
so on. When files are created or deleted in a directory, the directory file
needs to be modified to reflect the operation. For this, a serialization
lock is taken on the directory. This blocks only those XQP threads
that want to access the same directory; other activity on the volume
can proceed, i.e. you can create or delete files in some other directory on
the same disk at the same time.


>> I think we should move to a new log file directory each day
Yes, this sounds like a good idea.
With this setup, each day's directory would be much smaller.

With a single large directory:
On the first day the directory gets filled up (let's say with filenames a.txt
to z.txt). On subsequent days, when a file is created (say c.txt), then
depending on whether the block where c.txt has to be inserted is already full,
the XQP may have to do an expand shuffle (move d.txt through z.txt one block
down) to insert the new entry in the directory file.

When a file needs to be deleted (say d.txt), then depending on whether the
block holding d.txt ends up empty, the XQP may have to do a compress shuffle
(move e.txt through z.txt one block up) to remove the entry from the directory
file.

With a day-wise directory, the number of expand/compress shuffles that the
XQP does is minimized: since every day has its own directory, each day's
creates and deletes act only on its own directory with a small number of
entries (as compared to a single directory with a very large number of
entries).
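
One simple way to see this effect (the directory name here is only an example)
is to watch the size of the directory file itself as files are created and
deleted:

$! Example only: shows the used/allocated blocks of the directory file.
$ directory/size=all/date device:[000000]logs_20100412.dir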


>> Any node in the cluster (2 to 4 machines) might create these log files.
Is the distributed lock manager involved here?


I will get back to you on the performance-related practices.

Regards,
Murali
Let There Be Rock - AC/DC
Jon Pinkley
Honored Contributor

Re: Overheads of large .DIR files?

Having a directory per day will be more efficient; I doubt you will find anyone that will say otherwise.

Shortening the file names will allow more file names to be stored in each directory block, so the 18,000 files will require fewer blocks in the directory file.

Just curious, will having the PID in the log file name actually be useful? How will a user know which PID was theirs? If the files have a constant name, the directory will be much more compact, since each additional version only needs to store a version number and file ID (when it lands in the same directory block). For example, the following command file will create 1000 versions of "THIS_IS_A_LONG_FILE_NAME_THAT_WILL.HAVE_MANY_VERSIONS" in a 10 block directory.

$ cre/dir/all=10 [.itrctest]
$ cnt=1
$top:
$ cre [.itrctest]this_is_a_long_file_name_that_will.have_many_versions;
$ cnt = cnt + 1
$ if cnt .le. 1000 then goto top
$end:
$ exit

Another advantage of one directory per day is that you can easily delete all the files in it at the end of the 7-day waiting period. One of the most efficient ways to do that is with DFU.

For example, to delete device:[20100404...]*.*;*

$ dfu delete/directory/tree/nolog device:[000000]20100404.dir

If you use DFU to delete the files this way, it will delete the directory itself as well; I am not aware of a way to delete the files but keep the directory using DFU. So I would recommend using

$ create/directory/allocation=5000 device:[20100412]

when creating the empty directories, to avoid constant directory expansions (which will probably involve recopying the current contents to a new location on disk each time, since VMS directory files must be contiguous).

You may want to create a search list logical name that includes all 7 days' worth of directories, so it will be possible to find a log file from a previous day using a simple directory command:

For example

$ define[/system] applog device:[20100411],device:[20100410],device:[20100409],device:[20100408],device:[20100407],device:[20100406],device:[20100405]

$ directory applog:mylog.log

Jon
it depends
John McL
Trusted Contributor

Re: Overheads of large .DIR files?

Hoff,
Mark's and Hein's comments were interesting but were largely focused on file deletes. This isn't a big issue because we're using ZIP with the "remove" (or is it "move"?) option. (For those not familiar, it's like BACKUP/DEL.) Moreover this runs in a batch job at about 4:00am, when performance isn't as important as from 8am to 9pm or thereabouts.
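
For the record, I believe the Zip option we use is "move" (-m), which deletes the files only after they have been added to the archive. Roughly, with a made-up archive name:

$ zip "-m" device:[20100404]daily_logs.zip device:[20100404]*.log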

We are working on reducing the number of log files because some are for "slave" processes that ultimately did nothing because of what other slaves did, but this situation seems unpredictable and depends on job mix.
John McL
Trusted Contributor

Re: Overheads of large .DIR files?

Muralidhar,
I'd forgotten about the contiguous requirement, so thanks for that reminder.

Are you sure about the lock ONLY being on the .DIR file during creates and deletes? What about the changes to INDEXF.SYS and BITMAP.SYS? Aren't there two locks, one for the directory and one for the volume?

The other point about daily directories is that each is independent and that a very large number of files created on one day won't continue being a performance problem for the other 6 days.
John McL
Trusted Contributor

Re: Overheads of large .DIR files?

Jon,

I agree with you that multiple versions would be better (smaller .DIR and faster access) and I've confirmed that by testing with the cycling 4-digit number we currently use in the filename, however ...

The slave processes are named according to their PID and the PID is displayed by certain management utilities, so short of some convoluted translation system and telling everyone how to use it I'm pretty much stuck with using the PID.

The saving grace is that the log files will always(?) be added to the end of the list for the machine on which the slave process is running (i.e. on a 2-node evenly balanced system, new filenames will be entered at a point about 50% of the way through the file and at the end).

Sure, filenames in the form _.LOG would guarantee end-of-directory additions, but now we've taken a 14-character filename out to 21 characters (the original size was 28 characters) and the .DIR file size has grown. Is that a better tradeoff? I'm not sure.
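
As a rough sizing check (assuming the usual ODS-2 directory record layout of about 6 bytes of header, plus the name padded to an even length, plus 8 bytes per version): a 28-character name costs about 6 + 28 + 8 = 42 bytes per entry, so a 512-byte directory block holds about 12 entries and 18,000 files need roughly 1,500 blocks; a 21-character name costs about 6 + 22 + 8 = 36 bytes, about 14 entries per block and roughly 1,300 blocks; a 14-character name costs about 6 + 14 + 8 = 28 bytes, about 18 entries per block and roughly 1,000 blocks. So the size difference is real but modest compared with the benefit of keeping inserts at the end of the directory.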
P Muralidhar Kini
Honored Contributor

Re: Overheads of large .DIR files?

Hi John,

From Mark's comment in the thread linked above, note the following:

>> Note that the create/dir command allows you
>> to allocate the space up front so you don't
>> have to endure the frequent extends for
>> directories expected to be very large.

Apart from the expand and compress shuffle operations on the directory
file, this talks about moving the directory file to some other location on
the disk.
When adding entries to a directory file, if there is no contiguous space
(i.e. no contiguous space after the current directory file location on the
disk), then the directory is moved to some other location on the disk
where contiguous space is available. By pre-allocating the directory file
on the disk with the CREATE/DIRECTORY command, this can be avoided.
You may want to try this out as well.

>> Are you sure about the lock ONLY being on the .DIR file during creates
>> and deletes? What about the changes to INDEXF.SYS and BITMAP.SYS?
>> Aren't there two locks, one for the directory and one for the volume?

You are correct. When creating a file, a lot of other operations are
involved besides creating the directory entry for that file: space needs
to be allocated from the disk, and so on. For these, a different set of
locks is used, and some of them do block activity on the entire volume.

I was referring in particular to the operation of adding/removing entries in
the directory file, for which only the serialization lock on the directory
file is taken. This is where there is a lot of scope for optimization and
for reducing the time taken.


>> The other point about daily directories is that each is independent
>> and that a very large number of files created on one day won't continue
>> being a performance problem for the other 6 days.
Yes. Also, as I said before, with smaller directories the XQP would not have
to do as much expand/compress shuffling of the directory.
This would be a performance benefit.

Regards,
Murali
Let There Be Rock - AC/DC
Steven Schweda
Honored Contributor

Re: Overheads of large .DIR files?

> [...] largely focused on file deletes. This
> isn't a big issue because we're using ZIP
> with the "remove" (or is it "move"?)
> option. [...]

Zip has no special code to make deleting
files any faster than anything else.
Depending on how it's used, I'd expect it to
use some sub-optimal (perhaps anti-optimal)
order when deleting the files in a directory.
John McL
Trusted Contributor

Re: Overheads of large .DIR files?

Steven,

I'm in no position to be able to modify the ZIP technique that we use on a whole range of files.

As I said, I'm not overly concerned with what happens at 4:00am unless it ultimately impacts the major processing that occurs between 8:00am and about 9pm.

Using separate daily directories also means that deleting the log files there won't incur .DIR management overheads that could impact other files (although the volume lock would do so briefly).