Overheads of large .DIR files?
04-11-2010 06:05 PM
To date all these files have gone into one directory, making for some .DIR files that exceed 5000 blocks. File creation is spread across the day and deletion, after 7 days, takes place around 4:00am when the system is quiet.
I think we should move to a new log file directory each day (via a logical that rolls over at midnight driven within the image by a timer AST) and that we should use shorter file names (just "S_
My knowledge is based on the old performance "knee" at 127 blocks, after which point performance went downhill. I understand performance was improved in VMS V7.3, but I can't find a good description of exactly what changed and what the implications are for inserting and removing file names. In particular, I can't find information about when disk I/Os are required (as opposed to cache lookups), or about the splitting of blocks in .DIR files when new file names are inserted. My understanding is that a lock is taken out on the entire volume when the .DIR file is being modified, so I would like to minimise this lock time as well as the disk I/O time.
In short ...
Q1 - what performance-related practices do you recommend for large numbers of files being created in a single directory, and why?
Q2 - exactly what changes were made in v7.3 and what, if any, performance "gotchas" still exist?
(And if you say that no changes are necessary to our current schema, then please explain why.)
04-11-2010 06:42 PM
Re: Overheads of large .DIR files?
Q2: big directories are n^2 data structures. And stuff can fall out of cache.
As for a more detailed answer, read this first:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=625667
Hein and Mark Hopkins in particular describe this stuff in some detail over there.
You could choose to fix this design flaw, or you could reorganize the allocations (incremental additions are cheapest when they land at the end of the directory, i.e. log file names that sort alphabetically in increasing order), or you could investigate what can be done to avoid creating the log files at all; in other words, address the environment holistically.
As for removing the files, deleting in reverse order (via DCL, DFU or a program) will help with that aspect of performance.
If this is the lowest of the low-hanging fruit for local performance, go for it.
Updating the application design might help, too; I've seen cases where revisiting or rethinking that can be beneficial; where the design is the "dead elephant in the room."
04-11-2010 06:45 PM
Re: Overheads of large .DIR files?
A directory file is logically contiguous. When a lot of files are added to a directory, the directory file grows, but for this to happen the volume must have enough contiguous free space.
>> My understanding is that a lock is taken out on the entire volume when
>> the .DIR file is being modified, so I would like to minimize this lock
>> time as well as the disk I/O time.
A directory is a file whose contents are file names along with some attributes such as the version limit, version, FID, SYMLINK information and so on. When files are created or deleted in a directory, the directory file needs to be modified to reflect the operation. For this, a serialization lock is taken out on the directory. This blocks only those XQP threads that want to access the same directory; other activity on the volume can proceed, i.e. you can create/delete files in some other directory on the same disk at the same time.
>> I think we should move to a new log file directory each day
Yes, this sounds like a good idea. With this setup, the directory for each day would be smaller.
With a single large directory:
For the first day the directory gets filled up (let's say file names a.txt to z.txt). On subsequent days, when files are created (say c.txt), then based on whether the block where c.txt has to be inserted is full or not, the XQP would have to do an expand shuffle (move d.txt to z.txt one block below) to insert new entries in the directory file.
When files need to be deleted (say d.txt), then based on whether the block having d.txt is full or not, the XQP would have to do a compress shuffle (move e.txt to z.txt one block above) to delete entries in the directory file.
With a day-wise directory, the number of expand/compress shuffles that the XQP does can be minimized; as every day has its own directory, the day-wise creates and deletes act only on their own directory with a small number of entries (as compared to a single directory with a large number of entries).
>> Any node in the cluster (2 to 4 machines) might create these log files.
Is the distributed lock manager involved here?
I will get back to you on the performance-related practices.
Regards,
Murali
04-11-2010 06:54 PM
Re: Overheads of large .DIR files?
Shortening the file names will allow more file names to be stored in each directory block, so the 18,000 files will require fewer blocks in the directory file.
Just curious, will having the PID in the log file name be very useful? How will a user know what PID was theirs? If the files have a constant name, the directory will be much more compact, since each version must only store the version number and file ID of the specific file (when it is in the same directory block). For example, the following command file will create 1000 versions of "THIS_IS_A_LONG_FILE_NAME_THAT_WILL.HAVE_MANY_VERSIONS" in a 10 block directory.
$ cre/dir/all=10 [.itrctest]
$ cnt=1
$top:
$ cre [.itrctest]this_is_a_long_file_name_that_will.have_many_versions;
$ cnt = cnt + 1
$ if cnt .le. 1000 then goto top
$end:
$ exit
Another advantage of one directory per day is that you can easily delete all the files in the directory at the end of the 7-day waiting period. One of the most efficient ways to do that would be with DFU.
For example to delete device:[20100404...]*.*;*
$ dfu delete/directory/tree/nolog device:[000000]20100404.dir
If you use DFU to delete the directories, it will delete the directory itself as well; I am not aware of a way to delete the files without the directory using DFU. So I would recommend using
$ create/directory/allocation=5000
when creating the empty directories, to avoid constant directory expansions (which will probably involve recopying the current contents to a new location on disk each time, since VMS directory files must be contiguous).
You may want to create a search list logical name that includes all 7 days' worth of directories, so it will be possible to find a log file from a previous day using a simple DIRECTORY command:
For example
$ define[/system] applog device:[20100411],device:[20100410],device:[20100409],device:[20100408],device:[20100407],device:[20100406],device:[20100405]
$ directory applog:mylog.log
Jon
04-11-2010 07:22 PM
Re: Overheads of large .DIR files?
Mark's & Hein's comments were interesting but were largely focused on file deletes. This isn't a big issue because we're using ZIP with the "remove" (or is it "move"?) option. (For those not familiar, it's like BACKUP/DEL.) Moreover this runs in a batch job at about 4:00am, when performance isn't as important as it is from 8am to 9pm or thereabouts.
We are working on reducing the number of log files because some are for "slave" processes that ultimately did nothing because of what other slaves did, but this situation seems unpredictable and depends on job mix.
04-11-2010 07:27 PM
Re: Overheads of large .DIR files?
I'd forgotten about the contiguous requirement, so thanks for that reminder.
Are you sure about the lock ONLY being on the .DIR file during creates and deletes? What about the changes to INDEXF.SYS and BITMAP.SYS? Aren't there two locks, one for the directory and one for the volume?
The other point about daily directories is that each is independent and that a very large number of files created on one day won't continue being a performance problem for the other 6 days.
04-11-2010 07:39 PM
Re: Overheads of large .DIR files?
I agree with you that multiple versions would be better (smaller .DIR and fast access) and I've confirmed that using the cycling 4-digit number we currently use in the filename, however ...
The slave processes are named according to their PID and the PID is displayed by certain management utilities, so short of some convoluted translation system and telling everyone how to use it I'm pretty much stuck with using the PID.
The saving grace is that the log files will always(?) be added to the end of the list for the machine on which the slave process is running (i.e. on a 2-node evenly balanced system, new file names will be entered at a point 50% down the file and at the end).
Sure filenames in the form
04-11-2010 07:39 PM
Re: Overheads of large .DIR files?
From Mark's comment in the above link, note the following:
>> Note that the create/dir command allows you
>> to allocate the space up front so you don't
>> have to endure the frequent extends for
>> directories expected to be very large.
Apart from the expand and compress shuffle operations on the directory file, this points out that the directory file may be moved to some other location on the disk.
When adding entries to a directory file, if there is no contiguous space (i.e. no contiguous space after the current directory file location on the disk), then the directory is moved to some other location on the disk where contiguous space is available. Pre-allocating the directory file with the CREATE command avoids this.
You may want to try this out as well.
>> Are you sure about the lock ONLY being on the .DIR file during creates
>> and deletes? What about the changes to INDEXF.SYS and BITMAP.SYS?
>> Aren't there two locks, one for the directory and one for the volume?
You are correct. When creating a file, a lot of other operations are involved besides creating a directory entry for that file: space needs to be grabbed from the disk, and so on. Different sets of locks are used for these, and some of them will block activity on the entire volume.
I was referring in particular to the operation of adding/removing entries in the directory file, for which only the serialization lock on the directory file is taken. This is where there is a lot of scope for optimization and for reducing the time taken.
>> The other point about daily directories is that each is independent
>> and that a very large number of files created on one day won't continue
>> being a performance problem for the other 6 days.
Yes. Also, as I said before, with smaller directories the XQP would not have to do a lot of expand/compress shuffle activity on the directory. This would be a performance benefit.
Regards,
Murali
04-11-2010 07:50 PM
Re: Overheads of large .DIR files?
> isn't a big issue because we're using ZIP
> with the "remove" (or is it "move"?)
> option. [...]
Zip has no special code to make deleting files any faster than anything else. Depending on how it's used, I'd expect it to use some sub-optimal (perhaps anti-optimal) order when deleting the files in a directory.
04-11-2010 08:06 PM
Re: Overheads of large .DIR files?
I'm in no position to modify the ZIP technique that we use on a whole range of files.
As I said, I'm not overly concerned with what happens at 4:00am unless it ultimately impacts the major processing that occurs between 8:00am and about 9pm.
Using separate daily directories will also mean that deletions of the log files there won't have .DIR management overheads potentially impacting other files (although the volume lock would do so briefly).
04-11-2010 11:09 PM
Re: Overheads of large .DIR files?
>>>
When files need to be deleted (say d.txt), then based on whether the block having d.txt is full or not, the XQP would have to do a compress shuffle (move e.txt to z.txt one block above) to delete entries in the directory file.
<<<
This seems to have a typo: it should presumably read "whether the block having d.txt is _empty_ or not".
04-11-2010 11:24 PM
Re: Overheads of large .DIR files?
Yes. That was a typo.
I meant "whether the block having d.txt is _empty_ or not".
Thanks for pointing that out.
Regards
Murali
04-12-2010 12:06 AM
Re: Overheads of large .DIR files?
This is a retry, please excuse a possible duplicate.
jpe
John,
>>>
Sure filenames in the form
<<<
That explains it easily:
>>>
contain 28 characters, made up of an unchanging 11-character standard prefix,
<<<
Consecutive directory entries get compacted where possible: each entry begins with the COUNT of characters repeated from the previous entry, followed by only the differing characters.
In the original setup, you had the 11-character prefix only ONCE; thereafter only the (one-byte) count (which, by the way, is where the 255-character length limit comes from).
Your new approach only allows for condensing off 6 bytes within the same second (6 digits + 1 "_" - 1 count byte), 4 bytes within the same 10-second period, and regularly less when those roll over.
Summary: long same-character LEADING substrings are relatively cheap, but varying characters early in the string force the remainder to be stored verbatim.
hth
Proost.
Have one on me.
jpe
04-12-2010 12:24 AM
Re: Overheads of large .DIR files?
I will have to keep this brief, as I need to get ready for an early meeting.
Another consideration is whether these files need be stored in a single unitary directory.
Two relatively straightforward options present themselves:
- use a different directory per day; for searching purposes, use a logical name which concatenates six days of directories into a search list (this approximates the present behavior).
- rather than a flat directory with a 28 character filename, consider creating a second-level set of subdirectories based upon part of that name (e.g., PID) with individual files being entered in the subdirectories. The subdirectories would be entered in the search list as described above.
Combining the above two alternatives would provide the scope of a seven-day retention together with far smaller individual directories. As has been noted previously by other posters, the work involved in adding/removing files is loosely related to the size of the directory, so segmenting the add/delete problem has a potentially larger-than-linear payoff.
Searching is then handled using a logical name with a search list.
- Bob Gezelter, http://www.rlgsc.com
04-12-2010 04:01 AM
Re: Overheads of large .DIR files?
>>>
Consecutive directory entries get compacted where possible:
<<<
The directory entries are always compacted within a disk block: all the free space is at the end. But entries are not compressed. What is described here sounds like front-end key compression for RMS indexed files.
04-12-2010 04:56 AM
Re: Overheads of large .DIR files?
Yeah, excellent. More dense is better, but try to get some movement in those leading characters. The XQP will use them as an index/accelerator. From what you described so far
Any concerns on re-boots and pids recycling?
>> Q1 - what performance-related practices do you recommend for large numbers of files being created in a single directory, and why?
a) Try to avoid that. Try to exploit subdirectories and search lists of directories. The latter may not need any application change.
b) If you cannot use the search-list/subdirectory approach, then try NOT to add in ever-increasing order. Randomness, where periodic deletes do NOT empty out an entire directory block, is ideal. That way new files will typically re-use directory space from past files, and the directory will stay more or less constant in size, avoiding the shuffles needed to squeeze out empty directory blocks and open up fresh ones.
Hmmm... I never implemented anything like this, but with a somewhat predictable re-use pattern, you could seed the directory with constant entries to stop blocks from emptying out.
Write a tiny program to read the directory as sequential file. Every time the RFA_VBN changes, take the first N characters from the name, and call sys$enter to do something similar to: SET FILE/ENT=nnnnn.X $_place_holder.X. Just leave those around.
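For what it's worth, a minimal DCL sketch of that seeding idea (the LOGDIR logical, the PLACE_HOLDER.X file and the seed names here are invented for illustration):
$ ! Add a few extra directory entries that all point at the one placeholder file
$ set file/enter=LOGDIR:S_20A_SEED.X LOGDIR:PLACE_HOLDER.X
$ set file/enter=LOGDIR:S_21B_SEED.X LOGDIR:PLACE_HOLDER.X
$ ! ... and later clear the seeds out again without touching the placeholder itself
$ set file/remove LOGDIR:*_SEED.X;*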
>> Q2 - exactly what changes were made in v7.3 and what, if any, performance "gotchas" still exist?
a) RMS was taught to use a directory buffer greater than 127 blocks if needed.
b) The XQP was taught to not just use a single-block buffer during directory shuffles, but to use the SYSGEN parameter ACP_MAXREAD as the buffer size. That's typically set to 32, reducing large directory operations by that factor.
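For anyone wanting to check what a given system actually uses, the active value can be inspected with SYSGEN (just a quick look-see; a permanent change would normally go through MODPARAMS.DAT and AUTOGEN):
$ run sys$system:sysgen
SYSGEN> use active
SYSGEN> show acp_maxread
SYSGEN> exit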
Jan... you confused RMS indexed file key compression with directories. Nice thought, but no, directory entries are NOT compressed.
Cheers,
Hein
04-12-2010 05:26 AM
Re: Overheads of large .DIR files?
Q1a: aesthetics?
Q2: If deletes aren't an issue, then what is an issue?
Q3: what are your critical tasks and where are those bottlenecked for resources?
Q3a: Is that with big directories?
04-12-2010 06:17 AM
Re: Overheads of large .DIR files?
I think seeding a certain directory for a customer of mine might be helpful.
So I wrote the helper program I suggested earlier. It reports the first record in each directory block, the number of records in that block, and a final line indicating how many leading characters would be needed to keep the block-leading names unique. Source below.
It already provided useful insights for me (using my twisted definition of useful :-).
I could envision someone using something like this, with a few snapshots, to get insight into how directory entries move around. A quick DCL hack could use the output as a driver file to actually create seed directory name entries (see the sketch after the source listing below).
Enjoy,
Hein.
$ type CHECK_DIRECTORY.C
/* CHECK_DIRECTORY.C -- read a directory file as a sequential file and report
** the first name in each block, the record count per block, and the minimum
** number of leading characters needed to keep the block-leading names unique. */
#include <rms.h>
#include <rmsdef.h>
#include <stdio.h>
#include <string.h>

int sys$open(), sys$connect(), sys$get();

main(int argc, char *argv[]) {
    struct FAB fab;
    struct RAB rab;
    struct { short verlimit; unsigned char flags, namecount; char name[508]; } directory_record;
    int s, i, records=0, old_records=0, blocks=0, minimum=0, old_vbn=0;
    char old_name[256];

    if (argc != 2) return 16;

    fab = cc$rms_fab;
    fab.fab$l_fna = argv[1];
    fab.fab$b_fns = strlen(argv[1]);
    fab.fab$b_fac = FAB$M_GET;
    fab.fab$b_shr = FAB$M_SHRUPD | FAB$M_SHRGET;

    rab = cc$rms_rab;
    rab.rab$l_fab = &fab;
    rab.rab$l_ubf = (char *) &directory_record;
    rab.rab$w_usz = sizeof (directory_record);

    s = sys$open (&fab);
    if (s&1) s = sys$connect (&rab);

    while (s&1) {
        directory_record.namecount = 0;                   // EOF
        s = sys$get(&rab);
        if (!(s&1) && s != RMS$_EOF) break;
        records++;
        if (old_vbn == rab.rab$l_rfa0) continue;          // Same block test.
        if (blocks++) printf ("%06d %02d %s\n",           // First block is special
                              old_vbn, records - old_records, old_name);
        for (i=0; i < directory_record.namecount; i++)    // Uniqueness test.
            if (old_name[i] != directory_record.name[i]) break;
        if (i > minimum) minimum = i;
        directory_record.name[directory_record.namecount] = 0;
        strcpy (old_name, directory_record.name);
        old_vbn = rab.rab$l_rfa0;
        old_records = records;
    }
    printf ("\n%d records in %d blocks, minimum length %d.\n", records, blocks, minimum);
    if (s == RMS$_EOF) s = 1;
    return s;
}
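And a rough sketch of the "quick DCL hack" mentioned above, turning the report into seed entries. The names CHECKDIR.LIS, LOGDIR, PLACE_HOLDER.X and the _SEED.X convention are all assumptions, and the report's trailing summary line would need to be trimmed off first:
$ set noon                              ! keep going if a seed name already exists
$ nchars = 4                            ! leading characters to keep for each seed name
$ open/read rpt CHECKDIR.LIS
$loop:
$ read/end_of_file=done rpt line
$ name = f$element(2, " ", line)        ! third field = first file name in the block
$ if name .eqs. "" .or. name .eqs. " " then goto loop
$ seed = f$extract(0, nchars, name)
$ set file/enter=LOGDIR:'seed'_SEED.X LOGDIR:PLACE_HOLDER.X
$ goto loop
$done:
$ close rpt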
04-12-2010 06:22 AM
Re: Overheads of large .DIR files?
Pre-seeding the directory sounds interesting. Instead of pre-seeding, you can also re-use a directory file after deleting almost all of the old log files.
Using Hein's idea, looking at the first entry of each disk block in the directory tells you which filenames you want to keep. "Pre-seeding" now means deleting all the other files (and probably setting the EOF to zero for the remaining ones). This shouldn't be slower than deleting ALL the files: no disk block shuffling any more.
Now, if the pattern of the created filenames is similar enough, the available directory blocks will have enough room for most of the new files, so there will not be much copying within the directory file.
There will be a lot of files left in an "empty" directory, though, which may be VERY confusing.
All this could probably be coded in DCL; DUMP/DIRECTORY can be used to find the filenames to keep.
However, given the confusion, and with all the command procedures to write or change, it may not be worth the effort.
04-12-2010 08:35 AM
Re: Overheads of large .DIR files?
Besides the comments on directory structure and using a logical: how large are your log files compared to the cluster size? You'll need to weigh disk capacity against space usage, since a larger disk cluster size (or allocation size) uses more space but can reduce I/O operations overall.
Another option is to spread your logical over two disks, alternating by day: schedule the delete pass on disk 0 while disk 1 is active for new logs, and have a batch job modify the logical each night.
04-12-2010 02:34 PM
Re: Overheads of large .DIR files?
- separate directories each day?
Yes, that's the plan: accessed via a logical name that a timer AST flips (scheduled at 23:59.99, it waits 1 second and then redefines the logical according to the current day).
- search list across the directories?
Yes, definitely
- further subdirectories?
It would be a tough job to convince people of the necessity at this stage. We might see how things go with just the separate directories and keep this in reserve. Splitting them by node (i.e. different logicals on each node) should mean that if the PID was the first varying component of the file names then new files would be added at the end.
FYI, about a month ago the situation was that we put 4 different types of log files in one directory and had over 250,000 entries. A new slave was started because all the other slaves were busy, but it was often the case that before the new one was ready to do any work one of those others would become free and take the task. Some of the overheads at slave startup are of our own doing, but I figured that .DIR management was probably also an issue.
Various changes have drastically reduced the number of slave processes, and the reduction in log files has been significant, but to my thinking the size of the directory files is still an issue (creates, deletes, lookups of INDEXF.SYS information). And that's why I've asked these questions.
Redesigning the whole architecture is not an option so my solutions have to be fairly tight and preferably involve minimal changes.
04-12-2010 02:51 PM
Re: Overheads of large .DIR files?
This XFC caching of the first few characters is interesting.
I seem to have two options:
(a) the idea from others that adding to the end of a directory is most efficient, but when the current space (presumably allocated rather than used) is full, a larger number of contiguous blocks must be found and the original .DIR copied into them.
(b) Your idea of preallocating a large directory then populating it with dummy entries that will help XFC performance if I also vary the first few characters of the file names.
Your approach looks interesting but I'm not sure that I could get it implemented here. I also have some questions re your approach:
- Are the dummy entries files that really exist and wouldn't you need special conditions to avoid deleting them?
- What happens to your system if the load suddenly surges and the number of log files jumps? Won't this potentially mean directory expansion and possible splits of the structure that you carefully put together?
04-12-2010 03:02 PM
Re: Overheads of large .DIR files?
I suspect that disk size and cluster size are probably not an issue. We have big disks (220 million blocks) with plenty of space on most, and cluster sizes of either 32 or 64 blocks.
Logicals spread over disks? My current plan has a logical pointing to the current directory; it does so by pointing to a logical name for each day, and it is this second level that points to a specific disk and directory (i.e. CUR_HTTP_LOGDIR -> HTTP_LOGDIR_TUESDAY -> disk & directory). The aim was to make it flexible and allow the system managers to use whatever disks and directories they want.
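For what it's worth, a minimal DCL sketch of that two-level scheme (the disk, directory and search-list names are invented for illustration; the actual definitions would presumably come from the nightly job or the timer AST already described):
$ ! Pre-create the day's directory with room to grow, as suggested earlier
$ create/directory/allocation=5000 DKA100:[HTTPLOGS.20100413]
$ ! Second level: one logical per weekday, pointing at whatever disk the managers chose
$ define/system HTTP_LOGDIR_TUESDAY DKA100:[HTTPLOGS.20100413]
$ ! First level: the application only ever translates CUR_HTTP_LOGDIR
$ define/system CUR_HTTP_LOGDIR HTTP_LOGDIR_TUESDAY
$ ! Optional search list across the week for ad-hoc lookups
$ define/system HTTP_LOGDIR_WEEK HTTP_LOGDIR_TUESDAY, HTTP_LOGDIR_MONDAY, HTTP_LOGDIR_SUNDAY, -
        HTTP_LOGDIR_SATURDAY, HTTP_LOGDIR_FRIDAY, HTTP_LOGDIR_THURSDAY, HTTP_LOGDIR_WEDNESDAY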
04-12-2010 08:14 PM
Solution
a1) It is easy enough to pre-allocate.
a2) An occasional re-allocation (once per extend = at minimum once per cluster) is much cheaper than a full shuffle every time some middle block fills up. But there is a price at cleanup time.
>> (b) Your idea of preallocating a large directory then populating it with dummy entries
>> - Are the dummy entries files that really exist and wouldn't you need special conditions to avoid deleting them?
b1) I would certainly give them a special extension (.X? .KEEP?) and/or version number.
b2) If they are deleted, well, then there is no functional harm done. You could just rerun the seed tool before further deletes.
b3) You could make them just dummy file-ID entries. SYS$ENTER will accept any number you like: 1,1,0 or 123,123,0 or whatever. See the sample tool below. But it is probably better to use $ SET FILE/ENTER=
That file PLACE_HOLDER.X could be protected against deletion and have contents that explain its purpose.
b4) I would choose seed names as short as possible, just enough to force the right distribution, so as not to eat too much space in the 512-byte directory block. Rounding up to an even size is free (the name space is always rounded up to a word).
>> What happens to your system if the load suddenly surges and the number of log files jumps? Won't this potentially mean directory expansion and possible splits of the structure that you carefully put together?
Yes, but nothing will be worse than today. Just not as optimal as it perhaps could be.
Delete ($set file/remove) all the *.X; file name entries and re-seed ( re-seat ? :-).
fwiw,
Hein
$ type enter.c
/*
** ENTER.C  Create a directory entry for a file ID.
**
** Have fun, Hein van den Heuvel, HP 6/4/2002
*/
#include <ssdef.h>
#include <rms.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

main(int argc, char *argv[])
{
    int i, status, sys$parse(), sys$enter();
    char *p, expanded_name[256], resultand_name[256];
    struct FAB fab;
    struct NAM nam;

    if (argc < 5) {
        printf ("Usage: $ %s fid-num fid-seq fid-rvn filename\n", argv[0]);
        return 268435456;
    } else {
        fab = cc$rms_fab;
        fab.fab$l_fop = FAB$M_NAM;
        fab.fab$l_dna = ".DAT";
        fab.fab$b_dns = strlen(fab.fab$l_dna);
        fab.fab$l_fna = argv[4];
        fab.fab$b_fns = strlen (argv[4]);
        fab.fab$l_nam = &nam;

        nam = cc$rms_nam;
        nam.nam$b_nop = NAM$M_NOCONCEAL;
        nam.nam$l_rsa = resultand_name;
        nam.nam$b_rss = 255;
        nam.nam$l_esa = expanded_name;
        nam.nam$b_ess = 255;

        status = sys$parse( &fab );
        if (status & 1) {
            i = atoi (argv[1]);                        /* File number (low word + nmx extension) */
            nam.nam$w_fid_num = (short) i;
            nam.nam$b_fid_nmx = (unsigned char) (i >> 16);
            nam.nam$w_fid_seq = (short) atoi ( argv[2] );
            nam.nam$w_fid_rvn = (short) atoi ( argv[3] );
            status = sys$enter ( &fab );               /* Add the directory entry for that FID */
        }
        return status;
    }
}
04-13-2010 01:19 AM
Re: Overheads of large .DIR files?
>>>
(a) the idea from others that adding to the end of a directory is most efficient, but when the current space (presumably allocated rather than used) is full, a larger number of contiguous blocks must be found and the original .DIR copied into them.
<<<
I wouldn't say "most"; you should be aware of ...
Sorry, ITRC's "Retain format (spacing)" option is not doing what I think it should, so you will have to look at the attached text file.