10-28-2005 08:04 AM
Some major changes in usage have occurred since, and now some of the applications are creating many small files (~15,000/day - email messages, temporary command procedures to create emails, logfiles resulting from temporary command procedures creating emails - none more than 1 - 10 blocks each). These are chewing up way too much space!
I need to address this space consumption and I'm looking for recommendations on performance vs clustersize vs chunksize. I like raidsets and want to stay with them ... I can't afford enough disks for mirrored stripesets ... ~100GB is a reasonable size (I think) for our volumes and typically more reading than writing so Raid 3/5 is ok.
That is not to say the whole disk is small files; there are a significant number of larger files (1M - 3M blocks) as well.
I guess I'm looking for a good balance.
10-28-2005 03:41 PM Solution
You want redundancy, and you want IO balancing.
With many small files you indeed want a smaller cluster size to avoid excessive waste.
Which VMS version? Older VMS versions only allowed for 1 million clusters/disk, forcing a large (205 in your case) cluster size. Recent VMS versions allow for more bits and thus smaller clusters. If you are stuck with the 1M clusters, then you can solve the problem through PARTITIONS declared on the HSG. Just carve a smaller slice off the large raidset to hold the many small files with a small cluster size.
I would pick a smaller clustersize like 16 or 32 blocks if you can make that work for the disk(s)/partitions with the small files.
For the disks with the large files, I would merrily choose a large clustersize like 256, or 512 or so. Remember, clusters are only used during allocation. They have no influence on the IO size.
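To put a rough number on the waste, here is a minimal sketch of the allocation arithmetic. It assumes every file is allocated in whole clusters and ignores file headers and directory overhead; the 5-block file size is an illustrative assumption (the question only says 1 - 10 blocks), and the function names are made up for this example.

```python
import math

def allocated_blocks(file_blocks, cluster_size):
    """Blocks reserved on disk: allocation is rounded up to whole clusters."""
    return math.ceil(file_blocks / cluster_size) * cluster_size

def daily_waste(files_per_day, file_blocks, cluster_size):
    """Blocks allocated but not used by one day's worth of equally sized files."""
    used = files_per_day * file_blocks
    allocated = files_per_day * allocated_blocks(file_blocks, cluster_size)
    return allocated - used

FILES_PER_DAY = 15_000  # figure from the original question
FILE_BLOCKS = 5         # assumption: a mid-range small file

for cluster in (205, 32, 16):
    waste = daily_waste(FILES_PER_DAY, FILE_BLOCKS, cluster)
    print(f"cluster {cluster:>3}: {waste:>9,} blocks wasted per day "
          f"({waste * 512 / 1e6:.1f} MB)")
```

Under these assumptions a 205-block cluster wastes about 3 million blocks per day, while a 16-block cluster wastes a small fraction of that.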
As for the chunk size? Don't worry too much.
For the small files you want large chunks to avoid breaking a single IO into two physical ones. But with those small files of less than 10 blocks, and a chunk size above 200 blocks, the odds of uncalled-for splits are minimal already.
Two approaches here
1) make the chunk size a whole multiple of the cluster size. This minimizes the odds of a single IO needing two chunks.
2) make the chunksize indivisible, a prime, in order to spread the penalty evenly across all files and avoid some files being penalized all the time.
Personally my favourite is #1. No file gets penalized :-).
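The two approaches can be compared with a small sketch. It assumes each small file starts on a cluster boundary, is read in one IO from its first block, and that the unit itself starts on a chunk boundary (no partition offset); the function names and the 10-block IO size are illustrative only.

```python
def crosses_chunk(start_block, io_blocks, chunk_size):
    """True if an IO of io_blocks starting at start_block spans two chunks."""
    return (start_block % chunk_size) + io_blocks > chunk_size

def split_fraction(cluster_size, chunk_size, io_blocks, n_clusters):
    """Fraction of cluster-aligned starting positions whose IO gets split."""
    starts = (k * cluster_size for k in range(n_clusters))
    splits = sum(crosses_chunk(s, io_blocks, chunk_size) for s in starts)
    return splits / n_clusters

IO_BLOCKS = 10  # assumption: largest of the small files, read in a single IO

# Approach 1: chunk size is a whole multiple of the cluster size.
print("chunk 256, cluster 16:", split_fraction(16, 256, IO_BLOCKS, 4096))

# Approach 2: prime chunk size; a few starting positions split, spread evenly.
print("chunk 251, cluster 16:", split_fraction(16, 251, IO_BLOCKS, 251))
```

With approach 1 no cluster-aligned IO smaller than a cluster ever crosses a chunk boundary; with the prime chunk size a small share of positions (9 of every 251 here) do.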
For the large files, some folks hope that a single large IO from the application can get served by multiple disks at the same time. Well, typical OpenVMS applications rarely do large IOs (> 128 blocks); 16 block and 32 block IOs are the norm. There is no point splitting those, as the transfer time is minimal compared to seek & rotational delay. If your application happens to do large IOs, and is largely single stream (sorts!), then it is worth your while to experiment with small chunk sizes, notably for (near) direct attach like SWXCR or even MSA, but less so for your HSG.
Just leave that chunk size at 256 blocks.
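The point about transfer time being small next to seek and rotational delay can be checked with back-of-the-envelope arithmetic. The positioning time and transfer rate below are illustrative assumptions, not measurements from any particular drive or controller.

```python
BLOCK_BYTES = 512  # size of one disk block

def transfer_ms(io_blocks, mb_per_s):
    """Milliseconds needed to move io_blocks at a sustained rate of mb_per_s."""
    return io_blocks * BLOCK_BYTES / (mb_per_s * 1_000_000) * 1000

POSITIONING_MS = 8.0  # assumption: average seek plus rotational delay
RATE_MB_PER_S = 40.0  # assumption: sustained transfer rate of one disk

for blocks in (16, 32, 128, 1024):
    t = transfer_ms(blocks, RATE_MB_PER_S)
    print(f"{blocks:>5} blocks: transfer {t:6.2f} ms vs positioning {POSITIONING_MS:.1f} ms")
```

With these numbers a 32-block IO spends well under a millisecond transferring data, so spreading it over several disks gains little; only IOs of many hundreds of blocks spend more time transferring than positioning.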
Hope this helps some,
10-28-2005 05:18 PM
Re: HSG80 Raidset - 4 x 36GB - recommendations
Use it to break up a big 'disc' into smaller, manageable pieces where you place the small / temporary files onto some LDs with appropriate cluster factor.
I've found it to be a useful technique. Helps to control disc fragmentation too.
You can even go as far as having an LD device entirely for temporary files, and just blow it away (INIT) when you need to delete all the temporary files, assuming that you can have a brief pause to dismount the LD device and re-mount it.
Works quite nicely with an HSG80 based array, especially a big one full of striped and mirrored discs. You get the performance of a heavily striped and mirrored unit, plus the ability to control the file placement based on the category of file usage. Also handy for backups and restores, to break the whole thing into manageable pieces.