p.balamurugan
Advisor

reg:cluster size

On an 18 GB disk my data showed 13.62GB used of 13.67GB allocated, and the same data was backed up to a 300 GB disk. When I checked the 300 GB disk, it showed 13.62GB used of 15.03GB allocated. From my observation, the extra allocated space is due to the cluster size, which differs between the 18 GB and 300 GB disks because ODS-2 derives the default from the disk size. If we reduce the cluster size on the 300 GB disk from the default to some other value, will there be any performance impact or other disadvantage?
Karl Rohwedder
Honored Contributor

Re: reg:cluster size

Do those 2 GB really hurt you on a 300GB disk?
Depending on the VMS version there are limitations regarding BITMAP.SYS and cluster sizes on big disks; newer versions allow any cluster size.

It makes sense to adapt the cluster size to the usage pattern of the disk: do you store many small files, or, on the other hand, a few very big database files? Newer storage systems show better performance with certain cluster sizes (16 or 32).

regards Kalle
Hein van den Heuvel
Honored Contributor

Re: reg:cluster size

The default cluster size aims for roughly 1 million (1024*1024) clusters per drive. The increased usage is due to each file being rounded up to whole clusters.
The maximum cost is relatively predictable: number of files times cluster size.
The actual cost is hard to predict, as it depends on the average file size.
There is no direct performance advantage to accepting the larger (odd) cluster size, other than a reduction in potential fragmentation, but that is indirect.
There is a potential, small, advantage in picking your own cluster size (smaller or larger), as you can take usage patterns into account. Lots of little files? Most files around 50 blocks (just picking a random number)? And so on. And you can pick a 'nice' multiple of 16, which might just help the storage controller a little.
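Hein's roundup arithmetic can be sketched in Python. The file count and cluster size used below are made-up illustrative numbers, not figures from this thread:

```python
import math

def allocated_blocks(used_blocks: int, cluster_size: int) -> int:
    """Blocks a file consumes on disk: its size rounded up to whole clusters."""
    return math.ceil(used_blocks / cluster_size) * cluster_size

def max_overhead(n_files: int, cluster_size: int) -> int:
    """Worst-case wasted space: strictly (cluster_size - 1) blocks per file.
    The thread's 'number of files times cluster size' is the looser bound."""
    return n_files * (cluster_size - 1)

# A 17-block file on an 18-block-cluster volume still takes a full cluster:
print(allocated_blocks(17, 18))          # 18
# Hypothetical volume: 100,000 files, 288-block clusters:
print(max_overhead(100_000, 288))        # 28700000 blocks, an upper bound
```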
Hth,
Hein.


Jon Pinkley
Honored Contributor

Re: reg:cluster size

p.balamurugan,

As Karl Rohwedder pointed out, different versions of VMS have different ODS restrictions.

Can you please provide the following info so we can help explain the difference in the ratio of used/allocated space?

1. Version of VMS (on a recent version, "$ show system/noprocess" should provide version and architecture).

2. Number of files making up the 13.62 GB used. Did you get these values from output of "$ Directory/size=all/grand dev:[*...]" or some other method?

3. Cluster size on the 18GB drive and the 300 GB drive.

4. Command used to backup 18GB drive to 300 GB drive.

RE: Karl's question about whether 2 GB is significant. In reality, the overhead will be closer to 10% of whatever you store on the 300 GB drive: your used/allocated ratio on the 18GB drive was .996, while on the 300GB drive it is .906. If all new file creations follow the same pattern, we would expect the same ratios to continue.

However, if you used backup/image and did not use /truncate, the output files are created based on the allocated size of the original files. If the new cluster size is not an integral multiple of the original cluster size, the files on the target volume may have more space allocated than necessary. If /truncate is used, the output allocation is based on the used blocks instead of the allocated blocks.
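The used/allocated ratios can be reproduced directly from the figures in the original post:

```python
# Figures from the original post (GB):
used = 13.62
alloc_18gb_disk = 13.67
alloc_300gb_disk = 15.03

print(round(used / alloc_18gb_disk, 3))   # 0.996
print(round(used / alloc_300gb_disk, 3))  # 0.906
```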

Example: original cluster size 8, new cluster size 18, original file 17/24

Results with backup/notruncate (the default): new file 17/36 (24 blocks require two 18-block clusters)
Results with backup/truncate: new file 17/18 (17 blocks require one 18-block cluster)
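The example above can be expressed as a small Python sketch of how BACKUP sizes the output file under each qualifier (the function name is mine, not a VMS API):

```python
import math

def new_allocation(used: int, allocated: int, cluster_size: int, truncate: bool) -> int:
    """Allocation of the restored file on a volume with the given cluster size.
    /NOTRUNCATE sizes the new file from the old allocation; /TRUNCATE from used blocks."""
    basis = used if truncate else allocated
    return math.ceil(basis / cluster_size) * cluster_size

# Original file 17/24 (17 blocks used, 24 allocated), old cluster 8, new cluster 18:
print(new_allocation(17, 24, 18, truncate=False))  # 36
print(new_allocation(17, 24, 18, truncate=True))   # 18
```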

Note that if you do use /truncate, any files with pre-allocated storage will be truncated. I do not know how to tell backup to truncate only when the used space currently extends into the highest allocated cluster. If the new cluster size is an integral multiple of the old cluster size, /truncate is not needed. I now always use a power of 2 for the cluster size, so this condition is always met.

RE: your question about performance. I am not convinced yet that cluster size affects performance as much as is currently in vogue to suggest. For special cases, there are performance benefits for "aligned" transfers. What I am not convinced of is that the cluster factor affects the data access request patterns in a way that will cause a large number of the requests to be "aligned". VMS does not do I/O a disk cluster at a time. I plan to start a new thread on this specific issue.

Cluster size does affect how fragmented a file can become, so that is one reason to consider it. However, there are other means to "avoid fragmentation", for example by setting a larger allocation extension quantity.

If you have a lot of small files, there are advantages to using a small cluster size; and if you expect a disk that is 10 times as large to hold 10 times as much data with the same space efficiency, you will need to keep the same cluster size.

Note: In some cases, a larger cluster size may actually improve your used/allocated ratio, but that would be a special case. For example, if 90% of your files had 9 blocks used, a cluster size of 9 would be better from a space-efficiency standpoint than an 8-block cluster.
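That special case is easy to check with the same roundup arithmetic:

```python
import math

def allocated(used: int, cluster: int) -> int:
    """File size rounded up to whole clusters."""
    return math.ceil(used / cluster) * cluster

# A 9-block file under each candidate cluster size:
print(allocated(9, 8))  # 16 -> 7 blocks wasted per file
print(allocated(9, 9))  # 9  -> 0 blocks wasted per file
```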

One last thing to consider when using larger disks: if your 18 GB disk was busy, expect your 300 GB drive to be even busier. You have increased the capacity to more than 16 times that of the 18 GB drive. If you fill it with data that is accessed in the same manner, expect performance to decrease. This assumes your 300 GB drive is a single spindle, etc.

Good Luck,

Jon
it depends