Operating System - OpenVMS

Dealing with a badly fragmented disk

 
SOLVED
John A. Beard
Regular Advisor

Dealing with a badly fragmented disk

Hi,

We are experiencing performance problems associated with a particular SAN disk (DSA11) that appears to be extremely badly fragmented. This disk is part of an OpenVMS environment. The biggest concern is the amount of time it takes to perform an incremental backup of this disk.

The device was created with 314572800 blocks and a /CLUSTER value of 100. There are a total of 486 directories, 15881 files and 237167981 blocks currently in use. 220 files (225126315 blocks) are recreated on a daily basis via BACKUP commands from other disks to DSA11.
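For scale, the figures above can be turned into a few ratios. The short Python sketch below is only arithmetic on the numbers quoted in this post; it assumes the standard 512-byte OpenVMS disk block, and the variable names are illustrative.

```python
# Figures quoted in the post (assumption: 512-byte disk blocks).
TOTAL_BLOCKS = 314_572_800   # size the device was created with
USED_BLOCKS = 237_167_981    # blocks currently in use
DAILY_FILES = 220            # files recreated every day
DAILY_BLOCKS = 225_126_315   # blocks occupied by those files
BLOCK_BYTES = 512

free_blocks = TOTAL_BLOCKS - USED_BLOCKS
pct_used = 100 * USED_BLOCKS / TOTAL_BLOCKS
pct_daily = 100 * DAILY_BLOCKS / USED_BLOCKS
avg_daily_blocks = DAILY_BLOCKS / DAILY_FILES
avg_daily_mib = avg_daily_blocks * BLOCK_BYTES / 2**20

print(f"free blocks:            {free_blocks}")
print(f"volume in use:          {pct_used:.1f}%")
print(f"used space rewritten:   {pct_daily:.1f}% per day")
print(f"average recreated file: {avg_daily_blocks:.0f} blocks (~{avg_daily_mib:.0f} MiB)")
```

On these numbers the volume is roughly three quarters full, and about 95% of the space in use is deleted and rewritten every day, so nearly all allocation activity comes from the daily BACKUP jobs.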

We do not have a defragmentation tool running on this server, so everything is left to RMS to sort out. We were getting reports of significant window turns and split I/Os, so we set ACP_WINDOW to 255. This has made no difference to the window turns or split I/Os on our daily reports.

I have a couple of basic questions:

[1] Based on the information below (a cross section using DFU, fragments > 1000), how unhealthy, in your opinion, do these figures show the disk to be?

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1C]CRD_INDEX_PROD.DBS_BACKUP_12;1
7168000/7168000 26/1681

DSA11:[ORACLE_DATABASE.EXPORTS]PROD1_SYSTEM.EXP;119
19227768/19227800 1125/76557

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1]TDSA_EXPORT.DMP;225
4134136/4134200 509/34540

DSA11:[ORACLE_DATABASE.EXPORTS]AM0589P_SYSTEM.EXP;225
4132544/4132600 504/34198

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1]TDSA_ITEM1_DATA1.DBS_BACKUP_35;1
8282112/8282200 63/4233

[2] As regards what happens when a new file is created on this disk, e.g. a 4 GB database copy, how exactly is the physical placement of the file determined? Without something like COPY /CONTIG, does RMS simply place the new file all over the shop, even if it means creating 1000s of fragments?

[3] We are contemplating purchasing DFO, but in the interim, does DFU offer the same base functionality for defragging as its big brother? I have tried a single pass using DFU, but it doesn't really seem to have helped. We noticed that a number of files could not be moved because of lack of space, but do you think DFO might at least get us over the biggest part of the problem?

Glacann fear críonna comhairle.
26 REPLIES
Robert Gezelter
Honored Contributor

Re: Dealing with a badly fragmented disk

John,

While these files may be the most fragmented files on the disk, they may not be the ones causing all of the window turns and split I/Os (note that all of these files are exports and dumps).

Dumps and exports frequently grow as they are created, so it is not uncommon for them to expand in many steps. A first question that I have is: What are your RMS parameters for the jobs that create these files (SHOW RMS), and what programs are being used to create these files?

Additionally, I would suggest tracking down exactly what is happening BEFORE setting up DFO for what may turn out to be a different problem altogether (e.g., if files which are almost never read are fragmented, it is not that much of an issue).

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor
Solution

Re: Dealing with a badly fragmented disk

0: There are storage controllers which have performance issues around unaligned transfers. This is why a senior storage engineer within OpenVMS engineering has recommended a multiple of 16 for the disk cluster factor.

1: dump file and listing file and archive fragmentation is not centrally relevant; fragmentation matters with performance critical files, and rather less so with infrequently-accessed and non-critical files.

2: that depends greatly on what creates it, and how it is extended. Databases can play all manner of games with their files and file placement.

A new file created through typical RMS means -- and databases can use completely different interfaces -- would follow the application and process and disk and system defaults, and would create and extend the file accordingly. If the creation or particularly the extension is in tiny hunks (and a disk this big should be set for big extents), then the file will be fragmented. Incremental extension tends to be worst case, as other activity can grab fragments.

3: I don't know that DFU and DFO use different schemes; I'd expect both use the move file primitive within OpenVMS.

4: As Bob G writes in his reply, you definitely do need to identify the source of the performance troubles.

5: I'd here tend to look at the volume extent size, and at the settings of the processes that create the files. And at the volume contention. And I'd fix the cluster factor at your next opportunity.

6: Disk fragmentation can be your friend, though too much of a good thing can certainly lead to performance problems.

7: I'd not normally expect to see accesses within archival files, save for sequential writes and extensions. And extensions. If you know the size of the file to be created, pre-size and pre-allocate it. If you don't, then pick a reasonable guess at the size and pick an extent of 500 to 1000 or such; find the knee in the performance curve. Bigger extents may or may not provide a payback.

8: Consider splitting up the disks and disk structures differently, if this volume is being targeted by multiple nodes, and targeted for heavy I/O in parallel. Consider dedicating a volume per node, for instance. This to avoid lock contention.

DFU or DFO might well clean up the existing on-disk allocation, but if the creation and extension settings are stuffed, then the fragmentation will return or will continue.

(And why the snark is ITRC complaining about embedded tags and XSS in this posting when there are no embedded tags in this posting?)

Stephen Hoffman
HoffmanLabs LLC
Steven Schweda
Honored Contributor

Re: Dealing with a badly fragmented disk

> (And why the snark is ITRC complaining
> about embedded tags and XSS in this posting
> when there are no embedded tags in this
> posting?)

When this happened to me most recently, a
re-Submit worked. The ITRC forum software
appears to be approximately garbage.
Sometimes "connection refused" is more of a
blessing than a curse.
John A. Beard
Regular Advisor

Re: Dealing with a badly fragmented disk

I am not up on what you are referencing by snark, etc., but I don't believe it was anything intentional on my part. No sooner had I posted this request than I was unable to get back in to view your replies... please don't shoot the messenger, as the responses are sometimes confusing enough.

Before I delve into your suggestions, I was just curious about the CLUSTER size issue. If I follow the VMS Help suggestion for ODS-2 disks, then the result is 301 for the size of the volume, and I can't see how that relates to a multiple of 16...
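One way to reconcile the two recommendations is to take whichever cluster value is under consideration and look at the multiples of 16 on either side of it. A minimal Python sketch, purely arithmetic; the function name is illustrative and nothing here is an OpenVMS interface:

```python
def neighbouring_multiples(value, base=16):
    """Return the nearest multiples of `base` at or below and at or above `value`."""
    lower = (value // base) * base
    upper = lower if lower == value else lower + base
    return lower, upper

# 100 is the current cluster factor; 301 is the value derived from VMS Help.
for cluster in (100, 301):
    below, above = neighbouring_multiples(cluster)
    print(f"cluster {cluster}: nearest multiples of 16 are {below} and {above}")
```

For the current factor of 100 this gives 96 and 112; for 301 it gives 288 and 304. Whether to round down or up is a separate judgment that depends on the volume size limits and on how much space is lost to rounding per file.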
Glacann fear críonna comhairle.
John A. Beard
Regular Advisor

Re: Dealing with a badly fragmented disk

Hi Bob,

This whole thread may have to be put on hold for a week. The application expert is away until then, and I will need to get him to answer some of your questions. I need to find out from him exactly what is present on this disk, what is being accessed on a continuous basis, and what program(s) are being used to create all these files originating from other disks.

I have attached the output for RMS_DEFAULT, which I believe shows the default values.

I may well be repeating myself here, and I fully accept that there may well be other problems lurking in the background, but the focus of our efforts has up to now been trying to figure out why the total elapsed time for backing up this particular disk is now taking much, much longer than before. We are backing up all this data to a TSM server, using ABC on the client side. We have now got to the point where it is taking more than 24 hours to complete a successful backup. I don't believe this is due to additional amounts of data being backed up, as the volume appears fairly consistent from one week to the next. Once again, everything seems to point to this particular disk as being the bottleneck. Things get even worse if the following day's backup kicks in before the previous one has completed. I have already written something to put things on hold if that situation should occur.
Glacann fear críonna comhairle.
Robert Gezelter
Honored Contributor

Re: Dealing with a badly fragmented disk

John,

Having debugged many of these problems over the years, I find that certain generalities repeat on a regular basis. Often, when this type of problem is looked at in its totality, all (or many) of the symptoms are interconnected.

The RMS parameters appear to be the defaults. In particular, there is a distinct possibility that the files are being extended in small extensions on an ongoing basis. This can cause extremely long delays in elapsed time when processing these tasks.
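To give a feel for why small extensions hurt, here is a rough Python sketch that counts how many extend operations it takes to grow one of the files from the original post. It is a simplification built on assumptions: each extend request is assumed to be rounded up to a whole number of clusters, the file is assumed to start empty, and the extend quantities of 3, 1000 and 65535 blocks are illustrative values only.

```python
def extend_requests(file_blocks, extend_blocks, cluster_blocks):
    """Rough count of extend operations needed to grow a file from empty.

    Assumption: every extend is rounded up to a whole number of clusters.
    """
    clusters_per_extend = -(-extend_blocks // cluster_blocks)  # integer ceiling
    granted = clusters_per_extend * cluster_blocks
    return -(-file_blocks // granted)                          # integer ceiling

FILE_BLOCKS = 19_227_768   # PROD1_SYSTEM.EXP, blocks used
CLUSTER = 100              # cluster factor of DSA11

for extend in (3, 1000, 65_535):
    count = extend_requests(FILE_BLOCKS, extend, CLUSTER)
    print(f"extend quantity {extend:>6}: about {count} extend operations")
```

Every extend operation is a chance for another process writing to the same volume to grab the adjacent free space, so fewer and larger extends generally mean fewer fragments.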

- Bob Gezelter, http://www.rlgsc.com

Jur van der Burg
Respected Contributor

Re: Dealing with a badly fragmented disk

>DSA11:[ORACLE_DATABASE.EXPORTS]PROD1_SYSTEM.EXP;119 19227768/19227800 1125/76557

So this means 1125 file headers and 76557 fragments which is indeed a VERY bad fragmentation. In general this means that the default extension of the file is way too low. I've seen this when the default of 3 blocks is used, and the file is growing 3 blocks at a time. Combine that with multiple files that get extended and there's your problem. The proper way is to preallocate the expected storage for the files, and be generous with the extend size.
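To put the DFU figures in proportion, the average fragment size can be worked out from the listing in the original post. The Python sketch below assumes the two pairs on each DFU line are used/allocated blocks and file headers/fragments, as read above; the tuple layout and function name are illustrative.

```python
# (file, used blocks, allocated blocks, file headers, fragments)
DFU_ROWS = [
    ("CRD_INDEX_PROD.DBS_BACKUP_12;1",    7_168_000,  7_168_000,   26,  1_681),
    ("PROD1_SYSTEM.EXP;119",             19_227_768, 19_227_800, 1_125, 76_557),
    ("TDSA_EXPORT.DMP;225",               4_134_136,  4_134_200,   509, 34_540),
    ("AM0589P_SYSTEM.EXP;225",            4_132_544,  4_132_600,   504, 34_198),
    ("TDSA_ITEM1_DATA1.DBS_BACKUP_35;1",  8_282_112,  8_282_200,    63,  4_233),
]

def avg_fragment_blocks(allocated, fragments):
    """Average number of blocks in one fragment (extent) of a file."""
    return allocated / fragments

report = {
    name: avg_fragment_blocks(allocated, fragments)
    for name, _used, allocated, _headers, fragments in DFU_ROWS
}

for name, avg in report.items():
    print(f"{name:34s} {avg:8.1f} blocks per fragment")
```

On these figures TDSA_EXPORT.DMP and AM0589P_SYSTEM.EXP average roughly 120 blocks per fragment, which is little more than a single 100-block cluster each.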

I've seen this in the past when a customer was complaining about performance, and they were working with defraggers to get around it. The fix was much simpler: just create the file the proper way and give it a big extend size. Add some global buffers as well (this was just plain RMS) and the number of disk I/Os in that case went down from 70 to 4 per second. And that was about 20 years ago, when the hardware was way slower than today.

Jur.
Willem Grooters
Honored Contributor

Re: Dealing with a badly fragmented disk

[1] I wonder what the extent size of the files is. If it is relatively small, you may well run into large numbers of extents. If you know the size of the files, and have some idea of the growth rate, it would be an idea to create the files with enough space in one go. Increase the extent size anyway.
As stated by Hoff, defining an appropriate cluster size is not just "pick any number". If you used 100 on this disk, 96 or 112 would have been better...
[2] AFAIK, BACKUP will try to re-use clusters that have been marked 'free', and if these are scattered over the disk, you may find a lot of fragments. Over time, this gets worse. A regular image backup to an empty disk will no doubt help, but that has to fit into your schedule, and fitting it in is probably a problem.

Do you use indexed-sequential files in your application, and are these updated very frequently (new records added, records deleted, records updated)? If so, take a look at the internal fragmentation of these files. These files need to be converted on a regular basis, and the sizing of the key and data areas needs to be monitored and adjusted once in a while. I've seen a significant increase in application performance after a convert alone, and even more after resizing the areas.

[3] If DFU has a problem, DFO might as well. It may well be that there is enough space altogether, but too little contiguous space. The only solution in that case is creating an image backup of this disk to a re-initialised disk with a cluster size that is a multiple of 16 (112 would be my choice in this case).
Willem Grooters
OpenVMS Developer & System Manager
Peter Barkas
Regular Advisor

Re: Dealing with a badly fragmented disk

DFO is going to do a better job than DFU at least because, as far as I am aware, DFO consolidates free space and DFU does not. This can be a critical issue on badly fragmented disks.