
Dealing with a badly fragmented disk

 
Hein van den Heuvel
Honored Contributor

Re: Dealing with a badly fragmented disk

I would suggest that more than 10 headers (or 20, as your fragments>1000 implies) is painful performance-wise IF the file is accessed at all. But I have seen OpenVMS more or less happily deal with actively used files with thousands of headers. (Andy G was impressed/surprised by his own work!)

Even if a heavily fragmented file is not accessed, all those fragments are likely to cause other actively used files to become more fragmented.
So I would manually take your worst files and roll them out to another disk /cont. Roll them back only if/when you have to.

John>> If I follow the VMS Help suggestion for ODS-2 disks, then the result is 301 for the size of the volume and I can't see how that relates to a multiple of 16....

They do not relate.

The 301 comes from the allocation bit map being the traditional 1 million bits = 256 blocks.

The multiple of 16 is driven mostly by storage characteristics, and a little by the way RMS and the XQP work.
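
As a quick check of those two numbers, here is a minimal Python sketch. It assumes the ODS-2 default described in VMS Help is the volume size divided by (255 * 4096), truncated; that formula and the function names are assumptions made for illustration and are not quoted anywhere in this thread.

```python
BLOCK_BYTES = 512  # size of one OpenVMS disk block in bytes


def bitmap_bits(bitmap_blocks):
    """Number of allocation bits that fit in a bitmap of this many blocks."""
    return bitmap_blocks * BLOCK_BYTES * 8


def help_default_cluster(volume_blocks):
    """Default cluster size under the ASSUMED Help formula (truncated)."""
    return volume_blocks // (255 * 4096)


if __name__ == "__main__":
    print(bitmap_bits(256))                 # bits in a 256-block bitmap
    print(help_default_cluster(314572800))  # John's 314572800-block volume
```

Under that assumption, 256 blocks hold 1,048,576 bits ("1 million bits"), and the 314572800-block volume gives 301, which is where John's figure comes from. Neither number has anything to do with multiples of 16.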

The cluster size should be as big as possible within the acceptable 'waste' of space constraints.
You indicate 16,000 files.
Let's round up generously to 50,000 files.
Each of those might waste up to a cluster minus 1 blocks. What is your max waste?
5% of the volume? Then your max clustersize would be 314M/(20*50000) = 314 blocks.
So I would pick 256 or 512, probably 512.

That way you can count on roughly 5x fewer fragments due to casual, careless allocation or free space fragmentation.
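
The cluster-size reasoning above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are made up, and the waste per file is rounded up to one full cluster, as in the estimate above.

```python
def max_cluster_size(volume_blocks, file_count, waste_percent):
    """Largest cluster size whose worst-case waste stays within the budget.

    Each file is assumed to waste at most about one cluster.
    """
    return (volume_blocks * waste_percent) // (100 * file_count)


def power_of_two_neighbours(n):
    """Powers of two just below (or equal to) and just above n."""
    lower = 1 << (n.bit_length() - 1)
    return lower, lower * 2


if __name__ == "__main__":
    limit = max_cluster_size(314572800, 50000, 5)
    print(limit, power_of_two_neighbours(limit))
```

With the 314572800-block volume, 50,000 files and a 5% budget, the limit comes out at 314 blocks, bracketed by 256 and 512. Choosing 512 slightly exceeds the 5% budget only in the worst case where every one of the 50,000 files wastes almost a full cluster.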

John>> I have attached the output for RMS_DEFAULT which I believe are the default values -

Correct. Free performance to be found.
Change the default extent to 2000 (or 5000)
SET RMS/SYS/SEQ/BUF=4/BLO=64
SET RMS/SYS/IND/BUF=20
SET RMS/SYS/EXTEN=2000

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Robert Gezelter
Honored Contributor

Re: Dealing with a badly fragmented disk

John,

I agree with Jur, and have demonstrated the phenomenon to many clients over the years.

The more important question here is whether this is a cumulative situation that has been gradually getting worse, or a situation that is being created on a daily basis.

Running a defragmenter is a cure for the side effect, not a cure for the underlying condition.

- Bob Gezelter, http://www.rlgsc.com
Jon Pinkley
Honored Contributor

Re: Dealing with a badly fragmented disk

RE: "The device was created with 314572800 blocks and a /CLUSTER value of 100. There are a total of 486 directories, 15881 files and 237167981 blocks currently in use. 220 files (225126315 blocks) are recreated on a daily basis via BACKUP commands from other disks to DSA11"

From your description it sounds like you are using a scratch disk for files that have long lifetimes and for a temporary holding place for a disk-to-disk-to-tape operation.

95% of the used disk space (225126315/237167981) is being recreated every day, and that's in 220 large files (average size just over 1 million blocks each). The remaining 5% used is in "small" files with an average size of about 758 blocks: (237167981-225126315)/15881.
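
For anyone who wants to redo this breakdown with their own numbers, here is a small Python sketch of the same arithmetic. The figures are the ones quoted from the original post; the function and key names are illustrative only.

```python
def usage_breakdown(used_blocks, big_blocks, big_files, total_files):
    """Split disk usage into the daily-recreated big files and the rest."""
    small_blocks = used_blocks - big_blocks
    return {
        "big_share_percent": 100.0 * big_blocks / used_blocks,
        "avg_big_file_blocks": big_blocks / big_files,
        "small_blocks": small_blocks,
        # Divides by the total file count, as in the estimate above.
        "avg_small_file_blocks": small_blocks / total_files,
    }


if __name__ == "__main__":
    for name, value in usage_breakdown(237167981, 225126315, 220, 15881).items():
        print(f"{name}: {value:,.1f}")
```

The small-file total of 12041666 blocks is the same figure used later when sizing the LD container.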

There are things that can minimize the fragmentation. Increasing the extent size as suggested by others will help, but you probably don't want it to be the same for the backup files and the small files.

One thing that leads to fragmentation of free space is having multiple things writing to the disk and extending files: when it is time to extend a file, the blocks adjacent to the end of the file have already been used by another file. To eliminate that, you can dedicate a device to each writer. To reduce it without dedicated devices, the best option is to have whatever creates the file ask for big pieces.

At a minimum, I would segregate the D2D2T files from the rest, and set the extend size on the volume to something like the largest multiple of your cluster size < 65536. For example, if the cluster size was 16, use 65520.
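
The "largest multiple of the cluster size below 65536" rule is easy to compute for any cluster size. A minimal Python sketch (the function name is hypothetical):

```python
def largest_extend(cluster_size, limit=65536):
    """Largest multiple of cluster_size that is strictly below limit."""
    return ((limit - 1) // cluster_size) * cluster_size


if __name__ == "__main__":
    for cluster in (16, 100, 256, 512):
        print(cluster, largest_extend(cluster))
```

For a cluster size of 16 this gives 65520, matching the example above; for the current cluster size of 100 it gives 65500.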

I would recommend using LDDRIVER to split the DSA disk into "partitions" that are used for different types of files. Alternatively, if you have something like an EVA, just create another device of the appropriate size for the small files.

This is what I would recommend, assuming you can't create SAN devices of any size you want.

Stop activity to DSA11 (dismount and mount privately)

Backup the 220 D2D2T (big) files. After they are backed up, delete them.

Make an image backup of the disk (remaining smaller longer lived files) to a save set.

Dismount DSA11. Initialize at least one member. Set your cluster to some appropriate value. For example:

$ init $1$DGA11: D2D2T /cluster=16 /extend=65520 /index=begin /head=500 /limit /system /own=[1,1] ! You could certainly consider a larger cluster like 256 or even 512, since you won't have many files on the disk.

$ mount /cluster DSA200: /shadow=($1$DGA11:) ! if this is only for disk backups, consider /nocache

Using LDDRIVER (I recommend V9, but at least 8.3, especially if you have any volume sets), create a container file for the small files. Size it as needed (you report the current size used as 12041666); I will use 25000000 in this example.

$ create /directory dsa200:[000dsk]
$ ld create dsa200:[000dsk]old_dsa11.dsk /contig/size=25000000/nobackup ! name so you will know what it is. Backups will be done of the ld device, but see more about this later.
$ mc sysman set env/cluster
SYSMAN> do ld connect dsa200:[000dsk]old_dsa11.dsk lda11 /share
SYSMAN> exit
$ init lda11: label_for_small_files /cluster=16/extend=256/header=20000/index=begin/limit
$ mount/cluster lda11: label...

Change your startup/shutdown procedure to mount/dismount lda11

Change the location that the backups are being written to, and set the label of the lda11 device to what the old DSA11 label was.

No need to do incremental backup of the D2D2T, since you just want to back everything on it up.

For the LDA11 device, you will need to back it up. You can back that up just like you would any other disk device. However, if you can dismount the lda11 disk while you back it up, you can use /ignore=nobackup on your backup of the DSA200 device and it will do the equivalent of a backup/physical of the LDA11 device, since it is making a copy of the container file that is acting like a disk. It has the same advantages and disadvantages as a physical backup: it is fast, but to restore you must restore the whole thing. And if you can't dismount the LDA11 device, what you will get is much less likely to work than a backup/ignore=interlock. It is equivalent to a physical backup of a disk mounted for shared write access.

After you have done this reorg, if there is a time after you have made your tape backup of the save sets on the DSA200 disk, and before you start to recreate new save sets, you can delete all the 220 big files, and you will essentially start with a clean slate from the defragmentation point of view. You shouldn't need to do this too frequently as long as you have a large extent size on the volume. Perhaps once a month if you want to keep the free space from getting too fragmented.

Jon
it depends
Jon Pinkley
Honored Contributor

Re: Dealing with a badly fragmented disk

Oh, I see I never said anything about restoring all the files that were saved in the image backup. Once you have created the small disk (either a new EVA vdisk or the LDA device) and initialized it, you will want to do an image restore, specifying /noinit:

$ mou/for lda11:
$ mou/for tape:/nowrite ! if the saveset was written to tape
$ backup/image/noinit saveset_spec/save lda11:/truncate ! saveset_spec is a placeholder for your save set; the truncate is important
$ dism lda11:
$ mou/ov=id lda11:
$ set volume/limit lda11: ! unless using backup from VMS 8.3
$ analyze disk/repair/record lda11:
$ dism lda11:
$ mou/cluster/noassis lda11:
it depends
John A. Beard
Regular Advisor

Re: Dealing with a badly fragmented disk

My thanks to all who have replied. Due to other commitments, I will not be in a position to look at all the responses for a few days yet. I will follow up on all your suggestions and assign points later... thanks again.
Glacann fear críonna comhairle.
John A. Beard
Regular Advisor

Re: Dealing with a badly fragmented disk

I had just attempted to submit a lengthy reply when I lost connection to the forum. I'll get the energy to retype it later, but in the meantime this piece of information from the Application team just came my way. It might help you to understand what type of data we are working with here.

ADSM = Tivoli network storage environment

the DSA11 (Oracle$disk) contains the following

1) Oracle Base code - 380 directories, 11544 files, 5446551 blocks. This code is used to create and operate the Oracle databases. Executables in the code tree are accessed by the Database engine and user apps. It also contains logs and alert files for all the databases on the system.

2) Oracle Archived Log files. These are the database transactional log needed
to recover the database. We store them on this disk as they are created.
Hourly they are copied to ADSM and deleted.
Normally there are only a few files, unless we have high activity in the
database during nightly batch processing and the ADSM is unreachable.
In which case we can have a few giga-bytes of data.

3) Oracle Hot backups. These are the daily full copies of the databases from the oradata1 and oradata2 volumes. They are copied here in a "backup mode" so you can copy them off to ADSM.
This is the bulk of the data on the disk (100GB).

4) Oracle Exports - these are created nightly and are used to recover individual tables for applications, or to port data to the test systems.

5) user directories for some Hilltown associates. These are all very small in
size and should not contain any application operational data.

In relation to making image backups of this disk or moving "static" files to another home, we only have a very limited window in which we can perform such activities, i.e. twice a year for approximately 4-6 hours max.

I had also copied (contig) one of the badly fragmented files (2.5 GB) to another disk, deleted the original and then tried to copy/contig it back again. It would not allow a contiguous operation, so I had to revert to a regular copy.

As for Extent sizing, I was wondering if setting the value on the volume itself as opposed to RMS might be a better short term answer. I have to take into account that there are 14 other volumes that go to make up this node.
Glacann fear críonna comhairle.
Jon Pinkley
Honored Contributor

Re: Dealing with a badly fragmented disk

RE:"I had just attempted to submit a lengthy reply, when I lost connection to the forum."

I had a similar experience when posting my first response to this thread. I intended to write a short note, but it evolved into a lengthy reply, and unfortunately I was using the web entry form from a Windows PC. But my fingers are trained from years of using the VMS line editing commands, and I wanted to insert something, typed ^A followed quickly by another character... no recovery possible. Control-A is the Windows "select everything" shortcut key, and typing something after that replaces the highlighted text with the new text, which definitely wasn't my intention.

I highly recommend using something outside of the web form for entry, even if it is MS Word. However, that has its own problems, specifically its use of characters that the forum software doesn't display correctly. To solve that I use metapad; see thread http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1155331, my replies dated Aug 24, 2007 03:56:23 GMT and Aug 24, 2007 07:27:37 GMT.

Back to your problem:

Straight to your last statement: "As for Extent sizing, I was wondering if setting the value on the volume itself as opposed to RMS might be a better short term answer."

I just did some testing with process RMS default and volume extent ($ set volume/extend or $ init /ext) settings (I didn't change the system default). In this case it appears to me that the MAX(process extend size,volume extend size) is what is used. The only way to override that value with something smaller is to explicitly specify the extend size to RMS (I tested this in FORTRAN using the EXTENDSIZE qualifier in open). The point is I don't think you want to make the volume extend size something really large if you have small files being created on the volume, especially short-lived files.

If specific usernames create the large files, you could have the login.com for those usernames set their RMS default /extend_quantity to 65535. Then if those applications aren't specifically asking for a smaller extent, then when the files are extended, they will be extended in large increments. That does not imply that the extensions will be contiguous.
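
To make the observed behaviour concrete, here is a tiny Python model of it. This is only a restatement of what the test above appeared to show, not a description of documented RMS or XQP internals, and the function and parameter names are invented for illustration.

```python
def effective_extend(process_default, volume_default, explicit=None):
    """Extend quantity that appears to be used, per the test described above.

    An explicit request (e.g. EXTENDSIZE in a FORTRAN OPEN) wins, even if
    it is smaller; otherwise the larger of the two defaults is used.
    """
    if explicit is not None:
        return explicit
    return max(process_default, volume_default)


if __name__ == "__main__":
    print(effective_extend(0, 800))                 # volume default only
    print(effective_extend(65535, 800))             # big process default
    print(effective_extend(65535, 800, explicit=16))  # explicit small request
```

This is why a large process-level default for the usernames that create the big files is attractive: it raises the extend size for those processes without forcing it on every small file on the volume.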

In your original post, you included the following. Some interesting info can be extracted from this. If we take the allocated space divided by the number of extents, we get the average extent size for the file.

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1C]CRD_INDEX_PROD.DBS_BACKUP_12;1 7168000/7168000 26/1681 Ave ext 7168000/1681 = 4264

DSA11:[ORACLE_DATABASE.EXPORTS]PROD1_SYSTEM.EXP;119 19227768/19227800 1125/76557 Ave ext 19227800/76557 = 251

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1]TDSA_EXPORT.DMP;225 4134136/4134200 509/34540 Ave ext 4134200/34540 = 120

DSA11:[ORACLE_DATABASE.EXPORTS]AM0589P_SYSTEM.EXP;225 4132544/4132600 504/34198 Ave ext 4132600/34198 = 121

DSA11:[ORACLE_DATABASE.BACKUPS.PROD1]TDSA_ITEM1_DATA1.DBS_BACKUP_35;1 8282112/8282200 63/4233 Ave ext 8282200/4233= 1957
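
The average-extent figures above can be reproduced with a short Python sketch. The file names, allocated sizes and extent counts are copied from the listing; everything else (variable and function names) is illustrative.

```python
# (file name, allocated blocks, number of extents), taken from the listing above
FILES = [
    ("CRD_INDEX_PROD.DBS_BACKUP_12;1", 7168000, 1681),
    ("PROD1_SYSTEM.EXP;119", 19227800, 76557),
    ("TDSA_EXPORT.DMP;225", 4134200, 34540),
    ("AM0589P_SYSTEM.EXP;225", 4132600, 34198),
    ("TDSA_ITEM1_DATA1.DBS_BACKUP_35;1", 8282200, 4233),
]


def average_extent(allocated_blocks, extents):
    """Average extent size in blocks, rounded to the nearest block."""
    return round(allocated_blocks / extents)


if __name__ == "__main__":
    for name, allocated, extents in FILES:
        print(f"{name:35s} {average_extent(allocated, extents):6d}")
```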

Notice that the DBS_BACKUP files have larger average extents than the other files do. This implies to me several possibilities.

The application creating the files is asking for larger extensions than the default. Another possibility is that nothing else was extending files on the volume when these files were created, so the blocks adjacent to the previous extension were available the next time a request was made, and the extent was combined with the previous one.

My guess is that what creates the .DBS_BACKUP files is explicitly asking for larger extents. If that is the case, then your average extent size on the disk is probably somewhere between 2000 and 4000 blocks (20 to 40 clusters of 100 blocks each).

And I would guess that what creates the EXP and DMP files is using the system/volume defaults, which are probably less than the cluster size of the disk, so they are getting small chunks; in two of the five cases listed the average extent is 120 blocks, meaning that at least 4/5 of your extents have only a single cluster. (If none of the extents were more than 2 clusters, then 4/5 of the extents would be 1 cluster, and 1/5 would be 2 clusters. If some extents have more than 2 clusters, then more than 4/5 have to have only 1 cluster to average 1.2 clusters/extent.) In other words, the TDSA_EXPORT.DMP file is nearly perfectly fragmented.
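
The "at least 4/5" bound follows from a one-line inequality: if a fraction p of the extents are single clusters and all the others are at least 2 clusters, the mean is at least p + 2(1 - p) = 2 - p, so p must be at least 2 minus the mean. A minimal Python sketch of that bound (the function name is hypothetical):

```python
def min_single_cluster_fraction(avg_extent_blocks, cluster_size):
    """Lower bound on the fraction of extents that are exactly one cluster."""
    mean_clusters = avg_extent_blocks / cluster_size
    # Clamp to [0, 1]: a mean of 2 or more clusters gives no information.
    return max(0.0, min(1.0, 2.0 - mean_clusters))


if __name__ == "__main__":
    print(min_single_cluster_fraction(120, 100))   # the .DMP/.EXP case above
    print(min_single_cluster_fraction(251, 100))   # mean above 2 clusters
```

With an average extent of 120 blocks on a 100-block cluster, the bound is 0.8, i.e. the 4/5 quoted above.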

Can you do the following and share the output?

$ define DFU$NOSMG T
$ dfu report dsa11:

The section about free space has what the largest free piece is, and the average size of the free pieces.

You can install DFO and use the reporting feature even if there is no PAK. You won't be able to defrag, but it is still worth installing (in my opinion). It can give you more information about free space fragmentation.

If I were in your position I would set the volume extent to something bigger than the default. The question is how big. The problem is that it is a volume-wide setting, and any file that gets extended will grab large chunks for the file. So if someone starts an editor on a small file, the file and the journal file will be extended in large chunks. Normally these files will be truncated when they are closed. Be aware that doing that on volumes with high-water marking can cause delays when non-sequential files are extended. The point is that you will probably want to use something much smaller than 65535 on a disk that has many active small files. Also be aware that a large extent doesn't ensure contiguous allocation even if contiguous space is available. So if a disk is already fragmented, it won't help as much; it will just grab a bunch of extents in one go. If it turns out that only one cluster was needed and it has allocated 100 extents to get the 65000 blocks specified by the extent value, you have just created a multi-header file for no reason.

The worst case is an application that repeatedly opens a file for append, appends a record and then closes it. Since your cluster size is 100, that won't hurt much. But consider what happens when the disk cluster is 1, the extend size is 65000, and each record is 1 block in size. Every time the file is opened for append and one record is written, the file gets extended by 65000 blocks, 1 block is used, and then the file is truncated when it is closed (that is the default in many languages). In that case the default volume extend size of 5 blocks seems pretty reasonable. If you have an application that does something similar, it is best to explicitly ask for a 1 block extension if you know you are just going to write a single record and then close. Better is to ask for a larger extent, and not request truncate on close, as this is the only way to avoid getting a badly fragmented file if any other things use that disk.
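
A rough way to see the difference is to count how many separate extend requests the file system has to satisfy. The Python sketch below is a deliberately simplified model (cluster size 1, one record per open/append/close cycle, each record no larger than the extend size), not a simulation of the XQP; the names are invented for illustration.

```python
import math


def extend_requests(records, record_blocks, extend_blocks, truncate_on_close):
    """Count extend requests for repeated open/append/close cycles.

    Simplified model: with truncate on close, the file never keeps spare
    allocation, so every cycle needs a fresh extension. Without truncate,
    each extension is used up before the next one is requested.
    """
    if truncate_on_close:
        return records
    return math.ceil(records * record_blocks / extend_blocks)


if __name__ == "__main__":
    print(extend_requests(1000, 1, 65000, truncate_on_close=True))
    print(extend_requests(1000, 1, 65000, truncate_on_close=False))
    print(extend_requests(1000, 1, 100, truncate_on_close=False))
```

Every extend request is a chance for the new blocks to land somewhere non-adjacent if another writer has used the neighbouring space in the meantime, so fewer, larger requests without truncation mean fewer opportunities to fragment.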

You could start with something like a volume extend size of 400 or 800, perhaps even 1600, depending on the frequency of small file creations. If you have small temporary files being created and then deleted, I would be more inclined toward the smaller end. I still think you have much more control if you know what is creating the large files, and can set the process defaults for those processes to something large.

On the other hand, if your disk has all its free space in a single piece, the disk will tend to fragment less quickly if the extent size is large. And if the average size of the free space extents is large, it won't take a lot of extents to satisfy even a request for a large extent size.

RE: "As regards what happens when a new file is being created on this disk, ie a 4GIG database copy, how exactly does the physical placement of this file get determined. Not using something like COPY /CONTIG, does RMS simply place this new file all over the shop, even if it means creating 1000s of fragments."

Strictly speaking, it isn't RMS that determines the placement; it is the XQP, of which RMS is a consumer. As far as I know, there is no "look aside list" categorizing the extents into size ranges. I think there is just the extent cache, and then extents are pulled from free space based on what is not marked as in use in BITMAP.SYS, continuing from its last position, with no respect for the size, unless the request was for contiguous or contiguous best try.

Which brings us to your failure to be able to create the 2.5 GB file contiguously. You don't have a free extent that is 2.5 GB in size. And unfortunately there is no copy/cbt. My guess is that the file that was copied back was still less fragmented than the original, as the size is known at the time the file is created, the whole amount was probably grabbed at once. Do you know how fragmented it was before (do you have output from DFU

Your original question was how to reduce the amount of time to do incremental backups on the DSA11 device. Unless you are doing something to prevent the hot backup files from being backed up, they are going to go to your incremental backups. If that's what you expect, and you want those files included in your incremental backups before they go to the ADSM system, then the following won't help. If you don't want to include those 100 GB in your incremental backups, you can reduce your incremental backup time by avoiding them being backed up. One way is to have a job that runs immediately before your incremental backup, and have it set the backup date to 1 second after the modification timestamp on the big files. Then your

$ backup DSA11:[000000...]*.*;*/modified/since=backup/fast tape:saveset ...

will avoid backing up those files. However, if nothing is backing up the ADSM, then you may need to be backing those files up.

If your evaluation version of PerfectDisk hasn't expired, I would use it on the volume to try to consolidate the free space. But I would try to do it immediately after copying the Oracle Hot backups to ADSM and deleting them. It would help if you could prevent new hot backup files from being created while you are defragging. If you were lucky enough to get enough free space, you could create a really large container file for the ORACLE backup files, and use LDDRIVER to present that as a different disk. But I think creating a small LD container for the small files is a "better" way. Those are probably "hotter" files, so depending on your SAN storage controller, there may be a performance advantage to these being "close" to each other. If you have an EVA, it really doesn't make much difference, and I would just use another vdisk instead of using LDDRIVER to partition your storage.

Good luck,

Jon
it depends
John A. Beard
Regular Advisor

Re: Dealing with a badly fragmented disk

My apologies to all for the lack of an update. I was away for the last week, and haven't had the chance to respond to your many fine suggestions. With the server and disk in question being accessed 7x24, I will not have an opportunity to perform an image backup, re-initialization and restore of the volume until March 15th.

I just wanted to say thank you for all the many responses to my original question. They have proved to be very beneficial in giving me a better understanding of what should be in place.
Glacann fear críonna comhairle.
Jan van den Ende
Honored Contributor

Re: Dealing with a badly fragmented disk

John,

from your Forum Profile:


I have assigned points to 26 of 52 responses to my questions.


Maybe you can find some time to do some assigning?

http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Mind, I do NOT say you necessarily need to give lots of points. It is fully up to _YOU_ to decide how many. If you consider an answer is not deserving any points, you can also assign 0 ( = zero ) points, and then that answer will no longer be counted as unassigned.
Consider, that every poster took at least the trouble of posting for you!

To easily find your streams with unassigned points, click your own name somewhere.
This will bring up your profile.
Near the bottom of that page, under the caption "My Question(s)" you will find "questions or topics with unassigned points " Clicking that will give all, and only, your questions that still have unassigned postings.
If you have closed some of those streams, you must "Reopen" them to "Submit points". (After which you can "Close" again)

Do not forget to explicitly activate "Submit points", or your effort gets lost again!!

Thanks on behalf of your Forum colleagues.

PS. - nothing personal in this. I try to post it to everyone with this kind of assignment ratio in this forum. If you have received a posting like this before - please do not take offence - none is intended!

PPS. - Zero points for this.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Robert Gezelter
Honored Contributor

Re: Dealing with a badly fragmented disk

John,

Also consider that a full image backup/restore of the device, and the resulting interruption of operations, may not be necessary.

I do not have enough information, but working with the worst offenders (and the procedures that create them) may be sufficient to resolve the issue without a service interruption.

- Bob Gezelter, http://www.rlgsc.com