Operating System - OpenVMS

Distributing I/O on EVA virtual disks

 
Dima Bessonov
Frequent Advisor

Distributing I/O on EVA virtual disks

In the times when disks were just disks, we system managers were always instructed: "Split I/O evenly over your disks. Move the page and swap files off the system disk. Use separate dedicated disks for I/O-intensive apps like Pathworks and DBMS servers. For especially I/O-heavy apps (database server, anyone?), you may even need to split their I/O over several disks." Now that we've got virtual storage on EVAs, does it still make sense to follow these instructions? A volume on a virtual storage controller is spread over the entire disk group. However you cut this disk group into pieces, every such piece is still evenly spread over all physical disks of its disk group. Ergo: why bother creating many disks where a single huge volume will do?

Of course I'm intentionally somewhat simplifying the situation. It still makes obvious sense to have several disks where a system manager wishes to be able to mount/dismount/back up volumes separately or to avoid contention for disk space. But, performance-wise, are there still any reasons to create separate volumes for the secondary system disk or for apps like Pathworks, Oracle, etc.? Especially with ODS-5, where minimum cluster size is no longer an issue?
15 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Distributing I/O on EVA virtual disks

From a strict hardware IO perspective there is really no need to worry about one disk getting more basic read or write IOs than another.
Oracle suggests using the S.A.M.E. methodology: Stripe And Mirror Everything.

There are software reasons to divide into multiple (virtual) disks:
- backup/restore... like you already indicated.
- monitoring: to understand how much of the I/O load is caused by a given file/application (but now you almost want to go against the old advice and just put all files from an application on a 'single' disk; a MONITOR sketch follows this list).
- file allocation: You don't want a single serialization lock dealing with all your files for all your applications.
- access patterns still count as a valid reason to split, IF the virtual disks are carved from separate disk groups and/or controllers.
- IO depth. Not sure here, but it would be better to have 5 piles of 50 outstanding IOs versus 1 pile of 250. It still goes through the same driver/HBA though, so that would not matter too much.
- damage control. (operator accidents, bugs?!)
- thinking aid for clarity during management (but logicals can readily give that already).
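On the monitoring point, a minimal sketch of how per-disk load can be watched from DCL; if each application lives on its own virtual disk, the per-device figures map straight to per-application load (the choice of items is just illustrative):

$ MONITOR DISK/ITEM=OPERATION_RATE      ! I/O operations per second for each mounted disk
$ MONITOR DISK/ITEM=QUEUE_LENGTH        ! average I/O request queue length per disk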

fwiw,
Hein.
John Gillings
Honored Contributor

Re: Distributing I/O on EVA virtual disks

Dmitry,

One of the reasons for storage virtualisation is to eliminate the need to try to manage distribution of I/O. The EVA should redistribute data in response to I/O load in order to eliminate hot spots.

The concept of "single volume" has been imposed on us by the physical media, so don't feel it's necessary to continue to conform to its limitations.

The EVA is the most advanced storage virtualising system at the moment, but in some ways it's just the beginning. Concentrate on what makes it easy for you and your application to do what you want, and let the clever hardware and software work out how to distribute the load.
A crucible of informative mistakes
Sheldon Smith
HPE Pro

Re: Distributing I/O on EVA virtual disks

While the virtual disk is spread across many spindles, you now have all traffic on a single HBA. Assuming you have multiple HBAs, you may want to balance the traffic across them.
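To illustrate (just a sketch; the device and path names below are invented), you can see which path a unit is currently using and, with VMS 7.3-style multipath, move it by hand to a path on another HBA:

$ SHOW DEVICE/FULL $1$DGA101:           ! lists the I/O paths to the unit and marks the current one
$ SET DEVICE $1$DGA101: /SWITCH /PATH=PGA1.5000-1FE1-0001-5A22   ! hypothetical path on the second HBA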

Note: While I am an HPE Employee, all of my comments (whether noted or not), are my own and are not any official representation of the company


John Eerenberg
Valued Contributor

Re: Distributing I/O on EVA virtual disks

Dmitry,

I have been doing some benchmarks to see what is optimal (for the case of one EVA per cluster; a stretch cluster would require two).

HP and I have gone round and round on some of my findings, but the overall consensus is that the following makes sense:

1) Two groups per cluster.
1a) The first group is *optional*. It is for data that does not require performance. In fact, you may intentionally want to create a bottleneck (to keep the second group at its peak performance). In general, this group has the fewest spindles, the lowest RPMs, and the highest capacity (for example, ten 146GB 10KRPM disks). For instance, one app I work with needs large, bulk storage; when in use, this group should have minimal impact on the second group (below).

1b) The second group is meant for performance. It has the most disks, with the highest RPMs and suitable capacity. This most likely includes the system disk (with the page/swap files in their respective sysroots), app disks, database disks, etc. This group should contain sixty-plus 73GB 15KRPM disks. I need to maintain a number of units for stretch cluster and snapclone needs. Based on best performance, 40 spindles is too few; 60+ is good, and more is great. If a lot of the space in this group is unallocated, that helps performance all the more.

2) Unit numbers within a group. Keep them to a minimum. Do what is best/easiest for you; one has to change how one thinks about unit numbers. In the case of the first group above, one unit number satisfies all our needs. In the case of the second group, we find we are reducing the number of unit numbers from 40 down to 15.

> "It still makes obvious sense to have several disks where a system manager wishes to be able to mount/dismount/back up volumes separately or to avoid contention for disk space."
Yes, Agreed.

> "But, performance-wise, are there still any reasons to create separate volumes for the secondary system disk or the apps like Pathworks, Oracle etc?"
Two possibilities: shadow sets in a stretch cluster (you may need more units to maintain FC throughput), and snapclones for point-in-time data (to keep from cloning too much data). Otherwise, just assign the units from a group based on what best and most safely serves your job *and* the following:
data recovery
high availability
disaster recovery
stretch cluster
your site specific needs

Rooted logicals can help minimize the unit numbers too.
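For example (a sketch only; device, directory and logical names are made up), one big unit can carry several application trees behind rooted, concealed logicals:

$ DEFINE/SYSTEM/EXEC/TRANSLATION_ATTRIBUTES=CONCEALED ORA_ROOT $1$DGA101:[ORACLE.]
$ DEFINE/SYSTEM/EXEC/TRANSLATION_ATTRIBUTES=CONCEALED PWRK_ROOT $1$DGA101:[PATHWORKS.]
$ DIRECTORY ORA_ROOT:[000000]           ! each application sees what looks like its own device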

> "Especially with ODS-5 where minimum cluster size is no longer an issue?"
If you don't need backward compatibility before VMS 7.2, you can set an ODS-2 disk to have the same cluster size as ODS-5. Just use the $ INITIALIZE command.
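Something along these lines should do it (device names and labels are placeholders, and the small cluster size assumes no pre-V7.2 system ever needs to mount the volume):

$ INITIALIZE/STRUCTURE_LEVEL=2/CLUSTER_SIZE=1 $1$DGA102: DATA02   ! ODS-2 with a 1-block cluster
$ INITIALIZE/STRUCTURE_LEVEL=5/CLUSTER_SIZE=1 $1$DGA105: DATA05   ! the ODS-5 equivalent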

Hope it helps.

john
It is better to STQ then LDQ
Jan van den Ende
Honored Contributor

Re: Distributing I/O on EVA virtual disks

John

-- "stretch cluster"?
are we talking VMS here?

-- First group:
"for example ten 146GB.. " , & "In the case of the first group above, one unit number satifies all our needs."
... interesting. Try the INIT ... , and then the SHOW DEVICE/FULL
Have some limits spontaneously been lifted?

Cheers

Have one on me

Jan
Don't rust yours pelled jacker to fine doll missed aches.
John Eerenberg
Valued Contributor

Re: Distributing I/O on EVA virtual disks

Jan,
> "-- "stretch cluster"?
are we talking VMS here?"
Yep. Specifically the shadow sets.

> "... interesting. Try the INIT ... , and then the SHOW DEVICE/FULL
Have some limits spontaneously been lifted?"
I hope not. Well, I hope one day they are!
We use RAID 1+0 almost exclusively, so the thought of a usable 9*146GB RAID 5 array didn't occur to me. Sorry about that.
Ten disks make five mirrored pairs in RAID 1+0 (assuming no spares are used/created), so the total is well under a TB (1 TB is the max, IIRC) and should work.

Good catch; bulk-storage volume size will also dictate the need for multiple units!

> "Have one on me"
You bet!
Have a good one too!
john
It is better to STQ then LDQ
Keith Parris
Trusted Contributor

Re: Distributing I/O on EVA virtual disks

The EVA removes many of the issues related to physical disks, but you still have the VMS file system involved, and that could cause complications.

There is a limit on volume size of 1 TB under VMS today. That's not very many 173 GB disks.

As Hein pointed out, there is a single lock manager resource for disk allocation on a single volume, which could become a locking bottleneck.

On a single large volume, files like logfiles, which tend to be allocated incrementally and pick up small pieces of space from all over the disk surface (and then keep them "forever" in the case of long-running batch jobs), cause complications in the area of disk fragmentation. And it will take a long time for a disk defragmenter to run on a single large volume, compared with several smaller volumes. With Dynamic Volume Expansion now available, it may be most convenient to give large files that need contiguous space to expand into their own individual volumes.
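A rough sketch of that approach with Dynamic Volume Expansion (device name, label and block count are invented, and it assumes a VMS version that supports INITIALIZE/LIMIT and SET VOLUME/SIZE):

$ INITIALIZE/LIMIT $1$DGA110: BIGFILES   ! reserve bitmap space so the volume can be expanded later
$ MOUNT/SYSTEM $1$DGA110: BIGFILES
! ... later, after the EVA Vdisk has been grown ...
$ SET VOLUME/SIZE=400000000 $1$DGA110:   ! extend the logical volume to the new size (in blocks)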

In multi-site clusters, you'll be limited to a single shadow copy thread if you have a single large unit shadowed. This could make shadow-copy times unacceptably long.
Garry Fruth
Trusted Contributor

Re: Distributing I/O on EVA virtual disks

If you plan on using shadow sets and EVAs, you should consider setting the read cost of one of the shadow members higher than the other. Cache hit rates in the EVA improve significantly for sequential read I/O (e.g. backup), which may noticeably improve performance of those tasks.
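For instance (device names are placeholders), a sketch of biasing reads towards one member so the EVA behind it sees a nice sequential stream:

$ SET DEVICE/READ_COST=1 $1$DGA101:      ! preferred member for reads
$ SET DEVICE/READ_COST=1000 $1$DGA201:   ! other member, read from only when necessary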
Dima Bessonov
Frequent Advisor

Re: Distributing I/O on EVA virtual disks

Keith,

You mentioned increasing disk defragmentation times. Are you aware of any practical limitations/recommendations on disk size from DEFRAG's standpoint?

Related question: does anyone from the community have a real-life experience of defragmenting very large disks?
Galen Tackett
Valued Contributor

Re: Distributing I/O on EVA virtual disks

How important is defragmentation in an EVA environment? Some barely educated guesses of my own:

With the EVA's virtual storage you don't really know where pieces of a file live on physical drives, but perhaps they're more likely to be kept "close" somehow if a contiguous range of logical blocks is involved?

Having a contiguous range of logical blocks still cuts down on the actual number of I/O requests that must go through the XQP, driver, FC adapter, etc. This is clearly a good thing but perhaps only makes a big difference in an EVA virtual storage environment when the I/O demand is high enough to significantly tax any of these pieces.

Having plenty of contiguous free space still makes it easier to allocate space for files, though this has already been made significantly easier by the file system caches.

Of course, the easy answer is "it all depends on the application." But other thoughts would be welcome...
Hein van den Heuvel
Honored Contributor

Re: Distributing I/O on EVA virtual disks

I agree with Galen's observations.
The EVA will spread stuff out over disks anyway (doing the actual allocation on the first write, not when you create the VD).

The EVA allocates in good-sized PSEGs (physical extents), like 1MB. It would be a shame to break up a longer IO into chunks that need to touch multiple PSEGs.
If the target unit will mostly hold larger files, then it should be enough to use a good-sized cluster size (256, 512) to avoid fragmentation. Also, be sure to set a good-sized default file extend (1024? 2048?).
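For example (a sketch; the device name, label and the particular numbers are only illustrative):

$ INITIALIZE/STRUCTURE_LEVEL=5/CLUSTER_SIZE=512 $1$DGA120: BIGDATA
$ MOUNT/SYSTEM $1$DGA120: BIGDATA
$ SET VOLUME/EXTENSION=2048 $1$DGA120:           ! volume default extend quantity
$ SET RMS_DEFAULT/EXTEND_QUANTITY=2048/SYSTEM    ! system-wide RMS default extend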

This may be interesting reading:
http://h200005.www2.hp.com/bc/docs/support/SupportManual/lpg29448/lpg29448.pdf

Somewhat to my surprise it suggests predictable mapping for VDs. I always thought they were allocated 'just in time'. I'll need to verify what/when this changed:

"For non-snapshot Vdisks, the EVA always maps the data on the disks in its logical order. Unlike other virtual arrays
for which the layout is dynamic and based on the write order, the EVA data structure is predictable and
repeatable."

"LUN count influences performance
LUN count has no effect on native EVA performance; the maximum EVA performance can be demonstrated with
a single LUN per controller. However, because of the default operating system or host bus adapter (HBA) queue
management in some systems,..."

Another handy little document:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=PSD_CN0496W&prodTypeId=12169&prodSeriesId=321347

Cheers,
Hein
Dima Bessonov
Frequent Advisor

Re: Distributing I/O on EVA virtual disks

Galen,

I guess keeping fragmentation low is still important, even on EVA. RMS can simultaneously map only a limited number of file extents. If you open a heavily fragmented file and the extent you need is outside the mapping window, a "window turn" has to be done to map that extent, and that takes time.
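One quick way to see how fragmented a given file actually is (the file name here is just an example) is to look at the retrieval pointers in its header:

$ DUMP/HEADER/BLOCKS=COUNT=0 DISK$DATA:[APP]BIG.DAT   ! the map area lists one retrieval pointer per extent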
Uwe Zessin
Honored Contributor

Re: Distributing I/O on EVA virtual disks

A PSEG is 2 MBytes big, but multiple chunks of 128 KBytes are 'folded' inside a PSEG.
Jan van den Ende
Honored Contributor

Re: Distributing I/O on EVA virtual disks

Dmitry,

in these days of abundant cheap memory, it is very advisable to just use cathedral windows to avoid window turns.

Do this by setting the SYSGEN parameter
ACP_WINDOW 255

At the cost of some memory you buy performance.
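A sketch of the two usual ways to get there (SYSGEN is used directly here for brevity; putting the change in MODPARAMS.DAT and running AUTOGEN is the tidier route, and the MOUNT qualifier is a per-volume alternative; the device name and label are made up):

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET ACP_WINDOW 255
SYSGEN> WRITE CURRENT                    ! picked up for volumes mounted after the next reboot
SYSGEN> EXIT
$ MOUNT/SYSTEM/WINDOWS=255 $1$DGA120: BIGDATA   ! or set cathedral windows per volume at mount time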

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Keith Parris
Trusted Contributor

Re: Distributing I/O on EVA virtual disks

> You mentioned increasing disk defragmentation times. Are you aware of any practical limitations/recommendations on disk size from DEFRAG's standpoint?

I think this varies for different environments. If it's too slow, it will become apparent. Some folks can run the defragmenter all night every night and it's no problem.

My caution was based on my experience at E*Trade. DFO can't defragment open files. We were running an application which kept RMS files open 24x7 except for a few hours maybe once a quarter when we might take an application outage late on a Saturday night to do housekeeping like RMS file CONVERTs and disk defragmentation. On some of our disk volumes, which were multi-member host-based RAID 0+1 arrays, we had trouble getting the disks defragmented sufficiently within our short outage window. A defragmenter can only deal with basically one thing at a time on a given volume, so running multiple copies on separate (smaller) volumes at once can increase the throughput.