Operating System - HP-UX

SOLVED
Stuart Powell
Super Advisor

High I/O rates to disk array; ideas on improving performance?

I have an RP7410 with two 1Gb Fibre Channel HBA's connected to two HP EVA 4000 disk arrays in a campus environment. The primary path to the local array is on one HBA and the primary path to the remote array is on the other HBA. I am using Mirror-UX for data redundancy.
On the PV's for the primary Informix DB space we are, for long periods approaching 8 hours, running close to 100% disk utilization on both HBA's. Below is a 10 hour sar -d output for the two mirrored PV's:
device %busy avque r+w/s blks/s avwait avserv
Average c27t2d4 54.50 0.68 288 2263 5.06 3.81
Average c26t2d4 85.12 0.70 427 2941 5.05 3.76
What options should we consider for improving the throughput in a SAN and array environment?
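For reference, the two mirrored device lines can be combined with a quick awk one-liner to get the aggregate load on the LV (the here-document just inlines the sar averages above; in practice you would pipe live `sar -d` output through the same awk):

```shell
# Sum r+w/s (field 5) and blks/s (field 6) across the two mirrored
# PV paths from the 'sar -d' average report shown above.
awk '/c2[67]t2d4/ { rw += $5; blks += $6 }
     END { printf "total r+w/s=%d blks/s=%d\n", rw, blks }' <<'EOF'
Average c27t2d4 54.50 0.68 288 2263 5.06 3.81
Average c26t2d4 85.12 0.70 427 2941 5.05 3.76
EOF
```

At 512-byte blocks, roughly 5200 blks/s works out to only about 2.6 MB/s of transfer, which is a useful sanity check against the 100% busy figure.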

Stuart
Sometimes the best answer is another question
Mark Greene_1
Honored Contributor

Re: High I/O rates to disk array; ideas on improving performance?

Get lsof:

http://hpux.cs.utah.edu/hppd/hpux/Sysadmin/lsof-4.76/

and see what processes are writing to files on those disks. If there are multiple files involved, you could possibly spread them out over many disks. If there are only a few (or even one large one), then it's a utilization problem with too many processes going after the same data, or a couple of very poorly written queries.
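For example (the mount point and file name here are hypothetical, substitute your own):

```shell
# List processes with open files on the filesystem backed by the busy PVs.
lsof /ifmxdata

# Or narrow it to one suspect file; repeated sampling shows which
# files stay busy over time.
lsof /ifmxdata/dbspace1.chk
```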

mark
the future will be a lot like now, only later
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Mark,
Will lsof help with a read/write database?

Stuart
Sometimes the best answer is another question
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Mark,

Let me state my question with more clarity:

Will lsof help with identifying read/write to a database lvol?

Stuart
Sometimes the best answer is another question
A. Clay Stephenson
Acclaimed Contributor

Re: High I/O rates to disk array; ideas on improving performance?

No, lsof will not be of much benefit in this case.

One of the things you should be aware of is that host-based tools like sar and Glance are not very good at analyzing disk arrays. All they know (or can know) is that a whole heaping helping of I/O is going through what they see as one physical disk -- never mind that the LUN may actually be striped across 10 disks. One thing you can do to make things APPEAR better to sar is to divide your big LUNs into equivalent smaller LUNs. The total I/O per disk will be reduced and things will APPEAR to be better; the actual throughput may be (and likely will be) unchanged.

If it ain't broke, I can fix that.
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Clay,

I figured that breaking the PV into smaller PVs would reduce the alarm numbers, but it wouldn't help the actual throughput. And if the HBA is maxed out, is there anything else to try before I go to 2Gb HBAs or additional interfaces? Another option is to move the mirroring to the arrays and then split the PV configuration. But both of these options involve capital that I don't have this year.

To follow up; what other system performance measurements would be good to evaluate database performance?

Thankfully we aren't receiving complaints from users on a regular basis about slow performance, but I'd like to stay ahead of that problem.

Stuart
Sometimes the best answer is another question
A. Clay Stephenson
Acclaimed Contributor

Re: High I/O rates to disk array; ideas on improving performance?

Your requirement of mirroring makes this more difficult. Under LVM, you can use extent-based striping and mirror data, but the smallest possible extent (1MB) is still much too large a stripe size to efficiently distribute I/O.

Now if you can rethink your mirroring methodology to use CA (Continuous Access) so that the mirroring occurs behind the scenes then it may be possible to actually improve i/o.

The idea is that rather than using one large LUN per VG, you divide it into as many LUNs as you have dedicated I/O channels between the host and the array. For example, suppose you need 300GB and have two Fibre cards in the host. Create two 150GB LUNs. LUN0's primary path should be SCSI 0, alternate 1; LUN1's primary path should be SCSI 1, alternate 0. You then stripe each LVOL in the VG across both LUNs in 64-128KB stripes. This will efficiently distribute your I/O across the available I/O paths. What we are really trying to do is throw the data at the array just as fast as we can and let it decide what to do with it -- after all, that's what those expensive arrays are good at.
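A sketch of that layout in LVM commands -- device files, sizes, and the VG name are illustrative only, and the VG directory/group-file setup is omitted:

```shell
# Hypothetical device paths; LUN0 primary on c26, LUN1 primary on c27.
pvcreate /dev/rdsk/c26t0d1
pvcreate /dev/rdsk/c27t0d2

# Create the VG on the primary paths, then add the alternate links
# (PVLinks) through the other HBA.
vgcreate /dev/vgdb /dev/dsk/c26t0d1 /dev/dsk/c27t0d2
vgextend /dev/vgdb /dev/dsk/c27t0d1 /dev/dsk/c26t0d2

# Stripe each logical volume across both LUNs with a 64KB stripe
# (-i = number of stripes, -I = stripe size in KB, -L = size in MB).
# This cannot be combined with LVM mirroring, which is why CA would
# handle the redundancy in this scheme.
lvcreate -i 2 -I 64 -L 20480 -n lvdb01 /dev/vgdb
```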
If it ain't broke, I can fix that.
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Clay,

Would you say that your third statement really only applies once I move mirroring away from the OS?

Stuart
Sometimes the best answer is another question
Mark Greene_1
Honored Contributor

Re: High I/O rates to disk array; ideas on improving performance?

If the lvol is raw, with no file system, then no, lsof isn't going to be much help.

Have you run fcmsutil to see if the HBA is producing errors?

fcmsutil [dev] stat

where [dev] is the fully qualified path to the device file for the HBA. On my system, it's /dev/td0


mark
the future will be a lot like now, only later
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Mark,

Excellent idea; however, that's not the problem.
Here's the top of the output for both HBAs:
$ sudo fcmsutil /dev/td1 stat
Thu Mar 2 13:53:54 2006
Channel Statistics

Statistics From Link Status Registers ...
Loss of signal 0 Bad Rx Char 255
Loss of Sync 2 Link Fail 0
Received EOFa 0 Discarded Frame 0
Bad CRC 0 Protocol Error 0

$ sudo fcmsutil /dev/td0 stat
Thu Mar 2 13:53:32 2006
Channel Statistics

Statistics From Link Status Registers ...
Loss of signal 0 Bad Rx Char 255
Loss of Sync 2 Link Fail 0
Received EOFa 0 Discarded Frame 0
Bad CRC 0 Protocol Error 0

I talked to our DBA and he believes there is some month-end accounting activity going on right now. I'm going to look back a few days to see if performance was this high then. I may have caught disk I/O when it was at its most intense.

Stuart
Sometimes the best answer is another question
Steve Lewis
Honored Contributor
Solution

Re: High I/O rates to disk array; ideas on improving performance?

On our Informix DBs (about 1000 of them in 40 instances) we sometimes get hot LUNs where the %busy hits 100% or the queue goes up, even on disk arrays with a large cache.
Example reasons for this are:

Too many Informix cleaner threads simultaneously cleaning LRU queues. This is a key LUN saturator. Reduce your LRU writes if at all possible. You may have to directly manage your checkpoints at certain periods just to prevent cleaning. Remember that during a checkpoint, LRU queues are allowed to clean down to LRU_MIN_DIRTY before the chunk writes occur, which can extend the checkpoint time if your LUNs are hot.

Multiple chunks on a single LUN. This is a big no-no with Informix, as writes to chunks happen simultaneously during checkpoints. This increases the write queue. I have seen it go up to 1000 on really bad systems. To reduce this horror, follow ACS's advice on LUN striping and spread out your chunks more. I find that 4 LUNs to an LVOL/chunk works well with Informix. Extent-based striping is not quite so good. The default queue depth in HP-UX LVM is 8 per device -- the more devices, the shorter the queue, unless the array itself is saturated.

Poor table and index placement in the database. If the database is joining tables in the same dbspace/LUN, the heads will be flapping all over the place, and striping can only do so much.

Slow or insufficient fibre. Go to multiple strands and stripe all LUNs over all strands, all of the time.

Poor RAID choice. Use RAID 0/1 always (http://www.baarf.com).
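On the queue-depth point above, it can be inspected and adjusted per device with scsictl (the device file and the value 16 are only examples; measure first and change cautiously):

```shell
# Display the current mode parameters, including queue_depth,
# for a LUN's raw device file.
scsictl -a /dev/rdsk/c26t2d4

# Raise the queue depth from the default of 8. Takes effect
# immediately but does not persist across reboots unless scripted
# at startup.
scsictl -m queue_depth=16 /dev/rdsk/c26t2d4
```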


A. Clay Stephenson
Acclaimed Contributor

Re: High I/O rates to disk array; ideas on improving performance?

Yes. As I stated earlier, extent-based striping under LVM does allow mirroring, but I have yet to see a significant improvement from it. As I said, the smallest possible PE (1MB) is still much too large a stripe chunk; moreover, a 1MB PE severely limits the size of a modern-day PV. You really need to get the stripe size down to somewhere in the 64-128KB range to see noticeable improvements. Conventional LVM striping, however, can't be done if the LVOL is mirrored, and that is why I suggested using CA as your mirroring technology.
If it ain't broke, I can fix that.
Alzhy
Honored Contributor

Re: High I/O rates to disk array; ideas on improving performance?

Stuart,

How "remote" is your other EVA4000? Does the use of just a 1Gb FC HBA mean it's really far out -- as in you're using dark fibre across campuses, or across cities?

Your sar stats appear healthy to me. In fact, they look normal. And since you're using EVAs, no amount of host-based RAID (striping) will help you.

If you're seeing 100% disk utilization, that is probably because you have genuine load from your Informix DB. If you had issues with your storage infrastructure, you would be seeing queuing (which you do not have) and larger response and service times on individual LUNs.

I think you're okay, and those 1Gb FC channels to your EVAs and the EVAs themselves are working fine. What you're seeing is plain and simple load that your server can crunch.

Hope this helps.
Hakuna Matata.
Stuart Powell
Super Advisor

Re: High I/O rates to disk array; ideas on improving performance?

Steve,

I've got our DBA looking at some DB performance issues. Our reads and writes to memory are in the 90% range, which is good, and the flush-to-disk wait times are very low, also a good sign. The 100% number I'm getting is from Glance. After further investigation we've determined that such a number doesn't mean all HBAs are saturated. As a matter of fact, we have seen transmission rates of over 1Gb/sec on a 1Gb fibre. We are going to work with the Informix people to make sure we are monitoring the correct parameters for throughput.

Clay,

I understand your comments about a CA solution for mirroring. Another option we will consider, if we find ourselves disk-limited, is array-based mirroring.

Nelson,

The arrays are physically about 600' apart. I'm hoping we can find information to support your hunch. I've looked at historical system data back to June of '05 and have seen the same levels of disk I/O then as we are seeing now. Our users are normally quick to complain if system performance drops below expected levels, and no one has complained. So either we have a bottleneck and they are used to it, or we really don't have a bottleneck and I need to adjust my expectations of the new daily reports I'm getting from OV Performance Manager.

Stuart
Sometimes the best answer is another question
Alzhy
Honored Contributor

Re: High I/O rates to disk array; ideas on improving performance?

Btw, how are you surmising that you are "close to 100% disk utilization on both HBAs"? If you are using Glance/MeasureWare, that is not to be trusted, since the default alarms, I think, date back to when disks were simple, slow SCSI. Out of the box, Glance will report your storage subsystem as 100% utilized even if only one disk/LUN is approaching that utilization. Your best gauge for ascertaining whether you have storage trouble is "sar -d" -- look for queuing.
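For example, a quick filter that flags only devices whose average queue exceeds 1.0 -- a rough sign of requests backing up rather than mere utilization. The first two sample lines are from the sar output earlier in this thread; the third (c20t0d1) is an invented example of a queuing device:

```shell
# Print devices from 'sar -d' Average lines whose avque (field 4)
# exceeds 1.0; in practice feed it live sar -d output.
awk '$1 == "Average" && $4 > 1.0 { print $2, "avque=" $4 }' <<'EOF'
Average c27t2d4 54.50 0.68 288 2263 5.06 3.81
Average c26t2d4 85.12 0.70 427 2941 5.05 3.76
Average c20t0d1 99.10 4.25 512 8100 22.10 8.70
EOF
```

Only the hypothetical third device is flagged; the two real PVs show no queuing at all.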

As far as mirroring goes, I've always entrusted mirroring my SAN disks (for whole-array redundancy, that is) to VxVM and not to in-array solutions like BCVs, SRDF or CA. Bandwidth, CPU cycles and busses are significantly faster these days, and I've convinced most of my clients to use this approach. One glaring benefit of VxVM host-based mirroring is that you're not beholden to your array vendor. You can kick out or introduce any array from any vendor into the mix without disruption.
Hakuna Matata.