Operating System - HP-UX
Doug_3
Frequent Advisor

problems disk io and testing internal vs san

Hi, I am testing disk I/O using timed dd commands. I am finding the following and wonder if others would chip in their thoughts on the data.
time dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024 count=500 (used count=5000 for the internal disk)
yields the following:
SAN (Xiotech): 1.5 to 2.64 seconds (500 MB read)
Internal Seagate: 0.5 to 0.6 seconds (5000 MB read)

This doesn't seem correct: the 2 Gb FC SAN is reading at 189-319 MB/sec while the internal SCSI disk is reading at 8333 MB/sec.

Is the SAN working as expected, and what can I check to validate or test further what is going on?

Regards, Doug
9 REPLIES
Hein van den Heuvel
Honored Contributor
Solution

Re: problems disk io and testing internal vs san


The block size you used for the test is unrealistically small. Well, unless the targeted application is indeed doing 1 KB I/O.
Currently you are really measuring I/Os per second, not MB/sec. Do the same experiment with bs=1024k.
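For example, something along these lines (same placeholder device path as in the original post; with bs=1024k, count=500 reads about 500 MB):

time dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024k count=500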

Granted, the SAN still seems slow.
That internal disk will be doing read-ahead, so it can respond right away from its cache.
Is your SAN controller doing read-ahead? (It probably should; buy a better one if it does not.)
How many I/Os per second is that SAN appliance rated at? What latency? ANY latency will kill the I/O rate. What is behind the controller?

You may find that if you start a few concurrent streams (preferably with a skip/seek offset so each starts at a different place), the SAN solution scales with the number of streams, whereas the direct-connect disk is limited to what you see now and is even likely to drop back in performance as you increase the number of streams (thrashing).
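A rough sketch of that kind of multi-stream read test, assuming the same placeholder device path; skip= offsets each stream by that many input blocks (1 MB blocks here), and the whole set is timed as one unit:

time sh -c '
dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024k skip=0 count=500 &
dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024k skip=1000 count=500 &
dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024k skip=2000 count=500 &
dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024k skip=3000 count=500 &
wait
'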

Hein.
generic_1
Respected Contributor

Re: problems disk io and testing internal vs san

Glance might be useful for looking at your statistics as well. In terms of your disk performance, is that SAN already getting hammered in addition to your test? That could sway your results. How busy is it before the test? Also, I am curious what type of SAN you have, how it is configured (i.e. striping/RAID level), and whether those disks are shared with other LUNs that may be inhibiting performance.
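If you want a quick baseline outside Glance, sar can show per-device activity before and during the test; the interval and count here are just illustrative (12 samples, 5 seconds apart):

sar -d 5 12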

Re: problems disk io and testing internal vs san

Doug,

This kind of test is very simple and, apart from anything else, doesn't in any way represent a real-world situation. That's not to say the test isn't valid or correct, only that I wouldn't put too much faith in a simple single-threaded sequential read test. A better approach would be to use a more advanced testing tool such as iozone, available here:

http://www.iozone.org/
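For example, an illustrative iozone run (the sizes, record size, thread count and file paths are placeholders and would need tuning to match your database's I/O profile):

iozone -R -i 0 -i 1 -i 2 -r 8k -s 2g -t 4 -F /sanfs/f1 /sanfs/f2 /sanfs/f3 /sanfs/f4

That runs sequential write, sequential read and random read/write tests with an 8 KB record size and four concurrent threads.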

HTH

Duncan

I am an HPE Employee
Leif Halvarsson_2
Honored Contributor

Re: problems disk io and testing internal vs san

Hi,
It doesn't seem correct to me. What kind of performance are you interested in, IOPS or throughput?

I have used "Postmark" for disk/filesystem benchmarks.

http://www.netapp.com/tech_library/3022.html
A. Clay Stephenson
Acclaimed Contributor

Re: problems disk io and testing internal vs san

You've just discovered one of the features of a SAN. Very rarely do disk arrays do as well as dedicated disks (with on-board cache), all other things being equal, when connected to a SINGLE server. However, when disk arrays are connected to many servers they really come into their own, because the aggregate performance can be phenomenal. If memory serves, one of the Xiotech arrays (Magnitude?) really suffers because it intentionally lacks cache, so that the host really, really knows that a transaction has been written to disk. Lack of array cache can be a very good thing from the perspective of knowing an I/O has occurred, but it can be a bad thing in terms of performance. Most disk array manufacturers feature very large caches to speed up the overall transaction rate while accepting the small risk that what gets written to cache might not actually get written to disk.
If it ain't broke, I can fix that.
Doug_3
Frequent Advisor

Re: problems disk io and testing internal vs san

It is a Xiotech Magnitude. I don't know the controller/cache info, but I will ask the sysadmin. We are interested in performance for a database application. Glance aggregates the disk I/O, so it looks like it is at 100% quite a bit of the time.
Stripe size may be an issue; the sysadmin took the defaults, I think, when the VG was created, which gives a 1 MB stripe. The RAID level is 10. It was not created as a PVG. We are only using one of the two available FC cards.
A. Clay Stephenson
Acclaimed Contributor

Re: problems disk io and testing internal vs san

I just checked, and one of the "virtues" of the Magnitude is indeed no cache; again, this is not necessarily bad, but in many cases from a pure performance perspective it probably is. Don't worry too much about Glance reporting 100%; it has no way of knowing that this device is not a simple disk. I would try striping each LVOL across two identically sized LUNs per VG, using different primary paths, and I would start with a stripe size of 64 KB.
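A rough illustration of that layout with HP-UX LVM, assuming a volume group built from two equally sized LUNs reached over different paths (device files, minor number and sizes are placeholders):

# two LUNs, ideally presented over different primary FC paths
pvcreate /dev/rdsk/c5t0d1
pvcreate /dev/rdsk/c7t0d1
mkdir /dev/vgsan
mknod /dev/vgsan/group c 64 0x010000    # minor number must be unique on the system
vgcreate /dev/vgsan /dev/dsk/c5t0d1 /dev/dsk/c7t0d1
# stripe the logical volume across both LUNs with a 64 KB stripe size
lvcreate -i 2 -I 64 -L 10240 -n lvol1 /dev/vgsan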
If it ain't broke, I can fix that.
Ted Buis
Honored Contributor

Re: problems disk io and testing internal vs san

You should check how the file system is mounted, as that can affect performance. It is very easy in SAM to set a file system to be more aggressive (taking risks with integrity) to gain additional performance for writes. Also, for write performance you can change the kernel parameter default_disk_ir from 0 to 1 to enable immediate reporting, where the disk reports the I/O as completed as soon as the data is in the disk's buffer, as opposed to after it is written to magnetic media. I don't know how this will work with your array; however, I have seen up to 4X improvements with direct-attached SCSI disks (JBOD) for write-intensive jobs. I believe HP arrays with a battery-protected cache report immediately regardless of the kernel setting (it is a setting of the array).
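To check and change that tunable, something along these lines should work on HP-UX 11.x (kmtune on older releases, kctune on 11i v2 and later; depending on the release a kernel rebuild and reboot may be needed for it to take effect):

kmtune -q default_disk_ir      # query the current value
kmtune -s default_disk_ir=1    # enable immediate reporting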
Mom 6
Tim D Fulford
Honored Contributor

Re: problems disk io and testing internal vs san

Doug

The dd you did was over Fibre Channel, I believe. The thing with FC is that it really likes a large block size; there is something like a 100-byte overhead on each fibre "frame".

Secondly, I do not know how the Xiotech works: is the LUN /dev/rdsk/cxtxdx physically a single disk, or is it really a hardware RAID 0 or RAID 1+0 set over a number of disks? If it is striped, the hardware stripe width will be an issue here, as it is probably greater than your block/frame size. For example, if the stripe size were 64 KB, the dd would read 64 x 1 KB blocks before moving on to the next physical disk, so you would be stressing a single disk at a time. If there are, say, 4 disks in the RAID 0 stripe (or 8 in a RAID 1+0 stripe) with a stripe size of 64 KB, then the best read performance will come from a block or frame size of 4 x 64 KB = 256 KB, so:
dd if=/dev/rdsk/cxtxdx bs=256k count=2000 of=/dev/null

This is a 500 MB read and should "spin up" all the physical disks in the LUN.

The above is not really such a neat idea as a benchmark, because:
o You probably access data through a filesystem and not straight from the disk, so any "tests" should be run on a filesystem rather than the raw device (see the sketch after this list).
o Unless your application reads masses of data sequentially, dd is probably not the best benchmark tool. Unfortunately you are the only one who can judge that! Others above have offered info on other tools, including ones that exercise random I/O.
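A minimal sketch of a filesystem-level sequential read test, assuming /sanfs is the mount point of a filesystem on the SAN LUN (the path, sizes and block size are placeholders):

# create a 2 GB test file on the filesystem (ideally larger than RAM to limit cache effects)
dd if=/dev/zero of=/sanfs/testfile bs=256k count=8000
# then time a sequential read of it
time dd if=/sanfs/testfile of=/dev/null bs=256k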

Good luck

Tim
