MSA 1000 Performance Problem

 
Jason Keane
Occasional Advisor

Hi All,

I wonder if you could shed some light on a write performance problem we are having with a new MSA 1000 unit. HP support consultants/engineers have been looking at this for the last week and I am beginning to lose patience with the lack of progress.

The configuration is as follows:

3 x DL380 G4 Servers running Windows 2000 SP4 with 2GB RAM and mirrored 72GB 10k disks on the internal array.

Each server contains a HP FC2214 2Gb fiber card with the latest firmware and drivers (as per the support site).

These three cards connect to a Brocade switch plugged internally into the MSA 1000.

The MSA 1000 is configured with 256MB cache in a 50/50 split for read/write. The firmware is version 4.32.

There are 10 x 146GB 10k U320 disks in the MSA and these are divided into two RAID5 sets (4 disks each) and one RAID1 set (the other two disks).

When using Explorer to copy a file from the internal RAID or a SAN disk to a SAN disk we are getting approximately 10MB/s throughput. Using IOmeter to write we are also getting approximately 10MB/s. Using IOmeter to read the disks we are getting throughput of about 130MB/s.

To troubleshoot the problem the following things have been tried:

1. The Brocade switch has been replaced
2. The MSA controller has been replaced
3. The QLogic cards were replaced with an Emulex card
4. The cache split has been altered (0% read / 100% write)
5. A pre-release of the MSA controller (version 4.4) has been tested
6. All volumes have been defragged
7. The internal array controllers (on DL380) have been upgraded
8. The Brocade switch has been removed and a direct fiber link to the MSA has been tested (still only 10MB/s)
9. A Dell server has been attached (still only 10MB/s)
10. Cache has been enabled and disabled (no difference) using dskcache.exe

Now for the real spanner in the works. If we start a write from one server we are getting 10MB/s throughput as monitored on the Brocade switch. If I start another write job on each of the other two servers, my total throughput on the switch is three times the maximum input from a single server (i.e. 10MB/s from each server and I can write at 30MB/s!!!). This is verified by the port throughput performance graph on the switch.

So it looks as if the MSA is happy to write at 30MB/s but the servers seem to be limited to outputting 10MB/s. Again I can read at 130MB/s!!
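
For what it's worth, the figures above can be put side by side in a quick back-of-envelope sketch. The 200 MB/s value is the nominal payload rate of a 2Gb Fibre Channel link and is an assumption on my part, not something measured here; the function names are made up for illustration. The other values are the ones quoted in this post.

```python
# Back-of-envelope comparison of the figures quoted in this post.
# NOMINAL_2GB_FC_MBPS is an assumed nominal link rate, not a measurement.

NOMINAL_2GB_FC_MBPS = 200.0   # assumed nominal payload rate of one 2Gb FC link
PER_SERVER_WRITE_MBPS = 10.0  # observed write rate from a single server
READ_MBPS = 130.0             # observed read rate from a single server
SERVERS = 3


def aggregate_write_mbps(per_server: float, servers: int) -> float:
    """Total write rate seen at the switch when every server writes at once."""
    return per_server * servers


def link_utilisation(mbps: float, link_mbps: float = NOMINAL_2GB_FC_MBPS) -> float:
    """Fraction of the nominal link rate that a given throughput represents."""
    return mbps / link_mbps


if __name__ == "__main__":
    total = aggregate_write_mbps(PER_SERVER_WRITE_MBPS, SERVERS)
    print(f"aggregate write: {total:.0f} MB/s")
    print(f"single-server write uses {link_utilisation(PER_SERVER_WRITE_MBPS):.0%} of the link")
    print(f"single-server read uses {link_utilisation(READ_MBPS):.0%} of the link")
```

Under that assumption a single server's writes use only a small fraction of the link while its reads use most of it, which is why the cap looks like it sits on the server side rather than on the fiber or the array.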

Can anybody shed some light on this? Or do I just return the whole kit as defective?

Thanks for any help,
Jason.







Bostjan Kosi
Trusted Contributor

Re: MSA 1000 Performance Problem

Hi,

Have you maybe implemented zoning to separate the servers from each other at the hardware level? I would suggest checking some queues using Performance Monitor to establish exactly where the bottleneck is. Check the I/O queues; if the queue is full, then there could be something in the HBA driver or its settings. You'll probably find there is something within the server (OS) stopping your bits and bytes from flying. You say that you also tried with Emulex and the results were the same? Is Secure Path installed? If you still have the Emulex cards, try increasing the queue depth and setting the tprlo parameters.
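
To show why the queue depth is worth a look: as a rough sketch, throughput is bounded by the number of outstanding I/Os times the I/O size, divided by the time each I/O takes (Little's law). The 64 KB transfer size and 6 ms latency below are illustrative assumptions, not values measured on your system, and the function name is hypothetical.

```python
# Rough Little's-law estimate of throughput from queue depth.
# All inputs are illustrative assumptions, not measurements from this thread.


def throughput_mb_per_s(queue_depth: int, io_size_kb: float, latency_ms: float) -> float:
    """Estimated throughput in MB/s (1 MB = 1024 KB here).

    queue_depth: number of I/Os kept in flight at once
    io_size_kb:  size of each I/O in KB
    latency_ms:  average time for one I/O to complete
    """
    if queue_depth < 1 or io_size_kb <= 0 or latency_ms <= 0:
        raise ValueError("queue_depth must be >= 1; size and latency must be positive")
    ios_per_second = queue_depth * 1000.0 / latency_ms
    return ios_per_second * io_size_kb / 1024.0


if __name__ == "__main__":
    for depth in (1, 4, 16):
        rate = throughput_mb_per_s(depth, io_size_kb=64, latency_ms=6.0)
        print(f"queue depth {depth:2d}: ~{rate:.1f} MB/s")
```

With these made-up numbers, one 64 KB write in flight at a time comes out at roughly 10 MB/s. That may be a coincidence, but it is the kind of ceiling you would expect if every write has to wait for the disk before the next one is issued.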

If the system is not yet in production, you could try creating one array with two RAID 1 volumes inside, just to test the performance.

I hope I have given you some untried options. I must agree with you, this is strange.

rgds

Bostjan
Nothing is impossible for those that don't have to do it themselves!
Glenn N Wuenstel
Valued Contributor

Re: MSA 1000 Performance Problem

Jason,
Have you checked to make sure that there is no pending activity on the arrays? You can see this by going into ACU and looking for messages. If the array hasn't finished initialization then you will see a performance hit.

Glenn
KurtG
Regular Advisor

Re: MSA 1000 Performance Problem

Do all the LUNs you expect the same performance from use the same number of physical disks?

If you can post the "More Info" text from all the LUNs, this might help.

KurtG
Jason Keane
Occasional Advisor

Re: MSA 1000 Performance Problem

Hi Folks,

Thanks for the offering of help so far. Here's some info for you

1. We've checked the queues, etc. with Performance Monitor and all seems within normal limits.

2. The controller is not busy doing anything, as we are testing on a RAID 0 (single disk).

But now for the spanner... we upgraded one of the servers to Windows 2003. After the upgrade it was still slow (about 12MB/s). We then turned on the Enable Write Caching option for the MSA in Device Manager (Disk Drives -> Policies) and we could write at about 70-80MB/s! Happy days. However, we need to turn on this option for Windows 2000. We are running SP4, and the hotfix described here is installed (by default with SP4):

http://support.microsoft.com/default.aspx?scid=kb;en-us;811392

When we use the dskcache.exe program to turn on the cache it says it's enabled, but there's no performance increase. It really looks now like the write cache is not enabled. We have opened a call with MS to see if there's something here.
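
To make the comparison repeatable outside Explorer, something like the following sketch could be used. It times a sequential write twice: once flushing to disk after every block (roughly how an uncached, write-through path behaves) and once flushing only at the end. The file name, block size, and total size are placeholders I picked, and `os.fsync` is only an approximation of what the driver-level cache setting does, so treat the numbers as indicative at best.

```python
import os
import tempfile
import time


def timed_write(path: str, total_mb: int = 16, block_kb: int = 64,
                sync_each_block: bool = False) -> float:
    """Write total_mb of zeros to path and return the throughput in MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
            if sync_each_block:
                # Force this block to the device before writing the next one.
                f.flush()
                os.fsync(f.fileno())
        # Always flush once at the end so both modes finish with data on disk.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Placeholder target: point this directory at a SAN volume to test it.
    target_dir = tempfile.gettempdir()
    path = os.path.join(target_dir, "write_test.bin")
    try:
        print(f"flush at end:     {timed_write(path):.1f} MB/s")
        print(f"flush each block: {timed_write(path, sync_each_block=True):.1f} MB/s")
    finally:
        if os.path.exists(path):
            os.remove(path)
```

If the two figures come out close to each other on the SAN volume but far apart on the internal array, that would be consistent with the write cache not actually being used on the SAN path.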

Surely I am not the only one to see this? Are people just running with a slow SAN and not noticing?

Ta
Jason.

Re: MSA 1000 Performance Problem

Hi Jason,

We're having a similar issue with our MSA 1000 SAN attached to HP ProLiant DL380s.
- MSA firmware 4.32; 512 MB cache (2 controllers)
- 14 HDD 73GB 15K rpm.
- QLogic QLA23xx FCA

I've noticed little to no performance improvement when running an Oracle database striped over the 14 disks compared to one on just a local disk. So far, I've tried different RAID scenarios and cache settings. I'll check out the results with IOmeter.

Denys

Re: MSA 1000 Performance Problem

update
John Kufrovich
Honored Contributor

Re: MSA 1000 Performance Problem

Denys,
What is your RAID level on that 14-disk array? What is the stripe size? You mentioned Oracle; what is your block size?




Re: MSA 1000 Performance Problem

John,

I'm running a batch job that generates about 30GB of redo in 2 hours. The I/O profile when I use dedicated disks for undo, redo, tables, index and temp, etc. is 100% busy on redo and undo. Total IO = > 60% write I/O.

In the Statspack report, log file sync, log buffer and db file waits account for 99% of the wait time, with a cpu/wait ratio of 30/70. On waits it's about 50/50 (redo write waits vs db file read waits).

One of the nuttiest configurations I tried was to stripe RAID0 over 14 disks for redo only, to get the log file sync waits down. This did not improve things much. The application commit rate is very high, but this cannot be altered.

I have not yet tried different block sizes to reduce the db file waits. It's at 8K.

Remarkably enough, the execution time of the batch is about the same for the MSA1000 SAN (14 disks in different RAID1+0 configs) as when using 2 local disks on the DL380 (Windows 2003 SE).
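
As a quick sanity check on the figures above, the sketch below converts the quoted 30GB of redo in 2 hours into an average MB/s, and shows how a per-commit wait caps the commit rate of a session. The 5 ms wait is a made-up illustrative value, not a number from my Statspack report, and the function names are hypothetical.

```python
# Sanity-check arithmetic for the redo figures quoted above.
# The per-commit wait used in the example is an illustrative assumption.


def average_rate_mb_per_s(total_gb: float, hours: float) -> float:
    """Average throughput in MB/s for total_gb written over the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_gb * 1024.0 / (hours * 3600.0)


def max_commits_per_second(wait_ms: float, sessions: int = 1) -> float:
    """Upper bound on commits/s when each session waits wait_ms per commit."""
    if wait_ms <= 0 or sessions < 1:
        raise ValueError("wait_ms must be positive and sessions >= 1")
    return sessions * 1000.0 / wait_ms


if __name__ == "__main__":
    print(f"average redo rate: {average_rate_mb_per_s(30, 2):.1f} MB/s")
    print(f"commit ceiling at 5 ms per commit: {max_commits_per_second(5.0):.0f}/s per session")
```

The average redo rate works out to only a few MB/s, so the batch looks bound by the latency of each small synchronous write rather than by bandwidth. That would explain why adding spindles changes so little.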
KurtG
Regular Advisor

Re: MSA 1000 Performance Problem

Are you sure the LUN(s) span all 14 disks? I have seen situations where an array that was previously expanded with more disks did not reconfigure the LUNs to span all the new physical disks.

KurtG