HPE StoreVirtual Storage / LeftHand

High latency, low IO's, MBps

Fred Blum
Regular Advisor

High latency, low IO's, MBps


I have tested with IOmeter and SQLIO against the server's local HD and the P4300 2-node SAN; HD RAID 5, nRAID 10.

I tested first against the local HD, then against the SAN without jumbo frames/flow control/static trunk TRK1/LACP/RSTP, and then with jumbo frames, flow control, static trunk TRK1, LACP and RSTP. In both cases the NICs were teamed with TLB (=ALB). The first time, the SAN disk was formatted NTFS with the default allocation unit size.
Random 32, 64, 128 and 256 KB write IOs were all better against the hard disk; the exception was 8 KB random writes, which were 47% worse. Sequential write IOs were all around 22% worse. Small random read IOs were better (8 KB: 8875, 344% better), above 128 KB worse; sequential read IOs were all worse.

I tried improving things with jumbo frames, flow control, a static trunk, LACP and RSTP, with the disk now formatted with a 64 KB allocation unit size. Small random writes improved slightly, random writes of 32 KB and above got worse. I am seeing worse performance with small random reads, improving at 128 KB and above; same picture with sequential reads. See the Excel sheet.

I had expected to see an improvement across the board. Was I wrong to assume that?

What is the performance you are achieving? The SQLIO test definition is also in the Excel sheet.

Is there a way to monitor the HP 2910al switch performance?

TIA,
Fred


29 REPLIES
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

Sending the Excel sheet again.
mggates
Occasional Visitor

Re: High latency, low IO's, MBps

I am seeing similar disappointing results. My client has a P4300 SAS 7.2 SAN. Both units are using 802.3ad link aggregation into a dedicated VLAN on a pair of Cisco 3750s. I don't think the network is a limiting factor, unless it has to do with jumbo frames. A simple run of the ATTO disk benchmark on a server with an attached SAN volume shows performance maxing out around 120 MB/s. The same server running the benchmark on a local RAID array approaches 400 MB/s. I am struggling in my search for tuning documents and for what my expectation of performance should be.
mggates
Occasional Visitor

Re: High latency, low IO's, MBps

Maybe I answered my own question. Regarding bits and bytes: my server in question only has a single 1 Gb NIC into the storage VLAN. If my understanding is correct, that should top out at 125 MB/s? If I add a NIC and bundle them, should I expect to see disk speeds approaching 250 MB/s?
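
(As a rough, purely illustrative sketch of that bits-versus-bytes arithmetic in Python; the link speeds are the assumed numbers, and the per-flow caveat is explained in the replies below.)

    # Rough line-rate arithmetic for a 1 Gbit/s iSCSI link (illustrative only).
    def link_limit_mb_per_s(gbit_per_s: float) -> float:
        # 1 Gbit/s = 1e9 bits/s; divide by 8 for bytes, by 1e6 for MB.
        return gbit_per_s * 1e9 / 8 / 1e6

    print(link_limit_mb_per_s(1.0))      # ~125 MB/s for a single 1 Gb NIC
    print(link_limit_mb_per_s(1.0) * 2)  # ~250 MB/s aggregate over two NICs, but a
                                         # single iSCSI session still rides one link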
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

@mggates

In earlier threads I've read that around 125 MBps was the maximum, but I am not achieving that with Advanced Load Balancing.

Have a look at this link: Bonding versus MPIO performance http://blog.open-e.com/bonding-versus-mpio-explained/
Damon Rapp
Advisor

Re: High latency, low IO's, MBps

With 802.3ad you can really only get 125 MB/s per host. Each NIC on the LH box can only talk to one NIC in the server. So on an LH node you could get 250 MB/s of throughput, but you would need 2 clients to test that out (125 MB/s per client).

This of course assumes that you have enough disks in the right RAID configuration to be able to generate 250 MB/s of throughput.
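
(To illustrate why one client/node pair stays on a single link, here is a small Python sketch of the kind of per-flow hashing that 802.3ad-style bonding does. The hash shown is a stand-in, not the actual switch or OS policy; treat the details as assumptions.)

    # Simplified illustration: LACP/802.3ad picks one physical link per "flow"
    # (e.g. per source/destination address pair), so one client talking to one
    # node never exceeds a single link's ~125 MB/s, however many links are bonded.
    def pick_link(src_ip: str, dst_ip: str, num_links: int) -> int:
        return hash((src_ip, dst_ip)) % num_links  # stand-in for the real hash policy

    print(pick_link("10.0.0.10", "10.0.0.20", 2))  # one pair -> always the same link
    print(pick_link("10.0.0.11", "10.0.0.20", 2))  # a second client may use the other link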

To get more throughput to the clients, you could bond interfaces on the clients and then have them access multiple LH nodes via Network RAID.

In my SAN setup, all LH nodes and servers are using 802.3ad and have at least 2 bonded NICs.

Thanks,

Damon
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

Hi Damon,

I have a 2-node 7.2TB Starter SAN, which has 2 x 8 disks, still in HD RAID 5 (thinking about changing that to RAID 10 for performance) and nRAID 10. As a rule of thumb I've read that this should be able to produce 16 x 150 = 2400 IOPS.

If I follow the calculation (IOPS per disk * number of disks * segment size in KB) / 1024, I should be able to reach 150 MBps.
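
(A quick worked version of that rule of thumb in Python; the 150 IOPS per spindle and the 64 KB segment size are the assumptions from above.)

    # Rule-of-thumb estimate for the 2-node P4300 (assumed numbers from the post).
    iops_per_disk = 150   # rough figure for one SAS spindle
    disks = 16            # 2 nodes x 8 drives
    segment_kb = 64       # segment size in KB

    total_iops = iops_per_disk * disks     # 2400 IOPS
    mbps = total_iops * segment_kb / 1024  # = (per-disk IOPS * disks * segment size) / 1024
    print(total_iops, round(mbps, 1))      # 2400, 150.0 MBps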

Did you see my SQLIO results? The maximum was 110.01 MBps at 1760.16 IOps with 64 KB random read IOs. This was with a 64 KB allocation unit size, jumbo frames, flow control and RSTP.
With the default W2008 R2 allocation unit size, no jumbo frames, no flow control and no RSTP it was 112.96 MBps / 1807.4 IOps, both cases with ALB. So it actually fell.

I had expected to see an overall improvement from following the Networking Best Practices Guide. The improvement is seen only when writing 8 KB and 32 KB random IOs and reading sequentially, probably due to the 64 KB allocation unit size. But 64 KB random write IOs fall. That is not what I had expected, and it is why I am questioning my configuration. Were my assumptions of an overall improvement with jumbo frames/flow control/RSTP/static LACP trunk wrong?

I am thinking of testing again without jumbo frames, and testing with HD RAID 10, before deciding on the production setup.

Pointers appreciated.






Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

When reconfiguring one node (I removed it from the cluster and management group), the remaining node had to be changed to nRAID 0.
I copied the 25 GB SQLIO test file back over and noticed that the transfer speed doubled, from 75 to 150 MB/s.
So I have half the spindles, 8 instead of 16, but no Network RAID, and still the speed doubles. Is there such a high price in performance for nRAID?
teledata
Respected Contributor

Re: High latency, low IO's, MBps

Hmmm,

That doesn't sound correct...

I'd start by enabling SNMP on your switch, then collect interface statistics:

Packets in/out
Errors in/out
Dropped packets in/out
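
(If it helps, a minimal sketch of polling those counters over SNMP with Python/pysnmp; the switch address, community string and interface index are placeholders. Pointing your usual monitoring tool at the switch works just as well.)

    # Minimal SNMP poll of interface error/discard counters (pysnmp, SNMPv2c).
    # host, community and if_index are placeholders for the 2910al port of interest.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    host, community, if_index = "192.168.1.1", "public", 9

    for counter in ("ifInErrors", "ifOutErrors", "ifInDiscards", "ifOutDiscards"):
        err, status, idx, var_binds = next(getCmd(
            SnmpEngine(), CommunityData(community, mpModel=1),
            UdpTransportTarget((host, 161)), ContextData(),
            ObjectType(ObjectIdentity("IF-MIB", counter, if_index))))
        if err or status:
            print(counter, "query failed:", err or status.prettyPrint())
        else:
            for name, value in var_binds:
                print(name.prettyPrint(), "=", value.prettyPrint())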


http://www.tdonline.com
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps


I have just changed the HD RAID 5 to RAID 10 and did a SQLIO test with Network RAID 0. The 8 KB random writes improved from 2471.28 IOs/sec (19.30 MBs/sec) to 13450.80 IOs/sec (105.08 MBs/sec).
The volume is currently restriping; I will also test with Network RAID 10, expecting to see a drop back to around 20 MBps.
I will try to find out how to monitor the 2910al switch.




Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

After setting nRAID 10 and letting the restriping finish:
Writing 8 KB random IOs
Throughput metrics:
IOs/sec: 4818.04
MBs/sec: 37.64

That is a drop from
IOs/sec: 13450.80
MBs/sec: 105.08
with no Network RAID.
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

I had a look at the port counters and found that the error counters are mostly zero. There are Tx drops, but way below HP's rule of thumb of 1 in 5000.

Switch 1
Port 5 (server ALB slave NIC connected): no errors
Port 9 (SAN node 1 ALB slave NIC connected):
Bytes Tx 1,706,663,918; Unicast Tx 132,150,964; Bcast Tx 267,65
Drops Tx 183
Port 15 (SAN node 2 ALB slave NIC connected):
Bytes Tx 31,882,815; Bcast Tx 132,883,178; Unicast 268,263
Drops Tx 11

Strangely, the Trk1 ports show flow control off, while it is enabled in the config menu. According to the manual this happens when the port on the other side is not configured for flow control. Guess what: the connected Trk1 ports on switch 2 all show flow control on! Contradictory.

Switch 2 has no drops on the SAN node ports.
Server NIC port:
Bytes Tx 1,290,285,957; Bcast Tx 316,230,605; Unicast Tx 201,778
Drops Tx 3245
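
(Just to sanity-check that 1-in-5000 rule against the switch 1 / port 9 numbers, reading the Unicast Tx figure as the packet count, which is an assumption about how the ratio is meant:)

    # Quick check of the "1 drop in 5000 packets" rule of thumb (illustrative).
    def drop_ratio_ok(drops: int, packets: int, threshold: float = 1 / 5000) -> bool:
        return packets > 0 and drops / packets <= threshold

    # Switch 1, port 9 (SAN node 1): 183 drops against 132,150,964 unicast packets.
    print(183 / 132_150_964)                # ~1.4e-06, i.e. roughly 1 in 720,000
    print(drop_ratio_ok(183, 132_150_964))  # True -> well under the 1-in-5000 threshold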

Should I conclude that the overhead of network Raid 10 is the reason for the complaints about the P4300 performance?
teledata
Respected Contributor

Re: High latency, low IO's, MBps

Volume  Access Specification          Total IOps  Read IOps  Write IOps  MBps
NR-0    8K; 55% Read; 80% random             842        462         380   6.6
NR-10   8K; 55% Read; 80% random             513        282         230   4.0
NR-0    16K; 66% Read; 100% random           923        619         305  14.4
NR-10   16K; 66% Read; 100% random           485        325         160   7.6
NR-0    64K; 66% Read; 100% random           470        315         155  29.4
NR-10   64K; 66% Read; 100% random           304        204         100  19.0
NR-0    4K; 75% Read; 80% random             829        621         207   3.2
NR-10   4K; 75% Read; 80% random             606        455         151   2.4
NR-0    32K; 55% Read; 80% random            541        297         244  16.9
NR-10   32K; 55% Read; 80% random            377        207         170  11.8

I ran a quick test... All I had handy, though, was a pair of VSAs (on ESXi 3.5; each VSA has 16 x 500 GB SATA drives), so there is a lot more overhead than with a physical node, but even here you can see that the drop in performance isn't as large as you are seeing in your tests...
http://www.tdonline.com
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

Thanks for the effort. IMHO the IOMeter access specification with 55% reads masks the outcome, though a significant drop is already noticeable. With 100% 8 KB random writes (SQL/Exchange server) the Network RAID overhead will be more apparent. Our SAN is intended as the target for our Hyper-V SQL 2008 server, and mixed reads/writes approach that reality better.

The "flow control off" issue on switch 1 is now gone, as I exchanged the dual-personality ports for 10/100/1000 ports on the 2910al.

I have attached the results of my SQLIO tests so far on a P4300 7.2TB 2-node system: ALB, no jumbo frames, no flow control, no trunk versus ALB, jumbo frames, trunk, flow control and RSTP; HD RAID 5 versus RAID 10; Network RAID 0 versus Network RAID 10.

Would there be an improvement in sequential reads and writes when adding a third node? And of what order?

TIA.
AuZZZie
Frequent Advisor

Re: High latency, low IO's, MBps

Did you ever get anywhere on this?

I'm currently looking into the P4300 solution, but all I'm finding are people complaining about the performance of the network RAID (the whole reason to purchase the SAN).
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

@Auzzie

We are currently using HP's High Availability Bundle Midrange Rack HA. The P4300 is configured RAID 10, nRAID 10 with ALB. I have compared my test results with those provided by HP and seen comparable results. The bottleneck is the P4300's two nodes for load balancing. Adding a third or fourth node will lead to a 50% or 100% performance increase respectively, according to HP's information. Basically, two nodes is a poor man's solution.
I am currently running a W2008 R2 fail-over cluster with a W2008 R2 Hyper-V server running a Progress database server. Performance is acceptable. We are going to add more nodes before going live with further Hyper-V servers (SQL Server, RDS, SBS server).

The SAN capabilities in combination with W2008 R2 Hyper-V are a definite plus. Two nodes is not the recommended config and a big question mark for performance-critical database servers. In such instances multiple P4500s with 10Gb ports, or a server with solid state disks, may be a better solution.

Re: High latency, low IO's, MBps

We have a two node P4300 cluster with ALB on redundant HP 2910al switches. Clients are two DL380 G7 with ESXi 4.1.

Attached is a screenshot of a robocopy job and of the HP SAN Performance Monitor. As you can see we are able to reach 122 MByte/s (the maximum for a 1 Gbit/s link is 125 MByte/s).

Source of the robocopy job is a Win2003 server using the MS iSCSI initiator; target is a Win2003 server on VMFS.

All volumes are Network RAID 10 (volumes mirrored).

Flow control, jumbo frames and Rapid Spanning Tree are enabled.

Of course this is not an IO test, but it shows that a P4300 cluster can operate at the maximum throughput limit of a 1 Gbit/s link.
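
(Not the same as the robocopy job, but a rough way to reproduce this kind of sequential-throughput check is a small timed copy, for example the Python sketch below; the paths are placeholders, and cached reads can flatter the result.)

    # Rough sequential-throughput check: time a large file copy onto the SAN volume
    # and report MB/s. Paths are placeholders for a local source and an iSCSI target.
    import os, shutil, time

    src = r"C:\temp\big_test_file.bin"   # large file on local disk (placeholder)
    dst = r"E:\big_test_file.bin"        # target on the SAN volume (placeholder)

    start = time.time()
    shutil.copyfile(src, dst)
    elapsed = time.time() - start

    size_mb = os.path.getsize(src) / 1e6
    print(f"{size_mb:.0f} MB in {elapsed:.1f} s -> {size_mb / elapsed:.1f} MB/s")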

Thomas

Re: High latency, low IO's, MBps

@teledata

Comparing your results with a 2 x P4300 G2 cluster; each node has 8 x 450 GB SAS, node RAID 5, Network RAID 10 (mirror).

Using IOMeter on a 5 GB raw disk via the MS iSCSI initiator (total IOPS, read IOPS, write IOPS, MBps):

4k, 75% Read, 80% Random: 2244,1684,559,8
8k, 55% Read, 80% Random: 1886,1038,848,14
16k, 66% Read, 100% Random: 2193,1443,750,34
32k, 55% Read, 80% Random: 1456,801,654,45
64k, 66% Read, 100% Random: 1192,786,405,74

Thomas
mggates
Occasional Visitor

Re: High latency, low IO's, MBps

Thomas,
what does your server-to-SAN connection look like? I see the G2 cluster has 2 internal 10/100 cards. Hard to believe you're seeing that performance out of those cards.
AuZZZie
Frequent Advisor

Re: High latency, low IO's, MBps

I think you're mistaken. The G2 has 2 x 10/100/1000 NICs per node.

Re: High latency, low IO's, MBps

I have two HP 2910al switches with a 10 Gbit interconnect. The two P4300 nodes have 1 Gbit NICs, with each NIC connected to one of the switches, using ALB to provide load balancing. Quite simple. Maybe I can create a Visio drawing to show our environment.

Thomas

Re: High latency, low IO's, MBps

See the attached file for my setup. One node is located on the ground floor, as are the VMware servers. The other node is located on the first floor, connected over fiber with copper-to-fiber converters.

All connections are 1 Gbit links. The only 10 Gbit link is the interconnect between the switches.

Thomas
Fred Blum
Regular Advisor

Re: High latency, low IO's, MBps

@AuZZZie

This is the benchmark data I received from the HP partner:

4K blocks, 66/34 read/write, random IOs

HP P4000 G2 Products
Model BK716A P4300 G2 7.2TB SAS Starter SAN

Capacity in GB (with Network RAID 0)
RAID 0: x
RAID 5: 5,694
RAID 6: 4,546
RAID 10: 3,420

Performance in IOPS (with Network RAID 10)
RAID 0: x
RAID 5: 2,200
RAID 6: 1,700
RAID 10: 2,500

With HD RAID 10 and nRAID 10 I measured 2960 IOPS with IOMeter at 67/33 read/write, 4K random IOs.




twg351
Occasional Advisor

Re: High latency, low IO's, MBps

Wow, I read this thread months back as I was beginning to work on my SAN/Hyper-V project. I hoped all of your efforts would help me... and they did. BUT sadly it looks like we're all in the same boat.

 

I have all the best practices done (e.g. 4 x 1Gb NICs using ALB, the server uses MPIO, flow control, jumbo frames, 2 switches, fully isolated networks, etc.). I am also seeing 30-60 MB/s on the SAN when it's Network RAID 10 (and hardware RAID 5). I simply cannot drop the hardware RAID down to 10 because it cuts out too much disk space. And I obviously need RAID 10 for redundancy.

 

I see ~40 MB/s on average for the SAN vs ~400 for my local disk. That is crazy. I am amazed that Thomas Halwax somehow got 125 MB/s... the only difference that I can see is the 10Gb switch interconnect.

 

Fred's comment on adding a 3rd and/or 4th node to increase performance by 50%-100% is noted, but that's not in my budget... likely forever. So I'll give my current SAN setup a try and see how it goes... I don't think my SQL database is going to work at this lower speed, and odds are I'll end up removing the SQL cluster and having a non-clustered SQL server using local disks for speed instead of the SAN disk (thank goodness I have enough disk space on my local server for my SQL DBs).

 

This is unfortunate. Has anyone come up with a solution? Anyone besides Thomas, that is, as I still have no idea why his setup gets to 125 MB/s while all the others seem stuck in the 30-60 MB/s range.

twg351
Occasional Advisor

Re: High latency, low IO's, MBps

I think I am re-thinking this now ... edit ...

 

I have been running more and more IO tests, and the more I run, the more I think:

- Using IOMeter's generic scans is not overly helpful; I only got useful information when I customized the scans (read vs write, random vs sequential, mixed R+W, etc.)

- Reads from the SAN are pretty quick. I was even getting into the 100-110 MB/s range... much better than 40 MB/s

- Comparing the SAN to the local RAID 10 array was not valid, as I had not run a full range of IO tests. Now that I have, I can say that in general it's as simple as this: the local disk is MUCH faster on writes, and they are about the same on reads. Changing the hardware RAID from 5 to 10 on the P4300 would likely help here, but it's not something I am going to redo at this point, as I think the current setup will do fine.