StoreVirtual Storage
Steve_NetAdmin
Occasional Advisor

4x P4300G2 SAS performance worries.

Hey folks,

 

I'm currently on a saga at work to see if we're getting what we paid for in our 4-node P4300G2 cluster. Using the CMC performance monitor I've rarely seen IOPS go over 2000 under heavy load, IOMeter 64K sequential reads peaked at 73.86 MB/s, and anecdotal benchmarks of file transfers of known sizes from VM to VM on the cluster peaked at 34.98 MB/s.
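
A quick back-of-envelope check I've been doing (the ~1 Gb/s per active path and ~10% TCP/iSCSI overhead figures are my assumptions, not measured values): those throughput numbers fit inside a single gigabit link, and a VM-to-VM copy both reads from and writes to the SAN at the same time, so it uses roughly twice the iSCSI bandwidth per byte copied, which may be why ~35 MB/s is about half of ~73 MB/s. A minimal sketch of that arithmetic:

GBE_LINE_RATE_MB_S = 1000 / 8        # 1 Gb/s expressed as 125 MB/s
PROTOCOL_OVERHEAD = 0.10             # assumed TCP/IP + iSCSI framing overhead

def usable_throughput_mb_s(active_paths: int = 1) -> float:
    """Rough usable iSCSI throughput for a number of active 1 GbE paths."""
    return active_paths * GBE_LINE_RATE_MB_S * (1 - PROTOCOL_OVERHEAD)

print(usable_throughput_mb_s(1))   # ~112 MB/s: 73.86 MB/s read fits within one link
print(usable_throughput_mb_s(2))   # ~225 MB/s: what two teamed/MPIO paths could allow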

 

I know I have some networking issues, since the ProCurve 2810-24Gs apparently can't team connections across separate physical switches, but are there more fundamental things I should be looking for? On the nodes the NICs are teamed with Adaptive Load Balancing.

 

We run an iSCSI setup from five ProLiant servers.

15 REPLIES
oikjn
Honored Contributor

Re: 4x P4300G2 SAS performance worries.

I can't remember what disk config the 4300 is, but it does sound like a switch issue.

 

Check out some more of the VSA stats... watch your individual node latency on read/write and queue depth. Keep looking at other stats to try and find where the bottleneck is. I bet it could be the switches. I can get >73 MB/s from two VSAs fed using Network RAID 1 and hardware RAID 1 with two 7200 RPM SATA disks! Sustained IOPS around 100 and peaks around 1500.
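
A rough way to read those counters (a Little's law sketch of my own, not an HP tool, and the sample numbers are made up for illustration): outstanding I/Os on a node are roughly IOPS times average latency, so a node whose queue depth climbs while its IOPS stays flat is the likely bottleneck.

def implied_queue_depth(iops: float, latency_ms: float) -> float:
    # Little's law: outstanding I/Os ~= IOPS x average latency (in seconds)
    return iops * (latency_ms / 1000.0)

def implied_latency_ms(iops: float, queue_depth: float) -> float:
    return 1000.0 * queue_depth / iops if iops else float("inf")

# Illustrative numbers only: 500 IOPS at 40 ms implies ~20 outstanding I/Os,
# and 64 outstanding I/Os against 2000 IOPS implies ~32 ms average latency.
print(implied_queue_depth(500, 40))     # 20.0
print(implied_latency_ms(2000, 64))     # 32.0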

Steve_NetAdmin
Occasional Advisor

Re: 4x P4300G2 SAS performance worries.

The individual nodes are physical RAID 5, which gives slower write speeds than read speeds, and the volumes use Network RAID 10.
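
To get a feel for what that RAID 5 write penalty costs, here is a back-of-envelope estimate of the cluster's spindle-bound random IOPS. Everything in it is an assumption on my part (8 x 15K SAS spindles per node at ~175 random IOPS each, a RAID 5 write penalty of 4, and Network RAID 10 sending each host write to two nodes), and controller cache will make the real numbers look better than this:

NODES = 4
SPINDLES_PER_NODE = 8       # assumed P4300 G2 drive count
IOPS_PER_SPINDLE = 175      # assumed 15K SAS figure
RAID5_WRITE_PENALTY = 4     # back-end I/Os per random write on RAID 5

def host_iops(read_fraction: float) -> float:
    raw = NODES * SPINDLES_PER_NODE * IOPS_PER_SPINDLE
    # back-end I/Os generated per host I/O (Network RAID 10 doubles writes)
    cost = read_fraction + (1 - read_fraction) * 2 * RAID5_WRITE_PENALTY
    return raw / cost

for r in (1.0, 0.7, 0.5):
    print(f"{int(r * 100)}% read: ~{host_iops(r):.0f} IOPS")

Under those assumptions a 50-70% read mix lands in the 1200-1800 IOPS range before cache, which is not far from the ~2000 IOPS we see under heavy load.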

 

Hopefully in the next couple of weeks my documentation of the current setup and planned changes will magically finish itself so I can change how things are set up, since we don't have layer 3 switches. We use three ProCurve 2810-24Gs, so with the P4300G2s and the ProLiant servers I think I can set it up so that the vmNetwork and iSCSI NICs are teamed to the same physical switch, and use Fault Tolerance (ESXi 5) on our critical production servers. Hopefully then we'll still have some redundancy while allowing Adaptive Load Balancing to do its job, along with Round Robin multipathing in ESXi.

 

It's amazing how much of a learning experience this is. Lately I'm reading up on VLANs, which I have to get right since we use a VoIP setup. We weren't initially set up so that regular virtual machine traffic is on a separate VLAN from the iSCSI traffic; the VLANs were defined but never implemented, except for VoIP, which was set up and given higher priority than the rest.

oikjn
Honored Contributor

Re: 4x P4300G2 SAS performance worries.

Write speeds shouldn't be that much slower considering the cache on the VSAs, unless your load stays that heavily mixed for long enough to blow through the cache.

kghammond
Frequent Advisor

Re: 4x P4300G2 SAS performance worries.

Hello,

 

We also had concerns with the performance of a 4 node p4300G2 SAS cluster...

 

We are in the same boat, RAID 5 with network RAID 10.  We spent an extensive amount of time with HP LeftHand support, HP Sales Engineers and other escalated HP staff.

 

The best conclusion we have at this point is that performance is heavily dependent on your workload.  It turns out our workload is about 50% read, 50% write.

 

With discussions from other SAN engineers and other research, we are suspicious that a heavy write workload performs significantly more poorly than HP represents when using RAID 5.

 

We have been told numerous times that the rule of thumb is RAID 10 gives about a 10% performance boost at the cost of half your disk space.

 

We just started testing a VDI workload, 95% write and 5% read on a RAID 10, p4500 SATA cluster and performance and IOPS are a lot better, hitting upwards of 4000 IOPS with almost no queuing.

 

When our additional two-node P4300G2s arrive we plan to do some IOmeter tests with varying read/write ratios and RAID 5 versus RAID 10. Our gut feel is that on a heavy write workload (not sure what heavy means yet, maybe 50% or greater) RAID 10 IOPS will be upwards of 2x the RAID 5 figure.
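
As a rough preview of what that sweep might show, here is my own spindle-bound model using the textbook write penalties (4 back-end I/Os per random write on RAID 5, 2 on RAID 10), with Network RAID 10 doubling every write; cache and controller CPU will shift the real numbers:

def relative_iops(read_fraction: float, write_penalty: int) -> float:
    # back-end I/Os per host I/O, with Network RAID 10 doubling every write
    cost = read_fraction + (1 - read_fraction) * 2 * write_penalty
    return 1.0 / cost

for r in (0.9, 0.7, 0.5, 0.15):
    ratio = relative_iops(r, write_penalty=2) / relative_iops(r, write_penalty=4)
    print(f"{int(r * 100)}% read: RAID 10 / RAID 5 ~ {ratio:.2f}x")

That model puts the RAID 10 advantage at roughly 1.3x for a 90% read mix, growing toward 2x as the write fraction increases, which lines up with the gut feel above.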

 

There is nothing conclusive in this post, but it feels like there is some misconception about RAID 5 in the LeftHand world, and we are working to understand this more clearly so we can make better decisions on how to plan for both space and IOPS with a LeftHand SAN.

 

Kevin

Rukam
Visitor

Re: 4x P4300G2 SAS performance worries.

Network RAID 5 is not at all recommended in an environment that needs more write performance, because the nodes have to do extra parity work when writing the data and rebuilds take longer. SAN/iQ treats volumes as Network RAID 10 by default, so if we use Network RAID 5 the nodes have two extra jobs:

1. Converting the Network RAID 10 data that SAN/iQ works with into Network RAID 5 while writing it to the volume, and
2. Writing the parity.

30% of the top layer of an NR5 volume is always NR10 (by design), as SAN/iQ reads and writes data in NR10.

This is a tedious job for the SAN, and compared to NR10, write performance on NR5 is poor.

Hence NR10 is the best recommendation for SANs that need more write performance.

Sorry for my bad english.

I hope that helps.

ryan_1212
Advisor

Re: 4x P4300G2 SAS performance worries.

Your numbers are similar to what I see on a 4-node P4300G2 cluster, maybe even slightly better. I sometimes see reads as low as 15 MB/s. Oddly, my write speed is usually higher, at around 50 MB/s. My load is much more reads than writes. I have RAID 5 in the nodes and RAID 10 at the network level.

 

I do have about 55 VMs running on this storage, none of which are particularly high I/O.

kghammond
Frequent Advisor

Re: 4x P4300G2 SAS performance worries.

We got a new two-node P4500 G2 SAS Starter SAN in and I wanted to do some more testing. It seems there is about a 2.5x increase in IOPS using RAID 10 over RAID 5 for the hardware RAID.

 

This is a two-node cluster (8 drives, 15K 450 GB SAS). We ran the IOmeter workload with the nodes configured as RAID 5, then again as RAID 1+0.

 

Our test setup was a Windows 2008 HP DL385 with two gigabit NICs configured for MPIO using the HP DSM. At the time of our testing we were running SAN/iQ 9.5. The LUN was created as a Network RAID 10 LUN, fully provisioned.

 

Our IOmeter workload was configured similarly to the original tests:

  • Disks were left unformatted and the physical drive was selected
  • 1 worker thread
  • 8,000,000 sectors
  • 64 outstanding I/Os

For each test the access specifications were set up as follows:

  • 100% access
  • Request size 4K
  • 100% random

Tests were run for 45 minutes with a 30-second ramp-up time. I also did the right-click Run as Administrator when running the tests in Windows 2008; apparently you can get inaccurate results without doing this.

The results were as we suspected but far from what HP engineers advertised to us. I am curious what the real-world implications are, but we have proven with one of our workloads that RAID 1+0 is substantially faster, and now IOmeter appears to back this up.

 

4K, 70% Read, RAID 5 — 2792 IOPS, 10.91 MB/s, 18.24 Avg I/O, 5.66% cpu
4K, 70% Read, RAID 10 — 6827 IOPS, 26.67 MB/s, 7.94 Avg I/O, 11.59% cpu

 

4K, 50% Read, RAID 5 — 1952 IOPS, 7.63 MB/s, 38.21 Avg I/O, 4.36% cpu
4K, 50% Read, RAID 10 — 5178 IOPS, 20.23 MB/s, 9.7 Avg I/O, 8.35% cpu

 

4K, 15% Read, RAID 5 — 1437 IOPS, 5.61 MB/s, 88.42 Avg I/O, 2.67% cpu
4K, 15% Read, RAID 10 — 3703 IOPS, 14.47 MB/s, 13.34 Avg I/O, 5.82% cpu
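
As a cross-check of my own arithmetic: the MB/s column lines up with IOPS x 4 KiB (IOmeter appears to report binary megabytes), and the RAID 10 vs RAID 5 ratio can be read straight off the IOPS column, coming out around 2.4-2.7x at every mix:

results = {
    # read %: (RAID 5 IOPS, RAID 10 IOPS), taken from the runs above
    70: (2792, 6827),
    50: (1952, 5178),
    15: (1437, 3703),
}

BLOCK_BYTES = 4 * 1024

for read_pct, (r5, r10) in results.items():
    print(f"{read_pct}% read: RAID 5 ~{r5 * BLOCK_BYTES / 2**20:.1f} MB/s, "
          f"RAID 10 ~{r10 * BLOCK_BYTES / 2**20:.1f} MB/s, "
          f"ratio ~{r10 / r5:.2f}x")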

 

I am not sure if I am missing anything or did anything wrong, but the tests look pretty consistent. I was expecting the heavier write mixes to deliver fewer IOPS, as they did, but I was expecting more of a hit on writes in RAID 5. RAID 5 writes are supposed to be more expensive than RAID 10 writes, so you would think a 15% read pattern would take a disproportionately larger hit, but the results above look very linear.

 

I suppose this may have something to do with the way Network RAID 10 works. There might be some write caching going on, and it might be that the write operation to the second node is more expensive than the local RAID 5 write, so the real bottleneck is the latency of committing the write to the second node rather than committing the writes to RAID 5 versus RAID 10.

David_Tocker
Regular Advisor

Re: 4x P4300G2 SAS performance worries.

Something is broken there.

 

We run 2x P4300 in RAID 6.

Network Raid 1.

 

We are using 2x ProCurve 2810 switches with ALB on the P4000s.

Both switches are trunked (LACP) to a C7000 enclosure running HP GbE2c blade switches.

Running mostly dirty old BL460 G1s.

 

I get around (over?) ~100 MB/s using IOmeter to a single blade, from when I set it up. If I do the same thing on a second blade it scales well; I could probably get close to 200 MB/s on reads if I set it up correctly.

 

If you haven't read up on the prerequisites for setting these up, then perhaps you should consult Google.

 

Flow control *must* be on.

 

Jumbo frames make things slower or break stuff in almost all applications. You need some -really- fancy switches to pull this off.

 

Make sure your STP is set up correctly across your infrastructure.

 

Do not overload your switches. If they are running your iSCSI, that is probably all they should be doing, unless they are amazing. (2810s are not, but they should do a hell of a lot better than you are getting.)

 

You *must* run the iSCSI traffic in a separate VLAN at minimum.

 

You *should* set up a gateway on this VLAN to allow the P4000s to report faults via email / SNMP (both is better; the P4000 won't spam you).

 

We are about to change our setup to stacked Cisco 3750s, which should allow us to enable cross-stack LACP. I'm expecting this to raise our write performance to a level similar to our read performance (across multiple servers).

 

Good luck.

 

Remember that changing link configurations on P4000s can be disruptive, so make sure you only stuff around with one node at a time and make sure you have a FOM. Make sure your link paths are redundant to each node or you will be in a world of pain.

 

The P4000s are great for I/O on 1Gb. Not so great for throughput compared to direct-attached I/O. Keep this in mind, avoid swapping, etc., and you can make them perform magic :)

 

Regards.

 

David Tocker.

David_Tocker
Regular Advisor

Re: 4x P4300G2 SAS performance worries.

Just adding to this: with direct-attached RAID 5 storage, performance is generally crap. Add iSCSI on top and the performance is expected to be good?
Regards.

David Tocker
kghammond
Frequent Advisor

Re: 4x P4300G2 SAS performance worries.

Just a quick follow up.

 

My reason for posting our results is that HP engineers continuously preached that the overhead of running hardware RAID 5 versus hardware RAID 10 was at most about a 10% hit in performance (IOPS).

 

When we ran into a real workload that was very write-intensive, hardware RAID 5 did not hold up well. HP insisted that we didn't have enough spindles and needed more. Although we may need more spindles, we did this test to validate whether RAID 10 would give us more bang for our buck versus adding more spindles.

 

Our environment:

  • ALB on all nodes; we used LACP when we were on a single switch
  • All nodes dual-connected into separate HP 5400 series ProCurve switches
  • Dedicated HP 5400 series switches for iSCSI
  • No gateway for iSCSI, layer 2 traffic only
  • A management server in the iSCSI network proxies any management traffic, such as email, DNS, NTP, etc.
  • Flow control is enabled on the HP LeftHand ports
  • No jumbo frames

 

I think that covers the majority of the infrastructure questions.

 

I am not surprised by the results, but they do conflict with what the HP engineers told us. My only surprise is that I expected the hit on IOPS to be negligible at a 70% read ratio; I was not expecting the overhead of RAID 5 to be relatively linear regardless of the read/write ratio.

 

Thank You,

Kevin

 

Gediminas Vilutis
Frequent Advisor

Re: 4x P4300G2 SAS performance worries.

Just my few cents...

 

The 3750 Catalysts have almost no packet buffer at all. I would not recommend this switch for iSCSI (not for heavy iSCSI, at least)...

 

Gediminas

David_Tocker
Regular Advisor

Re: 4x P4300G2 SAS performance worries.

The Packet buffer thing always confuses me.

 

HP recommend the ProCurve 2910al in all their bundles for the P4000:

 

Dual ARM1156T2S @ 515 MHz, 4 MB flash, 1 GB compact flash, 512 MB SDRAM; packet buffer size: 6 MB

 

I am assuming this is a shared packet buffer and not per port, otherwise it is just huge.

 

We get great and consistent performance (heavily loaded) using 2810g switches at half the price:

 

MIPS @ 264 MHz, 16 MB flash, 64 MB SDRAM; packet buffer size: 0.75 MB (Per port? Shared?)

 

Now my understanding is that the 3750 in its various guises has at minimum a 0.75 MB input packet buffer per 4 ports, along with (up to) a 2 MB output buffer per every 4 ports (this is a shared buffer system with minimum values and the ability for each port to 'borrow' from a common pool of buffer memory).

 

This tells me that a minimum of 4.5 MB of receive buffer is *always* available across the switch, which is not too far off what the 'recommended' switch (2910al, 6 MB) has.

 

Add to this the dynamic output buffer, and don't we get a very close (more likely larger) amount of frame buffer on the 3750 compared to the 2910al?

 

Or am I completely off the tracks?
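
Taking the per-4-port figures above at face value for a 24-port 3750 (my reading of the published numbers, not an official Cisco spec), the totals come out like this:

PORTS = 24
PORT_GROUPS = PORTS // 4

c3750_min_input_mb = PORT_GROUPS * 0.75   # dedicated receive buffer across the switch
c3750_max_output_mb = PORT_GROUPS * 2.0   # borrowable output buffer ceiling
procurve_2910al_mb = 6.0                  # quoted shared packet buffer

print(f"3750 minimum input buffer:  {c3750_min_input_mb} MB")   # 4.5 MB
print(f"3750 output buffer ceiling: {c3750_max_output_mb} MB")  # 12.0 MB
print(f"2910al shared buffer:       {procurve_2910al_mb} MB")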

 

If you really want your head to explode, consider the port buffers in a stackwise configuration... Latency higher or lower?

Regards.

David Tocker
Gediminas Vilutis
Frequent Advisor

Re: 4x P4300G2 SAS performance worries.

 

Regarding the 3750: unfortunately I was unable to find an official Cisco spec stating the buffer size for this switch and the lower-end models (I did that research about a year ago; maybe something new has been released since). So what I can tell you is only from my own experience.

 

I never used this switch for iSCSI, but we had these in our WAN environment. A local internet peering exchange was connected to a port on the switch. Packet drops started to appear when outbound average traffic reached ~600 Mbps, all due to a lack of free buffers... For bursty iSCSI traffic the situation could be even worse. I never had a chance to play with Cisco QoS (as far as I remember the packet buffer is by default divided into 4 separate sub-buffers with no sharing between them), since we upgraded the link to 10G.

 

BTW, 3750s in a stack do not do cross-switch LACP trunks; an LACP trunk can only be built within one switch. A 3750 stack is mainly just for ease of management...

 

BR,

Gediminas

David_Tocker
Regular Advisor

Re: 4x P4300G2 SAS performance worries.

Hi There.

 

Apparently LACP can now be configured cross-stack; previous IOS revisions could not:

 

http://www.cisco.com/en/US/products/hw/switches/ps5023/products_configuration_example09186a00806cb982.shtml#lacp

 

 

Link Aggregation Control Protocol (LACP) and Port Aggregation Protocol (PAgP)

EtherChannels have automatic configuration with either Port Aggregation Protocol (PAgP) or Link Aggregation Control Protocol (LACP). PAgP is a Cisco-proprietary protocol that you can only run on Cisco switches and on those switches that licensed vendors license to support PAgP. IEEE 802.3ad defines LACP. LACP allows Cisco switches to manage Ethernet channels between switches that conform to the 802.3ad protocol.

PAgP cannot be enabled on cross-stack EtherChannels while LACP is supported on cross-stack EtherChannels from Cisco IOS Software Release 12.2(25)SEC and later. Switch interfaces exchange LACP packets only with partner interfaces with the active or passive mode configuration. You can configure up to 16 ports to form a channel. Eight of the ports are in active mode, and the other eight are in standby mode. When any one of the active ports fails, a standby port becomes active. Interfaces with the on mode configuration do not exchange PAgP or LACP packets.

Regards.

David Tocker
Prakash Singh_1
HPE Pro

Re: 4x P4300G2 SAS performance worries.

Hi,

 

You can also refer to the HP guided troubleshooting tree for P4000 series performance issues, which has some good solutions.

 

Click on the link:

 

http://h20584.www2.hp.com/hpgt/guides/select?lang=en&cc=us&prodTypeId=12169&prodSeriesId=4118659

 

Then click on: HP P4000 Performance Troubleshooting and white papers

Regards,

PS