HPE StoreVirtual Storage / LeftHand

Wringing the best performance out of a VSA cluster?

BulkRate
Occasional Advisor


Using VSA 9.5 on ESXi 4.1... aside from the obvious, like using dedicated spindles/VMDKs for the storage backing the VSAs' data volumes and setting reservations on host RAM & CPU, what do you do to get them to work as well as they can?

 

Reason I ask is that while I'm not seeing latency spikes like some have, I'm also not seeing very good throughput, or at least not what I'd expect allowing for the limitations of the installation.  I have the following:

 

  • 2 VSAs, each running on a different physical host, with Network Raid 10 selected as the data protection level.
  • Each VSA fronts VMDKs hosted on a dedicated RAID0 volume on each host, composed of two 3TB 10K SAS drives.  For this application, fault tolerance at the node level = OK.
  • The host storage controller is a P410 w/ 1GB FBWC module installed; its charge is reported as OK.
  • The guest that's connected to the volumes (backing them up) is on a 3rd host with dual gigabit NICs dedicated to storage traffic.  My earlier thread on MPIO issues was sorted out: it turned out I hadn't read the manual and needed to manually make the additional connection over the other NIC to enable that storage path.
  • Connectivity between all hosts is in the form of multiple 1 Gb/s links.
  • HP's P4000 DSM is installed and operational.  I have verified that unique, individual NICs along the network-to-storage path are being utilized.
  • CPU utilization on each VSA seems to stay at around 5-30%, average Storage System latency is between 4-8ms.

 

I haven't busted out IOMeter yet, but from several monitoring points it looks like I'm bottlenecked somewhere surprisingly low... I'm getting an average of 20-30 MB/s reported in the CMC per VSA node, which roughly jibes with the job rate estimate provided by BackupExec, the NIC counters on the backup/VSA guests, and the host-level reporting in the vSphere client's performance tab.
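For a rough sanity check on those numbers, here's a back-of-envelope sketch in Python. The per-spindle IOPS figure and the transfer size are assumptions (typical nearline-drive values, not measurements from this cluster):

```python
# Back-of-envelope estimate of what a two-spindle node can deliver.
# Assumed figures (not from this thread): a 7.2K SAS drive handles
# roughly 80 random IOPS; the backup job uses ~64 KB transfers.

SPINDLES_PER_NODE = 2      # RAID-0 pair in each host
IOPS_PER_SPINDLE = 80      # typical 7.2K nearline figure (assumption)
IO_SIZE_KB = 64            # assumed backup transfer size

node_iops = SPINDLES_PER_NODE * IOPS_PER_SPINDLE
node_mb_s = node_iops * IO_SIZE_KB / 1024

print(f"random-I/O ceiling per node: {node_iops} IOPS ~ {node_mb_s:.0f} MB/s")
```

Under those assumptions a fully random workload tops out around 10 MB/s per node, while a purely sequential stream from the same pair could do 150+ MB/s, so 20-30 MB/s would point at a partly random I/O pattern rather than a network limit.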

 

Not running with Jumbo Frames, and it looks like Flow control's not allowed on VSAs.  Any tips from the trenches on this?  Or should I be thanking my lucky stars I'm getting this much as it is?

 

 

6 REPLIES
5y53ng
Regular Advisor

Re: Wringing the best performance out of a VSA cluster?

I think your bottleneck is only having two spindles per host. I have a mix of six-disk hosts and eight-disk hosts, and there's a big difference in performance between the two. Also, recovering from a disk failure with RAID-0 is a tough task. I realize your options are limited with two disks, but if you can afford to lose the space I would switch to RAID-1.

BulkRate
Occasional Advisor

Re: Wringing the best performance out of a VSA cluster?

In this situation, we don't expect to recover from a single disk failure on a given node... we're relying on volume-level Network RAID 10 so that there's a copy of the data on another node in the event that one craps out.  The idea was to maximise fault-tolerance (at the chassis level), storage capacity and IO throughput given the extremely limited # of drives we had available.  This storage is basically considered a near-line tier in our organization... it's the receiving end of our satellite office's Remote Copy schedules and will probably also host an Exchange Personal Archive store in the near future.

 

So I'm a little confused... I'd normally expect to be able to read/write more than 15-30 MB/s from a two-disk RAID0 array running as DAS, let alone two of them coupled with SAN/iQ running the show.  However, I was wrong in my earlier specs... they're SAS, but 7.2K RPM, not 10K.  HP's a little quiet in the spec sheets as to their performance characteristics.  I also didn't make note of the disk transfer size stats during the running job.

 

Might this be a situation where enabling the drive write caches would be a good idea (at least for write operations)?  Or pleading and seeing if we could add another 2 drives to each array and convert the underlying arrays to RAID10?
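The trade-off in that second option can be sketched with the usual textbook ratios (these are generic RAID figures, not measurements from this cluster):

```python
# Quick comparison of the two options mentioned: staying on the
# 2-disk RAID-0 vs. adding two drives and converting to RAID-10.
# Textbook capacity/redundancy ratios only (assumption).

DISK_TB = 3  # 3TB drives, per the thread

def raid0(n_disks):
    # All capacity usable, no local redundancy.
    return {"usable_tb": n_disks * DISK_TB, "disk_failures_survived": 0}

def raid10(n_disks):
    # Half the capacity mirrored; survives at least one disk failure.
    return {"usable_tb": n_disks // 2 * DISK_TB, "disk_failures_survived": 1}

print("2-disk RAID-0 :", raid0(2))
print("4-disk RAID-10:", raid10(4))
```

In other words, a 4-disk RAID-10 would keep the same 6 TB usable per node as today's 2-disk RAID-0 while adding local redundancy, plus two extra spindles' worth of read IOPS.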

oikjn
Honored Contributor

Re: Wringing the best performance out of a VSA cluster?

I bet it's just your I/O profile.  If you have lots of small random I/O, that just isn't something two 3TB HDDs will handle well.  Try doing a benchmark run and you should see where the weakness is.  Another program you can try is ATTO Disk Benchmark... it's really simple to use, but doesn't let you set as much detail in the benchmark.
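The kind of test being suggested can be sketched in a few lines of Python: compare sequential against random reads on the same file. This uses buffered I/O, so the page cache makes the absolute numbers optimistic; a real run should use IOMeter/ATTO as suggested, ideally with caching bypassed:

```python
# Rough sequential-vs-random read comparison on a scratch file.
# Buffered I/O only (no O_DIRECT), so treat the ratio, not the
# absolute MB/s, as the interesting output.
import os
import random
import tempfile
import time

SIZE_MB, BLOCK = 64, 4096

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(os.urandom(SIZE_MB * 1024 * 1024))
tmp.close()
path = tmp.name

def read_mb_s(offsets):
    """Read BLOCK bytes at each offset; return throughput in MB/s."""
    with open(path, "rb") as f:
        t0 = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return len(offsets) * BLOCK / (time.perf_counter() - t0) / 1e6

n = SIZE_MB * 1024 * 1024 // BLOCK
seq = read_mb_s([i * BLOCK for i in range(n)])
rnd = read_mb_s([random.randrange(n) * BLOCK for _ in range(n)])
print(f"sequential ~{seq:.0f} MB/s, random ~{rnd:.0f} MB/s")
os.remove(path)
```

If the random figure falls off a cliff relative to the sequential one on the VSA-backed volume, that supports the small-random-I/O theory.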

5y53ng
Regular Advisor

Re: Wringing the best performance out of a VSA cluster?

we're relying on volume-level netraid 10 so that there's a copy of the data on another node in the event that one craps out

 

Understood, but I have run into a scenario where one VSA has a newer copy of the mirror than another. A disk failure renders the node inoperable, and since it had the latest copy of the data, you lose those volumes. Assuming you have recent snapshots this isn't a show stopper, but if you do not, it can be.

BulkRate
Occasional Advisor

Re: Wringing the best performance out of a VSA cluster?

Hmmm...I was under the impression that the remaining node would stay up and get the deciding vote from the FOM servicing that particular Management Group to keep the cluster's volumes online.  I'm wrong? (not for the first time)

 

The remote copies already have a couple of snapshots to "anchor" them, so to speak... so I'll just have to make sure that the Personal Archive store volume has one snapped at all times as well, then?

5y53ng
Regular Advisor

Re: Wringing the best performance out of a VSA cluster?

Regarding RAID-0,

 

In most cases that is how the system will work, but I have run into a disk failure scenario on RAID-0 where one node has a newer copy of the data than the mirrored data on another node. In this situation your volumes are all offline and the management group lists the reason as "storage node x is inoperable." Just be careful with RAID 0 and ensure you have the means to recover in the event of a disk failure.