
Peter J West
Frequent Advisor

Degraded performance since adding StoreVirtual Nodes and upgrading to SanIQ v11.5

Hi,

We needed to add a couple of StoreVirtual 4530s to our existing cluster a few weeks back because we were running low on space.

In order for them to co-exist with our existing P4500s, we had to upgrade all of them to run SANiQ v11.5.

Since doing this, we've noticed that performance has degraded, with some operations such as backups taking much longer to complete.

Looking in vCenter, we can clearly see a huge increase in latency when accessing all volumes that are hosted on the storage network.

Does anyone have any recommendations as to where we should start with troubleshooting an issue such as this?

Thanks

Pete

oikjn
Honored Contributor

Is the rebuild complete?

How many nodes were in the cluster before? How many now?

What is your switch situation? Are you seeing any issues there now? Is flow control on?

How much latency are you talking about?

What node type did you originally have?

Peter J West
Frequent Advisor

Hi,

Sorry for the delay in replying - I've been tied up with a few other tasks.

In answer to your questions:

Is the rebuild complete?

Yes, it is.

How many nodes were in the cluster before? How many now?

8 nodes originally, split between two sites - all were P4500 G2s.

What is your switch situation? Are you seeing any issues there now? Is flow control on?

A total of 4 switches split between two sites - so two at each location, with the NICs on each Storage Node connected to opposite switches. Only 1 Gbit interfaces though. We weren't anywhere near the bandwidth limit on 8 nodes, so I doubt 10 would push us there - but I will check the throughput.

How much latency are you talking about?

I'll grab some stats from VMware and the CMC and report back on this tomorrow - I'm on leave today so don't have time to grab the figures.

What node type did you originally have?

All P4500 G2s. The upgrade to SANiQ 11.5 was mandatory to have them co-exist with the new StoreVirtual nodes.

Peter J West
Frequent Advisor

This is what we can see in vCenter. The change in the graph coincides exactly with the time the upgrade was performed, and no other significant changes were made at the same time.

The graph looks a little odd because vCenter automatically aggregates older data, which explains why the more recent history appears to have more detailed samples than the older data. But clearly the latency on the disk is higher now than it was a couple of months ago.

Disk-Latency-1.png

I've opened up the CMC this morning and noticed that a number of patches need applying. These are 45002-00, 45003-00, 45004-00 and 35024-00.

It's also strange that if you switch to the advanced upgrade mode, the CMC reports that the current LeftHand OS software version is 10.5.00, despite all nodes in the system reporting that they are running version 11.5.00.0673.0. It also reports that version 10.0.0.1.1486 of the StoreVirtual DSM for MPIO should be installed, whereas when we did the upgrade to 11.5 it told us to upgrade the MPIO software to the 11.5 version. So no clue what's going on here.

Finally, here are some of the figures we see from the CMC. This is just the default view for the last 7 minutes. If anyone can help diagnose where our problem is and needs more CMC details, please let me know and I can post them.

CMC-Stat-1.png

I don't believe that the number of IOPS we're seeing here is a problem, especially when you consider that we're distributing the spindles over 10 nodes at two locations. But maybe someone can spot something amiss?

Thanks

Pete

oikjn
Honored Contributor

I think you should go to support about the version issues. Everything should match: all components, including the apps, should be at v11.5 now and should report that way.

One weakness of the StoreVirtual design is that a performance problem on a SINGLE node CAN cause the entire cluster to slow down. Usually the cluster kicks the offending node out of activity if that happens, but if the problem isn't bad enough or doesn't last long enough to get the node kicked out, that could explain your problem.
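
As a rough illustration of how a single slow node might be spotted, here is a minimal Python sketch that compares each node's average latency against the cluster median. The function name, the 2x threshold and the sample figures are all hypothetical; real values would have to be read from the CMC performance monitor for each node.

```python
from statistics import median


def slow_nodes(latency_ms: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return nodes whose latency exceeds `factor` times the cluster median."""
    baseline = median(latency_ms.values())
    return sorted(name for name, ms in latency_ms.items() if ms > factor * baseline)


# Hypothetical per-node average latencies in ms (NOT real data from this cluster).
sample = {
    "node-01": 6.0,
    "node-02": 7.0,
    "node-03": 6.5,
    "node-04": 31.0,
    "node-05": 5.5,
}
print(slow_nodes(sample))  # -> ['node-04']
```

The median is used rather than the mean so that one badly behaving node does not drag the baseline up and hide itself.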

 

I would watch your LUN read and write queues as well as latency, and monitor system latency, CPU and IO activity on each individual node. You could have one node failing, or you could have some network congestion across the WAN, since replicating 1k IOPS and an AVERAGE of 70MB/s is probably really close to the capacity of that 1Gb WAN link (assuming you actually have a DEDICATED 1Gb link). I would look at WAN link capacity first while also contacting HP about the version reporting issue.
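
A quick way to sanity-check the WAN suggestion is to convert the throughput figure into link utilization. The Python sketch below assumes the 70MB average is in megabytes per second and that all of it crosses the inter-site link; both are assumptions, and the function name is illustrative rather than part of any HP tool.

```python
def link_utilization(throughput_mb_per_s: float, link_gbit_per_s: float = 1.0) -> float:
    """Return the fraction of raw link capacity used by a given throughput."""
    throughput_mbit = throughput_mb_per_s * 8  # megabytes/s -> megabits/s
    link_mbit = link_gbit_per_s * 1000         # gigabits/s -> megabits/s
    return throughput_mbit / link_mbit


# Assumed average of 70 MB/s over an assumed dedicated 1 Gb link.
print(f"average utilization: {link_utilization(70):.0%}")  # -> average utilization: 56%
```

An average hides bursts, and protocol overhead or any other traffic sharing the link reduces the usable bandwidth further, so the peak throughput reported by the CMC or the switches is the figure to compare against the link.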