StoreVirtual Storage

Excessive packet loss after VSA upgrade from 12.0 to 12.6

 
Thomas_SG
Advisor


Today I upgraded one of my 3-node VSA clusters from v12.0 to 12.6. The upgrade itself went smoothly, with no issues along the way.
Right after it completed, however, I started getting warnings like the following:

The IP 'VSA1' has 'Excessive' packet loss. Packet Loss is ' 1.27'%.

These HP ProLiant DL380p Gen8 servers are connected to a Cisco 3850 switch stack. The switch ports showed no errors, no input/output queue issues, and no abnormally high CPU at the time.
I received these warnings for 2 of the 3 nodes, and they continued for about 1 hour after the upgrade completed, then stopped.
I looked at some performance counters in the StoreVirtual CMC and saw several peaks in "Queue Depth Total", which reached 100 at times.

Just wondering if these warnings are normal/expected after performing such an upgrade?

I have two other 3-node clusters to upgrade as well, so wanted to get some feedback on this before I proceed.

Thanks.

18 REPLIES 18
vlho
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi,

One of my customers has the same problem. After a backup job starts (high utilization), one of the two VSA nodes reports excessive packet loss. The customer has two HP 2920 switches in a stack (gigabit). I think the problem is in version 12.6...

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Thank you for the comments.
I have not seen the issue again after I upgraded and it has now been about 3 days since I did it.
I am guessing it was related to the volume sync after the nodes were rebooted as part of the upgrade, but I cannot be sure.

I am going to upgrade my second VSA cluster tomorrow, so we'll see how that plays out.

Hasrizal
Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Any update on this issue? I'm facing a similar problem too.

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

I upgraded a second VSA cluster yesterday and I did not see any such packet loss notifications.
It is worth mentioning that this was in a 10GbE environment, so performance might not have been impacted that much.
I have a final cluster (1GbE) to upgrade, likely tomorrow, so we'll see how this plays out.
As for the first cluster I upgraded, it has been stable since the upgrade, with no further alarms about high packet loss.

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

I upgraded my third VSA cluster today and did not see any packet loss errors.
And I have also not seen any further alerts from the first cluster I upgraded a while back.

I am not going to spend any more time on this unless this issue reappears.

 

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

There is a new feature in LHOS 12.6 which produces an informational warning if excessive packet loss is detected. This often uncovers pre-existing network issues - the packet loss is probably not due to the 12.6 upgrade. Examination of the hist.netstat.log file in the support bundle will show the hourly TCP packets out and retransmission count for several previous weeks. Some spreadsheet math will reveal the hourly retransmission rates.
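
As a rough illustration of that spreadsheet math, here is a minimal Python sketch. It does not parse hist.netstat.log, whose exact layout is not described here. It assumes you have already extracted hourly readings of two cumulative counters (TCP segments sent and segments retransmitted, in the style of `netstat -s`); if your log already holds per-hour deltas, skip the subtraction. The `Sample` type, the function name, and the demo numbers are all made up for the example.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One hourly reading of cumulative TCP counters (hypothetical layout)."""
    label: str          # e.g. a timestamp string
    segments_out: int   # cumulative TCP segments sent
    retransmits: int    # cumulative TCP segments retransmitted


def hourly_retransmit_rates(samples: List[Sample]) -> List[Tuple[str, float]]:
    """Return (label, percent) for each interval between consecutive samples."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        sent = cur.segments_out - prev.segments_out
        resent = cur.retransmits - prev.retransmits
        # Skip intervals where a counter reset (e.g. a reboot) or nothing was sent.
        if sent <= 0 or resent < 0:
            continue
        rates.append((cur.label, 100.0 * resent / sent))
    return rates


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    demo = [
        Sample("01:00", 1_000_000, 100),
        Sample("02:00", 1_500_000, 600),
        Sample("03:00", 2_000_000, 6_950),
    ]
    for label, pct in hourly_retransmit_rates(demo):
        print(f"{label}  {pct:.2f}%")
```

Comparing the rates from before and after the upgrade should show whether the loss was already there before 12.6 started reporting it.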

The additional troubleshooting messages were added by HPE LeftHand engineering because of many support cases which identified network issues rather than storage system issues. The idea was to help administrators look in the right place to resolve these kinds of problems, which are often brief, intermittent, hard to track down and affect the end-user experience significantly.

Do not ignore these messages! Look at the switch logs and interface counters for more clues to the cause of them. Some possibilities are switch firmware bugs, inadequate switching resources (buffer memory, backplane capacity, CPU time), or possibly traffic surges that exceed the available path bandwidth.

Don't assume that because your network has worked for the last 10 years, everything will always be fine. Storage traffic has a tendency to grow over time as storage nodes and servers are added. You may not see signs of stress unless you look for them with the right tools.

 

I am an HPE employee - HPE SimpliVity Support


kzee
Occasional Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

The question right now is... what happens after we ack the messages?

How do we reset the statistics or clear the alert notice?

Will the alert keep showing us the historical data and be stuck there forever?

 

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Each 12.6 storage node checks the TX retransmission rate every five minutes. When it is above a few tenths of a percent, a CMC alarm is generated and an email is sent, if you have email notifications enabled. The alarm is reset automatically after the next sample interval when packet loss goes below the threshold.
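
To make the raise/reset behavior concrete, here is a small Python sketch of a threshold alarm that clears itself. It is not the LHOS implementation. The 0.3% threshold is a placeholder for "a few tenths of a percent", the function name is invented, and treating a rate exactly at the threshold as "clear" is an assumption.

```python
from typing import Iterable, Iterator, Tuple

SAMPLE_INTERVAL_S = 5 * 60   # five-minute check, per the description above
THRESHOLD_PCT = 0.3          # hypothetical; stands in for "a few tenths of a percent"


def alarm_events(rates_pct: Iterable[float],
                 threshold: float = THRESHOLD_PCT) -> Iterator[Tuple[int, str]]:
    """Yield (sample_index, "RAISED" or "CLEARED") whenever the alarm state changes."""
    active = False
    for i, rate in enumerate(rates_pct):
        if rate > threshold and not active:
            active = True
            yield i, "RAISED"
        elif rate <= threshold and active:
            # Assumption: a sample at or below the threshold clears the alarm.
            active = False
            yield i, "CLEARED"


if __name__ == "__main__":
    # Invented retransmission rates (%), one per five-minute sample.
    samples = [0.05, 1.27, 0.90, 0.10, 0.02]
    for index, event in alarm_events(samples):
        print(f"t+{index * SAMPLE_INTERVAL_S // 60:>3} min  {event}")
```

The point of the sketch is that nothing needs to be acknowledged or reset by hand: once a later sample falls back under the threshold, the alarm state clears.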

The alarm is being generated by a condition that could impair system performance. The correct response to the alarm is to fix the packet loss issue in the network.

I am an HPE employee - HPE SimpliVity Support


Wraith
Occasional Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi - I'm getting this too, but after a network upgrade (converted each VSA host from 1GbE to 10GbE). Two hosts were converted several weeks ago and show no packet loss; the third host, converted this morning, has been showing roughly 2.7% for the last 30 minutes or so. My config is different from the norm, but supported, in that I am using two iSCSI storage arrays instead of local storage. Host 1 connects via iSCSI on 10GbE to a SAN, host 2 is a failover manager only, and host 3 connects to a different SAN via iSCSI. Hosts 2 and 3 have been on the 10GbE switches for several weeks; host 1 joined them today and is apparently dropping packets. Note that the CMC host is running continuous pings to both host 1 and the iSCSI SAN presented by host 1 with 0% packet loss, yet I am still seeing these messages. Port counters on the 10GbE switches show 0 issues. Any ideas?