StoreVirtual Storage

Excessive packet loss after VSA upgrade from 12.0 to 12.6

Thomas_SG
Advisor

Excessive packet loss after VSA upgrade from 12.0 to 12.6


I just upgraded one of my 3-node VSA clusters today from v12.0 to 12.6. The upgrade process was smooth with no issues along the way.
Right after the upgrade completed, however, I started getting warnings like the following:

The IP 'VSA1' has 'Excessive' packet loss. Packet Loss is ' 1.27'%.

These HP ProLiant DL380p Gen8 servers are connected to a Cisco 3850 switch stack. The switch ports showed no errors, no input/output queue problems, and no abnormally high CPU at the time.
I received these warnings for 2 of the 3 nodes. They continued for about an hour after the upgrade completed, then stopped.
I had a look at some performance counters in the StoreVirtual CMC and saw several peaks in "Queue Depth Total", which reached 100 at times.

Just wondering if these warnings are normal/expected after performing such an upgrade?

I have two other 3-node clusters to upgrade as well, so wanted to get some feedback on this before I proceed.

Thanks.

18 REPLIES
vlho
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi,

One of my customers has the same problem. After a backup job starts (high utilization), one of the two VSA nodes reports excessive packet loss. The customer has two HP 2920 switches in a stack (gigabit). I think the problem is in version 12.6...

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Thank you for the comments.
I have not seen the issue again after I upgraded and it has now been about 3 days since I did it.
I am guessing it was related to the volume sync after the nodes were rebooted as part of the upgrade, however I cannot be sure.

I am going to upgrade my second VSA cluster tomorrow, so we'll see how that plays out.

Hasrizal
Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Any update regarding this issue? I'm also facing a similar problem.

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

I upgraded a second VSA cluster yesterday and I did not see any such packet loss notifications.
It is worth mentioning that this was in a 10GbE environment, so performance might not have been impacted that much.
I have a final cluster (1GbE) to upgrade, likely tomorrow, so we'll see how this plays out.
As for the first cluster I upgraded, it has been stable since the upgrade, with no further alarms about high packet loss.

Thomas_SG
Advisor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

I upgraded my third VSA cluster today and did not see any packet loss errors.
I have also not seen any further alerts from the first cluster I upgraded a while back.

I am not going to spend any more time on this unless the issue reappears.

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

There is a new feature in LHOS 12.6 which produces an informational warning if excessive packet loss is detected. This often uncovers pre-existing network issues - the packet loss is probably not due to the 12.6 upgrade. Examination of the hist.netstat.log file in the support bundle will show the hourly TCP packets out and retransmission count for several previous weeks. Some spreadsheet math will reveal the hourly retransmission rates.
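The "spreadsheet math" can also be scripted. The sketch below is only an illustration: the layout of hist.netstat.log is not shown in this thread, so the sample rows, the `Sample` type, and the function name are hypothetical, and the counters are assumed to be cumulative. If the log already holds per-hour counts, divide them directly instead of taking differences.

```python
from typing import List, Tuple

# Hypothetical samples: (hour label, cumulative TCP segments out, cumulative retransmits).
# These rows are invented; they are not taken from a real hist.netstat.log.
Sample = Tuple[str, int, int]


def hourly_retransmission_rates(samples: List[Sample]) -> List[Tuple[str, float]]:
    """Return (hour label, retransmission %) for each pair of consecutive samples."""
    rates = []
    for (_, prev_out, prev_retx), (label, cur_out, cur_retx) in zip(samples, samples[1:]):
        sent = cur_out - prev_out
        retx = cur_retx - prev_retx
        if sent <= 0 or retx < 0:
            # Counter reset (for example a node reboot) or an idle hour: skip it.
            continue
        rates.append((label, 100.0 * retx / sent))
    return rates


samples = [
    ("01:00", 1_000_000, 500),
    ("02:00", 3_000_000, 700),     # 2,000,000 sent, 200 retransmitted
    ("03:00", 4_000_000, 13_400),  # 1,000,000 sent, 12,700 retransmitted
]

for label, pct in hourly_retransmission_rates(samples):
    print(f"{label}  {pct:.2f}%")
# 02:00  0.01%
# 03:00  1.27%
```

Hours that stand out against the baseline are the ones to line up with backup windows, resyncs, or switch log entries.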

The additional troubleshooting messages were added by HPE LeftHand engineering because of many support cases which identified network issues rather than storage system issues. The idea was to help administrators look in the right place to resolve these kinds of problems, which are often brief, intermittent, hard to track down and affect the end-user experience significantly.

Do not ignore these messages! Look at the switch logs and interface counters for more clues to their cause. Some possibilities are switch firmware bugs, inadequate switching resources (buffer memory, backplane capacity, CPU time), or possibly traffic surges that exceed the available path bandwidth.

Don't assume that because your network has worked for the last 10 years, everything will always be fine. Storage traffic has a tendency to grow over time as storage nodes and servers are added. You may not see signs of stress unless you look for them with the right tools.

I am an HPE employee - HPE StoreVirtual Support
kzee
Occasional Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

The question right now is: what happens after we ack the messages?

How do we reset the statistics or clear the alert notice?

Will the alert keep showing us the historical data and be stuck there forever?

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Each 12.6 storage node checks the TX retransmission rate every five minutes. When it is above a few tenths of a percent, a CMC alarm is generated and an email is sent, if you have email notifications enabled. The alarm is reset automatically after the next sample interval when packet loss goes below the threshold.
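As a rough model of that behaviour (not the actual LeftHand OS code), the alarm can be treated as a simple threshold check on each five-minute sample. The 0.3% threshold below is an assumption, since the post only says "a few tenths of a percent", and the function and event names are hypothetical.

```python
from typing import Iterable, List, Tuple

THRESHOLD_PCT = 0.3  # assumed value; the thread only says "a few tenths of a percent"


def alarm_events(samples_pct: Iterable[float],
                 threshold: float = THRESHOLD_PCT) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for each change in alarm state."""
    events = []
    active = False
    for i, pct in enumerate(samples_pct):
        above = pct > threshold
        if above and not active:
            events.append((i, "raise"))  # alarm generated, email sent if enabled
        elif not above and active:
            events.append((i, "clear"))  # alarm resets once loss drops below threshold
        active = above
    return events


# One retransmission-rate sample per five-minute interval (invented values).
print(alarm_events([0.05, 1.27, 0.9, 0.1, 0.02]))
# [(1, 'raise'), (3, 'clear')]
```

In this model nothing is latched: the alarm state follows the most recent sample, which matches the description of an automatic reset.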

The alarm is being generated by a condition that could impair system performance. The correct response to the alarm is to fix the packet loss issue in the network.

I am an HPE employee - HPE StoreVirtual Support
Wraith
Occasional Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi - I'm getting this too, but after a network upgrade (converted each VSA host from 1GbE to 10GbE). Two hosts were converted several weeks ago and show no packet loss; the third host was converted this morning and has shown about 2.7% for the last 30 minutes or so. My config is different from the norm, but supported, in that I am using two iSCSI storage arrays instead of local storage. Host 1 connects via iSCSI on 10GbE to a SAN, host 2 is a failover manager only, and host 3 connects to a different SAN via iSCSI. Hosts 2 and 3 have been on the 10GbE switches for several weeks; host 1 joined them today and is apparently dropping packets. Note that the CMC host is doing continuous pings to both host 1 and the iSCSI SAN presented by host 1 with 0% packet loss, yet I am seeing these messages. Port counters on the 10GbE switches show 0 issues. Any ideas?

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi Wraith, 

The continuous ping from CMC may not show the issue for several reasons. The 12.6 retransmission monitor runs on the storage nodes and results are reported on the CMC. Pings from the CMC may not be traversing the path which is in trouble. It's also possible that the retransmissions are intense but brief if they happen at moments of peak load.
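A small worked example of the "intense but brief" point, using invented numbers: one bad five-minute interval can stand out clearly on its own yet nearly vanish in a longer average, which is one reason a long-running check may look clean.

```python
# Twelve five-minute intervals in one hour: (segments sent, segments retransmitted).
# All figures are made up for illustration.
intervals = [(100_000, 0)] * 11 + [(100_000, 3_000)]  # one interval with a burst

per_interval = [100.0 * retx / sent for sent, retx in intervals]
total_sent = sum(sent for sent, _ in intervals)
total_retx = sum(retx for _, retx in intervals)
hourly = 100.0 * total_retx / total_sent

print(f"worst 5-minute interval: {max(per_interval):.2f}%")  # 3.00%
print(f"whole-hour average:      {hourly:.2f}%")             # 0.25%
```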

Check the NIC firmware and drivers on the hosts and SAN nodes and update if necessary. If the problem only affects one host, try swapping the cable and SFP with known good ones. We find that a lot of 10 Gb packet loss issues are caused by cables. Sometimes we find older or low-end switches that do not have sufficient switching and buffering resources to handle the peak load.

I am an HPE employee - HPE StoreVirtual Support
dcampregher
Honored Contributor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi,

I have an environment with the same issue. There are several P4500 G2 clusters running version 12.5 and one SV4730 10GbE cluster running version 12.6.

Only the new 12.6 cluster shows the excessive packet loss alarm, on its first node, even with no load on it.

Any update?

-----------
Please assign Kudos
Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

You will not get these network alerts on storage nodes that run LHOS 12.5 or earlier. The network monitoring function was added in 12.6. (Note that P4x00 G2 systems cannot be upgraded past 12.5 and so will never report these packet loss alerts.)

If only one of your 4730 systems reports packet loss alerts, you may have a bad cable, a bad SFP+, or a bad NIC at either end of the cable. It is also possible to see these alerts if you have congestion in the network. Be sure there are no 1Gb paths that try to handle the 10Gb traffic.

Here is a fault isolation tip. If you have two redundant paths, disable one to see if the remaining path has the problem. Then restore it and disable the other path to confirm the result. If only one path is affected, substitute known good parts to identify the failing component. If both paths are affected, the problem is probably somewhere else in the network that is common to both paths.
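The decision logic in that tip can be written down as a small function. This is only a sketch: the function name and wording are hypothetical, and the "neither path" branch is an addition for completeness that the tip itself does not cover.

```python
def isolate_fault(loss_on_a_alone: bool, loss_on_b_alone: bool) -> str:
    """Interpret two tests, each run with only one redundant path enabled."""
    if loss_on_a_alone and loss_on_b_alone:
        # Both paths show loss: suspect something common to both.
        return "both paths: look elsewhere in the network, at something shared by both paths"
    if loss_on_a_alone:
        return "path A only: substitute known good parts on path A"
    if loss_on_b_alone:
        return "path B only: substitute known good parts on path B"
    # Not covered by the original tip; added as an assumption.
    return "neither path: not reproduced, repeat the test under typical load"


# Example: loss appears with only path A active, none with only path B active.
print(isolate_fault(True, False))
# path A only: substitute known good parts on path A
```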

I am an HPE employee - HPE StoreVirtual Support
cenkozkan
Occasional Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hi,

Same situation here. I just upgraded to 12.6. I have no noticeable performance problems, but I am currently getting these notifications. They started when the backup started. The packet loss is around 1%. Is it possible to disable the notifications? The system is working very well and is very fast. I see no reason to mess with the equipment.

Thanks

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

We recommend that your network have less than 0.1% packet loss to carry iSCSI traffic efficiently. Fixing the network is the best way to shut off the alarms.

There is no easy way to change the alarm threshold. You can open a support case with HPE to have one of our engineers log in to each node remotely and change the threshold permanently.

I am an HPE employee - HPE StoreVirtual Support
ITGuru
Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6


cenkozkan wrote:

Hi,

Same situation here. I just upgraded to 12.6. I have no noticeable performance problems, but I am currently getting these notifications. They started when the backup started. The packet loss is around 1%. Is it possible to disable the notifications? The system is working very well and is very fast. I see no reason to mess with the equipment.

Thanks


Did you ever find a solution? We just did a server refresh and have CMC 12.6, and we see the same thing: after my backups start, the packet loss error messages start, showing 0.50% to 1% packet loss. Even after the backups are finished, there is no way to clear the message other than rebooting the VSA that is generating it.

dcampregher
Honored Contributor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

Hello,

In my case the alert was caused by broadcast traffic from the vSphere vMotion network on the same L2 switch. After I created a segmented L2 network (VLAN) on the switches and in vSphere, the alert cleared automatically.

So I think you have other traffic on the same L2 network as the iSCSI.

ITGuru
Visitor

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6


dcampregher wrote:

Hello,

In my case the alert was caused by broadcast traffic from the vSphere vMotion network on the same L2 switch. After I created a segmented L2 network (VLAN) on the switches and in vSphere, the alert cleared automatically.

So I think you have other traffic on the same L2 network as the iSCSI.


Thank you so much. This actually helps, as we currently have all networks (iSCSI, vMotion and LAN) on the same L2 broadcast domain.

Stor_Mort
HPE Pro

Re: Excessive packet loss after VSA upgrade from 12.0 to 12.6

That's a great case study. Our experience is that packet loss over a few tenths of a percent causes negative performance impact. That's why the check for packet loss was added to LeftHand OS 12.6.

Please take these alerts seriously. An iSCSI storage system only works as well as the network it's using.

If you got a new car and the Check Engine light came on, would you tape over it and keep driving, or would you find the cause and get it fixed before something fails? Similarly, it's best to investigate and remediate alerts in your IT systems before the issue becomes critical.

I am an HPE employee - HPE StoreVirtual Support