Operating System - HP-UX

VM's randomly lose Network connection

Paul_King
Occasional Advisor

We currently have a BL860c i2 running two virtual machines. At random intervals these machines drop off the network. They were stable for the last 60 days, but over the last two days they have dropped off the network again.

The VMs do not respond to ping, but we can still log in to them through the console.

- From the VM console, running /sbin/init.d/net stop and /sbin/init.d/net start does not solve the issue.
- Rebooting the VM solves the issue.
- Restarting the virtual switch solves the issue.

We get the following error when the issue occurs:

Apr  5 08:07:37 soem2 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa63b8c01 .See ndd -h ip_ire_gw_probe for more info
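The gateway in this message is printed as a hexadecimal number. Assuming it is the IPv4 address written as a single 32-bit value, most significant byte first (an assumption; the message itself does not say so), a minimal Python sketch can decode it to dotted-quad form so it can be compared with the default route shown by `netstat -rn`:

```python
import ipaddress
import re
from typing import Optional

# Matches the hex value that follows "default gateway at" in the
# dead gateway detection message.
GATEWAY_HEX = re.compile(r"default gateway at (0x[0-9a-fA-F]{1,8})")


def gateway_from_syslog(line: str) -> Optional[str]:
    """Return the gateway from a syslog line as a dotted quad, or None.

    Assumption: the hex value is the IPv4 address as a 32-bit integer,
    most significant byte first.
    """
    match = GATEWAY_HEX.search(line)
    if match is None:
        return None
    return str(ipaddress.IPv4Address(int(match.group(1), 16)))


if __name__ == "__main__":
    message = ("Apr  5 08:07:37 soem2 vmunix: Dead gateway detection can't ping "
               "the last remaining default gateway at 0xa63b8c01 .See ndd -h "
               "ip_ire_gw_probe for more info")
    print(gateway_from_syslog(message))
```

For the message above this gives 166.59.140.1 (0xa6 = 166, 0x3b = 59, 0x8c = 140, 0x01 = 1).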

# nwmgr -c lan0 -v
lan0:
   Interface State =UP
   MAC Address = 0xCA5XXXX5B9A
   Subsystem = igssn
   Interface Type = 1000Base-T
   Hardware Path = 0/0/1/0
   NMID = 1
   Feature Capabilities = Physical Interface
                          IPV4 Recv CKO
                          IPV4 Send CKO
                          VLAN Tag Offload
                          64Bit MIB Support
                          IPV4 TCP Segmentation Offload
                          UDP Multifrag CKO
   Feature Settings = Physical Interface
                      IPV4 Recv CKO
                      IPV4 Send CKO
                      VLAN Tag Offload
                      64Bit MIB Support
                      IPV4 TCP Segmentation Offload
                      UDP Multifrag CKO
   MTU = 1500
   Speed = 2.0 Gbps Full Duplex

The odd thing today was that the VMs came back without a restart of either the virtual switch or the VMs themselves.

On the host we are running:

  VMGuestLib                    B.04.30        Integrity VM Guest Support Libraries
  VMMGR                         A.6.1.0.89662  HP-UX Integrity Virtual Server Manager

# swlist | grep AVIO
  GuestAVIOStor                                 B.11.31.1211   HPVM Guest AVIO Storage Software
  GuestAvioLan                                  B.11.31.1211   HPVM Guest AVIO LAN Software
  HostAVIOStor                                  B.11.31.1211   HPVM Host AVIO Storage Software
  HostAvioLan                                   B.11.31.1211   HPVM Host AVIO LAN Software

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

>> Speed = 2.0 Gbps Full Duplex


So the host is running APA?

It is worth checking whether the APA configuration, drivers, and LOM firmware are correct and up to date.
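The inference from the guest's speed can be written down as a small heuristic. A minimal Python sketch, assuming the guest's AVIO interface reports the speed of whatever backs the virtual switch on the host (which is how the 2.0 Gbps figure is being read here): normalise the `Speed =` lines from `nwmgr -c lanX -v` to Mbps and compare the guest's value with a single host port, such as the 1000 Mbps that the host's lan0 reports later in this thread. A ratio above 1 hints at an aggregate; it is not proof.

```python
import re

# Multipliers that normalise the units nwmgr prints to Mbps.
UNITS_TO_MBPS = {"Mbps": 1.0, "Gbps": 1000.0}

SPEED_LINE = re.compile(r"Speed\s*=\s*([\d.]+)\s*(Mbps|Gbps)")


def speed_mbps(nwmgr_output: str) -> float:
    """Return the 'Speed =' value from `nwmgr -c lanX -v` output in Mbps."""
    match = SPEED_LINE.search(nwmgr_output)
    if match is None:
        raise ValueError("no 'Speed =' line found")
    return float(match.group(1)) * UNITS_TO_MBPS[match.group(2)]


if __name__ == "__main__":
    guest_lan0 = "   MTU = 1500\n   Speed = 2.0 Gbps Full Duplex\n"
    host_lan0 = "   MTU = 1500\n   Speed = 1000 Mbps Full Duplex\n"
    # Heuristic only: a ratio above 1 suggests the guest is backed by
    # more than one physical host port, e.g. an APA aggregate.
    print(speed_mbps(guest_lan0) / speed_mbps(host_lan0))
```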

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hello

Thanks for the response. I am pretty new to the virtualised solution. How do I find out whether I am running APA, and if so, which drivers?

The network setup on the host is:

bash-4.0# nwmgr -c lan0 -v
lan0:
   Interface State =UP
   MAC Address = 0x3CDXXXXF934A
   Subsystem = iexgbe
   Interface Type = 10GBASE-KR
   Related Interface = lan900
   Hardware Path = 0/0/0/3/0/0/0
   NMID = 1
   Feature Capabilities = Physical Interface
                          IPV4 Recv CKO
                          IPV4 Send CKO
                          VLAN Tag Offload
                          64Bit MIB Support
                          IPV4 TCP Segmentation Offload
                          UDP Multifrag CKO
   Feature Settings = Physical Interface
                      IPV4 Recv CKO
                      IPV4 Send CKO
                      VLAN Tag Offload
                      64Bit MIB Support
                      IPV4 TCP Segmentation Offload
                      UDP Multifrag CKO
   MTU = 1500
   Speed = 1000 Mbps Full Duplex

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

Just run

# nwmgr -S apa

on the host and we will see if this is used or not.

Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hello

bash-4.0# nwmgr -S apa
Class    Mode        Load      Speed-               Membership
Instance             Balancing Duplex
======== =========== ========= ==================== ===========================
lan900   MANUAL      LB_MAC    2 Gbps Full Duplex   0,1
lan901   Not_Enabled LB_MAC    0 Mbps                -
lan902   Not_Enabled LB_MAC    0 Mbps                -
lan903   Not_Enabled LB_MAC    0 Mbps                -
lan904   Not_Enabled LB_MAC    0 Mbps                -

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

OK, lan0 and lan1 are trunked (aggregated) on the host.
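The same reading of the `nwmgr -S apa` table can be automated. A minimal Python sketch, assuming the column layout shown above and that the Membership column lists lan instance numbers (so `0,1` means lan0 and lan1); the function name is hypothetical:

```python
from typing import Dict, List


def active_aggregates(apa_table: str) -> Dict[str, List[str]]:
    """Map each enabled aggregate in `nwmgr -S apa` output to its member ports."""
    aggregates: Dict[str, List[str]] = {}
    for line in apa_table.splitlines():
        fields = line.split()
        # Data rows start with the aggregate name (lan900, ...);
        # the header lines and the ==== rule do not.
        if len(fields) < 3 or not fields[0].startswith("lan"):
            continue
        name, mode, membership = fields[0], fields[1], fields[-1]
        if mode == "Not_Enabled":
            continue
        members = [] if membership == "-" else ["lan" + n for n in membership.split(",")]
        aggregates[name] = members
    return aggregates


if __name__ == "__main__":
    table = """\
Class    Mode        Load      Speed-               Membership
Instance             Balancing Duplex
======== =========== ========= ==================== ===========================
lan900   MANUAL      LB_MAC    2 Gbps Full Duplex   0,1
lan901   Not_Enabled LB_MAC    0 Mbps                -
"""
    print(active_aggregates(table))
```

With the output Paul posted, this maps lan900 to lan0 and lan1.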

Now check with swlist which version is running, e.g. B.11.31.60.
(swlist | grep -i apa, or search for "Auto-Port")

Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Thanks, Torsten

bash-4.0# swlist | grep -i apa
  HPUXExtns-Jpn                 B.11.31        Japanese font, input methods and printer extensions
  HPUXMan-Jpn                   B.11.31.1203   Minimum and Essential Japanese man pages
  HPUXMsgs-Jpn                  B.11.31        Minimum and Essential HP-UX Japanese Language Message Catalogs
  hpuxws22Apache                B.2.2.15.09    HP-UX Apache-based Web Server

bash-4.0# swlist | grep -i auto*
  DSAUtilities                  C.01.00.20     HP-UX Distributed Systems Administration Utilities
  J4240AA                       B.11.31.60     Auto-Port Aggregation Software
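Both greps cast a wide net. `grep -i apa` matches any line containing those three letters, which is why "Japanese" and "Apache" show up. The unquoted `auto*` is a regular expression meaning "aut" followed by any number of "o", which is why DSAUtilities matched as well (the shell may also expand it as a filename pattern). A minimal Python sketch that matches on the description column instead, assuming the three-column `swlist` layout shown above; the function name is hypothetical:

```python
from typing import List, Tuple


def find_products(swlist_output: str, description: str) -> List[Tuple[str, str]]:
    """Return (product tag, revision) for rows whose description contains `description`.

    Assumes each row is "<tag> <revision> <description...>", as printed by swlist.
    """
    wanted = description.lower()
    hits = []
    for line in swlist_output.splitlines():
        fields = line.split(None, 2)  # tag, revision, rest of the line
        if len(fields) == 3 and wanted in fields[2].lower():
            hits.append((fields[0], fields[1]))
    return hits


if __name__ == "__main__":
    output = """\
  DSAUtilities                  C.01.00.20     HP-UX Distributed Systems Administration Utilities
  J4240AA                       B.11.31.60     Auto-Port Aggregation Software
  hpuxws22Apache                B.2.2.15.09    HP-UX Apache-based Web Server
"""
    print(find_products(output, "Auto-Port Aggregation"))
```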

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

A newer version is now available:

https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPUX-APA

but your version

J4240AA B.11.31.60 Auto-Port Aggregation Software

is better than the older versions. I don't know yet which fixes the newer release includes.

Is this a BL860c or BL860c i2?

Hope this helps!
Regards
Torsten.

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

It sounds like you should have the newer version:

http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c03684559/c03684559.pdf

Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hello

It is an Integrity BL860c i2.

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

Could you try

# nwmgr -q vpd -c lan0

to get the LOM firmware version?

The firmware version of the server itself would also be useful to know:

MP:CM> sysrev

(or just check the firmware table in the Onboard Administrator)

Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hello

bash-4.0# nwmgr -q vpd -c lan0
lan0 VITAL PRODUCT DATA:
   Product Description : Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC
   Part Number: N/A
   Engineering Date Code : N/A
   Part Serial Number : 0123456789
   Misc. Information : N/A
   Mfd. Date : N/A
   Check Sum : 0xe4
   EFI Version : 5.2.58
   ROM Firmware Version : 6.2.21
   Asset Tag : N/A
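Since other blades in this environment run fine, comparing firmware across blades is one way to narrow things down. A minimal Python sketch, assuming the `Key : Value` layout of `nwmgr -q vpd` shown above, that extracts the fields and turns dotted versions into integer tuples so they compare numerically (as plain strings, "6.10.0" would sort below "6.2.21"); the function names are hypothetical:

```python
from typing import Dict, Tuple


def parse_vpd(vpd_output: str) -> Dict[str, str]:
    """Turn `nwmgr -q vpd -c lanX` output into a {field: value} dict."""
    fields: Dict[str, str] = {}
    for line in vpd_output.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a value, such as the "VITAL PRODUCT DATA:" title.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(version: str) -> Tuple[int, ...]:
    """'6.2.21' -> (6, 2, 21), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    vpd = """\
lan0 VITAL PRODUCT DATA:
   EFI Version : 5.2.58
   ROM Firmware Version : 6.2.21
"""
    info = parse_vpd(vpd)
    print(info["ROM Firmware Version"], version_tuple(info["EFI Version"]))
```

Running the same parse on a blade that works and comparing the tuples shows whether the LOM firmware differs between them.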

Firmware:

System Firmware 01.92

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

This looks good.

I'll tell you a story. Almost a year ago we did a firmware upgrade on a server like yours, including the strongly recommended LOM firmware upgrade. After that, the system lost its network connection.

Why?

We upgraded APA, and the issue remained.

In the end, it turned out there was a problem with the network switch configuration. The previous server setup simply ignored this and worked more or less (two trunked NIC pairs bundled in failover, i.e. four NICs involved, forming one logical network interface).

The new setup, based on the new firmware and software, now checked the switch as well and did not allow that configuration.

We had pass-thru modules installed, but you may have switches in the blade enclosure?

I would ask the network team to check the switches and their logs for anything unusual. Even a bad cable can cause a lot of trouble.

Remember, there is a lot involved here: the blade, several switches and cables, firmware, drivers, software, virtualization, then drivers and software again... You can only investigate step by step.

Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Thanks for your advice, Torsten. We have other blades running an older release of the VM software and they run fine; on this blade we installed a much newer release and have run into trouble.

As you say, there are multiple components, so I will need to start from the beginning.

Thanks again

Bill Hassell
Honored Contributor

Re: VM's randomly lose Network connection

>> Dead gateway detection can't ping the last remaining default gateway

I can't believe that this 'feature' is still turned on these days.

You are probably losing connectivity because of the dead gateway detection mechanism introduced years ago.

Here's how it works:

The network stack has a health-check routine that pings the gateway for each active network.

If a ping fails to return, the gateway is assumed to be dead and the card is disabled.

The fix is easy (and, in my not-so-humble opinion, it should be the default).

Edit the nddconf file in /etc/rc.config.d and add these lines:

TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0

NOTE: If an entry already in this file uses [0], pick the next unused array index, such as [1] or [2].

The above change makes the setting persist across reboots.

Then turn off the feature on the running system with:

ndd -set /dev/ip ip_ire_gw_probe 0

And the problem should go away.

I've looked for several years for a good reason why the loss of a ping should disable the network. Some network admins turn off ICMP ping responses completely at their gateways, which means you'll hit the dead gateway detection issue immediately. For me, if a ping fails, I would like the network stack to keep trying in case the problem is transient. My own testing shows that on a subnet with 50 connections (servers, PCs, printers...), a 10-ping-per-hour test will lose 20%-40% of its pings about once a week. My preference is that all the equipment keeps running even if an occasional ping is lost.
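To put those loss figures in perspective, a back-of-the-envelope Python sketch. It assumes each probe is lost independently with the same probability and a hypothetical detector that declares the gateway dead only after k consecutive lost probes; it says nothing about how many probes HP-UX itself sends before acting.

```python
def false_alarm_probability(loss_rate: float, consecutive: int) -> float:
    """Chance that `consecutive` independent probes are all lost."""
    return loss_rate ** consecutive


if __name__ == "__main__":
    for loss in (0.2, 0.4):   # the 20%-40% loss from the test above
        for k in (1, 3):      # hypothetical consecutive-loss thresholds
            p = false_alarm_probability(loss, k)
            print(f"loss={loss:.0%} k={k}: {p:.3f}")
```

At 40% loss a single probe fails 40% of the time, while three consecutive failures happen only about 6.4% of the time, which illustrates why a check that does not retry can mistake a transient loss for a dead gateway.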



Bill Hassell, sysadmin
Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Thanks, Bill

I take it I do this on the VMs.

At present I have the following:

TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_keepalive_interval
NDD_VALUE[0]=3600000

TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_keepalive_detached_interval
NDD_VALUE[1]=60000

TRANSPORT_NAME[2]=tcp
NDD_NAME[2]=tcp_time_wait_interval
NDD_VALUE[2]=20000

TRANSPORT_NAME[3]=tcp
NDD_NAME[3]=tcp_fin_wait_2_timeout
NDD_VALUE[3]=10000

I take it all I need to do is add:

TRANSPORT_NAME[4]=ip
NDD_NAME[4]=ip_ire_gw_probe
NDD_VALUE[4]=0
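Picking the next free index, as in the NOTE about [0], [1], [2], can also be done programmatically. A minimal Python sketch, assuming the `NAME[index]=value` layout shown above and treating lines that start with `#` as inactive; the helper names are hypothetical and not part of HP-UX:

```python
import re

# Active TRANSPORT_NAME/NDD_NAME/NDD_VALUE assignments; commented-out
# lines (starting with #) do not match because of the ^\s* anchor.
ENTRY = re.compile(r"^\s*(?:TRANSPORT_NAME|NDD_NAME|NDD_VALUE)\[(\d+)\]=", re.MULTILINE)


def next_ndd_index(nddconf_text: str) -> int:
    """Return one past the highest index already used in nddconf."""
    used = [int(i) for i in ENTRY.findall(nddconf_text)]
    return max(used) + 1 if used else 0


def ndd_entry(index: int, transport: str, name: str, value: int) -> str:
    """Format one nddconf entry (three lines) for the given index."""
    return (f"TRANSPORT_NAME[{index}]={transport}\n"
            f"NDD_NAME[{index}]={name}\n"
            f"NDD_VALUE[{index}]={value}\n")


if __name__ == "__main__":
    current = """\
TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_keepalive_interval
NDD_VALUE[0]=3600000
TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_keepalive_detached_interval
NDD_VALUE[1]=60000
TRANSPORT_NAME[2]=tcp
NDD_NAME[2]=tcp_time_wait_interval
NDD_VALUE[2]=20000
TRANSPORT_NAME[3]=tcp
NDD_NAME[3]=tcp_fin_wait_2_timeout
NDD_VALUE[3]=10000
"""
    print(ndd_entry(next_ndd_index(current), "ip", "ip_ire_gw_probe", 0), end="")
```

With the four existing tcp entries, this reproduces the [4] entry above.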

Thanks

Bill Hassell
Honored Contributor

Re: VM's randomly lose Network connection

Correct, except that it should be applied to every copy of HP-UX (11.00 and up) you have running, on any platform.



Bill Hassell, sysadmin
Eric SAUBIGNAC
Honored Contributor

Re: VM's randomly lose Network connection

Hello,

If you lose local connections too, I really can't believe that the "dead gateway detection" is the source of the problem. I think it's just a side effect of losing network connectivity, not the cause.

Moreover, in my mind [but maybe this is a mistake, a false idea?], the dead gateway detection mechanism doesn't disable any LAN card. If a gateway isn't pingable, the routes through that gateway are deactivated, OK, but the LAN card itself is not disabled. I have seen many cases where remote connections stopped working while local connections were still operational. In those cases, deactivating ip_ire_gw_probe was the solution. Overall I agree with Bill: this feature should always be turned off, but in your case I don't think it will solve anything.

As Torsten pointed out, check all the components. And upgrade:

- starting at the lowest level, the LOM firmware (note: as mentioned, the FC mezzanine cards should be upgraded first): http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=4186428&swItem=MTX-ff5406036ec541018bb9713974&prodNameId=4186429&swEnvOID=4001&swLang=8&taskId=135&mode=4&idx=1
- more broadly, all of the hardware: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=4186429&prodTypeId=3709945&prodSeriesId=4186428&swLang=8&taskId=135&swEnvOID=4001
- at a higher level, HP-UX drivers such as 10GigEthr-02 (iexgbe): https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=10GigEthr-02

And so on...

Also, when the problem appears, what is the state of the aggregate itself [ nwmgr -c lan900 -S apa -v ]? Are all the NICs active and ready? Is there any useful information in syslog or at the console (on both host and guest)?
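Capturing the aggregate state at the moment the problem appears is easier with a small watcher. A minimal Python sketch under stated assumptions: `probe` and `capture` are hypothetical hooks you supply, for example a wrapper around your platform's ping for `probe` (its exact options are not given here; check ping(1M)) and the `nwmgr -c lan900 -S apa -v` command above for `capture`:

```python
import subprocess
import time
from typing import Callable, List


def run(cmd: List[str]) -> str:
    """Run a command and return its combined stdout/stderr as text."""
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True).stdout


def watch(probe: Callable[[], bool], capture: Callable[[], str],
          checks: int, interval_s: float = 60.0) -> List[str]:
    """Call `probe` `checks` times; after each failure, record a timestamped `capture()`."""
    snapshots: List[str] = []
    for i in range(checks):
        if not probe():
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            snapshots.append(stamp + "\n" + capture())
        if i < checks - 1:
            time.sleep(interval_s)
    return snapshots


if __name__ == "__main__":
    # Stand-in hooks so the sketch runs anywhere: the second probe "fails".
    results = iter([True, False, True])
    for snap in watch(lambda: next(results), lambda: "aggregate state here",
                      checks=3, interval_s=0):
        print(snap)
```

On the host, `capture` could be `lambda: run(["nwmgr", "-c", "lan900", "-S", "apa", "-v"])`, so each failed probe leaves a timestamped copy of the aggregate state to compare with syslog.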

And finally, a call to the HP Support Center will certainly help. For example, are you sure that a MANUAL aggregate between LOM ports is supported?

HTH

Eric

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hi Eric

Thanks for the response.

I am running HPVM v6.1, not the latest, v6.1.5.

There may be an issue in v6.1, similar to your issue, that is fixed with PK3 of v6.1.5.

Does anyone know whether there is a fix for this issue in the 6.1.5 release?

Thanks

Paul