Operating System - HP-UX


 
SOLVED
Jeromejay
Advisor

VM loses network connection

Hi all,

 

So, we built 2 VMs on a BL860C blade, and everything is working fine: both are fully configured and running, no issues there.

However, after some time (ranging between 2h and a few weeks), one of the 2 VMs loses its network connection completely...

 

Things we have seen:

- from the VM console, running /sbin/init.d/net stop and then /sbin/init.d/net start does not solve the issue

- rebooting the VM solves the issue

- restarting the virtual switch solves the issue

 

 

I don't think this issue can be solved on the spot with this information (plus the details below), but my question is then:

=> what more can we check ?

We've checked the logs (see below), NIC status, IP status ... we can't find anything relevant.

i.e.: do you have specific commands for network troubleshooting we could use ?

 

Thanks for your help !

 

 

Some more info:

 

There are mostly no logs on either side, except in the VM syslog, which seems to be a result of the issue:

Jan 30 09:54:21 soem2 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa63xxc01 .See ndd -h ip_ire_gw_probe for more info
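
The gateway in that message is printed as a 32-bit hex value rather than a dotted address. As a minimal sketch (assuming the value is a plain big-endian IPv4 address, and using the made-up address 0x0a000001 since the real one is masked here), it can be decoded like this:

```shell
#!/bin/sh
# hex_to_ip HEX: convert a 32-bit hex IPv4 address (e.g. 0x0a000001)
# into dotted-quad notation. Hypothetical helper, not an HP-UX tool.
hex_to_ip() {
    hex=${1#0x}
    # Left-pad to 8 hex digits in case leading zeros were dropped.
    while [ "${#hex}" -lt 8 ]; do hex="0$hex"; done
    # Each pair of hex digits is one octet of the address.
    printf '%d.%d.%d.%d\n' \
        "0x$(printf '%s' "$hex" | cut -c1-2)" \
        "0x$(printf '%s' "$hex" | cut -c3-4)" \
        "0x$(printf '%s' "$hex" | cut -c5-6)" \
        "0x$(printf '%s' "$hex" | cut -c7-8)"
}
```

Calling hex_to_ip with the value from the log gives the address to compare against the default route shown by netstat -rn.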

 

 

bash-4.0# nwmgr -c lan0 -v
lan0:
   Interface State =UP
   MAC Address = 0xCA5xxx75B9A
   Subsystem = igssn
   Interface Type = 1000Base-T
   Hardware Path = 0/0/1/0
   NMID = 1
   Feature Capabilities = Physical Interface
                          IPV4 Recv CKO
                          IPV4 Send CKO
                          VLAN Tag Offload
                          64Bit MIB Support
                          IPV4 TCP Segmentation Offload
                          UDP Multifrag CKO
   Feature Settings = Physical Interface
                      IPV4 Recv CKO
                      IPV4 Send CKO
                      VLAN Tag Offload
                      64Bit MIB Support
                      IPV4 TCP Segmentation Offload
                      UDP Multifrag CKO
   MTU = 1500
   Speed = 1 Gbps Full Duplex (Autonegotiation : On)

Stan_M
HPE Pro

Re: VM loses network connection

You did not provide the HPVM version, the AVIO driver version, or any details about the interface to which the vswitch is connected.

So we can speak only on a generic level: make sure to have the latest AVIO driver on both host and guest, as well as an up-to-date driver for the underlying physical NIC on the host.

I work for HPE

Re: VM loses network connection


@Jeromejay wrote:

Jan 30 09:54:21 soem2 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa63xxc01 .See ndd -h ip_ire_gw_probe for more info

 

 

Looks like you need the following in your /etc/rc.config.d/nddconf :

 

TRANSPORT_NAME[3]=ip
NDD_NAME[3]=ip_ire_gw_probe
NDD_VALUE[3]=0

 

Adjust the index "[3]" to fit the other entries already in your file.
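
If it helps with picking the index, here is a minimal sketch that prints the entry with the next index after the highest one already in use. The function names are made up for illustration, and it assumes the usual NDD_NAME[N]= layout of nddconf. It only prints to stdout and does not touch the file:

```shell
#!/bin/sh
# next_ndd_index FILE: print one past the highest NDD_NAME[N] index in use
# (0 if the file has no uncommented entries). Hypothetical helper.
next_ndd_index() {
    sed -n 's/^[[:space:]]*NDD_NAME\[\([0-9][0-9]*\)\]=.*/\1/p' "$1" |
    awk 'BEGIN { max = -1 } { if ($1 + 0 > max) max = $1 + 0 } END { print max + 1 }'
}

# print_gw_probe_entry FILE: print the three lines that disable
# dead gateway detection, numbered with the next free index.
print_gw_probe_entry() {
    idx=$(next_ndd_index "$1")
    printf 'TRANSPORT_NAME[%s]=ip\nNDD_NAME[%s]=ip_ire_gw_probe\nNDD_VALUE[%s]=0\n' \
        "$idx" "$idx" "$idx"
}
```

Running print_gw_probe_entry /etc/rc.config.d/nddconf and reviewing the output before pasting it into the file keeps the edit manual and easy to check.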

 

Cheers,

Jeromejay
Advisor

Re: VM loses network connection

"make sure to have the latest AVIO driver on both host and guest as well

as up to date driver for the underlying physical NIC on the host"
Thanks: I'm checking now ...
That's exactly the kind of advice I needed ;)
Jeromejay
Advisor

Re: VM loses network connection

TRANSPORT_NAME[3]=ip
NDD_NAME[3]=ip_ire_gw_probe
NDD_VALUE[3]=0

From what I understand, the error message in the log is more a consequence than a cause...
Changing this will only remove the detection.
Bill Hassell
Honored Contributor
Solution

Re: VM loses network connection

Dead gateway detection...

 

I have found this on dozens of 'hung' systems causing hours of downtime and unnecessary reboots.

 

Turn it OFF!

 

What is happening is that HP-UX pings each of the gateways about every 3-4 minutes. If the gateway fails to respond (or, more likely, the ICMP packet gets lost), the network is immediately disabled, which is a very bad thing for any production system. Some network administrators also turn off ping responses from gateways as a security measure, which means that every HP-UX system with dead gateway detection enabled will disappear from the network. This usually results in mass panic from the end users, and the desperate system administrator resets (crashes) the system to reboot it.

 

This is yet another reason to verify that 100% of your systems have GSP/MP network access, a known-to-work LAN connection that is *NOT* affected by the dead gateway mess. By logging in over the console, you can determine that the system is NOT hung, but simply off the network.



Bill Hassell, sysadmin
Jeromejay
Advisor

Re: VM loses network connection

Thanks for the full info !

that's appreciated ;)

 

also: I really thought the error message was a consequence, whereas it's actually the cause ...

 

So now, I'm on to re-configuring all our servers :/

Patrick Wallek
Honored Contributor

Re: VM loses network connection

The dead gateway detection turning off the network will definitely cause you problems.

 

I have seen cases where the network was REALLY REALLY busy (in one case doing a backup over the network of an NFS mounted filesystem with a single interface) which likely caused the dead gateway detection ping to fail, thus causing the network to go down.

 

Basically, this is a heads-up that when you turn off the dead gateway detection, you may start seeing other symptoms on this VM, which may have been masked because the network was disabled.

Patrick Wallek
Honored Contributor

Re: VM loses network connection

Additionally, you can manually set the ip_ire_gw_probe value from the command line:

 

# ndd -set /dev/ip ip_ire_gw_probe 0

 

The above will set the value to '0' (disabled).  To check the value:

 

# ndd -get /dev/ip ip_ire_gw_probe

0

 

The nddconf instructions given above only take effect when the system is rebooted; having the setting persist across reboots is desirable. But if you can't reboot the system, use the ndd command to set the value now.
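
Since the running value and the boot-time value are set in two different places, it can be worth checking that they agree. A rough sketch, assuming the standard NDD_NAME[N]/NDD_VALUE[N] layout (nddconf_value is a hypothetical helper name, not an HP-UX command):

```shell
#!/bin/sh
# nddconf_value FILE NAME: print the NDD_VALUE configured for tunable NAME
# in an nddconf-style file; return non-zero if NAME is not configured.
nddconf_value() {
    awk -v want="$2" '
        /^[ \t]*NDD_NAME\[[0-9]+\]=/ {
            split($0, kv, "=")
            name = kv[2]; gsub(/[" \t]/, "", name)      # strip quotes/blanks
            if (name == want) { idx = kv[1]; gsub(/[^0-9]/, "", idx) }
        }
        /^[ \t]*NDD_VALUE\[[0-9]+\]=/ {
            split($0, kv, "=")
            i = kv[1]; gsub(/[^0-9]/, "", i)            # keep only the index
            v = kv[2]; gsub(/[" \t]/, "", v)
            val[i] = v
        }
        END { if (idx != "" && (idx in val)) print val[idx]; else exit 1 }
    ' "$1"
}
```

The output of nddconf_value /etc/rc.config.d/nddconf ip_ire_gw_probe can then be compared by eye with the result of the ndd -get command shown above.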

Jeromejay
Advisor

Re: VM loses network connection

Hi again,

 

So, before making any changes across all servers, and because I have some time, I thought I'd go for a quick test first:

 

- I blocked outgoing ICMP on the server (using firewall).

- After the expected ~3 min, I started getting the error messages about Dead Gateway ...

- but the network connectivity was still there (I can still SSH, and HTTP to the server).

 

so:

- either my quick test is flawed

- or the error message is a consequence of the server dropping its network connectivity (i.e.: something else fails, and then the server can't ping the GW, and displays the message).
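
One way to tell those two cases apart the next time it happens is to line up when the messages were logged against when connectivity was actually lost. A minimal sketch that pulls the timestamps out of syslog, assuming the message always has the format seen in this thread (dead_gw_events is a made-up helper name):

```shell
#!/bin/sh
# dead_gw_events FILE: print "timestamp gateway-hex" for every
# dead gateway detection message found in a syslog file.
dead_gw_events() {
    # The first 15 characters of a syslog line are the timestamp,
    # e.g. "Jan 30 09:54:21"; the gateway follows "default gateway at".
    sed -n 's/^\(.\{15\}\) .*Dead gateway detection.*default gateway at \(0x[0-9a-fA-F]*\).*/\1 \2/p' "$1"
}
```

On HP-UX the file to pass would normally be /var/adm/syslog/syslog.log. If the first message shows up only after SSH sessions have already dropped, that points to the message being a consequence rather than the cause.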

 

In case it's the 2nd option, could you give me an exhaustive list of checks I can do, for network investigation ? (my knowledge stops at ping, nwmgr basic commands, netstat, lsof, log investigation, lanscan)

 

thanks again for all the tips and explanations !