VM loses network connection
01-30-2013 02:53 AM
Hi all,
So, we built two VMs on a BL860c blade, and everything is working fine: both are fully configured and running, no issues there.
However, after some time (anywhere from 2 hours to a few weeks), one of the two VMs loses its network connection completely...
Things we have seen:
- from the VM console, running /sbin/init.d/net stop and then /sbin/init.d/net start does not solve the issue
- rebooting the VM solves the issue
- restarting the virtual switch solves the issue
I don't think this issue can be solved on the spot with this information (plus the details below)... so my question is:
=> what more can we check?
We've checked logs (see below), NIC status, IP status... and can't find anything relevant.
i.e., do you have specific commands for network troubleshooting that we could use?
Thanks for your help!
Some more info:
There are almost no logs on either side, except in the VM syslog, and that entry seems to be a result of the issue:
Jan 30 09:54:21 soem2 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa63xxc01 .See ndd -h ip_ire_gw_probe for more info
bash-4.0# nwmgr -c lan0 -v
lan0:
Interface State = UP
MAC Address = 0xCA5xxx75B9A
Subsystem = igssn
Interface Type = 1000Base-T
Hardware Path = 0/0/1/0
NMID = 1
Feature Capabilities = Physical Interface
                       IPV4 Recv CKO
                       IPV4 Send CKO
                       VLAN Tag Offload
                       64Bit MIB Support
                       IPV4 TCP Segmentation Offload
                       UDP Multifrag CKO
Feature Settings = Physical Interface
                   IPV4 Recv CKO
                   IPV4 Send CKO
                   VLAN Tag Offload
                   64Bit MIB Support
                   IPV4 TCP Segmentation Offload
                   UDP Multifrag CKO
MTU = 1500
Speed = 1 Gbps Full Duplex (Autonegotiation : On)
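The syslog message above prints the gateway address as a hexadecimal word rather than in dotted notation. A minimal sketch for decoding such a value, assuming it is a plain 32-bit IPv4 address written as eight hex digits (the function name hex_to_ipv4 is made up for illustration, and a masked value such as the one posted here is rejected rather than guessed at):

```shell
# Convert an 8-digit hex IPv4 address (with or without a 0x prefix)
# into dotted-decimal notation. Hypothetical helper, not an HP-UX tool.
hex_to_ipv4() {
    hex=${1#0x}
    # Reject empty input and anything that is not a hex digit
    # (for example an address that was masked with "xx").
    case "$hex" in
        ""|*[!0-9a-fA-F]*)
            echo "not a plain hex address: $1" >&2
            return 1
            ;;
    esac
    if [ "${#hex}" -ne 8 ]; then
        echo "expected 8 hex digits: $1" >&2
        return 1
    fi
    # Split into four octets, most significant first.
    o1=$(printf '%s' "$hex" | cut -c1-2)
    o2=$(printf '%s' "$hex" | cut -c3-4)
    o3=$(printf '%s' "$hex" | cut -c5-6)
    o4=$(printf '%s' "$hex" | cut -c7-8)
    printf '%d.%d.%d.%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4"
}
```

For example, `hex_to_ipv4 0x0a000001` prints 10.0.0.1, which can then be compared against the default route shown by netstat -rn.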
01-30-2013 03:28 AM
Re: VM loses network connection
You did not provide the HPVM version, the AVIO driver version, or any details about the interface to which the vswitch is connected.
So we can only speak at a generic level: make sure you have the latest AVIO driver on both host and guest, as well as an up-to-date driver for the underlying physical NIC on the host.
01-30-2013 04:20 AM
Re: VM loses network connection
@Jeromejay wrote: Jan 30 09:54:21 soem2 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa63xxc01 .See ndd -h ip_ire_gw_probe for more info
Looks like you need the following in your /etc/rc.config.d/nddconf:
TRANSPORT_NAME[3]=ip
NDD_NAME[3]=ip_ire_gw_probe
NDD_VALUE[3]=0
Adjust the "[3]" index so that it follows your existing entries.
Cheers,
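Picking the index for the new nddconf entry can be sketched as below. This is only an illustration under the assumption that existing entries use uncommented TRANSPORT_NAME[n]= lines and that the new entry should take the highest existing index plus one; the function names next_ndd_index and ndd_stanza are hypothetical, and the output is meant to be reviewed before it is pasted into the file.

```shell
# Print the next unused index, based on the uncommented
# TRANSPORT_NAME[n]= lines in an nddconf-style file.
next_ndd_index() {
    conf_file="$1"
    last=$(sed -n 's/^[[:space:]]*TRANSPORT_NAME\[\([0-9][0-9]*\)\]=.*/\1/p' "$conf_file" \
        | sort -n | tail -1)
    if [ -z "$last" ]; then
        echo 0              # no entries yet, start at index 0
    else
        echo $((last + 1))
    fi
}

# Print the three lines from the reply above for a given index.
ndd_stanza() {
    idx="$1"
    printf 'TRANSPORT_NAME[%s]=ip\n' "$idx"
    printf 'NDD_NAME[%s]=ip_ire_gw_probe\n' "$idx"
    printf 'NDD_VALUE[%s]=0\n' "$idx"
}
```

Usage would be along the lines of `ndd_stanza "$(next_ndd_index /etc/rc.config.d/nddconf)"`, which only prints the stanza and does not modify the file.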
01-30-2013 04:49 AM
Re: VM loses network connection
>>... as up to date driver for the underlying physical NIC on the host
Thanks: I'm checking now...
That's exactly the kind of advice I needed ;)
01-30-2013 04:53 AM
Re: VM loses network connection
NDD_NAME[3]=ip_ire_gw_probe
NDD_VALUE[3]=0
From what I understand, the logged error message is more a consequence than a cause...
Changing this will only remove the detection.
01-30-2013 06:02 AM - edited 01-30-2013 06:04 AM
Solution
Dead gateway detection...
I have found this on dozens of 'hung' systems causing hours of downtime and unnecessary reboots.
Turn it OFF!
What is happening is that HP-UX will ping each of the gateways about every 3-4 minutes, and if the gateway fails to respond (or, more likely, the ICMP packet gets lost), the network is immediately disabled, a very bad thing for any production system. Some network administrators may also decide to turn off ping responses from gateways as a security measure, which means that every HP-UX system with dead gateway detection enabled will disappear from the network. This usually results in mass panic among the end users, and the desperate system administrator will reset (crash) the system to reboot it.
This is yet another reason to verify that 100% of your systems have GSP/MP network access, a known-to-work LAN connection that is *NOT* affected by the dead gateway mess. By logging in over the console, you can determine that the system is NOT hung, but simply off the network.
Bill Hassell, sysadmin
01-30-2013 06:43 AM
Re: VM loses network connection
Thanks for the full info!
That's appreciated ;)
Also: I really thought the error message was a consequence, whereas it's actually the cause...
So now I'm on to reconfiguring all our servers :/
01-30-2013 07:03 AM
Re: VM loses network connection
The dead gateway detection turning off the network will definitely cause you problems.
I have seen cases where the network was REALLY REALLY busy (in one case doing a backup over the network of an NFS mounted filesystem with a single interface) which likely caused the dead gateway detection ping to fail, thus causing the network to go down.
Basically, this is a heads-up that when you turn off dead gateway detection, you may start seeing other symptoms on this VM that were previously masked because the network was disabled.
01-30-2013 07:07 AM
Re: VM loses network connection
Additionally, you can manually set the ip_ire_gw_probe value from the command line:
# ndd -set /dev/ip ip_ire_gw_probe 0
The above will set the value to '0' (disabled). To check the value:
# ndd -get /dev/ip ip_ire_gw_probe
0
The nddconf instructions given above only set the value when the system is rebooted, which is desirable because the setting is then applied at every boot. But if you can't reboot the system, use the ndd command to set the value now.
01-30-2013 07:11 AM
Re: VM loses network connection
Hi again,
So before making any changes across all servers, and because I have some time, I thought I'd go for a quick test first:
- I blocked outgoing ICMP on the server (using the firewall).
- After the expected ~3 min, I started getting the error messages about Dead Gateway...
- but the network connectivity was still there (I can still SSH and HTTP to the server).
So:
- either my quick test is flawed
- or the error message is a consequence of the server dropping its network connectivity (i.e., something else fails, then the server can't ping the GW and displays the message).
In case it's the second option, could you give me an exhaustive list of checks I can do for network investigation? (My knowledge stops at ping, basic nwmgr commands, netstat, lsof, log investigation, and lanscan.)
Thanks again for all the tips and explanations!
01-30-2013 07:18 AM
Re: VM loses network connection
>>but the network connectivity was still there (I can still SSH, and HTTP to the server).
Where were you SSH'ing from? Were you on the same network segment as the VM (where you DO NOT have to go through the router)? If so, the fact that you can SSH and HTTP makes sense.
Things to check:
netstat -in
netstat -rn
ping a server on the same subnet
ping the router
ping something on a different network subnet
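The three ping checks above narrow down where connectivity stops. A rough sketch of that logic follows, under stated assumptions: ping_once and classify_outage are made-up names, and the `ping -c 1` call uses the Linux-style count option, so that one line needs adapting to the local ping syntax (HP-UX ping takes its count differently; see its man page).

```shell
# Placeholder wrapper: send a single probe and report success or failure.
# ASSUMPTION: "ping -c 1" is the Linux form; adjust for your platform.
ping_once() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Arguments: a host on the same subnet, the router, a host on another subnet.
# Prints the first hop class that fails, or "ok" if all three answer.
classify_outage() {
    same_subnet_host="$1"
    gateway="$2"
    remote_host="$3"
    if ! ping_once "$same_subnet_host"; then
        echo "local-segment"
    elif ! ping_once "$gateway"; then
        echo "gateway"
    elif ! ping_once "$remote_host"; then
        echo "beyond-gateway"
    else
        echo "ok"
    fi
}
```

A result of "local-segment" points at the guest NIC, the vswitch, or the host side, while "gateway" or "beyond-gateway" points further out. Keep in mind that a firewall blocking ICMP makes every probe fail regardless of the real state, as in the quick test described earlier.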
01-30-2013 07:34 AM
Re: VM loses network connection
I was SSHing from another network, and using HTTP from yet another one... which, in short, implies that the network connectivity was still there (as opposed to the original outage, where everything was down).
Also: I could not ping anything, since I blocked outbound ping (maybe I should have blocked only the GW IP).
Thinking back on it:
As mentioned in my original post, restarting the virtual switch on the physical host solved the issue for the guest... would that tell us that the Dead GW detection was a consequence, and not the issue?
Moreover: we know for sure that the GW is fine (reliable server, used by many other servers). If our faulty server failed one ping to the GW and activated the infamous Dead GW detection by no longer using this route... would it not come back on the next successful ping? (I assume it keeps on trying, since we have error messages every 183 seconds.)
All in all: the more I think about it, the more I think the Dead GW detection error message is a consequence of another failure...
Note: still in the process of updating the AVIO drivers.
01-30-2013 07:47 AM
Re: VM loses network connection
>>restarting the virtual switch on the physical host solved the issue for the Guest ... would that indication tell us that the Dead GW detection was a consequence, and not the issue?
I would think so, yes.
Is there a way to check statistics for the virtual switch? Things like packets in, packets out, number of errors, etc?
>>would it not come back on the next successful ping?
I don't think it does. I think once it is disabled, it stays that way. I could be wrong though...
>>the more I think the Dead GW detection error message is a consequence...
I tend to agree. A ping is a pretty low level check. If the network is so busy that a ping is dropped, then I would think there are other issues.
01-30-2013 07:52 AM
Re: VM loses network connection
>>>>would it not come back on the next successful ping?
>>I don't think it does. I think once it is disabled, it stays that way. I could be wrong though...
No offence, but I hope you're wrong :) (I can't conceive that HP would have done something that stupid).
Also: since the error repeats every 183 sec in the log file, I guess it keeps on trying.
As for checking the virtual switch: I should have done it before the restart... (like any other investigation...).
As usual in this case: I'm not sure if I hope it happens again so I can investigate, or if I hope it never happens again.
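The repeat interval mentioned above can be measured from syslog rather than eyeballed. A small sketch, assuming the standard "Mon DD HH:MM:SS host ..." syslog layout shown in the original post (the function name dgd_intervals is made up, and the log path in the usage line should be adjusted to wherever your syslog lives):

```shell
# Print the number of seconds between consecutive
# "Dead gateway detection" messages in a syslog file.
dgd_intervals() {
    awk '/Dead gateway detection/ {
        split($3, t, ":")                  # $3 is the HH:MM:SS field
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) {
            delta = now - prev
            if (delta < 0) delta += 86400  # message pair spans midnight
            print delta
        }
        prev = now
        seen = 1
    }' "$1"
}
```

Running something like `dgd_intervals /var/adm/syslog/syslog.log` would print one number per gap. A steady value suggests the probe is still being retried, and a long gap shows when the messages stopped.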
01-30-2013 08:31 AM
Re: VM loses network connection
I have been looking into the dead gateway detection some more and found something I was not aware of.
Supposedly if the dead gateway is the last default gateway it will remain enabled, but a message will still be logged.
To check the status of a gateway:
# ndd -get /dev/ip ip_ire_status | grep -e IRE_GATEWAY -e flag
I cannot find anything definitive about re-enabling a gateway, but the following from 'ndd -h' indicates that it should:
# ndd -h ip_ire_gw_probe_interval
ip_ire_gw_probe_interval:
Controls the probe interval for Dead Gateway Detection. IP periodically probes active and dead gateways. ip_ire_gw_probe_interval controls the frequency of probing. With retries, the maximum time to detect a dead gateway is ip_ire_gw_probe_interval + 10000 milliseconds.
Maximum time to detect that a dead gateway has come back to life is ip_ire_gw_probe_interval. [15000,- ] Default: 180000 (3 minutes)
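The arithmetic in that help text is easy to work through. A tiny sketch (dgd_max_detect_seconds is a made-up helper; the only inputs are the interval in milliseconds and the fixed 10000 ms retry allowance quoted above):

```shell
# Upper bound, in whole seconds, on the time needed to detect a dead
# gateway: probe interval plus 10000 ms of retries, per the ndd -h text.
dgd_max_detect_seconds() {
    interval_ms="${1:-180000}"   # default value quoted from ndd -h
    echo $(( (interval_ms + 10000) / 1000 ))
}
```

With the default interval this gives 190 seconds, and with the minimum of 15000 ms it gives 25 seconds. The roughly 183 seconds between log entries reported earlier in the thread falls between the 180 second interval and that 190 second bound, which is at least consistent with the default still being in effect. That reading is an inference, not something the help text states.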
01-30-2013 11:34 PM
Re: VM loses network connection
Thank you so much for the additional information!
I'll keep the command to check the Dead Gateway status... although, since the error message is logged, I can already guess the results.
Note: too bad, the server is still up and running.
PS: I forgot to add: the other VM on the same host has the same IP settings as the one failing... if one detects the GW as down, the other "should" maybe do the same.
Thanks again!