Operating System - HP-UX

Re: VM's randomly lose Network connection

 
Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

Could you try

# nwmgr -q vpd -c lan0

to get the firmware of the LOM?

The FW of the server itself would be nice to know too:


MP:CM> sysrev

(or just look into the firmware table of the onboard administrator)

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hello

 

bash-4.0# nwmgr -q vpd -c lan0
lan0 VITAL PRODUCT DATA:
   Product Description : Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC
   Part Number: N/A
   Engineering Date Code : N/A
   Part Serial Number : 0123456789
   Misc. Information : N/A
   Mfd. Date : N/A
   Check Sum : 0xe4
   EFI Version : 5.2.58
   ROM Firmware Version : 6.2.21
   Asset Tag : N/A
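
When comparing several blades, it can help to pull a single field out of the VPD listing instead of reading the whole block. The following is only a sketch: `vpd_field` is a hypothetical helper name (not an HP-UX command), and it assumes the "label : value" layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from
# "nwmgr -q vpd" style output read on stdin.
# Assumes each line looks like "   Label : value".
vpd_field() {
    # $1 = field label, e.g. "ROM Firmware Version"
    awk -v want="$1" '
        {
            # Everything before the first colon is the label.
            key = $0
            sub(/:.*$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == want) {
                # Everything after the first colon is the value.
                val = substr($0, index($0, ":") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
                exit
            }
        }'
}
```

Possible usage on the host: `nwmgr -q vpd -c lan0 | vpd_field "ROM Firmware Version"`.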

 

 

Firmware:

 

System Firmware 01.92

 

Torsten.
Acclaimed Contributor

Re: VM's randomly lose Network connection

This looks good.

 

 

I'll tell you the story. Almost a year ago we started with a firmware upgrade of a server like yours, including the strongly recommended LOM firmware upgrade. After this, the system lost its network connection.

 

Why?

 

We upgraded APA and the issue remained.

 

Finally it turned out there was a problem with the network switch configuration. The previous server setup simply ignored this and worked more or less (2 trunked NIC pairs bundled in failover = 4 NICs involved, making 1 logical network interface).

 

The new setup, based on the new firmware/software, now checked the switch as well and did not allow the configuration.

 

 

We had pass-thru modules installed, but you may have switches in the blade?

 

 

I would ask the network team to check the switches and their logs for anything unusual. Even a bad cable can create a lot of trouble.

 

 

Remember, there is so much involved here: the blade, several switches and cables, firmware, drivers, software, virtualization, drivers and software again ... you can only investigate step by step.


Hope this helps!
Regards
Torsten.

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Thanks for your advice, Torsten. We have other blades running an older release of the VM software and they run fine. On this blade we took a much newer release and have run into trouble.

 

As you say, there are multiple components involved, so I will need to start from the beginning.

 

Thanks again

Bill Hassell
Honored Contributor

Re: VM's randomly lose Network connection

>> Dead gateway detection can't ping the last remaining default gateway

 

I can't believe that this 'feature' is still turned on these days.

You are probably losing connectivity due to the dead gateway detection mechanism introduced years ago.

 

Here's how it works:

 

The network stack has a health check routine that pings the gateway for each active network.

If a ping fails to return, the gateway is assumed to be dead and the card is disabled.

 

The fix is easy (and, in my not so humble opinion, it should be the default).

Edit the nddconf file in /etc/rc.config.d and add these lines:

 

TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0

 

NOTE: If something has already been added to this file that is using [0], pick the next unused array reference such as [1] or [2].
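
Picking the next free index by hand is easy to get wrong on a file with many entries. The sketch below shows one way to automate it. The function names `next_ndd_index` and `append_gw_probe_off` are hypothetical, and the approach assumes existing entries are numbered contiguously from 0, so "highest index plus one" is the next unused slot. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: print the next unused array index in an
# nddconf-style file (highest uncommented NDD_NAME[n] index + 1, or 0).
next_ndd_index() {
    awk '
        /^[ \t]*NDD_NAME\[[0-9]+\]=/ {
            s = $0
            sub(/^[ \t]*NDD_NAME\[/, "", s)   # drop everything up to the index
            sub(/\].*$/, "", s)               # drop everything after the index
            n = s + 0
            if (!seen || n > max) { max = n; seen = 1 }
        }
        END { print (seen ? max + 1 : 0) }
    ' "$1"
}

# Hypothetical helper: append the ip_ire_gw_probe=0 entry to the file
# given as $1, unless an entry for that parameter already exists.
append_gw_probe_off() {
    file=$1
    if grep -q '^[[:space:]]*NDD_NAME\[[0-9]*\]=ip_ire_gw_probe' "$file"; then
        return 0
    fi
    i=$(next_ndd_index "$file")
    printf '\nTRANSPORT_NAME[%s]=ip\nNDD_NAME[%s]=ip_ire_gw_probe\nNDD_VALUE[%s]=0\n' \
        "$i" "$i" "$i" >> "$file"
}
```

Possible usage: copy /etc/rc.config.d/nddconf to a scratch location, run `append_gw_probe_off` on the copy, review the result, then put it back in place.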

 

The above change will make the setting persistent across reboots.

Then turn off the feature with:

 

ndd -set /dev/ip ip_ire_gw_probe 0

 

And the problem should go away.
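
To confirm that the running value actually changed, the setting can be read back with `ndd -get`. A minimal sketch follows. `gw_probe_state` is a hypothetical wrapper name, and it assumes the parameter reports 0 for off and 1 for on.

```shell
#!/bin/sh
# Hypothetical wrapper: report the current dead gateway detection state.
# Assumes "ndd -get /dev/ip ip_ire_gw_probe" prints 0 (off) or 1 (on).
gw_probe_state() {
    v=$(ndd -get /dev/ip ip_ire_gw_probe 2>/dev/null) || { echo unknown; return 1; }
    case $v in
        0) echo disabled ;;
        1) echo enabled ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Running `gw_probe_state` after the `ndd -set` command above should then report `disabled`.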

 

I've spent several years trying to find a good reason why the loss of a ping should disable the network. Some network admins turn off ICMP ping responses completely at gateways, which means you'll run into the dead gateway detection issue immediately. For me, if a ping fails, I would like the network stack to keep trying in case the problem is transient. My own testing shows that a subnet with 50 connections (servers, PCs, printers...) will lose 20%-40% of the pings in a 10-ping-per-hour test about once a week. My preference is that all the equipment keeps running even if there is an occasional loss of a ping.



Bill Hassell, sysadmin
Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Thanks Bill

 

I take it I do this on the VMs?

 

At the present I have the following:

 


TRANSPORT_NAME[0]=tcp
NDD_NAME[0]=tcp_keepalive_interval
NDD_VALUE[0]=3600000

TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_keepalive_detached_interval
NDD_VALUE[1]=60000

TRANSPORT_NAME[2]=tcp
NDD_NAME[2]=tcp_time_wait_interval
NDD_VALUE[2]=20000

TRANSPORT_NAME[3]=tcp
NDD_NAME[3]=tcp_fin_wait_2_timeout
NDD_VALUE[3]=10000

 

I take it all I need to do is add:

 

TRANSPORT_NAME[4]=ip
NDD_NAME[4]=ip_ire_gw_probe
NDD_VALUE[4]=0

 

Thanks

Bill Hassell
Honored Contributor

Re: VM's randomly lose Network connection

Correct, except that it should be applied to every copy of HP-UX (11.00 and up) you have running on any platform.



Bill Hassell, sysadmin
Eric SAUBIGNAC
Honored Contributor

Re: VM's randomly lose Network connection

Hello,

 

 

If you lose local connections too, I really can't believe that the "dead gateway detection" is the source of the problem. I think it's just a side effect of losing network connectivity, not the cause.

 

Also, as I understand it [ but maybe I am mistaken ], the dead gateway detection mechanism doesn't disable any LAN card. If a gateway isn't pingable, routes through that gateway are deactivated. OK. But the LAN card itself is not disabled. I have seen many cases where remote connections no longer worked while local connections were still operational. In those cases deactivating ip_ire_gw_probe was the solution. Overall I agree with Bill: this feature should always be turned off, but in your case I don't think it will solve anything.

 

As Torsten pointed out, check all the components. And upgrade:

 

- to begin at the low level, the firmware of the LOM: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=4186428&swItem=MTX-ff5406036ec541018bb9713974&prodNameId=4186429&swEnvOID=4001&swLang=8&taskId=135&mode=4&idx=1 . Note: as mentioned, FC mezzanine cards should be upgraded first

 

- more broadly, the whole hardware: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=4186429&prodTypeId=3709945&prodSeriesId=4186428&swLang=8&taskId=135&swEnvOID=4001

 

- at a higher level, HP-UX drivers like 10GigEthr-02 (iexgbe) https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=10GigEthr-02

 

And so on ...

 

Also, when the problem appears, what is the state of the aggregate itself [ nwmgr -c lan900 -S apa -v ]? Are all the NICs active and ready? Is there any useful information in syslog or at the console (both host and guest)?
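
Since the outage is intermittent, it may be worth capturing this information in one go at the moment it happens. The following is only a sketch. `collect_net_state` is a hypothetical function name, lan900 and lan0 are taken from this thread, and the syslog path /var/adm/syslog/syslog.log is an assumption that should be adjusted to your system.

```shell
#!/bin/sh
# Hypothetical helper: snapshot the aggregate state, LOM VPD and recent
# syslog entries into a timestamped file under the directory given as $1.
# Prints the name of the file it wrote.
collect_net_state() {
    out=$1/netstate.$(date +%Y%m%d%H%M%S).txt
    {
        echo "== APA aggregate (lan900) =="
        nwmgr -c lan900 -S apa -v 2>&1 || echo "(command failed)"
        echo "== LOM vital product data (lan0) =="
        nwmgr -q vpd -c lan0 2>&1 || echo "(command failed)"
        echo "== last 50 syslog lines (path is an assumption) =="
        tail -n 50 /var/adm/syslog/syslog.log 2>&1 || echo "(command failed)"
    } > "$out"
    echo "$out"
}
```

Possible usage on the host when the VMs drop off the network: `collect_net_state /var/tmp`, then compare the snapshot with one taken while everything works.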

 

And finally, a call to the HP support center will certainly help. For example, are you sure that a MANUAL aggregate between LOMs is supported?

 

HTH

 

Eric

Paul_King
Occasional Advisor

Re: VM's randomly lose Network connection

Hi Eric

 

Thanks for the response.

I am running on HPVM v6.1 and not the latest 6.1.5.

There may be an issue in v6.1 similar to your issue that is fixed with PK3 of v6.1.5.

 

Does anyone know if there is a fix for this issue in the 6.1.5 release?

 

Thanks

 

Paul