01-08-2018 01:31 PM
BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
I thought I had solved this problem by upgrading the patch level to September 2015 and upgrading the firmware, but no.
Here's the sequence of events:
- The following appears on the console and in syslog:
Jan 2 20:20:19 molhpi24 vmunix: iexgbe4/1689, Microcode assert 00000100 00000020 00000040 00000080 00000100
Jan 2 20:20:19 molhpi24 vmunix: iexgbe4/1701, Microcode assert 0x100
Jan 2 20:22:56 molhpi24 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa082001 .See ndd -h ip_ire_gw_probe for more info
Jan 2 20:23:16 molhpi24 xntpd[2545]: synchronisation lost
Jan 2 20:37:56 molhpi24 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa082001 .See ndd -h ip_ire_gw_probe for more info
Jan 2 20:38:16 molhpi24 above message repeats 5 times
Jan 2 20:38:16 molhpi24 above message repeats 7 times
This has probably happened a dozen times in the past couple of years. Of course, when the VM host loses network connectivity, so do the VMs. I've tried everything I could think of, but within maybe 15 minutes of my trying to shut down the VMs, the blade server crashes. And sometimes it leaves a mess.
Thanks for reading this. Anyone have any ideas?
Thanks,
Steve
01-08-2018 06:37 PM - edited 01-08-2018 06:41 PM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
This is a well-known (but very bad) default for HP-UX.
The network code regularly pings routers to see if they are alive (even though ping is a primitive and nearly useless test). When a router fails to respond, the network code assumes the router is dead and stops using that route (an even more useless action). It is not unusual for the network team to disable ICMP responses (i.e., ping), but with this gateway setting, all HP-UX routed traffic is halted because of a missed ping. Rebooting restores the connection again.
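The failure mode described above can be modeled roughly like this (a simplified Python sketch, not HP-UX's actual kernel logic; the gateway address is taken from the log, where 0xa082001 decodes to 10.8.32.1):

```python
# Simplified model of HP-UX dead gateway detection (ip_ire_gw_probe).
# This is an illustrative sketch; the real kernel probes on a timer
# and tracks per-route state, but the net effect is the same.

def probe_gateways(gateways, ping, probe_enabled=True):
    """Return the set of gateways still considered usable.

    gateways: list of gateway addresses
    ping: callable(addr) -> bool, True if the gateway answered ICMP
    probe_enabled: models the ndd tunable ip_ire_gw_probe (1 = on, 0 = off)
    """
    if not probe_enabled:
        # With probing disabled, routes are never declared dead.
        return set(gateways)
    # With probing enabled, a missed ping disables the route --
    # even if the router merely has ICMP responses turned off.
    return {gw for gw in gateways if ping(gw)}

# A router hardened to silently drop ICMP:
silent = lambda addr: False

# Default setting: the last default gateway is declared dead,
# and all off-subnet traffic stops.
print(probe_gateways(["10.8.32.1"], silent, probe_enabled=True))   # set()

# With ip_ire_gw_probe = 0, the route stays up regardless of ping.
print(probe_gateways(["10.8.32.1"], silent, probe_enabled=False))  # {'10.8.32.1'}
```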
You need to set the dead gateway detect to off on *every* HP-UX server you have.
To make the change permanent, edit the file /etc/rc.config.d/nddconf and add this:
TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0
The above assumes that there are no [0] entries already in use in this script. If there are, use the next available array reference such as [1] or [2].
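For example, if the [0] slot is already occupied by an existing tunable, the new entry would use the next free index:

```
TRANSPORT_NAME[1]=ip
NDD_NAME[1]=ip_ire_gw_probe
NDD_VALUE[1]=0
```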
Then run:
ndd -c
which reads the file and performs the settings.
This sets the value to 0 and also validates that the file is of the proper format.
(Did I mention that *every* HP-UX server including vPars and VMs (any OS version) needs this fix?)
Bill Hassell, sysadmin
01-09-2018 10:24 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Hi Bill,
Thanks, I had never heard of that fix before, but it makes sense. I have implemented it on all systems.
Any ideas on what is causing the port to go offline?
Thanks,
Steve
01-10-2018 08:18 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
As mentioned, HP-UX will stop using all routes when all the routers fail to respond to ping. Technically, the system is not offline as it will respond to other systems that are on the same subnet. Systems that are on other subnets will require a router (gateway) to communicate and this will fail since routing has been disabled.
As for why this feature even exists, that has never been explained, to my knowledge.
Bill Hassell, sysadmin
01-10-2018 11:39 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Sorry, I wasn't clear.
Neither the VMs nor the host are pingable from another system on the same network. It is my belief that the network port is down.
01-10-2018 01:30 PM - edited 01-10-2018 01:46 PM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
So the first question is: can you connect to your console port? This is a separate network connection with a separate IP address, and it is almost impossible to troubleshoot the problem without it. Since it is an embedded microcomputer, it is unaffected by HP-UX problems such as dead gateway detection. It is often referred to as the iLO port. When you connect to the port (telnet), you can view hardware status and logs and also connect to the HP-UX console. From there you can verify the state of the network connection. The OA (Onboard Administrator) can set up the console IP addresses for each blade. You can also connect to the console through the blade's KVM/iLO port (special dongle required); this is a serial connection.
Bill Hassell, sysadmin
01-15-2018 09:03 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Hi Bill,
I was able to access the blade's console and observed a recurring error message that the system could not reach its NIS server. That server is on the same network as the blade. I then tried pinging other hosts on the same network, with the same result: no response.
Here are the relevant logs from the Virtual Connect:
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5016:Minor] Server state DEGRADED : Component partially operational, but capacity lost, Previous: Server state OK
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [ENC:enc0:2014:Minor] Enclosure state DEGRADED : Some Enet modules & servers not OK, Previous: Enclosure state OK
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1024:Minor] Domain state DEGRADED : 1+ enclosures & profiles OK, DEGRADED, UNKNOWN, NOT-MAPPED, Previous: Domain state OK
2018-01-03T07:03:28-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1032:Warning] VCM remote session is invalid or has expired : hpvcd:showManagedObjects ([UNKNOWN]@[LOCAL])
2018-01-03T07:03:28-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1032:Warning] VCM remote session is invalid or has expired : hpvcm:retrieveStateChangeCounters ([UNKNOWN]@[LOCAL])
2018-01-03T07:04:24-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5012:Critical] Server state FAILED : Component is not operational due to an error, Previous: Server state DEGRADED
2018-01-03T07:04:24-05:00 VCEFTW20120152 vcmd: [PRO:molhpi24-BL890c-i2:6012:Critical] Profile state FAILED : Server [enc0:devbay1] state not OK: [VCM_OP_STATE_FAILED], Previous: Enet Network state OK
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5010:Info] Server state OK : Component fully operational, Previous: Server state FAILED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [NET:molhpi24-BL890c-i2:7010:Info] Enet Network state OK : All connections, PhysicalServer OK, Previous: Profile state FAILED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [ENC:enc0:2010:Info] Enclosure state OK : All modules & servers OK, Previous: Enclosure state DEGRADED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1020:Info] Domain state OK : All enclosures & profiles OK, Previous: Domain state DEGRADED
Thanks,
Steve
© Copyright 2021 Hewlett Packard Enterprise Development LP