01-08-2018 01:31 PM
BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
I thought I had solved this problem by upgrading the patch level to Sep 2015 and upgrading the firmware, but no.
Here's the sequence of events:
- The following appears on the console and in syslog:
Jan 2 20:20:19 molhpi24 vmunix: iexgbe4/1689, Microcode assert 00000100 00000020 00000040 00000080 00000100
Jan 2 20:20:19 molhpi24 vmunix: iexgbe4/1701, Microcode assert 0x100
Jan 2 20:22:56 molhpi24 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa082001 .See ndd -h ip_ire_gw_probe for more info
Jan 2 20:23:16 molhpi24 xntpd[2545]: synchronisation lost
Jan 2 20:37:56 molhpi24 vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xa082001 .See ndd -h ip_ire_gw_probe for more info
Jan 2 20:38:16 molhpi24 above message repeats 5 times
Jan 2 20:38:16 molhpi24 above message repeats 7 times
This has probably happened a dozen times in the past couple of years. Of course, when the VM host loses network connectivity, so do the VMs. I've tried everything I could think of, but within maybe 15 minutes of me trying to shut down the VMs, the blade server crashes. And sometimes, it leaves a mess.
Thanks for reading this. Anyone have any ideas?
Thanks,
Steve
01-08-2018 06:37 PM - edited 01-08-2018 06:41 PM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
This is a well known (but very bad) default for HP-UX.
The network code regularly pings routers to see if they are alive (even though ping is a primitive and useless test). When the router fails to respond, the network code assumes that the router is dead and stops using that route (an even more useless action). It is not unusual for the network team to disable ICMP response (i.e., ping), but with this gateway setting, all HP-UX routed traffic is halted because of a missed ping. Rebooting restores the connection again.
You need to set the dead gateway detect to off on *every* HP-UX server you have.
To make the change permanent, edit the file /etc/rc.config.d/nddconf and add this:
TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0
The above assumes that there are no [0] entries already in use in this script. If there are, use the next available array reference such as [1] or [2].
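To pick the next free index mechanically rather than by eye, a quick portable sketch (the sample file below is hypothetical; on a real system, point CONF at /etc/rc.config.d/nddconf instead):

```shell
# Sketch: find the next free NDD_NAME[i] index in an nddconf-style file.
# /tmp/nddconf.sample is a made-up example with [0] and [1] already taken.
CONF=/tmp/nddconf.sample
cat > "$CONF" <<'EOF'
TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_ire_gw_probe
NDD_VALUE[0]=0
TRANSPORT_NAME[1]=tcp
NDD_NAME[1]=tcp_conn_request_max
NDD_VALUE[1]=2048
EOF
# Pull every index in use, take the highest, and add one.
next=$(sed -n 's/^NDD_NAME\[\([0-9]*\)\].*/\1/p' "$CONF" | sort -n | tail -1)
next=$((next + 1))
echo "next free index: $next"
```

For the sample above this reports index 2, so the new entry would use TRANSPORT_NAME[2], NDD_NAME[2], and NDD_VALUE[2].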
Then run:
ndd -c
which reads the file and performs the settings.
This sets the value to 0 and also validates that the file is of the proper format.
(Did I mention that *every* HP-UX server including vPars and VMs (any OS version) needs this fix?)
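The whole sequence, as a sketch (HP-UX only — these ndd commands won't exist elsewhere):

```shell
# Check the current value: 1 = dead gateway probing on, 0 = off.
ndd -get /dev/ip ip_ire_gw_probe
# Turn it off immediately for the running kernel.
ndd -set /dev/ip ip_ire_gw_probe 0
# Re-read /etc/rc.config.d/nddconf so the edit is validated and applied.
ndd -c
```

The ndd -set takes effect right away; the nddconf entry is what makes it survive a reboot.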
Bill Hassell, sysadmin
01-09-2018 10:24 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Hi Bill,
Thanks, I had never heard of that fix before, but it makes sense. I am implementing it on all systems.
Any ideas on what is causing the port to go offline?
Thanks,
Steve
01-10-2018 08:18 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
As mentioned, HP-UX will stop using all routes when all the routers fail to respond to ping. Technically, the system is not offline as it will respond to other systems that are on the same subnet. Systems that are on other subnets will require a router (gateway) to communicate and this will fail since routing has been disabled.
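A few quick checks to confirm that diagnosis the next time it happens (HP-UX commands; the gateway address is a placeholder for your own router IP):

```shell
# Is the default route still in the routing table?
netstat -rn
# Is dead gateway probing currently enabled? (1 = yes)
ndd -get /dev/ip ip_ire_gw_probe
# Does the router answer ICMP at all? (replace with your gateway IP)
ping 10.8.32.1
```

If the default route is gone but hosts on the local subnet still answer, that matches the dead-gateway-detect behavior described above.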
As for why this feature even exists, that has never been explained, to my knowledge.
Bill Hassell, sysadmin
01-10-2018 11:39 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Sorry, I wasn't clear.
Neither the VMs nor the host are pingable from another system on the same network. It is my belief that the network port is down.
01-10-2018 01:30 PM - edited 01-10-2018 01:46 PM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
So the first question is: can you connect to your console port? This is a separate network connection with a separate IP address, and it is almost impossible to troubleshoot the problem without it. Since it is an embedded microcomputer, it is unaffected by HP-UX problems such as dead gateway detect. It is often referred to as the iLO port. When you connect to the port (telnet), you can view hardware status and logs and also connect to the HP-UX console. From there you can verify the state of the network connection. The OA (Onboard Admin) can set up the console IP addresses for each blade. You can also connect to the console through the blade's KVM/iLO port (special dongle required). This is a serial connection.
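A sketch of that path, from memory (the OA address and bay number are placeholders for your own setup):

```shell
# Telnet to the Onboard Administrator of the enclosure.
telnet 10.0.0.50
# At the OA prompt, drop onto the blade's iLO (bay 1 here).
connect server 1
# At the iLO MP menu, "co" opens the HP-UX console.
co
```

From the iLO menus you can also read the system event logs, which may show the NIC or microcode failure before the OS loses the network.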
Bill Hassell, sysadmin
01-15-2018 09:03 AM
Re: BL890c i2 running 11.31 and HPVM 4.20 with 8 VMs, loses network connectivity and then crashes
Hi Bill,
I was able to access the blade's console and observed a repeating error message that the system could not reach its NIS server. This server is on the same network as the blade. I then tested ping to other hosts on the same network with the same result: no response.
Here are the relevant logs from the Virtual Connect:
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5016:Minor] Server state DEGRADED : Component partially operational, but capacity lost, Previous: Server state OK
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [ENC:enc0:2014:Minor] Enclosure state DEGRADED : Some Enet modules & servers not OK, Previous: Enclosure state OK
2018-01-03T07:03:21-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1024:Minor] Domain state DEGRADED : 1+ enclosures & profiles OK, DEGRADED, UNKNOWN, NOT-MAPPED, Previous: Domain state OK
2018-01-03T07:03:28-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1032:Warning] VCM remote session is invalid or has expired : hpvcd:showManagedObjects ([UNKNOWN]@[LOCAL])
2018-01-03T07:03:28-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1032:Warning] VCM remote session is invalid or has expired : hpvcm:retrieveStateChangeCounters ([UNKNOWN]@[LOCAL])
2018-01-03T07:04:24-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5012:Critical] Server state FAILED : Component is not operational due to an error, Previous: Server state DEGRADED
2018-01-03T07:04:24-05:00 VCEFTW20120152 vcmd: [PRO:molhpi24-BL890c-i2:6012:Critical] Profile state FAILED : Server [enc0:devbay1] state not OK: [VCM_OP_STATE_FAILED], Previous: Enet Network state OK
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [SVR:enc0:dev1:5010:Info] Server state OK : Component fully operational, Previous: Server state FAILED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [NET:molhpi24-BL890c-i2:7010:Info] Enet Network state OK : All connections, PhysicalServer OK, Previous: Profile state FAILED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [ENC:enc0:2010:Info] Enclosure state OK : All modules & servers OK, Previous: Enclosure state DEGRADED
2018-01-03T07:12:20-05:00 VCEFTW20120152 vcmd: [VCD:HPBC1_vc_domain:1020:Info] Domain state OK : All enclosures & profiles OK, Previous: Domain state DEGRADED
Thanks,
Steve