Expert Day: HPVM migration network & host SG heartbeat

Posted on 12-21-2011 06:18 AM


I am seeing heartbeat warnings from cmcld that coincide with HPVM online migrations.




Dec 11 12:15:08 node-t cmcld[11261]: Warning: cmcld was unable to run for the last 3.5 seconds.  Consult the Managing Serviceguard manual for guidance on setting MEMBER_TIMEOUT, and information on cmcld.

Dec 11 12:15:08 node-t cmcld[11261]: This node is at risk of being evicted from the running cluster.  Increase MEMBER_TIMEOUT.

Dec 11 12:15:08 node-t cmcld[11261]: Member node-a seems unhealthy, not receiving heartbeats from it.


node-t common.log:


12/11/11 12:14:45|SUMMARY|CLI|root|/opt/hpvm/bin/hpvmmigrate -S -P vm-j -h node-a


12/11/11 12:18:20|SUMMARY|vm-j|root|Guest 'vm-j' migrated successfully to VM host 'node-a'
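The correlation can be checked mechanically rather than by eye. The following is a minimal Python sketch, not an HP-supplied tool. It assumes the pipe-separated common.log layout is exactly what the two lines above show (timestamp, level, source, user, message), that a migration starts at the hpvmmigrate line and ends at the matching "migrated successfully" line, and that the syslog year (which syslog does not record) is supplied by the caller. All function names are hypothetical.

```python
from datetime import datetime

# Sample data copied from the excerpts in this post.
SYSLOG = """\
Dec 11 12:15:08 node-t cmcld[11261]: Warning: cmcld was unable to run for the last 3.5 seconds.  Consult the Managing Serviceguard manual for guidance on setting MEMBER_TIMEOUT, and information on cmcld.
Dec 11 12:15:08 node-t cmcld[11261]: This node is at risk of being evicted from the running cluster.  Increase MEMBER_TIMEOUT.
Dec 11 12:15:08 node-t cmcld[11261]: Member node-a seems unhealthy, not receiving heartbeats from it.
"""

COMMON_LOG = """\
12/11/11 12:14:45|SUMMARY|CLI|root|/opt/hpvm/bin/hpvmmigrate -S -P vm-j -h node-a
12/11/11 12:18:20|SUMMARY|vm-j|root|Guest 'vm-j' migrated successfully to VM host 'node-a'
"""


def migration_windows(common_log):
    """Return (guest, start, end) for each migration found in common.log text."""
    starts = {}
    windows = []
    for line in common_log.splitlines():
        fields = line.split("|", 4)
        if len(fields) < 5:
            continue  # blank or unexpected line
        when = datetime.strptime(fields[0], "%m/%d/%y %H:%M:%S")
        message = fields[4]
        if "hpvmmigrate" in message:
            args = message.split()
            # The guest name follows -P in the logged command line.
            if "-P" in args and args.index("-P") + 1 < len(args):
                starts[args[args.index("-P") + 1]] = when
        elif "migrated successfully" in message:
            guest = fields[2]
            if guest in starts:
                windows.append((guest, starts.pop(guest), when))
    return windows


def cmcld_events(syslog, year):
    """Return (timestamp, line) for each cmcld line; the year is an assumption."""
    events = []
    for line in syslog.splitlines():
        if "cmcld[" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Dec 11 12:15:08"
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, line))
    return events


def warnings_during_migrations(syslog, common_log, year):
    """List cmcld lines whose timestamp falls inside a migration window."""
    hits = []
    for guest, start, end in migration_windows(common_log):
        for when, line in cmcld_events(syslog, year):
            if start <= when <= end:
                hits.append((guest, when, line))
    return hits


if __name__ == "__main__":
    for guest, when, line in warnings_during_migrations(SYSLOG, COMMON_LOG, 2011):
        print(f"{when} during migration of {guest}: {line[:80]}")
```

Run against full syslog and common.log files, the same matching would show whether cmcld warnings also occur outside migration windows, which matters before concluding that the migration network is the cause.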


The hosts have multiple networks. The Serviceguard documentation says “HP recommends that you configure all subnets that connect cluster nodes as heartbeat networks”, and we have done this.


Would you recommend not running heartbeat on the one network used for online migration (that is, changing the migration network IP from a heartbeat IP to STATIONARY_IP), provided there are enough other networks for heartbeat redundancy?


Background: The customer runs HPVM on SX2000-based midrange server nPars for database and application workloads. Production is on HPVM 4.2, with VMs configured as Serviceguard A.11.19 packages on Virtual Disk SAN storage. An update to HPVM 4.3 and A.11.20 is planned. (They started with HPVM 2.0 in 2007.) Test/dev VMs are on an i2 blade running HPVM 4.3 with Virtual FileDisks on NFS.



> Certain optimization features are disabled automatically and the NIC port is put into promiscuous mode.

The above concerns two-port NICs on the SX2000-based midrange servers. We also have blades with Flex10 NICs. Do you know whether these two changes, which take effect when a vswitch is defined, only affect how the HP-UX network stack handles the interface, or whether they change how the entire NIC operates? What I’m getting at: suppose lan1 and lan2 are on the same two-port NIC or, on the blade, lan16 through lan23 are FlexNICs on the same two-port Flex10 mezzanine card. If a vswitch is configured on lan1 or lan16 and there is no vswitch on lan2 or lan17, is lan2 or lan17 affected because it is on the same card?


Answer> In that example, lan2 or lan17 should not be affected.


Question: The HPVM host is also configured with another pair of interfaces used only for host administration (admin logins, reaching the default router for the HPVM host, monitoring tools, an additional heartbeat network, network recovery archive creation). That pair currently has no vswitch configured on it. Since it is lightly used, we are considering defining a vswitch on it and moving some VM frontend traffic (app and client traffic) onto it from other interfaces. Would you recommend against this?


Answer> One of the uses you mention is archive creation. I would be worried about the impact that could have on the VM traffic.