
Expert Day: HPVM migration network & host SG heartbeat

 
Joe Ledesma
Frequent Advisor

Expert Day: HPVM migration network & host SG heartbeat

Hello,

We are seeing heartbeat-related warnings from cmcld that correlate with when HPVM online migrations are run.

 

syslog:

 

Dec 11 12:15:08 node-t cmcld[11261]: Warning: cmcld was unable to run for the last 3.5 seconds.  Consult the Managing Serviceguard manual for guidance on setting MEMBER_TIMEOUT, and information on cmcld.

Dec 11 12:15:08 node-t cmcld[11261]: This node is at risk of being evicted from the running cluster.  Increase MEMBER_TIMEOUT.

Dec 11 12:15:08 node-t cmcld[11261]: Member node-a seems unhealthy, not receiving heartbeats from it.

 

node-t /var/opt/hpvm/common/command.log:

 

12/11/11 12:14:45|SUMMARY|CLI|root|/opt/hpvm/bin/hpvmmigrate -S -P vm-j -h node-a

...

12/11/11 12:18:20|SUMMARY|vm-j|root|Guest 'vm-j' migrated successfully to VM host 'node-a'

 

The hosts have multiple networks. Per the SG documentation, “HP recommends that you configure all subnets that connect cluster nodes as heartbeat networks”, and we have done this.

 

Would you recommend not running heartbeat on the network that is used for online migration (i.e., changing the migration network IP to STATIONARY_IP in the cluster configuration), provided we have enough other networks for heartbeat redundancy?
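
For illustration, the change I have in mind would look roughly like this in the cluster ASCII file; the interface names and IP addresses below are placeholders, not our actual configuration:

# Excerpt from the cluster ASCII file (e.g. fetched with: cmgetconf -c <cluster> cluster.ascii)
MEMBER_TIMEOUT      14000000        # microseconds; A.11.19 default is 14 seconds

NODE_NAME node-t
  NETWORK_INTERFACE lan1            # network used for HPVM online migration
    STATIONARY_IP   10.1.1.10       # was HEARTBEAT_IP; still monitored, carries no heartbeat
  NETWORK_INTERFACE lan2            # remaining heartbeat network(s)
    HEARTBEAT_IP    10.1.2.10

The edited file would then go through cmcheckconf -C cluster.ascii and cmapplyconf -C cluster.ascii.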

 

Background: The customer runs HPVM for database and application workloads on nPars of an SX2000-based midrange server, in production on HPVM 4.2 with the VMs configured as Serviceguard A.11.19 packages on Virtual Disk SAN storage. We are planning an update to HPVM 4.3 and A.11.20. (We started with HPVM 2.0 in 2007.) Test/dev VMs run on an i2 blade with HPVM 4.3 on Virtual FileDisks on NFS.

 

Joe

 

Dave Olker
HPE Pro

Re: Expert Day: HPVM migration network & host SG heartbeat

Hi Joe,

 

Are you sure you're doing an online migration?  I would expect to see the "-o" option on the hpvmmigrate command line if this were online.

 

> Would you recommend not running heartbeat on the network that is used for online migration (i.e., changing the migration network IP to STATIONARY_IP in the cluster configuration), provided we have enough other networks for heartbeat redundancy?

 

Absolutely.  In fact, I recommend customers use a "dedicated" network interface for VM migration traffic.  When I say "dedicated" I mean a network interface that does not have a virtual switch configured on it.  When you configure a vswitch on an HPVM host, certain optimization features are disabled automatically and the NIC port is put into promiscuous mode.  Both of these events can dramatically affect VM Host network performance for the NIC, and since VM migrations technically involve VM Host network traffic, having a vswitch on the NIC can cause migrations to really suffer.

 

If you have enough NICs to satisfy the Serviceguard requirements, I'd suggest a truly dedicated NIC for VM migrations.
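
As a quick check (just a sketch; exact output varies by HPVM version), you can see which host NICs already carry a vswitch and pick one that does not:

# List all virtual switches and the physical NIC (PPA) each one is bound to
hpvmnet

# List every NIC on the host; any lanN that does not appear in the hpvmnet
# output is a candidate for a truly dedicated migration interface
lanscan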

 

> We are planning an update to HPVM 4.3 and A.11.20. (We started with HPVM 2.0 in 2007.) Test/dev VMs run on an i2 blade with HPVM 4.3 on Virtual FileDisks on NFS.

 

Are they planning on running with NFS-backed VM guests in production? If so, be careful to keep the NFS network traffic separate from the links used for SG heartbeats or VM migrations. :)

 

Regards,

 

Dave

 

I work for HPE

[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Joe Ledesma
Frequent Advisor

Re: Expert Day: HPVM migration network & host SG heartbeat

Hi Dave,

 

> Are you sure you're doing an online migration?  I would expect to see the "-o" option on the hpvmmigrate command line if this were online.

Yes, I was quoting from the /var/opt/hpvm/common/command.log as a record of what was done. An administrator ran hpvmsg_move(1M), which behind the scenes uses an apparently undocumented ‘-S’ option to hpvmmigrate:

 

> 12/11/11 12:14:45|SUMMARY|CLI|root|/opt/hpvm/bin/hpvmmigrate -S -P vm-j -h node-a

 

(This works around the fact that hpvmmigrate normally errors out if you attempt to use the -o option on an SG-managed guest.)

 

> When you configure a vswitch on an HPVM host…[it]…can dramatically affect VM Host network performance for the NIC

Thank you. Did not know this.

 

I thought the reason for having a dedicated network interface for online migrations was purely bandwidth: keeping the migration from affecting other types of traffic, and vice versa. So we have a sort-of-dedicated network for online migrations: an interface (an APA LAN_MONITOR interface with two NICs) connected to a backend physical switch infrastructure intended for uses such as online migration (it is also used for VMware vMotion, for example), data backups, and databases on NFS, so it is the appropriate place in our environment for HPVM online migration.

However, that interface does have a vswitch on it. It is the second vNIC in the VMs, and in the production VMs it carries only data backup traffic, which runs at certain times; at other times the vswitch and the host interface are idle, so I assumed the interface would also be fine for online migration. But now I understand that just having a vswitch on the interface could be impacting it, regardless of whether there is VM traffic at the time.

 

Regarding:

> certain optimization features are disabled automatically and the NIC port is put into promiscuous mode.

The above was in the context of the two-port NICs on the SX2000-based midrange servers. We also have blades with Flex10 NICs. Do you know whether these two changes, when a vswitch is defined, affect only how the HP-UX network stack handles that interface, or whether they change how the entire NIC card operates? What I'm getting at: if lan1 and lan2 are on the same two-port NIC, or on the blade if lan16 through lan23 are FlexNICs on the same two-port Flex10 mezzanine card, and a vswitch is configured on lan1 or lan16 but not on lan2 or lan17, does lan2 or lan17 get affected because it is on the same card?
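
In case it helps frame the question, this is roughly how I am mapping interfaces to cards while looking at this (the interface numbers above and below are just examples):

# Hardware path, card instance, and PPA for each interface; ports on the same
# two-port NIC or the same Flex10 mezzanine show up under related hardware paths
lanscan

# Which PPA backs each vswitch, i.e. which port of a card carries a vswitch
hpvmnet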

 

Regarding:

> Are they planning on running with NFS-backed VM guests in production? If so, be careful to keep the NFS network traffic separate from the links used for SG heartbeats or VM migrations. :)

 

Yes, we are considering NFS-backed VM guests in production, but only for VMs that already have all their application and database storage on NFS via the NFS client inside the guest, so the VM would only need a single NFS-backed FileDisk for its boot disk.
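
For reference, the boot FileDisk for such a guest would be attached along these lines (server name, mount point, file path, and guest name are placeholders; worth checking against hpvmmodify(1M) for the exact resource syntax):

# NFS filesystem mounted on the VM host to hold the FileDisks
mount -F nfs nfssrv:/export/hpvm /hpvm/filedisks

# Attach the NFS-backed file as the guest's boot disk
hpvmmodify -P vmtest -a disk:avio_stor::file:/hpvm/filedisks/vmtest/boot.disk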

I had anticipated wanting to separate VM migration traffic from the host's NFS traffic for NFS-backed HPVM. However, now that I know about the effects of having a vswitch on a NIC, it sounds like the host NFS traffic for HPVM should also be on an interface without a vswitch, if possible.

 

In other words, the goal is not just to isolate online migration or heartbeat traffic, but to avoid mixing HPVM host traffic of any kind with VM traffic on the same interface.

 

Question: We also have the HPVM host configured with another pair of interfaces that is used only for HPVM host administration (admin logins, the default route for the HPVM host, monitoring tools, an additional heartbeat network, and network recovery archive creation); it currently has no vswitch configured on it. Since this interface is lightly used, we are considering moving some VM frontend traffic (app and client traffic) onto it from other interfaces, which would mean defining a vswitch on it. Would you recommend against this?

 

We already have this configuration in test/dev: the VM applications are on different VLANs from the HPVM host admin interface, so the interface uses VLANs. The HPVM host admin network is the default, untagged VLAN, and the other VLANs are tagged. So lan0 carries the HPVM host IP untagged, a vswitch is defined on lan0, and the vNICs are assigned to VLANs within the vswitch.
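
Roughly, the test/dev setup is built like this (the vswitch name, port number, VLAN ID, and guest name are placeholders, and the syntax is from memory of hpvmnet(1M)/hpvmmodify(1M), so worth double-checking):

# vswitch backed by lan0; the untagged host admin IP stays on lan0 itself
hpvmnet -c -S vsw0 -n 0
hpvmnet -b -S vsw0

# Tag vswitch port 1 with the VM application VLAN
hpvmnet -S vsw0 -u portid:1:vlanid:100

# Attach a guest vNIC to that tagged port
hpvmmodify -P vmtest -a network:avio_lan::vswitch:vsw0:portid:1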

 

Joe

iyer
Occasional Visitor
Solution

Re: Expert Day: HPVM migration network & host SG heartbeat

Hi Joe

 

Regarding your questions, here are my thoughts:

 

> certain optimization features are disabled automatically and the NIC port is put into promiscuous mode.

> Do you know whether these two changes, when a vswitch is defined, affect only how the HP-UX network stack handles that interface, or whether they change how the entire NIC card operates? If lan1 and lan2 are on the same two-port NIC, or if lan16 through lan23 are FlexNICs on the same two-port Flex10 mezzanine card, and a vswitch is configured on lan1 or lan16 but not on lan2 or lan17, does lan2 or lan17 get affected because it is on the same card?

 

Answer> In your example, lan2 or lan17 should not be affected; the changes apply to the individual interface that backs the vswitch, not to the other ports on the same card.

 

> Question: We also have the HPVM host configured with another pair of interfaces that is used only for HPVM host administration (admin logins, the default route for the HPVM host, monitoring tools, an additional heartbeat network, and network recovery archive creation); it currently has no vswitch configured on it. Since this interface is lightly used, we are considering moving some VM frontend traffic (app and client traffic) onto it from other interfaces, which would mean defining a vswitch on it. Would you recommend against this?

 

Answer> One of the uses you mention is network recovery archive creation. I would be worried about the potential impact that could have on the VM traffic.

 

Shankar