Friday, June 14, 2013

Multi-NIC-vMotion (not so) deep dive

This morning when I opened my mailbox I found a message about our Multi-NIC-vMotion setup failing. Despite what KB2007467 says, there is another way of doing Multi-NIC-vMotion, which I will go into in this article. But what I find most interesting is the methodology vSphere applies when choosing the vMotion interface to migrate a VM.

Multi-NIC-vMotion on a single vSwitch

The above-referenced KB article describes a Multi-NIC-vMotion setup on a single vSwitch with multiple uplinks. When you create a VMkernel port for vMotion, you need to override the default NIC teaming order of the vSwitch, as vMotion VMkernel ports can only utilize one physical switch uplink at a time (the same applies to VMkernel ports used for software iSCSI). Thus, for a vSwitch with two uplinks you need to create two VMkernel ports with vMotion activated, where each VMkernel port uses one of the uplinks as active and the other is set to unused (not standby).
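
Roughly, that single-vSwitch layout looks like the PowerCLI sketch below. Host, vSwitch, port group names, vmnics and IPs are placeholders, not our actual values, and this is only meant to illustrate the Get-/Set-NicTeamingPolicy pattern; double-check it against your PowerCLI version before using it.

# Sketch of the KB2007467-style setup: two vMotion VMkernel ports on one
# standard vSwitch, each pinned to a different uplink (the other uplink unused).
$vmhost  = Get-VMHost -Name "esx01.example.local"
$vswitch = Get-VirtualSwitch -VMHost $vmhost -Name "vSwitch1"
$vmnic2  = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name vmnic2
$vmnic3  = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name vmnic3

# First vMotion VMkernel port: active on vmnic2, vmnic3 unused (not standby)
New-VirtualPortGroup -VirtualSwitch $vswitch -Name "vMotion-A"
New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vswitch -PortGroup "vMotion-A" `
    -IP "192.168.10.11" -SubnetMask "255.255.255.0" -VMotionEnabled:$true
Get-VirtualPortGroup -VMHost $vmhost -Name "vMotion-A" | Get-NicTeamingPolicy |
    Set-NicTeamingPolicy -MakeNicActive $vmnic2 -MakeNicUnused $vmnic3

# Second vMotion VMkernel port: active on vmnic3, vmnic2 unused
New-VirtualPortGroup -VirtualSwitch $vswitch -Name "vMotion-B"
New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vswitch -PortGroup "vMotion-B" `
    -IP "192.168.10.12" -SubnetMask "255.255.255.0" -VMotionEnabled:$true
Get-VirtualPortGroup -VMHost $vmhost -Name "vMotion-B" | Get-NicTeamingPolicy |
    Set-NicTeamingPolicy -MakeNicActive $vmnic3 -MakeNicUnused $vmnic2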

Alternative Multi-NIC-vMotion setup using multiple switches

In our environment we use multiple virtual switches to separate traffic. There is a vSwitch with two uplinks for customer traffic to the VMs, there are three vSwitches for admin, NAS and backup access to the VMs, and there is a dedicated vSwitch for vMotion traffic. The dedicated switch sports two uplinks and has been configured with two VMkernel interfaces as described in KB2007467. It has recently been migrated to a dvSwitch without any problems.

The other virtual switches, with the exception of the customer vSwitch, are heavily underutilized. Thus it seemed only logical to create VMkernel ports for vMotion on each of those switches. The ESX hosts have 1 TB of RAM each, but unfortunately are equipped with 1 Gbit/s NICs only, so putting one of those monster hosts into maintenance mode is a lengthy process.
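
Adding a vMotion-capable VMkernel port to one of those underutilized switches is quick. A minimal sketch, again with made-up switch, port group and host names (the actual addressing scheme follows below):

# Sketch: one extra vMotion-enabled VMkernel port on an existing,
# underutilized vSwitch. Names and the IP are placeholders.
$vmhost  = Get-VMHost -Name "esx01.example.local"
$vswitch = Get-VirtualSwitch -VMHost $vmhost -Name "vSwitch_Backup"

New-VirtualPortGroup -VirtualSwitch $vswitch -Name "vMotion-Backup"
New-VMHostNetworkAdapter -VMHost $vmhost -VirtualSwitch $vswitch -PortGroup "vMotion-Backup" `
    -IP "172.16.3.51" -SubnetMask "255.255.255.0" -VMotionEnabled:$true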

On the physical switch side we are using a dedicated VLAN for vMotion (and PXE boot for that matter).

Choosing the right interface for the job

After a lot of tinkering and testing, we came up with the following networking scheme to facilitate vMotion properly. Our vMotion IPs are spread over multiple network ranges as follows:

ESX01 - Last octet of management IP: 51

vmk1 - 172.16.1.51/24
vmk2 - 172.16.2.51/24
vmk3 - 172.16.3.51/24

ESX02 - Last octet of management IP: 52

vmk1 - 172.16.1.52/24
vmk2 - 172.16.2.52/24
vmk3 - 172.16.3.52/24

and so on. The network segments are not routed, as per VMware's suggestion. Thus the individual VMkernel ports of a single host cannot communicate with each other; there is no "crossing over" into other network segments.
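
Since the last octet always follows the management address, the expected vMotion IPs of a host can be derived in a couple of lines. A small sketch, assuming the 192.168.100.x management prefix from the PowerCLI script at the end of this post; the function itself is made up for illustration:

# Sketch: derive the expected vMotion IPs of a host from the last octet of its
# management IP. The 172.16.x.y ranges match our scheme described above.
function Get-ExpectedVMotionIP {
    param(
        [string]$ManagementIP,   # only the last octet matters, e.g. "192.168.100.51"
        [int]$NetworkCount = 3   # vmk1..vmk3 -> 172.16.1.x .. 172.16.3.x
    )
    $lastOctet = ($ManagementIP -split '\.')[-1]
    1..$NetworkCount | ForEach-Object { "172.16.{0}.{1}" -f $_, $lastOctet }
}

Get-ExpectedVMotionIP -ManagementIP "192.168.100.51"   # 172.16.1.51, 172.16.2.51, 172.16.3.51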

When we vMotion a VM from host ESX01 to ESX02, the source host will, based on its routing table, initiate network connections from ESX01.vmk1 to ESX02.vmk1, from ESX01.vmk2 to ESX02.vmk2, and so on. Only if vMotion is, for some reason, not enabled on one of the VMkernel ports will a host try to connect to a different port, and only then will the vMotion fail.
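
So before blaming the network, it pays to check that flag everywhere. A read-only PowerCLI audit along these lines (essentially the check we later turned into the fix at the end of this post) makes a disabled port stand out immediately:

# Sketch: list every VMkernel port with its IP and vMotion flag.
Get-VMHost | Get-VMHostNetworkAdapter -VMKernel |
    Select-Object VMHost, Name, IP, VMotionEnabled |
    Sort-Object VMHost, Name |
    Format-Table -AutoSize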

The reason for splitting the network segments into class C-sized (/24) ranges is simple: the physical layer is split into separate islands which do not interconnect. For this specific network segment a /22 netmask would do fine, and all vMotion VMkernel ports could happily talk to each other. However, since the "frontend" and "backend" edge switches are not connected, this cannot be done.

When we check the logs (/var/log/vmkernel.log), we can see, however, that all vMotion ports are being used:

2013-06-14T05:30:23.991Z cpu7:8854653)Migrate: vm 8854654: 3234: Setting VMOTION info: Source ts = 1371189392257031, src ip = <172.16.2.114> dest ip = <172.16.2.115> Dest wid = 8543444 using SHARED swap
2013-06-14T05:30:23.994Z cpu17:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.2.114'
2013-06-14T05:30:23.995Z cpu7:8854653)Tcpip_Vmk: 1059: Affinitizing 172.16.2.114 to world 8854886, Success
2013-06-14T05:30:23.995Z cpu7:8854653)VMotion: 2425: 1371189392257031 S: Set ip address '172.16.2.114' worldlet affinity to send World ID 8854886
2013-06-14T05:30:23.996Z cpu13:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.2.114'
2013-06-14T05:30:23.996Z cpu14:8910)MigrateNet: vm 8910: 1998: Accepted connection from <172.16.2.115>
2013-06-14T05:30:23.996Z cpu14:8910)MigrateNet: vm 8910: 2068: dataSocket 0x410045b36c50 receive buffer size is 563272
2013-06-14T05:30:23.996Z cpu13:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 1 added.
2013-06-14T05:30:23.996Z cpu13:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.3.114'
2013-06-14T05:30:23.996Z cpu14:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 2 added.
2013-06-14T05:30:23.996Z cpu14:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.4.114'
2013-06-14T05:30:23.996Z cpu14:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 3 added.
2013-06-14T05:30:23.996Z cpu14:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.1.114'
2013-06-14T05:30:23.997Z cpu12:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 4 added.
2013-06-14T05:30:23.997Z cpu12:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.0.114'
2013-06-14T05:30:23.997Z cpu12:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 5 added.
2013-06-14T05:30:38.505Z cpu19:8854654)VMotion: 3878: 1371189392257031 S: Stopping pre-copy: only 43720 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~571.132 MB/s, 13975% t2d)


This makes for some impressive bandwidth: ~571 MB/s on measly 1 Gbit/s NICs, which works out to roughly 4.6 Gbit/s, close to the combined line rate of the five stream connections seen in the log. The reason our platform was having issues was that vMotion had been disabled on a few VMkernel ports. Thankfully, a few lines of PowerCLI solved that problem:


$IPmask = "192.168.100."
$vmks = @();

for ($i=107;$i -le 118; $i++) {
    for ($j=1; $j -le 6; $j++) {
        if ($j -eq 5) { continue; }
        $vmk = Get-VMHost -Name $IPmask$i | Get-VMHostNetworkAdapter -Name vmk$j
        if ($vmk.VMotionEnabled -eq $false) { $vmks += $vmk }
    }
}

foreach ($vmk in $vmks) {
   $vmk | Set-VMHostNetworkAdapter -VMotionEnabled $true
}
