Friday, June 14, 2013

Multi-NIC-vMotion (not so) deep dive

This morning when I opened my mailbox, I found a message about our Multi-NIC-vMotion setup failing. Despite what KB2007467 says, there is another way of doing Multi-NIC-vMotion, which I will go into in this article. But what I find most interesting is the method vSphere applies when choosing the vMotion interface to migrate a VM.

Multi-NIC-vMotion on a single vSwitch

The above-referenced KB article describes a Multi-NIC-vMotion setup on a single vSwitch with multiple uplinks. When you create a VMkernel port for vMotion, you need to override the default NIC teaming order of the vSwitch, as vMotion VMkernel ports can only utilize one physical switch uplink at a time (the same applies to VMkernel ports used for software iSCSI). Thus, for a vSwitch with two uplinks you need to create two VMkernel ports with vMotion activated, where each VMkernel port uses one of the uplinks as active while the other is unused (not standby).
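For illustration, this is roughly how such a setup can be created from the ESXi 5.x shell. Consider it a sketch only: the vSwitch, port group and uplink names as well as the addresses are hypothetical, and my recollection of the failover semantics is that an uplink listed as neither active nor standby ends up unused.

# create two port groups with one VMkernel port each on vSwitch1
esxcli network vswitch standard portgroup add -v vSwitch1 -p vMotion-1
esxcli network vswitch standard portgroup add -v vSwitch1 -p vMotion-2
esxcli network ip interface add -i vmk1 -p vMotion-1
esxcli network ip interface add -i vmk2 -p vMotion-2
esxcli network ip interface ipv4 set -i vmk1 -I 172.16.1.51 -N 255.255.255.0 -t static
esxcli network ip interface ipv4 set -i vmk2 -I 172.16.2.51 -N 255.255.255.0 -t static

# override the teaming order per port group: one active uplink each,
# the respective other uplink appears in neither list and is thus unused
esxcli network vswitch standard portgroup policy failover set -p vMotion-1 -a vmnic2
esxcli network vswitch standard portgroup policy failover set -p vMotion-2 -a vmnic3

# enable vMotion on both VMkernel ports (esxcli vMotion tagging only arrived in 5.5)
vim-cmd hostsvc/vmotion/vnic_set vmk1
vim-cmd hostsvc/vmotion/vnic_set vmk2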

Alternative Multi-NIC-vMotion setup using multiple switches

In our environment we use multiple virtual switches to separate traffic. There is a vSwitch with two uplinks for customer traffic to the VMs, there are three vSwitches for admin, NAS and backup access to the VMs, and there is a dedicated vSwitch for vMotion traffic. The dedicated switch sports two uplinks and has been configured with two VMkernel interfaces as described in KB2007467. It has recently been migrated to a dvSwitch without any problems.

The other virtual switches, with the exception of the customer vSwitch, are heavily underutilized. Thus it seemed only logical to create VMkernel ports on each of those switches for vMotion use. The ESX hosts have 1 TB of RAM each but unfortunately are equipped with 1 Gbit/s NICs only, so putting one of those monster hosts into maintenance mode is a lengthy process.

On the physical switch side we are using a dedicated VLAN for vMotion (and PXE boot for that matter).

Choosing the right interface for the job

After a lot of tinkering and testing, we came up with the following networking scheme to facilitate vMotion properly. Our vMotion IPs are spread over multiple network ranges as follows:

ESX01 - Last octet of management IP: 51

vmk1 - 172.16.1.51/24
vmk2 - 172.16.2.51/24
vmk3 - 172.16.3.51/24

ESX02 - Last octet of management IP: 52

vmk1 - 172.16.1.52/24
vmk2 - 172.16.2.52/24
vmk3 - 172.16.3.52/24

and so on. The network segments are not routed, as per VMware's recommendation. Thus the individual VMkernel ports of a single host cannot communicate with each other; there will be no "crossing over" into other network segments.
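A quick way to verify this pairing from the ESXi shell is vmkping, which (from 5.1 on) can bind to a specific VMkernel interface with -I. On ESX01, each vmk should reach its counterpart on ESX02 in the same segment, while any cross-segment ping is expected to fail:

# from ESX01: ping ESX02's corresponding vMotion interfaces
vmkping -I vmk1 172.16.1.52
vmkping -I vmk2 172.16.2.52
vmkping -I vmk3 172.16.3.52
# this one must fail, since the segments are not routed
vmkping -I vmk1 172.16.2.52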

When we vMotion a VM from ESX01 to ESX02, the source host will, based on its routing table, initiate network connections from ESX01.vmk1 to ESX02.vmk1, from ESX01.vmk2 to ESX02.vmk2, and so on. Only if vMotion is not enabled on one of the VMkernel ports will a host try to connect to a different port, and only then will the vMotion fail.

The reason for splitting the network into separate /24 (class C sized) segments is simple: the physical layer is split into separate islands which do not interconnect. Address-wise, a /22 netmask would do fine for this network segment and all vMotion VMkernel ports could happily talk to each other. However, since the "frontend" and "backend" edge switches are not connected, this cannot work.

When we check the logs (/var/log/vmkernel.log), we can see, however, that all vMotion ports are being used:

2013-06-14T05:30:23.991Z cpu7:8854653)Migrate: vm 8854654: 3234: Setting VMOTION info: Source ts = 1371189392257031, src ip = <172.16.2.114> dest ip = <172.16.2.115> Dest wid = 8543444 using SHARED swap
2013-06-14T05:30:23.994Z cpu17:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.2.114'
2013-06-14T05:30:23.995Z cpu7:8854653)Tcpip_Vmk: 1059: Affinitizing 172.16.2.114 to world 8854886, Success
2013-06-14T05:30:23.995Z cpu7:8854653)VMotion: 2425: 1371189392257031 S: Set ip address '172.16.2.114' worldlet affinity to send World ID 8854886
2013-06-14T05:30:23.996Z cpu13:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.2.114'
2013-06-14T05:30:23.996Z cpu14:8910)MigrateNet: vm 8910: 1998: Accepted connection from <172.16.2.115>
2013-06-14T05:30:23.996Z cpu14:8910)MigrateNet: vm 8910: 2068: dataSocket 0x410045b36c50 receive buffer size is 563272
2013-06-14T05:30:23.996Z cpu13:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 1 added.
2013-06-14T05:30:23.996Z cpu13:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.3.114'
2013-06-14T05:30:23.996Z cpu14:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 2 added.
2013-06-14T05:30:23.996Z cpu14:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.4.114'
2013-06-14T05:30:23.996Z cpu14:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 3 added.
2013-06-14T05:30:23.996Z cpu14:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.1.114'
2013-06-14T05:30:23.997Z cpu12:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 4 added.
2013-06-14T05:30:23.997Z cpu12:8854886)MigrateNet: 1174: 1371189392257031 S: Successfully bound connection to vmknic '172.16.0.114'
2013-06-14T05:30:23.997Z cpu12:8854886)VMotionUtil: 3087: 1371189392257031 S: Stream connection 5 added.
2013-06-14T05:30:38.505Z cpu19:8854654)VMotion: 3878: 1371189392257031 S: Stopping pre-copy: only 43720 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~571.132 MB/s, 13975% t2d)


This makes for some impressive bandwidth, ~571 MB/s on measly 1 Gbit/s NICs; with the five streams seen in the log the theoretical ceiling is about 5 x 125 MB/s = 625 MB/s, so this is close to line rate. The reason our platform was having issues was a handful of ports on which vMotion was disabled. Thankfully, a few lines of PowerCLI code solved that problem:


# our hosts are named by their management IPs, 192.168.100.107 through .118;
# vmk5 carries no vMotion traffic in our setup and is therefore skipped
$IPmask = "192.168.100."
$vmks = @()

for ($i = 107; $i -le 118; $i++) {
    for ($j = 1; $j -le 6; $j++) {
        if ($j -eq 5) { continue }
        # collect every VMkernel port that has vMotion disabled
        $vmk = Get-VMHost -Name $IPmask$i | Get-VMHostNetworkAdapter -Name vmk$j
        if ($vmk.VMotionEnabled -eq $false) { $vmks += $vmk }
    }
}

# enable vMotion on all offending ports
foreach ($vmk in $vmks) {
    $vmk | Set-VMHostNetworkAdapter -VMotionEnabled $true
}

Monday, June 10, 2013

One more thing I don't like about Debian Wheezy

I'm running a Wheezy iSCSI target that has already caused some headaches. Today I wanted to add two VMDKs to the Wheezy VM to provide more storage to my test cluster. In the past that was easy: just add your disks, log in to the Linux box and issue

echo "scsi add-single-device a b c d" > /proc/scsi/scsi

(Usage: http://www.tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-4.html)
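The four numbers denote the SCSI host adapter, the channel (bus), the target ID and the LUN. So, to pick up a VMDK that was added as, say, SCSI 0:1 on the virtual controller (the host number is hypothetical, check dmesg for your adapter), the call would be:

# host 0, channel 0, target 1, lun 0
echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi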

However, with Wheezy there is no /proc/scsi/scsi. The reason: it has been disabled in the kernel config.

root@debian:/proc# grep SCSI_PROC /boot/config-3.2.0-4-amd64 
# CONFIG_SCSI_PROC_FS is not set

Wtf?! (pardon my French!)

The solution, however, is quite simple, and annoying in its own right. All you need to do is install the scsitools package. Thankfully, the list of dependencies on a relatively fresh Debian installation (relatively, because I had installed VMware Tools, so the kernel headers, gcc, make and perl were already present, plus iscsitarget incl. modules) is quite short...

fontconfig-config{a}
libdrm-intel1{a}
libdrm-nouveau1a{a}
libdrm-radeon1{a}
libdrm2{a}
libffi5{a}
libfontconfig1{a}
libfontenc1{a}
libgl1-mesa-dri{a}
libgl1-mesa-glx{a}
libglapi-mesa{a}
libice6{a}
libpciaccess0{a}
libsgutils2-2{a}
libsm6{a}
libutempter0{a}
libx11-xcb1{a}
libxaw7{a}
libxcb-glx0{a}
libxcb-shape0{a}
libxcomposite1{a}
libxdamage1{a}
libxfixes3{a}
libxft2{a}
libxi6{a}
libxinerama1{a}
libxmu6{a}
libxpm4{a}
libxrandr2{a}
libxrender1{a}
libxt6{a}
libxtst6{a}
libxv1{a}
libxxf86dga1{a}
libxxf86vm1{a}
scsitools
sg3-utils{a}
tcl8.4{a}
tk8.4{a}
ttf-dejavu-core{a}
x11-common{a}
x11-utils{a}
xbitmaps{a}
xterm{a}
 
That's all it takes for you to run "rescan-scsi-bus", which will discover your disks. That was easy, wasn't it?
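In hindsight, the kernel can also be told to rescan without installing anything at all, via sysfs; that would have spared me the dependency list above. A sketch, assuming the virtual HBA shows up as host0 (check ls /sys/class/scsi_host/):

# the three dashes wildcard channel, target and LUN: rescan everything on host0
echo "- - -" > /sys/class/scsi_host/host0/scan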

Friday, June 7, 2013

Access denied. Your IP address [A.B.C.D] is blacklisted. - OpenVPN to the rescue!

Ok, so some of your ISP's fellow customers got their boxes infected and are now part of a botnet (in this specific case the trojan is apparently named "Pushdo"; "Pushdo is usually associated with the Cutwail spam trojan, as part of a Zeus or Spyeye botnet", src: http://cbl.abuseat.org). "Doesn't bother me," you may think. "I've got all my gear secured," you may think.

Well, that's where you're wrong.

It does bother you!

During my morning round of blogs, I realized I couldn't access http://longwhiteclouds.com/ anymore. Instead, I was greeted with this friendly message:
 
Access denied. Your IP address [A.B.C.D] is blacklisted. If you feel this is in error please contact your hosting providers abuse department.

This is just one effect. I have been having a seriously choppy internet experience for the past two or three days, which I'd like to throw into the pot of symptoms as well.

A bit of research quickly revealed what was going on. As a part-time mail server admin for my company, I know that we use spamhaus.org (among other services and mechanisms) for spam checking. A check in the Blocklist Removal Center provided information about the source of and reason for the blockage: just enter the IP in question and click Lookup. I found myself both in the Policy Block List (PBL) and the Composite Blocking List (CBL), and possibly elsewhere, too.
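Incidentally, these blocklists can also be queried straight over DNS, no web form needed: reverse the octets of your IP and prepend them to the list's zone; any answer in 127.0.0.x means "listed". A sketch for the hypothetical address 192.0.2.13 against Spamhaus ZEN (which combines the SBL, XBL/CBL and PBL):

# a reply such as 127.0.0.10 indicates a PBL listing, NXDOMAIN means not listed
host 13.2.0.192.zen.spamhaus.org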

Suggestions

Well, firstly, let's be sociable and inform our ISP. They may already know and be working on the case, or not.

But that doesn't help me right now! I wanna read blogs now!

OpenVPN to the rescue

Luckily I have access to a corporate OpenVPN-based network. Unlike other solutions, this network does not per se route all traffic through the tunnel but just provides access to the corporate network. In this case, however, that is exactly what I want to do.

If all I am worried about is longwhiteclouds.com, I can just set a static route via the tun interface's IP like so:

user@box> ip r | grep tun0
192.168.1.0/24 via 172.16.5.17 dev tun0
192.168.5.0/24 via 172.16.5.17 dev tun0
172.16.5.17 dev tun0  proto kernel  scope link  src 172.16.5.18
192.168.7.0/24 via 172.16.5.17 dev tun0 

user@box> ifconfig tun0 | grep inet
          inet addr:172.16.5.18  P-t-P:172.16.5.17  Mask:255.255.255.255

user@box> sudo route add -host longwhiteclouds.com gw 172.16.5.18

But how do you route everything through the tunnel? First you need a static host route to the VPN endpoint via your current default gateway; otherwise, once the default route points at the tunnel, the tunnel's own packets would be routed into the tunnel as well. With that out of the way, you can point your default route at your end of the tunnel.

user@box> ip r | grep default
default via 192.168.1.1 dev eth0
user@box> grep remote /etc/openvpn/corporate_vpn.conf
#remote vpn.example.com 1194
remote 1.2.3.4 1194
tls-remote vpn

user@box> sudo route add -host 1.2.3.4 gw 192.168.1.1 
user@box> sudo route del default
user@box> sudo route add default gw 172.16.5.18
user@box> ip r
default via 172.16.5.18 dev tun0  scope link
[...]

1.2.3.4 via 192.168.1.1 dev eth0

Now everything is swell again in network land; your requests are happily traversing the VPN tunnel.

user@box> tracepath longwhiteclouds.com
1:  172.16.5.18                                          0.349ms pmtu 1350
1:  172.16.5.1                                         312.647ms
1:  172.16.5.1                                         314.739ms
[...] until they finally reach their destination


Hope that helps someone at some point...

Btw.: Excuse the formatting, I'm not too happy with blogger these days.

Monday, June 3, 2013

iscsitarget-dkms broken in Debian Wheezy

Now that was disappointing. An aging iSCSI bug has resurfaced in Debian's latest and greatest stable release, Wheezy (or, in numbers, 7), rendering Debian's iSCSI target package useless. Upon scanning a Debian target with an initiator, e.g. ESXi's software iSCSI adapter, the following messages pop up:

Jun  3 04:30:44 debian kernel: [  242.785518] Pid: 3006, comm: istiod1 Tainted: G           O 3.2.0-4-amd64 #1 Debian 3.2.41-2+deb7u2
Jun  3 04:30:44 debian kernel: [  242.785521] Call Trace:
Jun  3 04:30:44 debian kernel: [  242.785537]  [<ffffffffa03103f1>] ? send_data_rsp+0x45/0x1f4 [iscsi_trgt]
Jun  3 04:30:44 debian kernel: [  242.785542]  [<ffffffffa03190d3>] ? ua_pending+0x19/0xa5 [iscsi_trgt]
Jun  3 04:30:44 debian kernel: [  242.785550]  [<ffffffffa0317da8>] ? disk_execute_cmnd+0x1cf/0x22d [iscsi_trgt]
[...]


With ESXi in particular, the LUN will eventually show up (after a bunch of timeouts, I suppose), but it is not usable in any way and may disconnect at any time.

Solution:

Thankfully, there is a solution to the dilemma. Some Googling around led me to this, again rather old, thread in an Ubuntu forum describing the very same issue. Combined with the knowledge of the aforementioned bug, I followed the instructions, grabbed the latest set of iscsitarget-dkms sources, compiled them and, what do you know, it works like a charm.
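For reference, the build itself boils down to a few commands. This is a sketch only: the release number 1.4.20.3 and the download path are my guesses, check the project's SourceForge page for whatever is actually current.

# build prerequisites; the headers must match the running kernel
apt-get install build-essential libssl-dev linux-headers-$(uname -r)
# fetch, unpack and build the upstream sources
wget http://downloads.sourceforge.net/project/iscsitarget/iscsitarget/1.4.20.3/iscsitarget-1.4.20.3.tar.gz
tar xzf iscsitarget-1.4.20.3.tar.gz
cd iscsitarget-1.4.20.3
make && make install
# restart the target so it loads the freshly built kernel module
/etc/init.d/iscsitarget restart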