Posted by Ken Cline on Friday, March 06, 2009

I've noticed something about engineers. They're never happy with the way something is configured out of the box - there's always a better way! Well, I have a different philosophy:

"If you don't have a very good reason to change a default value, don't change it!"
To me, this seems totally obvious - in most cases, the default values are there for a reason. They're set to their particular values based on testing, customer feedback, or empirical knowledge - rarely because someone just picked a number. As an example, let's explore the configuration of the virtual switch (vSwitch) used to support the Service Console in a VMware ESX Server. Assume the following:
  • vSwitch0 is configured with two port groups - one for the Service Console, the other for VMotion
  • There are two physical NICs (pNICS) (vmnic0 & vmnic1) associated with vSwitch0
  • We want Service Console and VMotion traffic to pass over different pNICs
  • In the event of the failure of one path (pNIC, cable, physical switch (pSwitch)), we want the traffic to automatically failover to the second path
  • There are two separate physical switches available to support our environment
  • [Optional, but recommended] You're aware that VLANs are not intended to provide security and that data will be commingled on the wire when you use them. Accepting that risk, you choose to use VLANs for traffic segmentation, putting your Service Console port group on "VLAN SC" and your VMotion port group on "VLAN VMotion" - obviously, you'll need to assign the appropriate VLAN number to each port group, as well as configure the appropriate ports on the pSwitch to support 802.1Q trunking.
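
To make the target layout concrete, here's a minimal sketch of the intended configuration as plain Python data. The VLAN IDs (and the variable names) are placeholders I've invented for illustration - substitute whatever your network team assigns:

    # Sketch of the intended vSwitch0 layout. VLAN IDs are illustrative
    # placeholders, not recommendations.
    vswitch0 = {
        "uplinks": ["vmnic0", "vmnic1"],
        "port_groups": {
            "Service Console": {"vlan": 100},  # "VLAN SC"
            "VMotion": {"vlan": 200},          # "VLAN VMotion"
        },
    }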

Scenario 1

Many people will want to do the following:

  • Configure vSwitch0 to use IP Hash as the load balancing algorithm
  • Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 - Active
    • vmnic1 - Standby
    • Load Balancing - vSwitch Port Based
  • Configure the VMotion port group as follows:
    • vmnic0 - Standby
    • vmnic1 - Active
    • Load Balancing - vSwitch Port Based
Now, let's look at what's been achieved...


Figure 1

As shown in Figure 1, all paths are redundant - but there are some issues with this configuration! First, when using IP Hash as the load balancing algorithm on the vSwitch, the recommendation is that you not override it with a different load balancing approach on your port groups:

"All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP). All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch."

Next, remember that we have NO control over inbound network traffic. That means that the pSwitch could easily send inbound packets to us on our Standby pNIC. While I'm sure the vSwitch will figure this out and deliver the packets to us, it is counter-intuitive and (in my opinion) a bad practice.

To continue with the list of problems with this configuration, consider that by specifying Port Based load balancing at the Port Group, you're defeating the purpose of using IP Hash on the vSwitch! The reason for using IP Hash is to enable link aggregation and the potential for a single vNIC to pass more than a single pNIC's worth of bandwidth. By explicitly specifying Active and Standby pNICs, you've eliminated that possibility.
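
To see why, consider how IP Hash chooses an uplink: each packet's source and destination IP addresses are hashed across the active adapters. Here's a minimal Python sketch of the idea - the toy hash below (XOR of the addresses, modulo the uplink count), the function name, and the IPs are all my own inventions for illustration, not the exact function ESX uses:

    # Toy model of IP-hash uplink selection (illustrative only).
    # With all uplinks Active, any given src/dst IP pair can land on
    # either pNIC - forcing Active/Standby at the port group throws
    # that flexibility away.
    import ipaddress

    def ip_hash_uplink(src_ip, dst_ip, uplinks):
        src = int(ipaddress.ip_address(src_ip))
        dst = int(ipaddress.ip_address(dst_ip))
        return uplinks[(src ^ dst) % len(uplinks)]

    uplinks = ["vmnic0", "vmnic1"]
    print(ip_hash_uplink("10.0.0.5", "10.0.0.20", uplinks))  # vmnic1
    print(ip_hash_uplink("10.0.0.5", "10.0.0.21", uplinks))  # vmnic0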

One other thing that can be an issue is the use of link aggregation and IP Hash load balancing at all. This is not a technical problem, but a political one. I've worked in many large environments and have frequently run into issues when having to interface with the networking (or storage, or security) team. Ideally, all parties involved in infrastructure support will be 100% behind the virtualization effort - unfortunately, the world is far from an ideal place! In general, I like to minimize the number of touch points between the various entities - especially in the early stages of implementation...just to reduce friction and allow time for familiarity to breed acceptance.

Scenario 1 - Summary
  • Overall Score: F - The fact that this is an invalid configuration forces me to give it an overall score of "F".
  • Valid Configuration: F - When the vSwitch is configured for IP Hash load balancing, it is improper to override it at the Port Group level.
  • Fault Tolerance: A - There is no single point of failure in the network path, assuming you have stackable switches. If your switches don't support stacking, the grade here drops to a "C".
  • Simplicity: C - There is a lot of complexity in this configuration.
  • Politics: B - This configuration uses IP Hash load balancing, which requires coordination outside the virtualization team.

Scenario 2

OK, so not everyone will go quite so overboard with the "tweaking". Let's make a couple of changes that will at least make this a valid configuration - let's remove the load balancing override on the port groups and set all vmnics Active on all port groups. This means that we'll have something like this (the changes from the previous configuration are the adapter states and the port group load balancing settings):

  • Configure vSwitch0 to use IP Hash as the load balancing algorithm
  • Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 - Active
    • vmnic1 - Active
    • Load Balancing - default to IP Hash
  • Configure the VMotion port group as follows:
    • vmnic0 - Active
    • vmnic1 - Active
    • Load Balancing - default to IP Hash

Here's the result of our labors:

Figure 2

At least we have a valid configuration! The benefit of using this configuration is that, if you have multiple VMotions active at the same time, you may use both of the pNICs in the vSwitch for extra bandwidth, which could help when you're trying to evacuate a bunch of VMs from a host that's in need of maintenance.
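
As a quick illustration, reusing the toy hash idea from Scenario 1 (the IP octets and helper name are made up for this sketch), two simultaneous VMotion streams to two different destination hosts can land on different uplinks:

    # Two concurrent VMotions from this host to different destinations
    # can hash onto different pNICs, using both uplinks at once.
    # Toy hash for illustration; real ESX behavior is analogous.
    uplinks = ["vmnic0", "vmnic1"]

    def toy_ip_hash(src_last_octet, dst_last_octet):
        return uplinks[(src_last_octet ^ dst_last_octet) % len(uplinks)]

    print(toy_ip_hash(10, 11))  # VMotion to host .11 -> vmnic1
    print(toy_ip_hash(10, 12))  # VMotion to host .12 -> vmnic0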

Scenario 2 - Summary
  • Overall Score: B - This is a good, relatively simple configuration. It has the advantage that you can use the bandwidth of both pNICs to expedite the migration of virtual machines from the host if you need to.
  • Valid Configuration: A - This is a valid configuration.
  • Fault Tolerance: A - There is no single point of failure in the network path, assuming you have stackable switches. If your switches don't support stacking, the grade here drops to a "C".
  • Simplicity: C - There is complexity in this configuration, but it is reasonable and serves a valid purpose.
  • Politics: B - This configuration uses IP Hash load balancing, which requires coordination outside the virtualization team.

Scenario 3

Alright, now let's see how we can go about simplifying this whole thing:

  • Configure vSwitch0 to use the default port-based load balancing algorithm
  • Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 - Active
    • vmnic1 - Standby
    • Load Balancing - default to port-based
  • Configure the VMotion port group as follows:
    • vmnic0 - Standby
    • vmnic1 - Active
    • Load Balancing - default to port-based

Figure 3


With this scenario, we've achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. We also can deterministically say that, assuming no path failures, we KNOW which interface is carrying which type of traffic. An even bigger benefit, in many organizations, is that the configuration required on the physical switch is none, zero, zip, nada!
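
The failover behavior we're relying on here is simple enough to sketch. Assuming the vSwitch tracks link state on each adapter (this is a toy model with invented names, not ESX code), traffic uses the first Active uplink that's up and falls back to Standby otherwise:

    # Toy model of explicit failover order: use the first Active uplink
    # whose link is up, then fall back to the Standby adapters.
    def pick_uplink(active, standby, link_up):
        for nic in active + standby:
            if link_up.get(nic, False):
                return nic
        return None  # no path available at all

    # Service Console port group: vmnic0 Active, vmnic1 Standby
    print(pick_uplink(["vmnic0"], ["vmnic1"], {"vmnic0": True, "vmnic1": True}))   # vmnic0
    print(pick_uplink(["vmnic0"], ["vmnic1"], {"vmnic0": False, "vmnic1": True}))  # vmnic1 (failover)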

Scenario 3 - Summary
  • Overall Score: B+ - This is a good, relatively simple configuration. The biggest negatives are 1) VMotion traffic is tied to a single interface and 2) there's more complexity than you have to have.
  • Valid Configuration: A - This is a valid configuration.
  • Fault Tolerance: A - There is no single point of failure in the network path, assuming you connect to two different physical switches.
  • Simplicity: B - There is limited complexity in this configuration.
  • Politics: A - This configuration uses vSwitch port based load balancing, which requires no coordination outside the virtualization team.

Scenario 4

Finally, let's get rid of all the complexity that we can - making this the simplest configuration that meets our goals:

  • Configure vSwitch0 to use the default port-based load balancing algorithm
  • Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
  • Configure the Service Console port group as follows:
    • vmnic0 - Active
    • vmnic1 - Active
    • Load Balancing - default to port-based
  • Configure the VMotion port group as follows:
    • vmnic0 - Active
    • vmnic1 - Active
    • Load Balancing - default to port-based

Basically, what we've done is let everything default. All the adapters are active, the load balancing method is virtual switch port based, and nothing is overridden by the port groups. This yields the configuration shown in Figure 4, below.

Figure 4

Here we've also achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. But wait! How do we know this to be the case? We did not configure Active and Standby adapters, so how do we know each type of traffic will have its own pNIC? Well, since the default load balancing scheme is based on the vSwitch port and there are two pNICs and only two vSwitch ports in use - that's the default behavior! The thing that we've lost in this configuration is the ability to deterministically know which interface is carrying which type of traffic. I contend that's not a big deal. We also retain the advantage of no pSwitch configuration required.
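
Here's a sketch of why that default behavior falls out. Port-based balancing spreads virtual ports across the active uplinks - modeled below as a simple round-robin, which is an assumption for illustration rather than the exact placement ESX performs - so two ports on two pNICs means one pNIC apiece:

    # Toy model of vSwitch port-based placement: virtual ports are
    # spread across the active uplinks (round-robin here, purely for
    # illustration). With two ports in use and two pNICs, each port
    # ends up on its own uplink - you just can't predict which.
    def assign_ports(ports, uplinks):
        return {port: uplinks[i % len(uplinks)] for i, port in enumerate(ports)}

    print(assign_ports(["Service Console", "VMotion"], ["vmnic0", "vmnic1"]))
    # {'Service Console': 'vmnic0', 'VMotion': 'vmnic1'}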

Scenario 4 - Summary
  • Overall Score: A - This is a good, simple configuration that meets our goals using nothing but default settings. The only real negative is that VMotion traffic is still limited to a single pNIC's worth of bandwidth.
  • Valid Configuration: A - This is a valid configuration.
  • Fault Tolerance: A - There is no single point of failure in the network path, assuming you connect to two different physical switches.
  • Simplicity: A - There is no added complexity in this configuration - everything is left at its default.
  • Politics: A - This configuration uses vSwitch port based load balancing, which requires no coordination outside the virtualization team.

Summary

I recognize that there are other load balancing options available in VI-3; however, I don't really see them as being overly useful. Therefore, I'm not going to cover them in this discussion.

So, what does all this mean? Basically, if you're concerned about the amount of time it takes to migrate virtual machines off of your hosts and you have the ability to use 802.3ad link aggregation, then Scenario 2 is your best bet. This scenario does have more complexity than is absolutely necessary, but provides the benefit of reduced evacuation time when you need to shut a host down quickly.

If, on the other hand, you cannot - or don't want to - use 802.3ad link aggregation, then Scenario 4 is for you. Honestly, for a large percentage of environments that are deploying VI-3, this is the way to go. I gave it an "A" for a reason. It is the simplest option to implement and to maintain and it will serve your needs admirably, and besides...it follows my mantra of "If you don't have a very good reason to change a default value, don't change it!"

Let me know what you think - am I right or am I barking up the wrong tree?
