Well, I've decided to pack my bags and move. If you'd like to read my latest updates, please jump over to my new home on WordPress!
Jason Boche posted an interesting article on his blog today, and I thought I'd offer my thoughts on it.
First, here's the article from Virtualization Review magazine that started all this furor!
My guess is that part of the difference comes from both Hyper-V & Xen requiring VT-capable CPUs (i.e., the VM always runs within a VT jail), while ESX supports binary translation (BT) for some 32-bit x86 instructions. The first generation of chips that supported VT wasn't very good, and VMware's BT would often do a better job of executing the protected instructions than the hardware assist provided by the CPU. Intel's hardware support for virtualization has since improved, and that improvement shows in the tests under discussion.
I've taken the data that was provided, normalized it, and dropped it into some charts to help visualize the differences in performance between the three platforms. For each chart, the data was broken out by test and then normalized to percentages.
In the first two charts (CPU Performance & RAM Performance), longer bars represent better performance, while in the third & fourth charts (Disk Performance & SQL Performance), shorter bars are better.
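To make the normalization concrete, here's a minimal sketch of the approach (the numbers below are made up for illustration; they are not the results from the source article). Time-based results like the SQL elapsed times would first be converted to seconds before being expressed as percentages.

```python
# Hypothetical raw results - NOT the numbers from the Virtualization Review tests.
raw_results = {
    "CPU - Test 1": {"ESX": 4120, "Hyper-V": 4480, "Xen": 4390},   # throughput-style scores
    "SQL - Test 2": {"ESX": 512, "Hyper-V": 431, "Xen": 468},      # elapsed times in seconds
}

def to_percentages(scores):
    """Express each platform's result as a percentage of the largest value in that test."""
    largest = max(scores.values())
    return {platform: round(100.0 * value / largest, 1) for platform, value in scores.items()}

for test, scores in raw_results.items():
    # For throughput-style tests, longer bars (bigger percentages) are better;
    # for the time-based disk and SQL tests, shorter bars are better.
    print(test, to_percentages(scores))
```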
When looking at CPU performance, ESX takes a beating in Test 1 & Test 2; however, in Test 3, it fares much better.
I suspect that CPU performance and RAM performance are interrelated - better CPU performance will yield better RAM performance. This suspicion is reasonably well borne out by the test results.
In disk performance, I don't know enough about how Hyper-V & Xen manage disk I/O to speculate much here. Do they do caching above and beyond what is provided by the RAID controller? ESX does not. Were the .vmdk files (and the .vdf files) properly aligned when they were created? Did each hypervisor use the same block size? I don't know. I also have no idea why disk metrics were excluded from the Test 1 results - a question for the original authors, I suppose.
Remember, shorter bars = better performance in this graph.
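On the alignment question, here's a small illustrative check (my own sketch, not something from the article): a guest partition or virtual disk is generally considered aligned when its starting offset lands on the storage array's chunk boundary, with 64KB being a commonly used value.

```python
CHUNK_SIZE = 64 * 1024   # assumed array chunk/stripe size in bytes - check your storage docs

def is_aligned(start_sector, sector_size=512, chunk_size=CHUNK_SIZE):
    """Return True if the partition's starting byte offset falls on a chunk boundary."""
    return (start_sector * sector_size) % chunk_size == 0

print(is_aligned(63))    # classic MS-DOS default starting sector -> False (misaligned)
print(is_aligned(128))   # starts exactly on a 64KB boundary -> True (aligned)
```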
For SQL performance, I took the elapsed time, converted it to seconds, and then into relative percentages. Here again, shorter bars represent better performance. Comparing the Test 2 & Test 3 results for Disk and SQL performance, it's clear that there's a close correlation between the two. Based on this, I surmise that you could extrapolate Test 1 disk performance from the Test 1 SQL performance numbers.
I do agree with Jason's remarks in his comment above - this was an "apples to apples" comparison, at least as far as we can tell from the information provided in the source article. The performance gap may have to do with the fact that Hyper-V and Xen have both just completed a major rev while ESX is on the cusp of doing the same (I'm interested to see this same test with the next version of ESX, when it's available) - or it could simply be that VMware's got some performance tuning to do. Either way, it's an interesting article worth discussing.
Regardless of the differences shown in the tests that were performed, I feel that this particular test is relevant only for the SMB space. The enterprise environment isn't going to care so much about the performance differences shown here. The true differentiators in the enterprise are the very management features that were excluded from this test.
I've noticed something about engineers. They're never happy with the way something is configured out of the box - there's always a better way! Well, I have a different philosophy:
"If you don't have a very good reason to change a default value, don't change it!"To me, this seems totally obvious - in most cases, the default values are there for a reason. Their set to whatever value they're set to based on testing or customer feedback or imperical knowledge - rarely because someone just chose a value. As an example, let's explore the configuration of the virtual switch (vSwitch) used to support the Service Console in a VMware ESX Server. Assume the following:
- vSwitch0 is configured with two port groups - one for the Service Console, the other for VMotion
- There are two physical NICs (pNICS) (vmnic0 & vmnic1) associated with vSwitch0
- We want Service Console traffic and VMotion traffic to pass over different pNICs
- In the event of the failure of one path (pNIC, cable, physical switch (pSwitch)), we want the traffic to automatically failover to the second path
- There are two separate physical switches available to support our environment
- [Optional, but recommended] You're aware that VLANs are not intended to provide security and that you will have commingling of data on the wire when using VLANs. Aware of this risk, you choose to go ahead and use VLANs for traffic segmentation and configure your Service Console port group on "VLAN SC" and your VMotion port group on "VLAN VMotion" - obviously, you'll need to assign the appropriate VLAN number to each port group as well as configure the appropriate ports on the pSwitch to support 802.1Q trunking.
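To keep the four scenarios below straight, here are the shared assumptions expressed as a simple Python structure. This is purely illustrative; the VLAN IDs are hypothetical placeholders, not values from any real environment.

```python
# Shared starting point for Scenarios 1-4 (illustrative only).
baseline = {
    "vSwitch0": {
        "uplinks": ["vmnic0", "vmnic1"],           # two pNICs, cabled to two different pSwitches
        "port_groups": {
            "Service Console": {"vlan": 10},        # "VLAN SC" - hypothetical VLAN ID
            "VMotion": {"vlan": 20},                # "VLAN VMotion" - hypothetical VLAN ID
        },
    },
    "goals": [
        "Service Console and VMotion traffic on different pNICs",
        "automatic failover to the surviving path if a pNIC, cable, or pSwitch fails",
        "802.1Q trunking on the pSwitch ports if VLANs are used for segmentation",
    ],
}

print(baseline["vSwitch0"]["port_groups"])
```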
Scenario 1
Many people will want to do the following:
- Configure vSwitch0 to use IP Hash as the load balancing algorithm
- Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
- Configure the Service Console port group as follows:
- vmnic0 - Active
- vmnic1 - Standby
- Load Balancing - vSwitch Port Based
- Configure the VMotion port group as follows:
- vmnic0 - Standby
- vmnic1 - Active
- Load Balancing - vSwitch Port Based
Figure 1
As shown in Figure 1, all paths are redundant - but there are some issues with this configuration! First, when using IP Hash as the load balancing algorithm on the vSwitch, it is recommended that you not use any other load balancing approach for your port groups:
"All adapters in the NIC team must be attached to the same physical switch or an appropriate set of stacked physical switches. (Contact your switch vendor to find out whether 802.3ad teaming is supported across multiple stacked chassis.) That switch or set of stacked switches must be 802.3ad-compliant and configured to use that link-aggregation standard in static mode (that is, with no LACP). All adapters must be active. You should make the setting on the virtual switch and ensure that it is inherited by all port groups within that virtual switch."
Next, remember that we have NO control over inbound network traffic. That means that the pSwitch could easily send inbound packets to us on our inactive pNIC. While I'm sure that the vSwitch will figure this out and deliver the packets to us - it is counter-intuitive and (in my opinion) a bad practice.
To continue with the list of problems with this configuration, consider that by specifying Port Based load balancing at the Port Group, you're defeating the purpose of using IP Hash on the vSwitch! The reason for using IP Hash is to enable link aggregation and the potential for a single vNIC to pass more than a single pNIC's worth of bandwidth. By explicitly specifying Active and Standby pNICs, you've eliminated that possibility.
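To see why, here's a simplified sketch of how a hash-based policy picks an uplink (this is an illustration of the idea, not VMware's exact algorithm): the pNIC is chosen from the source and destination IP addresses, so different conversations from the same vNIC can land on different pNICs. Pinning one pNIC Active and one Standby at the port group takes that choice away, and the extra bandwidth with it.

```python
import ipaddress

UPLINKS = ["vmnic0", "vmnic1"]   # the active pNICs in the team

def pick_uplink(src_ip: str, dst_ip: str) -> str:
    """Choose an uplink by hashing the source/destination IP pair (simplified)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return UPLINKS[(src ^ dst) % len(UPLINKS)]

# The same source talking to two different destinations can use both pNICs...
print(pick_uplink("10.0.0.5", "10.0.0.20"))   # -> vmnic1
print(pick_uplink("10.0.0.5", "10.0.0.21"))   # -> vmnic0
# ...but any single src/dst conversation is still limited to one pNIC's bandwidth.
```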
One other thing that can be an issue is the use of link aggregation and IP Hash load balancing at all. This is not a technical problem, but a political one. I've worked in many large environments and have frequently run into issues when having to interface with the networking (or storage, or security) team. Ideally, all parties involved in infrastructure support will be 100% behind the virtualization effort - unfortunately, the world is far from an ideal place! In general, I like to minimize the number of touch points between the various entities - especially in the early stages of implementation...just to reduce friction and allow time for familiarity to breed acceptance.
Criteria | Score | Evaluation |
---|---|---|
Overall Score | F | The fact that this is an invalid configuration forces me to give it an overall score of "F" |
Valid Configuration | F | When the vSwitch is configured for IP Hash load balancing, it is improper to override at the Port Group level. |
Fault Tolerance | A | There is no single point of failure in the network path - assuming you have stackable switches. If your switches don't support stacking, then the grade here would be reduced to a "C" |
Simplicity | C | There is a lot of complexity in this configuration |
Politics | B | This configuration uses IP Hash load balancing which requires coordination outside the virtualization team |
Scenario 2
OK, so not everyone will go quite so overboard with the "tweaking". Let's make a couple of changes that will at least make this a valid configuration - remove the load balancing override on the port groups and set all vmnics active on all port groups. That gives us something like this:
- Configure vSwitch0 to use IP Hash as the load balancing algorithm
- Configure the physical switch (pSwitch) to use 802.3ad static link aggregation
- Configure the Service Console port group as follows:
- vmnic0 - Active
- vmnic1 - Active
- Load Balancing - default to IP Hash
- Configure the VMotion port group as follows:
- vmnic0 - Active
- vmnic1 - Active
- Load Balancing - default to IP Hash
Here's the result of our labors:
Figure 2
At least we have a valid configuration! The benefit of using this configuration is that, if you have multiple VMotions active at the same time, you may use both of the pNICs in the vSwitch to increase your bandwidth utilization, which could help when you're trying to evacuate a bunch of VMs from a host that's in need of maintenance.
Criteria | Score | Evaluation |
---|---|---|
Overall Score | B | This is a good, relatively simple configuration. It has the advantage that you can use the bandwidth of both pNICs to expedite the migration of virtual machines from the host if you need to. |
Valid Configuration | A | This is a valid configuration |
Fault Tolerance | A | There is no single point of failure in the network path - assuming you have stackable switches. If your switches don't support stacking, then the grade here would be reduced to a "C" |
Simplicity | C | There is complexity in this configuration, but it is reasonable and serves a valid purpose |
Politics | B | This configuration uses IP Hash load balancing which requires coordination outside the virtualization team |
Scenario 3
Alright, now let's see how we can go about simplifying this whole thing:
- Configure vSwitch0 to use default port-based load balancing algorithm
- Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
- Configure the Service Console port group as follows:
- vmnic0 - Active
- vmnic1 - Standby
- Load Balancing - default to port-based
- Configure the VMotion port group as follows:
- vmnic0 - Standby
- vmnic1 - Active
- Load Balancing - default to port-based
Figure 3
With this scenario, we've achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. We also can deterministically say that, assuming no path failures, we KNOW which interface is carrying which type of traffic. An even bigger benefit, in many organizations, is that the configuration required on the physical switch is none, zero, zip, nada!
Criteria | Score | Evaluation |
---|---|---|
Overall Score | B+ | This is a good, relatively simple configuration. The biggest negatives are 1) VMotion traffic is tied to a single interface and 2) there's more complexity here than strictly necessary. |
Valid Configuration | A | This is a valid configuration |
Fault Tolerance | A | There is no single point of failure in the network path - assuming you connect to two different physical switches. |
Simplicity | B | There is limited complexity in this configuration |
Politics | A | This configuration uses vSwitch port based load balancing which requires no coordination outside the virtualization team |
Scenario 4
Finally, let's get rid of all the complexity that we can - making this the simplest configuration that meets our goals:
- Configure vSwitch0 to use default port-based load balancing algorithm
- Do not configure the physical switch (pSwitch) to use 802.3ad static link aggregation
- Configure the Service Console port group as follows:
- vmnic0 - Active
- vmnic1 - Active
- Load Balancing - default to port-based
- Configure the VMotion port group as follows:
- vmnic0 - Active
- vmnic1 - Active
- Load Balancing - default to port-based
Basically, what we've done is let everything default. All the adapters are active, the load balancing method is virtual switch port based, and nothing is overridden by the port groups. This yields the configuration shown in Figure 4, below.
Figure 4
Here we've also achieved our objectives of having SC & VMotion traffic on separate interfaces, each with a failover adapter available. But wait! How do we know this to be the case? We did not configure Active and Standby adapters, so how do we know each type of traffic will have its own pNIC? Well, the default load balancing scheme assigns each vSwitch port to an uplink, and with two pNICs and only two vSwitch ports in use, each port group ends up on its own pNIC by default. The thing that we've lost in this configuration is the ability to deterministically know which interface is carrying which type of traffic. I contend that's not a big deal. We also retain the advantage of requiring no pSwitch configuration.
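For anyone who wants to see the reasoning spelled out, here's a rough sketch of virtual-port-based uplink selection (again, an illustration rather than VMware's exact implementation): each vSwitch port in use is mapped to one of the active uplinks, and the mapping spreads the ports across the uplinks. With only two ports in use and two pNICs, each port group normally lands on its own pNIC; we just can't predict which one gets which.

```python
ACTIVE_UPLINKS = ["vmnic0", "vmnic1"]

def uplink_for_port(port_id: int) -> str:
    """Map a virtual switch port to an uplink by spreading port IDs across the uplinks."""
    return ACTIVE_UPLINKS[port_id % len(ACTIVE_UPLINKS)]

# Hypothetical port IDs for the two port groups in use on vSwitch0.
ports_in_use = {"Service Console": 16, "VMotion": 17}
for name, port_id in ports_in_use.items():
    print(name, "->", uplink_for_port(port_id))   # the two ports land on different pNICs
```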
Criteria | Score | Evaluation |
---|---|---|
Overall Score | A | This is the simplest configuration that meets our goals - everything is left at its defaults, nothing needs to be coordinated on the pSwitch, and failover still works. The only trade-off is that you can't say deterministically which pNIC carries which type of traffic. |
Valid Configuration | A | This is a valid configuration |
Fault Tolerance | A | There is no single point of failure in the network path - assuming you connect to two different physical switches. |
Simplicity | A | There is no unavoidable complexity in this configuration |
Politics | A | This configuration uses vSwitch port based load balancing which requires no coordination outside the virtualization team |
Summary
I recognize that there are other load balancing options available in VI-3; however, I don't really see them as being overly useful. Therefore, I'm not going to cover them in this discussion.
So, what does all this mean? Basically, if you're concerned about the amount of time it takes to migrate virtual machines off of your hosts and you have the ability to use 802.3ad link aggregation, then Scenario 2 is your best bet. This scenario does have more complexity than is absolutely necessary, but provides the benefit of reduced evacuation time when you need to shut a host down quickly.
If, on the other hand, you cannot - or don't want to - use 802.3ad link aggregation, then Scenario 4 is for you. Honestly, for a large percentage of environments that are deploying VI-3, this is the way to go. I gave it an "A" for a reason. It is the simplest option to implement and to maintain and it will serve your needs admirably, and besides...it follows my mantra of "If you don't have a very good reason to change a default value, don't change it!"
Let me know what you think - am I right or am I barking up the wrong tree?
OK, so all my friends have been trying to get me to start blogging - here goes!
We'll start with some background on me. That will help you decide whether there's any reason to believe what I post here.
I've been using technology to help get the job done since I was in high school. Back then, I was working on a (then antique!) IBM 402 accounting machine - and yes, I did 'program' it with those control panels!
Next, I had a six-year stint in the US Air Force, where I initially worked on a UNIVAC 418-III equipped with 128Kwords (yep, that's kilo-words) of RAM and an FH-1782 drum unit with 4,194,304 words of storage. I helped write an implementation of the AUTODIN Mode I protocol for this system - which was fun because you had to do everything in overlays, and each overlay partition was only 2K words...
If anyone cares to learn more about this long-defunct system, Bit Savers has an archive of truly interesting documents :)
About two years into my illustrious military career, I got recruited to go work at the White House Communications Agency. They liked the fact that I knew how to program in an assembly-level language (it didn't matter that it wasn't the language they wanted me to use!). So...off to Washington DC and The White House! The task at hand was to migrate the White House record communications system from an RCA Spectra 70/45 to a UNIVAC 90/80. Interesting challenge - both systems used essentially the same assembly language; however, the 70/45 was overlay driven and used physical-level (Head/Cylinder/Track) I/O addressing, whereas the 90/80 used virtual memory and logical-level (block number) I/O addressing. This is the project I blame for my having to wear hearing aids. My "office" was a long, narrow hallway filled with disk drives - not the ones you know and love today, but the washing-machine-sized, 7-platter, 14" monsters that held a whopping 7.25MB of data. I'm guessing the ambient noise level was somewhere north of 110 dB...
During this time period, I did a stint as an instructor in IBM System/360 assembly language programming at the (now defunct) Computer Learning Center. Let me tell you, the best way to learn something is to teach it!
OK, time to separate from the Air Force. Off to work for Informatics General (which was later acquired by Sterling Software) and back to The White House to do again what we had just finished doing: replace the record communications system. This time, we migrated from the 90/80 to a DEC PDP 11/45.
Finished that project and did some work at the US Treasury Department and the Justice Department (JURIS, anyone?), too.
Left Informatics General (now Sterling Software) and went to work for Eaton (which was sold to Contel, which merged with GTE, which divested our division to DynCorp...) and back to The White House for another stint (by this time, they had migrated to yet another system for record communications!). I spent time at the DIA and several civilian agencies around the DC area. During this time I was mostly working with Novell NetWare (it was so amazing when you could pull files across the 10Mbps network faster than you could read them from your local hard drive!).
Fast forward through MS-DOS and every version of Windows and we get to the GCSS-Army project and my first exposure to Microsoft Active Directory (MAD!). I became an AD Architect and jumped ship to head over to Compaq where I worked on several projects, including some AD replication work on the Exchange implementation for the US Senate.
HP and Compaq completed their merger.
Finally, in 2001, I get to go to a class on some new technology called VMware. I had no idea what it was - but I was up for something new. Immediately, I'm hooked. I go back to HP and try to find an opportunity to use this amazing new technology, but to no avail. So...I get put on a new project as the AD architect for a global construction materials manufacturing company's SAP implementation. When it comes time to place the hardware order, I convince the customer to include an extra DL380 for use as a VMware host - the toe is in the door!
It wasn't long until folks decided they needed more systems - but the procurement cycle was too long. This is when I piped up and said "we have this DL380 here with VMware installed on it - I can give you a new system in 30 minutes". We soon had multiple clusters with DL580s and DL740s backed by an XP1024 and an EVA5000 - the rest is history.
During this project, I discovered the VMware Community Forums and a bunch of really great people!
Since then:
- I've been to every US-based VMworld except the first, and I've presented at all the ones I've attended
- I've developed the virtual architecture for many Fortune 1,000 organizations, mostly in the Financial, Pharmaceutical, and Manufacturing verticals.
- I've done more P2V migrations than I care to count
- I earned my VCP in ESX 2 & VI 3
- I was named a VMware vExpert for 2009
- I am a community moderator for the VMware VMTN Communities forum
- I've provided technical review for books by Oglesby & Herold, Haletky, and Seibert
On the personal side:
- I live outside Washington, DC
- I have a wife and five children
- I have one grandchild
- We have four dogs: two Beagles, a Chihuahua/Dachshund mix, and a Great Pyrenees
- I'm Executive Director of A Forever Home Rescue Foundation
- I love to fish (mostly bass, but pretty much anything that swims!)
If you've made it this far, you must be having a very slow day!