Friday, December 5, 2008

Building an Enterprise Class VMware Infrastructure. Take your time; blowing it here could cost you.

Designing an Enterprise-class VMware VI3 environment is not an incredibly difficult task. It is, though, one that takes good planning and a full understanding of both your data center network and how your internal processes work. You also need a fair amount of VI3 understanding. You'll need to do your homework. A bunch of homework. So be ready. Remember that Enterprise means production, so treat it that way.

In the next few paragraphs I'll go through some of the items that I find critical to look at when beginning a design.

So let's go through some of the basics.


1) Back-End Storage.


-You might be thinking, "Why is storage given such a prominent place in design?" Here's your answer: everything rides on your storage. Therefore it has to be more than adequate. You have to know your Read/Write ratio, the number of IOPS you need to have available for the hosts, rough growth estimates, and don't forget adequate space.

So how do you know what to do?

Take valid measurements from physical hosts and plan your design around them. Example: most VMFS volumes have a Read/Write ratio of 75/25. That makes them perfect for RAID 5. If your Read/Write ratio moves toward 50/50 or beyond, RAID 10 becomes a need instead of a desire.

Let's not discuss the "pooled storage" concept and what needs to be done to rid the market of its presence. This is going to be long-term, critical production, so treat it that way: dedicated RAID sets for VMFS volumes. Remember the configuration maximums for VMware: 64 VMs per LUN is the max, but you want to keep it to 20 or so as a sweet spot. Size your VMFS properly. If you plan to use a total of 800 GB, remember that VMware needs some space to play, just like Windows and Linux do. That means your VMFS should be about 1 TB in size.
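To make the sizing rule concrete, here is a minimal sketch of the headroom math. The 25% padding figure is an assumption chosen to match the 800 GB to roughly 1 TB example above, not a VMware-documented rule; adjust it to your own comfort level.

```python
# Rough VMFS sizing helper: pad the planned usable space with headroom
# for VMkernel swap files, snapshots, and logs. The 25% default is an
# illustrative assumption matching the article's 800 GB -> ~1 TB example.

def vmfs_size_gb(planned_gb, headroom=0.25):
    """Return a suggested VMFS volume size given planned VM usage in GB."""
    return planned_gb * (1 + headroom)

print(vmfs_size_gb(800))  # 1000.0, i.e. about 1 TB
```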

Now about performance: you know how much I/O you're going to push at these volumes if you did your homework. Please do yourself a favor and use at least SAS or FC-AL disks instead of SATA. We spoke earlier about VMware's intended purpose. It is production, so please don't cheap out with SATA.

If you want to absorb 1200 IOPS you can't just use the raw disk performance numbers; you have to calculate out your needs, because RAID 5 imposes a write penalty of four disk operations per front-end write. Let's use a 75/25 Read/Write ratio and see what kind of disks, and how many, we need to meet 1200 IOPS of front-end performance. To calculate the back-end load for RAID 5, use this formula: (Front-end IOPS × Read ratio) + (Front-end IOPS × Write ratio × 4). That gives us a total of 2100 RAID-adjusted IOPS, so we will need at least 13 disks in a RAID 5 array that spin at 15K (assuming we get 170 IOPS out of a 15K disk). In the EMC world the best option would probably be three RAID sets using a 4+1 RAID 5 layout with a MetaLUN across all of them; you would get some space and a touch of extra performance out of the MetaLUN. Do all that work for every VMFS volume or LUN that you need, and don't forget to account for growth. In doing this you will properly size your storage environment and not have to go right back to the "well" for more storage because performance sucks.
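The RAID 5 arithmetic above can be sketched in a few lines. The numbers plugged in are the article's own example (1200 front-end IOPS, 75/25 read/write, 170 IOPS per 15K disk); the write penalty of 4 reflects RAID 5's read-data, read-parity, write-data, write-parity cycle.

```python
import math

# Back-end IOPS for RAID 5: each front-end write costs four disk
# operations (read data, read parity, write data, write parity).

def raid5_backend_iops(frontend_iops, read_ratio):
    """RAID-adjusted back-end IOPS for a given front-end load."""
    write_ratio = 1 - read_ratio
    return frontend_iops * read_ratio + frontend_iops * write_ratio * 4

def disks_needed(frontend_iops, read_ratio, iops_per_disk=170):
    """Minimum spindle count, assuming ~170 IOPS per 15K disk."""
    return math.ceil(raid5_backend_iops(frontend_iops, read_ratio) / iops_per_disk)

print(raid5_backend_iops(1200, 0.75))  # 2100.0 RAID-adjusted IOPS
print(disks_needed(1200, 0.75))        # 13 disks
```

Rerun the same calculation per VMFS volume or LUN, with a growth factor folded into the front-end number, and the spindle counts fall out directly.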

2) Network Infrastructure.


Networking in ESX is going to be more robust with the Nexus 1000V, but since we don't have that option yet, let's plan on the real networking horsepower being in your core, i.e. a Cisco 6500, Foundry "JunkIron" ;P or your other various flavors of the "Core". If you need to span a bunch of VLANs, make sure that you have the vSwitches set for your needs. My personal preference is tagging the frames at the vSwitch level, then sending them over to the "Core" on trunked links. Make sure you account for the amount of network usage you need. Don't undersize this; NICs are not overly expensive. Don't forget vMotion and redundant Service Console NICs, as they will play into your total NIC count.
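A quick way to keep the NIC count honest is to budget per role before ordering hardware. The categories and counts below are illustrative assumptions, not a VMware-mandated layout; the point is that vMotion and the redundant Service Console push the total up before you've counted a single guest frame.

```python
# Rough per-host NIC budget. Role names and counts are hypothetical
# examples; adjust to your own VLAN and redundancy requirements.

nic_budget = {
    "vm_traffic": 2,        # teamed pair carrying the guest VLAN trunks
    "vmotion": 1,           # dedicated vMotion link
    "service_console": 2,   # primary plus redundant Service Console
}

total_nics = sum(nic_budget.values())
print(total_nics)  # 5 ports per host before any growth allowance
```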

Also make sure that you and your network guy have gone over this closely. If you are the "everything guy" double and triple check yourself. Make sure that you can account for all the bandwidth you need plus growth and the inevitable traffic spikes.

Also make sure that you connect ESX into the "Core" properly. If you are planning on EtherChannel, then make sure that you have IP hash set for NIC teaming on the vSwitch and port channels properly configured on the switch.

If you have an internal vSwitch-to-vSwitch implementation, please remember to account for everything on the inside of it so that you are not overwhelmed.

3) Server OS.

The next thing that I like to check is the Guest OS mix that I am going to run. If this is a production environment and you are planning on doing P2Vs for most of your VMs, then this is not a big deal. I like to verify that ISOs for all the OS flavors I need are in a dedicated VMFS store that has been provisioned and has good performance. This keeps anyone from spending time locating OS media, which is a waste of time you can avoid.


4) Policies.

Policies are more of a "who can screw up what" discussion with the admin team. Locking people who don't understand VMware OUT of the system might be a good idea. After all, how many times have you seen the "IT Manager" who thinks he understands all the technology log in and junk a VM because he didn't know what was going on? (I have. It happened more than once.) Don't lock management completely out; just don't give them the creds to "help" you. Always make sure that no ONE person has UBER power. Always have a check and balance.

5) VI Host Hardware.

Host hardware is one of the places where you can demonstrate strategic, ROI-based thinking. Look to have the company spend just enough now, so that when the time comes to expand, all the quantities are known and you don't need to perform the ENTIRE design process all over again. Buying 4 huge 16-proc boxes might not be the best use of funds, but a number of dual-proc and a few quad-proc boxes gives you a flexibility that cannot be overstated. Being strategic here demonstrates to your boss and those around you that your concern is the whole road map, not just a single point on a single solution. You gain credibility in a number of ways.

Some Bullet Points for Host Hardware:
  • Choose either all AMD or all Intel; don't mix and match.
  • Choose ONLY hardware on the HCL.
  • Forward thinking pays off here so do some.
  • Blades or pizza boxes. Mixing the two is just plain dumb. *cough* CoGR *cough* (In the end you get saddled with not being able to benefit 100% from either technology.)
  • Build around total "Pools of resources" instead of being worried about individual specs.
  • Use EVC so that you can have forward mobility in your deployment.
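The "pools of resources" bullet above can be sketched as simple aggregate math: add up cluster capacity, subtract one host's worth for failover, and compare the remainder against total VM demand. The host specs and the N+1 reservation below are hypothetical numbers, just to show the shape of the calculation.

```python
# Thinking in pools: compare total cluster capacity against aggregate VM
# demand instead of matching individual VMs to individual hosts.
# Host specs here are illustrative assumptions, not a recommendation.

hosts = [{"ghz": 2.6 * 4, "ram_gb": 32}] * 6   # six quad-core 2.6 GHz boxes

total_ghz = sum(h["ghz"] for h in hosts)
total_ram = sum(h["ram_gb"] for h in hosts)

# N+1: keep one host's worth of capacity free for maintenance or HA failover.
usable_ghz = total_ghz - hosts[0]["ghz"]
usable_ram = total_ram - hosts[0]["ram_gb"]

print(round(total_ghz, 1), total_ram)    # 62.4 GHz, 192 GB raw
print(round(usable_ghz, 1), usable_ram)  # 52.0 GHz, 160 GB usable at N+1
```

Sizing against the usable pool rather than the raw one is what keeps a single host failure from becoming a capacity crisis.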


6) Goals for implementing VMware.

Make sure that these Goals are documented.
Don't document the Goal without documenting the metrics that apply to them.
Verify that you are on track with the deployment.
Provide actual numbers for management to see and contrast the differences.
Make sure that you have planned to exceed your targets. (I know it seems elementary but it helps to have it in the back of your mind.)

**Do you have feedback or would you like an area of this article expanded? Please let me know.**
