Wednesday, December 17, 2008

VMware without a Console. COS-Less and its effect on you.

So in talking with some of the other professionals in the field, we have discussed what needs to happen to make COS-less ESX the next big thing. ESXi is the first attempt, and it's a good start, but there is still a way to get at a console in ESXi. To move this forward, all of the console functionality will need to be moved into vCenter or something like it.

I was just thinking how sweet it would be if, once they go completely COS-less, you came up to the ESXi splash screen, entered all the basic information, and then put in the address of your vCenter server if you had one.

Once the COS-less ESXi host connects to the vCenter server, you could have the option to roll out a specified config (à la answer files for custom roll-outs) or push down a base config to use later. The only information you would need on the hosts themselves is their IP information.

I can see a number of situations where this type of controlled roll-out would be beneficial. The other question running through your head right now is "What if I don't have a vCenter server?" It's a good one. Maybe a limited-functionality config database could reside on a server running some type of free "basic vCenter." By the time this happens, clustered vCenter servers would have become the norm, so uptime would be slightly less of a concern.

I don't know if this is the direction VMware is heading, but it would be cool nonetheless.

Let me know your thoughts. Then we can run through the pros and cons in the next blog article.

Friday, December 5, 2008

Building an Enterprise-Class VMware Infrastructure. Take your time; blowing it here could cost you.

Designing an enterprise-class VMware VI3 environment is not an incredibly difficult task. It is, though, one that takes good planning and a full understanding of both your data center network and how your internal processes work. You also need a fair amount of VI3 understanding. You'll need to do your homework. A bunch of homework. So be ready. Remember that enterprise means production, so treat it that way.

In the next few paragraphs I'll go through some of the items that I find critical to look at when beginning a design.

So let's go through some of the basics.


1) Back-End Storage.


-You might be thinking, "Why is storage given such a prominent place in the design?" Here's your answer: everything rides on your storage. Therefore it has to be more than adequate. You have to know your read/write ratio, the number of IOPS you need to have available for the hosts, rough growth estimates, and, don't forget, adequate space.

So how do you know what to do?

Take valid measurements from physical hosts and plan your design around them. Example: most VMFS volumes have a read/write ratio of 75/25. That makes them perfect for RAID 5. If your read/write ratio moves toward 50/50 or becomes even more write-heavy, RAID 10 becomes a need instead of a desire.

Let's not discuss the "pooled storage" concept and what needs to be done to rid the market of its presence. This is going to be long-term production and critical, so treat it that way: dedicated RAID sets for VMFS volumes. Remember the VMware configuration maximums: 64 VMs per LUN is the max, but you want to keep it to 20 or so as a sweet spot. Size your VMFS properly. If you plan to use a total of 800 GB, remember that VMware needs some space to play just like Windows and Linux, so your VMFS should be about 1 TB in size.
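If you like to script these rules of thumb, here's a quick back-of-the-napkin helper (my own sketch, not a VMware tool) that pads the planned usage with headroom and flags LUNs that blow past the sweet spot:

```python
# Back-of-the-napkin VMFS sizing helper (my own sketch, not a VMware tool).
# Pads the planned usage so VMFS has room to play (snapshots, swap, metadata)
# and warns when a LUN carries more VMs than the ~20-VM sweet spot.

def size_vmfs(planned_gb, vm_count, headroom=0.25, sweet_spot=20, hard_max=64):
    recommended_gb = planned_gb * (1 + headroom)
    if vm_count > hard_max:
        raise ValueError(f"{vm_count} VMs exceeds the {hard_max}-VM-per-LUN maximum")
    if vm_count > sweet_spot:
        print(f"Warning: {vm_count} VMs on one LUN; consider splitting (~{sweet_spot} is the sweet spot)")
    return recommended_gb

print(size_vmfs(800, 18))   # ~1000 GB, matching the 800 GB -> ~1 TB rule of thumb
```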

Now, about performance: you do know how much I/O you're going to push at these volumes if you did your homework. Please do yourself a favor and use at least SAS or FC-AL disks instead of SATA. We spoke earlier about VMware's intended purpose; it is production, so please don't cheap out with SATA. If you want to absorb 1200 IOPS, you can't just use the raw disk performance numbers. You have to calculate out your needs, because RAID adds a write penalty. Let's use a 75/25 read/write ratio and see what kind of disks, and how many, we need to meet 1200 IOPS of front-end performance. For RAID 5, the back-end load is: Back-end IOPS = (Front-end IOPS x Read Ratio) + ((Front-end IOPS x Write Ratio) x 4). That gives us a total of 2100 RAID-adjusted IOPS, so we need at least 13 disks in a RAID 5 array that spin at 15K (assuming we get 170 IOPS out of a 15K disk). In the EMC world the best option would probably be 3 RAID sets using a 4+1 RAID 5 layout with a metaLUN across all of them; you would get some extra space and a touch of extra performance out of the metaLUN. Do all of that work for every VMFS volume or LUN that you need, and don't forget to account for growth. In doing this you will properly size your storage environment and not have to go right back to the "well" for more storage because performance sucks.
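Here's that same math as a little script you can rerun for every LUN you size; the 4x RAID 5 write penalty and the 170 IOPS per 15K disk are the same assumptions as above:

```python
import math

def raid5_backend_iops(frontend_iops, read_ratio, write_penalty=4):
    """Reads pass straight through; each write costs ~4 disk I/Os on RAID 5."""
    write_ratio = 1 - read_ratio
    return frontend_iops * read_ratio + frontend_iops * write_ratio * write_penalty

def disks_needed(frontend_iops, read_ratio, iops_per_disk=170):
    return math.ceil(raid5_backend_iops(frontend_iops, read_ratio) / iops_per_disk)

print(raid5_backend_iops(1200, 0.75))   # 2100 RAID-adjusted IOPS
print(disks_needed(1200, 0.75))         # 13 x 15K disks (before growth headroom)
```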

2) Network Infrastructure.


Networking in ESX is going to get more robust with the Nexus 1000V, but since we don't have that option yet, let's plan on the real networking horsepower being in your core, i.e. a Cisco 6500, Foundry "JunkIron" ;P, or your other various flavors of the "core." If you need to span a bunch of VLANs, make sure that you have the vSwitches set up for your needs. My personal preference is tagging the frames at the vSwitch level and then sending them over to the core on trunked links. Make sure you account for the amount of network usage you need; don't undersize this, as NICs are not overly expensive. Don't forget VMotion and redundant Service Console NICs, as they will play into your total NIC count.
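For what it's worth, here's the kind of rough per-host NIC tally I run in my head; the per-role counts are just my assumptions, so swap in your own standards:

```python
# Rough per-host pNIC tally (planning sketch; the per-role counts are my
# assumptions, not a VMware requirement -- adjust to your own standards).
nic_plan = {
    "vm_traffic":       2,   # teamed uplinks trunking the VM VLANs to the core
    "vmotion":          1,   # dedicated VMotion link
    "service_console":  2,   # redundant Service Console NICs
    "ip_storage":       2,   # iSCSI/NFS if you use it; drop to 0 for pure FC
}

total = sum(nic_plan.values())
print(f"pNICs per host: {total}")   # 7 in this example -- spec the hosts accordingly
```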

Also make sure that you and your network guy have gone over this closely. If you are the "everything guy," double- and triple-check yourself. Make sure that you can account for all the bandwidth you need, plus growth and the inevitable traffic spikes.

Also make sure that you connect ESX into the core properly. If you are planning on EtherChannel, then make sure that you have IP hash set for NIC teaming on the vSwitch and the port channels properly configured on the switch.
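If it helps to picture why IP hash and the port channel go hand in hand, here's a simplified model of the idea (this is NOT VMware's exact hash, just an illustration): every source/destination IP pair consistently maps to one uplink, so the physical switch has to treat those uplinks as a single logical channel.

```python
# Simplified illustration of IP-hash NIC teaming (conceptual only; this is
# not VMware's actual hash algorithm). Each source/destination IP pair always
# lands on the same uplink, which is why the switch side must be a port channel.
import ipaddress

def pick_uplink(src_ip, dst_ip, uplink_count):
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

for dst in ("10.0.0.50", "10.0.0.51", "10.0.0.52"):
    print(dst, "-> vmnic", pick_uplink("192.168.1.10", dst, uplink_count=2))
```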

If you have an internal vSwitch-to-vSwitch implementation, please remember to account for everything on the inside of it so that you are not overwhelmed.

3) Server OS.

The next thing that I like to check is the guest OS mix that I am going to run. If this is a production environment and you are planning on doing P2Vs for most of your VMs, then this is not a big deal. I like to verify that ISOs for all the OS flavors I need are in a dedicated VMFS store that has been provisioned and performs well. This keeps anyone from wasting time hunting down OS media.


4) Policies.

Policies are more of a "who can screw up what" discussion with the admin team. Locking people who don't understand VMware OUT of the system might be a good idea. After all, how many times have you seen the "IT Manager" who thinks he understands all the technology log in and junk a VM because he didn't know what was going on? (I have. It happened more than once.) Don't lock management completely out; just don't give them the creds to "help" you. Always make sure that no ONE person has UBER power. Always have a check and balance.

5) VI Host Hardware.

Host hardware is one of the places where you can demonstrate strategic, ROI-based thinking: have the company spend just enough now, so that when the time comes to expand, all the quantities are known and you don't need to perform the ENTIRE design process all over again. Buying 4 huge 16-proc boxes might not be the best use of funds, but a number of dual-proc and a few quad-proc boxes give you a flexibility that cannot be underestimated. Being strategic here will demonstrate to your boss and those around you that your concern is the whole road map, not just a single point on a single solution. You gain credibility in a number of ways.

Some Bullet Points for Host Hardware:
  • Choose either all AMD or all Intel; don't mix and match.
  • Choose ONLY hardware on the HCL.
  • Forward thinking pays off here, so do some.
  • Blades or pizza boxes, not both. Mixing the two is just plain dumb. *cough* CoGR *cough* (In the end you get saddled with not being able to benefit 100% from either technology.)
  • Build around total "pools of resources" instead of worrying about individual box specs (see the sketch after this list).
  • Use EVC so that you can have forward mobility in your deployment.
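Here's the kind of "pool of resources" math I mean, as a rough sketch with made-up host specs; the point is to size against the pool minus your biggest host, not against any single box:

```python
# "Pool of resources" sketch with an N+1 check (host specs are made-up examples).
hosts = [
    {"name": "esx01", "cores": 8,  "ram_gb": 32},   # dual-socket quad-core
    {"name": "esx02", "cores": 8,  "ram_gb": 32},
    {"name": "esx03", "cores": 16, "ram_gb": 64},   # quad-socket box
]

total_cores = sum(h["cores"] for h in hosts)
total_ram   = sum(h["ram_gb"] for h in hosts)
biggest     = max(hosts, key=lambda h: h["cores"])

print(f"Pool: {total_cores} cores / {total_ram} GB RAM")
print(f"With the biggest host down: {total_cores - biggest['cores']} cores / "
      f"{total_ram - biggest['ram_gb']} GB RAM")   # size the VM load to fit this
```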


6) Goals for implementing VMware.

Make sure that these goals are documented.
Don't document a goal without documenting the metrics that apply to it.
Verify that you are on track with the deployment.
Provide actual numbers so management can see and contrast the differences.
Make sure that you have planned to exceed your targets. (I know it seems elementary, but it helps to have it in the back of your mind.)

**Do you have feedback or would you like an area of this article expanded? Please let me know.**

Wednesday, December 3, 2008

VI Too Easy? Too Easy To Screw-UP! YES.

So people have begun to see the value proposition in VMware and virtualization in general. But goodness, it is so easy to screw up the config and have it still perform! That speaks to the strength of the platform, though. So how do we as VCPs and IT professionals keep CIOs and CTOs from simply ordering the "underlings" to deploy this great new technology when they don't even have a clue? We provide "thought leadership." We can give them the concepts. Phil over at Joe the Consultant brings up a number of good points in one of his early blog posts, titled "Why, Whatever shall we do?". In these tough economic times it's hard to see the value in bringing in a consultant to "just install ESX." But that is the problem. CTOs and CIOs need to recognize that, as consultants, we bring a whole HOST of services: an understanding of both the technology and its management, so that the "cornerstone" of your computing infrastructure can be managed and maintained in a proper manner. Is it too costly to spend $3,000-5,000 in the early stages to build it properly, or four times that once it's running and the internal staff has hit a brick wall?

Front-loading time into design and understanding ALWAYS pays off during implementation.

So if you as a CIO or CTO are thinking about moving "full speed" into virtualization, then please consider bringing in a consultant and tapping into their experience before starting out. They can guide you in your needs for network connectivity, storage, and overall build practices. The money you spend up front will be easily recovered by NOT making common mistakes. The consultant can then work with you to determine a training path for your internal staff in order to maintain the vision. Build a road-mapped solution. Point solutions will only last for a short time and in the end will not provide either your internal clients OR your external clients the service they need.

Game Changing and IT Maturation.

So Phil over at http://joeconsultant.blogspot.com/ has been chiding me that this blog is more about short snippets than real content. I disagree, but lately he is right that I have been really busy. My early posts about "The Spark" were mainly about the lack of true fire in the industry.

Technology that is new ... First off, I'd say that NAND flash may not be "NEW," but its use as an SSD (solid-state disk) is a newer development. The throughput on this technology is pretty good. The IOPS you can get are about 10000 random read and about 600 write (SPECS). SLC has much faster writes, but MLC reads are still really good. I have an MLC SSD in my laptop and it's not bad at all; once the initial disk buffer gets full, you can really tell where MLC falls short of SLC technology. I don't have an SLC drive to test against, like one from http://www.rocketdisk.com/; the Mtrons are fast and expensive, but the throughput is excellent. The Intel X25-E and the new Samsung 256GB SSD are putting up good numbers as well. **In case anyone from Intel or Samsung or Rocketdisk is reading: toss me an SLC to test against and I'll get it done :)**

So what is so important about this, and why is it a game changer? Up until now, to get the kind of IOPS that 4 of these drives in RAID 10 can provide, you would be looking at about 2 or so trays of 15K FC-AL disks in an EMC array. With NAND flash there are no more worries about rotational latency. That's not bad, but of course these small 2.5" SATA drives are neither enterprise-class nor dual-connected (like FC-AL drives are). EMC has announced that NAND flash drives are coming to the new CX4 line and soon the NS120 and NS480. My EMC people tell me that NAND flash drives have to be bought by the tray; that's 15 at a time. Cost? A cool 150K, roughly! Goodness. This is a game changer because now you don't have to buy a huge SAN array to get smoking performance for enterprise SQL/Oracle installations anymore.

The performance of these drives also allows enterprise virtualization consultants to run some pretty nice VM teams in Workstation to simulate small-scale environments to demo to customers. No more sitting there with your laptop, watching that 7200 RPM SATA drive churn to run both Windows and the VMs while you demo to a client and try to win the sale. Many of the newer laptops can hold 2 2.5" SATA drives. The capacity on these types of drives isn't up to 500GB yet ... but it's getting there. Samsung is going to release its 256GB drive soon, so we are close enough to consider running 2 of these in a laptop in RAID 1. Since most VMs have a 75/25 read/write ratio, the IOPS are there to run your host and the VMs that you need. Having that responsiveness and the simulation ready might just be the thing that gets the sale.
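To put some napkin math behind that claim, here's a rough sketch using the ballpark numbers from this post (10,000 random-read and ~600 write IOPS per MLC SSD, 170 IOPS per 15K spindle, a 75/25 mix, RAID 10 on both sides). Treat it as an estimate, not a benchmark:

```python
import math

def raid10_frontend_iops(drives, read_iops, write_iops, read_ratio=0.75):
    """Max front-end IOPS a RAID 10 set can absorb at the given read/write mix
    (each front-end write costs two back-end writes)."""
    write_ratio = 1 - read_ratio
    read_cap  = drives * read_iops
    write_cap = drives * write_iops
    # The bottleneck is whichever side saturates first.
    return min(read_cap / read_ratio, write_cap / (write_ratio * 2))

ssd_frontend = raid10_frontend_iops(4, read_iops=10_000, write_iops=600)
# Back-end IOPS the 15K spindles would need for the same front-end load:
spindle_backend = ssd_frontend * 0.75 + ssd_frontend * 0.25 * 2
spindles = math.ceil(spindle_backend / 170)

print(f"4 MLC SSDs in RAID 10: ~{ssd_frontend:,.0f} front-end IOPS at 75/25")
print(f"Equivalent 15K spindles in RAID 10: ~{spindles}")   # a bit over two trays of 15
```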

Phil and Tom's little discussion on maturing IT is interesting. Phil points out HERE that such HOT technologies as virtualization have been around for years. I agree for servers and operating systems, but the next game changer I am going to talk about is the virtualization of the network switching layer in the data center.

Imagine that you have a vCloud-level virtualization practice and you need to have the following routing protocols in a virtual environment: EIGRP, OSPF, BGPv4, and other advanced switching and routing concepts, including unified I/O and FCoE. Is this truly new? It's debatable on some of the stuff; FCoE is indeed newer concept-wise. If you leverage the Nexus 1000V, which is going to be built into the ESX hypervisor, the possibilities are almost endless. The complexities of the internal infrastructure become limited ONLY by your imagination. You could have multiple complete routed infrastructures with complete separation and stateful fail-over. It's coming. Microsoft has been touting that 97% of the server market is still physical. If that is truly the case, properly implementing VMware ESX 4.0 and the Nexus 1000V will be of MAJOR importance. So if you are a virtualization consultant and you only know OS, VMware, and storage, you might be in need of up-skilling. True high-end virtualization consultants are going to need to understand more and more as this accelerates forward.

Spinning forward into 2009, we can expect to see all these services tied into the Virtual Datacenter OS. Augment those base services with the new VMware View, ThinApp, and the Lifecycle control set, and your data center becomes easily manageable and controlled. You have everything covered and just need to worry about your supply pools (storage, CPU, memory, and network) and how to manage your growth. At last, a data center that you can live with and whose SLAs you can meet with less effort.

You can then:

- Retire the employees who don't meet your needs, be that in vision and/or skill set.
- Focus on the TRUE nature of IT: the elimination of itself.
- Prove ROI by greening the data center and focusing on proven value.
- Prove value by responding to client requests, both internal and external, faster than ever before.

- All this and MORE.

Next time I promise I'll post more, PHIL :P.

Tuesday, December 2, 2008

Solid State Storage.

The changes that we need to make are huge.

How will this influence SAN/NAS and other technology?


STAY TUNED!