Compellent Storage Center 5: Best SAN for VMware VMFS

Disclaimer: I work as a Solutions Architect for Alliance Technologies, a Compellent partner servicing the Midwest.  I was invited to attend a private briefing for the launch of Fluid Storage with Compellent Storage Center 5.  You can find the new stuff at that announcement page, but I feel it’s more important to hammer the message of Why Compellent for VMware.  I also believe that auto-tiering storage at the block level rocks the free world for VMware deployments!

[Image: auto-tiered storage graphic]

Can your storage system auto-tier storage at the block level? The answer is likely no.  Most storage vendors allow you to move whole LUNs between different tiers of storage, but they lack the intelligence or technology to move individual blocks.  Why is this cool and desirable?  Well, the vast majority of VMware virtual machines are stored on SAN volumes formatted with VMFS, VMware’s virtual machine file system.  On this single SAN volume, there can be many virtual machines.  For server deployments, 10-20 VMs per VMFS volume is common, and for VDI deployments it could be up to 64 VMs per VMFS volume.  So the ongoing problem for all VMware administrators has been: how do you ensure each of these VMs has the performance it needs when some VMs are very active and others are not?  That’s a valid question for the less capable storage systems from most vendors.  But a better question is, “How can I provide tier 1 storage performance to only the blocks that need it?”  Only Compellent can provide a simple answer to that question with their automated tiered storage.

If you take a typical Windows VM, you have plenty of unused system files and stale data that is never accessed.  In the case of a Microsoft Exchange mail server, think of all the mail that is more than a day old.  It is rarely accessed, so until it can be permanently archived to cheaper tiers of storage, it makes great sense to let the Compellent Storage Center automatically move the rarely accessed data to a lower tier of primary storage, saving money on tier 1 storage and helping maximize performance.  Since Compellent has a unique Dynamic Block Architecture, it can “see” inside the vmdk, identify blocks that are rarely used, and move them to lower, cheaper tiers of storage without administrator intervention, scripting, or creating snapshots on VMs to facilitate Storage VMotion.
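
To make the block-level idea concrete, here is a toy sketch in shell. It is not how Storage Center actually decides placement; the input format (a block ID plus days since last access), the tier_blocks name, and the 1-day and 30-day thresholds are all made up for illustration. The only point is that blocks from the same volume can land on different tiers.

```shell
# Toy model only: assign each block to a tier based on how long it has
# been idle. Input lines look like "<block-id> <days-since-last-access>".
# The thresholds (1 day, 30 days) are hypothetical, not Compellent's.
tier_blocks() {
    awk '{
        block = $1
        idle_days = $2
        if (idle_days <= 1)       tier = 1   # hot: keep on the fastest disks
        else if (idle_days <= 30) tier = 2   # warm: middle tier
        else                      tier = 3   # cold: cheapest disks
        print block, "tier" tier
    }'
}
```

Feeding it three blocks from one imaginary vmdk, for example with printf 'blk0 0\nblk1 7\nblk2 90\n' | tier_blocks, puts each block on a different tier even though they all live in the same volume.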

[Image: Storage Center block architecture]

One blog post can’t really do Compellent technology justice.  I’d invite you to view the announcement of the new features of their fifth version of Compellent Storage Center here.   Many others within the storage blogosphere are linked to at the bottom of that post as well.

And as always, please feel free to comment on this post by hitting me on Twitter, with @vSeanClark.   I’ve been working with Compellent in VMware deployments for over 2 years now and have plenty of Compellent/VMware goodness to share with you.

Simplify Data Protection and Disaster Recovery with vSeanClark

So the “day” job at Alliance Technologies has me prepping for a Tuesday afternoon presentation on a Sunday afternoon. Yes, who needs football, beers, naps and nachos, let’s write a presentation. :( Actually, this should be fairly short and will end up getting me back to the beer and football after a few short ideas get “etched” on various spinning rust-colored platters.  If you are interested in attending the presentation in person, you can learn more and sign up here.

Okay, this presentation is for a Dell co-sponsored event with Alliance Technologies.  So we’ll likely be discussing Dell Equallogic in depth, but I’ve been a HUGE fan of Equallogic even before going to work for a Dell partner, so don’t think for a second I’m selling out.  :)   Anyway, my  main topic is “Simplify Data Protection and Disaster Recovery Solutions”.  What are my 3 main keys to achieve simple?  Well that’s simple in my mind:

  1. Virtualize your servers!
  2. Virtualize your desktops!
  3. Virtualize your data w/ virtualized iSCSI SAN from Dell Equallogic!

Yes, it’s dangerously simple to spout out those 3 bullet points but the devil is in the details.  I will flog and beat the devil to death in the next three posts and let you know how I arrive at simple, successful data protection and disaster recovery solutions that are repeatable, predictable and simple.

vWorkspace AntiVirus Best Practice

Well, www.vWorkspace.com is down at the time of this writing.  So I’m scouring the intertubes for some quick guidance.  Here’s what I’m finding.

Guidance from Citrix.  Yes, it’s not vWorkspace, but the fundamentals should be similar:  http://support.citrix.com/article/CTX114522 – The Cliff Notes of this document are:

  • Exclude Spool directory, page file, and ../program files/Citrix  (Not sure if the latter is required with Quest vWorkspace)
  • Scan write events only
  • Scan local drives only

Hmm, that’s nice.  Still no vWorkspace.com access. On to the next piece.

After installing Trend Micro ServerProtect presentation becomes unresponsive: http://support.citrix.com/article/CTX114137 .  That sounds like us.  Better give Trend Micro a holler on the telephone.

Still no vWorkspace.com, so checking on a good topic on BrianMadden.com that seems to fit the bill perfectly.  Best anti-virus for Terminal Services -  The dude posting the question references Patrick’s recommendation here: http://www.sessioncomputing.com/anti-virus.htm but that is down too!   Is there some kind of DDoS on vWorkspace sites today?  Actually, looks like that might just be an old site.

I guess, I’ll look this up the old-fashioned way.  Email to our SE at Quest.  Will update post with findings and best practice for A/V in vWorkspace Terminal Server environment soon.  :)

UPDATE: Customer is turning off auto-protect.  Kind of feel like if we’re going to do that, why don’t we just focus AV protection on the gateway and continue to rebuild fully patched, clean Terminal Servers with visionapp Server Management on a more regular basis, with no server-based AV mucking up performance?  Thoughts?

Notes from Get Your Head in the Cloud Conference

Just got done with the morning session of the Get Your Head in the Cloud Conference here on the Iowa State University campus. Jeff Barr of Amazon presented, as did Dennis Quan of IBM. This was a good intro to cloud computing and a good overview of where Amazon and IBM see cloud computing heading (or where they want to steer it).   But it was a blatant ripoff of the VMunderground t-shirt slogan: [Image: cloud bumper sticker]

Jeff Barr – Amazon

I got in a bit late, so I missed Jeff’s first half.  When I sat down he was covering cloud use cases for AWS.  Intuit uses it for load testing their apps to ensure bugs are caught before the tax filing crunch on April 14th.  Seasonal or as-needed use of cloud is a classic use case.  Then a slide about other Amazon cloud services came up: storage, VMs, web platform... but my favorite was on the far right:  Mechanical Turk!!  I had no idea mechanical turks existed.  I think I’ve found my new buzzword.  Anyway, Amazon’s Mechanical Turk is a way to get human-powered work done via the cloud, or to find work to do via the cloud.  I just did a quick search and the highest pay rate was something like $2.07 an hour.   So I won’t be getting rich being a Turk, but I might be able to find some cheap labor to knock out turk-like duties.

He overviewed other AWS advantages: offloading the heavy lifting of building out infrastructure for your app, lower costs (capital only; operating costs will still bite you in the butt, but it is all a trade-off), and reduced time to market for your app (spin up resources when you need them).  The focus here was squarely on developers choosing a platform for their apps.  This is where I put together a tweet on the battle for platform dominance: “Academia is the battleground for influencing cloud platform bias of next generation of app developers”

Jeff wrapped up by mentioning that cloud services like their S3 start at the home/individual level, are then introduced to co-workers, then to departments, and ultimately go company-wide.

Dennis Quan – IBM

Dennis was up next with the IBM perspective of cloud and what services they are offering.  Dennis started with the obligatory definition of what cloud services are:

  1. Self Service
  2. Delivered over the network
  3. Elastic scalability (grow as big as you need, pay as you go..)

Probably one of the most concise, general ways to define cloud services.  The next slide was a good reminder for the x86 virtualization zealots in the house (not pointing any fingers): he showed the layers of cloud services, from apps, to platforms (Apache, .NET, MySQL..), to OS (Windows, Linux, Unix..), to hardware (x86, mainframe)…….  WHAT?!? Mainframe??  Yes, Sean and other x86 pizza-box-server-loving zealots, you can do cloud with good ole mainframes as well.  You won’t hear that from VMware, but mainframes are a perfectly valid hardware platform to build an app on top of that can be: 1. self-service, 2. network delivered, and 3. elastically scalable.  This feeds another thought on cloud that I will drive home in some future posts: the platform is where the battle for cloud will be fought.  You have to attract developers to write apps that people want for your platform; otherwise you will lose market share.

IBM has run a private cloud infrastructure to service their 400K employee base since 2006.  It’s called the Innovation Portal and speeds products to market and allows greater efficiency in new product development and testing.

Dennis mentioned a joint effort between IBM, Google, and the NSF to provide academia access to a massive compute cloud based on Google’s MapReduce computing techniques for large data compute processes.  This runs on Apache Hadoop, and users can leverage Java or a scripting language of their choice to accomplish their large processing tasks.

Dennis mentioned that everyone loves cloud benefits and efficiencies but can’t/won’t release their arms from around their datacenter.  VMware used to call these folks server-huggers; I’m going to coin a new term (I hope): cloud huggers.  The private cloud is here to stay, although hybrid cloud use will continue to grow and augment private cloud setups.

Q + A:

Dev-ministrator: A developer who has to learn to administer cloud services as part of the daily job in order to properly design and debug apps that work and scale in the cloud.

I think we should all be dev-ministrators or enginevelopers or architadmins….  Well-roundedness in IT is a huge skill and increasingly needed as we swing back hard and fast to the centralized computing of the cloud.  If we aren’t all the renaissance IT man, then we MUST be able to communicate properly across disciplines, departments and organizations.

How do I back up and protect my data in the cloud?

Great question.  The $25K question.  How do I secure my information assets in the cloud?  See Christopher Hoff to follow the ongoing quest for this holy grail.  But the gentleman who asked this question pared down the scope a bit.  He stores his pics and core data on Amazon’s S3 storage service.  But how should he best protect this data?  Buy a European S3 account? Back up to local hard drives?  Get service from another cloud provider?  What about use of Twitter, Facebook, bit.ly and services like them?  What happens if they go belly up?  How do you get the data back?

There is no pat answer to that.  The most powerful take-away here is the questions themselves.  Individuals and businesses need to consider these questions when choosing to rely on cloud services.  Dennis Quan brought up a good analogy.  When households first bought lightbulbs and signed up for electricity service from the utility, they still chose to keep some candles and matches around the house.  With cloud, common sense and basic risk management will determine how you protect cloud assets, or whether you go to the cloud at all.  If you are the next Twitter, you’ll probably play fast and loose with the cloud.  If you are a global financial firm worth $100 billion, you’ll probably consume clouds of clouds AND a local private cloud backup.

That’s my quick brain dump from a cafe in Ames, Iowa.  Looks like I’m not getting any magical invites via email or twitter to attend the private cloud sessions this afternoon, so I’ll head back to work.  Give me a holler if you have any questions.  Comments welcome as are tweets to vSeanClark.  :)

Killing a hung VM with /proc-FU

Finished the week with an interesting support call.  To make a long story short, the customer ended up with a non-responsive VM.  We tried to open the console on the VM but got the following error, more or less (not the exact path, but you get the idea):

Error connecting: Error connecting to /vmfs/volumes/47a23275-63d1cb52-6968-0019b9e5c637/vCenter/vCenter.vmx because the VMX is not started.

Other VMs opened perfectly fine on this same ESX server; just this one VM was hosed. Couldn’t ping its IP either.  Tried restarting mgmt-vmware from the service console, and that removed the VM name from the ESX server’s inventory the next time we logged in.  Just some weird placeholder VM instead, which I ended up removing from inventory.  Next tried to re-add the vCenter VM to inventory by browsing to the datastore.  No luck, this process hung.  So restarted mgmt-vmware again.  And this time decided to look at esxtop to see if this VM was still running or something.  And it was... or at least something was running with its name.  So now I set out to restart it with vmware-cmd.

Ran vmware-cmd from the service console and vCenter did not show up as a running VM.  Weird! It’s in esxtop but not in vmware-cmd -l.  So now I need to try to find the process for this hung, possessed VM and kill it. So I tried the following:

ps -ef | grep vCenter, which I found on Google here:  http://www.esxguide.com/esx/content/view/11/14/ .  Cool, but this kept returning an ever-increasing and changing PID.  Not cool.

Then I discovered this gem from the VMware Communities.  First attack: a new ps argument from page 8 of the PDF.

ps axu | grep vCenter -> Still a fail.  PID kept incrementing every time I ran the command.  Is something relaunching, or what?  Must find root process but how?
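
One possible explanation for the ever-changing PID, offered as an assumption rather than something confirmed on that host: the only match was the grep process itself, which gets a fresh PID on every run. A common workaround is to wrap the first character of the pattern in brackets so the pattern no longer matches its own command line. A minimal sketch (filter_vm_procs is a hypothetical helper name, and it assumes the VM name does not start with a regex-special character):

```shell
# Hypothetical helper: filter `ps` output for a VM name without
# matching the grep process that is doing the filtering.
filter_vm_procs() {
    vm_name="$1"
    first=$(printf '%s' "$vm_name" | cut -c1)   # e.g. "v"
    rest=$(printf '%s' "$vm_name" | cut -c2-)   # e.g. "Center"
    # The regex [v]Center still matches "vCenter", but grep's own
    # command line now contains "[v]Center", which the regex does not match.
    grep "[$first]$rest"
}

# Intended use on the host (not run here):
#   ps auxwww | filter_vm_procs vCenter
```

If this prints nothing at all, then nothing visible to ps carries the VM name, which would fit with having to go through /proc/vmware instead.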

Next up, /proc-FU.  On page 9, michaelstan of the Communities suggests the following, which I followed verbatim, looking for my VM named “vCenter”:

(at the cmd prompt enter) cat /proc/vmware/vm/*/names

This lists the running VM’s on the host server you are logged on to.

vmid=1069 pid=-1 cfgFile="/vmfs/volumes/45…/server1/server1.vmx" uuid="50…" displayName="server1"

vmid=1107 pid=-1 cfgFile="/vmfs/volumes/45…/server2/server2.vmx" uuid="50…" displayName="server2"

vmid=1149 pid=-1 cfgFile="/vmfs/volumes/45…/server3/server3.vmx" uuid="50…" displayName="server3"

vmid=1156 pid=-1 cfgFile="/vmfs/volumes/45…/server4/server4.vmx" uuid="50…" displayName="server4"

vmid=1170 pid=-1 cfgFile="/vmfs/volumes/45…/server5/server5.vmx" uuid="50…" displayName="server5"

vmid=1178 pid=-1 cfgFile="/vmfs/volumes/45…/server6/server6.vmx" uuid="50…" displayName="server6"

vmid=1188 pid=-1 cfgFile="/vmfs/volumes/45…/server7/server7.vmx" uuid="50…" displayName="server7"

vmid=1198 pid=-1 cfgFile="/vmfs/volumes/45…/server8/server8.vmx" uuid="50…" displayName="server8"

[-If you are running ESX 2.5 then you can kill the vmx PID-]

If you are running ESX 3.0.x then you find group ID that controls the PID of the VM.

(at the cmd prompt enter) less -S /proc/vmware/vm/1149/cpu/status

vcpu vm type name uptime status costatus usedsec syssec wait waitsec idlesec (more…)

1149 1149 V vmm0:server3 350042.494 WAIT STOP 15968.954 518.916 COW 325800.734 322397.266 (more…)

Scroll right with the right arrow key to locate the “group” pid. In this case the group pid was 1148 (not shown in this example).

Now with the group PID you can kill the VM safely without corrupting the VM as posted earlier.

(at the cmd prompt enter) /usr/lib/vmware/bin/vmkload_app -k 9 1148

Warning: Apr 20 16:22:22.710: Sending signal '9' to world 1148.

THIS MEANS SUCCESS… if you receive another line then the process might not have been successful.

Hope this helps!

Michael Stan

In short, I did the following:

  1. cat /proc/vmware/vm/*/names
  2. “less -S /proc/vmware/vm/1149/cpu/status”, where 1149 was the vmid of the VM in question (found in step 1), then hit the right arrow until I found the “group” pid.
  3. “/usr/lib/vmware/bin/vmkload_app -k 9 1148”, where 1148 was my group pid found in step 2.
  4. Received the following “success” message: “Warning: Apr 20 16:22:22.710: Sending signal '9' to world 1148.” and ran esxtop to verify the VM was no longer running, which it was not.
  5. Re-added VM to inventory and all is well.
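
Step 1 can be narrowed to just the VM you care about. A minimal sketch, assuming the names output uses plain ASCII double quotes and the vmid=... displayName="..." layout shown in the forum post; vmid_for_name is a hypothetical helper, and it only looks up the vmid, it does not kill anything:

```shell
# Hypothetical helper: read lines in the /proc/vmware/vm/*/names format
# on stdin and print the vmid of the VM whose displayName matches $1.
vmid_for_name() {
    # Build the exact token to search for, e.g. displayName="vCenter"
    awk -v want="displayName=\"$1\"" '
        index($0, want) > 0 {
            # The vmid is carried in a field of the form vmid=<number>
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^vmid=/) {
                    print substr($i, 6)   # drop the leading "vmid="
                }
            }
        }'
}

# Intended use on the host (not run here):
#   cat /proc/vmware/vm/*/names | vmid_for_name vCenter
```

The vmid it prints is the number to plug into the path in step 2. Finding the “group” pid and sending the kill are left as manual steps on purpose, since a wrong world ID there takes down the wrong VM.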

Haven’t had a hung VM since the ESX 2.5 days, so it was a fun little challenge to finish out my Friday afternoon.  But thought I would quickly share for the benefit of all.

Ping me on Twitter if you have questions.  vSeanClark

UPDATE: Jason Boche suggested I could have arrived at the PID w/ ps -auxwww | grep VM-Name.   Well that would have been quite a bit simpler but wouldn’t have given me an opportunity to say /proc-FU again.   :)
