
The Problem with Thin-provisioning


Larry Loucks


www.virtualizationuniversity.com


 


There is much hype in the virtualization industry right now regarding software-based thin-provisioning of virtual disks.  This feature of VMware’s latest release, vSphere, is being talked about as if it were new, cutting-edge technology.  In reality, thin-provisioning has been around almost as long as virtualization technology and isn’t new at all.  Thin-provisioning also carries with it substantial risks that can cause VM crashes and even corruption of data in certain circumstances.  Making admins aware of these risks is the purpose of this article.


During my time with VMware as a Senior Consultant, neither I nor the other Senior Consultants I worked with would recommend that a customer thin-provision any production system, because software-based thin-provisioning can be a very risky approach to disk-space management.  A few years ago I personally worked with a customer using a thin-provisioning solution who accidentally filled a LUN, crashed a number of VMs and corrupted thousands of files.  It took them a week to get back up and running.


Before we continue, let’s establish some key terms and definitions to make sure we’re all on the same page.



Virtual Disk – A file that houses information for a virtual machine.  The guest OS inside of the VM sees this file as a drive.


.vmdk (virtual machine disk) – the file type used by VMware’s ESX server for virtual disks



Thick-provisioning – Thick-provisioned virtual disks are virtual disks for which all of the assigned disk space is allocated immediately as soon as the virtual disk is created.  For example, if we build a VM and assign 100GB to the virtual disk, all 100GB of that space is allocated up front, even though there may be a much smaller amount of data actually residing in the virtual disk.  As the amount of data associated with this VM grows and is saved in the virtual disk, the size of the .vmdk stays the same and data is saved into the virtual disk.  The guest OS sees the size of the virtual disk as the total drive size.  So a 100GB thick-provisioned virtual disk looks like a 100GB HD to the guest OS.


Thin-provisioning – Thin-provisioned virtual disks are virtual disks for which the assigned amount of disk space is NOT allocated at creation time.  Only enough disk space is allocated to accommodate existing data plus a small amount of additional “buffer” space.  As more data is saved into the .vmdk, ESX server expands the .vmdk in increments of 16MB, up to the maximum amount assigned to the virtual disk.  The guest OS in the VM sees the maximum amount assigned to the virtual disk as the total drive size and is unaware that the space isn’t really allocated.  So if a VM has 20GB of actual data and a 100GB thin-provisioned virtual disk, the guest OS sees a 100GB drive with 20GB of used space.  The guest OS is unaware that only slightly more than the 20GB of space needed for data has been allocated.
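
To make the difference concrete, here is a minimal Python sketch of how much LUN space each disk type consumes.  It assumes the 16MB growth increment quoted above and ignores the small extra “buffer”; the function names are made up for illustration and are not part of any VMware tool.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def thick_allocated(assigned_bytes: int) -> int:
    """A thick-provisioned disk consumes its full assigned size from day one."""
    return assigned_bytes


def thin_allocated(assigned_bytes: int, data_bytes: int,
                   increment_bytes: int = 16 * MB) -> int:
    """Space consumed on the LUN by a thin-provisioned disk holding data_bytes.

    Assumption: the disk grows in whole increments (16MB, the figure quoted
    in the article) and never grows beyond its assigned size.
    """
    if not 0 <= data_bytes <= assigned_bytes:
        raise ValueError("data must fit inside the assigned disk size")
    increments = -(-data_bytes // increment_bytes)  # ceiling division
    return min(increments * increment_bytes, assigned_bytes)


if __name__ == "__main__":
    assigned, data = 100 * GB, 20 * GB
    print("guest OS sees: ", assigned // GB, "GB")
    print("thick consumes:", thick_allocated(assigned) // GB, "GB")
    print("thin consumes: ", thin_allocated(assigned, data) // GB, "GB")
```

In both cases the guest OS is shown the same 100GB drive; only the space taken from the LUN differs.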



So, with these terms established, why is there so much hype in the industry regarding thin-provisioning?  Why now?  Is there a better solution?



Wasted Disk Space is the Problem


One of the issues we have in a virtual environment is that of wasted disk space.  The reality is until recently the most commonly used method to handle disk space allocation to virtual machines was to grossly over-allocate the space.  Let’s see how this works.


In any server environment we can roughly divide all of the VMs into one of two types of systems:



  1. Type 1 VM – VMs with which the users do NOT directly interact.  The users do NOT save data to these systems.  These tend to be fairly static except for the occasional service pack and expansion of log files.  They usually require relatively small amounts of disk space.  Many utility servers in the IT department such as DNS, DHCP and Active Directory Domain Controllers would be examples of these types of systems. 


  2. Type 2 VM – VMs with which the users DO directly interact.  The users DO save data to these servers.  They tend to be more dynamic, and their growth rate is less predictable.  These servers tend to use larger amounts of disk space and are usually more critical.  Examples include file servers, mail servers, document servers and database servers.

Now let’s take one of each of these types of VMs and discuss how disk space is typically allocated. 



Example 1 (Type 1 VM – No direct user saving of data)


Imagine we have a DHCP VM.  The only thing that will ever run in this VM is Windows with the DHCP service enabled.  The VM probably needs around 6GB of disk space.  Most IT shops would give the VM at least 15-20GB for drive C: even though they know they’ll never use that much space.  Why?  Because ultimately the assignment of disk space to a VM is a best guess on our part as administrators.  We have to guess some amount of disk space that we believe will handle all of the patches, service packs, security updates, log files, etc. for the foreseeable future.  Now, one might say, “Well, the VM is only over-allocated by 14GB; that’s not that big a deal.”  True, but notice that we have allocated about 3X the amount of disk space the VM actually needs.



Example 2 (Type 2 VM – Users save data directly to this type of VM)


Imagine we have a non-virtualized Exchange server that contains 25GB of data.  We want to virtualize this environment and the question comes up: “How much disk space should we assign the virtual disk for this new Exchange VM?”  The reality is we have no idea how much disk space we will need.  We have no idea how many service packs/security updates will come out for Exchange and Windows during the next couple of years.  We have no idea how many emails and attachments this server will receive.  True, we may be able to make a vaguely educated guess if we are really familiar with the environment and its typical growth/usage patterns, but even this is a guess.  So we have to guess some amount of disk space to assign to the VM.  The only thing that is certain is that, if we’re smart about this, we’re not going to guess some conservative amount of disk space that will create a near-term problem for the VM.  In other words, I wouldn’t take a 25GB Exchange server, virtualize it and give it a 30GB virtual disk, only to have it run out of disk space in four weeks and create a big fire drill in the IT Department as we scramble to expand the virtual disk, expand the partitions, get it all back up and running and explain to management how we got here in the first place.



So in an effort to avoid such a scenario, most admins will assign some grossly over-allocated amount of disk space that they think will keep this VM running problem-free for the foreseeable future, probably at least a couple of years.  How do I know this?  Well, I know what I would do.  Also, of my 24 years in the IT industry, I’ve spent the past 6 working in the virtualization arena, including contracting for VMware as an instructor and serving on staff with VMware as one of their Senior Consultants.  I speak with thousands of virtualized customers every year in many different settings, and I would say around 90% of them are running thick-provisioned virtual disks that are over-allocated to accommodate future growth.  Also, up until recently, there really hasn’t been a better way to deal with this.  Perhaps in this example we would assign a 75GB-100GB virtual disk to this VM to accommodate growth.  Notice once again we end up over-allocating by about 3X-4X.



What we end up with is that in many virtual environments 50-80% of allocated disk space is unused disk space.  This means our SANs fill up more quickly therefore shortening the useful life of those devices and/or forcing IT Departments to purchase more disk space.  Ultimately this waste leads to higher costs and a lower ROI for our investment in virtualization.
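
The arithmetic behind those figures is easy to check.  The short Python sketch below takes the two examples above (6GB needed vs. 20GB assigned, 25GB needed vs. 100GB assigned) and computes the over-allocation ratio and the share of allocated space that sits unused; the function name is just an illustration.

```python
def over_allocation(needed_gb: float, allocated_gb: float) -> tuple[float, float]:
    """Return (allocation ratio, fraction of the allocated space left unused)."""
    if needed_gb <= 0 or allocated_gb < needed_gb:
        raise ValueError("expected 0 < needed_gb <= allocated_gb")
    ratio = allocated_gb / needed_gb
    unused_fraction = (allocated_gb - needed_gb) / allocated_gb
    return ratio, unused_fraction


if __name__ == "__main__":
    # Numbers taken from Example 1 and Example 2 in this article.
    examples = {
        "DHCP VM (6GB needed, 20GB assigned)": (6, 20),
        "Exchange VM (25GB needed, 100GB assigned)": (25, 100),
    }
    for name, (needed, allocated) in examples.items():
        ratio, unused = over_allocation(needed, allocated)
        print(f"{name}: {ratio:.1f}X allocated, {unused:.0%} unused")
```

Both examples land inside the 50-80% unused range mentioned above.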




Introducing Thin-provisioning


Thin-provisioning is being touted as a way to avoid this waste of disk space, potentially lengthening the useful life of our storage and reducing storage-associated costs.  Recall from our terms that in a thin-provisioned environment only slightly more than the used disk space is actually allocated.  The .vmdk files can grow dynamically as needed and indeed, going with the industry statistic that about 50-80% of allocated disk space is unused disk space, one can definitely free up large percentages of storage space by implementing thin-provisioning.  So what’s the catch?




The Main Problem with Thin-provisioning


When a thick-provisioned virtual disk fills and someone tries to save data exceeding the free space, the error-checking mechanisms in the operating system catch this and return an error (e.g. “insufficient disk space.  Please move or delete files….”).  When disks are thin-provisioned we effectively “lie” to the guest OS by implying it has disk space that isn’t actually allocated.  If the LUN unexpectedly fills and there are active, expanding .vmdks, EVERYTHING may come to a screeching halt in these VMs.



Because the guest OSes have been “lied to,” their error-checking mechanisms do not catch the fact that the drive is full.  The guest OS thinks it has room left on its “drive” but there is no space physically available for writing data.



All of the expanding, active VMs on that LUN have nowhere to go but down.  There is no space to write a memory dump file, a log file, data or anything else.  The VMs may instantly lock up.  Whatever data was partially saved is just that, partially saved.  This can definitely corrupt data.  If a LUN accidentally fills in a thin-provisioned environment, VMs will almost assuredly crash.  One additional inconvenience is that there is a performance hit for using thin-provisioning, as the server continuously allocates space on demand.  Admittedly this hit is fairly insignificant, but it does exist.
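
The failure mode described above can be sketched as a toy model in Python.  This is only an illustration of the bookkeeping, not how ESX is implemented: the class and exception names are invented, sizes are whole GB, and a “write” is treated as a single step.

```python
class GuestDiskFull(Exception):
    """The guest OS sees its drive is full and returns a clean error."""


class LunOutOfSpace(Exception):
    """The LUN cannot back a write the guest OS believed was safe."""


class Lun:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocated_gb = 0

    def free_gb(self) -> int:
        return self.capacity_gb - self.allocated_gb

    def allocate(self, gb: int) -> None:
        if gb > self.free_gb():
            raise LunOutOfSpace(f"need {gb}GB, LUN has {self.free_gb()}GB free")
        self.allocated_gb += gb


class VirtualDisk:
    def __init__(self, lun: Lun, assigned_gb: int, thin: bool):
        self.lun, self.assigned_gb, self.thin = lun, assigned_gb, thin
        self.used_gb = 0
        if not thin:
            lun.allocate(assigned_gb)  # thick: claim everything up front

    def write(self, gb: int) -> None:
        # The guest OS only checks against the size it was told it has.
        if self.used_gb + gb > self.assigned_gb:
            raise GuestDiskFull("insufficient disk space")
        if self.thin:
            self.lun.allocate(gb)  # thin: grow on demand, may fail here
        self.used_gb += gb


if __name__ == "__main__":
    lun = Lun(capacity_gb=100)
    vm_a = VirtualDisk(lun, assigned_gb=100, thin=True)
    vm_b = VirtualDisk(lun, assigned_gb=100, thin=True)
    vm_a.write(60)
    vm_b.write(30)
    try:
        vm_b.write(20)  # guest sees 70GB free, LUN has only 10GB
    except LunOutOfSpace as err:
        print("thin disk write failed underneath the guest:", err)
```

With a thick disk the same over-sized write is stopped by the guest’s own check (`GuestDiskFull`) before it ever reaches the LUN.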



How Can a LUN Fill Accidentally?


First of all, many customers are unaware of other things that use LUN space in a virtual environment.  Every time a snapshot is taken, additional unplanned disk space is used for delta files.  The first time each VM in the environment is booted, a VMkernel swap file is created, sized according to the amount of RAM in the VM.  This means that if I have a LUN with 20 VMs at an average of 2GB of RAM each, around 40GB of unplanned disk space is consumed by those swap files.  Also, one of the characteristics of a virtual environment is its potentially dynamic nature.  Things move around.  Cold migrations, new VMs and SVMotion are all great opportunities to fill a LUN you weren’t planning on filling.  My personal favorite: twice I have seen an ESX LUN fill because an admin somewhere in the building copied a bunch of ISO files to the LUN without checking whether the space actually existed.  FUN! FUN!  Just guess how eager that guy was to step forward and say “I did it…my bad.”
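
As a rough illustration, the unplanned consumers listed above can be added up in a few lines of Python.  The function and parameter names are hypothetical, the swap file is assumed to equal each VM’s RAM as described above, and the snapshot and ISO figures in the demo are made-up inputs.

```python
def unplanned_lun_usage_gb(vm_ram_gb: list[float],
                           snapshot_delta_gb: float = 0.0,
                           stray_files_gb: float = 0.0) -> float:
    """Estimate LUN space consumed by things other than virtual disks.

    Assumption (from the article): one VMkernel swap file per VM,
    sized to match the RAM assigned to that VM.
    """
    swap_gb = sum(vm_ram_gb)
    return swap_gb + snapshot_delta_gb + stray_files_gb


if __name__ == "__main__":
    twenty_vms = [2.0] * 20  # 20 VMs at 2GB of RAM each
    print("swap files alone:", unplanned_lun_usage_gb(twenty_vms), "GB")
    print("with snapshots and ISOs:",
          unplanned_lun_usage_gb(twenty_vms, snapshot_delta_gb=15,
                                 stray_files_gb=30), "GB")
```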




Now some admins might say “Well, I’ll keep an eye on it.”  Really?  In a completely dynamic virtual environment where people and processes are invoking snapshots, moving things around and provisioning new VMs?  And let’s not forget Johnny with the ISO files.  “Keeping an eye on it” is a tall promise to keep for several years of operation in this dynamic, changing, growing environment and a stress/uncertainty I personally wouldn’t want. 



Keep in mind also that a LUN with plenty of space this morning may be out of space come lunch time.  As one example, imagine we have a LUN hosting thin-provisioned virtual disks.  At 8am this morning 50GB was available.  At 10:30am an admin cold-migrated a 45GB VM to this LUN, leaving only 5GB for every expanding disk on it.  Two hours later the LUN filled and crashed 15 VMs.  Simply creating a VM on the LUN could produce the same effect.


Last but not least, some might add “Well, I’ll set up alarms in vSphere that will let me know when critical thresholds of overcommitment occur.”  So what happens when you get down the road a few years, your SAN is near full capacity with virtual disks, it has more data on it than expected, you’re almost out of space and the VMs are thin-provisioned?  At least with thick-provisioned disks you can stop the growth while you plot a solution, and in the worst-case scenario, if a VM fills, it just fills…no catastrophic crash across all VMs on the LUN and no corruption of data.  The error-checking mechanisms in the guest operating systems simply catch the fact that their drives have no more space.  If that environment were thin-provisioned, the potential for major data corruption definitely exists once the LUNs fill, because invariably we will end up with partially-saved (read “corrupted”) data.
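
For reference, the kind of check such an alarm performs can be sketched in Python as below.  This is not the vSphere alarm API; the function name and the 75%/90% thresholds are assumptions chosen for illustration.  Note that the check only reports the state of the LUN; it does nothing to create more space.

```python
def lun_status(capacity_gb: float, allocated_gb: float, provisioned_gb: float,
               warn_at: float = 0.75, critical_at: float = 0.90) -> dict:
    """Summarize how full and how overcommitted a LUN is.

    allocated_gb   = space the thin disks have actually consumed so far
    provisioned_gb = total size promised to the guest operating systems
    Thresholds are hypothetical defaults, not vendor recommendations.
    """
    used_fraction = allocated_gb / capacity_gb
    overcommit_ratio = provisioned_gb / capacity_gb
    if used_fraction >= critical_at:
        level = "critical"
    elif used_fraction >= warn_at:
        level = "warning"
    else:
        level = "ok"
    return {
        "used_fraction": used_fraction,
        "overcommit_ratio": overcommit_ratio,
        "level": level,
    }


if __name__ == "__main__":
    # 500GB LUN, 450GB consumed, 1200GB promised to guests.
    print(lun_status(capacity_gb=500, allocated_gb=450, provisioned_gb=1200))
```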



Check out this article entitled “Don’t Let Thin-provisioning Gotchas Getcha”


http://www.networkworld.com/supp/2009/ndc1/012609-thin-provisioning.html



A quote from that article:


“You've got to take the critical step of setting threshold alerts within your thin-provisioning tools because you're allowing applications to share resources…you can max out your storage space, and that can lead to application shutdowns and lost productivity because users can't access their data…You can get pretty close to your boundary, fast, and that can lead to panicked calls asking your vendor to rush you a bunch of disks.”



Is There a Better Way?


Yes.  Vizioncore, one of the leading software companies in the virtualization sector of the IT industry, has a product called vOptimizer Pro that tackles the problem of wasted disk space in virtual environments without exposing your company to the performance hit and, more importantly, the risks associated with thin-provisioning.  Vizioncore unveiled this product at VMworld 2008 and it was a finalist for Best New Technology.  I am unaware of a competing product in the industry.



vOptimizer Pro allows admins to create free-disk-space quotas using either fixed values (e.g. 20GB, 50GB, 1000GB) or percentage-based values (e.g. 20%, 50%, 75%) and assign them to VMs.  The software then allows you to use a calendar scheduling system to tell it when to run the optimizations, typically either after hours or during regular network maintenance windows.  vOptimizer will run at the time prescribed by admins and, where VMs need to be right-sized, it will shut down the VM, increase or decrease the size of the .vmdk as needed to comply with the free-space quotas established by the administrator, and adjust the size of the internal partition to match the resized .vmdk.  As an icing-on-the-cake feature, it will also align it all with the 64k block boundary of the underlying physical disk, enhancing performance.
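
To show what a free-space quota means in numbers, here is a small Python sketch of the right-sizing arithmetic.  It is not vOptimizer’s code or API; the function name is hypothetical, and it assumes a percentage quota is measured against the total size of the resized disk.

```python
from typing import Optional


def target_disk_size_gb(used_gb: float,
                        free_gb: Optional[float] = None,
                        free_fraction: Optional[float] = None) -> float:
    """Disk size that leaves the requested free space on top of used_gb.

    Pass exactly one quota: a fixed amount (free_gb) or a share of the
    resized disk that should remain free (free_fraction, e.g. 0.25 for 25%).
    """
    if (free_gb is None) == (free_fraction is None):
        raise ValueError("specify exactly one of free_gb or free_fraction")
    if free_gb is not None:
        return used_gb + free_gb
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in the range [0, 1)")
    # used_gb must be the remaining (1 - free_fraction) share of the disk.
    return used_gb / (1 - free_fraction)


if __name__ == "__main__":
    # A 100GB thick disk holding 25GB of data, as in the Exchange example.
    print("fixed 20GB quota ->", target_disk_size_gb(25, free_gb=20), "GB")
    print("50% free quota   ->", target_disk_size_gb(25, free_fraction=0.5), "GB")
```

Under either quota the disk stays thick-provisioned; it is simply resized closer to what the VM actually uses.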



I’m aware of a large media company in the SE region of the US that was able to free up 60TB of disk space by using this tool and was able to obtain an ELA (enterprise license agreement) for vOptimizer Pro for 1/10th of the cost of a new SAN in 1 of their 5 data centers.



vOptimizer Pro also has a nice pre-scanning application called vOptimizer Waste Finder that you can use to scan your environment to determine how much disk space can be freed through right-sizing.  This is free from Vizioncore’s website www.vizioncore.com.


vOptimizer Pro allows you to automate disk space management for your VMs and eliminates the need to grossly over-allocate disk space for growth, without the dangers of thin-provisioning.  Also, vOptimizer can increase and decrease the size of your virtual disks as demanded by the environment (thin-provisioned disks can only increase as they grow).  The virtual disks used by vOptimizer Pro are all thick-provisioned.


Conclusion


I heard it said recently that “thin-provisioning is a way for storage admins to write bad checks.”  I totally agree.  Thin-provisioning has the potential to create resume-generating events for your IT department staff.


Some say the solution is thin-provisioning with more alerting to let you know when a train is coming.  I say stay off the tracks and avoid thin-provisioning problems altogether by using vOptimizer Pro.  Thick-provisioned disks that are right-sized as needed are the way to go.  Software-based thin-provisioning is being spoken of as if it’s some new technology in vSphere.  The reality is it’s not new, it has been around for years and it’s still a bad idea.  On top of all of this it doesn’t perform as well, especially for disk I/O intensive applications.


For more real-world virtualization tips and techniques check out the recorded, instructor-led training at



www.virtualizationuniversity.com