5 Steps to Reach Nirvana in the Cloud by Balancing the Yin and Yang
The Cloud presents many interesting opportunities for changing how business and IT function. Almost endless, instant capacity; no hardware to buy, rack, configure or manage; and a model where you pay only for what you use, or at least for what you have provisioned, are just a few. But the Cloud is different from the traditional IT service delivery model. Here I offer 5 steps to reaching Nirvana in the Cloud.
Step 1: Create meaningful SLAs
The first step is to figure out how to address service management. Cloud platforms like Amazon and Rackspace don’t include any performance aspects in their standard SLAs, only availability. Even GoGrid, which does offer some network performance SLAs, makes no mention of VPS performance.
So how can you guarantee that you can deliver on the service level agreements you have with your internal or external customers? Adding practical measures of performance to SLAs from the providers is an obviously needed step as the Cloud matures. There’s a really detailed document by the Cloud Standards Customer Council called the Practical Guide to Cloud SLAs that’s worth a review for some ideas.
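To make "practical measures of performance" a bit more concrete, here is a minimal sketch of what a performance clause could look like once it is written down as a check. The 95th percentile and the 500 ms threshold are my own illustrative assumptions, not numbers from any provider's SLA, and the function names are hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_sla(response_times_ms, pct=95, threshold_ms=500):
    """True if the chosen percentile of response times is within the threshold.

    pct and threshold_ms are assumed example values, not a real SLA.
    """
    return percentile(response_times_ms, pct) <= threshold_ms

if __name__ == "__main__":
    measured = [120, 180, 210, 250, 1400]
    print(percentile(measured, 95), meets_sla(measured))
```

A percentile is used rather than an average because one very slow request in twenty can hide behind a healthy-looking mean.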
Step 2: Focus on User Experience
One of the most important steps is to establish a common set of metrics that define performance as it relates to user experience. I’ll skip my usual soapboxing about the data that supports why SPEED is so important. Web performance, in terms of delivering fast and feel-good user experiences, is one of the biggest single factors influencing online success, and it impacts all of those key business KPIs like bounce rates, page views, conversions and user satisfaction scores.
While the owner of the platform will be managing to a reasonably high level of resource utilization for the platform, the business owner of the application is focused on driving business results and primarily cares about delivering the great user experiences required to ensure success. The real trick here is to manage the platform just to the level of utilization that does not degrade the user experience and to know where that is. This is the real art of managing that Yin and Yang.
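One rough way to find "where that is" is to pair utilization readings with response times and look for the point where the user experience target starts to break. The sketch below assumes you already collect (CPU utilization %, response time in ms) samples; the bucket size, the target and the function name are all hypothetical choices of mine.

```python
def max_safe_utilization(samples, target_ms, bucket_size=10):
    """Return the upper edge of the highest utilization bucket that still
    met the response time target, scanning from low utilization upward.

    samples: iterable of (utilization_percent, response_time_ms) pairs.
    Returns None if even the lowest observed bucket misses the target.
    """
    buckets = {}
    for util_pct, resp_ms in samples:
        bucket = int(util_pct // bucket_size) * bucket_size
        buckets.setdefault(bucket, []).append(resp_ms)

    safe = None
    for bucket in sorted(buckets):
        # Stop at the first bucket where any request exceeded the target.
        if max(buckets[bucket]) > target_ms:
            break
        safe = bucket + bucket_size
    return safe

if __name__ == "__main__":
    observed = [(15, 200), (35, 220), (55, 260), (75, 480), (85, 900), (95, 1500)]
    print(max_safe_utilization(observed, target_ms=500))
```

Using the worst response in each bucket is deliberately conservative; a percentile per bucket would be a reasonable alternative if your data is noisy.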
Exactly how you measure user experience is another subject I talk more about here.
Step 3: Ensure APIs are fast and functional
Ensuring APIs are fast and functional for your Cloud applications is another key consideration. APIs are surely at a significant tipping point, perhaps finally delivering on the promise of truly distributed applications that CORBA made us in the 90s (yes, I have some grey hair :). They are the supporting legs of almost all Mobile applications and are how we stitch together different application domains as well as publish our capabilities to other consumers. What’s really critical about APIs in terms of user experience is that they must function under much tighter tolerances than the overall user experience. Each API call is just one of many that make up the overall response time, so small slowdowns add up and can have very negative effects on how the user perceives performance.
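The arithmetic behind those tighter tolerances is simple enough to sketch. The numbers here (a 2 second page budget, 8 sequential API calls, 400 ms of other overhead) are made up for illustration, and the function names are hypothetical.

```python
def per_call_budget_ms(page_budget_ms, sequential_calls, overhead_ms=0):
    """Time each API call may take if the calls run one after another."""
    if sequential_calls <= 0:
        raise ValueError("sequential_calls must be positive")
    return (page_budget_ms - overhead_ms) / sequential_calls

def page_time_ms(call_times_ms, overhead_ms=0):
    """Total time when every call in call_times_ms runs sequentially."""
    return overhead_ms + sum(call_times_ms)

if __name__ == "__main__":
    budget = per_call_budget_ms(2000, 8, overhead_ms=400)
    print(budget)                                   # per-call allowance
    print(page_time_ms([budget + 50] * 8, 400))     # each call 50 ms slower
```

With these assumed figures each call gets 200 ms, and a slowdown of only 50 ms per call pushes the page from 2000 ms to 2400 ms. Calls made in parallel would behave differently, so treat this as the worst case.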
Step 4: Protect yourself from oversubscribed Clouds
I’ve often spoken about how User Experience becomes the one tangible thing that can be measured in a world gone virtual. The Cloud and virtualization mean all the hardware resources are on springs and controlled by the hypervisor. And what if you have a bad neighbor whose application is always overconsuming CPU or I/O? How can you tell if your VPS is a victim of “stolen time”, a term that basically measures the amount of time your VPS was ready to run but couldn’t because other VPSs were competing for the CPU? Newer distros of Linux now show a CPU percentage called ‘st’ when you run ‘top’, which stands for stolen CPU ticks (also known as steal time). Ensuring that VPS performance, web performance and business performance are not suffering because your Cloud is oversubscribed is critical.
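If you would rather track that number over time than eyeball ‘top’, the same counter is exposed on Linux in /proc/stat, where steal is the eighth value on the aggregate "cpu" line. Here is a minimal sketch, assuming a Linux guest with a kernel recent enough to report the steal field; the function names and the 5 second sampling interval are my own choices.

```python
import time

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]

def steal_percent(before, after):
    """Percentage of CPU ticks stolen between two samples.

    Only the first eight counters are summed (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already included in user/nice.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if len(deltas) < 8 or total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total

def read_cpu_times(path="/proc/stat"):
    """Read the current aggregate CPU counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())

if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(5)  # assumed sampling interval
    print(f"steal: {steal_percent(first, read_cpu_times()):.1f}%")
```

What counts as "too much" steal is a judgment call for your own workload; the useful part is watching whether it rises at the same time your user experience numbers get worse.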
Step 5: Don’t put all your eggs in one geography
Finally, the recent Amazon outage, like most data center blackouts, was power related; the usual culprits are storms or a truck crashing into a transformer. Just look at the listing that Rich Miller at Datacenterknowledge.com keeps: data centers go dark now and then. Without some multi-geography strategy for dealing with a complete data center outage, you can’t really have a disaster recovery or business continuity plan. We really need the Redundant Array of Clouds and Redundant Array of Regions ideas.
Can I configure JBOC (just a bunch of Clouds) as a RAIC 5 (redundant array of inexpensive Clouds) please? Perhaps that’s something the Cloud providers should be building into their fabrics.
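Until the providers build that in, the simplest form of the idea is an ordered list of regions and a health check. This is only a sketch of the selection logic: the region names are hypothetical placeholders, and the health check is passed in as a function because how you probe a region (HTTP ping, synthetic transaction, provider status feed) is up to you.

```python
def pick_region(regions, is_healthy):
    """Return the first region, in preference order, that passes the health check.

    regions: list of region names, most preferred first.
    is_healthy: callable taking a region name and returning True or False.
    Returns None if no region is healthy.
    """
    for region in regions:
        if is_healthy(region):
            return region
    return None

if __name__ == "__main__":
    # Hypothetical region names and a stubbed health check for illustration.
    preferred = ["east", "west", "europe"]
    down = {"east"}
    print(pick_region(preferred, lambda region: region not in down))
```

Real failover also needs the data to be in the other geography before the lights go out, which is the hard part this sketch does not cover.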
The Cloud represents a multitude of possibilities for transforming how business and IT function, and there are many paths to reach Nirvana, of which I’ve shared just 5 steps with you. The key to balancing the Yin and Yang, though, is to manage the VPSs to just enough utilization to maintain fast and feel-good user experiences.
What do you think? Please share your thoughts and comments below.