The mythical 99.99% uptime SLA
Amazon and salesforce.com outages grabbed headlines this week. Google's blueprint in Harper's for its massive new data center in Oregon will fuel more scrutiny of SaaS and utility computing, and of the SLAs associated with them.
I find the general assumption among us bloggers is that these providers have to be up 99.9999% of the time. It is a major crisis when any of them has an outage. Because supposedly that is what corporate America has always gotten and continues to get.
Who started that myth? Other than some sensitive and global apps, there is plenty of downtime in corporate data centers and those of big outsourcers like IBM and EDS.
In a non-leap year (unlike this year), we have 525,600 minutes. To meet 99.99% uptime, the system could only be down about 52 minutes - less than an hour - in the year. I can tell you most corporate data centers have scheduled downtimes which exceed that every month, if not every week.
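For anyone who wants to check the arithmetic, here is a minimal sketch of the downtime budget at a few uptime targets. The function name and the list of targets are my own illustration, not part of any SLA. It uses the same 525,600-minute year as above.

```python
# Minutes in a non-leap year, the same denominator used in the post.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(uptime_percent, total_minutes=MINUTES_PER_YEAR):
    """Return the minutes of downtime allowed per year at a given uptime percentage.

    Hypothetical helper for illustration only.
    """
    return total_minutes * (100 - uptime_percent) / 100


# Illustrative targets; 99.99% is the one discussed in the post.
for target in (90, 99, 99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows {budget:,.1f} minutes of downtime per year")
```

At 99.99% the budget comes to roughly 52.6 minutes for the whole year, which is the "less than an hour" figure above.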
Few companies have data centers which need to service users across more than 7-8 time zones. If their operations are that geographically diverse, one based in Switzerland could service most of West and East Europe, and one based in Singapore would cover a swath from India to New Zealand. The majority of them find 16 hours of weekday coverage and 8 on weekends more than adequate. So a 90% uptime is good enough for many, using 525,600 minutes as the denominator.
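A rough sketch of that coverage window as a share of the calendar week, assuming the 16 and 8 hours are per day (5 weekdays, 2 weekend days). That reading is my assumption, and the function name is just for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a calendar week


def coverage_fraction(weekday_hours=16, weekend_hours=8):
    """Share of the calendar week during which the system must be available.

    Assumption: hours are per day, with 5 weekdays and 2 weekend days.
    """
    needed_hours = 5 * weekday_hours + 2 * weekend_hours
    return needed_hours / HOURS_PER_WEEK


# 16 hours on weekdays and 8 on weekend days is 96 of 168 hours.
print(f"Required coverage: {coverage_fraction():.1%} of the week")
```

Under that reading the required window is 96 of 168 hours, well under 90% of the clock, so a system that is up 90% of all minutes leaves plenty of room for maintenance, provided the downtime is scheduled outside the coverage window.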
Of course, there are some applications which are global and some critical ones which need to be up constantly. But that is a small percent of the overall portfolio. And folks like IBM and CSC charge them a king's ransom to support those.
It's nice to push Amazon and Google and others for high availability...but the big kahunas in the corporate world don't get it consistently, even with their large budgets...


"I can tell you most corporate data centers have scheduled downtime's which exceed that every month, if not every week."
That is an extremely excessive amount of downtime for a data center. I hope that your numbers are incorrect because that amount of downtime would be completely unacceptable and wholly unnecessary. Our data center hasn't had a single scheduled downtime and it has been up for 1.5 years (knock on wood). This includes a power system upgrade that didn't cause a system interruption. If corporate data centers are really down this much, they need to consider outsourcing to data center professionals - fast.
Travis Stoliker
Posted by: Travis Stoliker | February 17, 2008 at 11:25 PM
Amazon and Salesforce are active around the clock. Corporate sites do get the benefit of after hours, but even then they seem to schedule downtime when it is convenient for IT.
But, here is the big difference. Corporate sites tend to have captive or nearly captive audiences. With Amazon or Salesforce, switching could be painful but it can be done.
It is a little more difficult to fire your own IT dept.
Uptime is important if your competition (current or potential) can do better.
Posted by: Luke Gedeon | February 17, 2008 at 11:52 PM
Travis, that's fantastic...don't get me wrong - I am not suggesting hard shut downs each week just for the fun of it...more disruptive than letting them hum along...
but see other comments and expectations on uptime below - even more generous
http://discuss.joelonsoftware.com/default.asp?biz.5.429153.10
like I said some organizations and some apps - say a global single order entry system, credit card validation, certain security apps - absolutely justify it...but most businesses and their apps are just fine with 16 hour weekday support...
Posted by: vinnie mirchandani | February 18, 2008 at 12:07 AM
Hi Vinnie,
Thanks for the response. Wow, you're right. People have been pretty forgiving about down time in their comments. That's amazing.
One of the other commenters mentioned "corporate benefit of after hours" - I don't know any corporation now that truly has "after hours". If you operate on the Internet, or you're running email, or you're doing E-commerce globally, there is no after hours. Any downtime is either lost productivity or revenue or both. Period.
Travis Stoliker
Posted by: Travis Stoliker | February 18, 2008 at 09:25 PM
Travis, you identified the major apps which call for high availability...but there are plenty of internal, back office systems that really don't and there are plenty of businesses that are regional or local so to them even the apps you mention stay mostly within a band of 12 to 20 hours a day...
Posted by: vinnie mirchandani | February 18, 2008 at 11:02 PM
Hi Vinnie,
You are right that in many cases 99.999% uptime SLAs make no sense. In particular, for SaaS users the only thing that matters is whether they can log in to the application. If there is a network problem and they cannot log in, then for the B2B ISV the user has the same experience as if the host was down.
I talk about this some more in my post here: http://isvsurvival.eh1/blog/article/99-999-saas-host-uptime-sla-irrelevant-b2b-isv/
Andrew.
Posted by: Andrew Biss | February 20, 2008 at 11:46 AM