Will this week’s Google Mail outage frighten you out of shifting more of your computing solutions into the cloud?
On balance, it shouldn’t, as no technology is perfect and failure is part of the landscape whether we keep our stuff in a data center, in a box under our desk, or on some unseen Web server on the other side of the country. But any failure of this magnitude offers up a prime opportunity to discuss — and hopefully improve — the weaknesses that can still bite us.
A dimmer forecast
By now, we all know what happened: On Tuesday, Gmail’s Web interface took a two-hour siesta and left millions of users unable to access their accounts. Thanks to social media tools like Twitter and Facebook, many users quickly learned that the Gmail service itself was still running, and switched to their iPhones and BlackBerrys to get their e-mail fix.
Google, which initially called it a “minor problem” caused by underestimating the impact of taking some servers offline for routine maintenance, later fessed up and admitted that what happened next was a pretty big deal. Close to 37 million users in the U.S. doubtless nodded their heads, and cloud computing’s sunny future dimmed significantly, if temporarily, in response.
This latest outage (Gmail’s last large-scale failure didn’t garner as many headlines because it happened when most folks were asleep) raises valid questions about the viability of cloud computing, and the vulnerabilities we take on as we shift applications and data from their traditional homes on client PCs and servers to unseen resources on the Web.
Don’t toss out the cloud. Yet.
Kindly note that “raises valid questions about the viability” does not mean “should never be adopted.” Cloud computing offers up significant advantages to businesses and consumers, and the latest Gmail outage doesn’t change what’s fundamentally right about this architecture. It does, however, give providers an opportunity to assess how Google could have responded more effectively, and how we, as users, can better protect our interests before the next inevitable failure.
Because the failure started within Google’s infrastructure, let’s first focus on what the company needs to do to minimize the potential for a recurrence, and to improve response in the event that something happens anyway:
Test more effectively. Routine maintenance shouldn’t take down a production environment. For a company with as much technical depth as Google, underestimating the impact of load balancing in the wake of taking some equipment offline is a pretty embarrassing goof. Going forward, I’d expect tighter internal processes to better understand the potential impact of any internal changes on external users.
Communicate more honestly. Google’s initial message, “it’s a minor issue,” served only to tick off millions of users already peeved that they had no e-mail. Customer Service 101 dictates that you do not minimize the experience of an aggrieved customer. Understand? Yes. Empathize? Certainly. But minimize? Avoid at all costs or risk the virtual equivalent of a kick in the shins.
Suggest alternatives more quickly. The outage affected only the Web interface. If you could reach your mail another way, say through your BlackBerry Gmail app, you were fine. Google needs to communicate workarounds like this more proactively. At the very least, doing so gets some of those affected back up and running, after a fashion.
And because responsibility is a two-way street, here’s what users should do to prepare for the inevitable and ensure they can keep working even if the worst happens.
Install Google Gears. This neat code, which syncs your mail and other Google stuff to your local machine so you can continue working when you lose network connectivity, could have saved you some serious headaches this past Tuesday. Google Gears users simply kept working when the Web interface went down, and synced up when everything was fixed. Is it perfect? No. But sometimes good enough is all you need.
Identify alternative modes of access. If you’ve got an iPhone, BlackBerry or Android device, install Gmail. Now. When the inevitable next outage happens, the more ways you have to get in, the better.
Avoid the single point of failure. I’ve got a friend who long ago set up a rule in his Gmail account to reroute all his inbound e-mail to a Windows Live account. Although he’s likely Google’s biggest fan, even he appreciates the value of not putting all your eggs into one basket. He had never logged into the Microsoft account — until Tuesday. While everyone waited for Gmail to wake up from its slumber, he continued working.
The promise of cloud-based apps is the same whether they’re consumer-focused and free or subscription-based and used to run a business. Specifically, they deliver good-enough functionality without requiring the complexity of a local install. For smaller businesses, cloud-based apps offer a way to tap into greater capabilities without having to stretch their IT resources. No huge up-front investments or complex implementation projects. Just subscribe and go. And as the vendor updates the app, the company benefits, too.
But as the shift toward the cloud continues, end users have the right to the same kind of service level agreements that have governed more conventional applications for generations. If your installed version of Microsoft Office goes bad, these expectations are outlined right on the box the software came in. Even though most Gmail users are getting it for free, they nevertheless expect some kind of defined response. For consumers and businesses subscribing to Google’s premium services, expectations are decidedly — and understandably — greater.
It may very well be that Google’s current service levels exceed those that the average customer would be able to achieve using an in-house, server-based messaging solution. Even so, the perception left in the aftermath of Gmail’s latest outage is that cloud computing carries risks, and Google, the poster child for the cloud computing era, needs to raise its game and assume greater responsibility for the impact an outage has on its paying and non-paying customers alike. A tangible response to this outage would further solidify cloud computing’s future.
Carmi Levy is a Canada-based independent technology analyst and journalist still trying to live down his past life leading help desks and managing projects for large financial services organizations. He comments extensively in a wide range of media, and works closely with clients to help them leverage technology and social media tools and processes to drive their business.