Editorials

When the Cloud Runs Out

Even when you are using cloud resources with a high degree of redundancy, and the cloud systems are performing correctly, you can still run into outages. That's because the software hosted on those cloud systems can fail on its own. Of course, your risk depends on what kind of cloud service you are using. In this case I am talking about platform as a service, where you lease time on a virtual machine to host your software. The cloud provider has the resources to ensure your system has failover and all of that.

Here is the kind of scenario I have in mind. Years ago, I had a SQL Server database start throwing errors into the log in rapid succession. It was really scary. The log entries reported that memory objects belonging to one SQL Server process were being corrupted by another process running in the SQL Server service. In other words, two different queries were writing to the same memory area. You could not count on the accuracy of query results, and the database itself could have become corrupt.

It turned out that this was a known issue, and it had already been fixed in the latest service patch, which we had not applied. We had been running that version of SQL Server for four years and had finally reached the threshold where the bug was exposed.

The big difference between this kind of bug and the one Stephen mentioned in his editorial yesterday was the communication available. Using entries from the log files, a quick Google search led me straight to both the issue and its solution, in specific information Microsoft had published in the MSDN online library.

I really appreciate transparency when software has bugs. I've yet to meet someone who writes perfect software. So why are we afraid to admit we have issues?

Back to the main topic. It wouldn't have mattered whether I was running SQL Server on a self-hosted platform or in the cloud using platform as a service; I would still have run into the bug and still have had to solve the problem. The point is, running in the cloud doesn't mean you should expect to be free of bugs. Even if you are using software as a service, such as SQL Azure, you are going to have bugs.

So what is the difference? In the cloud you have less control when the bug is in software controlled by the cloud provider. I know that's quite obvious, but I suspect it is one of the big issues holding some of us back from migrating fully to the cloud. We like to be in control. When things go wrong on our own systems, we resolve them on our own schedule, according to our priorities. In the cloud, many different customers' systems could be affected, and the priority is completely controlled by the cloud provider.

What kind of service level agreement can you reasonably expect from a cloud provider? Even if the provider faces huge penalties for failure, what difference does that make if an outage could cripple your business for a while, or ruin it altogether?

In short, just because your software is in the cloud, don't expect that there will be no outages. So, what is your mitigation plan?

Cheers,

Ben