Skip to main content

Amazon Web Services Outage: Cloud Computing Proof of Concept


In late April 2011, a huge outage amongst the cloud computing network from Amazon, called Amazon Web Services (AWS), happened after massive failures in the company's Virginia-based facilities literally shut down many businesses operating on the network. Opponents of the cloud computing paradigm were all over the outage as proof that cloud computing is bad for business.
Except it isn't and the outage proved exactly the opposite.
Quick Look at How Cloud Computing Works
The idea behind cloud computing is to diversify resources so that they are better managed and put to more use. Basically, it spreads resources and at the same time allows them to be used by more users, thus lowering costs.
There are two basic models for cloud computing: 'design for failure' and 'traditional.' Contrary to the title, the traditional model is actually becoming outmoded by the design for failure (DFF) model. The AWS was a mixture of both.
In traditional cloud architecture, geographic limitations mean that while the workload is spread amongst various systems, they must all be within a relatively small geographic area. It puts most of the weight of availability of resources on the infrastructure and redundancy within it. The down side to this model is that when an area-wide failure happens, the traditional cloud is likely to go with it.
In the DFF model, redundancy is removed from the infrastructure and spread the availability to a combination of software management and physical design. This allows for failures of single or multiple parts of the cloud to happen without destroying the cloud's availability and the applications and data on it. In the best DFF setups, data is spread to multiple locations and mirrored (saved in copies) in multiple locations geographically to avert catastrophic failure and data loss.
Again, the Amazon system was based on a hybrid of both models.
The Five Levels of Redundancy
Five types of redundancy can happen in cloud computing and having all of them is optimal. These five are: physical, virtual resource, availability zones, regions, and the cloud itself. Having redundant resources and facility for all five of these means the cloud will be stable even in a major shakeup. AWS didn't provide for all five. The Virginia outage showed that their regional redundancy was lacking and their physical backups of some data were inadequate.
The trouble with DFF systems is that they must be designed from the ground up to be DFF systems. The AWS was not designed this way because it began before DFF was really mainstream. So, like many cloud computing platforms today, it was retrofitted rather than designed from scratch to be designed for failure.
The Good News, Even for Amazon
The good news here is that the AWS failure proved the DFF model works well, if applied correctly. For Amazon, the news remains good because the provider learned a valuable lesson and has gained the opportunity to rebuild their system to be more fail-proof.
Amazon has stated that they are now developing more dynamic systems that will better allow for load balancing and redundancy between their Virginia and their California networks. This should solve most of the problems shown to be inherent when the Virginia facility failed

Comments

Popular posts from this blog

Entrepreneurial Mindset

Kurumsal Dijitalleşme mi yoksa Dijital Kurumsallaşma mı? (+Anket)

Eğer benim gibi siz de işinizin önemli bir bölümünü pazar araştırması yaparak geçiriyorsanız muhtemelen siz de en az benim kadar Türkiye'de pazar verisine ulaşmanın ne kadar zor olduğu hakkında defalarca şikayet etmiş ve sonunda yaratıcı yollar keşfetme yolunu tercih etmişsinizdir. Bunun sebebinin analitik düşünceye ihtiyacımızın olmaması mı, tembellik mi, kısa vadeli düşünmemiz mi yoksa insanüstü tahmin ve öngörü yeteneklerine sahip olmamız mı emin değilim. "Y  ou can’t manage what you can’t measure " - "Ö  lçemedeğiniz şeyi yönetemezsiniz " Her ne kadar bu söz, günümüze  yanlış  bir şekilde aktarılmış olsa da, kendi içerisinde kısmi bir doğruluk barındırmakta. Aslında bu söz ile anlatılmak istenen, ölçerek herşeyin yönetilemeyeceği fakat sonuçları iyileştirmek için süreçlerin ölçülmesi ve takip edilmesinin önemli olduğudur.  Sözün asıl sahibi W. Edward Deming, verinin ve gözlemin önemini aşağıdaki sözüyle çok güzel bir şekilde anlatmaktadır....

A Creative Way to Meet Investors - UberX

Have a cool startup idea, and want to get it funded? You could go the traditional route, blindly sending your pitch deck to every VC in Silicon Valley. Or you could follow investors on Twitter, hoping that through casual badinage you can win the hearts (and eventually, the wallets) of your startup's money source.  Or maybe, just maybe, you should drive for Uber. UberX Lowers The Bar Yes, Uber, the popular mobile app that connects drivers with people who need a lift. Founded in 2009 as UberCab, Uber has become the go-to app for hailing a sedan in markets like San Francisco, New York City and London. And while historically Uber operators have been commercial sedan drivers filling time between jobs their employer provides them, Uber's introduction of UberX in July 2012 has opened the service to cars and drivers of all kinds. This means that not only will you be picked up in a Toyota Prius or Volkswagen Jetta instead of a Lincoln Town Car, but you're also going...