
Amazon Web Services Outage: Cloud Computing Proof of Concept


In late April 2011, a major outage hit Amazon Web Services (AWS), Amazon's cloud computing platform, when massive failures in the company's Virginia-based facilities shut down many businesses operating on the network. Opponents of the cloud computing paradigm seized on the outage as proof that cloud computing is bad for business.
Except it isn't, and the outage proved exactly the opposite.
Quick Look at How Cloud Computing Works
The idea behind cloud computing is to pool resources so that they are better managed and put to more use. Basically, it spreads resources out while allowing more users to share them, thus lowering costs.
There are two basic models for cloud computing: 'design for failure' and 'traditional.' Despite its name, the traditional model is being superseded by the design for failure (DFF) model. AWS was a mixture of both.
In traditional cloud architecture, geographic limitations mean that while the workload is spread amongst various systems, they must all be within a relatively small geographic area. This model puts most of the burden of keeping resources available on the infrastructure and the redundancy built into it. The downside is that when an area-wide failure happens, the traditional cloud is likely to go down with it.
In the DFF model, the burden of redundancy is taken off the infrastructure alone, and availability is spread across a combination of software management and physical design. This allows single or multiple parts of the cloud to fail without destroying the cloud's availability or the applications and data on it. In the best DFF setups, data is mirrored (saved in copies) across multiple geographic locations to avert catastrophic failure and data loss.
Again, the Amazon system was based on a hybrid of both models.
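The mirroring idea behind the DFF model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AWS or any real provider implements it: the Location and MirroredStore classes, their method names, and the site names are all hypothetical. The sketch writes each value to every reachable location and reads from the first one that answers, so losing one site does not lose the data.

```python
class Location:
    """A hypothetical storage site, such as one data center."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self._data = {}

    def put(self, key, value):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        self._data[key] = value

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._data[key]


class MirroredStore:
    """Hypothetical store that mirrors every value across all locations."""

    def __init__(self, locations):
        self.locations = list(locations)

    def put(self, key, value):
        # Write to every reachable location and count the copies made.
        written = 0
        for loc in self.locations:
            try:
                loc.put(key, value)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise ConnectionError("no location accepted the write")
        return written

    def get(self, key):
        # Read from the first location that is online and holds the key.
        for loc in self.locations:
            try:
                return loc.get(key)
            except (ConnectionError, KeyError):
                continue
        raise KeyError(key)
```

With two locations, a value written while both are online can still be read after one of them fails. Only when every location holding a copy is down does the read fail, which is the area-wide failure case the traditional model cannot survive.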
The Five Levels of Redundancy
Cloud computing can provide five levels of redundancy, and having all of them is optimal: physical, virtual resource, availability zone, region, and the cloud itself. Having redundant resources and facilities at all five levels means the cloud will remain stable even in a major shakeup. AWS didn't provide for all five. The Virginia outage showed that its regional redundancy was lacking and that its physical backups of some data were inadequate.
The trouble with DFF systems is that they must be designed from the ground up as DFF systems. AWS was not designed this way because it began before DFF was really mainstream. So, like many cloud computing platforms today, it was retrofitted for DFF rather than built that way from scratch.
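One rough way to picture the five levels is as a checklist applied to a deployment. The Python sketch below is an assumption-laden illustration: the missing_levels function, the dictionary layout, and the rule that "redundant" means at least two independent copies are all invented here, and the sample numbers are hypothetical rather than measurements of AWS.

```python
# The five levels named in the text, from smallest to largest scope.
REDUNDANCY_LEVELS = (
    "physical",
    "virtual resource",
    "availability zones",
    "regions",
    "cloud",
)


def missing_levels(deployment):
    """Return the levels with fewer than two independent copies.

    `deployment` maps a level name to the number of independent copies
    at that level; a level that is absent counts as zero copies.
    """
    return [
        level
        for level in REDUNDANCY_LEVELS
        if deployment.get(level, 0) < 2
    ]


# Hypothetical deployment: redundant hardware, virtual machines, and
# zones, but only one region and one cloud provider.
example = {
    "physical": 2,
    "virtual resource": 3,
    "availability zones": 2,
    "regions": 1,
    "cloud": 1,
}
gaps = missing_levels(example)
```

For this made-up deployment the gaps are at the region and cloud levels, so a region-wide failure would take the whole application down no matter how much redundancy exists beneath it.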
The Good News, Even for Amazon
The good news here is that the AWS failure proved the DFF model works well if applied correctly. For Amazon, the news remains good because the provider learned a valuable lesson and gained the opportunity to rebuild its system to be more fail-proof.
Amazon has stated that it is now developing more dynamic systems that will better allow for load balancing and redundancy between its Virginia and California networks. This should solve most of the problems exposed when the Virginia facility failed.
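Load balancing with redundancy between two regions can be sketched as a round-robin that skips any region marked unhealthy. This is a minimal illustration, not Amazon's actual design: the RegionBalancer class, its methods, and the region labels are assumptions made for the example.

```python
import itertools


class RegionBalancer:
    """Hypothetical round-robin balancer that skips unhealthy regions."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.healthy = {region: True for region in self.regions}
        self._cycle = itertools.cycle(self.regions)

    def mark(self, region, is_healthy):
        # Record the result of a health check for one region.
        self.healthy[region] = is_healthy

    def next_region(self):
        # Try each region at most once per call, in rotation.
        for _ in range(len(self.regions)):
            region = next(self._cycle)
            if self.healthy[region]:
                return region
        raise RuntimeError("no healthy region available")
```

While both regions are healthy, requests alternate between them. Once one is marked unhealthy, every request goes to the other until the failed region recovers.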
