What’s in the cloud?
Nowadays, there are many services provided by Amazon Web Services (from now on referred to as AWS). Names and acronyms such as S3, EC2, SQS, FPS, SimpleDB, CloudFront, EBS and EIA are swirling around. I do not intend to describe all of these. Instead I will concentrate on the second acronym, EC2 (Elastic Compute Cloud), and its two companions, EBS (Elastic Block Store) and EIA (Elastic IP Address). Together, these components (EC2/EBS/EIA plus S3) form a compelling platform for building new dynamic applications that run in the cloud.
EC2 provides you with one or more virtual computers. You can choose from many pre-built images or roll your own installation, either from scratch or by modifying an existing image. Most of the images run Linux (Fedora and Ubuntu), some run OpenSolaris, and recently Windows images have become available as well. In AWS terminology, you launch an AMI (Amazon Machine Image), which gives you a machine instance.
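Launching an AMI can be scripted. The following is a minimal sketch using boto3, the AWS SDK for Python, which is an assumption on my part and not tooling discussed in this post. The function name `launch_instance` and the default instance type are illustrative placeholders; the caller supplies a real AMI ID and a configured EC2 client.

```python
# Minimal sketch, assuming `ec2` is a boto3 EC2 client, e.g.
#   ec2 = boto3.client("ec2", region_name="us-east-1")
# `launch_instance` is a hypothetical helper; "t3.micro" is just a placeholder type.

def launch_instance(ec2, ami_id, instance_type="t3.micro"):
    """Launch one instance from `ami_id` and return its instance ID."""
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,  # ask for exactly one instance
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    # Block until EC2 reports the new instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

Every instance launched from the same AMI ID starts from the same image contents, which is what the next section calls fixed storage.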
The storage model in the AWS cloud differs from what we are used to with physical computers. In the cloud, there are several storage types:
- Fixed
- Ephemeral
- Persistent
- Permanent
Fixed storage is the AMI itself. Every time you launch an instance based on a specific AMI-ID, you get back the same storage content: the operating system plus the installed applications.
During its lifetime, a machine instance has access to a local file system, the ephemeral storage. However, as soon as the instance terminates, all data on it is lost and the storage is reclaimed. This is indeed very different from physical hardware, where a server crash (most of the time) means you can reboot and re-read the data from the hard drive.
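To make the difference between fixed and ephemeral storage concrete, here is a toy in-memory model. It is not an AWS API; the `Instance` class and the `launch` function are hypothetical. Each launch copies the AMI contents, and terminating an instance wipes its ephemeral disk.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    ami_content: dict  # fixed storage: a private copy of the image contents
    ephemeral: dict = field(default_factory=dict)  # scratch disk tied to this instance
    running: bool = True

    def terminate(self):
        self.running = False
        self.ephemeral.clear()  # ephemeral storage is reclaimed on termination


def launch(ami):
    # Every launch starts from an identical copy of the same image.
    return Instance(ami_content=dict(ami))
```

For example, business data written to `a.ephemeral` is gone after `a.terminate()`, and a second `launch(ami)` sees only the original image contents, not anything the first instance wrote.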
Clearly, one needs to put business data somewhere else. Until recently, the only choice was S3, which was one of the first AWS services. S3 is a home-grown distributed storage system that stores blobs which cannot be modified in place. Although S3 serves its purpose, it does not lend itself to fine-grained read/write access. A while ago, AWS introduced EBS, which is a virtual hard drive. You allocate a volume of between 1 and 1000 GB and attach it to a running machine instance. Within the instance, the volume shows up as a new disk. The same procedure applies as for a physical disk: you have to format it (e.g. with Ext3 or NTFS) and mount it. If the machine instance terminates, the data on the virtual disk remains, and the disk can quickly be attached to another machine instance, which this time sees an initialized, non-empty disk.
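The allocate-and-attach step can be sketched with boto3 (again an assumption, not tooling from this post). The helper name and the default device name are hypothetical. Note that an EBS volume must be created in the same availability zone as the instance it is attached to.

```python
# Minimal sketch, assuming `ec2` is a boto3 EC2 client.
# `create_and_attach_volume` is a hypothetical helper; "/dev/sdf" is a placeholder device.

def create_and_attach_volume(ec2, instance_id, availability_zone, size_gb, device="/dev/sdf"):
    """Create an empty EBS volume and attach it to a running instance."""
    volume = ec2.create_volume(AvailabilityZone=availability_zone, Size=size_gb)
    volume_id = volume["VolumeId"]
    # A volume can only be attached once it has reached the "available" state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    # Inside the instance the new disk still has to be formatted and mounted once,
    # for example: mkfs.ext3 /dev/sdf && mount /dev/sdf /mnt/data
    # (depending on the instance, the disk may appear under another name, e.g. /dev/xvdf).
    return volume_id
```

Attaching the same `volume_id` to a replacement instance later is just another `attach_volume` call, and that instance sees the already formatted, non-empty disk.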
There is a non-zero probability that the virtual disk might fail, so a backup strategy is still needed. It’s very easy to take a snapshot of an EBS volume, which is automatically stored in S3. Later on, it’s possible to (re-)create another EBS volume from a saved snapshot. In other words, snapshots provide a convenient recovery mechanism.
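A backup-and-restore round trip might look like this with boto3 (an assumed SDK choice; both helper names are hypothetical). The snapshot is taken from an existing volume, and a new volume is later created from it in a chosen availability zone.

```python
# Minimal sketch, assuming `ec2` is a boto3 EC2 client.
# `snapshot_volume` and `restore_volume` are hypothetical helpers.

def snapshot_volume(ec2, volume_id, description="nightly backup"):
    """Snapshot an EBS volume and wait until the snapshot is complete."""
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = snapshot["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


def restore_volume(ec2, snapshot_id, availability_zone):
    """Create a new EBS volume from a snapshot and wait until it is available."""
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```

The restored volume can then be attached to an instance in the same availability zone, exactly like a freshly created one.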
When a machine instance boots, it receives a dynamic IP address and a DNS name (for example: ec2-75-101-207-168.compute-1.amazonaws.com). It’s an understatement to say that a dynamic IP address and host name complicate accessibility in the cloud. The available workarounds have been to rely either on a non-cloud front end listening on a static public IP address with a well-defined DNS name, or on dynamic DNS services such as DynDNS.
Recently, AWS introduced allocatable static IP addresses (EIA). You allocate an EIA and associate it with a running machine instance. If the instance goes down, it’s very easy to launch a new instance and (re-)associate the IP address with the new, recovered server. With a static IP address, a DNS service can refer to the cloud service using an understandable host name.
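The failover step described above can be sketched with boto3 (an assumed SDK choice; the helper names are hypothetical). The address is allocated once, and on recovery it is simply re-pointed at the replacement instance, so the DNS record never has to change.

```python
# Minimal sketch, assuming `ec2` is a boto3 EC2 client.
# `allocate_elastic_ip` and `move_elastic_ip` are hypothetical helpers.

def allocate_elastic_ip(ec2):
    """Allocate a static public address; return (allocation ID, public IP)."""
    address = ec2.allocate_address(Domain="vpc")
    return address["AllocationId"], address["PublicIp"]


def move_elastic_ip(ec2, allocation_id, new_instance_id):
    """Point the static address at a (replacement) instance."""
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        # Succeed even if the address is still associated with the old instance.
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

A recovery script would call `launch_instance`-style code for the replacement server and then `move_elastic_ip` with the allocation ID kept from the original allocation.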
Into the Cloud
I have previously investigated virtualization. A few years ago I used VMware Workstation to run Fedora and Ubuntu on top of Windows XP. Before last summer, I used VMware Workstation for a customer project, running a desktop Ubuntu and two server Ubuntus (one with Oracle and the other as the project server). This autumn I discovered KVM on Ubuntu.
During this year I have been “kicking the tires” of the next logical step of virtualization: from personal/organizational virtualization to global cloud computing. The prime player in this new technology area is Amazon, the online bookstore. Other players, such as Google and Microsoft, are expected to follow.
The concept is simple: take an (open source) virtualization technique (Xen), apply it to a global collection of data centres, define a simple pricing model and announce it to the developer community. The result, on the other hand, is far from simple; indeed, it is a revolution. Why? Because it changes our perception of how to design a system architecture.
When I started with computers (which, by the way, wasn’t that long after the Jurassic period), both processing power and memory space were scarce resources. Over time these restrictions of the mind have gradually dissolved, leading to the war-cry ‘memory is cheap, so let’s waste it’. Today, nobody would be embarrassed by an application with a 4 GB footprint. However, we still design a system architecture in terms of a few heavyweight nodes, say 2 or 4 nodes in a cluster of WAS/WLS/JBoss app servers plus at least one DB server.
This has two consequences: either we buy too much hardware and waste money, or we buy too little and get unacceptable response times and crashes. With a fixed set of servers, the right amount of hardware is impossible to achieve, because the system load varies over time.
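A little arithmetic shows why static provisioning wastes capacity. The sketch below compares buying enough servers for the peak hour against scaling the server count to each hour's load. The load profile and per-server capacity are made-up illustrative numbers, and the function names are hypothetical.

```python
import math


def servers_needed(requests_per_hour, capacity_per_server):
    """Smallest number of servers that can handle the load (at least one)."""
    return max(1, math.ceil(requests_per_hour / capacity_per_server))


def compare_provisioning(hourly_load, capacity_per_server):
    """Return (static server-hours, elastic server-hours) for a load profile."""
    per_hour = [servers_needed(load, capacity_per_server) for load in hourly_load]
    peak = max(per_hour)
    static_server_hours = peak * len(hourly_load)  # sized for the peak, running every hour
    elastic_server_hours = sum(per_hour)  # only what each hour actually needs
    return static_server_hours, elastic_server_hours


# Hypothetical four-hour profile, 250 requests/hour per server.
print(compare_provisioning([100, 100, 400, 1000], 250))  # → (16, 8)
```

In this made-up profile, sizing for the peak pays for twice the server-hours that an elastic setup would use; sizing for anything below the peak instead leaves the last hour under-provisioned.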
With virtualization in general, and a global cloud computing supplier such as Amazon in particular, we have relaxed the last of these system design constraints, leading to the contemporary war-cry ‘servers are cheap, so let’s waste them’.
With cloud computing we have instant access to an unbounded number of computing resources whenever we need them. (I hope you don’t take me literally when I use the term ‘unbounded’. I simply mean many more than you use today.)
From the system design point of view, the interesting question is: which aspects of my architecture will change when I run my application over a dynamic number of servers, all with ephemeral storage?
In a series of blog posts, I intend to describe Amazon Web Services (AWS) and its cloud computing service EC2 (Elastic Compute Cloud) from a practical point of view. This is the first, introductory post.
