How to jump into the Cloud

December 12, 2008 · Posted in Cloud Computing

Cloud computing is a general term for a publicly accessible virtualization system.

Virtualization + DataCenter + PublicAccess = CloudComputing

Although there is more than one provider, and others will follow, Amazon (AWS) pioneered the field, so I will refer only to AWS when describing hands-on procedures.

Sign Up

The first thing to do is to create an AWS account and register a credit card. Yes, it’s not free, although it’s not expensive either. The pricing starts at USD 0.10 per hour for the smallest machine type, plus some extra for storage and bandwidth. For a single lightly loaded server it adds up to roughly USD 2.5-3 per day.
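
To see where the daily figure comes from, here is a minimal sketch of the arithmetic. The hourly rate is the one quoted above; the amount for storage and bandwidth is a made-up number for illustration only, not an AWS price.

```python
HOURLY_RATE_USD = 0.10  # smallest machine type, as quoted in the text
HOURS_PER_DAY = 24


def daily_cost(extras_usd=0.0, instances=1):
    """Cost of one day of instance-hours plus a flat amount for extras.

    extras_usd stands in for storage and bandwidth; its value is an
    assumption, not an official price.
    """
    return round(instances * HOURLY_RATE_USD * HOURS_PER_DAY + extras_usd, 2)


print(daily_cost())      # instance-hours only
print(daily_cost(0.35))  # with a hypothetical 0.35 USD of extras
```

The instance-hours alone come to USD 2.40 per day, so the remainder of the estimate is storage and bandwidth.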

You should also sign up for EC2 and S3, each of which is just a single click on a button.

Get the Keys

The second thing to do is to save the ACCESS and SECRET keys of your account to a text file, because you will need these character sequences from time to time.

The third task is to generate a cryptographic key pair. The most convenient way is to let AWS do it for you, but you can also upload your own public certificate. AWS only stores the certificate part. That means that after AWS has generated your key pair, there is a download button on the result page. Use that button to download your private key, because it is your only chance to get a working key pair. Ensure you download and store both the key and the certificate.

Get the Tools

Now that you have signed up, you need the tools to start interacting with AWS and EC2. The fourth task is to download and install the AWS EC2 API Tools plus a very handy plug-in for Firefox.

AWS/EC2 API Tools

This set of command line tools is implemented in Java, so in case you don’t have a decent JDK/JRE already installed, now is the time to fix that.

You need to set four environment variables in order for the tools to work. These are:

  • EC2_HOME
  • EC2_CERT
  • EC2_PRIVATE_KEY
  • PATH

EC2_HOME should point to the directory where you unpacked the tools, and the last (PATH) should include the bin directory of EC2_HOME. The cert and key variables should point to the PEM files you downloaded during the key pair task above. Now test the command line tools:

ec2-describe-images --help
ec2-describe-images --verbose
ec2-describe-images --owner amazon

The first line shows the built-in help, and the second shows which SOAP data is exchanged. Every command responds to these two options. The third line lists only the machine images provided by Amazon. In addition to Amazon, there are plenty of third-party images available. My personal preference is to use the Ubuntu images created by Eric Hammond.
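
If the commands above fail, the environment variables are the usual suspect. The following is a small sketch, not part of the AWS tools, that checks the four variables listed earlier. The function name `missing_ec2_settings` is my own invention.

```python
import os

# The four variables named in the text.
REQUIRED = ("EC2_HOME", "EC2_CERT", "EC2_PRIVATE_KEY", "PATH")


def missing_ec2_settings(env):
    """Return a list of problems found in the given environment mapping."""
    problems = [name for name in REQUIRED if not env.get(name)]
    home = env.get("EC2_HOME")
    if home:
        # PATH should contain the bin directory below EC2_HOME.
        bin_dir = os.path.join(home, "bin")
        if bin_dir not in env.get("PATH", "").split(os.pathsep):
            problems.append("PATH does not include " + bin_dir)
    return problems


# An empty list means the four variables look plausible.
print(missing_ec2_settings(dict(os.environ)))
```

The check only looks at the variables themselves; it does not verify that the PEM files exist or that Java is installed.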

ElasticFox

Besides the AWS EC2 API command line tools, there are many others available, implemented in Perl, Ruby, C# and more. However, my clear favorite is not a command line tool but a GUI. ElasticFox is a plug-in for Firefox and provides you with very convenient access to EC2. This plug-in is so valuable that I am using it all the time, instead of the command line tools.

Prepare before launch

Before you can launch your first AMI (Amazon Machine Image), you need to take care of three additional things.

  1. Generate an EC2 logon key pair.
  2. Install an SSH client for Linux/Unix AMIs. For a Windows AMI, you will use the built-in Remote Desktop client to log on via RDP.
  3. Create a security group (AWS-EC2 firewall rule) enabling access to SSH (port 22) or RDP (port 3389).

The EC2 logon key pair is different from the key pair you created in step three. The logon keys are used for SSH access and to encrypt/decrypt the Windows Administrator password. In contrast to the first key pair, you can create as many logon key pairs as you like and use them for different purposes. Every key pair (or cert) is referred to by its name. Ensure you save the private key part to a PEM file on your own computer, because you will need that key for the SSH logon.

You need a remote access client, which for a Linux (Unix) machine will be SSH. If you are running Linux on your desktop, you probably already have ssh installed. If you are running Windows on your desktop, you need an SSH client like PuTTY, or you can install an environment like Cygwin. If you are going for PuTTY, install the full distribution or, more exactly, PuTTY and PuTTYgen. You need the latter to convert your EC2 logon private key from PEM to PuTTY’s own format, PPK (PuTTY Private Key).

By default, there are no ports opened into your running machine instances, which means you cannot log on unless you open an appropriate port. You do that by creating a security group (firewall rule). Use ElasticFox to create a new firewall rule (web), which permits access to port 22 (SSH) and 80 (HTTP) for all IP addresses (0.0.0.0/0). The funny zeroes denote a range of IP addresses in CIDR notation. You can study that topic at Wikipedia.
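
To make the CIDR notation a bit less mysterious, here is a short sketch using Python's standard `ipaddress` module. The `web_rule` dictionary and the `is_allowed` function are only my own illustration of the rule described above; they are not how EC2 represents security groups.

```python
import ipaddress

# A prefix length of 0 means no bits are fixed, so every IPv4 address matches.
everyone = ipaddress.ip_network("0.0.0.0/0")
print(everyone.num_addresses)

# A narrower range for comparison: only the first 24 bits are fixed.
office = ipaddress.ip_network("203.0.113.0/24")
print(office.num_addresses)

# Hypothetical in-memory picture of the "web" rule from the text.
web_rule = {"name": "web", "ports": [22, 80], "source": everyone}


def is_allowed(rule, client_ip, port):
    """True if the port is open and the client address is inside the range."""
    return port in rule["ports"] and ipaddress.ip_address(client_ip) in rule["source"]


print(is_allowed(web_rule, "75.101.207.168", 22))    # SSH is open to everyone
print(is_allowed(web_rule, "75.101.207.168", 3389))  # RDP is not in the rule
```

Replacing `everyone` with a narrower range such as `office` would restrict logon to clients inside that range.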

Launch a virtual server

Now is the time to launch the first virtual server. Choose, for example, the Ubuntu 8.10 server 32-bit image provided by Eric Hammond.

Right click and choose “Launch…”

Check that you are using your logon key, that your firewall rule opens at least port 22, that the instance type is the smallest one, and that you launch only one instance. You will then see that the machine instance is booting (pending).

Click the refresh button until you see that the instance is running. When that happens, you will see that the instance now has a public IP and DNS name.

Log on to the instance

Use SSH to log on to the instance. If you are using PuTTY, you need to convert the PEM-formatted private key into PuTTY’s own key format, PPK. Use PuTTYgen for that task: load the PEM key and save it as a private key. Ensure you give it the same base file name (differing only in the file extension), so that ElasticFox can construct the file path to the key. Review the Tools settings in ElasticFox.
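
The naming rule is simple enough to show in a couple of lines. This is only a sketch of the convention (same base name, different extension); the path and key name are hypothetical examples.

```python
from pathlib import PurePosixPath


def ppk_path_for(pem_path):
    """Return the path of the PPK file that sits next to the given PEM file."""
    return PurePosixPath(pem_path).with_suffix(".ppk")


# Hypothetical location of a logon key named "my-logon-key".
print(ppk_path_for("/home/me/.ec2/my-logon-key.pem"))
```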

You can use ElasticFox to launch your SSH client directly. Just right click on the instance and choose “Connect …”. If PuTTY complains that it cannot find your key, double-check the key template settings in the Tools settings of ElasticFox. If everything goes well, you should be logged on to your own server in the cloud.

When you are done, don’t forget to shut down your instance again. Remember, AWS is affordable but not gratis.

What’s in the cloud?

December 9, 2008 · Posted in Cloud Computing

Nowadays, there are many services provided by Amazon Web Services (from now on referred to as AWS). Names and acronyms such as S3, EC2, SQS, FPS, SimpleDB, CloudFront, EBS and EIA are swirling around. I do not intend to describe all of these. Instead, I will concentrate on the second acronym, EC2 (Elastic Compute Cloud), and its two companions EBS (Elastic Block Store) and EIA (Elastic IP Address). These components (EC2/EBS/EIA + S3) together form a compelling platform for building new dynamic applications running in the cloud.

EC2 provides you with one or more virtual computers. You can choose from many pre-built images or roll your own installation, either from scratch or by modifying an existing image. Most of the images run Linux (Fedora and Ubuntu), some run OpenSolaris, and recently Windows images have become available as well. In AWS terminology, you launch an AMI (Amazon Machine Image), which gives you a machine instance.

The storage model in the AWS cloud is different from what we are used to with physical computers. In the cloud, there are several storage types.

  • Fixed
  • Ephemeral
  • Persistent
  • Permanent

Fixed storage is the AMI. Every time you launch an instance based on a specific AMI-ID, you get back the same storage content, which is the operating system plus the installed applications.

During its lifetime, a machine instance has access to a file system. However, as soon as the instance terminates, all data is lost and the ephemeral storage is reclaimed. This is indeed very different from physical hardware, where a server crash (most of the time) means you can reboot and re-read the data from the hard drive.

Clearly, one needs to put business data somewhere else. Until recently, the only choice was S3, which was one of the first services of AWS. S3 is a homegrown distributed storage system, which can store unmodifiable blobs. Although S3 serves its purpose, it does not lend itself to fine-grained read/write access. A while ago, AWS introduced EBS, which is a virtual hard drive. You allocate a drive with a size between 1 and 1000 GB and attach it to a running machine instance. Within the instance, the drive pops up as a new disk. The same procedure applies as for a physical disk: you have to format it (e.g. Ext3 or NTFS) and mount it. If the machine instance terminates, the data on the virtual disk remains, and the disk can quickly be attached to another machine instance, which this time sees an initialized non-empty disk.

There is a non-zero probability that the virtual disk might fail, so a backup strategy is still needed. It’s very easy to take a snapshot of an EBS volume and automatically store the image in S3. Later on, it’s possible to (re-)create another EBS volume based on a snapshot saved in S3. In other words, this is a convenient way of recovery.
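
The four storage types can be summarized in a tiny lookup table. Note that the text above does not spell out which service is "persistent" and which is "permanent"; mapping them to EBS and S3 respectively is my own reading, and all names in the sketch are invented for illustration.

```python
from enum import Enum


class Storage(Enum):
    FIXED = "AMI"
    EPHEMERAL = "instance file system"
    PERSISTENT = "EBS volume"  # assumption: persistent means EBS
    PERMANENT = "S3"           # assumption: permanent means S3


# Does data written at runtime survive the termination of the instance?
KEEPS_CHANGES = {
    Storage.FIXED: False,      # every launch starts from the same image content
    Storage.EPHEMERAL: False,  # reclaimed when the instance terminates
    Storage.PERSISTENT: True,  # the volume can be attached to another instance
    Storage.PERMANENT: True,   # blobs stay until they are deleted
}


def durable_choices():
    """Storage types that are candidates for business data."""
    return [kind for kind in Storage if KEEPS_CHANGES[kind]]


for kind in Storage:
    print(kind.name, kind.value, KEEPS_CHANGES[kind])
```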

When a machine instance boots, it receives a dynamic IP address and a DNS name (for example: ec2-75-101-207-168.compute-1.amazonaws.com). It’s an understatement to point out that a dynamic IP address and host name complicate accessibility in the cloud. The solutions available so far have been to rely on a non-cloud-based front-end listening on a public static IP address with a well-defined DNS name, or on Dynamic DNS services such as DynDNS.

Recently, AWS introduced allocatable IP addresses (EIA). You allocate an EIA and assign it to a running machine instance. If the instance goes down, it’s very easy to launch a new instance and (re-)assign the IP address to the newly recovered server. With a static IP address, it’s possible to let a DNS service refer to the cloud service using an understandable hostname.
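
Looking at the example name, the four numbers after "ec2-" appear to be the public IP address with dashes instead of dots. The sketch below relies on that observation only; it is an assumption based on this one example, and a real DNS lookup is the reliable way to resolve the name.

```python
import re


def ip_from_ec2_hostname(hostname):
    """Extract the dotted IP address embedded in an 'ec2-a-b-c-d.' host name.

    Returns None if the name does not follow that pattern.
    """
    match = re.match(r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.", hostname)
    if match is None:
        return None
    return ".".join(match.groups())


print(ip_from_ec2_hostname("ec2-75-101-207-168.compute-1.amazonaws.com"))
```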

Into the Cloud

December 5, 2008 · Posted in Cloud Computing

I have previously investigated virtualization. A few years ago I used VMware Workstation to run Fedora and Ubuntu on top of Windows XP. And before last summer, I used VMware Workstation for a customer project running a desktop Ubuntu and two server Ubuntus (one with Oracle and the other as the project server). This autumn I discovered KVM on Ubuntu.

During this year I have been “kicking the tires” of the next logical step of virtualization: from personal/organizational virtualization to global cloud computing. The prime player in this new technology area is Amazon - the online book company. Other players, such as Google and Microsoft, are expected to follow.

The concept is simple: use an (open source) virtualization technique (Xen), apply it to a global collection of data centres, define a simple pricing model and announce it to the developer community. The result, on the other hand, is far from simple - indeed it is a revolution. Why? Because it changes our perception of how to design a system architecture.

When I started with computers - which, by the way, wasn’t that long after the Jurassic period - both processing power and memory space were scarce resources. Over time these restrictions of the mind have gradually dissolved, leading to the war-cry ‘memory is cheap, so let’s waste it’. Today, nobody would be embarrassed by an application with a 4 GB footprint. However, we still design a system architecture in terms of a few heavyweight nodes, say 2 or 4 nodes in a cluster of WAS/WLS/JBoss app servers plus at least one DB server.

This has one of two consequences: either we buy too much hardware and waste money, or we buy too little hardware, which leads to unacceptable response times and crashes. It is not possible to buy exactly the right amount of hardware, because the system load varies over time.

With virtualization in general, and a global cloud computing supplier such as Amazon in particular, we have relaxed the last of the system design constraints, leading to the contemporary war-cry ‘servers are cheap, so let’s waste them’.

With cloud computing we have instant access to an unbounded number of computing resources, whenever we need them. (I hope you don’t take me literally when I use the term ‘unbounded’. I simply mean many more than you use today.)

From the system design point of view the interesting topic is: which factors of my architecture will change when I run my application over a dynamic number of servers, all with ephemeral storage?

I intend to describe Amazon Web Services (AWS) and its cloud computing service EC2 (Elastic Compute Cloud) from a practical point of view, in a series of blog posts. This was the first, introductory post.