The AWS Alphabet Soup
An opinion on the diversity of cloud services
I've just returned from the AWS Summit held at Taj Vivanta, Bangalore. It was a busy day of back-to-back sessions interspersed with networking over tea, coffee and lunch. The venue was packed. The sessions were heavy, at least for someone like me who has never used AWS in any big way. I was familiar with some of the terms before coming to this event but I was surprised by how much more there is to the AWS platform. They say that as a developer you can focus on developing your application while the cloud takes care of everything else: deployment, configuration, scaling, security, access control, monitoring, etc. While this is certainly true in the long term, as developers we need to make an upfront investment of time and effort to understand the plethora of services that a particular cloud platform provides.
They say there are 90+ services in AWS. It's bad enough that developers need to be aware of all these different services at their disposal. It's worse when you consider that choosing the right set of services for your application isn't trivial. This is particularly hard for folks used only to on-premise software built in monolithic fashion. We have to be really clear about what we mean by the word "monolithic", which is usually not properly explained at such summits.
Monolithic applications are generally deployed as one executable. They might have been developed as one monolith (as cloud providers would like us to believe) but not necessarily so. It's common practice to break down an application into modules, classes, namespaces and layers. Interfaces are clearly defined in terms of structures, signals, events, message queues and so on. While development might be done in such a modular and flexible way, from the perspective of the cloud paradigm it's still monolithic because these components are brought together into a single executable or package.
This is exactly where traditional software has to be rethought and refactored into code that fits best in the cloud. Each component stands on its own and can be deployed independently. Components interact via APIs. The overall design is driven by APIs in preference to other methods of interaction. An application designed this way is clearly not monolithic. So architects and developers have to start thinking in these terms. Long ago we moved from physical servers to virtual servers. Then we moved from virtual servers to microservices running on these servers. Today we are breaking up application logic into even smaller units, in what is sometimes called Function as a Service or Serverless Architecture.
The challenge for someone new to AWS (or any other cloud platform) is to get to grips with all the different services and how best to use them. The paradox here is that something that's supposed to make things simple is in itself actually complex. Thanks to this complexity, AWS has many consulting partners and system integrators to help companies make that transition to the cloud. Some of them had stalls at the AWS Summit. Actually, the complexity lies in making choices and configuring AWS. Once done properly, actual run-time scaling, maintenance and operations need little intervention.
The AWS dashboard is intimidating to a newbie. With so many icons and names staring at you, it's hard to know what to make of it. For compute, we have EC2 as the basic service. If your application is deployed as a bunch of microservices on containers, you might make do with fewer VMs but managing the containers becomes a problem. For that, we have EC2 Container Service (ECS). The problem with ECS is that you are still billed by the number of VMs even though at times you may not use all of them. To solve this, we have AWS Lambda. By thinking about your application as functions triggered by events, you pay only for what you use.
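To make "functions triggered by events" a little more concrete, here is a minimal sketch of what such a function might look like in Python. The `lambda_handler(event, context)` signature follows the Lambda convention for Python, and the event layout mimics an S3 upload notification; the bucket name, object key and the shape of the returned summary are made up for illustration.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda would invoke once per event.

    Assumes an S3 "object created" style event; the summary returned
    here is purely illustrative.
    """
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        uploaded.append(f"{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps({"processed": uploaded})}


# Hypothetical event for a local dry run; no AWS account is needed.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "photos/cat.jpg"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(lambda_handler(sample_event, None))
```

There is no server loop and no VM in the code. The function only runs when an event arrives, which is why billing can follow actual usage.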
But an application could have hundreds of functions and managing them can be a chore. AWS offers Step Functions and X-Ray to simplify this management. They say DynamoDB is suited for serverless apps. Some apps such as real-time trading and ad bidding may need microsecond response times. DynamoDB Accelerator (DAX) brings a 10x improvement in speed thanks to an in-memory cache. But on AWS, DynamoDB isn't the only database possible.
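The general idea behind an in-memory cache sitting in front of a database can be shown with a toy example. This is only a sketch of the read-through caching pattern and not the DAX API; the class names `SlowTable` and `ReadThroughCache` and the key `bid:42` are invented for illustration.

```python
class SlowTable:
    """Stand-in for a database table; counts how often it is actually read."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


class ReadThroughCache:
    """Serves repeated reads from memory and falls back to the table on a miss."""

    def __init__(self, table):
        self._table = table
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._table.get(key)
        return self._cache[key]


table = SlowTable({"bid:42": 1.75})
cache = ReadThroughCache(table)

# A thousand reads of the same item reach the table only once.
prices = [cache.get("bid:42") for _ in range(1000)]
```

A real cache also has to deal with expiry and with writes that make cached items stale, which this toy ignores.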
While DynamoDB is NoSQL in nature, the relational offering is Amazon RDS, with well-known engines that include MySQL, MariaDB and PostgreSQL. Amazon has its own engine called Aurora that is based on a MySQL frontend and is claimed to be five times faster with commercial-grade reliability. The idea is to encourage folks to migrate from commercial databases (with expensive licensing terms) to open databases on AWS. How easy is it to do this migration?
Blair Layton gave a useful presentation on the topic of DB migration. Clearly, on a busy live application we don't want any downtime. We want to be sure that the schema is properly converted from the source to the destination database. AWS Schema Conversion Tool (SCT) can help. To migrate the data, AWS Database Migration Service (DMS) is the way to do it. But we have to be careful that the upload link is fat enough to handle large databases. Blair mentioned a test case under ideal conditions: 5 TB of data was migrated from EC2 to RDS in 33 hours at a total cost of only about SGD 50. Of course, if you're migrating from on-premise to cloud using the same source and destination database types, a native file replica is enough. Or if downtime is okay for your application, just take a dump, upload and import manually. Here's an interesting fact about databases: Aurora can handle 64 TB tables while RDS can handle only 6 TB; PostgreSQL is perhaps the best choice when migrating from Oracle.
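A quick back-of-the-envelope calculation shows why the size of the link matters. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a link that is fully used for the whole transfer, which real migrations rarely achieve; the function names are my own.

```python
def transfer_hours(size_tb, link_mbps):
    """Hours needed to move size_tb terabytes over a link of link_mbps megabits/s.

    Uses decimal units (1 TB = 10**12 bytes) and assumes the link is fully
    and continuously utilised.
    """
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 3600


def implied_mbps(size_tb, hours):
    """Average throughput in megabits/s implied by moving size_tb in hours."""
    bits = size_tb * 10**12 * 8
    return bits / (hours * 3600) / 10**6


if __name__ == "__main__":
    print(f"5 TB in 33 hours implies about {implied_mbps(5, 33):.0f} Mbit/s")
    print(f"5 TB over 100 Mbit/s takes about {transfer_hours(5, 100):.0f} hours")
```

Under these assumptions, the 5 TB in 33 hours test case works out to an average of roughly 337 Mbit/s. The same 5 TB over a 100 Mbit/s office uplink would need about 111 hours, or more than four and a half days.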
When it comes to analyzing data, Amazon Athena can run queries directly on data stored in S3, and you pay per query, based on the amount of data scanned. A slightly more sophisticated service is Amazon Elastic MapReduce (EMR) for running analysis on Hadoop, Spark, HBase, Flink, and more. What if you have data in multiple sources, your queries are complex and your data is at the scale of petabytes? These are typical of data warehouses, for which Redshift is the right service to use. Redshift Spectrum provides an interface to apply the power of Redshift to data stored in S3, where data could be in the order of exabytes (1 exabyte = 1000 petabytes). For real-time analysis on streaming data, Kinesis is the one to use.
Looking at the needs of developers, what does AWS offer? Daniel from GitHub gave a presentation on the integration of GitHub and AWS CodeDeploy. In the simplest case (without GitHub), code stored on S3 can be deployed on EC2. For the purpose of version control and collaboration, it makes sense to migrate the code to GitHub. CodeDeploy can then be configured to pick up a specific commit version of the repository and deploy. Hot deploy is also possible where GitHub can trigger a deploy based on a new commit. An alternative to using GitHub is to use AWS CodeCommit. For continuous delivery, there's AWS CodePipeline. You can even create your application on the cloud itself using AWS CodeStar.
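As a rough illustration of deploying a specific commit, the sketch below builds the request that CodeDeploy's CreateDeployment operation expects for a GitHub revision. The field names follow the CodeDeploy API as exposed through boto3; the application, deployment group, repository and commit ID are placeholders, and the helper function name is my own.

```python
def github_deployment_request(application, deployment_group, repository, commit_id):
    """Build keyword arguments for CodeDeploy's CreateDeployment call.

    All four argument values are placeholders supplied by the caller.
    """
    return {
        "applicationName": application,
        "deploymentGroupName": deployment_group,
        "revision": {
            "revisionType": "GitHub",
            "gitHubLocation": {"repository": repository, "commitId": commit_id},
        },
    }


# Hypothetical names; a real commit ID would be the full SHA of the commit.
request = github_deployment_request(
    "demo-app", "demo-group", "example-org/example-repo", "0123abc"
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("codedeploy").create_deployment(**request)
```

Building the request separately from sending it means the structure can be inspected or tested without touching an AWS account.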
In the world of ML and AI, developers can use Polly (text-to-speech) and Lex (speech recognition and language understanding for conversational interfaces). For image analysis, there's Rekognition. So developers could potentially use these to build their own flavour of Echo and Alexa. Or they could simply invoke the ML APIs that Amazon has to offer.
For security, AWS CloudTrail offers logging capability while Amazon CloudWatch gives alarms and notifications. AWS Shield guards against DDoS attacks. AWS Web Application Firewall (WAF) offers security at the application level. One important point is that security responsibility is shared between AWS and the customer. AWS takes care of security of the cloud while the customer must take care of security in the cloud.
Clearly, there's much more to AWS than what we have discussed thus far. I feel that I've just scratched the surface. New services are likely to come out. Existing services might get upgraded. New use cases will come up. When you start using the services, you will have questions. It will certainly not be an easy task but if you're trying to build a world-class application that needs to scale, be reliable, be cost-effective and offer high performance, perhaps the initial learning curve is well worth climbing.
About the Author
Arvind Padmanabhan graduated from the National University of Singapore with a master’s degree in electrical engineering. With fifteen years of experience, he has worked extensively on various wireless technologies including DECT, WCDMA, HSPA, WiMAX and LTE. He is passionate about training and is keen to build an R&D ecosystem and culture in India. He recently published a book on the history of digital technology: http://theinfinitebit.wordpress.com.