Let's explore our options for deploying software systems in the cloud.
We've got the big 3: AWS, Google Cloud Platform and Azure. Maybe 5 if we include IBM Cloud and Oracle Cloud, which are catching up rapidly.
Then we've got the small and medium ones: Heroku, DigitalOcean, Linode, Vultr and more.
One good mental model for categorizing them is whether or not they provide hosted services.
Most of the big ones offer hosted app environments, hosted databases, email services, hosted queues, machine learning as a service, functions as a service, computer vision, data warehousing, hosted cache, etc.
Of the small and medium ones, only Heroku offers hosted services, and they are for the most part limited to app servers and databases.
DigitalOcean seems to be trying to get more service oriented with Spaces & Kubernetes (coming soon). The rest focus mostly on giving us bare-bones machines on which we install and manage whatever services we want ourselves, but they do provide very good value for the money.
AWS was the first platform to arrive in this space and offers the most comprehensive set of cloud tools of any provider. In this article we're going to explore the main services it has to offer.
Barebone machines - EC2/Lightsail
As with every other cloud provider mentioned above, bare machines are the basic thing AWS offers. If your software architecture requires them and you've got yourself a strong devops and ops team, this might be the way to go. The advantage is configurability: you can install and configure any service with whatever parameters you want. AWS is particularly good in this regard because it lets us choose from a wide range of machine configurations.
One of the complaints we've heard over the years is that you cannot choose the ratio of RAM to CPU; you can only pick from the predefined ones. Another is that EC2 instances are sometimes a bit more expensive than their counterparts on other clouds.
A lower-cost alternative from AWS is Lightsail, meant to compete with smaller vendors like DigitalOcean.
Elastic Beanstalk is a service where AWS hosts your applications without you having to configure the machine yourself. You can pick your programming language environment from a set of predefined ones: Java, .NET, Node.js, PHP, Go, Python, Ruby, and Docker.
The Docker option is particularly interesting because it lets you deploy any of the less mainstream languages for backend development, like Erlang/Elixir, Haskell, Crystal, Rust or OCaml.
The advantages of Elastic Beanstalk are obvious: you don't need to install packages on servers, keep the OS updated or apply security patches, and the only attack surface you have to worry about is your own code. On top of that, you can set up auto-scaling with Elastic Load Balancing, so you can sleep at night knowing this part of your system scales with load.
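To make the auto-scaling idea a bit more concrete, here's a minimal sketch of a proportional scaling rule in Python. This is not how Elastic Beanstalk is implemented; the function name, the CPU target and the group limits are all assumptions made up for illustration.

```python
import math


def desired_instances(current, observed_cpu, target_cpu, min_size, max_size):
    """Return how many instances to run so average CPU moves toward target_cpu.

    Hypothetical helper for illustration only; real auto-scaling policies
    also deal with cooldowns, health checks and more than one metric.
    """
    # Scale the fleet in proportion to how far the metric is from the target.
    wanted = math.ceil(current * observed_cpu / target_cpu)
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    # 4 instances at 90% CPU with a 60% target
    print(desired_instances(4, 90, 60, min_size=1, max_size=10))  # prints 6
```

The clamp at the end is the part that lets you sleep at night: a runaway metric can't scale the group past the ceiling you set.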
Keep in mind that Elastic Beanstalk still works on top of EC2, so sometimes you might still need to get your hands dirty and SSH into one of those machines to see what's happening.
Runtimes for functions as a service are getting better and better. With AWS Lambda you can now run your .NET, Java, Node.js, Go or Python application without actually having a server reserved just for you.
The runtime quickly spawns an instance as it receives events and shuts it down after a while. This way, Lambda can be a lot less expensive than renting an EC2 instance in some cases. It also scales by itself, so this part of your system shouldn't go down because of traffic spikes.
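Whether Lambda actually comes out cheaper depends on your traffic. Here's a rough back-of-the-envelope sketch in Python, assuming a bill made of a per-request fee plus a fee per GB-second of compute. Every price in it is a made-up placeholder, not an AWS list price, and the function names are ours.

```python
def lambda_monthly_cost(requests, avg_seconds, memory_gb,
                        price_per_request, price_per_gb_second):
    """Estimate a monthly FaaS bill: a per-request fee plus compute time."""
    compute = requests * avg_seconds * memory_gb * price_per_gb_second
    return requests * price_per_request + compute


def cheaper_option(requests, avg_seconds, memory_gb,
                   price_per_request, price_per_gb_second,
                   server_monthly_price):
    """Return 'lambda' or 'server', whichever costs less for this workload."""
    faas = lambda_monthly_cost(requests, avg_seconds, memory_gb,
                               price_per_request, price_per_gb_second)
    return "lambda" if faas < server_monthly_price else "server"


if __name__ == "__main__":
    # Placeholder prices, chosen only to show where the break-even flips.
    prices = dict(price_per_request=0.0000002, price_per_gb_second=0.00002)
    for monthly_requests in (1_000_000, 100_000_000):
        choice = cheaper_option(monthly_requests, 0.2, 0.5,
                                server_monthly_price=40.0, **prices)
        print(monthly_requests, choice)
```

With these placeholder numbers, the low-traffic workload favors Lambda and the high-traffic one favors a rented server, which is the "in some cases" part of the claim above.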
The only major downside of FaaS is startup time. Cold starts can be anything from 500ms for Go to 6s for Node.js and Python. Imagine what the startup time for a Java-based Spring application will be, as it has to instantiate the dependency injection container.
The user experience will certainly suffer, even if you keep functions warm by pinging them, because during a traffic spike the system has to spawn additional instances on the fly.
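The difference between a cold and a warm invocation can be sketched in a few lines of Python. The handler(event, context) shape is the one Lambda uses for Python functions; everything else here (the sleep standing in for slow setup, the response fields) is an assumption for illustration.

```python
import time

# Module-level code runs once per instance, during the cold start.
# The sleep stands in for slow setup such as loading a large framework.
_started = time.perf_counter()
time.sleep(0.05)
INIT_SECONDS = time.perf_counter() - _started

_invocations = 0


def handler(event, context):
    """Entry point in the (event, context) shape Lambda uses for Python."""
    global _invocations
    _invocations += 1
    return {
        "statusCode": 200,
        "cold_start": _invocations == 1,  # only the first call paid for init
        "init_seconds": round(INIT_SECONDS, 3),
        "echo": event.get("name", "world"),
    }


if __name__ == "__main__":
    print(handler({"name": "first"}, None))   # first call on a fresh instance
    print(handler({"name": "second"}, None))  # reuses the initialized instance
```

Pinging keeps an instance like this one initialized, but every extra instance spawned during a spike has to run the module-level setup again.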
Basically it's another case of reliability versus performance.
Containers - ECS/EKS
Containers are awesome. Why? Read about how the invention of shipping containers revolutionised worldwide transportation and global trade.
If you make the analogy and see how, until around 2015, software deployment was all over the place, with pieces not fitting together, you quickly understand that containers are the future. You get a nice abstraction of software, OS, and hardware configuration for free.
As opposed to FaaS, which might also seem like the future, containers don't actually present any downsides for user experience.
AWS was also one of the first movers to provide services in this space, even though Google was using containers internally long before.
What AWS provides here is a way to deploy and run containers with its own orchestrator (ECS) or with Kubernetes (EKS). One key technology from AWS you should keep an eye on is Fargate, which lets you run containers without having to manage servers or clusters.
Hosted databases have lots of advantages: automatic backups, redundancy, optional automatic scaling, logging, zero-config defaults and easy configuration options. The database is a crucial part of most systems, so never go cheap on it or neglect it. AWS provides:
RDS is AWS's hosted relational database service. Just a couple of clicks and you've got yourself a ready to rumble hosted database system.
You can choose from MySQL/MariaDB, PostgreSQL, Oracle, SQL Server or Aurora. Aurora is AWS's own flavor of MySQL/PostgreSQL that promises lower costs and greater scalability.
Even though we've seen lots of marketing for the newer database systems in the last couple of years, relational databases haven't fallen out of favor. They're a 30-40 year old technology that's battle tested and powers 90% of our systems. You can't go wrong picking relational, though sometimes it's not the absolute best option.
DynamoDB is AWS's hosted key-value and document database. It's a great option if you never want to do any maintenance or scaling yourself. It has some downsides though, like everything.
First, the development experience is not as good as you would get with MongoDB or an RDBMS.
Second, it's inexpensive when your system is small or medium. Once you cross a certain threshold, it might be a lot more expensive than anything else.
Third, it's not good for everything. Sometimes you resort to using Elasticsearch for what might've been a simple query in other database systems. AWS has a great document explaining when to use and when not to use DynamoDB. It's certainly not very general purpose, but extraordinary for some use cases.
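One way to picture the third point: a key-value store is fast when you ask for an item by its key, while a question about any other attribute needs a secondary index or a pass over the whole table. Here's a toy sketch in Python, with a plain dict standing in for a table. It doesn't use the AWS SDK, and the table contents and function names are hypothetical.

```python
# A plain dict stands in for a key-value table: partition key -> item.
# Table contents and attribute names are made up for illustration.
orders = {
    "order-1": {"customer": "ana", "total": 30},
    "order-2": {"customer": "bob", "total": 75},
    "order-3": {"customer": "ana", "total": 12},
}


def get_by_key(table, key):
    """Lookup by primary key: one direct access, independent of table size."""
    return table.get(key)


def scan_by_attribute(table, attribute, value):
    """Query on a non-key attribute: has to examine every item in the table."""
    return [key for key, item in table.items() if item.get(attribute) == value]


if __name__ == "__main__":
    print(get_by_key(orders, "order-2"))
    print(scan_by_attribute(orders, "customer", "ana"))
```

In an RDBMS the second question is a one-line WHERE clause, which is why it feels like a simple query; in a key-oriented store you have to design for it up front.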
As a designer/architect, always use the best tool for the job and do a good analysis of requirements beforehand. When not sure, fall back to an RDBMS.
Neptune, AWS's hosted graph database, is a 2018 development. We should be praising AWS for taking this step. If you've ever used Neo4j, you know how good graph databases can be. The development experience is great, and performance is very good when you're dealing with deep relations. Consider that in an RDBMS a join can cost anything from O(1) to O(n*m), depending on indexes and the join strategy. In a graph database, following a relationship is O(1), or it should be from a theoretical standpoint. Let's keep our eye on Neptune, because there's never been a better time to have graph databases at our disposal.
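Here's a toy comparison in Python of the two access patterns. It's only a sketch of the idea: real relational engines use indexes and smarter join strategies, and real graph databases differ in how they store edges. All names and data are made up.

```python
# Relational style: two "tables" as lists of rows.
people = [(1, "Ana"), (2, "Bob"), (3, "Cid")]
friendships = [(1, 2), (1, 3), (2, 3)]  # (person_id, friend_id)


def friends_via_join(person_id):
    """Naive nested-loop join: pairs every row with every row, O(n * m)."""
    return [name
            for left, right in friendships   # every row of one table...
            for pid, name in people          # ...against every row of the other
            if left == person_id and pid == right]


# Graph style: each node holds direct references to its neighbours.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []


ana, bob, cid = Node("Ana"), Node("Bob"), Node("Cid")
ana.friends = [bob, cid]
bob.friends = [cid]


def friends_via_graph(node):
    """Following an edge is a pointer hop, however large the graph is."""
    return [friend.name for friend in node.friends]


if __name__ == "__main__":
    print(friends_via_join(1))     # prints ['Bob', 'Cid']
    print(friends_via_graph(ana))  # prints ['Bob', 'Cid']
```

Both calls return the same friends; the difference is that the join's work grows with the size of both tables, while the graph traversal only touches the edges of the node you start from.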
Hosted cache (ElastiCache) is all about scale. Choose between Memcached and Redis as a backend, and you're set to save database hits and scale far more easily.
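"Saving database hits" usually means the cache-aside pattern: look in the cache first, and only go to the database on a miss. Here's a minimal sketch in Python, with a dict standing in for Redis or Memcached and a hypothetical loader function standing in for the database call. A real setup would also need expiry and invalidation.

```python
class CacheAside:
    """Check the cache first; hit the database only on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical slow database call
        self._cache = {}                   # dict stands in for Redis/Memcached
        self.db_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no database work
        self.db_hits += 1                  # cache miss: go to the database
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


if __name__ == "__main__":
    users = CacheAside(lambda user_id: {"id": user_id, "name": f"user-{user_id}"})
    for _ in range(3):
        users.get(42)
    print(users.db_hits)  # prints 1
```

Three reads, one database hit. That ratio is the whole point of putting a cache in front of your database.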
Data warehousing is again all about scale, this time scale of data. If you have so much data that you need a warehouse, AWS has got your back. Redshift is 100% hosted and built to handle petabyte-scale data.
S3 is the internet's darling. If S3 crashes, a big percentage of the internet goes down. It stores everything from websites and images to videos and database files. It's a very reliable service, auto-scales, and gets lots of love from developers all over the world.
Glacier is a low-cost solution for long-term storage of files/data.
Amazon MQ (ActiveMQ) and SQS are two services that you can use for your integration patterns. You'll be able to scale some parts of your system more easily, and queue work wherever you need to. As Apache Kafka is blowing up in popularity, Kafka as a service from AWS would be a nice thing to have, because ActiveMQ is antiquated and SQS is proprietary.
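The pattern behind both services is a producer putting messages on a queue and a pool of workers draining it at their own pace. Here's a small in-process sketch in Python, with the standard library's queue.Queue standing in for the hosted queue. The function name and the "processing" step are placeholders; a real system would talk to SQS or a broker over the network.

```python
import queue
import threading


def run_pipeline(messages, worker_count=2):
    """Push messages through a queue and process them with several workers."""
    work = queue.Queue()      # stands in for a hosted queue such as SQS
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            message = work.get()
            if message is None:          # sentinel: no more work for this worker
                work.task_done()
                break
            with lock:
                results.append(message.upper())  # placeholder for real processing
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for message in messages:             # the producer side
        work.put(message)
    for _ in threads:                    # one sentinel per worker
        work.put(None)
    work.join()
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_pipeline(["resize", "email", "invoice"], worker_count=3))
```

Scaling this part of the system means raising worker_count; the producer doesn't change at all, which is what makes queues such a convenient seam.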
With Google launching Firebase a couple of years ago, AWS finally decided to offer its counterpart, AppSync. Backed by Lambda and DynamoDB, it commits to HTTP data interaction through GraphQL. It seems most platforms are pushing 'serverless' / FaaS + managed databases, but don't commit to standardizing anything or open sourcing anything. As developers are skeptical people in general, we can see why this kind of service hasn't taken off. Vendor lock-in is a very serious problem.
API Gateway is another AWS service, one that mediates between external network requests and your internal infrastructure. It's highly configurable, but other than AWS's CloudFormation files, which are a mess to work with, there's no way to express your intent as code.
Route 53 is great for purchasing domains and routing them to your AWS resources like S3, Elastic Beanstalk, API Gateways, Load Balancers, etc.
Certificate Manager is one of the niftiest services from AWS. You can generate SSL/TLS certificates for your domains and assign them easily to your resources. The certificates can renew automatically, so you can rest assured you won't need to mess with Let's Encrypt tooling.
CloudWatch is AWS's cloud service for logs, events and metrics. It works well, and it's well integrated with most other AWS services.
How to write code for a good deployment experience?
How can Archbee help with your AWS based system?
Archbee provides an easy-to-adopt solution for documentation & knowledge sharing that is specifically tailored to software development teams. We've studied many teams and their workflows, identified common use cases and coded them into our software's DNA. When your team picks Archbee, you are guaranteed to see upsides like team happiness, synergy and productivity within less than 3 months of using our product.