Last year we changed our EC2 system from long-running instances to spot instances requested on demand. This reduced our EC2 bill by 98%. It also ensured that every instance was built with the latest image and security patches and ran only as long as needed.
One of our biggest wins last year was auto-scaling. It’s the promise of the cloud, but it has taken us years to get to a place where we could use it (and perhaps for the cloud providers to get to a place where they really provided it). One place we applied this was our batch loading in AWS EC2. (See also our Kubernetes solution.)
We used to run long-running instances for batch processing jobs. This is clearly not ideal, but the hosting costs alone didn’t justify a change. However, long-running instances are hard to keep secure. Keeping them updated takes time (that may not be allocated to any specific task) and is prone to human error.
To address this I decided we must script the entire start-up process for the instances we were using for batch loading. We couldn’t justify the hourly cost of an instance-per-job until I looked into spot requests. We had already built resiliency into our batch loading to handle all sorts of network issues. If a spot instance were pre-empted because the spot price rose above our bid, our batch system would self-heal and ensure the data was loaded.
Using the fog gem, we changed our process to spin up an instance for each job (we use resque to handle jobs). Specifically, we addressed the following:
- Ensure that instances were launched in a VPC
- Ensure that the proper security group (i.e. firewall configuration) and keys were installed in AWS
- Initiate the request and wait for it to be approved
- If not approved in our timeout period, cancel the request
- If approved, assign an elastic IP for external access
- Install dependencies and code to run the batch import
- Sweep regularly for orphaned instances and elastic IP addresses due to random failures of the API or timeout issues
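The request, wait, and cancel steps above can be sketched roughly as follows. This is a minimal illustration and not our production code. The `client` object is a hypothetical adapter and is not fog's actual API: it is assumed to expose `request_state(id)` and `cancel_request(id)`. The state strings mirror EC2's spot request states ("open" while pending, "active" once fulfilled). The `clock` and `sleeper` parameters are also assumptions, added only so the loop can be exercised without real waiting.

```ruby
# Wait for a spot request to be fulfilled, cancelling it if it is not
# approved within the timeout period.
#
# `client` is a HYPOTHETICAL adapter (not the fog gem's real interface).
# It is assumed to respond to:
#   request_state(id)  -> String such as "open", "active", "failed"
#   cancel_request(id) -> cancels the outstanding spot request
def await_spot_request(client, request_id,
                       timeout: 300, poll_interval: 5,
                       clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                       sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout

  loop do
    state = client.request_state(request_id)

    # "active" means the request was fulfilled and an instance exists.
    return :fulfilled if state == "active"

    # Any state other than "open" means the request will never be fulfilled.
    return :failed unless state == "open"

    # Still pending: give up and cancel once the timeout has elapsed.
    if clock.call >= deadline
      client.cancel_request(request_id)
      return :cancelled
    end

    sleeper.call(poll_interval)
  end
end
```

A caller would go on to assign the elastic IP and install dependencies only when the result is `:fulfilled`. Anything left behind by a failure between these steps is what the regular sweep is there to catch.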
As you can see from the chart at top, there were two factors that helped us reduce costs. First, spot instance pricing is drastically lower than on-demand pricing, provided you can deal with the potential termination of your instance. Second was AWS’s change to per-second pricing, rather than per-hour pricing. Because many of our batch import jobs run for less than an hour, this helped a great deal in reducing both the hours we booked and the eventual cost. But far and away, the spot bid discount made the biggest difference.
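To make the billing effect concrete, here is a small arithmetic sketch. The hourly rate and job duration are made-up illustration values, not our actual prices or runtimes, and the model is simplified: it assumes per-hour billing rounds each run up to a full hour and it ignores any minimum charge under per-second billing.

```ruby
# Cost of a single job run under the two billing models.
# All inputs are HYPOTHETICAL illustration values.
def billed_cost(runtime_seconds, hourly_rate, per_second:)
  if per_second
    # Pay only for the seconds actually used.
    runtime_seconds * hourly_rate / 3600.0
  else
    # Pay for every started hour, however briefly it is used.
    (runtime_seconds / 3600.0).ceil * hourly_rate
  end
end

runtime = 20 * 60 # a 20-minute import job, in seconds
rate    = 0.10    # assumed price in dollars per instance-hour

hourly = billed_cost(runtime, rate, per_second: false)
second = billed_cost(runtime, rate, per_second: true)

puts format("per-hour billing:   $%.4f", hourly)
puts format("per-second billing: $%.4f", second)
```

Under these assumptions the 20-minute job costs a third as much with per-second billing, because two-thirds of the billed hour would otherwise go unused. The spot discount then multiplies on top of that saving.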