We’ve had a large number of people reach out over the past fortnight asking for help with their AWS costs.
The great thing about AWS charges is that the majority are variable costs and can be reduced without breaking contracts! In Enterprise environments we generally find an opportunity to reduce cloud costs by 40-50%!
We break the Cost Optimization process into 3 main areas:
The first step to getting your costs under control is visibility. We need to understand what makes up your AWS spend, so we can find trends that need to be investigated for optimization opportunities.
The second step is identifying cost optimization opportunities. These opportunities fall into 2 key areas:
- technical usage based solutions and
- commercial options to reduce your cloud spend.
The third step is automation! The more we automate these tasks, including notifying end users, the more people are aware of cloud costs, and that awareness helps ensure compliance with these best practices.
Here’s the process we generally advise our customers and partners to follow when optimizing an AWS bill:
1. Visibility: Use the AWS tools (AWS Cost Explorer or Kumolus Cost Reporting) to understand where your AWS spend is
- Break down your spend by service. Does this match your expectations? e.g. is Redshift higher than EC2?
- Match the patterns of your monthly costs by service, account, tag, etc. What growth are you seeing, and can it be explained? Is a service increasing month over month without explanation? Are you leaving resources (e.g. snapshots, volumes) around over time?
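As a sketch of the visibility step, the check below flags services whose month-over-month spend jumps beyond a threshold. The figures are made up for illustration; in practice you would pull the monthly totals from AWS Cost Explorer (e.g. the GetCostAndUsage API, grouped by SERVICE).

```python
def flag_growth(monthly_spend, threshold=0.20):
    """Return services whose latest month grew more than `threshold` vs the prior month."""
    flagged = {}
    for service, months in monthly_spend.items():
        if len(months) < 2 or months[-2] == 0:
            continue
        growth = (months[-1] - months[-2]) / months[-2]
        if growth > threshold:
            flagged[service] = round(growth, 2)
    return flagged

# Hypothetical monthly totals (oldest first)
spend = {
    "AmazonEC2": [1000.0, 1050.0, 1100.0],  # steady ~5% growth
    "AmazonS3":  [200.0, 210.0, 320.0],     # sudden jump worth investigating
}
print(flag_growth(spend))  # only S3 exceeds 20% month-over-month growth
```

Anything flagged becomes a candidate for the investigation described above: is the growth explained, or is it drift?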
2. Optimize: Identify Cost Optimization Opportunities and prioritize!
- Do you have stopped services?
- Are your services idle?
- Can your services be right-sized?
- Are you turning off non-production services outside of usage hours?
- Do you have more snapshots than you require?
- Do you use Reserved Instances and/or Savings Plans?
- Review your Architecture. Is there a better way to do things?
3. Automate: To ensure ongoing compliance with these policies
In our Enterprise customers, the large majority of spend is commonly consumed by the EC2, EBS, RDS, S3 and VPC services, so we’ll start with what to look for in each of these.
EC2 – Elastic Compute Cloud
Identify and Terminate Idle Instances – Ensure your EC2 instances are not idle. Use AWS CloudWatch statistics to look into the usage of your instances: CPU, memory (the CloudWatch Agent is required), network usage and disk I/O.
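A minimal sketch of the idle check: classify an instance as idle when an extended run of CloudWatch-style CPU datapoints never rises above a low threshold. The 5% threshold and two-week window are our illustrative assumptions, not AWS defaults, and a real check should also look at memory, network and disk I/O as noted above.

```python
def is_idle(cpu_datapoints, cpu_threshold=5.0, min_samples=336):
    """Idle if we have two weeks of hourly samples (336) all under the threshold."""
    return len(cpu_datapoints) >= min_samples and max(cpu_datapoints) < cpu_threshold

busy = [3.0] * 335 + [60.0]      # one real spike -> not idle
quiet = [2.0, 1.5, 3.9] * 112    # 336 samples, all under 5% -> idle candidate
print(is_idle(busy), is_idle(quiet))  # False True
```

Requiring a minimum sample count guards against declaring a freshly launched instance idle before it has any history.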
Legacy Instance Classes – Look through all your existing EC2 instances and ensure they are running on the latest instance classes. Older instance classes perform slower (~10%) and cost more (3-7%) than the latest generation. The migration path can sometimes take a little effort, but over a large fleet significant benefits can be attained – read AWS’s how-to documents for Windows and Linux.
RightSizing – Similar to idle instances, RightSizing is the process of looking at the AWS CloudWatch usage characteristics (CPU, memory (CloudWatch Agent required), network usage and disk I/O) for each EC2 instance and then reducing the size of the instance based on its usage, e.g. moving from an m5.4xlarge to an m5.xlarge (25% of the cost). RightSizing becomes more complex when you start to question whether you should be using some of the more specialized instance classes, like memory-optimized instances.
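The core RightSizing idea can be sketched as stepping down a size ladder while the observed peak still fits. The ladder, the 80% headroom rule and the CPU-only view are simplifying assumptions for illustration; a real decision should weigh memory, network and disk I/O too.

```python
SIZES = ["large", "xlarge", "2xlarge", "4xlarge"]  # capacity roughly doubles each step

def rightsize(current_size, peak_cpu_percent):
    """Step down while the observed peak would still sit under ~80% on the smaller box."""
    idx = SIZES.index(current_size)
    while idx > 0 and peak_cpu_percent * 2 <= 80:
        idx -= 1
        peak_cpu_percent *= 2  # halving capacity roughly doubles effective utilisation
    return SIZES[idx]

print(rightsize("4xlarge", 9))   # 9% -> 18% -> 36% -> 72%: three steps down to "large"
print(rightsize("2xlarge", 30))  # 30% -> 60%, a further step would hit 120%: "xlarge"
```

Because pricing within a family scales roughly linearly with size, each step down is about a 50% saving on that instance.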
Turn off Non-Production Workloads – Certain non-production workloads may only need to run ~40 hours per week. That’s less than a quarter of the week; in other words, you can save up to ~75% of the running cost of these instances. This can get a little tricky with a distributed workforce, but it can be managed.
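The ~75% figure above falls straight out of the hours: a week has 168 hours, so 40 on-hours avoids the remaining 128.

```python
def schedule_saving(on_hours_per_week):
    """Fraction of on-demand compute hours avoided by scheduling an instance off."""
    hours_per_week = 24 * 7  # 168
    return 1 - on_hours_per_week / hours_per_week

print(f"{schedule_saving(40):.0%}")  # 40 of 168 hours on -> ~76% of hours avoided
```

Note the saving applies to the instance’s compute charge only; attached EBS storage still bills while the instance is stopped.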
Remove Stopped Services – a common challenge we find is that people have stopped instances and don’t realize how much those instances still cost each month for the attached storage.
Commitments – Are you using Reserved Instances (RIs) or Savings Plans? RIs and Savings Plans let you commit to usage in return for a discount of up to 50%.
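Simple break-even maths makes the commitment decision concrete. The 50% rate mirrors the figure above and the $0.10/hour price is a hypothetical example; actual RI and Savings Plan discounts vary by term, payment option and service.

```python
def committed_vs_on_demand(on_demand_hourly, discount, hours):
    """Annualised cost comparison for a steady-state workload."""
    on_demand_cost = on_demand_hourly * hours
    committed_cost = on_demand_hourly * (1 - discount) * hours
    return on_demand_cost, committed_cost

hours_per_year = 24 * 365
od, ri = committed_vs_on_demand(0.10, 0.50, hours_per_year)
print(od, ri)  # a $0.10/hr instance: $876/yr on-demand vs $438/yr committed
```

The catch is utilisation: a commitment only pays off if the usage actually runs for most of the committed term, which is why the visibility step comes first.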
Food for thought?
Do you need this many EC2 instances? Can you revisit your architecture? Use Auto Scaling? Containers? Serverless?
EBS – Elastic Block Store
Correct Volume Types – are you using the correct volume type? Different types of volumes have different costs and usage characteristics.
General Purpose SSD (gp2) Volumes and Cold HDD (sc1) Volumes are the most cost effective volumes.
Provisioned IOPS SSD (io1) Volumes and Throughput Optimized HDD (st1) Volumes have specific use cases but are more expensive.
Reviewing the usage of these volumes with AWS CloudWatch data is very important and can save significant money.
Are you using that Provisioned IOPS? – One of the major cost optimization issues we see with many customers is the use of provisioned IOPS. Sometimes it’s set and forgotten; other times it was needed for testing or an incident and is no longer required. It can be a major savings opportunity. To identify unused provisioned IOPS, use the AWS CloudWatch usage data (read/write IOPS) and compare the provisioned IOPS against the IOPS actually consumed.
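The provisioned-vs-used comparison is just a peak calculation over the read and write IOPS series. The datapoints below are invented; in practice they would come from the CloudWatch VolumeReadOps/VolumeWriteOps metrics for the volume.

```python
def iops_utilisation(provisioned_iops, read_iops, write_iops):
    """Peak combined IOPS observed, as a fraction of what the volume is provisioned for."""
    peak = max(r + w for r, w in zip(read_iops, write_iops))
    return peak / provisioned_iops

util = iops_utilisation(10000, read_iops=[120, 300, 250], write_iops=[80, 200, 150])
print(f"{util:.0%}")  # peak 500 of 10,000 provisioned IOPS -> 5% utilised
```

A volume peaking at 5% of its provisioned IOPS is a strong candidate for a cheaper volume configuration.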
Detached Volumes – are you leaving volumes around? Do you have lots of volumes with the “available” status? These volumes are not attached to a server. You can also use AWS CloudWatch usage data to understand when they last performed reads/writes, i.e. when they were last used.
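Finding these is a simple status filter. With boto3 this would be `ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])`; the sketch below filters sample records shaped like that API’s response so the logic runs offline.

```python
def detached_volumes(volumes):
    """Return the IDs of volumes that are not attached to any instance."""
    return [v["VolumeId"] for v in volumes if v["State"] == "available"]

# Hypothetical volume records
sample = [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},  # not attached to anything
]
print(detached_volumes(sample))  # ['vol-bbb']
```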
Snapshots? How many snapshots do you keep for each service? Are you automatically managing retention periods with appropriate backup software or AWS Backup? Now’s the time to review your backups and make sure they are in line with your defined retention periods. Do periodic checks for any ad-hoc backups taken as part of one-off tasks.
Do you have Unused AMIs? – Review all private AMIs owned by your accounts on a periodic basis to ensure you are not leaving old AMIs you no longer need. We often find AMIs that are over 2,000 days old!
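An age check over AMI creation dates makes that periodic review automatable. The records below are hypothetical, but the `CreationDate` strings use the same ISO-8601 format boto3’s `describe_images` returns.

```python
from datetime import datetime, timedelta, timezone

def stale_amis(images, max_age_days=365, now=None):
    """Return IDs of AMIs created more than `max_age_days` ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        img["ImageId"]
        for img in images
        if datetime.fromisoformat(img["CreationDate"].replace("Z", "+00:00")) < cutoff
    ]

sample = [
    {"ImageId": "ami-old", "CreationDate": "2017-01-01T00:00:00.000Z"},
    {"ImageId": "ami-new", "CreationDate": "2024-01-01T00:00:00.000Z"},
]
print(stale_amis(sample, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))  # ['ami-old']
```

Remember that deregistering an AMI does not delete its backing snapshots, so clean those up too.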
RDS – Relational Database Service
Identify Idle Instances with no Connections and Terminate – Ensure your RDS instances are not idle. Use AWS CloudWatch statistics to look into the usage of your RDS instances: CPU, connection count, network usage and disk I/O.
RightSizing RDS – Similar to idle instances, RightSizing is the process of looking at the AWS CloudWatch usage characteristics (CPU, network usage and disk I/O) for each RDS instance and then reducing the size of the instance based on its usage, e.g. moving from a db.m5.4xlarge to a db.m5.xlarge (25% of the cost).
RDS Storage Auto Scaling – Make sure you enable RDS Storage Auto Scaling on your databases rather than provisioning (and paying for) additional capacity for growth!
RDS Instance Classes – Look through all your existing RDS instances and ensure they are running on the latest instance classes. Older instance classes perform slower (~10%) and cost more (3-7%) than the latest generation.
Food for Thought?
Open Source Over Commercial Databases – Easier said than done, but AWS has paved the way for this type of migration over the last few years. The AWS Database Migration Service is an awesome piece of technology which really simplifies the migration from one engine to another. Avoiding the licensing fees of commercial databases can be a major saving versus open source technologies!
RDS over self-managed databases, and serverless where possible – Remove the management headache: let’s not run databases on servers anymore (unless really needed), and let’s explore the serverless capabilities of Aurora PostgreSQL and MySQL!
S3 – Simple Storage Service
Remove unused buckets / data – you’d be surprised how many buckets are named _copy, _test, or somepersonsnamewholefttheorganisation3yearsback, so the simplest recommendation is to clean up what you’re no longer using. Also look out for multiple copies of Cost and Usage Reports, multiple copies of CloudTrail, logs, etc.
Enable Lifecycle Policies – Lifecycle policies allow you to automatically tier, archive or delete data. They are really important for ensuring you’re not retaining anything you don’t need to retain!
Use Intelligent Tiering – the key to using Intelligent-Tiering is knowing your dataset: who accesses it and when. If you don’t plan to access data within the next 30 days, you can turn on Intelligent-Tiering. Standard storage is $0.023 per GB/month versus Infrequent Access at $0.0125 per GB/month, a saving of about 45%. There is an additional fee per object to manage the tiering ($0.0025 per 1,000 objects).
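Using the prices quoted above, the net monthly saving is storage saving minus the per-object monitoring fee; the 10 TB / 1 million object workload is a made-up example, and real prices vary by region and change over time.

```python
def tiering_saving(gb, objects, standard=0.023, ia=0.0125, monitor_per_1k=0.0025):
    """Net monthly saving (USD) from data sitting in the Infrequent Access tier."""
    storage_saving = gb * (standard - ia)
    monitoring_fee = objects / 1000 * monitor_per_1k
    return storage_saving - monitoring_fee

# 10 TB of rarely-accessed data spread across 1 million objects
print(round(tiering_saving(10_240, 1_000_000), 2))  # ~$105 saved per month
```

The monitoring fee is per object, so buckets full of tiny objects benefit far less than buckets of large ones, which is why knowing your dataset matters.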
VPC – Virtual Private Cloud
Review Data Transfer – Data transfer costs need reviewing on a regular basis. Data transfer is not free within a VPC when it crosses Availability Zones.
Review your Architecture – Do specific services really need to run across multiple AZs?
VPC Endpoints and NAT Gateways – NAT Gateways incur both hourly and per-GB data processing charges. VPC Endpoints have great use cases (e.g. routing S3 or DynamoDB traffic through a gateway endpoint keeps it off the NAT Gateway), but interface endpoints can be expensive if used incorrectly.
The key to all of these recommendations is that they are completed on a regular basis. The more we automate, the more we save on an ongoing basis. This is where Kumolus comes in: we’ve put considerable effort into automating the resolution of these types of issues. We’ve built workflows which notify the appropriate service owners and allow them to opt out of the resolutions, enabling full self-service and automation. Click here for more information.