EC2 cost optimization: a practical guide for engineering teams

EC2 is usually the largest line item on an AWS bill. This guide shows how to reduce EC2 costs systematically without impacting reliability.

EC2 is where the money goes

For most companies, EC2 represents 40-60% of total AWS spend. It is the first service teams adopt, the one with the most instance types, and the one where waste accumulates fastest. If you are going to optimize one thing on your AWS bill, start here.

EC2 cost optimization has five main levers, roughly ordered from easiest to most complex. Each one can be applied independently, and the savings compound.

Lever 1: Terminate stopped and idle instances

This is the lowest-hanging fruit. Stopped instances still incur charges for attached EBS volumes and Elastic IPs. Idle instances (running but doing nothing useful) cost full price for zero value.

How to identify idle instances

An instance is likely idle if it meets all of these criteria for 14+ consecutive days:

✓Average CPU utilization under 5%
✓Network in/out under 5MB per day
✓No SSH/RDP logins (check the OS login logs; CloudTrail records Session Manager sessions but not direct SSH or RDP connections)
✓No associated load balancer receiving traffic

Check these metrics in CloudWatch. For a quick scan, go to EC2 > Instances, select an instance, and open the Monitoring tab. If the CPU graph is a flat line near zero for two weeks, the instance is probably idle; confirm the other criteria before acting on it.
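The four criteria combine into a single check. The sketch below assumes you have already exported one summary row per instance per day (for example from CloudWatch's CPUUtilization, NetworkIn and NetworkOut metrics, the OS login logs, and your load balancer's request metrics). The `DailyStats` type and its field names are hypothetical, and "5MB" is taken to mean 5,000,000 bytes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DailyStats:
    """One day of activity for one instance (hypothetical export format)."""
    avg_cpu_percent: float   # daily average of CPUUtilization
    network_bytes: int       # NetworkIn + NetworkOut for the day
    interactive_logins: int  # SSH/RDP/Session Manager sessions that day
    lb_requests: int         # requests a load balancer routed to this instance


def is_likely_idle(
    days: Sequence[DailyStats],
    min_days: int = 14,
    cpu_limit: float = 5.0,
    network_limit_bytes: int = 5_000_000,
) -> bool:
    """True if every one of the most recent `min_days` days meets all idle criteria."""
    if len(days) < min_days:
        return False  # not enough history to judge
    recent = days[-min_days:]
    return all(
        d.avg_cpu_percent < cpu_limit
        and d.network_bytes < network_limit_bytes
        and d.interactive_logins == 0
        and d.lb_requests == 0
        for d in recent
    )
```

A single busy day inside the window (a CPU spike, one login) is enough to keep the instance off the list, which matches the "all criteria, 14+ consecutive days" rule.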

Common sources of idle EC2

✓Dev/test instances left running after a sprint ends
✓Bastion hosts for environments nobody accesses anymore
✓Jenkins workers from a CI system that was replaced
✓Instances from A/B tests or feature flags that were resolved
✓Demo instances created for a sales call months ago

What to do

For stopped instances: create an AMI (the AMI itself costs nothing to keep; you pay only for the EBS snapshots behind it), then terminate. For idle running instances: stop first, wait 7 days to confirm nobody complains, then terminate. This two-step process catches cases where an instance looks idle but actually handles periodic batch jobs, as long as the job runs within the waiting period.
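The two paths can be written as a small decision function. This is a sketch under stated assumptions: `next_step` is a hypothetical helper, `stopped_on` is the date *your cleanup process* stopped the instance (not something EC2 reports in this form), and the state strings are EC2's instance state names.

```python
from datetime import date, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # wait this long after stopping before terminating


def next_step(state: str, looks_idle: bool, stopped_on: Optional[date], today: date) -> str:
    """Suggest the next cleanup action for one instance.

    state:      EC2 instance state name, e.g. "running" or "stopped"
    stopped_on: date our cleanup stopped it, or None if it was already stopped
    """
    if state == "running":
        # Idle running instances are stopped first, never terminated directly.
        return "stop" if looks_idle else "keep"
    if state == "stopped":
        if stopped_on is None:
            # Stopped before the review started: keep an AMI as a safety net.
            return "create AMI, then terminate"
        if today - stopped_on >= GRACE_PERIOD:
            # Nobody restarted it during the grace period.
            return "terminate"
        return "wait"
    return "keep"  # pending, stopping, etc.: revisit on the next run
```

If someone restarts an instance during the grace period, it reappears as "running" and is re-evaluated from scratch on the next pass.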

Lever 2: Right-size running instances

Right-sizing means matching instance size to actual workload. A t3.large running at 8% CPU should be a t3.medium (or even t3.small). The savings are proportional — going from large to medium cuts the cost roughly in half.

How to identify right-sizing candidates

Look for instances where peak CPU (the CloudWatch Maximum statistic, not Average) stays below 40% over a 14-day period. Peak matters because you need headroom for traffic spikes. Dropping one size usually halves the vCPU count, so a 40% peak becomes roughly 80% on the smaller instance: tighter, but still below saturation for most workloads.

Also check memory utilization if you have the CloudWatch agent installed. Some workloads are memory-bound rather than CPU-bound. An instance at 5% CPU but 90% memory is correctly sized — do not downsize it based on CPU alone.
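The CPU and memory checks can be combined so that a memory-bound instance is never flagged on CPU alone. The sketch applies the 40% peak rule to CPU as described above; applying the same 40% limit to memory is an assumption, based on one step down in the same family also halving RAM. Peak memory comes from the CloudWatch agent (for example its `mem_used_percent` metric on Linux) and is `None` when the agent is not installed. The function name and return strings are hypothetical.

```python
from typing import Optional

PEAK_LIMIT = 40.0  # peak utilization (%) below which one step down is considered safe


def rightsizing_verdict(peak_cpu: float, peak_memory: Optional[float]) -> str:
    """Classify an instance from its 14-day peak CPU and memory utilization (%)."""
    if peak_memory is not None and peak_memory >= PEAK_LIMIT:
        # Memory-bound: a smaller size in the same family halves RAM as well.
        return "keep (memory-bound)"
    if peak_cpu >= PEAK_LIMIT:
        return "keep (needs CPU headroom)"
    if peak_memory is None:
        # Low CPU, but memory is unknown: do not downsize on CPU alone.
        return "candidate (check memory first)"
    return "candidate (downsize one step)"
```

The 5% CPU / 90% memory instance from the paragraph above comes out as "keep (memory-bound)".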

Right-sizing in practice

✓t3.large (8GB, 2 vCPU) at $0.0832/hr → t3.medium (4GB, 2 vCPU) at $0.0416/hr = 50% savings
✓m5.xlarge (16GB, 4 vCPU) at $0.192/hr → m5.large (8GB, 2 vCPU) at $0.096/hr = 50% savings
✓r5.2xlarge (64GB, 8 vCPU) at $0.504/hr → r5.xlarge (32GB, 4 vCPU) at $0.252/hr = 50% savings
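Converting these hourly prices into monthly savings is simple arithmetic. The sketch uses the on-demand prices quoted above and assumes 730 hours per month, the convention AWS uses in its pricing examples; the function name is hypothetical.

```python
from typing import Tuple

HOURS_PER_MONTH = 730  # average hours in a month

# On-demand hourly prices from the examples above.
HOURLY_PRICE = {
    "t3.large": 0.0832, "t3.medium": 0.0416,
    "m5.xlarge": 0.192, "m5.large": 0.096,
    "r5.2xlarge": 0.504, "r5.xlarge": 0.252,
}


def rightsizing_savings(current: str, target: str, count: int = 1) -> Tuple[float, float]:
    """Return (monthly dollars saved, percent saved) for resizing `count` instances."""
    old, new = HOURLY_PRICE[current], HOURLY_PRICE[target]
    monthly = (old - new) * HOURS_PER_MONTH * count
    percent = 100 * (1 - new / old)
    return round(monthly, 2), round(percent, 1)
```

For example, `rightsizing_savings("m5.xlarge", "m5.large", count=10)` returns `(700.8, 50.0)`: about $700 a month for ten instances.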

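AWS Compute Optimizer provides right-sizing recommendations based on your actual usage patterns, at no additional charge for its default recommendations (the optional enhanced infrastructure metrics feature is paid). It is worth enabling even if you do not act on every recommendation, because it gives you a prioritized list of instances to review.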

Lever 3: Savings Plans and Reserved Instances

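Once you have eliminated waste and right-sized your fleet, the remaining instances are ones you actually need. For these, Savings Plans offer discounts of roughly 30-66% (up to 72% for EC2 Instance Savings Plans) in exchange for a 1- or 3-year commitment to a consistent amount of spend per hour. Reserved Instances offer similar discounts but tie the commitment to specific instance attributes such as instance type and Region, which makes Savings Plans the more flexible default.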

How Savings Plans work

You commit to a dollar amount per hour (e.g., $0.50/hr = $365/month at 730 hours). Eligible usage is billed at the discounted Savings Plan rates, and those discounted charges draw down the commitment. Usage beyond the commitment is billed at on-demand rates. If your usage drops below the commitment, you still pay the full committed amount.
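One hour of Savings Plans billing can be modelled in a few lines. This is a simplification: it assumes a single flat discount rate, whereas real Savings Plan rates differ by instance family, Region, OS and term. Usage is expressed as what it would cost at on-demand rates, and the function name is hypothetical.

```python
def savings_plan_hourly_cost(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Cost for one hour under a Savings Plan, assuming one flat discount rate.

    on_demand_usage: what this hour's eligible usage would cost at on-demand rates
    commitment:      committed spend per hour, measured at Savings Plan rates
    discount:        fraction off on-demand, e.g. 0.30 for 30%
    """
    discounted = on_demand_usage * (1 - discount)
    if discounted <= commitment:
        # Usage fits inside the commitment; the unused part is still billed.
        return commitment
    # The commitment covers this much on-demand-equivalent usage...
    covered_on_demand = commitment / (1 - discount)
    # ...and everything above it is billed at on-demand rates.
    return commitment + (on_demand_usage - covered_on_demand)
```

With a $0.50/hr commitment and a 50% discount, an hour with $0.60 of on-demand-equivalent usage still costs $0.50, while an hour with $2.00 costs $1.50 ($0.50 covers the first $1.00; the remaining $1.00 is on-demand).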

How much to commit

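A conservative rule of thumb: commit to 70% of your lowest monthly EC2 spend over the past 6 months. As long as usage stays above that historical floor, you will not pay for unused commitment, and you still capture significant savings on your baseline workload. Because the commitment applies hour by hour, also check that overnight and weekend usage stays above it.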
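The rule translates into an hourly figure by dividing by 730 hours. A detail worth noticing: the monthly spend history is at on-demand rates, while the commitment is measured at discounted rates, so at a ~30% discount, 70% of the on-demand floor already covers roughly the whole floor. The function name and the spend figures in the example are hypothetical.

```python
from typing import Sequence

HOURS_PER_MONTH = 730


def suggested_commitment(monthly_ec2_spend: Sequence[float], fraction: float = 0.70) -> float:
    """Hourly commitment: `fraction` of the lowest monthly spend in the last six months."""
    if len(monthly_ec2_spend) < 6:
        raise ValueError("need at least six months of spend history")
    floor = min(monthly_ec2_spend[-6:])  # lowest month = safest baseline
    return round(floor * fraction / HOURS_PER_MONTH, 2)
```

For six months of spend of $10,000, $9,500, $9,800, $10,200, $9,000 and $9,700, the floor is $9,000 and the suggested commitment is about $8.63/hr.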

✓1-year No Upfront: ~30% discount, most flexible
✓1-year All Upfront: ~36% discount, requires cash upfront
✓3-year No Upfront: ~50% discount, long commitment
✓3-year All Upfront: ~60-66% discount, maximum savings but least flexible

Start with 1-year No Upfront for your first commitment. You can always add more later as you gain confidence in your baseline usage.

Lever 4: Spot instances for fault-tolerant workloads

Spot instances offer discounts of up to 90% compared to on-demand pricing. The tradeoff: AWS can reclaim them with two minutes' notice when it needs the capacity back. This makes them unsuitable for stateful services but excellent for workloads that can handle interruption.
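EC2 publishes the two-minute warning through the instance metadata service: the `spot/instance-action` path returns 404 until an interruption is scheduled, then a small JSON document with the action (`terminate`, `stop` or `hibernate`) and its time. The sketch polls that path with IMDSv2 session tokens every 5 seconds using only the standard library. The `watch` and `on_interrupt` names are hypothetical, what the callback does (drain connections, write a checkpoint) depends on your workload, and the network calls only work on an EC2 instance.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds: int = 300) -> str:
    """Fetch an IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def parse_instance_action(body: str) -> Tuple[str, str]:
    """Extract (action, time) from the spot/instance-action JSON document."""
    doc = json.loads(body)
    return doc["action"], doc["time"]


def pending_interruption(token: str) -> Optional[Tuple[str, str]]:
    """Return (action, time) if an interruption is scheduled, else None."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # 404 means no interruption is scheduled
        raise


def watch(on_interrupt: Callable[[str, str], None], poll_seconds: float = 5.0) -> None:
    """Poll until an interruption notice appears, then hand it to on_interrupt."""
    while True:
        notice = pending_interruption(imds_token())
        if notice is not None:
            on_interrupt(*notice)
            return
        time.sleep(poll_seconds)
```

Running `watch` in a background thread of a batch worker gives it most of the two minutes to checkpoint before the instance goes away.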

Good candidates for Spot

✓CI/CD build agents (Jenkins, GitHub Actions self-hosted runners)
✓Batch processing jobs (data pipelines, ETL, video encoding)
✓Dev/test environments (interruption just means a restart)
✓Stateless web servers behind a load balancer (with enough on-demand baseline)
✓Machine learning training jobs with checkpointing

Bad candidates for Spot

✓Databases (data loss risk on interruption)
✓Single-instance applications with no redundancy
✓Long-running stateful processes that cannot checkpoint
✓Production API servers without sufficient on-demand baseline

A common pattern is to run your baseline on on-demand instances (covered by Savings Plans where possible) and use Spot for burst capacity. For example, an Auto Scaling group with 3 on-demand instances as a baseline and up to 7 Spot instances for peak traffic.
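The 3 on-demand plus up to 7 Spot example maps onto an EC2 Auto Scaling group with a mixed instances policy. The sketch only builds the request parameters for the Auto Scaling `CreateAutoScalingGroup` API (boto3's `create_auto_scaling_group`). The group name, launch template name, subnet IDs and instance types are placeholders, and `price-capacity-optimized` is one of several Spot allocation strategies AWS offers, chosen here as an assumption.

```python
from typing import Any, Dict, List


def mixed_asg_request(
    name: str,
    launch_template: str,
    subnets: List[str],
    instance_types: List[str],
    on_demand_base: int = 3,
    max_spot: int = 7,
) -> Dict[str, Any]:
    """Parameters for an ASG with an on-demand baseline and Spot above it."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": on_demand_base,                # never scale below the on-demand baseline
        "MaxSize": on_demand_base + max_spot,     # baseline plus burst capacity
        "VPCZoneIdentifier": ",".join(subnets),   # comma-separated subnet IDs
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # Several interchangeable types give Spot more capacity pools to draw from.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": 0,  # everything above base is Spot
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }
```

With boto3 installed, the result would be passed as `boto3.client("autoscaling").create_auto_scaling_group(**request)`.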

Lever 5: Schedule non-production environments

Dev and staging environments typically run 24/7 but are only used during business hours — roughly 10 hours per day, 5 days per week. That is 50 hours of use out of 168 hours per week, meaning 70% of the cost is wasted.

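AWS Instance Scheduler or a simple Lambda function can stop non-production instances outside business hours and start them again in the morning. For a staging environment costing $500/month in instance hours, scheduling saves about $350/month without affecting developers during working hours. Attached EBS volumes are still billed while instances are stopped, so the saving applies to compute only.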
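Whichever tool runs the schedule, the core is a time-window check plus the savings arithmetic. The sketch covers only those two parts; the 08:00 to 18:00 weekday window is a hypothetical choice matching the 10-hours-a-day example, and the caller is assumed to pass a datetime already converted to the team's local time. In a Lambda, the result would drive EC2's StopInstances and StartInstances calls (boto3's `stop_instances` and `start_instances`) for instances carrying a tag of your choosing.

```python
from datetime import datetime, time

WORK_START = time(8, 0)   # hypothetical working hours, team-local time
WORK_END = time(18, 0)    # 10 hours per weekday
HOURS_PER_WEEK = 168


def should_be_running(local_now: datetime) -> bool:
    """True during Monday-Friday working hours (weekday() is 0 for Monday)."""
    return local_now.weekday() < 5 and WORK_START <= local_now.time() < WORK_END


def monthly_scheduling_savings(monthly_compute_cost: float, hours_on_per_week: float = 50) -> float:
    """Instance-hour cost avoided by running only `hours_on_per_week` hours."""
    return monthly_compute_cost * (1 - hours_on_per_week / HOURS_PER_WEEK)
```

`monthly_scheduling_savings(500)` comes out at about $351, in line with the ~$350 figure above.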

Putting it all together

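Applied in order, these levers compound. One exception to the order: scheduling non-production environments lowers your overnight and weekend usage, so do it before you size a Savings Plans commitment.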

1. Terminate idle instances: saves 10-20% immediately
2. Right-size remaining instances: saves another 20-30%
3. Apply Savings Plans to baseline: saves 30-50% on what remains
4. Use Spot for burst: saves 60-90% on variable workloads
5. Schedule non-production: saves 70% on dev/staging

A team spending $10,000/month on EC2 can realistically reduce to $4,000-$5,000/month by applying all five levers. The first two (terminate and right-size) require no commitment and can be done this week.
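Because each percentage applies to what remains, the savings multiply rather than add: 10%, 20% and 30% cut the bill by 49.6%, not 60%. The sketch below (a hypothetical helper) applies levers in sequence; each lever is a pair of the fraction saved and the share of the *current* spend it applies to, so Spot or scheduling can be limited to part of the fleet.

```python
from typing import Iterable, Tuple


def compound_savings(monthly_spend: float, levers: Iterable[Tuple[float, float]]) -> float:
    """Apply savings levers in order and return the remaining monthly spend.

    Each lever is (fraction_saved, share_of_current_spend_it_applies_to).
    """
    for fraction_saved, share in levers:
        monthly_spend -= monthly_spend * share * fraction_saved
    return monthly_spend
```

Applying only the low end of the first three levers to the whole fleet, `compound_savings(10_000, [(0.10, 1.0), (0.20, 1.0), (0.30, 1.0)])` leaves about $5,040 per month.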

Automate detection with Driftak

The hardest part of EC2 optimization is not knowing what to do — it is knowing which instances to act on. Manually checking CloudWatch metrics for every instance does not scale past 20-30 instances.

Driftak automates the detection step. It monitors CPU utilization, network activity, and connection patterns across all your EC2 instances and flags idle or underutilized ones with estimated monthly waste. Alerts escalate through Slack, email, and Telegram so problems get attention before the next bill.

Connect your AWS account in 5 minutes with read-only access. No agents, no code changes, no ongoing maintenance. Your first scan results appear immediately.