
Cloud Cost Optimization: Practical Strategies to Reduce AWS, Azure, and GCP Spending

Comprehensive guide to reducing cloud costs across AWS, Azure, and GCP with practical strategies, tools, and automation techniques.

Hari Prasad
October 11, 2024
5 min read
Cloud cost optimization is crucial for maintaining healthy profit margins while scaling your infrastructure. This guide provides actionable strategies to reduce cloud spending across AWS, Azure, and GCP without compromising performance or reliability.

The Cost Optimization Framework

1. Visibility (Know Your Costs; see the tagging sketch below)

2. Right-Sizing (Match Resources to Needs)

3. Reserved Capacity (Commit for Savings)

4. Automation (Optimize Continuously)

5. Governance (Control and Prevent Waste)
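
Visibility is where the framework starts: untagged resources cannot be attributed to a team or project. As a minimal sketch, assuming your resources already carry a team tag (a hypothetical key), you could activate it as a cost allocation tag so it shows up in Cost Explorer:

# Activate an existing resource tag as a cost allocation tag
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=team,Status=Active

# Verify which tag keys are active
aws ce list-cost-allocation-tags --status Active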

AWS Cost Optimization

Enable Cost Explorer and Budgets

# Create budget using AWS CLI
aws budgets create-budget \
  --account-id 123456789012 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json

// budget.json
{
  "BudgetName": "Monthly-Budget",
  "BudgetLimit": {
    "Amount": "10000",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
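
The budget command above also references a notifications file. A minimal notifications.json, assuming you want an email alert at 80% of actual spend (the address is a placeholder), could look like this:

// notifications.json
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "finops-team@example.com"
      }
    ]
  }
]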

EC2 Cost Optimization

1. Use Reserved Instances

# Analyze RI recommendations
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option ALL_UPFRONT

# Purchase Reserved Instance
aws ec2 purchase-reserved-instances-offering \
  --reserved-instances-offering-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --instance-count 5

Savings: Up to 72% compared to On-Demand

2. Use Savings Plans

# Get Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option ALL_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS

Savings: Up to 66% for flexible compute usage

3. Use Spot Instances

# spot-instance-template.yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-pod
spec:
  nodeSelector:
    kubernetes.io/lifecycle: spot
  tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
# Launch Spot Fleet
aws ec2 request-spot-fleet \
  --spot-fleet-request-config file://spot-fleet-config.json

Savings: Up to 90% compared to On-Demand
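
Spot capacity can be reclaimed with roughly two minutes' warning, so fault-tolerant workloads should watch for the interruption notice. A minimal sketch that polls the instance metadata service from inside the instance (the drain step is a placeholder):

#!/bin/bash
# Poll IMDSv2 for a Spot interruption notice every 5 seconds
while true; do
  TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  if curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
      http://169.254.169.254/latest/meta-data/spot/instance-action | grep -q action; then
    echo "Interruption notice received; draining workload..."
    # checkpoint/drain logic goes here (placeholder)
    break
  fi
  sleep 5
done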

4. Right-Size Instances

# analyze_instance_utilization.py
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')

def analyze_instance_utilization(instance_id, days=14):
    """Analyze EC2 instance CPU and memory utilization"""
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)
    
    # Get CPU utilization
    cpu_metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,
        Statistics=['Average', 'Maximum']
    )
    
    datapoints = cpu_metrics['Datapoints']
    if not datapoints:
        return "No CloudWatch data available"

    avg_cpu = sum(d['Average'] for d in datapoints) / len(datapoints)
    max_cpu = max(d['Maximum'] for d in datapoints)
    
    # Recommend action
    if avg_cpu < 10 and max_cpu < 40:
        return "Consider downsizing or terminating"
    elif avg_cpu < 25:
        return "Consider downsizing to smaller instance type"
    elif avg_cpu > 80:
        return "Consider upsizing"
    else:
        return "Instance is appropriately sized"

# Get all running instances
instances = ec2.describe_instances(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)

for reservation in instances['Reservations']:
    for instance in reservation['Instances']:
        recommendation = analyze_instance_utilization(instance['InstanceId'])
        print(f"{instance['InstanceId']}: {recommendation}")

S3 Cost Optimization

1. Lifecycle Policies

{
  "Rules": [
    {
      "Id": "Archive old logs",
      "Status": "Enabled",
      "Prefix": "logs/",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    },
    {
      "Id": "Delete incomplete multipart uploads",
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}

# Apply lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle-policy.json

2. Intelligent Tiering

# Enable Intelligent-Tiering
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id MyIntelligentTieringConfiguration \
  --intelligent-tiering-configuration file://intelligent-tiering.json
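
The referenced intelligent-tiering.json defines when objects move into the optional archive tiers. A minimal sketch, assuming 90 and 180 day archive windows:

// intelligent-tiering.json
{
  "Id": "MyIntelligentTieringConfiguration",
  "Status": "Enabled",
  "Tierings": [
    { "Days": 90, "AccessTier": "ARCHIVE_ACCESS" },
    { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
  ]
}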

RDS Cost Optimization

# Stop RDS instances during non-business hours
aws rds stop-db-instance --db-instance-identifier mydb

# Use Aurora Serverless for variable workloads
aws rds create-db-cluster \
  --db-cluster-identifier mydb-serverless \
  --engine aurora-postgresql \
  --engine-mode serverless \
  --scaling-configuration MinCapacity=2,MaxCapacity=16,AutoPause=true,SecondsUntilAutoPause=300

# Take snapshot and restore to smaller instance
aws rds create-db-snapshot \
  --db-instance-identifier mydb \
  --db-snapshot-identifier mydb-snapshot

aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier mydb-smaller \
  --db-snapshot-identifier mydb-snapshot \
  --db-instance-class db.t3.medium
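
Stopping instances during non-business hours only saves money if it happens reliably, which means scheduling it. A minimal sketch of a Lambda handler, triggered by an EventBridge cron rule, that stops every RDS instance carrying a hypothetical auto-stop=true tag:

# stop_tagged_rds.py (schedule with e.g. cron(0 22 ? * MON-FRI *))
import boto3

rds = boto3.client('rds')

def lambda_handler(event, context):
    stopped = []
    for db in rds.describe_db_instances()['DBInstances']:
        tags = rds.list_tags_for_resource(
            ResourceName=db['DBInstanceArn'])['TagList']
        wants_stop = any(
            t['Key'] == 'auto-stop' and t['Value'] == 'true' for t in tags)
        if wants_stop and db['DBInstanceStatus'] == 'available':
            rds.stop_db_instance(
                DBInstanceIdentifier=db['DBInstanceIdentifier'])
            stopped.append(db['DBInstanceIdentifier'])
    return {'stopped': stopped}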

Lambda Cost Optimization

# Optimize Lambda memory for cost/performance
import time
import boto3

lambda_client = boto3.client('lambda')

# x86 pricing in us-east-1 at time of writing; check current rates
PRICE_PER_GB_SECOND = 0.0000166667

def optimize_lambda_memory(function_name, payload=b'{}', invocations=10):
    """Test different memory configurations and return the cheapest"""
    memory_configs = [128, 256, 512, 1024, 2048]
    results = {}

    for memory in memory_configs:
        # Update function configuration and wait for the change to apply
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory
        )
        lambda_client.get_waiter('function_updated').wait(
            FunctionName=function_name
        )

        # Invoke repeatedly and average the duration
        # (wall-clock time, so it includes some network overhead)
        durations = []
        for _ in range(invocations):
            start = time.monotonic()
            lambda_client.invoke(FunctionName=function_name, Payload=payload)
            durations.append(time.monotonic() - start)
        avg_duration = sum(durations) / len(durations)

        # Cost per invocation scales with memory (GB) x duration (seconds)
        results[memory] = {
            'cost': (memory / 1024) * avg_duration * PRICE_PER_GB_SECOND
        }

    # Return the memory size with the lowest per-invocation cost
    return min(results, key=lambda m: results[m]['cost'])

Azure Cost Optimization

Enable Cost Management

# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Login
az login

# Create budget
az consumption budget create \
  --budget-name monthly-budget \
  --amount 10000 \
  --category cost \
  --time-grain monthly \
  --start-date 2024-01-01 \
  --end-date 2024-12-31

Virtual Machine Optimization

1. Reserved VM Instances

# View RI recommendations
az consumption reservation recommendation list \
  --scope "/subscriptions/{subscription-id}"

# Purchase reservation
az reservations reservation-order purchase \
  --reservation-order-id /providers/Microsoft.Capacity/reservationOrders/{order-id} \
  --sku Standard_D2s_v3 \
  --location eastus \
  --quantity 5 \
  --term P1Y

Savings: Up to 72%

2. Azure Spot VMs

# Create Spot VM
az vm create \
  --resource-group myResourceGroup \
  --name mySpotVM \
  --image Ubuntu2204 \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate

Savings: Up to 90%

3. Auto-Shutdown

# Configure auto-shutdown (time is in UTC, 24-hour HHmm format)
az vm auto-shutdown \
  --resource-group myResourceGroup \
  --name myVM \
  --time 1800

Azure Kubernetes Service (AKS) Optimization

# Enable cluster autoscaler
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# Use Spot node pools
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotnodepool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5 \
  --node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule
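
Because the Spot node pool above is tainted with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, only pods that tolerate that taint will be scheduled onto it. A minimal pod spec fragment for an interruption-tolerant workload:

# Fragment of a pod spec targeting the AKS Spot node pool
spec:
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  nodeSelector:
    kubernetes.azure.com/scalesetpriority: spot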

Storage Optimization

# Set access tier for blob storage
az storage blob set-tier \
  --account-name mystorageaccount \
  --container-name mycontainer \
  --name myblob \
  --tier Cool

# Enable lifecycle management
az storage account management-policy create \
  --account-name mystorageaccount \
  --policy @policy.json
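
The policy.json referenced above follows the storage account lifecycle management schema. A minimal sketch mirroring the S3 example earlier (the prefix and day counts are illustrative):

// policy.json
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-and-expire-logs",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["logs/"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "tierToArchive": {"daysAfterModificationGreaterThan": 90},
            "delete": {"daysAfterModificationGreaterThan": 365}
          }
        }
      }
    }
  ]
}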

GCP Cost Optimization

Compute Engine Optimization

1. Committed Use Discounts

# Get committed use discount recommendations
gcloud recommender recommendations list \
  --project=my-project \
  --location=us-central1 \
  --recommender=google.compute.commitment.UsageCommitmentRecommender

# Create commitment
gcloud compute commitments create my-commitment \
  --region=us-central1 \
  --resources=vcpu=100,memory=400GB \
  --plan=12-month

Savings: Up to 57%

2. Preemptible VMs

# Create preemptible instance
gcloud compute instances create preemptible-instance \
  --zone=us-central1-a \
  --machine-type=n1-standard-1 \
  --preemptible

Savings: Up to 80%

3. Right-Sizing Recommendations

# Get recommendations
gcloud recommender recommendations list \
  --project=my-project \
  --location=us-central1 \
  --recommender=google.compute.instance.MachineTypeRecommender

# Mark a recommendation as claimed (the etag comes from the list output)
gcloud recommender recommendations mark-claimed \
  RECOMMENDATION_ID \
  --project=my-project \
  --location=us-central1 \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --etag=ETAG

GKE Cost Optimization

# Enable node auto-provisioning
gcloud container clusters update my-cluster \
  --enable-autoprovisioning \
  --min-cpu=1 \
  --max-cpu=100 \
  --min-memory=1 \
  --max-memory=1000

# Create a Spot node pool
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10

Cloud Storage Optimization

# Set lifecycle policy
gsutil lifecycle set lifecycle.json gs://my-bucket

// lifecycle.json
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365}
      }
    ]
  }
}

Kubernetes Cost Optimization

Resource Requests and Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: optimized-app
  template:
    metadata:
      labels:
        app: optimized-app
    spec:
      containers:
      - name: app
        image: myapp:1.0
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"

Vertical Pod Autoscaler

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Cluster Autoscaler

# Tune scale-down behavior via cluster-autoscaler startup flags
# (container args on the cluster-autoscaler Deployment in kube-system)
spec:
  containers:
  - name: cluster-autoscaler
    command:
    - ./cluster-autoscaler
    - --scale-down-enabled=true
    - --scale-down-delay-after-add=10m
    - --scale-down-delay-after-delete=10s
    - --scale-down-delay-after-failure=3m
    - --scale-down-unneeded-time=10m

Cost Monitoring Tools

Kubecost

# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"

# Access dashboard
kubectl port-forward -n kubecost deployment/kubecost-cost-analyzer 9090
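
Beyond the dashboard, Kubecost exposes an allocation API you can script against. A minimal sketch, assuming the port-forward above is running, that pulls seven days of cost aggregated by namespace:

# Query 7 days of cost, aggregated by namespace
curl -s "http://localhost:9090/model/allocation?window=7d&aggregate=namespace"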

Infracost for Terraform

# Install Infracost
brew install infracost

# Authenticate
infracost auth login

# Show cost estimate
infracost breakdown --path .

# Compare changes
infracost diff --path . --compare-to infracost-base.json

# .github/workflows/infracost.yml
name: Infracost
on: [pull_request]

jobs:
  infracost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: infracost breakdown --path=. --format=json --out-file=infracost.json
      - uses: infracost/actions/comment@v1
        with:
          path: infracost.json
          behavior: update

Automation Scripts

AWS Cost Optimization Script

# aws_cost_optimizer.py
import boto3
from datetime import datetime, timedelta

ec2 = boto3.client('ec2')
cloudwatch = boto3.client('cloudwatch')

def find_idle_resources():
    """Find and report idle EC2 instances"""
    idle_instances = []
    
    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            
            # Check CPU utilization
            metrics = cloudwatch.get_metric_statistics(
                Namespace='AWS/EC2',
                MetricName='CPUUtilization',
                Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                StartTime=datetime.utcnow() - timedelta(days=7),
                EndTime=datetime.utcnow(),
                Period=86400,
                Statistics=['Average']
            )
            
            datapoints = metrics['Datapoints']
            if not datapoints:
                continue  # no CloudWatch data for this instance

            avg_cpu = sum(d['Average'] for d in datapoints) / len(datapoints)

            if avg_cpu < 5:
                idle_instances.append({
                    'InstanceId': instance_id,
                    'InstanceType': instance['InstanceType'],
                    'AvgCPU': avg_cpu,
                    'MonthlyCost': estimate_cost(instance['InstanceType'])
                })
    
    return idle_instances

def estimate_cost(instance_type):
    """Estimate monthly cost for instance type"""
    # Simplified pricing (use AWS Price List API for accurate pricing)
    pricing = {
        't3.micro': 7.5,
        't3.small': 15,
        't3.medium': 30,
        't3.large': 60,
        'm5.large': 70,
        'm5.xlarge': 140
    }
    return pricing.get(instance_type, 0)

# Find and report idle resources
idle = find_idle_resources()
total_waste = sum(i['MonthlyCost'] for i in idle)

print(f"Found {len(idle)} idle instances")
print(f"Potential monthly savings: ${total_waste:.2f}")

for instance in idle:
    print(f"{instance['InstanceId']} ({instance['InstanceType']}): {instance['AvgCPU']:.2f}% CPU, ${instance['MonthlyCost']:.2f}/month")

Cost Optimization Checklist

Compute

✅ Right-size instances based on utilization
✅ Use Reserved Instances/Savings Plans for stable workloads
✅ Use Spot/Preemptible instances for fault-tolerant workloads
✅ Enable auto-scaling
✅ Stop/terminate unused resources
✅ Use ARM-based instances (Graviton, Ampere)

Storage

✅ Implement lifecycle policies
✅ Delete unused snapshots and volumes
✅ Use appropriate storage tiers
✅ Enable compression and deduplication
✅ Review and remove old backups

Networking

✅ Optimize data transfer costs
✅ Use CDN for content delivery
✅ Review NAT Gateway usage
✅ Consolidate traffic paths

Database

✅ Right-size database instances
✅ Use read replicas instead of larger instances
✅ Consider serverless options
✅ Enable auto-pause for dev/test
✅ Use reserved capacity

Kubernetes

✅ Set resource requests and limits
✅ Use cluster autoscaler
✅ Implement pod autoscaling (HPA/VPA)
✅ Use Spot/Preemptible nodes
✅ Monitor with Kubecost

Governance

✅ Tag all resources
✅ Set up budgets and alerts
✅ Implement approval workflows
✅ Regular cost reviews
✅ Showback/chargeback to teams

Best Practices

  1. Visibility First: You can’t optimize what you can’t measure
  2. Automate Everything: Manual optimization doesn’t scale
  3. Culture of Cost Awareness: Make teams accountable
  4. Regular Reviews: Monthly cost optimization meetings
  5. Test in Lower Environments: Optimize dev/test first
  6. Monitor Continuously: Set up alerts for anomalies (example below)
  7. Document Decisions: Track why resources exist
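
For continuous monitoring (practice 6), AWS Cost Anomaly Detection alerts on unusual spend without fixed thresholds. A minimal sketch creating a per-service monitor and an email subscription (the address and impact threshold are placeholders):

# Create a per-service anomaly monitor
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName": "service-monitor", "MonitorType": "DIMENSIONAL", "MonitorDimension": "SERVICE"}'

# Subscribe to daily email alerts for anomalies above $100 of impact
aws ce create-anomaly-subscription \
  --anomaly-subscription '{"SubscriptionName": "daily-anomaly-alerts", "MonitorArnList": ["<monitor-arn>"], "Subscribers": [{"Type": "EMAIL", "Address": "finops-team@example.com"}], "Threshold": 100, "Frequency": "DAILY"}'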

Conclusion

Cloud cost optimization is an ongoing process, not a one-time project. By implementing these strategies—right-sizing, using reserved capacity, automating scale, and continuous monitoring—you can reduce costs by 30-50% while maintaining or improving performance.

What cost optimization strategies work best for you? Share in the comments!

Author

Hari Prasad

Seasoned DevOps Lead with 11+ years of expertise in cloud infrastructure, CI/CD automation, and infrastructure as code. Proven track record in designing scalable, secure systems on AWS using Terraform, Kubernetes, Jenkins, and Ansible. Strong leadership in mentoring teams and implementing cost-effective cloud solutions.
