KEDA (Kubernetes Event-driven Autoscaling) revolutionizes how we handle application scaling in Kubernetes by enabling event-driven autoscaling based on external metrics. In this comprehensive guide, we’ll explore advanced KEDA implementation on Amazon EKS with real-world examples covering multiple scaling scenarios, enterprise patterns, and production-ready configurations.
Table of Contents
- Why KEDA?
- Architecture Deep Dive
- Prerequisites and Setup
- KEDA Installation on EKS
- Advanced Scaling Use Cases
- Enterprise Patterns
- Performance Optimization
- Security and Compliance
- Monitoring and Observability
- Troubleshooting and Debugging
- Production Deployment
- Cost Optimization Strategies
Why KEDA?
KEDA builds on the built-in Horizontal Pod Autoscaler (HPA) rather than replacing it, and addresses its limitations by providing:
- Event-Driven Scaling: Scale based on external events, not just CPU/memory
- Zero-to-N Scaling: Scale from 0 to N pods based on actual demand
- Rich Metrics Support: 50+ scalers for various data sources
- Cost Optimization: Scale to zero when no events are present
- Production Ready: Used by major enterprises worldwide
- Cloud Agnostic: Works across AWS, Azure, GCP, on-premises, and hybrid environments
- Intelligent Scaling: Advanced algorithms for smooth scaling decisions
KEDA vs Traditional HPA
Feature | KEDA | HPA |
---|---|---|
Scaling Triggers | External events, custom metrics | CPU, memory, custom metrics |
Zero Scaling | ✅ Yes | ❌ No |
Scaler Types | 50+ built-in scalers | Resource metrics (custom/external metrics require separate adapters) |
Cost Optimization | Excellent | Limited |
Event Sources | Queues, databases, APIs | Resource utilization |
Learning Curve | Moderate | Easy |
Architecture Deep Dive
KEDA Components
KEDA consists of several key components working together:
graph TB
subgraph "KEDA Architecture"
A[KEDA Operator] --> B[Metrics Server]
A --> C[Webhooks]
A --> D[ScaledObject Controller]
A --> E[ScaledJob Controller]
F[External Data Sources] --> A
G[Kubernetes API] --> A
H[Prometheus] --> A
I[Cloud APIs] --> A
A --> J[Deployment/StatefulSet]
A --> K[Job/CronJob]
L[Monitoring Stack] --> B
M[Grafana] --> L
N[AlertManager] --> L
end
Core Components Explained
1. KEDA Operator
- Purpose: Main controller managing ScaledObjects and ScaledJobs
- Responsibilities:
- Monitoring external metrics
- Making scaling decisions
- Updating Kubernetes resources
- Managing authentication
2. Metrics Server
- Purpose: Exposes external metrics to Kubernetes HPA
- Function: Translates KEDA metrics to HPA-compatible format
- Integration: Works with Kubernetes metrics API
3. Webhooks
- Purpose: Validates ScaledObject and ScaledJob configurations
- Function: Ensures proper configuration before scaling decisions
- Security: Prevents invalid scaling configurations
4. ScaledObject Controller
- Purpose: Manages scaling of Deployments and StatefulSets
- Features:
- Zero-to-N scaling
- Multiple trigger support
- Advanced scaling algorithms
5. ScaledJob Controller
- Purpose: Creates and manages Kubernetes Jobs in response to events (see the sketch after this list)
- Features:
- Job-based scaling
- Batch processing optimization
- Resource cleanup
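Every example later in this guide uses ScaledObject, so here is a minimal ScaledJob sketch for contrast. It is illustrative only: the queue URL, the worker-image:latest image, and the keda-aws-credentials TriggerAuthentication are assumptions borrowed from the SQS examples further down.
# Minimal ScaledJob sketch: spawn Jobs to drain a queue (illustrative values)
kubectl apply -f - << 'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
  namespace: default
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: worker
          image: worker-image:latest   # assumed image
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/batch-queue
      queueLength: "5"
      awsRegion: us-west-2
    authenticationRef:
      name: keda-aws-credentials
EOF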
Scaling Decision Flow
sequenceDiagram
participant KEDA as KEDA Operator
participant MS as Metrics Server
participant HPA as HPA Controller
participant K8s as Kubernetes API
participant App as Application Pods
KEDA->>MS: Poll external metrics
MS-->>KEDA: Return metric values
KEDA->>KEDA: Calculate desired replicas
KEDA->>HPA: Update HPA with metrics
HPA->>K8s: Scale deployment
K8s->>App: Create/destroy pods
App-->>KEDA: Report processing status
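You can observe this flow on a live cluster: KEDA creates and owns an HPA for each ScaledObject (named keda-hpa-<scaledobject-name>), and that HPA reads the external metrics API registered by KEDA's metrics server. A quick sketch, assuming the sqs-basic-scaler ScaledObject from later in this guide exists:
# Inspect the HPA that KEDA manages for a ScaledObject
kubectl get hpa -n default
kubectl describe hpa keda-hpa-sqs-basic-scaler -n default

# Confirm the external metrics API is served by KEDA
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"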
Prerequisites and Setup
System Requirements
Before we begin, ensure you have:
- EKS Cluster: Version 1.21+ (recommended 1.25+)
- Node Groups: At least 2 nodes with 4+ vCPUs and 8GB+ RAM
- kubectl: Version 1.21+ configured
- AWS CLI: Version 2.x configured
- Helm: Version 3.x installed
- Terraform: Optional, for infrastructure management
- Go: Version 1.19+ (for custom scalers)
EKS Cluster Configuration
# Create EKS cluster with proper configuration
eksctl create cluster \
--name keda-demo-cluster \
--version 1.25 \
--region us-west-2 \
--nodegroup-name workers \
--node-type m5.large \
--nodes 3 \
--nodes-min 2 \
--nodes-max 10 \
--managed \
--with-oidc \
--ssh-access \
--ssh-public-key ~/.ssh/id_rsa.pub \
--enable-ssm
# Verify cluster status
kubectl get nodes
kubectl get pods -A
Required IAM Permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sqs:GetQueueAttributes",
"sqs:ListQueues",
"s3:GetObject",
"s3:ListBucket",
"rds:DescribeDBInstances",
"elasticache:DescribeCacheClusters",
"kafka:ListClusters",
"kafka:DescribeCluster"
],
"Resource": "*"
}
]
}
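Rather than relying on broad AWS-managed policies, you can register this document as a customer-managed policy and attach it to the role KEDA assumes. A sketch (the policy name, file name, and account ID are placeholders; keda-operator-role is created later in this guide):
# Save the JSON above as keda-scaler-policy.json, then:
aws iam create-policy \
  --policy-name keda-scaler-policy \
  --policy-document file://keda-scaler-policy.json

# Attach it to the KEDA operator role once the role exists
aws iam attach-role-policy \
  --role-name keda-operator-role \
  --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/keda-scaler-policy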
Environment Setup Script
#!/bin/bash
# setup-keda-environment.sh
set -e
# Variables
CLUSTER_NAME="keda-demo-cluster"
REGION="us-west-2"
NAMESPACE="keda"
echo "Setting up KEDA environment..."
# Install required tools
echo "Installing required tools..."
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Install eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
# Verify installations
kubectl version --client
helm version
eksctl version
echo "Environment setup complete!"
Network Configuration
# vpc-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: keda-network-config
namespace: keda
data:
# Enable pod-to-pod communication
pod-to-pod: "enabled"
# Configure service mesh if using Istio
service-mesh: "disabled"
# Network policies
network-policies: "enabled"
Storage Requirements
# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: keda-storage
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
KEDA Installation on EKS
Installation Methods
KEDA can be installed using multiple methods:
- Helm (Recommended): Easy management and upgrades
- YAML Manifests: Direct Kubernetes resources
- OperatorHub (OLM): Using the KEDA Operator via the Operator Lifecycle Manager
- Terraform: Infrastructure as Code
Method 1: Helm Installation (Production Ready)
Basic Installation
# Add KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# Create namespace for KEDA
kubectl create namespace keda
# Install KEDA with basic configuration
helm install keda kedacore/keda \
--namespace keda \
--version 2.12.0 \
--set image.keda.tag=2.12.0 \
--set image.metricsApiServer.tag=2.12.0 \
  --set image.webhooks.tag=2.12.0
# Verify installation
kubectl get pods -n keda
kubectl get crd | grep keda
Advanced Production Installation
# Create values file for production
cat > keda-values.yaml << EOF
# KEDA Production Configuration
operator:
image:
repository: ghcr.io/kedacore/keda
tag: "2.12.0"
pullPolicy: IfNotPresent
# Resource limits
resources:
limits:
cpu: 1000m
memory: 1000Mi
requests:
cpu: 100m
memory: 100Mi
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
# Metrics API Server configuration
metricsApiServer:
image:
repository: ghcr.io/kedacore/keda-metrics-apiserver
tag: "2.12.0"
pullPolicy: IfNotPresent
# Resource limits
resources:
limits:
cpu: 1000m
memory: 1000Mi
requests:
cpu: 100m
memory: 100Mi
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
# Webhooks configuration
webhooks:
image:
repository: ghcr.io/kedacore/keda-admission-webhooks
tag: "2.12.0"
pullPolicy: IfNotPresent
# Resource limits
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 100m
memory: 100Mi
# Service configuration
service:
type: ClusterIP
port: 80
targetPort: 8080
# Monitoring configuration
prometheus:
metricServer:
enabled: true
port: 8080
path: /metrics
operator:
enabled: true
port: 8080
path: /metrics
# Logging configuration
logging:
operator:
level: info
format: json
metricServer:
level: info
format: json
# Feature flags
features:
- "advanced-scaling"
- "multi-trigger"
- "fallback-scaling"
# Node selection
nodeSelector: {}
tolerations: []
affinity: {}
# Pod disruption budget
podDisruptionBudget:
enabled: true
minAvailable: 1
EOF
# Install with production configuration
helm install keda kedacore/keda \
--namespace keda \
--values keda-values.yaml \
--create-namespace \
--wait \
--timeout=10m
High Availability Installation
# HA KEDA configuration
cat > keda-ha-values.yaml << EOF
# High Availability Configuration
operator:
replicaCount: 3
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 200m
memory: 200Mi
metricsApiServer:
replicaCount: 3
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 200m
memory: 200Mi
# Pod Anti-Affinity
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- keda-operator
topologyKey: kubernetes.io/hostname
# Pod Disruption Budget
podDisruptionBudget:
enabled: true
minAvailable: 2
# Horizontal Pod Autoscaler
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
EOF
# Install HA KEDA
helm install keda kedacore/keda \
--namespace keda \
--values keda-ha-values.yaml \
--create-namespace \
--wait \
--timeout=15m
Method 2: YAML Manifests Installation
# Download KEDA manifests
curl -L https://github.com/kedacore/keda/releases/download/v2.12.0/keda-2.12.0.yaml -o keda-manifests.yaml
# Apply manifests
kubectl apply -f keda-manifests.yaml
# Verify installation
kubectl get pods -n keda
Method 3: Terraform Installation
# keda.tf
resource "helm_release" "keda" {
name = "keda"
repository = "https://kedacore.github.io/charts"
chart = "keda"
version = "2.12.0"
namespace = "keda"
create_namespace = true
values = [
file("${path.module}/keda-values.yaml")
]
depends_on = [
aws_eks_cluster.main,
aws_eks_node_group.workers
]
}
# IAM role for KEDA
resource "aws_iam_role" "keda_role" {
name = "keda-operator-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRoleWithWebIdentity"
Effect = "Allow"
Principal = {
Federated = aws_iam_openid_connect_provider.eks.arn
}
Condition = {
StringEquals = {
"${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub" = "system:serviceaccount:keda:keda-operator"
"${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:aud" = "sts.amazonaws.com"
}
}
}
]
})
}
# Attach policies
resource "aws_iam_role_policy_attachment" "keda_sqs" {
role = aws_iam_role.keda_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSQSReadOnlyAccess"
}
resource "aws_iam_role_policy_attachment" "keda_s3" {
role = aws_iam_role.keda_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}
Installation Verification
#!/bin/bash
# verify-keda-installation.sh
echo "Verifying KEDA installation..."
# Check KEDA pods
echo "Checking KEDA pods..."
kubectl get pods -n keda
# Check CRDs
echo "Checking KEDA CRDs..."
kubectl get crd | grep keda
# Check services
echo "Checking KEDA services..."
kubectl get svc -n keda
# Check metrics API
echo "Checking metrics API..."
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
# Test scaling
echo "Testing scaling functionality..."
kubectl apply -f - << EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: test-scaler
namespace: default
spec:
scaleTargetRef:
name: test-deployment
  minReplicaCount: 1  # CPU/memory-only triggers cannot scale to zero
maxReplicaCount: 10
triggers:
- type: cpu
metadata:
type: Utilization
value: "50"
EOF
echo "KEDA installation verified successfully!"
Configure IAM Roles for Service Accounts (IRSA)
# Create IAM role for KEDA
cat > keda-trust-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/YOUR_CLUSTER_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.REGION.amazonaws.com/id/YOUR_CLUSTER_ID:sub": "system:serviceaccount:keda:keda-operator",
"oidc.eks.REGION.amazonaws.com/id/YOUR_CLUSTER_ID:aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
# Create IAM role
aws iam create-role \
--role-name keda-operator-role \
--assume-role-policy-document file://keda-trust-policy.json
# Attach necessary policies
aws iam attach-role-policy \
--role-name keda-operator-role \
--policy-arn arn:aws:iam::aws:policy/AmazonSQSReadOnlyAccess
aws iam attach-role-policy \
--role-name keda-operator-role \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# Annotate service account
kubectl annotate serviceaccount keda-operator \
-n keda \
eks.amazonaws.com/role-arn=arn:aws:iam::YOUR_ACCOUNT_ID:role/keda-operator-role
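After annotating the service account, restart the operator so new pods pick up the IRSA web identity token, then confirm the annotation and the injected AWS variables (the keda-operator names match the default Helm install):
# Restart KEDA so new pods mount the IRSA web identity token
kubectl rollout restart deployment keda-operator -n keda
kubectl rollout status deployment keda-operator -n keda

# Verify the annotation and the injected AWS_* variables
kubectl describe serviceaccount keda-operator -n keda | grep role-arn
kubectl describe pod -n keda -l app=keda-operator | grep AWS_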
Advanced Scaling Use Cases
1. Amazon SQS Queue Scaling
Basic SQS Scaling
# sqs-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: sqs-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: worker-app
minReplicaCount: 0
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/my-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-aws-credentials
namespace: default
spec:
podIdentity:
provider: aws-eks
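To see the scaler react, push a batch of messages into the queue and watch the worker Deployment scale out from zero (the queue URL and Deployment name match the example above):
# Send test messages to the queue
for i in $(seq 1 20); do
  aws sqs send-message \
    --queue-url https://sqs.us-west-2.amazonaws.com/123456789012/my-queue \
    --message-body "test-message-$i"
done

# Watch KEDA scale the worker Deployment (0 -> N)
kubectl get deployment worker-app -w

# Inspect scaling decisions and trigger status
kubectl describe scaledobject sqs-basic-scaler -n default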
Advanced SQS Scaling with Multiple Queues
# sqs-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: sqs-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-queue-processor
minReplicaCount: 1
maxReplicaCount: 50
pollingInterval: 15
cooldownPeriod: 300
idleReplicaCount: 0
fallback:
failureThreshold: 3
replicas: 2
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/high-priority-queue
queueLength: "2"
awsRegion: us-west-2
identityOwner: operator
scaleOnInFlight: "false"
activationQueueLength: "1"
authenticationRef:
name: keda-aws-credentials
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/normal-priority-queue
queueLength: "10"
awsRegion: us-west-2
identityOwner: operator
scaleOnInFlight: "false"
activationQueueLength: "5"
authenticationRef:
name: keda-aws-credentials
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/batch-queue
queueLength: "20"
awsRegion: us-west-2
identityOwner: operator
scaleOnInFlight: "false"
activationQueueLength: "10"
authenticationRef:
name: keda-aws-credentials
SQS FIFO Queue Scaling
# sqs-fifo-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: sqs-fifo-scaler
namespace: default
spec:
scaleTargetRef:
name: fifo-processor
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/orders.fifo
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
scaleOnInFlight: "true"
maxInFlight: "10"
authenticationRef:
name: keda-aws-credentials
2. Amazon SNS Topic Scaling
KEDA has no dedicated SNS scaler, so the standard pattern is SNS-to-SQS fan-out: subscribe an SQS queue to the topic and scale the subscriber Deployment on that queue's depth with the aws-sqs-queue scaler.
# sns-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sns-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: sns-subscriber
  minReplicaCount: 0
  maxReplicaCount: 15
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs-queue
    metadata:
      # SQS queue subscribed to arn:aws:sns:us-west-2:123456789012:my-topic
      queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/my-topic-subscriber-queue
      queueLength: "5"
      awsRegion: us-west-2
      identityOwner: operator
    authenticationRef:
      name: keda-aws-credentials
3. Amazon Kinesis Stream Scaling
# kinesis-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kinesis-scaler
namespace: default
spec:
scaleTargetRef:
name: kinesis-consumer
minReplicaCount: 1
maxReplicaCount: 25
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-kinesis-stream
metadata:
streamName: my-kinesis-stream
shardCount: "2"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
4. Amazon DynamoDB Scaling
# dynamodb-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: dynamodb-scaler
namespace: default
spec:
scaleTargetRef:
name: dynamodb-processor
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-dynamodb
metadata:
tableName: my-table
      keyConditionExpression: "#pk = :pk"
      expressionAttributeNames: '{ "#pk": "partition_key" }'
      expressionAttributeValues: '{ ":pk": { "S": "my-partition-key" } }'
targetValue: "100"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
5. Amazon CloudWatch Metrics Scaling
# cloudwatch-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cloudwatch-scaler
namespace: default
spec:
scaleTargetRef:
name: cloudwatch-processor
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-cloudwatch
metadata:
namespace: AWS/ApplicationELB
metricName: RequestCount
      dimensionName: LoadBalancer
      dimensionValue: app/my-load-balancer/50dc6c495c0c9188
      targetMetricValue: "100"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
6. Redis Stream Scaling
Basic Redis Stream Scaling
# redis-stream-basic.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-stream-basic
namespace: default
spec:
scaleTargetRef:
name: stream-processor
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: redis-streams
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
stream: my-stream
consumerGroup: my-consumer-group
streamLength: "10"
enableTLS: "false"
authenticationRef:
name: redis-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: redis-auth
namespace: default
spec:
secretTargetRef:
- parameter: password
name: redis-secret
key: password
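The redis-auth TriggerAuthentication expects a Secret named redis-secret. A quick sketch to create it and push test entries onto the stream (the password value and the redis:7 client image are placeholders):
# Create the Secret referenced by the TriggerAuthentication
kubectl create secret generic redis-secret \
  --from-literal=password='<your-redis-password>'

# Push a test entry onto the stream with a throwaway redis-cli pod
kubectl run redis-cli --rm -it --image=redis:7 --restart=Never -- \
  redis-cli -h redis-cluster.redis.svc.cluster.local -a '<your-redis-password>' \
  XADD my-stream '*' payload test-1

# Watch the consumer scale out
kubectl get deployment stream-processor -w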
Advanced Redis Stream Scaling with Multiple Streams
# redis-stream-advanced.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-stream-advanced
namespace: default
spec:
scaleTargetRef:
name: multi-stream-processor
minReplicaCount: 1
maxReplicaCount: 50
pollingInterval: 10
cooldownPeriod: 60
triggers:
- type: redis-streams
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
stream: high-priority-stream
consumerGroup: high-priority-group
streamLength: "5"
enableTLS: "false"
authenticationRef:
name: redis-auth
- type: redis-streams
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
stream: normal-priority-stream
consumerGroup: normal-priority-group
streamLength: "20"
enableTLS: "false"
authenticationRef:
name: redis-auth
- type: redis-streams
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
stream: batch-stream
consumerGroup: batch-group
streamLength: "50"
enableTLS: "false"
authenticationRef:
name: redis-auth
7. Redis List Scaling
# redis-list-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-list-scaler
namespace: default
spec:
scaleTargetRef:
name: list-processor
minReplicaCount: 0
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 60
triggers:
- type: redis
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
listName: my-list
listLength: "10"
enableTLS: "false"
authenticationRef:
name: redis-auth
8. PostgreSQL Scaling
Basic PostgreSQL Scaling
# postgres-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: postgres-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: data-processor
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 120
triggers:
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/mydb
query: "SELECT COUNT(*) FROM pending_jobs WHERE status = 'pending'"
targetQueryValue: "5"
activationTargetQueryValue: "1"
authenticationRef:
name: postgres-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: postgres-auth
namespace: default
spec:
secretTargetRef:
- parameter: password
name: postgres-secret
key: password
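The postgres-auth TriggerAuthentication references a Secret named postgres-secret. The sketch below creates it and inserts rows matching the query above so you can watch the scaler react (credentials, the postgres:15 client image, and the pending_jobs schema are assumptions taken from the example):
# Create the Secret referenced by the TriggerAuthentication
kubectl create secret generic postgres-secret \
  --from-literal=password='<your-db-password>'

# Insert rows that match the scaler query, using a throwaway psql pod
kubectl run psql-client --rm -it --image=postgres:15 --restart=Never -- \
  psql "postgresql://user:<your-db-password>@postgres.default.svc.cluster.local:5432/mydb" \
  -c "INSERT INTO pending_jobs (status) SELECT 'pending' FROM generate_series(1, 20);"

# Watch the data-processor Deployment scale
kubectl get deployment data-processor -w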
Advanced PostgreSQL Scaling with Multiple Queries
# postgres-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: postgres-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-query-processor
minReplicaCount: 1
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 120
triggers:
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/mydb
query: "SELECT COUNT(*) FROM urgent_jobs WHERE status = 'pending' AND priority = 'high'"
targetQueryValue: "2"
activationTargetQueryValue: "1"
authenticationRef:
name: postgres-auth
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/mydb
query: "SELECT COUNT(*) FROM normal_jobs WHERE status = 'pending' AND priority = 'normal'"
targetQueryValue: "10"
activationTargetQueryValue: "5"
authenticationRef:
name: postgres-auth
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/mydb
query: "SELECT COUNT(*) FROM batch_jobs WHERE status = 'pending' AND priority = 'low'"
targetQueryValue: "25"
activationTargetQueryValue: "10"
authenticationRef:
name: postgres-auth
9. MySQL Scaling
# mysql-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: mysql-scaler
namespace: default
spec:
scaleTargetRef:
name: mysql-processor
minReplicaCount: 1
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 120
triggers:
- type: mysql
metadata:
connection: mysql://user:password@mysql.default.svc.cluster.local:3306/mydb
query: "SELECT COUNT(*) FROM pending_tasks WHERE status = 'pending'"
targetQueryValue: "5"
activationTargetQueryValue: "1"
authenticationRef:
name: mysql-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: mysql-auth
namespace: default
spec:
secretTargetRef:
- parameter: password
name: mysql-secret
key: password
10. MongoDB Scaling
# mongodb-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: mongodb-scaler
namespace: default
spec:
scaleTargetRef:
name: mongodb-processor
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 120
triggers:
- type: mongodb
metadata:
connectionString: mongodb://user:password@mongodb.default.svc.cluster.local:27017/mydb
database: mydb
collection: pending_jobs
query: '{"status": "pending"}'
targetQueryValue: "10"
activationTargetQueryValue: "5"
authenticationRef:
name: mongodb-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: mongodb-auth
namespace: default
spec:
secretTargetRef:
- parameter: password
name: mongodb-secret
key: password
11. Kafka Topic Scaling
Basic Kafka Scaling
# kafka-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kafka-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: kafka-consumer
minReplicaCount: 0
maxReplicaCount: 25
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: my-consumer-group
topic: my-topic
lagThreshold: "10"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: kafka-auth
namespace: default
spec:
secretTargetRef:
- parameter: password
name: kafka-secret
key: password
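To exercise the scaler, produce a burst of messages to the topic and check the consumer-group lag that drives scaling (the bitnami/kafka client image and a plaintext listener are assumptions; adjust for your cluster's security settings):
# Produce test messages with a throwaway client pod
kubectl run kafka-producer --rm -it --image=bitnami/kafka:latest --restart=Never -- \
  bash -c 'for i in $(seq 1 100); do echo "msg-$i"; done | \
    kafka-console-producer.sh \
      --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 \
      --topic my-topic'

# Check consumer-group lag, which is what the lagThreshold compares against
kubectl run kafka-lag --rm -it --image=bitnami/kafka:latest --restart=Never -- \
  kafka-consumer-groups.sh \
    --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 \
    --describe --group my-consumer-group

# Watch KEDA scale the consumer
kubectl get deployment kafka-consumer -w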
Advanced Kafka Scaling with Multiple Topics
# kafka-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kafka-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-topic-consumer
minReplicaCount: 1
maxReplicaCount: 50
pollingInterval: 15
cooldownPeriod: 300
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: high-priority-group
topic: high-priority-topic
lagThreshold: "5"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: normal-priority-group
topic: normal-priority-topic
lagThreshold: "20"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: batch-group
topic: batch-topic
lagThreshold: "50"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
12. Apache Pulsar Scaling
# pulsar-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: pulsar-scaler
namespace: default
spec:
scaleTargetRef:
name: pulsar-consumer
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: apache-pulsar
metadata:
broker: pulsar://pulsar-broker.pulsar.svc.cluster.local:6650
topic: persistent://public/default/my-topic
subscription: my-subscription
subscriptionType: Shared
targetMessageCount: "10"
authenticationRef:
name: pulsar-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: pulsar-auth
namespace: default
spec:
secretTargetRef:
- parameter: token
name: pulsar-secret
key: token
13. Prometheus Metrics Scaling
Basic Prometheus Scaling
# prometheus-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: prometheus-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: custom-app
minReplicaCount: 1
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: custom_processing_queue_size
threshold: '10'
query: sum(rate(custom_processing_queue_size[1m]))
authenticationRef:
name: prometheus-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: prometheus-auth
namespace: default
spec:
secretTargetRef:
- parameter: username
name: prometheus-secret
key: username
- parameter: password
name: prometheus-secret
key: password
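Before wiring a PromQL expression into a trigger, it helps to confirm it returns a value directly from Prometheus. A sketch using the HTTP query API with the same server address and query as the trigger above (the curlimages/curl image is a placeholder):
# Sanity-check the PromQL the trigger will evaluate
kubectl run promql-check --rm -it --image=curlimages/curl --restart=Never --command -- \
  curl -sG http://prometheus.monitoring.svc.cluster.local:9090/api/v1/query \
    --data-urlencode 'query=sum(rate(custom_processing_queue_size[1m]))'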
Advanced Prometheus Scaling with Multiple Metrics
# prometheus-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: prometheus-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-metric-app
minReplicaCount: 1
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: high_priority_queue_size
threshold: '5'
query: sum(rate(high_priority_queue_size[1m]))
authenticationRef:
name: prometheus-auth
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: normal_priority_queue_size
threshold: '20'
query: sum(rate(normal_priority_queue_size[1m]))
authenticationRef:
name: prometheus-auth
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: batch_queue_size
threshold: '50'
query: sum(rate(batch_queue_size[1m]))
authenticationRef:
name: prometheus-auth
14. Azure Service Bus Scaling
Basic Azure Service Bus Scaling
# azure-servicebus-basic.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: azure-servicebus-basic
namespace: default
spec:
scaleTargetRef:
name: azure-worker
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: azure-servicebus
metadata:
connectionFromEnv: AZURE_SERVICEBUS_CONNECTION_STRING
queueName: my-queue
messageCount: "5"
authenticationRef:
name: azure-servicebus-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: azure-servicebus-auth
namespace: default
spec:
secretTargetRef:
- parameter: connection
name: azure-servicebus-secret
key: connection-string
Advanced Azure Service Bus Scaling with Topics
# azure-servicebus-advanced.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: azure-servicebus-advanced
namespace: default
spec:
scaleTargetRef:
name: azure-topic-worker
minReplicaCount: 1
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 300
triggers:
- type: azure-servicebus
metadata:
connectionFromEnv: AZURE_SERVICEBUS_CONNECTION_STRING
topicName: high-priority-topic
subscriptionName: high-priority-subscription
messageCount: "5"
authenticationRef:
name: azure-servicebus-auth
- type: azure-servicebus
metadata:
connectionFromEnv: AZURE_SERVICEBUS_CONNECTION_STRING
topicName: normal-priority-topic
subscriptionName: normal-priority-subscription
messageCount: "20"
authenticationRef:
name: azure-servicebus-auth
- type: azure-servicebus
metadata:
connectionFromEnv: AZURE_SERVICEBUS_CONNECTION_STRING
topicName: batch-topic
subscriptionName: batch-subscription
messageCount: "50"
authenticationRef:
name: azure-servicebus-auth
15. Azure Event Hubs Scaling
# azure-eventhubs-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: azure-eventhubs-scaler
namespace: default
spec:
scaleTargetRef:
name: eventhubs-consumer
minReplicaCount: 0
maxReplicaCount: 25
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: azure-eventhubs
metadata:
connectionFromEnv: AZURE_EVENTHUB_CONNECTION_STRING
eventHubName: my-eventhub
consumerGroup: my-consumer-group
messageCount: "10"
authenticationRef:
name: azure-eventhubs-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: azure-eventhubs-auth
namespace: default
spec:
secretTargetRef:
- parameter: connection
name: azure-eventhubs-secret
key: connection-string
16. Azure Storage Queue Scaling
# azure-storage-queue-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: azure-storage-queue-scaler
namespace: default
spec:
scaleTargetRef:
name: storage-queue-worker
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: azure-queue
metadata:
connectionFromEnv: AZURE_STORAGE_CONNECTION_STRING
queueName: my-queue
messageCount: "10"
authenticationRef:
name: azure-storage-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: azure-storage-auth
namespace: default
spec:
secretTargetRef:
- parameter: connection
name: azure-storage-secret
key: connection-string
17. Google Cloud Pub/Sub Scaling
# gcp-pubsub-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: gcp-pubsub-scaler
namespace: default
spec:
scaleTargetRef:
name: pubsub-subscriber
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: gcp-pubsub
metadata:
subscriptionName: my-subscription
mode: subscription
value: "10"
authenticationRef:
name: gcp-pubsub-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: gcp-pubsub-auth
namespace: default
spec:
secretTargetRef:
- parameter: credentials
name: gcp-pubsub-secret
key: credentials.json
18. Google Cloud Storage Scaling
# gcp-storage-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: gcp-storage-scaler
namespace: default
spec:
scaleTargetRef:
name: storage-processor
minReplicaCount: 0
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: gcp-storage
metadata:
bucketName: my-bucket
targetObjectCount: "10"
authenticationRef:
name: gcp-storage-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: gcp-storage-auth
namespace: default
spec:
secretTargetRef:
- parameter: credentials
name: gcp-storage-secret
key: credentials.json
19. Cron-based Scaling
Basic Cron Scaling
# cron-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cron-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: scheduled-job
minReplicaCount: 0
maxReplicaCount: 5
pollingInterval: 30
cooldownPeriod: 60
triggers:
- type: cron
metadata:
timezone: UTC
start: "0 9 * * 1-5" # 9 AM weekdays
end: "0 17 * * 1-5" # 5 PM weekdays
desiredReplicas: "3"
Advanced Cron Scaling with Multiple Schedules
# cron-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cron-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-schedule-job
minReplicaCount: 0
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 60
triggers:
- type: cron
metadata:
timezone: UTC
start: "0 9 * * 1-5" # 9 AM weekdays
end: "0 17 * * 1-5" # 5 PM weekdays
desiredReplicas: "5"
- type: cron
metadata:
timezone: UTC
start: "0 18 * * 1-5" # 6 PM weekdays
end: "0 22 * * 1-5" # 10 PM weekdays
desiredReplicas: "3"
- type: cron
metadata:
timezone: UTC
start: "0 10 * * 6,7" # 10 AM weekends
end: "0 16 * * 6,7" # 4 PM weekends
desiredReplicas: "2"
20. External Scaler (Custom Metrics)
Basic External Scaler
# external-basic-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: external-basic-scaler
namespace: default
spec:
scaleTargetRef:
name: external-api-consumer
minReplicaCount: 0
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 120
triggers:
- type: external
metadata:
scalerAddress: external-scaler-service.default.svc.cluster.local:8080
metricName: custom_metric
threshold: "10"
Advanced External Scaler with Multiple Metrics
# external-advanced-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: external-advanced-scaler
namespace: default
spec:
scaleTargetRef:
name: multi-metric-consumer
minReplicaCount: 1
maxReplicaCount: 30
pollingInterval: 15
cooldownPeriod: 120
triggers:
- type: external
metadata:
scalerAddress: external-scaler-service.default.svc.cluster.local:8080
metricName: high_priority_metric
threshold: "5"
- type: external
metadata:
scalerAddress: external-scaler-service.default.svc.cluster.local:8080
metricName: normal_priority_metric
threshold: "20"
- type: external
metadata:
scalerAddress: external-scaler-service.default.svc.cluster.local:8080
metricName: batch_metric
threshold: "50"
21. InfluxDB Scaling
# influxdb-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: influxdb-scaler
namespace: default
spec:
scaleTargetRef:
name: influxdb-processor
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 120
triggers:
- type: influxdb
metadata:
serverURL: http://influxdb.influxdb.svc.cluster.local:8086
organizationName: my-org
bucketName: my-bucket
query: 'from(bucket: "my-bucket") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "pending_jobs") |> count()'
threshold: "10"
authenticationRef:
name: influxdb-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: influxdb-auth
namespace: default
spec:
secretTargetRef:
- parameter: token
name: influxdb-secret
key: token
22. Apache Kafka Streams Scaling
# kafka-streams-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kafka-streams-scaler
namespace: default
spec:
scaleTargetRef:
name: kafka-streams-app
minReplicaCount: 1
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: kafka-streams-group
topic: input-topic
lagThreshold: "10"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
23. RabbitMQ Scaling
# rabbitmq-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: rabbitmq-scaler
namespace: default
spec:
scaleTargetRef:
name: rabbitmq-consumer
minReplicaCount: 0
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: rabbitmq
metadata:
queueName: my-queue
host: amqp://rabbitmq.rabbitmq.svc.cluster.local:5672
queueLength: "10"
authenticationRef:
name: rabbitmq-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: rabbitmq-auth
namespace: default
spec:
secretTargetRef:
- parameter: username
name: rabbitmq-secret
key: username
- parameter: password
name: rabbitmq-secret
key: password
24. Apache ActiveMQ Artemis Scaling
# activemq-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: activemq-scaler
namespace: default
spec:
scaleTargetRef:
name: activemq-consumer
minReplicaCount: 0
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: artemis-queue
metadata:
managementEndpoint: http://activemq.activemq.svc.cluster.local:8161
queueName: my-queue
queueLength: "10"
authenticationRef:
name: activemq-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: activemq-auth
namespace: default
spec:
secretTargetRef:
- parameter: username
name: activemq-secret
key: username
- parameter: password
name: activemq-secret
key: password
Enterprise Patterns
1. Multi-Tenant Scaling Architecture
# multi-tenant-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: multi-tenant-scaler
namespace: tenant-a
labels:
tenant: tenant-a
environment: production
spec:
scaleTargetRef:
name: tenant-a-processor
minReplicaCount: 1
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/tenant-a-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: tenant-a-aws-credentials
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: multi-tenant-scaler
namespace: tenant-b
labels:
tenant: tenant-b
environment: production
spec:
scaleTargetRef:
name: tenant-b-processor
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/tenant-b-queue
queueLength: "10"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: tenant-b-aws-credentials
2. Circuit Breaker Pattern with KEDA
# circuit-breaker-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: circuit-breaker-scaler
namespace: default
spec:
scaleTargetRef:
name: resilient-processor
minReplicaCount: 1
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 300
fallback:
failureThreshold: 5
replicas: 1
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: circuit_breaker_state
threshold: '1'
query: sum(rate(circuit_breaker_state[1m]))
authenticationRef:
name: prometheus-auth
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/fallback-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
3. Blue-Green Deployment with KEDA
# blue-green-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: blue-green-scaler
namespace: default
spec:
scaleTargetRef:
name: blue-green-app
minReplicaCount: 2
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: blue_green_health
threshold: '0.8'
query: sum(rate(blue_green_health[1m]))
authenticationRef:
name: prometheus-auth
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: blue-green-app
namespace: default
spec:
replicas: 2
strategy:
blueGreen:
activeService: blue-green-app-active
previewService: blue-green-app-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 30
prePromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: blue-green-app-preview
postPromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: blue-green-app-active
selector:
matchLabels:
app: blue-green-app
template:
metadata:
labels:
app: blue-green-app
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
4. Canary Deployment with KEDA
# canary-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: canary-scaler
namespace: default
spec:
scaleTargetRef:
name: canary-app
minReplicaCount: 1
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: canary_success_rate
threshold: '0.95'
query: sum(rate(canary_success_rate[1m]))
authenticationRef:
name: prometheus-auth
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: canary-app
namespace: default
spec:
replicas: 5
strategy:
canary:
steps:
- setWeight: 20
- pause: {duration: 10m}
- setWeight: 40
- pause: {duration: 10m}
- setWeight: 60
- pause: {duration: 10m}
- setWeight: 80
- pause: {duration: 10m}
analysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: canary-app
startingStep: 2
interval: 5m
selector:
matchLabels:
app: canary-app
template:
metadata:
labels:
app: canary-app
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
5. Event Sourcing with KEDA
# event-sourcing-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: event-sourcing-scaler
namespace: default
spec:
scaleTargetRef:
name: event-processor
minReplicaCount: 1
maxReplicaCount: 25
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: event-sourcing-group
topic: events
lagThreshold: "10"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/events
query: "SELECT COUNT(*) FROM event_store WHERE processed = false"
targetQueryValue: "100"
activationTargetQueryValue: "10"
authenticationRef:
name: postgres-auth
6. CQRS Pattern with KEDA
# cqrs-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cqrs-command-scaler
namespace: default
spec:
scaleTargetRef:
name: command-processor
minReplicaCount: 1
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/command-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cqrs-query-scaler
namespace: default
spec:
scaleTargetRef:
name: query-processor
minReplicaCount: 2
maxReplicaCount: 15
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: query_request_rate
threshold: '100'
query: sum(rate(query_request_rate[1m]))
authenticationRef:
name: prometheus-auth
7. Saga Pattern with KEDA
# saga-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: saga-scaler
namespace: default
spec:
scaleTargetRef:
name: saga-processor
minReplicaCount: 1
maxReplicaCount: 30
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/saga
query: "SELECT COUNT(*) FROM saga_instances WHERE status = 'running'"
targetQueryValue: "10"
activationTargetQueryValue: "1"
authenticationRef:
name: postgres-auth
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/saga
query: "SELECT COUNT(*) FROM saga_instances WHERE status = 'compensating'"
targetQueryValue: "5"
activationTargetQueryValue: "1"
authenticationRef:
name: postgres-auth
8. Outbox Pattern with KEDA
# outbox-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: outbox-scaler
namespace: default
spec:
scaleTargetRef:
name: outbox-processor
minReplicaCount: 1
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: postgresql
metadata:
connection: postgresql://user:password@postgres.default.svc.cluster.local:5432/outbox
query: "SELECT COUNT(*) FROM outbox_events WHERE processed = false"
targetQueryValue: "10"
activationTargetQueryValue: "1"
authenticationRef:
name: postgres-auth
- type: kafka
metadata:
bootstrapServers: kafka-cluster.kafka.svc.cluster.local:9092
consumerGroup: outbox-group
topic: outbox-events
lagThreshold: "5"
offsetResetPolicy: earliest
authenticationRef:
name: kafka-auth
Performance Optimization
1. Scaling Algorithm Tuning
# optimized-scaling-algorithm.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: optimized-scaler
namespace: default
spec:
scaleTargetRef:
name: optimized-app
minReplicaCount: 1
maxReplicaCount: 100
pollingInterval: 15
cooldownPeriod: 300
idleReplicaCount: 0
# Advanced scaling configuration
fallback:
failureThreshold: 3
replicas: 2
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/optimized-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
# Advanced SQS configuration
scaleOnInFlight: "false"
activationQueueLength: "1"
maxInFlight: "10"
authenticationRef:
name: keda-aws-credentials
2. Resource Optimization
# resource-optimized-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: resource-optimized-app
namespace: default
spec:
replicas: 0
selector:
matchLabels:
app: resource-optimized-app
template:
metadata:
labels:
app: resource-optimized-app
spec:
containers:
- name: app
image: myapp:latest
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
# Resource optimization
env:
- name: JAVA_OPTS
value: "-Xms128m -Xmx256m -XX:+UseG1GC"
- name: NODE_OPTIONS
value: "--max-old-space-size=256"
# Health checks
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
# Graceful shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
3. Network Optimization
# network-optimized-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: network-optimized-scaler
namespace: default
spec:
scaleTargetRef:
name: network-optimized-app
minReplicaCount: 1
maxReplicaCount: 50
pollingInterval: 10
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: network_throughput
threshold: '1000'
query: sum(rate(network_throughput[1m]))
authenticationRef:
name: prometheus-auth
---
apiVersion: v1
kind: Service
metadata:
name: network-optimized-app
namespace: default
spec:
selector:
app: network-optimized-app
ports:
- port: 80
targetPort: 8080
protocol: TCP
type: ClusterIP
# Network optimization
sessionAffinity: None
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: network-optimized-policy
namespace: default
spec:
podSelector:
matchLabels:
app: network-optimized-app
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
- to:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 9090
4. Caching Strategy with KEDA
# cache-optimized-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cache-optimized-scaler
namespace: default
spec:
scaleTargetRef:
name: cache-optimized-app
minReplicaCount: 2
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: redis
metadata:
address: redis-cluster.redis.svc.cluster.local:6379
listName: cache-miss-queue
listLength: "10"
enableTLS: "false"
authenticationRef:
name: redis-auth
- type: prometheus
metadata:
serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
metricName: cache_hit_ratio
threshold: '0.8'
query: sum(rate(cache_hit_ratio[1m]))
authenticationRef:
name: prometheus-auth
---
apiVersion: v1
kind: ConfigMap
metadata:
name: cache-config
namespace: default
data:
cache.properties: |
# Redis configuration
redis.host=redis-cluster.redis.svc.cluster.local
redis.port=6379
redis.timeout=2000
redis.pool.max-active=20
redis.pool.max-idle=10
redis.pool.min-idle=5
# Cache configuration
cache.ttl=3600
cache.max-size=10000
cache.eviction-policy=LRU
Security and Compliance
1. RBAC Configuration
# keda-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: keda-operator
namespace: keda
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-operator
rules:
- apiGroups: [""]
resources: ["pods", "services", "endpoints", "persistentvolumeclaims", "events", "configmaps", "secrets"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments", "daemonsets", "replicasets", "statefulsets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["keda.sh"]
resources: ["scaledobjects", "scaledjobs", "triggerauthentications", "clustertriggerauthentications"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["autoscaling"]
resources: ["horizontalpodautoscalers"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-operator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: keda-operator
subjects:
- kind: ServiceAccount
name: keda-operator
namespace: keda
2. Pod Security Standards
Note: PodSecurityPolicy was removed in Kubernetes 1.25, so the PSP below applies only to clusters still on 1.24 or earlier; on 1.25+ (including the EKS version used in this guide), use Pod Security Admission namespace labels instead (see the sketch after this example).
# pod-security-policy.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: keda-psp
spec:
privileged: false
allowPrivilegeEscalation: false
requiredDropCapabilities:
- ALL
volumes:
- 'configMap'
- 'emptyDir'
- 'projected'
- 'secret'
- 'downwardAPI'
- 'persistentVolumeClaim'
runAsUser:
rule: 'MustRunAsNonRoot'
seLinux:
rule: 'RunAsAny'
fsGroup:
rule: 'RunAsAny'
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: keda-operator
namespace: keda
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-psp-user
rules:
- apiGroups: ['policy']
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames:
- keda-psp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-psp-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: keda-psp-user
subjects:
- kind: ServiceAccount
name: keda-operator
namespace: keda
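On Kubernetes 1.25+ the equivalent guardrail is Pod Security Admission, enforced with namespace labels. A minimal sketch for the keda namespace (the restricted profile is an assumption; verify KEDA's pods satisfy it in your setup before enforcing):
# Enforce the restricted Pod Security Standard on the keda namespace (PSA, Kubernetes 1.25+)
kubectl label namespace keda \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted --overwrite

# Server-side dry run: surfaces warnings if existing workloads would violate the profile
kubectl label --dry-run=server --overwrite namespace keda \
  pod-security.kubernetes.io/enforce=restricted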
3. Network Security
# network-security.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: keda-network-policy
namespace: keda
spec:
podSelector:
matchLabels:
app: keda-operator
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: default
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
- to: []
ports:
- protocol: TCP
port: 443
- protocol: TCP
port: 9090
- protocol: TCP
port: 6379
- protocol: TCP
port: 5432
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: keda-metrics-network-policy
namespace: keda
spec:
podSelector:
matchLabels:
app: keda-metrics-apiserver
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
- to: []
ports:
- protocol: TCP
port: 443
4. Secret Management
# secret-management.yaml
apiVersion: v1
kind: Secret
metadata:
name: keda-secrets
namespace: keda
type: Opaque
data:
aws-access-key: <base64-encoded-key>
aws-secret-key: <base64-encoded-secret>
redis-password: <base64-encoded-password>
postgres-password: <base64-encoded-password>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-secrets-auth
namespace: keda
spec:
secretTargetRef:
- parameter: awsAccessKeyID
name: keda-secrets
key: aws-access-key
- parameter: awsSecretAccessKey
name: keda-secrets
key: aws-secret-key
- parameter: password
name: keda-secrets
key: redis-password
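Hand-encoding base64 values is error-prone; the same Secret can be created (or regenerated) from literals, which also keeps plaintext out of YAML committed to Git (all values below are placeholders):
# Create the Secret referenced by keda-secrets-auth without hand-encoding base64
kubectl create secret generic keda-secrets -n keda \
  --from-literal=aws-access-key='<ACCESS_KEY_ID>' \
  --from-literal=aws-secret-key='<SECRET_ACCESS_KEY>' \
  --from-literal=redis-password='<REDIS_PASSWORD>' \
  --from-literal=postgres-password='<POSTGRES_PASSWORD>' \
  --dry-run=client -o yaml | kubectl apply -f -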
5. Compliance and Auditing
# compliance-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: keda-compliance-config
namespace: keda
data:
audit.yaml: |
# Audit configuration
audit:
enabled: true
level: "metadata"
logFormat: "json"
logPath: "/var/log/audit/audit.log"
maxAge: 30
maxBackups: 10
maxSize: 100
# Compliance settings
compliance:
gdpr: true
sox: true
pci: false
hipaa: false
# Data retention
retention:
logs: "30d"
metrics: "90d"
events: "7d"
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: keda-audit
namespace: keda
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-audit
rules:
- apiGroups: [""]
resources: ["events", "pods", "services"]
verbs: ["get", "list", "watch"]
- apiGroups: ["keda.sh"]
resources: ["scaledobjects", "scaledjobs"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-audit
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: keda-audit
subjects:
- kind: ServiceAccount
name: keda-audit
namespace: keda
Monitoring and Observability
1. KEDA Metrics Collection
# keda-monitoring.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: keda-metrics
namespace: keda
labels:
app: keda-operator
release: prometheus
spec:
selector:
matchLabels:
app: keda-operator
endpoints:
- port: http
path: /metrics
interval: 30s
scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: keda-metrics-apiserver
namespace: keda
labels:
app: keda-metrics-apiserver
release: prometheus
spec:
selector:
matchLabels:
app: keda-metrics-apiserver
endpoints:
- port: https
path: /metrics
scheme: https
interval: 30s
scrapeTimeout: 10s
tlsConfig:
insecureSkipVerify: true
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: keda-webhooks
namespace: keda
labels:
app: keda-webhooks
release: prometheus
spec:
selector:
matchLabels:
app: keda-webhooks
endpoints:
- port: https
path: /metrics
scheme: https
interval: 30s
scrapeTimeout: 10s
tlsConfig:
insecureSkipVerify: true
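To confirm Prometheus can actually scrape these endpoints, port-forward the operator and check that KEDA metrics are exposed (port 8080 matches the metrics port used in the ServiceMonitors above):
# Check that the operator exposes Prometheus metrics on the scraped port
kubectl port-forward -n keda deployment/keda-operator 8080:8080 &
PF_PID=$!
sleep 2
curl -s http://localhost:8080/metrics | grep -m 5 '^keda'
kill $PF_PID

# Confirm the ServiceMonitors were created and labeled for your Prometheus release
kubectl get servicemonitors -n keda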
2. Comprehensive Grafana Dashboard
{
"dashboard": {
"title": "KEDA Comprehensive Dashboard",
"tags": ["keda", "kubernetes", "autoscaling"],
"timezone": "browser",
"panels": [
{
"title": "KEDA Operator Health",
"type": "stat",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
"targets": [
{
"expr": "up{job=\"keda-operator\"}",
"legendFormat": "Operator Status"
}
],
"fieldConfig": {
"defaults": {
"color": {"mode": "thresholds"},
"thresholds": {
"steps": [
{"color": "red", "value": 0},
{"color": "green", "value": 1}
]
}
}
}
},
{
"title": "ScaledObjects Status",
"type": "stat",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
"targets": [
{
"expr": "keda_scaled_object_ready",
"legendFormat": "Ready"
},
{
"expr": "keda_scaled_object_paused",
"legendFormat": "Paused"
}
]
},
{
"title": "Current Replicas by ScaledObject",
"type": "graph",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 8},
"targets": [
{
"expr": "keda_scaled_object_replicas",
"legendFormat": " ()"
}
],
"yAxes": [
{
"label": "Replicas",
"min": 0
}
]
},
{
"title": "Scale Events Rate",
"type": "graph",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 16},
"targets": [
{
"expr": "rate(keda_scaled_object_scale_events_total[5m])",
"legendFormat": " - "
}
],
"yAxes": [
{
"label": "Events/sec",
"min": 0
}
]
},
{
"title": "External Metrics",
"type": "graph",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 16},
"targets": [
{
"expr": "keda_scaled_object_ready * on(scaledObject) group_left keda_scaled_object_ready",
"legendFormat": ""
}
]
},
{
"title": "KEDA Operator Resource Usage",
"type": "graph",
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 24},
"targets": [
{
"expr": "rate(container_cpu_usage_seconds_total{pod=~\"keda-operator-.*\"}[5m])",
"legendFormat": "CPU Usage"
},
{
"expr": "container_memory_usage_bytes{pod=~\"keda-operator-.*\"}",
"legendFormat": "Memory Usage"
}
],
"yAxes": [
{
"label": "CPU (cores)"
},
{
"label": "Memory (bytes)",
"logBase": 2
}
]
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
}
}
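One way to ship this dashboard alongside KEDA is the Grafana sidecar convention used by kube-prometheus-stack, where any ConfigMap labeled grafana_dashboard is discovered and loaded automatically. A minimal sketch, assuming that sidecar is enabled, Grafana runs in a monitoring namespace, and the JSON above is saved as keda-dashboard.json:
# Package the dashboard JSON as a ConfigMap the Grafana sidecar can discover
kubectl create configmap keda-grafana-dashboard -n monitoring \
  --from-file=keda-dashboard.json
kubectl label configmap keda-grafana-dashboard -n monitoring grafana_dashboard=1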
3. Alerting Rules
# keda-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: keda-alerts
namespace: keda
labels:
app: keda
release: prometheus
spec:
groups:
- name: keda.rules
rules:
- alert: KEDAOperatorDown
expr: up{job="keda-operator"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "KEDA Operator is down"
description: "KEDA Operator has been down for more than 1 minute"
- alert: KEDAMetricsServerDown
expr: up{job="keda-metrics-apiserver"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "KEDA Metrics Server is down"
description: "KEDA Metrics Server has been down for more than 1 minute"
- alert: ScaledObjectNotReady
expr: keda_scaled_object_ready == 0
for: 2m
labels:
severity: warning
annotations:
summary: "ScaledObject is not ready"
description: "ScaledObject {{ $labels.scaledObject }} in namespace {{ $labels.namespace }} is not ready"
- alert: HighScaleEventRate
expr: rate(keda_scaled_object_scale_events_total[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "High scale event rate"
description: "ScaledObject {{ $labels.scaledObject }} is scaling frequently"
- alert: KEDAOperatorHighCPU
expr: rate(container_cpu_usage_seconds_total{pod=~"keda-operator-.*"}[5m]) > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "KEDA Operator high CPU usage"
description: "KEDA Operator CPU usage is above 80%"
- alert: KEDAOperatorHighMemory
expr: container_memory_usage_bytes{pod=~"keda-operator-.*"} > 1000000000
for: 5m
labels:
severity: warning
annotations:
summary: "KEDA Operator high memory usage"
description: "KEDA Operator memory usage is above 1GB"
4. Custom Metrics Collection
# custom-metrics-collector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: keda-metrics-collector
namespace: keda
spec:
replicas: 1
selector:
matchLabels:
app: keda-metrics-collector
template:
metadata:
labels:
app: keda-metrics-collector
spec:
containers:
- name: collector
image: prom/node-exporter:latest
ports:
- containerPort: 9100
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
hostNetwork: true
hostPID: true
---
apiVersion: v1
kind: Service
metadata:
name: keda-metrics-collector
namespace: keda
spec:
selector:
app: keda-metrics-collector
ports:
- port: 9100
targetPort: 9100
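Once these node-level metrics are scraped into Prometheus, they can feed back into KEDA through the standard prometheus scaler. A minimal sketch of that loop; the Prometheus address, query, threshold, and target deployment name are illustrative assumptions, not values from this guide:
# node-pressure-scaler.yaml (illustrative sketch)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: node-pressure-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-app                      # hypothetical deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-operated.monitoring.svc:9090
      # Average busy CPU fraction across nodes, derived from node-exporter metrics
      query: 'avg(1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))'
      threshold: "0.7"                # scale out when average node CPU usage exceeds 70%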
Troubleshooting and Debugging
1. Comprehensive Debugging Script
#!/bin/bash
# keda-debug.sh
set -uo pipefail  # keep going past individual failures so every check runs
NAMESPACE=${1:-default}
SCALED_OBJECT=${2:-""}
echo "=== KEDA Debugging Script ==="
echo "Namespace: $NAMESPACE"
echo "ScaledObject: $SCALED_OBJECT"
echo ""
# Check KEDA installation
echo "1. Checking KEDA Installation..."
kubectl get pods -n keda
kubectl get crd | grep keda
echo ""
# Check ScaledObjects
echo "2. Checking ScaledObjects..."
if [ -n "$SCALED_OBJECT" ]; then
kubectl describe scaledobject $SCALED_OBJECT -n $NAMESPACE
else
kubectl get scaledobjects -n $NAMESPACE
fi
echo ""
# Check TriggerAuthentications
echo "3. Checking TriggerAuthentications..."
kubectl get triggerauthentications -n $NAMESPACE
echo ""
# Check KEDA operator logs
echo "4. Checking KEDA Operator Logs..."
kubectl logs -n keda deployment/keda-operator --tail=50
echo ""
# Check metrics server logs
echo "5. Checking Metrics Server Logs..."
kubectl logs -n keda deployment/keda-metrics-apiserver --tail=50
echo ""
# Check HPA
echo "6. Checking HPA..."
kubectl get hpa -n $NAMESPACE
echo ""
# Check external metrics
echo "7. Checking External Metrics..."
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" 2>/dev/null || echo "External metrics API not available"
echo ""
# Check events
echo "8. Checking Events..."
kubectl get events -n $NAMESPACE --sort-by=.metadata.creationTimestamp | tail -20
echo ""
# Check resource usage
echo "9. Checking Resource Usage..."
kubectl top pods -n keda
echo ""
# Check network connectivity
echo "10. Checking Network Connectivity..."
kubectl run debug-pod --image=busybox --rm -i --restart=Never -- nslookup keda-operator.keda.svc.cluster.local
echo ""
echo "=== Debug Complete ==="
2. Common Issues and Solutions
Issue 1: ScaledObject Not Scaling
# Check ScaledObject status
kubectl describe scaledobject <scaled-object-name> -n <namespace>
# Check authentication
kubectl describe triggerauthentication <auth-name> -n <namespace>
# Check external metrics
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/keda-scaler-<scaled-object-name>
# Check KEDA operator logs
kubectl logs -n keda deployment/keda-operator | grep <scaled-object-name>
Issue 2: Authentication Failures
# Test AWS credentials
kubectl run test-pod --image=amazon/aws-cli --rm -it --restart=Never -- \
aws sts get-caller-identity
# Check IAM role binding
kubectl describe serviceaccount keda-operator -n keda
# Verify OIDC provider
aws iam get-open-id-connect-provider --open-id-connect-provider-arn arn:aws:iam::ACCOUNT:oidc-provider/oidc.eks.REGION.amazonaws.com/id/CLUSTER_ID
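If the identity check fails, the most common root cause is a missing or misconfigured IRSA binding rather than the scaler itself. A TriggerAuthentication that delegates to pod identity looks roughly like the sketch below; the provider value depends on your KEDA version (aws on 2.10+, aws-eks on older releases), and the name matches the keda-aws-credentials reference used by the scalers in this guide:
# irsa-trigger-auth.yaml (sketch)
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-credentials
  namespace: default
spec:
  podIdentity:
    provider: aws   # use aws-eks on KEDA versions prior to 2.10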
Issue 3: Performance Issues
# Check KEDA metrics
kubectl top pods -n keda
# Monitor scaling events
kubectl get events --sort-by=.metadata.creationTimestamp
# Check resource usage
kubectl describe nodes
# Check for resource constraints
kubectl describe pods -n keda
3. Advanced Debugging Tools
# keda-debug-tools.yaml
apiVersion: v1
kind: Pod
metadata:
name: keda-debug-tools
namespace: keda
spec:
containers:
- name: debug-tools
image: bitnami/kubectl:latest
command: ["sleep", "3600"]
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
        # kubectl reads in-cluster credentials from the mounted service account token; no KUBECONFIG needed
serviceAccountName: keda-operator
restartPolicy: Never
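Because the pod runs under the keda-operator service account, you can exec into it and query KEDA resources and the external metrics API with the operator's own permissions:
kubectl exec -it keda-debug-tools -n keda -- kubectl get scaledobjects --all-namespaces
kubectl exec -it keda-debug-tools -n keda -- kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"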
Production Deployment
1. Production-Ready KEDA Configuration
# production-keda-values.yaml
operator:
replicaCount: 3
image:
repository: ghcr.io/kedacore/keda
tag: "2.12.0"
pullPolicy: IfNotPresent
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 200m
memory: 200Mi
securityContext:
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
metricsApiServer:
replicaCount: 3
image:
repository: ghcr.io/kedacore/keda-metrics-apiserver
tag: "2.12.0"
pullPolicy: IfNotPresent
resources:
limits:
cpu: 2000m
memory: 2Gi
requests:
cpu: 200m
memory: 200Mi
securityContext:
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
webhooks:
replicaCount: 3
image:
repository: ghcr.io/kedacore/keda-admission-webhooks
tag: "2.12.0"
pullPolicy: IfNotPresent
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 100Mi
# High Availability Configuration
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- keda-operator
topologyKey: kubernetes.io/hostname
# Pod Disruption Budget
podDisruptionBudget:
enabled: true
minAvailable: 2
# Horizontal Pod Autoscaler
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
# Monitoring
prometheus:
metricServer:
enabled: true
port: 8080
path: /metrics
operator:
enabled: true
port: 8080
path: /metrics
# Logging
logging:
operator:
level: info
format: json
metricServer:
level: info
format: json
2. Production Deployment Script
#!/bin/bash
# deploy-keda-production.sh
set -e
CLUSTER_NAME=${1:-"production-cluster"}
REGION=${2:-"us-west-2"}
NAMESPACE="keda"
echo "Deploying KEDA to production cluster: $CLUSTER_NAME"
# Update kubeconfig
aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME
# Create namespace
kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
# Ensure the KEDA Helm repository is available
helm repo add kedacore https://kedacore.github.io/charts 2>/dev/null || true
helm repo update
# Install KEDA with production configuration
helm upgrade --install keda kedacore/keda \
--namespace $NAMESPACE \
--values production-keda-values.yaml \
--wait \
--timeout=15m
# Verify installation
kubectl get pods -n $NAMESPACE
kubectl get crd | grep keda
# Deploy monitoring
kubectl apply -f keda-monitoring.yaml
kubectl apply -f keda-alerts.yaml
# Deploy security policies
kubectl apply -f keda-rbac.yaml
kubectl apply -f network-security.yaml
echo "KEDA production deployment completed successfully!"
3. Disaster Recovery
# disaster-recovery.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: keda-backup-script
namespace: keda
data:
backup.sh: |
#!/bin/bash
# KEDA Disaster Recovery Backup Script
BACKUP_DIR="/backup/keda"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR/$TIMESTAMP
# Backup ScaledObjects
kubectl get scaledobjects --all-namespaces -o yaml > $BACKUP_DIR/$TIMESTAMP/scaledobjects.yaml
# Backup ScaledJobs
kubectl get scaledjobs --all-namespaces -o yaml > $BACKUP_DIR/$TIMESTAMP/scaledjobs.yaml
# Backup TriggerAuthentications
kubectl get triggerauthentications --all-namespaces -o yaml > $BACKUP_DIR/$TIMESTAMP/triggerauthentications.yaml
# Backup ClusterTriggerAuthentications
kubectl get clustertriggerauthentications -o yaml > $BACKUP_DIR/$TIMESTAMP/clustertriggerauthentications.yaml
# Backup KEDA configuration
kubectl get configmap -n keda -o yaml > $BACKUP_DIR/$TIMESTAMP/keda-config.yaml
# Backup secrets
kubectl get secrets -n keda -o yaml > $BACKUP_DIR/$TIMESTAMP/keda-secrets.yaml
echo "Backup completed: $BACKUP_DIR/$TIMESTAMP"
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: keda-backup
namespace: keda
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: bitnami/kubectl:latest
command: ["/bin/bash", "/scripts/backup.sh"]
volumeMounts:
- name: backup-script
mountPath: /scripts
- name: backup-storage
mountPath: /backup
volumes:
- name: backup-script
configMap:
name: keda-backup-script
defaultMode: 0755
- name: backup-storage
persistentVolumeClaim:
claimName: keda-backup-pvc
restartPolicy: OnFailure
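Restoring is the mirror image of the backup: re-apply the captured manifests in dependency order, secrets and authentications before the ScaledObjects that reference them. A minimal sketch, assuming the backup volume is mounted at /backup/keda and the most recent timestamped directory should be used:
#!/bin/bash
# keda-restore.sh (sketch) - restore the most recent backup
BACKUP_DIR="/backup/keda"
LATEST=$(ls -1 $BACKUP_DIR | sort | tail -1)
kubectl apply -f $BACKUP_DIR/$LATEST/keda-secrets.yaml
kubectl apply -f $BACKUP_DIR/$LATEST/keda-config.yaml
kubectl apply -f $BACKUP_DIR/$LATEST/triggerauthentications.yaml
kubectl apply -f $BACKUP_DIR/$LATEST/clustertriggerauthentications.yaml
kubectl apply -f $BACKUP_DIR/$LATEST/scaledobjects.yaml
kubectl apply -f $BACKUP_DIR/$LATEST/scaledjobs.yaml
echo "Restore completed from $BACKUP_DIR/$LATEST"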
Cost Optimization Strategies
1. Resource Right-Sizing
# cost-optimized-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: cost-optimized-scaler
namespace: default
spec:
scaleTargetRef:
name: cost-optimized-app
minReplicaCount: 0 # Scale to zero when idle
maxReplicaCount: 20
pollingInterval: 30
cooldownPeriod: 300
idleReplicaCount: 0
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/cost-optimized-queue
queueLength: "5"
awsRegion: us-west-2
identityOwner: operator
# Cost optimization settings
scaleOnInFlight: "false"
activationQueueLength: "1"
authenticationRef:
name: keda-aws-credentials
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: cost-optimized-app
namespace: default
spec:
replicas: 0
selector:
matchLabels:
app: cost-optimized-app
template:
metadata:
labels:
app: cost-optimized-app
spec:
containers:
- name: app
image: myapp:latest
resources:
requests:
memory: "64Mi" # Minimal requests
cpu: "50m"
limits:
memory: "128Mi" # Reasonable limits
cpu: "100m"
# Cost optimization
env:
- name: JAVA_OPTS
value: "-Xms64m -Xmx128m -XX:+UseG1GC"
- name: NODE_OPTIONS
value: "--max-old-space-size=128"
2. Spot Instance Integration
# spot-instance-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: spot-instance-scaler
namespace: default
spec:
scaleTargetRef:
name: spot-instance-app
minReplicaCount: 0
maxReplicaCount: 50
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-west-2.amazonaws.com/123456789012/spot-queue
queueLength: "10"
awsRegion: us-west-2
identityOwner: operator
authenticationRef:
name: keda-aws-credentials
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: spot-instance-app
namespace: default
spec:
replicas: 0
selector:
matchLabels:
app: spot-instance-app
template:
metadata:
labels:
app: spot-instance-app
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
nodeSelector:
        eks.amazonaws.com/capacityType: SPOT  # label applied by EKS managed node groups running Spot capacity
tolerations:
- key: "spot"
operator: "Equal"
value: "true"
effect: "NoSchedule"
containers:
- name: app
image: myapp:latest
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
# Spot instance optimization
env:
- name: SPOT_INSTANCE
value: "true"
- name: GRACEFUL_SHUTDOWN
value: "true"
3. Cost Monitoring Dashboard
{
"dashboard": {
"title": "KEDA Cost Optimization Dashboard",
"panels": [
{
"title": "Pod Cost by Namespace",
"type": "graph",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total[5m]) * on(pod) group_left() kube_pod_info) by (namespace)",
"legendFormat": ""
}
]
},
{
"title": "Scaling Efficiency",
"type": "graph",
"targets": [
{
"expr": "keda_scaled_object_replicas / keda_scaled_object_ready",
"legendFormat": ""
}
]
},
{
"title": "Idle Time Percentage",
"type": "stat",
"targets": [
{
"expr": "avg(rate(keda_scaled_object_replicas[1h]) == 0) * 100",
"legendFormat": "Idle %"
}
]
}
]
}
}
Conclusion
This comprehensive guide has covered advanced KEDA implementation on Amazon EKS with:
Key Highlights:
- 25+ Scaling Use Cases: From basic SQS to advanced multi-cloud scenarios
- Enterprise Patterns: Circuit breakers, blue-green deployments, CQRS, and more
- Production-Ready Configurations: High availability, security, and monitoring
- Performance Optimization: Resource tuning, network optimization, and caching
- Security & Compliance: RBAC, network policies, and audit configurations
- Cost Optimization: Spot instances, right-sizing, and efficiency monitoring
- Comprehensive Monitoring: Grafana dashboards, alerting, and debugging tools
Next Steps:
- Start Simple: Begin with basic SQS or Redis scaling
- Add Monitoring: Implement comprehensive observability
- Scale Gradually: Add more complex patterns as needed
- Optimize Costs: Implement cost optimization strategies
- Enterprise Features: Add security and compliance controls
This guide provides enterprise-grade patterns for implementing KEDA on EKS. Always test in non-production environments first and adapt the examples to your specific requirements.