High Availability Setup
Understanding High Availability
Core Concepts
What is High Availability?
- Continuous system operation
- Minimal downtime
- Fault tolerance
- Automatic failover
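To make "minimal downtime" concrete, an availability target can be converted into a yearly downtime budget. The sketch below is illustrative arithmetic only; the function name `downtime_hours_per_year` is hypothetical and a 365-day year is assumed.

```shell
#!/bin/bash
# Convert an availability percentage into allowed downtime per year.
# Assumes a 365-day year (8760 hours); the function name is illustrative.
downtime_hours_per_year() {
    local availability="$1"
    awk -v a="$availability" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 }'
}

downtime_hours_per_year 99.9    # "three nines"
downtime_hours_per_year 99.99   # "four nines"
```

Under these assumptions, 99.9% availability allows roughly 8.76 hours of downtime per year, and 99.99% allows under one hour, which is the range where manual recovery stops being practical and automatic failover becomes necessary.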
Key Components
- Active-Passive Setup
  - Primary Jenkins master
  - Standby Jenkins master
  - Shared storage system
  - Load balancer configuration
- Data Synchronization
  - File system replication
  - Database synchronization
  - Configuration management
  - Plugin state handling
Architecture Patterns
Active-Passive Configuration
Components Setup
graph TD
LB[Load Balancer] --> M1[Master-1 Active]
LB --> M2[Master-2 Standby]
M1 --> S[Shared Storage]
M2 --> S
M1 --> A1[Agent Pool]
M2 --> A1
Implementation Steps
- Primary Master Setup:
<!-- Configure primary Jenkins: jenkins.model.JenkinsLocationConfiguration.xml -->
<jenkins.model.JenkinsLocationConfiguration>
  <adminAddress>admin@example.com</adminAddress>
  <jenkinsUrl>https://jenkins.example.com</jenkinsUrl>
</jenkins.model.JenkinsLocationConfiguration>
- Standby Master Configuration:
# Mirror primary configuration
rsync -avz /var/jenkins_home/ standby:/var/jenkins_home/
Shared Storage Configuration
NFS Setup
- Server Configuration:
# /etc/exports
/jenkins_home *(rw,sync,no_root_squash)

# Apply changes
exportfs -a
- Client Mount:
# /etc/fstab
nfs_server:/jenkins_home /var/jenkins_home nfs defaults 0 0
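If the NFS mount is missing, Jenkins would start against an empty local directory, so it can help to check the mount before starting the service. The following is a minimal sketch, assuming a Linux host where `/proc/mounts` is available; `is_nfs_mounted` is a hypothetical helper, and its optional second argument exists only so the check can be run against a sample file.

```shell
#!/bin/bash
# Succeed only if the given path is currently an NFS mount.
# is_nfs_mounted is a hypothetical helper; the second argument defaults to
# /proc/mounts (Linux) and exists so the check can be run against a sample file.
is_nfs_mounted() {
    local path="$1" mounts_file="${2:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, filesystem type, options, ...
    awk -v p="$path" '$2 == p && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts_file"
}

# Example: refuse to continue when the shared home is not mounted.
# is_nfs_mounted /var/jenkins_home || { echo "NFS not mounted" >&2; exit 1; }
```

Matching on the filesystem type prefix `nfs` covers both `nfs` and `nfs4` entries.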
Failover Configuration
Automatic Failover
Health Checks
- Basic Health Check:
#!/bin/bash
# check_jenkins_health.sh
if curl -f http://localhost:8080/login >/dev/null 2>&1; then
    echo "Jenkins is healthy"
    exit 0
else
    echo "Jenkins is not responding"
    exit 1
fi
- Advanced Monitoring:
```yaml
# Prometheus configuration
scrape_configs:
  - job_name: 'jenkins'
    metrics_path: '/prometheus'
    static_configs:
      - targets: ['jenkins:8080']
```
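A single failed probe of the basic health check can be a transient blip (a restart, a long garbage-collection pause), so failover logic usually waits for several consecutive failures. The sketch below shows one way to wrap any probe command in retries; `check_with_retries` is a hypothetical helper, and the attempt count and delay are illustrative values.

```shell
#!/bin/bash
# Report unhealthy only after several consecutive failed probes.
# check_with_retries is a hypothetical helper; the probe is any command
# that exits 0 on success, such as the curl call in the basic health check.
check_with_retries() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0        # one successful probe is enough
        fi
        # Wait before the next probe, but not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: three probes, five seconds apart, against the local instance.
# check_with_retries 3 5 curl -fsS http://localhost:8080/login
```

The trade-off is detection time: with three attempts and a five-second delay, a real outage is reported about ten seconds later than with a single probe.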
Load Balancer Setup
HAProxy Configuration
global
    log /dev/log local0
    maxconn 4096

defaults
    log global
    mode http
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend jenkins
    bind *:80
    default_backend jenkins_masters

backend jenkins_masters
    balance roundrobin
    option httpchk GET /login
    server master1 jenkins-master-1:8080 check
    server master2 jenkins-master-2:8080 check backup
Data Synchronization
File System Replication
Continuous Sync
- Using Lsyncd:
-- /etc/lsyncd.conf
settings {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd-status.log"
}

sync {
    default.rsync,
    source = "/var/jenkins_home",
    target = "standby:/var/jenkins_home",
    delay = 1,
    rsync = {
        archive = true,
        compress = true
    }
}
- Monitoring Sync Status:
# Check sync status
tail -f /var/log/lsyncd.log
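The log shows that syncs are running, but not whether the two trees actually match. One way to spot drift is to compare a digest of each tree. This is a sketch only: `tree_digest` is a hypothetical helper, it assumes GNU `find`, `sort`, `xargs`, and `sha256sum`, and the standby path in the usage comment is an assumed local mount, not something the setup above provides.

```shell
#!/bin/bash
# Print one digest summarizing the relative paths and contents of all files
# under a directory. tree_digest is a hypothetical helper; it assumes GNU
# find, sort, xargs, and sha256sum are available.
tree_digest() {
    (
        cd "$1" || exit 1
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
    )
}

# Example: compare the primary with a locally reachable copy of the standby.
# [ "$(tree_digest /var/jenkins_home)" = "$(tree_digest /mnt/standby/jenkins_home)" ] \
#     || echo "standby has drifted from primary"
```

On a busy instance the digests can differ briefly while builds write files, so a mismatch is only meaningful if it persists across several checks.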
Backup and Recovery
Backup Strategy
Automated Backups
- Jenkins Home Backup:
#!/bin/bash
# jenkins_backup.sh
BACKUP_DIR="/backup/jenkins"
DATE=$(date +%Y%m%d)

# Stop Jenkins so files are not modified while archiving
systemctl stop jenkins

# Create backup
mkdir -p "$BACKUP_DIR"
tar czf "$BACKUP_DIR/jenkins_$DATE.tar.gz" /var/jenkins_home

# Start Jenkins
systemctl start jenkins

# Clean up backups older than 7 days
find "$BACKUP_DIR" -type f -mtime +7 -delete
- Database Backup (if applicable):
# Back up the Jenkins database
pg_dump jenkins_db > jenkins_db_backup.sql
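A backup that cannot be read back is of no use during recovery, so it is worth testing each Jenkins home archive right after it is written. The following sketch checks that an archive exists, is non-empty, and can be listed to the end; `verify_backup` is a hypothetical helper and is not part of the backup script.

```shell
#!/bin/bash
# Succeed only if the archive exists, is non-empty, and can be read to the end.
# verify_backup is a hypothetical helper, not part of the backup script.
verify_backup() {
    local archive="$1"
    [ -s "$archive" ] || return 1
    # Listing the archive forces gzip and tar to read it end to end.
    tar tzf "$archive" >/dev/null 2>&1
}

# Example, using the naming scheme from the Jenkins home backup:
# verify_backup "/backup/jenkins/jenkins_$(date +%Y%m%d).tar.gz" || echo "backup is unreadable" >&2
```

This confirms the archive is structurally intact; it does not prove that Jenkins will start from the restored data, which still needs a periodic test restore.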
Recovery Procedures
Disaster Recovery
- Quick Recovery Steps:
# Restore from backup
tar xzf jenkins_backup.tar.gz -C /
chown -R jenkins:jenkins /var/jenkins_home
systemctl restart jenkins
- Verification Process:
- Check system health
- Verify job configurations
- Test build execution
- Validate plugins
Monitoring and Maintenance
Health Monitoring
Metrics Collection
- JMX Monitoring:
# Enable JMX
export JENKINS_JAVA_OPTIONS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8008 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
- Grafana Dashboard Setup:
```yaml
# Grafana data source configuration
apiVersion: 1
datasources:
  - name: Jenkins
    type: prometheus
    access: proxy
    url: http://prometheus:9090
```
Hands-on Exercise
Setting Up HA Environment
- Primary Master Setup:
# Install Jenkins
wget -q -O - https://pkg.jenkins.io/debian/jenkins.io.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins.io/debian-stable binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt update
sudo apt install jenkins

# Configure shared storage
mount -t nfs nfs_server:/jenkins_home /var/jenkins_home
- Standby Master Configuration:
# Clone primary configuration
rsync -avz primary:/var/jenkins_home/ /var/jenkins_home/

# Configure as standby
mkdir -p /var/jenkins_home/init.groovy.d
echo "jenkins.model.Jenkins.instance.setMode(hudson.model.Node.Mode.EXCLUSIVE)" >> /var/jenkins_home/init.groovy.d/standby.groovy
- Load Balancer Setup:
# Install HAProxy
sudo apt install haproxy

# Apply configuration
sudo cp haproxy.cfg /etc/haproxy/
sudo systemctl restart haproxy
Testing Failover
- Simulate Primary Failure:
# Stop primary Jenkins
sudo systemctl stop jenkins
- Verify Failover:
- Monitor HAProxy stats
- Check standby activation
- Verify job execution
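Monitoring HAProxy stats during the test can be scripted by reading the CSV stats export and printing the state of each server in the backend. This is a sketch under assumptions: the HAProxy configuration shown earlier does not enable a stats endpoint, so the URL and port in the usage comment are hypothetical, and `backend_status` is an illustrative helper. Column positions are looked up from the CSV header rather than hard-coded.

```shell
#!/bin/bash
# Print "server status" for every real server in one backend, reading
# HAProxy's CSV stats export on stdin. Column positions are looked up from
# the header row rather than hard-coded. backend_status is a hypothetical helper.
backend_status() {
    local backend="$1"
    awk -F, -v be="$backend" '
        NR == 1 {
            sub(/^# /, "")                       # header starts with "# pxname,..."
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $col["pxname"] == be && $col["svname"] != "BACKEND" && $col["svname"] != "FRONTEND" {
            print $col["svname"], $col["status"]
        }
    '
}

# Example (assumes a stats listener was added on port 8404 at /stats):
# curl -s 'http://localhost:8404/stats;csv' | backend_status jenkins_masters
```

After the primary is stopped and its health checks fail, `master1` should be reported as `DOWN` while `master2` stays `UP` and begins receiving traffic.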