mWatcher

Server health monitoring for CPU, memory, disk with alerting

Updated Jan 15, 2024

mWatcher is a server health monitoring tool that tracks CPU, memory, disk usage, and other system metrics in real time. It raises threshold-based alerts and serves a web dashboard, so that resource problems can be spotted before they cause downtime.

Overview

mWatcher provides comprehensive server monitoring with:

  • Real-time Monitoring: Live tracking of system resources
  • Intelligent Alerting: Smart notifications based on thresholds
  • Beautiful Dashboards: Modern web interface for monitoring
  • Historical Data: Long-term trend analysis and reporting
  • Multi-Server Support: Monitor multiple servers from one dashboard

Features

📊 System Metrics

  • CPU Usage: Per-core and overall CPU utilization
  • Memory Monitoring: RAM usage, swap, and memory pressure
  • Disk I/O: Read/write operations and disk space
  • Network Traffic: Inbound/outbound network monitoring
  • Process Monitoring: Top processes by resource usage

🚨 Smart Alerting

  • Threshold-based Alerts: Customizable warning and critical levels
  • Trend Analysis: Predictive alerts based on usage patterns
  • Multiple Channels: Email, Slack, webhook notifications
  • Alert Suppression: Prevent alert spam with intelligent grouping
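
The trend analysis feature is not specified further here. One common way to produce a predictive alert is to fit a least-squares line through recent samples and extrapolate it to the threshold. The sketch below illustrates that idea only; it is not mWatcher's actual algorithm, and `seconds_until_threshold` and its parameters are hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def seconds_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate seconds until usage reaches threshold using a least-squares line.

    samples are (timestamp_seconds, usage_percent) pairs, oldest first.
    Returns None when there are too few samples or usage is not rising,
    and 0.0 when the latest sample is already at or above the threshold.
    """
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, no slope can be fitted
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    last_t, last_u = samples[-1]
    if last_u >= threshold:
        return 0.0
    if slope <= 0:
        return None
    # Time at which the fitted line crosses the threshold
    intercept = mean_u - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    return max(0.0, crossing_t - last_t)
```

A caller could raise a warning when the estimate drops below some lead time, for example one hour for disk usage.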

📈 Dashboards

  • Real-time Charts: Live updating system metrics
  • Historical Views: Long-term trend analysis
  • Custom Dashboards: Create personalized monitoring views
  • Mobile Responsive: Monitor from any device

Installation

Quick Install

# Install via pip
pip install mwatcher

# Or via Docker
docker run -d \
  --name mwatcher \
  -p 8080:8080 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  harrythedevopsguy/mwatcher:latest

Manual Installation

# Clone repository
git clone https://github.com/HarryTheDevOpsGuy/mWatcher.git
cd mWatcher

# Install dependencies
pip install -r requirements.txt

# Install mWatcher
pip install -e .

# Initialize configuration
mwatcher init --config /etc/mwatcher/mwatcher.yml

Systemd Service

# Create systemd service file
sudo tee /etc/systemd/system/mwatcher.service > /dev/null <<EOF
[Unit]
Description=mWatcher Server Monitor
After=network.target

[Service]
Type=simple
# The mwatcher user and group must already exist on the host
User=mwatcher
Group=mwatcher
ExecStart=/usr/local/bin/mwatcher daemon --config /etc/mwatcher/mwatcher.yml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Enable and start service
sudo systemctl enable mwatcher
sudo systemctl start mwatcher

Configuration

Basic Configuration

Create /etc/mwatcher/mwatcher.yml:

# Global settings
global:
  log_level: "INFO"
  log_file: "/var/log/mwatcher/mwatcher.log"
  data_dir: "/var/lib/mwatcher"
  web_port: 8080
  web_host: "0.0.0.0"
  
# Monitoring intervals
monitoring:
  system_check_interval: 10  # seconds
  process_check_interval: 30  # seconds
  disk_check_interval: 60  # seconds
  network_check_interval: 5  # seconds

# Alert thresholds
thresholds:
  cpu:
    warning: 70
    critical: 90
    
  memory:
    warning: 80
    critical: 95
    
  disk:
    warning: 85
    critical: 95
    
  load_average:
    warning: 2.0
    critical: 4.0

# Monitored processes
processes:
  - name: "nginx"
    pattern: "nginx.*master"
    alert_on_crash: true
    
  - name: "mysql"
    pattern: "mysqld"
    alert_on_crash: true
    
  - name: "redis"
    pattern: "redis-server"
    alert_on_crash: true

# Disk monitoring
disks:
  monitor_all: true
  exclude_paths:
    - "/proc"
    - "/sys"
    - "/dev"
    - "/run"
  
  include_paths:
    - "/"
    - "/var"
    - "/home"
    - "/opt"

# Network monitoring
network:
  interfaces:
    - "eth0"
    - "wlan0"
  
  monitor_connections: true
  alert_on_port_scan: true

# Notifications
notifications:
  email:
    enabled: true
    smtp_server: "smtp.gmail.com"
    smtp_port: 587
    username: "${EMAIL_USERNAME}"
    password: "${EMAIL_PASSWORD}"
    from_address: "mwatcher@company.com"
    to_addresses:
      - "admin@company.com"
      - "devops@company.com"
      
  slack:
    enabled: true
    webhook_url: "${SLACK_WEBHOOK_URL}"
    channel: "#server-alerts"
    username: "mWatcher"
    icon_emoji: ":computer:"
    
  webhook:
    enabled: true
    url: "https://your-webhook.com/server-alerts"
    headers:
      Authorization: "Bearer ${WEBHOOK_TOKEN}"

# Dashboard settings
dashboard:
  theme: "dark"  # light, dark, auto
  refresh_interval: 5  # seconds
  max_data_points: 1000
  chart_colors:
    - "#3b82f6"  # Blue
    - "#ef4444"  # Red
    - "#10b981"  # Green
    - "#f59e0b"  # Yellow

Environment Variables

# Email Configuration
export EMAIL_USERNAME="mwatcher@company.com"
export EMAIL_PASSWORD="your-app-password"

# Slack Configuration
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."

# Webhook Configuration
export WEBHOOK_TOKEN="your-webhook-token"

# Database Configuration (optional)
export DATABASE_URL="postgresql://user:password@localhost/mwatcher"
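
The configuration refers to these variables with `${NAME}` placeholders. How mWatcher resolves them is not described here; the sketch below shows one plausible approach, substituting placeholders in the raw configuration text and failing loudly when a variable is missing. `expand_env` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable name
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env=None) -> str:
    """Replace each ${NAME} with its value from env; raise KeyError if unset."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, text)


line = 'webhook_url: "${SLACK_WEBHOOK_URL}"'
print(expand_env(line, {"SLACK_WEBHOOK_URL": "https://example.invalid/hook"}))
```

Raising on a missing variable, rather than leaving the placeholder in place, avoids sending notifications with a literal `${...}` string as a credential.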

Usage

Command Line Interface

# Start monitoring daemon
mwatcher daemon --config /etc/mwatcher/mwatcher.yml

# Start web dashboard
mwatcher web --port 8080 --host 0.0.0.0

# Check current status
mwatcher status

# Generate system report
mwatcher report --format html --output system-report.html

# Test configuration
mwatcher test-config

# View logs
mwatcher logs --tail 100 --follow

Python API

from mwatcher import SystemMonitor, AlertManager

# Initialize monitor
monitor = SystemMonitor(config_file='/etc/mwatcher/mwatcher.yml')

# Get current system status
status = monitor.get_system_status()
print(f"CPU Usage: {status['cpu']['usage']}%")
print(f"Memory Usage: {status['memory']['usage']}%")
print(f"Disk Usage: {status['disk']['usage']}%")

# Get process information
processes = monitor.get_top_processes(limit=10)
for proc in processes:
    print(f"{proc['name']}: {proc['cpu']}% CPU, {proc['memory']}% Memory")

# Check alerts
alerts = monitor.get_active_alerts()
for alert in alerts:
    print(f"Alert: {alert['message']} - {alert['severity']}")

# Send custom alert
alert_manager = AlertManager(config_file='/etc/mwatcher/mwatcher.yml')
alert_manager.send_alert(
    level='warning',
    message='Custom alert message',
    metric='cpu',
    value=85.5
)

Monitoring Features

System Metrics Collection

import psutil
import time
from datetime import datetime

class SystemMetrics:
    def __init__(self):
        self.metrics = {}
    
    def collect_cpu_metrics(self):
        """Collect CPU usage metrics"""
        cpu_percent = psutil.cpu_percent(interval=1)
        cpu_count = psutil.cpu_count()
        load_avg = psutil.getloadavg()
        
        return {
            'usage_percent': cpu_percent,
            'core_count': cpu_count,
            'load_1min': load_avg[0],
            'load_5min': load_avg[1],
            'load_15min': load_avg[2],
            'timestamp': datetime.now().isoformat()
        }
    
    def collect_memory_metrics(self):
        """Collect memory usage metrics"""
        memory = psutil.virtual_memory()
        swap = psutil.swap_memory()
        
        return {
            'total': memory.total,
            'available': memory.available,
            'used': memory.used,
            'free': memory.free,
            'usage_percent': memory.percent,
            'swap_total': swap.total,
            'swap_used': swap.used,
            'swap_free': swap.free,
            'swap_percent': swap.percent,
            'timestamp': datetime.now().isoformat()
        }
    
    def collect_disk_metrics(self):
        """Collect disk usage metrics"""
        disk_usage = psutil.disk_usage('/')
        disk_io = psutil.disk_io_counters()
        
        return {
            'total': disk_usage.total,
            'used': disk_usage.used,
            'free': disk_usage.free,
            'usage_percent': (disk_usage.used / disk_usage.total) * 100,
            'read_bytes': disk_io.read_bytes,
            'write_bytes': disk_io.write_bytes,
            'read_count': disk_io.read_count,
            'write_count': disk_io.write_count,
            'timestamp': datetime.now().isoformat()
        }
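
The disk I/O counters returned by psutil are cumulative totals, so a dashboard normally shows the difference between two snapshots divided by the elapsed time. A minimal sketch of that conversion follows, using the key names from `collect_disk_metrics` above; `io_rates` is a hypothetical helper.

```python
from typing import Dict


def io_rates(
    prev: Dict[str, float], curr: Dict[str, float], elapsed_seconds: float
) -> Dict[str, float]:
    """Convert two cumulative counter snapshots into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for key in ("read_bytes", "write_bytes", "read_count", "write_count"):
        delta = curr[key] - prev[key]
        # Counters can reset (for example after a reboot); report 0 instead of a negative rate
        rates[f"{key}_per_sec"] = max(delta, 0) / elapsed_seconds
    return rates
```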

Alert System

class AlertManager:
    def __init__(self, config):
        self.config = config
        self.alert_history = []
        self.suppressed_alerts = set()
    
    def check_thresholds(self, metrics):
        """Check metrics against configured thresholds"""
        alerts = []
        
        # CPU alerts
        if metrics['cpu']['usage_percent'] > self.config['thresholds']['cpu']['critical']:
            alerts.append({
                'type': 'cpu',
                'level': 'critical',
                'message': f"CPU usage critical: {metrics['cpu']['usage_percent']:.1f}%",
                'value': metrics['cpu']['usage_percent'],
                'threshold': self.config['thresholds']['cpu']['critical']
            })
        elif metrics['cpu']['usage_percent'] > self.config['thresholds']['cpu']['warning']:
            alerts.append({
                'type': 'cpu',
                'level': 'warning',
                'message': f"CPU usage high: {metrics['cpu']['usage_percent']:.1f}%",
                'value': metrics['cpu']['usage_percent'],
                'threshold': self.config['thresholds']['cpu']['warning']
            })
        
        # Memory alerts
        if metrics['memory']['usage_percent'] > self.config['thresholds']['memory']['critical']:
            alerts.append({
                'type': 'memory',
                'level': 'critical',
                'message': f"Memory usage critical: {metrics['memory']['usage_percent']:.1f}%",
                'value': metrics['memory']['usage_percent'],
                'threshold': self.config['thresholds']['memory']['critical']
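            })
        elif metrics['memory']['usage_percent'] > self.config['thresholds']['memory']['warning']:
            alerts.append({
                'type': 'memory',
                'level': 'warning',
                'message': f"Memory usage high: {metrics['memory']['usage_percent']:.1f}%",
                'value': metrics['memory']['usage_percent'],
                'threshold': self.config['thresholds']['memory']['warning']
            })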
        
        return alerts
    
    def send_alert(self, alert):
        """Send alert through configured channels"""
        alert_id = f"{alert['type']}_{alert['level']}"
        
        # Check if alert is suppressed
        if alert_id in self.suppressed_alerts:
            return
        
        # Send via email
        if self.config['notifications']['email']['enabled']:
            self._send_email_alert(alert)
        
        # Send via Slack
        if self.config['notifications']['slack']['enabled']:
            self._send_slack_alert(alert)
        
        # Send via webhook
        if self.config['notifications']['webhook']['enabled']:
            self._send_webhook_alert(alert)
        
        # Add to history
        self.alert_history.append(alert)
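
The class above checks `suppressed_alerts` but does not show how entries get there. One simple way to prevent alert spam is a per-alert cooldown window. The sketch below assumes the same `type_level` ID scheme as `send_alert`; `AlertCooldown` is a hypothetical class, not mWatcher's actual suppression logic.

```python
import time
from typing import Callable, Dict


class AlertCooldown:
    """Suppress repeats of the same alert ID within a cooldown window."""

    def __init__(
        self,
        cooldown_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_sent: Dict[str, float] = {}

    def should_send(self, alert: dict) -> bool:
        """Return True and record the send time unless the alert is in cooldown."""
        alert_id = f"{alert['type']}_{alert['level']}"
        now = self._clock()
        last = self._last_sent.get(alert_id)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[alert_id] = now
        return True
```

Because the ID includes the level, a warning that escalates to critical is still delivered immediately.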

Web Dashboard

from flask import Flask, render_template, jsonify, request

from mwatcher import SystemMonitor

app = Flask(__name__)

@app.route('/')
def dashboard():
    """Main dashboard page"""
    return render_template('dashboard.html')

@app.route('/api/metrics')
def api_metrics():
    """API endpoint for current metrics"""
    monitor = SystemMonitor()
    metrics = monitor.get_current_metrics()
    return jsonify(metrics)

@app.route('/api/metrics/history')
def api_metrics_history():
    """API endpoint for historical metrics"""
    hours = request.args.get('hours', 24, type=int)
    monitor = SystemMonitor()
    history = monitor.get_metrics_history(hours=hours)
    return jsonify(history)

@app.route('/api/alerts')
def api_alerts():
    """API endpoint for active alerts"""
    monitor = SystemMonitor()
    alerts = monitor.get_active_alerts()
    return jsonify(alerts)

@app.route('/api/processes')
def api_processes():
    """API endpoint for process information"""
    monitor = SystemMonitor()
    processes = monitor.get_top_processes(limit=20)
    return jsonify(processes)

if __name__ == '__main__':
    # Keep debug off when binding to all interfaces: Flask's debugger allows code execution
    app.run(host='0.0.0.0', port=8080, debug=False)

Dashboard Features

Real-time Charts

// Chart.js configuration for real-time monitoring
const chartConfig = {
    type: 'line',
    data: {
        labels: [],
        datasets: [
            {
                label: 'CPU Usage %',
                data: [],
                borderColor: '#3b82f6',
                backgroundColor: 'rgba(59, 130, 246, 0.1)',
                tension: 0.4
            },
            {
                label: 'Memory Usage %',
                data: [],
                borderColor: '#ef4444',
                backgroundColor: 'rgba(239, 68, 68, 0.1)',
                tension: 0.4
            },
            {
                label: 'Disk Usage %',
                data: [],
                borderColor: '#10b981',
                backgroundColor: 'rgba(16, 185, 129, 0.1)',
                tension: 0.4
            }
        ]
    },
    options: {
        responsive: true,
        maintainAspectRatio: false,
        scales: {
            y: {
                beginAtZero: true,
                max: 100
            }
        },
        plugins: {
            legend: {
                position: 'top'
            }
        }
    }
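};

// Create the chart instance (assumes a <canvas id="metricsChart"> element on the page)
const chart = new Chart(document.getElementById('metricsChart'), chartConfig);

// Update charts with real-time data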
function updateCharts() {
    fetch('/api/metrics')
        .then(response => response.json())
        .then(data => {
            const now = new Date().toLocaleTimeString();
            
            // Add new data point
            chartConfig.data.labels.push(now);
            chartConfig.data.datasets[0].data.push(data.cpu.usage_percent);
            chartConfig.data.datasets[1].data.push(data.memory.usage_percent);
            chartConfig.data.datasets[2].data.push(data.disk.usage_percent);
            
            // Keep only last 50 data points
            if (chartConfig.data.labels.length > 50) {
                chartConfig.data.labels.shift();
                chartConfig.data.datasets.forEach(dataset => {
                    dataset.data.shift();
                });
            }
            
            chart.update();
        })
        .catch(error => console.error('Failed to fetch metrics:', error));
}

// Update every 5 seconds
setInterval(updateCharts, 5000);

Integration Examples

Prometheus Integration

import time

from prometheus_client import Counter, Gauge, start_http_server

# Prometheus metrics
cpu_usage = Gauge('mwatcher_cpu_usage_percent', 'CPU usage percentage')
memory_usage = Gauge('mwatcher_memory_usage_percent', 'Memory usage percentage')
disk_usage = Gauge('mwatcher_disk_usage_percent', 'Disk usage percentage')
alert_count = Counter('mwatcher_alerts_total', 'Total number of alerts', ['level', 'type'])

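def expose_prometheus_metrics():
    """Expose metrics to Prometheus"""
    # 9090 is also the default port of the Prometheus server itself;
    # pick another port if both run on the same host
    start_http_server(9090)
    
    while True:
        # collect_system_metrics() is assumed to return a dict with 'cpu', 'memory' and 'disk' entries
        metrics = collect_system_metrics()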
        
        # Update Prometheus metrics
        cpu_usage.set(metrics['cpu']['usage_percent'])
        memory_usage.set(metrics['memory']['usage_percent'])
        disk_usage.set(metrics['disk']['usage_percent'])
        
        time.sleep(10)

Grafana Dashboard

{
  "dashboard": {
    "title": "mWatcher System Monitoring",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "mwatcher_cpu_usage_percent",
            "legendFormat": "CPU Usage %"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "mwatcher_memory_usage_percent",
            "legendFormat": "Memory Usage %"
          }
        ]
      },
      {
        "title": "Disk Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "mwatcher_disk_usage_percent",
            "legendFormat": "Disk Usage %"
          }
        ]
      }
    ]
  }
}

Troubleshooting

Common Issues

High CPU Usage

# Check mWatcher process usage
ps aux | grep mwatcher

# Check system load
uptime
top

# Reduce monitoring frequency by editing /etc/mwatcher/mwatcher.yml:
#
#   monitoring:
#     system_check_interval: 30  # Increase from 10

Memory Leaks

# Monitor mWatcher memory usage (pgrep -d, joins multiple PIDs with commas for ps -p)
ps -o pid,vsz,rss,comm -p "$(pgrep -d, mwatcher)"

# Check for memory leaks in logs
grep -i "memory" /var/log/mwatcher/mwatcher.log

# Restart service if needed
sudo systemctl restart mwatcher

Dashboard Not Loading

# Check web server status
curl http://localhost:8080/api/metrics

# Check firewall settings
sudo ufw status
sudo ufw allow 8080

# Check logs
sudo journalctl -u mwatcher -f

API Reference

REST API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/metrics | GET | Get current system metrics |
| /api/metrics/history | GET | Get historical metrics |
| /api/processes | GET | Get process information |
| /api/alerts | GET | Get active alerts |
| /api/health | GET | Health check endpoint |
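
These endpoints can be queried from any HTTP client. The sketch below uses only the Python standard library; `build_url` and `fetch_json` are hypothetical helpers, and the `hours` query parameter is taken from the history route shown in the Web Dashboard code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url: str, endpoint: str, **params) -> str:
    """Join the base URL and endpoint, appending query parameters if given."""
    url = base_url.rstrip("/") + endpoint
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url: str, opener=urlopen, timeout: float = 5.0):
    """GET the URL and decode the JSON body; opener is injectable for testing."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(build_url("http://localhost:8080", "/api/metrics/history", hours=6))` would request the last six hours of metrics from a dashboard running on the default port.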

WebSocket Events

// Real-time updates via WebSocket
const ws = new WebSocket('ws://localhost:8080/ws');

ws.onmessage = function(event) {
    const data = JSON.parse(event.data);
    
    switch(data.type) {
        case 'metrics_update':
            updateDashboard(data.metrics);
            break;
        case 'alert':
            showAlert(data.alert);
            break;
        case 'process_update':
            updateProcessList(data.processes);
            break;
    }
};
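
The same message types can be consumed from a Python client. The sketch below covers only the dispatch step, assuming each message is a JSON object with a `type` field as in the handler above. The WebSocket transport is left out, and `dispatch` is a hypothetical helper.

```python
import json
from typing import Callable, Dict


def dispatch(raw_message: str, handlers: Dict[str, Callable[[dict], None]]) -> bool:
    """Decode one JSON message and call the handler registered for its type.

    Returns True if a handler ran, False for unknown types or malformed input.
    """
    try:
        data = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    message_type = data.get("type")
    if not isinstance(message_type, str):
        return False
    handler = handlers.get(message_type)
    if handler is None:
        return False
    handler(data)
    return True
```

Handlers would be registered under the keys `metrics_update`, `alert`, and `process_update`, each reading the matching payload field (`metrics`, `alert`, or `processes`).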

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Fork and clone repository
git clone https://github.com/your-username/mWatcher.git
cd mWatcher

# Create development environment
python3 -m venv venv
source venv/bin/activate

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run linting
flake8 mwatcher/
black mwatcher/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support


**Pro Tip**: Set up mWatcher to run as a systemd service for continuous monitoring. Check our [deployment guide](https://github.com/HarryTheDevOpsGuy/mWatcher/wiki/Deployment) for detailed instructions.
