Real-Time Monitoring

Pulsimo delivers alerts within about 1-10 seconds of a failure. Scrape-based systems such as Prometheus typically take 30 seconds to 5 minutes, depending on their scrape and rule-evaluation intervals.

Key Innovation: Active Health Checks

Instead of waiting for metrics to be scraped, Pulsimo actively checks service health every 10 seconds and immediately publishes status changes via WebSocket and Redis PubSub.
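
A minimal sketch of such an active check loop, assuming a plain HTTP health endpoint. The names `http_check`, `run_checks`, and `publish` are hypothetical and are not taken from Pulsimo's code; `publish` stands in for whatever pushes the change to WebSocket clients and Redis PubSub.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional

CHECK_INTERVAL_SECONDS = 10  # the interval quoted above; configurable


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, HTTP error statuses,
        # and malformed responses.
        return False


def run_checks(check: Callable[[], bool],
               publish: Callable[[str], None],
               rounds: Optional[int] = None,
               interval: float = CHECK_INTERVAL_SECONDS) -> None:
    """Run `check` every `interval` seconds; publish only when the status changes."""
    last_status: Optional[str] = None
    done = 0
    while rounds is None or done < rounds:
        status = "up" if check() else "down"
        if status != last_status:
            publish(status)  # hypothetical hook: WebSocket / Redis PubSub
            last_status = status
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


# Simulated check results instead of a real endpoint, with no sleep between rounds.
results = iter([True, True, False, False, True])
events = []
run_checks(lambda: next(results), events.append, rounds=5, interval=0)
print(events)  # ['up', 'down', 'up']
```

Against a real service, `check` could be `lambda: http_check(url)` for whatever health URL the service exposes. Publishing only on changes keeps the event stream small even when checks run every few seconds.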

Pulsimo vs Prometheus/Grafana

Aspect            Pulsimo                     Prometheus + Grafana
────────────────  ──────────────────────────  ────────────────────────────
Architecture      Push-based (Active checks)  Pull-based (Scraping)
Check Interval    10 seconds (configurable)   15-60 seconds (typical)
Alert Latency     1-10 seconds                1-5 minutes
Real-Time UI      WebSocket (built-in)        Polling (Grafana refresh)
Multi-Tenancy     Native (organizations)      Complex setup required
Setup Complexity  Single Docker Compose       Multiple components + config

Alert Latency Comparison

Prometheus Scraping Model 🐌

Timeline (Prometheus):
─────────────────────────────────────────────────────────
0s        15s       30s       45s       60s       75s
│         │         │         │         │         │
Scrape    Scrape    Scrape    Scrape    Scrape    Scrape
          ↑ Service down                ↑ Detected (30-45s later)

Prometheus Workflow:

  1. Service goes down at time T
  2. Prometheus scrapes at the next interval, up to 15s after T (with a 15s scrape interval)
  3. Scrape fails → Metric shows service down
  4. Evaluation interval → Alert rule checked (another 15-30s)
  5. Alert fires → Alertmanager receives (another 1-5s)
  6. Notification sent → Email/Slack (another 5-10s)

ā±ļø Total Latency: 30-60 seconds minimum, often 1-5 minutes

Pulsimo Active Check Model ⚡

Timeline (Pulsimo):
─────────────────────────────────────────────────────────
0s        10s       20s       30s       40s       50s
│         │         │         │         │         │
Check     Check     Check     Check     Check     Check
          ↑ Service down
          ↓ Incident created (instant)
          ↓ WebSocket broadcast (instant)
          ↓ Email sent (1-2s)

Pulsimo Workflow:

  1. Service goes down at time T
  2. Next health check at T+10s (or less)
  3. Check fails → Immediately detected
  4. Incident created → Database write (50-100ms)
  5. Redis PubSub → Event published (5-10ms)
  6. Notification Service → Email sent (1-2s)
  7. WebSocket → Frontend updated (10-50ms)

⚡ Total Latency: 1-10 seconds
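
The two totals can be checked by adding up the stage ranges from the workflows above. The sketch below does that arithmetic in milliseconds. Modeling the wait for the next scrape or check as anywhere between zero and one full interval is an assumption of this sketch, and the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (best case, worst case) in milliseconds

# Stage ranges copied from the two workflows above. Modeling the wait for the
# next scrape or check as 0..interval is an assumption of this sketch.
PROMETHEUS: Dict[str, Range] = {
    "wait for next scrape": (0, 15_000),
    "rule evaluation": (15_000, 30_000),
    "alertmanager": (1_000, 5_000),
    "notification": (5_000, 10_000),
}
PULSIMO_DASHBOARD: Dict[str, Range] = {
    "wait for next check": (0, 10_000),
    "incident write": (50, 100),
    "redis pubsub": (5, 10),
    "websocket push": (10, 50),
}


def total_seconds(stages: Dict[str, Range]) -> Tuple[float, float]:
    """Sum the best and worst case of every stage and convert to seconds."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best / 1000, worst / 1000


print(total_seconds(PROMETHEUS))         # (21.0, 60.0)
print(total_seconds(PULSIMO_DASHBOARD))  # (0.065, 10.16)
```

The worst cases line up with the totals quoted above: about 60 seconds for the scraping model and about 10 seconds for the dashboard path. The email path adds the 1-2 seconds listed in step 6. The best cases are optimistic, since they assume the failure happens just before a scrape or check.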

WebSocket Real-Time Updates

Instant Dashboard Updates

Status changes appear immediately on all connected dashboards without a page refresh.

Live Incident Notifications

New incidents appear in real time with visual and sound notifications.

Real-Time Metrics

Response times and health check results stream continuously.

Multi-User Sync

All team members see the same state at the same time.
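
A minimal sketch of the fan-out behind this kind of multi-user sync: one event is delivered to every connected client. Each `asyncio.Queue` stands in for an open WebSocket connection, and the `Broadcaster` class and the event fields are hypothetical, not Pulsimo's actual message format.

```python
import asyncio
import json
from typing import List, Set


class Broadcaster:
    """Fans each event out to every connected client.

    Each asyncio.Queue stands in for one open WebSocket connection.
    """

    def __init__(self) -> None:
        self._clients: Set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    def broadcast(self, event: dict) -> None:
        # Serialize once, then hand the same message to every client.
        message = json.dumps(event)
        for queue in self._clients:
            queue.put_nowait(message)


async def demo() -> List[str]:
    hub = Broadcaster()
    alice, bob = hub.connect(), hub.connect()
    # Hypothetical event shape, for illustration only.
    hub.broadcast({"type": "status_changed", "service": "api", "status": "down"})
    return [await alice.get(), await bob.get()]


received = asyncio.run(demo())
print(received[0] == received[1])  # True
```

Because every client receives the same serialized message, all dashboards converge on the same state without polling.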

Benefits of Real-Time Monitoring

⚡ Faster Response: Detect and respond to incidents 3-30x faster than with scrape-based monitoring (1-10 seconds versus 30 seconds to 5 minutes). Minutes matter during outages.

👥 Better Collaboration: The entire team sees incidents at the same time, which reduces confusion about who is working on what.

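📊 Accurate MTTR: Failures are detected within one check interval (10 seconds by default), so recorded incident start times, and the Mean Time To Repair metrics derived from them, are accurate to within that interval.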
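
As a rough illustration of how detection time feeds into MTTR, the sketch below averages the time from detection to resolution. MTTR definitions vary, so treating it as "resolved minus detected" is an assumption here, and the timestamps are made-up sample data.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (detected_at, resolved_at)


def mean_time_to_repair(incidents: List[Incident]) -> timedelta:
    """Average time from detection to resolution."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Illustrative sample data, not real incidents.
incidents = [
    (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 10, 0)),  # 10 min
    (datetime(2024, 1, 2, 8, 30, 0), datetime(2024, 1, 2, 8, 50, 0)),   # 20 min
]
mttr = mean_time_to_repair(incidents)
print(mttr)  # 0:15:00
```

The real outage starts some time before `detected_at`, so the measured value undercounts each incident by its detection delay. A shorter check interval keeps that gap small.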