Incident Management

Incidents are the cornerstone of operational response when services fail. Pulsimo provides a comprehensive incident management system that tracks the entire lifecycle from detection through resolution.

Overview

When a monitored endpoint fails health checks beyond its configured threshold, Pulsimo automatically creates an incident. This incident becomes the central hub for all information, actions, and collaboration related to that outage. From automatic creation to post-mortem generation, the incident management system guides your team through effective incident response.
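The threshold rule described above can be sketched as a small counter. This is a minimal illustration, not Pulsimo's implementation: the `EndpointMonitor` class and its fields are hypothetical, and the sketch assumes that an incident opens when the number of consecutive failed checks reaches the configured threshold and that any successful check resets the count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointMonitor:
    """Hypothetical per-endpoint failure tracker; not Pulsimo's actual implementation."""
    failure_threshold: int         # consecutive failures that open an incident (assumed semantics)
    consecutive_failures: int = 0
    incident_open: bool = False

    def record_check(self, healthy: bool) -> Optional[str]:
        """Record one health-check result; return "open_incident" when an incident should be created."""
        if healthy:
            # A successful check resets the failure streak.
            self.consecutive_failures = 0
            return None
        self.consecutive_failures += 1
        # Open at most one incident per outage, the first time the streak reaches the threshold.
        # Resolving the incident (and resetting incident_open) is out of scope here.
        if not self.incident_open and self.consecutive_failures >= self.failure_threshold:
            self.incident_open = True
            return "open_incident"
        return None


monitor = EndpointMonitor(failure_threshold=3)
checks = [False, False, True, False, False, False]
print([monitor.record_check(ok) for ok in checks])
# [None, None, None, None, None, 'open_incident']
```

The successful check in the middle resets the streak, so the incident opens only on the third failure of the second run.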

Automatic Detection

Incidents are created automatically as soon as a failure threshold is exceeded; no manual intervention required

Alert Management

Acknowledge an incident to stop repeat notifications and prevent alert fatigue

Collaboration

Multiple team members can work on the same incident and share investigation notes

Post-Mortems

Automatic report generation with timeline, metrics, and complete audit trail

Incident Lifecycle

Every incident progresses through four defined states:

1. OPEN (Red Badge)

Triggered When:

  • Endpoint fails consecutive health checks (threshold exceeded)
  • Automatic incident creation, no manual intervention

System Actions:

  • ✓ Creates incident record
  • ✓ Sends alerts to notification channels
  • ✓ Displays in Incidents page
  • ✓ Records start time

2. ACKNOWLEDGED (Yellow Badge)

Triggered When:

  • A team member with the Member or Admin role clicks the "Acknowledge" button

System Actions:

  • ✓ Stops sending repeat alerts
  • ✓ Records who acknowledged and when
  • ✓ Notifies team of acknowledgement

Best Practice: Acknowledge the incident as soon as you start working on it, and add a note explaining what you're doing.

3. INVESTIGATING (Blue Badge)

What This Means:

  • Active troubleshooting underway
  • Root cause analysis in progress
  • Fix being implemented

User Actions Available:

  • ✓ Add detailed investigation notes
  • ✓ Document findings and troubleshooting steps
  • ✓ Attach screenshots or logs
  • ✓ Update progress

4. RESOLVED (Green Badge)

System Actions:

  • ✓ Marks incident as resolved
  • ✓ Calculates total downtime
  • ✓ Calculates time-to-resolve (TTR)
  • ✓ Updates MTTR metrics
  • ✓ Sends recovery notification

User Actions Available:

  • ✓ Add resolution notes (what fixed it)
  • ✓ Generate post-mortem report
  • ✓ Export incident data (JSON)
  • ✓ View complete timeline
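The lifecycle can be modeled as a small state machine. The sketch below is hypothetical: the `Incident` and `IncidentState` names and fields are illustrative rather than Pulsimo's schema, and it assumes that states only move forward (skipping ahead is allowed) and that time to resolve is measured from the recorded start time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import List, Optional, Tuple


class IncidentState(IntEnum):
    # Values follow the lifecycle order: OPEN -> ACKNOWLEDGED -> INVESTIGATING -> RESOLVED.
    OPEN = 1
    ACKNOWLEDGED = 2
    INVESTIGATING = 3
    RESOLVED = 4


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative, not Pulsimo's schema."""
    started_at: datetime
    state: IncidentState = IncidentState.OPEN
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    history: List[Tuple[datetime, IncidentState]] = field(default_factory=list)

    def transition(self, new_state: IncidentState, at: datetime, user: Optional[str] = None) -> None:
        # Assumption: states only move forward; skipping ahead is allowed.
        if new_state <= self.state:
            raise ValueError(f"cannot move from {self.state.name} to {new_state.name}")
        if new_state == IncidentState.ACKNOWLEDGED:
            # Record who acknowledged the incident and when.
            self.acknowledged_by, self.acknowledged_at = user, at
        elif new_state == IncidentState.RESOLVED:
            self.resolved_at = at
        self.state = new_state
        self.history.append((at, new_state))

    def time_to_resolve_seconds(self) -> Optional[float]:
        # TTR from the recorded start time to resolution; None while unresolved.
        if self.resolved_at is None:
            return None
        return (self.resolved_at - self.started_at).total_seconds()


# Example: one incident walked through all four states.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
incident = Incident(started_at=t0)
incident.transition(IncidentState.ACKNOWLEDGED, t0 + timedelta(minutes=2), user="alice")
incident.transition(IncidentState.INVESTIGATING, t0 + timedelta(minutes=5))
incident.transition(IncidentState.RESOLVED, t0 + timedelta(seconds=1410))
print(incident.time_to_resolve_seconds())  # 1410.0
```

Using an `IntEnum` lets the forward-only check be a plain integer comparison; a real system might instead keep an explicit table of allowed transitions.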

How to Acknowledge an Incident

When you receive an alert notification:

  1. Open the Incidents page from the sidebar
  2. Click the open incident
  3. Click the "Acknowledge" button
  4. Optionally, add a note describing your investigation plan

Acknowledging stops repeat notifications and prevents alert fatigue.
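The effect of acknowledging on notifications can be expressed as a simple gate. This is a hypothetical policy sketch: the function name, the string state values, and especially the `repeat_interval` between repeat alerts are assumptions, since this page does not say how often unacknowledged incidents re-notify.

```python
from datetime import datetime, timedelta


def should_send_repeat_alert(
    state: str,
    last_alert_at: datetime,
    now: datetime,
    repeat_interval: timedelta,
) -> bool:
    """Hypothetical repeat-alert policy: only unacknowledged (OPEN) incidents re-notify."""
    # Once an incident is acknowledged (or moves further along), repeat alerts stop.
    if state != "OPEN":
        return False
    # Still OPEN: re-notify once the assumed repeat interval has elapsed since the last alert.
    return now - last_alert_at >= repeat_interval


t = datetime(2024, 1, 1, 12, 0)
print(should_send_repeat_alert("OPEN", t, t + timedelta(minutes=10), timedelta(minutes=5)))          # True
print(should_send_repeat_alert("ACKNOWLEDGED", t, t + timedelta(minutes=10), timedelta(minutes=5)))  # False
```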

Key Features

MTTR Tracking

Time to resolve is recorded for every incident and automatically averaged into Mean Time To Resolution (MTTR)

Complete Audit Trail

Every action is timestamped and attributed to the user who performed it

Incident Analytics

Track incident frequency, patterns, and affected services

Post-Mortem Generation

Automated reports with timeline, metrics, and resolution details
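The audit trail and post-mortem timeline both rest on one idea: each action is stored with a timestamp and the user who performed it, then listed in chronological order. A minimal sketch follows; the `AuditEvent` and `AuditTrail` names and the "system" user for automatic actions are hypothetical conventions, not Pulsimo's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    at: datetime  # when the action happened
    user: str     # who performed it; "system" marks automatic actions (assumed convention)
    action: str   # short description of the action


class AuditTrail:
    """Append-only event list: the class exposes no way to edit or remove entries."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, at: datetime, user: str, action: str) -> None:
        self._events.append(AuditEvent(at, user, action))

    def timeline(self) -> List[str]:
        # Chronological rendering, like the timeline section of a post-mortem.
        return [
            f"{e.at:%Y-%m-%d %H:%M:%S} {e.user}: {e.action}"
            for e in sorted(self._events, key=lambda e: e.at)
        ]


trail = AuditTrail()
trail.record(datetime(2024, 1, 1, 12, 2), "alice", "acknowledged")
trail.record(datetime(2024, 1, 1, 12, 0), "system", "incident opened")
for line in trail.timeline():
    print(line)
# 2024-01-01 12:00:00 system: incident opened
# 2024-01-01 12:02:00 alice: acknowledged
```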

Incident Metrics

Pulsimo automatically calculates the following metrics, some per incident and some aggregated across incidents:

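Metric               | Description                                                            | Example
-------------------- | ---------------------------------------------------------------------- | ------------------------
MTTR                 | Mean Time To Resolution: average time to resolve an incident           | 23.5 minutes
MTTD                 | Mean Time To Detect: average time from failure until incident creation | 2.3 minutes
Total Downtime       | Total time the service was unavailable                                 | 1,410 seconds (23.5 min)
Affected Checks      | Number of failed health check attempts                                 | 47 failed checks
Incident Count (24h) | Total incidents in the last 24 hours                                   | 3 incidents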
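The metric definitions can be made concrete with a short sketch. It assumes each incident records when failures started, when it was opened, and when it was resolved (hypothetical field names), and that both downtime and time to resolve run from incident creation to resolution. That assumption agrees with the example values in the table, where MTTR and total downtime are both 23.5 minutes. Counting affected checks is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List


@dataclass
class IncidentTimes:
    failure_started_at: datetime  # first failed health check (assumed field)
    created_at: datetime          # incident opened (threshold exceeded)
    resolved_at: datetime         # incident resolved


def downtime_seconds(i: IncidentTimes) -> float:
    # Assumption: downtime runs from incident creation to resolution.
    return (i.resolved_at - i.created_at).total_seconds()


def mttr_minutes(incidents: List[IncidentTimes]) -> float:
    # Mean of per-incident resolution times, converted to minutes.
    return mean(downtime_seconds(i) for i in incidents) / 60


def mttd_minutes(incidents: List[IncidentTimes]) -> float:
    # Mean delay between the first failure and incident creation, in minutes.
    return mean((i.created_at - i.failure_started_at).total_seconds() for i in incidents) / 60


def incident_count_24h(incidents: List[IncidentTimes], now: datetime) -> int:
    # Incidents created within the trailing 24-hour window ending at `now`.
    window_start = now - timedelta(hours=24)
    return sum(1 for i in incidents if window_start <= i.created_at <= now)


# Worked example matching the table: 1,410 s of downtime is 1410 / 60 = 23.5 minutes.
t0 = datetime(2024, 1, 1, 12, 0)
inc = IncidentTimes(t0 - timedelta(seconds=138), t0, t0 + timedelta(seconds=1410))
print(downtime_seconds(inc))  # 1410.0
print(mttr_minutes([inc]))    # 23.5
```

With a single incident, MTTR equals that incident's time to resolve; with several, it is their arithmetic mean.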

Best Practices

📝 Document Everything: Add investigation notes as you troubleshoot. Future you (and your team) will thank you when writing the post-mortem.

⚡ Acknowledge Quickly: Acknowledge incidents immediately when you start working on them to stop alert spam and signal to others that it's being handled.

🔍 Review Patterns: Regularly review incident history to identify recurring issues and proactively address root causes.