ResQ

ChatOps workflows to triage, resolve, and learn from incidents in Slack

Open Slack, type /resq new, and your response playbook is in motion. ResQ spins up a focused channel, pulls context from monitoring and tickets, and proposes a severity and owner based on similar issues and response targets. On-call teammates are paged automatically while stakeholders receive rich updates that include graphs, logs, and links to affected services. A live dashboard mirrors the channel, so leads can watch status without interrupting responders. From the first minute, you have a queue of next steps, a visible timer on impact, and one place where everyone can see who is doing what.

Triaging feels like managing a project in chat. Use /assign to hand off roles, /task to create work items that sync with your tracker, and /priority to adjust impact as new facts arrive. If the situation escalates, ResQ routes the incident to the right group using patterns learned from past events and response times. Every message, status change, and decision is captured in a timeline you can filter later by service, person, or tag. External partners and execs can subscribe to enriched notifications without joining the channel, reducing noise while keeping them informed. When the dust settles, closing the incident automatically resolves linked tickets and posts a clean summary to your chosen rooms.

Finding the real cause is guided, not guessed. During and after the event, ResQ suggests likely fault domains and remediation steps by correlating alerts, deploys, and infrastructure changes. One click collects key evidence—graphs, queries, runbook steps—into a draft analysis. The post-incident report compiles the timeline, impact, costs, and contributing factors, then prompts you to capture actions that prevent repeats. Corrective and preventive items get owners, due dates, and measurable outcomes, and they stay visible until verified. Trends roll up into metrics you can review in weekly ops meetings: mean time to acknowledge, time to restore, recurrence rate, and coverage of follow-ups.
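As a rough illustration of how the weekly metrics named above could be computed from exported incident records, here is a minimal Python sketch. It does not reflect ResQ's actual data model or export format: the `Incident` fields, the `repeat_of` marker, and the `weekly_metrics` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    opened: datetime
    acknowledged: datetime
    restored: datetime
    followups_total: int
    followups_verified: int
    repeat_of: Optional[str] = None  # id of an earlier incident, if this is a recurrence


def mean_delta(deltas: Iterable[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty collection of durations."""
    items = list(deltas)
    return sum(items, timedelta()) / len(items)


def weekly_metrics(incidents: list) -> dict:
    """Summarize acknowledge time, restore time, recurrence, and follow-up coverage."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(i.followups_total for i in incidents)
    verified = sum(i.followups_verified for i in incidents)
    return {
        "mean_time_to_acknowledge": mean_delta(i.acknowledged - i.opened for i in incidents),
        "mean_time_to_restore": mean_delta(i.restored - i.opened for i in incidents),
        # Share of incidents flagged as repeats of an earlier one.
        "recurrence_rate": sum(1 for i in incidents if i.repeat_of) / len(incidents),
        # Share of follow-up actions that were verified; 1.0 if none were required.
        "followup_coverage": verified / total if total else 1.0,
    }
```

Time to restore is measured here from when the incident was opened; a team that measures from acknowledgement would subtract `i.acknowledged` instead.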

Preparation matters as much as response. Use ResQ to rehearse disaster scenarios, validate recovery objectives, and track system readiness. Safety checks quantify risk on critical changes, and incident forms make it easy for any team to report unusual events, not just outages. Audit trails show exactly who changed what and when, satisfying compliance reviews without extra spreadsheets. Policy rules let you customize escalation, data retention, and notification behavior by product line or geography. Plug it into your observability stack and help desk, and you have a single, dependable workflow from alert to learning.

Features

  • Auto-Assign (Incidents): Routes issues to the right team using history and response targets.
  • Real-time Dashboard: Live view of impact, status, owners, and metrics during the event.
  • Root-cause Diagnosis: Correlates alerts, changes, and logs to suggest likely fault domains.
  • Incident Prioritization: Sets and updates severity based on scope and customer impact.
  • Enriched Notifications: Delivers context-rich updates to stakeholders across channels.
  • Disaster Recovery: Guides recovery steps and tracks RTO/RPO during disruption.
  • CAPA: Creates corrective and preventive actions with owners and deadlines.
  • Incident Reporting: Captures structured details for any unusual event or safety issue.
  • Ticket Management: Links, updates, and resolves tickets across support systems.
  • Task Management: Creates, assigns, and tracks tasks from chat to completion.
  • Safety Management: Scores risk on changes and ensures mitigations are in place.
  • Audit Trail: Chronological record of actions, edits, and decisions.

How It’s Used

  • On-call triage in Slack: Create an incident with /resq new, accept ownership, auto-page on-call, and prioritize with /priority as new data arrives.
  • Major incident bridge: Spin up a war room, auto-assign roles, broadcast enriched updates to execs, and escalate to platform and network teams as needed.
  • Root cause and post-incident review: Collect evidence into a draft, run guided analysis, publish a postmortem, and convert findings into CAPA tasks with due dates.
  • Customer escalation handling: Link support tickets, post customer-impact snapshots, and push timely status messages to CSM channels without flooding responders.
  • Disaster recovery drill: Run a scheduled exercise, follow DR runbooks, measure RTO/RPO, and capture gaps as corrective actions.
  • Change risk and safety gate: Before rollout, run safety checks, assign mitigations, and monitor canary alerts with auto-escalation if risk rises.
  • Compliance audit prep: Export timelines and audit logs, map actions to policies, and show verification of completed CAPA items.
  • Release rollback coordination: Correlate alerts with deploy, initiate rollback tasks, notify stakeholders, and verify recovery on the dashboard.
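For the disaster recovery drill listed above, measuring RTO/RPO comes down to comparing two observed durations against their targets. The Python sketch below shows one way to do that check by hand; the `DrillResult` fields and the `evaluate_drill` function are hypothetical and are not part of ResQ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Hypothetical timestamps recorded during a recovery exercise."""
    outage_start: datetime
    service_restored: datetime
    last_good_backup: datetime


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare observed recovery time and data-loss window to the objectives."""
    # Recovery time: how long the service was down during the drill.
    recovery_time = result.service_restored - result.outage_start
    # Data-loss window: how old the most recent recoverable data was at failure.
    data_loss_window = result.outage_start - result.last_good_backup
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss_window <= rpo,
    }
```

A drill that misses either objective is a natural candidate for a corrective action with an owner and due date.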

Plans & Pricing

ResQ

Custom

Integration with Communication Tools
Insights and Metrics
Incident Response Workflows
Postmortem and Retrospective Analysis
