System Runbook
A System Runbook is a clear, structured set of instructions that describes the customer’s cloud environment. It explains how the system behaves in different situations and outlines the investigation process.
The runbook includes the resolution steps taken, the final outcome, and any preventive measures introduced to avoid similar incidents in the future.
When to Use Them
System runbooks are best used for recurring, well-understood investigations where teams want consistent, repeatable analysis.
Use runbooks when:
- The same questions come up repeatedly (e.g., “Why is the service crashing?”, “Why did latency spike?”).
- Investigations follow a recommended sequence of checks across logs, metrics, traces, and incidents.
Runbooks turn informal “how we usually debug this” knowledge into executable system behavior.
When Hawkeye Uses Them
Hawkeye uses system runbooks automatically whenever a user question, alert, or investigation matches a runbook’s trigger condition.
This can happen:
- During an active incident investigation
- In response to a user asking a natural-language question
- As part of a larger autonomous investigation pipeline
- When validating or expanding on an existing RCA
Once triggered, Hawkeye executes the runbook step-by-step, querying the appropriate telemetry sources at each stage and building a structured reasoning trail. The user does not need to manually invoke or manage the runbook—Hawkeye selects and runs it when it is relevant.
The demo below illustrates one of these cases: Hawkeye using a System Runbook in response to a user asking a natural-language question.
Fig.1 - A walkthrough of when Hawkeye uses System Runbooks
How System Runbooks Work
System runbooks are reusable, step-by-step investigation procedures that automate troubleshooting workflows.
When you create a runbook, Hawkeye analyzes your description and decomposes it into structured investigation steps (reasoning graph nodes). Each step includes:
- A focused investigative question
- The relevant data sources to query (logs, metrics, traces, incidents, change history, etc.)
- Dependencies on previous steps
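The three parts of a step listed above can be pictured as a small data structure. The following is a minimal sketch, not Hawkeye's actual schema: the `Step` class, its field names, and the `execution_order` helper are all hypothetical, and the dependency ordering uses Python's standard `graphlib` module purely for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """One node of a hypothetical reasoning graph (illustrative only)."""
    name: str
    question: str                       # the focused investigative question
    data_sources: list[str]             # e.g. logs, metrics, traces
    depends_on: list[str] = field(default_factory=list)  # names of earlier steps


def execution_order(steps: list[Step]) -> list[str]:
    """Return step names so that every step comes after its dependencies."""
    graph = {s.name: set(s.depends_on) for s in steps}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical steps for an "investigate application errors" runbook.
steps = [
    Step("impact", "Which requests were affected?", ["traces", "metrics"],
         depends_on=["errors", "changes"]),
    Step("errors", "Which errors appear in the service logs?", ["logs"]),
    Step("changes", "Was anything changed shortly before the errors began?",
         ["change history"], depends_on=["errors"]),
]

order = execution_order(steps)
print(order)
```

Because each step declares what it depends on, the order in which steps are written down does not matter; the dependencies alone determine the sequence.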
When a question or alert matches a runbook’s trigger condition, Hawkeye automatically executes the runbook’s steps in sequence, gathering evidence and reasoning through the investigation end-to-end.
This allows teams to standardize common investigations—such as “investigate application errors” or “analyze performance degradation”—so they run automatically and consistently.
flowchart TD
    A[Create Runbook] --> B[Hawkeye analyzes description]
    B --> C[Generate investigation steps]
    C --> D1[Step 1]
    C --> D2[Step 2]
    C --> D3[Step N]
    D1 --> E1[Ask investigative question]
    D1 --> F1[Query logs metrics traces incidents changes]
    D1 --> G1[Check dependencies]
    D2 --> E2[Ask investigative question]
    D2 --> F2[Query data sources]
    D2 --> G2[Check dependencies]
    D3 --> E3[Ask investigative question]
    D3 --> F3[Query data sources]
    D3 --> G3[Check dependencies]
    H[Alert matches trigger] --> I[Execute runbook]
    I --> J[Run steps in sequence]
    J --> K[Gather evidence]
    K --> L[Complete automated investigation]
Fig.2 - A walkthrough of how System Runbooks work
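The trigger-then-execute flow described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not Hawkeye's implementation: the source does not say how trigger conditions are matched, so simple keyword matching is used as a stand-in, and `run_if_triggered`, the step tuples, and the stubbed query functions are all hypothetical.

```python
from typing import Callable, Optional

# A step is (question, query function). The query function stands in for a
# call to a telemetry source and receives the evidence gathered so far.
StepFn = Callable[[dict], str]


def run_if_triggered(text: str, trigger_keywords: list[str],
                     steps: list[tuple[str, StepFn]]) -> Optional[dict]:
    """Run the steps in order when the text matches; otherwise return None."""
    lowered = text.lower()
    # Assumption: keyword matching is only a placeholder for the real trigger logic.
    if not any(keyword in lowered for keyword in trigger_keywords):
        return None
    evidence: dict[str, str] = {}
    for question, query in steps:
        evidence[question] = query(evidence)  # later steps see earlier findings
    return evidence


# Stubbed steps; a real step would query logs, metrics, traces, and so on.
steps = [
    ("Are there errors in the logs?", lambda ev: "stub: log findings"),
    ("Did metrics change at the same time?",
     lambda ev: f"stub: checked metrics after {len(ev)} earlier finding(s)"),
]

matched = run_if_triggered("Why is the service crashing?", ["crash", "latency"], steps)
unmatched = run_if_triggered("How do I rotate an API key?", ["crash", "latency"], steps)
```

Here `matched` holds one finding per question, in execution order, which plays the role of the structured reasoning trail; `unmatched` is `None` because no trigger applied.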
What Happens If a Runbook Does Not Resolve the Issue
Hawkeye continuously evaluates whether a runbook’s execution successfully explains or resolves the problem. If the runbook does not produce a conclusive answer, Hawkeye automatically transitions to its standard investigation workflow.
Importantly, the system does not discard the work already done. All evidence, context, and intermediate findings generated during the runbook execution are retained and used as input into the broader investigation, allowing Hawkeye to continue reasoning with a richer, pre-populated context rather than starting from scratch.
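The fallback behavior can be sketched as follows. This is a simplified illustration, assuming a hypothetical `RunbookResult` type and a placeholder `standard_investigation` function; none of these names come from Hawkeye itself. The point it shows is that an inconclusive runbook hands its findings forward instead of discarding them.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookResult:
    """Hypothetical outcome of one runbook execution."""
    conclusive: bool
    findings: list[str] = field(default_factory=list)


def standard_investigation(question: str, context: list[str]) -> list[str]:
    """Placeholder for the broader workflow; it starts from prior findings."""
    return context + [f"open-ended analysis of: {question}"]


def investigate(question: str, runbook_result: RunbookResult) -> dict:
    """Return the runbook's answer, or fall back while keeping its evidence."""
    if runbook_result.conclusive:
        return {"path": "runbook", "findings": runbook_result.findings}
    # Inconclusive: pass everything gathered so far into the standard workflow.
    findings = standard_investigation(question, context=runbook_result.findings)
    return {"path": "standard", "findings": findings}


partial = RunbookResult(conclusive=False, findings=["stub: no recent deploys"])
outcome = investigate("Why did latency spike?", partial)
```

In this sketch, `outcome["findings"]` begins with the runbook's own finding, so the broader investigation starts from a pre-populated context.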