AI root cause analysis for modern operations

NOT ANOTHER SAAS. REAL DEPLOYMENT INSIDE THE CUSTOMER ENVIRONMENT.

Find the root cause before the war room spirals

MCP connector fabric

Agent to MCP Server to the API that holds the evidence.

Each service gets a scoped OpsDiag agent path: the agent talks to the matching MCP Server, the MCP Server talks to the provider API, and the tokens sent to the AI model carry only the context needed for RCA.
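One way to picture a scoped path is as a small object that pairs a single service with the fields it may pass on. The sketch below is illustrative only: `ScopedPath`, `collect_context`, and `fake_log_api` are hypothetical names, not OpsDiag's API, and the provider call is a stub standing in for the MCP Server to provider API hop.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# Hypothetical names throughout; this is not OpsDiag's actual interface.

@dataclass(frozen=True)
class ScopedPath:
    service: str                         # the one service this agent path covers
    allowed_fields: FrozenSet[str]       # fields permitted to reach the AI model
    fetch: Callable[[str], List[dict]]   # stand-in for MCP Server -> provider API


def collect_context(path: ScopedPath, query: str) -> List[dict]:
    """Fetch evidence through the path and keep only the allowed fields."""
    records = path.fetch(query)
    return [
        {key: value for key, value in record.items() if key in path.allowed_fields}
        for record in records
    ]


def fake_log_api(query: str) -> List[dict]:
    """Stub provider API that returns one record with extra, unneeded fields."""
    return [{
        "ts": "2024-01-01T00:00:00Z",
        "msg": f"timeout matching {query}",
        "api_key": "secret",
        "host": "node-1",
    }]


logs_path = ScopedPath("checkout-logs", frozenset({"ts", "msg"}), fake_log_api)
context = collect_context(logs_path, "checkout")
```

In this sketch the `api_key` and `host` fields never leave `collect_context`, which is the property the scoped path is meant to give.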

Agentic RCA simulation

The 3D scene below simulates how Agentic RCA works: an incident becomes a plan, scoped agents query MCP Servers and provider APIs asynchronously, and the evidence returns into one reviewable RCA.
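The same flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpsDiag's implementation: `plan`, `query_source`, and `run_rca` are hypothetical names, the source list is hard-coded, and the network call is replaced by a no-op `await`.

```python
import asyncio
from typing import Dict, List


def plan(incident: str) -> List[str]:
    """Hypothetical planner: decide which evidence sources to query."""
    return ["logs", "metrics", "deploy-history"]


async def query_source(source: str, incident: str) -> Dict[str, str]:
    """Stand-in for a scoped agent querying an MCP Server and provider API."""
    await asyncio.sleep(0)  # placeholder for the real network round trip
    return {"source": source, "finding": f"stub finding for {incident}"}


async def run_rca(incident: str) -> Dict[str, object]:
    """Query all planned sources concurrently and merge the evidence."""
    sources = plan(incident)
    # asyncio.gather returns results in the same order as its arguments.
    evidence = await asyncio.gather(*(query_source(s, incident) for s in sources))
    return {"incident": incident, "evidence": list(evidence), "status": "needs-review"}


report = asyncio.run(run_rca("checkout-latency"))
```

The result carries a `needs-review` status rather than a verdict, matching the idea that the output is something responders review.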

AI RCA workflow

Run RCA from schedules, chat, MCP evidence, and CLI workflows.

OpsDiag is built for teams that want scheduled checks, live incident questions, service evidence through MCP Servers, and terminal-native analysis through Codex CLI or Claude CLI.

Scheduled RCA

Run recurring RCA checks around deploy windows, high-risk services, provider events, or known operational weak spots.

Chat to RCA

Start from an alert, incident note, or natural-language question and keep asking follow-ups against the same evidence thread.

MCP Servers to services

Use MCP Servers as the path into logs, metrics, traces, Kubernetes, cloud APIs, deploy history, edge rules, and alerting systems.

Codex and Claude CLI analysis

Continue the RCA from terminal workflows, challenge the evidence, and refine the analysis without switching into another dashboard.

What OpsDiag actually does

Evidence workflows responders can question and continue.

Scheduled RCA runs can watch known risk windows. Chat-driven RCA can start from a fresh alert. MCP Servers connect the analysis to services and providers. Codex CLI and Claude CLI keep the investigation available from the terminal.

Scheduled

Scheduled RCA runs

Recurring RCA checks can inspect service risk, recent changes, and provider signals before responders are deep in escalation.

Chat

Chat-driven RCA questions

Responders can ask what changed, what is affected, and what still needs validation from the same RCA conversation.

MCP

MCP Server evidence paths

MCP Servers connect scoped agents to services, clusters, observability tools, cloud providers, edge controls, and incident systems.

CLI

Codex CLI and Claude CLI

Terminal workflows can inspect the RCA, ask follow-up questions, challenge weak causes, and keep investigation notes close to the operator.

Change

Config and deploy diff context

Recent releases, infrastructure changes, edge rules, policy updates, and provider events are placed beside symptoms.

Evidence

Evidence-backed RCA notes

Each proposed cause includes the signals that support it and the signals that still need validation.
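A note of this shape could be modeled as a small record that keeps the two kinds of signal apart. The sketch below is an assumption about structure, not OpsDiag's data model; `ProposedCause` and its fields are hypothetical, and the sample signals are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record; field names are illustrative, not OpsDiag's schema.

@dataclass
class ProposedCause:
    summary: str
    supporting: List[str] = field(default_factory=list)        # signals that back the cause
    needs_validation: List[str] = field(default_factory=list)  # signals still unchecked

    def is_confirmed(self) -> bool:
        """Treat a cause as confirmed only when nothing is left to validate."""
        return bool(self.supporting) and not self.needs_validation


cause = ProposedCause(
    summary="Connection pool exhausted after config change",
    supporting=["pool size lowered in latest deploy", "timeouts began after deploy"],
    needs_validation=["confirm no upstream provider event in the same window"],
)
confirmed_before = cause.is_confirmed()

# Once the open item is checked, the responder clears it.
cause.needs_validation.clear()
confirmed_after = cause.is_confirmed()
```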

Action

Action checklist

Responders get validation checks, rollback candidates, mitigation steps, owners, and handoff-ready notes.

Prevent

Follow-up prevention items

The RCA closes with guardrails, alert tuning, runbook updates, and checks that reduce repeat incidents.

RCA capabilities

RCA services responders can use from alerts, schedules, and terminals.

OpsDiag provides scheduled checks, chat-based RCA, MCP Server evidence collection, and terminal-native analysis through Codex CLI or Claude CLI.

Scheduled RCA

Recurring checks for risky windows

OpsDiag can run RCA around planned releases, high-risk services, recurring alert windows, and provider-change windows, so the evidence thread already exists when responders need it.
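As a rough illustration of scheduling around a planned release, the sketch below computes evenly spaced check times before and after a release timestamp. The function `check_times`, its parameters, and the chosen intervals are hypothetical; how OpsDiag actually configures schedules is not described here.

```python
from datetime import datetime, timedelta
from typing import List


def check_times(
    release_at: datetime,
    before: timedelta,
    after: timedelta,
    every: timedelta,
) -> List[datetime]:
    """Return check times from release_at - before through release_at + after."""
    times: List[datetime] = []
    current = release_at - before
    end = release_at + after
    while current <= end:
        times.append(current)
        current += every
    return times


# Example: a noon release, checked every 30 minutes from 11:30 to 13:00.
release = datetime(2024, 1, 1, 12, 0)
schedule = check_times(
    release,
    before=timedelta(minutes=30),
    after=timedelta(minutes=60),
    every=timedelta(minutes=30),
)
```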

Chat to RCA

Incident questions become investigations

Responders can ask what changed, what is affected, which service is suspicious, what evidence is missing, or which rollback is safest, then continue against the same RCA context.

MCP + CLI

Service evidence through MCP, analysis through CLI

MCP Servers connect to logs, metrics, traces, Kubernetes, cloud APIs, deploy history, edge rules, and alerting systems, while Codex CLI or Claude CLI keeps the RCA available from terminal workflows.