Observability in the AI Era

What does it mean to observe a system that can fail without ever throwing an error? For more than a decade, observability tools have answered a narrower question: when something in production breaks, what broke and why? The signals they were designed to capture (errors, latencies, traces of failed requests) assumed that software fails in predictable, syntactic ways. AI systems do not always offer that courtesy.

The observability tools that gave engineers visibility into distributed software are now being asked to make sense of systems they were never designed for. Observability companies built their businesses around a familiar architecture with three categories of telemetry data: logs, metrics, and traces. These categories were captured by separate tools, stored in separate systems, and stitched together when something broke. For cloud-native software written and operated by humans, the model worked. It scaled and generated tens of billions of dollars in enterprise spend.

AI workloads strain that model in ways the original architecture wasn’t designed to handle. LLM-based failure modes are becoming harder to detect with traditional signals, and engineers are increasingly using these tools alongside their own AI assistants. At a recent observability summit, attendees noted that telemetry volumes from AI applications are running significantly higher than those from traditional apps, driven predominantly by traces and spans.¹,² We have spent the past few months speaking with operators, founders, and incumbents, and what follows is our assessment of where the market stands today and where we see it going.

Part 1: Why Observability Exists

Observability emerged as a distinct discipline in the early 2010s, as companies shifted from monolithic applications to microservices on cloud infrastructure. A single user request could now touch dozens of services across multiple cloud providers, and traditional monitoring, which tracked a small set of predefined metrics on a small number of servers, was no longer sufficient.

The industry converged on three data types. Logs captured discrete events with contextual detail. Metrics aggregated system behavior into numerical measurements over time. Traces followed individual requests across services, pinpointing where in a distributed chain something had gone wrong. Together, these “three pillars” gave engineering teams a shared framework for systematically asking questions about complex systems rather than reacting to alerts on static dashboards. Vendors built businesses around each pillar, and some built across all three.

Even before AI workloads arrived, the observability model was already under strain. The pillars were generated, processed, and stored separately, so engineers investigating an incident had to switch between interfaces and manually reconstruct context. Engineering teams spend 33% of their time on interruptions and outages, and the median company loses $76 million annually from high-impact downtime.³

Cost compounded the problem. Legacy vendors priced on data ingestion and storage, so bills climbed as microservices adoption increased and telemetry volumes grew exponentially. A tool that was reasonably priced for 50 services became unsustainable for 500. Observability now consumes roughly 15-25% of cloud spend at many organizations.⁴ The underlying data stores were built for specific query patterns and data types, which made high-cardinality analysis (e.g., filtering by individual user IDs across millions of concurrent requests) technically possible, but slow and expensive.
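To make the high-cardinality point concrete, here is a minimal sketch of how the number of distinct time series for a single metric grows as labels are added. It assumes a label-indexed metrics store in which each unique combination of label values becomes its own series. The label names and counts are hypothetical, chosen only for illustration, and the result is an upper bound because not every combination occurs in practice.

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on distinct time series for one metric.

    Each label multiplies the count by the number of distinct
    values it can take.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets; none of these numbers come from the article.
base = {"service": 50, "endpoint": 20, "status_code": 5}
more_services = {**base, "service": 500}
with_user_id = {**base, "user_id": 100_000}

print(series_count(base))           # 5000
print(series_count(more_services))  # 50000
print(series_count(with_user_id))   # 500000000
```

Growing from 50 to 500 services multiplies the series count by ten, while adding one per-user label multiplies it by the number of users, which is why per-user filtering is the expensive case.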

Part 2: How AI is Shifting Priorities

The move away from a strict three-pillar architecture predates the AI wave. Companies like Honeycomb and Chronosphere built on a different premise: logs, metrics, and traces are different views of the same underlying event stream, rather than separate data types. Build a store that can ingest and query raw, high-dimensional events at scale, and the three pillars can be derived from a single source.
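The single-source premise can be sketched in a few lines. The example below is an illustration under our own assumptions, not any vendor's actual data model: the `Event` fields and the three view functions are hypothetical names. It shows one list of wide events being projected into a log view, a metric view, and a trace view.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One wide event per unit of work (hypothetical schema)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    start_ms: int
    duration_ms: int
    status: int
    user_id: str


EVENTS = [
    Event("t1", "a", None, "gateway", "GET /cart", 0, 120, 200, "u42"),
    Event("t1", "b", "a", "cart", "load_cart", 10, 90, 200, "u42"),
    Event("t1", "c", "b", "db", "SELECT items", 20, 70, 200, "u42"),
    Event("t2", "d", None, "gateway", "GET /cart", 500, 40, 500, "u7"),
]


def as_logs(events):
    """Log view: one text line per event, ordered by start time."""
    ordered = sorted(events, key=lambda e: e.start_ms)
    return [
        f"{e.start_ms} {e.service} {e.name} status={e.status} user={e.user_id}"
        for e in ordered
    ]


def as_metrics(events):
    """Metric view: request and error counts aggregated per service."""
    out = defaultdict(lambda: {"count": 0, "errors": 0})
    for e in events:
        out[e.service]["count"] += 1
        if e.status >= 500:
            out[e.service]["errors"] += 1
    return dict(out)


def as_trace(events, trace_id):
    """Trace view: spans of one request, parents before children, with depth."""
    children = defaultdict(list)
    for e in events:
        if e.trace_id == trace_id:
            children[e.parent_id].append(e)
    ordered = []

    def walk(parent_id, depth):
        for e in sorted(children[parent_id], key=lambda e: e.start_ms):
            ordered.append((depth, e.service, e.name, e.duration_ms))
            walk(e.span_id, depth + 1)

    walk(None, 0)
    return ordered


print(as_logs(EVENTS)[0])
print(as_metrics(EVENTS)["gateway"])
print(as_trace(EVENTS, "t1"))
```

Because every view is computed from the same records, a field such as `user_id` is available to all three without being instrumented three times.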

OpenTelemetry has accelerated that shift by standardizing the instrumentation layer. Historically, each vendor controlled its own agent, and switching required months-long re-instrumentation projects. OpenTelemetry, now the second most active open-source project in the Cloud Native Computing Foundation behind Kubernetes, established a vendor-neutral standard for how telemetry is generated, collected, and exported.⁵ As collection becomes standardized, differentiation shifts to the analytics layer and the query engine. AI is sharpening that trend in three ways.

The first is a shift in how systems fail. Traditional software fails syntactically: a service returns a 500 error, a query times out, or a network call fails. Traditional observability tools were designed to capture those signals. By contrast, AI systems fail semantically. A model can return a plausible-but-wrong answer without triggering any alert. An agent can take the wrong path through a multi-step workflow without producing an error code. A retrieval system can surface irrelevant context that quietly degrades quality. Catching these failures requires new data types that capture prompt inputs and outputs, token usage, hallucination rates, and context propagation across agent steps, as well as query patterns that identify which classes of inputs produce poor outputs rather than which service errored. Which signals matter most is also shifting in real time. At an observability summit, users cited traces as the most informative source for monitoring AI agents, as they capture the reasoning behind what is happening.⁶
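A small sketch shows why status codes miss this kind of failure. It assumes each LLM call is recorded with its prompt, output, token usage, and a quality score from some evaluator. The `LLMCall` fields, the `input_class` labels, and the scores are all hypothetical and are not drawn from any specific product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LLMCall:
    """One recorded model call (hypothetical schema)."""
    trace_id: str
    input_class: str        # assumed label, e.g. from a classifier
    prompt: str
    output: str
    prompt_tokens: int
    completion_tokens: int
    http_status: int
    eval_score: float       # 0.0 to 1.0, from an assumed evaluator


CALLS = [
    LLMCall("t1", "refund_policy", "Can I return shoes?", "Yes, within 30 days.", 120, 40, 200, 0.9),
    LLMCall("t2", "refund_policy", "Refund for a gift?", "Yes, as store credit.", 110, 35, 200, 0.8),
    LLMCall("t3", "tax_question", "VAT on exports?", "Exports are taxed at 40%.", 200, 80, 200, 0.2),
    LLMCall("t4", "tax_question", "Is shipping taxed?", "Shipping is never taxed.", 210, 90, 200, 0.4),
]


def syntactic_errors(calls):
    """The traditional signal: calls that returned a server error."""
    return [c for c in calls if c.http_status >= 500]


def failing_classes(calls, threshold=0.5):
    """The semantic signal: input classes whose mean score is below threshold."""
    scores = defaultdict(list)
    for c in calls:
        scores[c.input_class].append(c.eval_score)
    return sorted(k for k, v in scores.items() if sum(v) / len(v) < threshold)


def total_tokens(calls):
    """Token usage across all calls, a cost signal recorded alongside quality."""
    return sum(c.prompt_tokens + c.completion_tokens for c in calls)


print(syntactic_errors(CALLS))  # []
print(failing_classes(CALLS))   # ['tax_question']
print(total_tokens(CALLS))      # 885
```

Every call here returns a 200, so an error-rate dashboard stays green, while grouping by input class surfaces the category of questions the model answers poorly.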

The second is the collapse of the line between building and operating software. GitHub Copilot, Cursor, and Claude Code are accelerating the software development lifecycle, and more code reaching production faster means more failure modes. Engineers want observability context while writing code, not only after something breaks.

The third is that AI is becoming the observability tool itself. The direction vendors are moving toward, described as autonomous observability or “AI SREs,” is a system that detects anomalies, diagnoses root causes, and triggers remediation with less human involvement. The starting point is low, with only 13% of engineering teams reporting confidence in their ability to observe and debug production AI workflows today.⁷ That gap has produced a wave of purpose-built entrants and pushed incumbents to respond.

Part 3: What We Are Seeing in the Market

The pace and depth of adaptation vary. Public incumbents are shipping AI features on top of existing architectures. A handful of private players are pursuing more structural changes. And a new class of purpose-built entrants is defining the LLM observability and evaluation category.

Among public incumbents, Datadog has been at the forefront of shipping AI-related products: an LLM Observability module covering token usage, latency, and output quality; Bits AI SRE for automated incident investigation; and an MCP server that brings observability context into AI coding tools. Dynatrace launched Dynatrace Intelligence, an agentic operations layer that combines its AI engine with autonomous remediation, alongside domain-specific agents for SRE, security, and DevOps workflows, and a remote MCP server. Elastic shipped MCP Apps that embed fully interactive observability and security workflows directly inside tools like Claude and VS Code. Splunk (part of Cisco) has released Splunk AI Assistant 2.0, a digital teammate that helps users analyze data and solve problems.

Among private players, New Relic has released several modules, including an Agentic AI Monitoring product for multi-agent systems, an MCP server, and Logs Intelligence. The company’s 2026 AI Impact Report found that teams using its AI features resolved issues 25% faster and shipped at an 80% higher frequency than non-AI users.⁸ Honeycomb’s recent releases lean on its event-based architecture: Canvas for conversational querying and Honeycomb MCP, with capabilities including Agent Skills, Automated Investigations, and Pipeline Intelligence. Honeycomb acquired Grit in 2025 as a strategic investment in AI agents. Coralogix has also invested in its AI footprint: it acquired Aporia in late 2024 and later launched the Coralogix AI Center and its AI observability agent, Olly. Chronosphere was acquired by Palo Alto Networks in November 2025, but had previously announced AI-guided troubleshooting and an MCP integration. Grafana’s open-source LGTM stack remains broadly deployed, with its AI work focused on intelligent alerting, anomaly detection, and natural-language querying within the existing dashboard framework.

Among purpose-built AI observability entrants, Arize AI offers a dual-track platform: Arize AX for enterprise production monitoring and Phoenix, its open-source LLM observability tool, which has surpassed two million monthly downloads. Braintrust built its own database to handle the scale and nesting of agent traces, which can reach hundreds of megabytes per interaction. The company believes the new pillars of AI observability are traces, evals, and annotation. Selector, an AI-powered network observability platform that AVP invested in earlier this year, fits within the same broader shift toward AI-native analytics layers. Others active in the space include LangSmith, Latitude, HoneyHive, Laminar, Confident AI, Maxim, Respan, Fiddler, and many more. M&A in this segment has also picked up this year: Langfuse was acquired by ClickHouse in January 2026, Traceloop by ServiceNow in March 2026 (now embedded in ServiceNow’s AI Control Tower), Helicone by Mintlify in March 2026, and Galileo by Cisco in April 2026.

Closing

Observability was built to answer a question software teams could not previously ask: why a system behaved the way it did. AI does not change that question. It makes it harder to answer, more important to get right, and possible to approach in ways the original tooling was not designed for. Public companies are layering AI capabilities onto existing platforms; private players are pursuing more structural changes; and a purpose-built cohort is defining the AI and agentic observability category from scratch. If you are building in this space, we’d love to talk – please email us at olivia.tanzman@avpcap.com and lizzy.mitchell@avpcap.com.