Public Preview Update: Azure Copilot Observability Agent

Today’s cloud applications produce vast amounts of telemetry, including metrics, logs, traces, alerts, and various platform signals.

Whether you’re exploring your observability data or responding to an unexpected issue, surfacing insights and identifying root causes requires a solid understanding of the application, the signals it emits, and the available tools. That knowledge is critical, especially when your business and customers may be affected.

The Observability agent is your dedicated monitoring partner across the entire observability lifecycle. You can interact with it through chat to better understand the observability data at hand. Our goal is to support the full range of activities, from onboarding and detection to triage and root cause analysis, significantly reducing human effort and minimising downtime for customers.

Today, the agent supports core investigation and exploration scenarios, and we are rapidly expanding its capabilities to more workflows and entry points.

Conducting In-Depth Investigations

In-depth investigations are designed for situations where something has gone wrong: they aim to explain what happened and what to do next.

The Observability agent is optimised for real-world, full-stack investigations in distributed systems, including those built on Azure Kubernetes Service (AKS) and Virtual Machines (VMs). To pinpoint the root cause, the agent applies advanced reasoning, combining Machine Learning (ML) and Large Language Models (LLMs) to detect and correlate anomalies across large volumes of signals from the application, infrastructure, and Azure platform layers. This helps it converge on likely root-cause candidates in scenarios such as:

  • Application issues, such as deployment and performance regressions, request or dependency failures, resource exhaustion, and configuration errors
  • Infrastructure issues, such as compute saturation, disk I/O throttling, misconfigured dependencies, and network connectivity problems in AKS clusters and VMs
  • Platform incidents, such as Azure maintenance events, outages, and managed-infrastructure issues like SNAT port exhaustion or upgrade blockers

The best way to start a deep investigation is from an Azure Monitor alert, either in the Azure portal or from an alert notification. You can also start investigations from other entry points, such as the agent chat, Logs, or Activity logs, with more entry points to be added over time.

During a deep investigation, the agent produces a report containing its analysis, the root cause, and recommended next steps, along with the key signals and supporting data. It also exposes its reasoning, showing the data it accessed and the queries it ran.

You can keep chatting with the agent within the context of the investigation to dig deeper or to steer it toward additional hypotheses, for example:

  • What changes occurred just before the incident began?
  • Are there any issues in the VM that could be related? If so, run a deep investigation that includes this VM.
  • Which dependencies are most closely associated with this spike in failures?
  • Are there related alerts or configuration changes that can shed light on this behaviour?

You can save investigation results as an Azure Monitor Issue, which preserves the full investigation context so that others can collaborate and pick up where you left off.

Data Exploration and Analytics

The Observability agent also supports ad hoc data exploration and analytics, helping you build understanding and form hypotheses without needing an alert or a full investigation.

To begin, select the “Observability Agent” button in the Logs blade or another supported entry point. From there, you can explore observability data such as logs and metrics using natural-language prompts, for example:

  • Show the top errors from the last hour.
  • Is there a connection between application errors and dependency errors?
  • Plot the trend of application errors alongside storage-related errors.
  • What operations in my app are affected by the ongoing authentication problem?
  • Identify latency spikes in my app over the past three days and their sources (specific users or regions).

If a query or its results are already open in the Logs blade, the agent recognises this automatically, so you can ask it to explain the results, help you modify the query, or suggest optimisations.

When exploration uncovers a broader or more complex issue, operators can launch a deep investigation directly from the exploration context and save the results as an Issue.

Looking Ahead

We are continually extending the Observability agent to cover more of the observability lifecycle, moving from reactive investigations toward proactive, continuous understanding of your systems:

  • Greater integration across Azure Monitor experiences
    Expanding beyond alerts to incorporate additional entry points and workflows across the platform.
  • Autonomous observability
    When signals indicate developing or ongoing incidents, the agent can proactively correlate alerts, carry out investigations, and automatically generate Azure Monitor Issues—lessening the need for manual triage.
  • Connection with external systems
    Extending the context of investigations beyond Azure Monitor, facilitating the flow of insights and conclusions into existing engineering workflows.

Stay Informed

  • Follow this blog for ongoing deep dives, updates on current capabilities, and a look at what’s coming next.
  • Live Webinar
    A walkthrough of real Observability agent scenarios, best practices, and current capabilities, plus a preview of upcoming features and a live Q&A with the product team. Register here

We Value Your Feedback

The Observability agent is evolving based on real-world experience and operator input. Share your thoughts directly through the Give Feedback option, or contact us at: [email protected]

Discover more from Qureshi

Subscribe to get the latest posts sent to your email.