
Agent Governance Toolkit: Architecture Deep Dive, Policy Engines, Trust, and SRE for AI Agents

Last week, we introduced the Agent Governance Toolkit on the Microsoft Open Source Blog: an open-source initiative aimed at bringing runtime security to autonomous AI agents. In that announcement, we covered the why: as AI agents begin making independent decisions in production environments, the security measures that have safeguarded our systems for years must follow them into this new realm.

In this article, we’ll delve into the how: exploring the architecture, implementation details, and what’s involved in operating governed agents in a production setting.

If you’re responsible for managing production infrastructure, you’re likely familiar with the key principles: least privilege, mandatory access controls, process isolation, audit logging, and circuit breakers to prevent cascading failures. These methods have successfully protected production systems for many years.

Now, picture a fresh category of workload entering your infrastructure: AI agents operating autonomously, executing code, calling APIs, accessing databases, and even spawning subprocesses. They determine their actions, select tools, and operate in loops—often without the security protocols you would expect for any production workload.

This gap in security is exactly why we developed the Agent Governance Toolkit: it brings established security principles from operating systems, service meshes, and Site Reliability Engineering (SRE) into the rapidly evolving space of autonomous AI agents.

To give you a clearer picture: many AI agent frameworks today resemble a system where every process runs as root, with no access controls, isolation, or audit trail. The Agent Governance Toolkit plays the role of the kernel, the service mesh, and the SRE platform for AI agents.

When an agent makes a call, like executing `DELETE FROM users WHERE created_at < NOW()`, there’s often no policy layer in place to verify if that action is permitted. There’s also typically no identity check when one agent communicates with another. Additionally, an agent can make thousands of API calls in a minute without any resource limits. Plus, there’s no circuit breaker to manage cascading failures when incidents occur.

In December 2025, OWASP published the Agentic AI Top 10: the inaugural formal classification of risks associated with autonomous AI agents. This list resembles a security engineer’s worst nightmare: goal hijacking, tool misuse, identity abuse, memory poisoning, cascading failures, rogue agents, and much more.

If you’ve ever fortified a production server, these risks will resonate with you, and the urgency to address them is real. The Agent Governance Toolkit is created with the goal of mitigating all ten of these risks through effective policy enforcement, cryptographic identity, execution isolation, and practical reliability engineering patterns.

Note: The OWASP Agentic Security Initiative has subsequently adopted the ASI 2026 taxonomy (ASI01–ASI10). The toolkit’s copilot-governance package now incorporates these identifiers while maintaining backward compatibility for the original AT numbering.

The toolkit is organised as a v3.0.0 Public Preview monorepo comprising nine independent installable packages:

| Package | Description |
| --- | --- |
| Agent OS | A stateless policy engine that intercepts agent actions before execution, using configurable pattern matching and semantic intent classification. |
| Agent Mesh | Cryptographic identity (DIDs with Ed25519), the Inter-Agent Trust Protocol (IATP), and secure communication between agents. |
| Agent Hypervisor | Execution rings inspired by CPU privilege levels, saga orchestration for multi-step transactions, and shared session management. |
| Agent Runtime | Runtime supervision with kill switches, dynamic resource allocation, and execution lifecycle management. |
| Agent SRE | SLOs, error budgets, circuit breakers, chaos engineering, and progressive delivery: production reliability practices applied to AI agents. |
| Agent Compliance | Automated governance verification with compliance grading and mapping against regulatory frameworks such as the EU AI Act, NIST AI RMF, HIPAA, and SOC 2. |
| Agent Lightning | Governance for reinforcement learning training, with policy-enforced runners and reward shaping. |
| Agent Marketplace | Plugin lifecycle management with Ed25519 signing, trust-tiered capability gating, and Software Bill of Materials (SBOM) generation. |
| Integrations | 20+ framework adapters for LangChain, CrewAI, AutoGen, Semantic Kernel, Google ADK, Microsoft Agent Framework, OpenAI Agents SDK, and more. |

Agent OS intercepts calls made by agent tools before they’re executed:

from agent_os import StatelessKernel, ExecutionContext, Policy

kernel = StatelessKernel()
ctx = ExecutionContext(
    agent_id="analyst-1",
    policies=[
        Policy.read_only(),                    # No write operations
        Policy.rate_limit(100, "1m"),          # Max 100 calls/minute
        Policy.require_approval(
            actions=["delete_*", "write_production_*"],
            min_approvals=2,
            approval_timeout_minutes=30,
        ),
    ],
)

result = await kernel.execute(
    action="delete_user_record",
    params={"user_id": 12345},
    context=ctx,
)

The policy engine works through two layers: configurable pattern matching (with example rule sets for SQL injection, privilege escalation, and prompt injection that users can adapt for their situations) and a semantic intent classifier that detects harmful intentions, irrespective of phrasing. When an action is deemed `DESTRUCTIVE_DATA`, `DATA_EXFILTRATION`, or `PRIVILEGE_ESCALATION`, the engine can block it, direct it for human approval, or lower the agent’s trust level based on the configured policy.
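To make the two layers concrete, here is a minimal sketch of the dispatch logic, with a regex pattern layer and a keyword heuristic standing in for the semantic classifier. The pattern list, `classify_intent`, and `Verdict` names are illustrative, not the toolkit's API; a real deployment would load patterns from configuration and use a trained classifier.

```python
import re
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # route to human approval

# Layer 1: configurable regex patterns (normally loaded from YAML).
BLOCK_PATTERNS = [
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"\bDELETE\s+FROM\b.*\bWHERE\b", re.IGNORECASE),
]

# Layer 2: stand-in for the semantic intent classifier. A real
# classifier uses a model; this keyword heuristic only illustrates
# the interface (action text in, intent label out).
def classify_intent(action: str) -> str:
    lowered = action.lower()
    if "delete" in lowered or "drop" in lowered:
        return "DESTRUCTIVE_DATA"
    if "export" in lowered:
        return "DATA_EXFILTRATION"
    return "BENIGN"

def evaluate(action: str) -> Verdict:
    # Pattern layer fires first: a hard match blocks outright.
    if any(p.search(action) for p in BLOCK_PATTERNS):
        return Verdict.BLOCK
    # Intent layer: risky intents are escalated for human approval.
    if classify_intent(action) in {"DESTRUCTIVE_DATA", "DATA_EXFILTRATION"}:
        return Verdict.ESCALATE
    return Verdict.ALLOW
```

The point of the two layers is that the pattern match catches known-bad phrasings cheaply, while the intent classifier catches the same goal expressed differently.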

Important: All policy rules, detection patterns, and sensitivity thresholds are externalised in YAML configurations. The toolkit comes equipped with sample configurations in `examples/policies/` that should be reviewed and tailored before going into production. No built-in rules should be seen as exhaustive. Supported policy languages: YAML, OPA Rego, and Cedar.

The kernel is designed to be stateless, meaning each request carries its own context. This allows you to deploy it behind a load balancer, as a sidecar container in Kubernetes, or within a serverless function, without the need to manage shared state. It integrates effortlessly into existing deployment models on AKS or any Kubernetes cluster, with Helm charts available for agent-os, agent-mesh, and agent-sre.

In service mesh architectures, services authenticate their identity using mTLS certificates before interacting. AgentMesh effectively applies this concept to AI agents through decentralised identifiers (DIDs) with Ed25519 cryptography and the Inter-Agent Trust Protocol (IATP):

from agentmesh import AgentIdentity, TrustBridge

identity = AgentIdentity.create(
    name="data-analyst",
    sponsor="[email protected]",          # Human accountability
    capabilities=["read:data", "write:reports"],
)

# identity.did -> "did:mesh:data-analyst:a7f3b2…"

bridge = TrustBridge()
verification = await bridge.verify_peer(
    peer_id="did:mesh:other-agent",
    required_trust_score=700,  # Must score >= 700/1000
)

A crucial aspect is trust decay: an agent's trust score diminishes over time without positive reinforcement. An agent that was trusted last week but has been silent since will gradually lose that trust, reflecting that trust is an ongoing signal, not a one-off grant.
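Trust decay can be modelled as simple exponential decay. The function and the half-life constant below are illustrative assumptions, not the toolkit's actual decay schedule:

```python
def decayed_trust(score: float, days_idle: float,
                  half_life_days: float = 30.0) -> float:
    # With no positive signals, trust halves every `half_life_days`.
    # The 30-day half-life is an assumed constant for illustration.
    return score * 0.5 ** (days_idle / half_life_days)
```

Under this model an agent at score 800 that stays silent for a full half-life drops to 400, crossing from Supervisor-level to User-level trust.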

Delegation chains enforce scope narrowing: a parent agent with read+write access can only delegate read permissions to a child agent, thus preventing escalation.
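A minimal sketch of scope narrowing, assuming capabilities are plain string sets; the `delegate` helper is hypothetical, not the mesh's actual API:

```python
def delegate(parent_caps: set[str], requested: set[str]) -> set[str]:
    """Grant a child agent only scopes the parent itself holds.
    A request for any scope beyond the parent's is refused outright
    (hypothetical helper illustrating the narrowing rule)."""
    escalation = requested - parent_caps
    if escalation:
        raise PermissionError(f"scope escalation refused: {sorted(escalation)}")
    return requested & parent_caps
```

Because the grant is always a subset of the parent's scope, a chain of delegations can only narrow permissions, never widen them.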

CPU architectures implement privilege rings (Ring 0 for kernel access, Ring 3 for user space) to segregate workloads. The Agent Hypervisor mirrors this model for AI agents:

| Ring | Trust Level | Capabilities |
| --- | --- | --- |
| Ring 0 (Kernel) | Score ≥ 900 | Complete system access; can modify policies. |
| Ring 1 (Supervisor) | Score ≥ 700 | Cross-agent coordination with elevated tool access. |
| Ring 2 (User) | Score ≥ 400 | Standard tool access, limited to assigned scope. |
| Ring 3 (Untrusted) | Score < 400 | Read-only, sandboxed execution only. |

New and untrusted agents start at Ring 3 and must earn their way up, embodying the principle of least privilege that production engineers apply to other workloads.

Each ring enforces per-agent resource limits such as maximum execution time, memory caps, CPU throttling, and request rate limits. If a Ring 2 agent attempts a Ring 1 operation, it gets blocked, just as a user-space process trying to access kernel memory would.

These definitions and associated trust score thresholds can be customised via policy. Organisations can establish their own ring structures, modify the number of rings, set distinct trust score thresholds for transitions, and define resource limits to align with their security needs.
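A score-to-ring mapping with configurable thresholds might look like this sketch. The threshold values mirror the defaults in the table above, but the function and constant names are illustrative, not the hypervisor's API:

```python
# (min_score, ring) pairs, highest privilege first; organisations
# would override this table in policy configuration.
RING_THRESHOLDS = [(900, 0), (700, 1), (400, 2)]

def assign_ring(trust_score: int, thresholds=RING_THRESHOLDS) -> int:
    # Walk from most to least privileged; first threshold met wins.
    for min_score, ring in thresholds:
        if trust_score >= min_score:
            return ring
    return 3  # untrusted: read-only, sandboxed execution
```

Because the table is data rather than code, adding a ring or shifting a threshold is a configuration change, not a redeploy.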

The hypervisor also features saga orchestration for multi-step operations. If an agent carries out a sequence of actions, like drafting an email → sending → updating CRM, and the last step fails, it initiates compensatory actions in reverse. This approach, borrowed from distributed transaction methods, guarantees that multi-agent workflows remain consistent, even if individual steps falter.
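The compensation pattern can be sketched in a few lines; `run_saga` is an illustrative helper, not the hypervisor's actual interface. Each step pairs a forward action with its undo, and on failure the undos of completed steps run in reverse order:

```python
def run_saga(steps):
    """Each step is a (do, undo) pair. On failure, compensate the
    completed steps in reverse order, then re-raise."""
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()
        raise

log = []
def fail():
    raise RuntimeError("CRM update failed")

try:
    run_saga([
        (lambda: log.append("draft"), lambda: log.append("undo draft")),
        (lambda: log.append("send"),  lambda: log.append("recall send")),
        (fail,                        lambda: None),
    ])
except RuntimeError:
    pass
# log: ["draft", "send", "recall send", "undo draft"]
```

The reverse order matters: later steps often depend on earlier ones, so their effects must be unwound first.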

If you’re involved in Site Reliability Engineering (SRE), you assess services based on SLOs and mitigate risks through error budgets. Agent SRE extends this to the management of AI agents:

When an agent’s safety SLI drops below 99%, meaning that more than 1% of its actions contravene policy, the system will automatically restrict the agent’s capabilities until it recovers. This mirrors the error-budget model SRE teams employ for production services, applied specifically to agent behaviours.

We ship nine chaos engineering fault-injection templates, covering network delays, LLM provider failures, tool timeouts, trust score manipulation, memory corruption, and concurrent-access races. After all, the only way to test your agent system’s resilience is to intentionally disrupt it.
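A fault-injection wrapper in the spirit of those templates might look like this sketch; the wrapper and its parameters are illustrative, not the toolkit's template format:

```python
import random
import time

def with_fault_injection(tool, delay_s=0.0, failure_rate=0.0, rng=None):
    """Wrap a tool call with injected latency and random failures.
    A seeded `rng` makes chaos experiments reproducible."""
    rng = rng or random.Random()
    def wrapped(*args, **kwargs):
        if delay_s:
            time.sleep(delay_s)  # simulated network delay
        if rng.random() < failure_rate:
            raise TimeoutError("injected tool timeout")
        return tool(*args, **kwargs)
    return wrapped
```

Wrapping a tool with `failure_rate=0.05` lets you verify that retry logic, circuit breakers, and saga compensation actually fire before a real provider outage forces the question.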

Agent SRE integrates well with your current observability setup via adapters for Datadog, PagerDuty, Prometheus, OpenTelemetry, Langfuse, LangSmith, Arize, MLflow, and more. Message broker adapters are also available for Kafka, Redis, NATS, Azure Service Bus, AWS SQS, and RabbitMQ.

If your organisation aligns with CIS Benchmarks, NIST AI RMF, or any compliance frameworks, the OWASP Agentic Top 10 serves as the equivalent standard for AI agent workloads. The toolkit’s agent-compliance package offers automated governance grading based on these frameworks.

The toolkit works across various frameworks and comes with 20+ adapters, making it simple to implement governance within an existing agent—typically requiring only a few lines of configuration rather than a complete rewrite.

It also exports metrics to any platform compatible with OpenTelemetry, Prometheus, Grafana, Datadog, Arize, or Langfuse. If you have an observability stack in place for your infrastructure, metrics from agent governance will flow seamlessly through the same pipeline.

Key metrics encompass: policy decisions per second, trust score distributions, ring transitions, SLO burn rates, the state of circuit breakers, and governance workflow latency.

# Install all packages
pip install agent-governance-toolkit[full]

# Or individual packages
pip install agent-os-kernel agent-mesh agent-sre

The toolkit is available across various language environments, including Python, TypeScript (`@microsoft/agentmesh-sdk` on npm), Rust, Go, and .NET (`Microsoft.AgentGovernance` on NuGet).

While the toolkit is designed to be platform-agnostic, we’ve included integrations to make the path to production as swift as possible on Azure:

Azure Kubernetes Service (AKS): You can deploy the policy engine as a sidecar container alongside your agents. Helm charts provide production-ready manifests for agent-os, agent-mesh, and agent-sre.

Azure AI Foundry Agent Service: Utilise the built-in middleware integration for agents deployed via Azure AI Foundry.

OpenClaw Sidecar: An appealing deployment scenario involves running OpenClaw, the open-source autonomous agent, within a container, with the Agent Governance Toolkit as a sidecar. This configuration enables policy enforcement, identity verification, and SLO monitoring for OpenClaw’s autonomous operations. In Azure Kubernetes Service (AKS), the deployment can be a standard pod featuring two containers: OpenClaw as the main workload and the governance toolkit as the sidecar, both communicating over localhost. We also have a reference architecture and Helm chart available in our repository.

This sidecar pattern can easily be used with any containerised agent, but OpenClaw is a particularly interesting case due to the focus on autonomous agent safety.

34+ step-by-step tutorials covering policy engines, trust, compliance, MCP security, observability, and cross-platform SDK usage can be found in our repository.

git clone https://github.com/microsoft/agent-governance-toolkit
cd agent-governance-toolkit
pip install -e "packages/agent-os[dev]" -e "packages/agent-mesh[dev]" -e "packages/agent-sre[dev]"

# Run the demo
python -m agent_os.demo

AI agents are gradually becoming autonomous decision makers within production environments, executing code, managing databases, and orchestrating services. The security practices that have safeguarded production systems for years—like least privilege, mandatory access controls, process isolation, and audit logging—are precisely what these new workloads require. We’ve developed these features, and they’re available as open source.

We’re committed to developing this in an open manner because ensuring agent security is too vital for any single organisation to tackle alone:

  • Security research: Adversarial testing, red-team findings, and vulnerability reports will enhance the toolkit for the entire community.
  • Community contributions: Framework adapters, detection rules, and compliance mappings from the community will expand coverage across various ecosystems.

We’re dedicated to open governance. The project is released under the Microsoft GitHub organisation today, and we aim to transition it to a foundation home, such as the AI and Data Foundation (AAIF), where it can thrive under cross-industry stewardship. We’re actively discussing this pathway with foundation partners.

The Agent Governance Toolkit is open source, licensed under MIT. Contributions are encouraged at github.com/microsoft/agent-governance-toolkit.
