GitHub Copilot App – Canvas Is Not a UI Builder
Imagine if your development environment did more than just help you write code; what if it also enabled you to observe, steer, and evolve a living system as it operates? This is the transformative experience that the GitHub Copilot App Canvas offers. Canvas reshapes the way developers engage with agent-driven software—not by creating classic user interfaces, but by crafting interactive spaces where humans and AI collaborate, test, and iterate in real-time.
In this article, we’ll explore a practical example of a Canvas extension we developed, known as the Multi-Agent Dev Canvas. This extension illustrates how Canvas acts as both an observability and control plane during the runtime of an agent-driven system. We’ll explain the purpose of Canvas, how it sets itself apart from traditional UI development, and how it can accelerate the design-test-evolve cycle for any multi-agent application.
When many developers first encounter Canvas, their instinct is to view it as a UI framework—a place to create dashboards, boards, or user-centric applications. However, that’s not its primary purpose.
Here’s the key distinction:
- Traditional UIs focus on using software, catering to end-users who interact with a finished product.
- Canvas prioritises shaping software while it’s in operation, designed for developers and AI agents who are actively involved in building, testing, and refining a system.
Canvas tackles challenges that your final UI should never display. It serves as the observability layer, control plane, and validation surface—providing essential tools during development that vanish before production. Think of it this way: you wouldn’t allow users to see your debugger, but you’d definitely need it while building.
To illustrate how Canvas functions in development, we created a Multi-Agent Dev Canvas, a standalone GitHub Copilot Canvas extension (available in this repository: copilot-canvas-runtime). It treats an entire multi-agent system as a dynamic and observable environment. The same principles apply to any agent-driven architecture built on platforms like Microsoft Foundry.
The Multi-Agent Dev Canvas: an observability and control tool where developers and AI agents co-operate to design, test, and enhance an agent-driven system in real-time.
The canvas includes four integrated panels:
Five specialised agents are represented as live cards with real-time status updates. Each card displays the agent’s name, its role, current status (idle, running, done, or error), task count, and the last action taken. When an agent is active, its card pulses blue; if it encounters an error, it glows red—letting you witness the system’s rhythm.
- decompose_system: Splits requirements into agent tasks
- execute_workflow: Coordinates agents to carry out tasks
- validate_output: Runs evaluation tests and returns structured results
- update_system_design: Revises the architecture based on feedback
- track_state: Keeps track of and updates system state over time
Below the agents, a flow graph visualises how tasks move between them. For example, when you break down a system requirement like “Create an AI-driven code review agent,” the canvas displays five components (pr-ingestion, code-analysis, feedback-generator, learning-loop, notification-service) transferring from the decomposer to the executor and designer agents. Each flow is accompanied by a status badge—pending, pass, or fail.
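The records behind such a flow graph could be modelled as plain objects. The following is a minimal sketch, assuming a hypothetical `decompose` helper and illustrative field names (`from`, `to`, `status`) that are not taken from the extension; for simplicity it routes every component from the decomposer to the executor only.

```javascript
// Hypothetical helper: turn a requirement and its components into task-flow
// records. Field names here are illustrative, not taken from the extension.
function decompose(requirements, components) {
  return components.map((component, index) => ({
    id: `task-${index + 1}`,
    component,
    requirements,
    from: "decomposer", // agent id as it appears in createInitialState()
    to: "executor",
    status: "pending", // would later become "pass" or "fail"
  }));
}

const flows = decompose("Create an AI-driven code review agent", [
  "pr-ingestion",
  "code-analysis",
  "feedback-generator",
  "learning-loop",
  "notification-service",
]);

console.log(flows.map((flow) => `${flow.component}: ${flow.status}`));
```

Keeping each flow as a flat record makes it straightforward to render one status badge per entry.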
The validation panel presents structured test results alongside pass/fail badges and reasoning. When you run a validation, each test case evaluates against predetermined criteria:
- “PR ingestion manages large diffs” — Criteria met: processes diffs over 5,000 lines without timing out
- “Feedback is actionable” — Failed: does not fulfill the requirement that each suggestion must include a code fix
- “Learning loop converges” — Criteria met: acceptance rate improves over 10 iterations
- “Notifications are non-blocking” — Criteria met: delivery latency under 500ms
This isn’t a separate test runner; it’s a validation surface embedded in the development loop. You witness failures immediately as they occur, in context, alongside the agents and flows responsible for them.
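To make the idea of structured results concrete, here is a minimal sketch of how a validation run might produce pass/fail entries with reasoning. The `runValidation` function, the test-case shape (`name`, `criteria`, `check`), and the sample output values are all assumptions made for illustration, not the extension's actual code.

```javascript
// Hypothetical shape for a validation run: each test case carries a name,
// a human-readable criterion, and a check function that returns true/false.
function runValidation(testCases, output) {
  return testCases.map(({ name, criteria, check }) => {
    const passed = Boolean(check(output));
    return {
      name,
      status: passed ? "pass" : "fail",
      reasoning: passed ? `Criteria met: ${criteria}` : `Failed: ${criteria}`,
    };
  });
}

// Illustrative agent output; the values are made up for this demo.
const sampleOutput = {
  maxDiffLines: 6000,
  suggestions: [{ text: "Rename variable", codeFix: null }],
};

const results = runValidation(
  [
    {
      name: "PR ingestion manages large diffs",
      criteria: "processes diffs over 5,000 lines without timing out",
      check: (out) => out.maxDiffLines > 5000,
    },
    {
      name: "Feedback is actionable",
      criteria: "each suggestion must include a code fix",
      check: (out) => out.suggestions.every((s) => s.codeFix !== null),
    },
  ],
  sampleOutput,
);

console.log(results);
```

Returning the reasoning next to the status is what lets a panel show why a test failed, not only that it failed.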
The right panel chronicles every state change with timestamps, documenting decomposition events, workflow executions, validation runs, and failure injections—all displayed chronologically. This acts as the system’s memory, visible to both the human developer and the AI agents collaborating with them.
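A timeline like this can be kept as an append-only list of timestamped entries. The sketch below assumes a hypothetical `recordEvent` helper and illustrative entry fields (`at`, `type`, `detail`); only the `stateHistory` name comes from the state model shown later in the article.

```javascript
// Hypothetical helper: append a timestamped entry to state.stateHistory.
// The entry fields (at, type, detail) are illustrative names.
function recordEvent(state, type, detail, now = new Date()) {
  const entry = { at: now.toISOString(), type, detail };
  // Return a new object so earlier snapshots of the state stay untouched.
  return { ...state, stateHistory: [...state.stateHistory, entry] };
}

let state = { stateHistory: [] };
state = recordEvent(state, "decomposition", "5 components queued", new Date(0));
state = recordEvent(state, "failure_injected", "executor forced into error", new Date(1000));

console.log(state.stateHistory);
```

Passing the clock in as a parameter keeps the helper deterministic, which is convenient when an agent or a test replays events.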
The unique aspect of Canvas as a runtime is that the agent can act through it. The canvas offers seven actions callable by agents:
| Action | Description |
|---|---|
| decompose_system | Accepts requirements and components, generates task flows, updates the system design |
| execute_workflow | Processes pending tasks through the agent pipeline, producing artifacts |
| validate_output | Evaluates test cases against defined criteria, returning structured pass/fail results |
| update_system_design | Changes the architecture description, constraints, or component list in real time |
| track_state | Enables reading the full system state: agents, flow, validations, history, and artifacts |
| inject_failure | Induces an error state in an agent to test system resilience |
| pause_resume | Toggles execution on and off |
The human developer can click Decompose, Execute, or Validate directly within the canvas. Likewise, the AI agent can invoke the same actions programmatically. Both operate on the same surface, state, and system, and this collaborative aspect sets Canvas apart from traditional tools.
To help you understand Canvas, here are some comparisons with familiar tools:
- Figma facilitates Human-to-Human collaboration in design. Multiple users interact with a shared visual surface, but nothing is executed—it’s purely a design tool.
- Traditional UIs are Human-to-System interactions, where users engage with completed software through polished interfaces.
- Canvas embodies Human-to-AI-to-System collaboration. It provides a shared environment where actions take place. The developer guides, the AI performs actions, and the system evolves—visible and real-time throughout.
While Canvas offers cooperative features similar to Figma—it’s a shared visual space—what sets it apart is that its participants include AI agents, and the surface represents a live system rather than mere mockups.
A Canvas extension is built as a standard GitHub Copilot CLI extension: a single extension.mjs file that communicates via JSON-RPC. Key components include:
Each instance of the canvas maintains its own system state: agents, task flows, validations, a history timeline, artifacts, and the current system design. This state is held in memory per instance and pushed to the iframe through Server-Sent Events upon any change.
function createInitialState() {
  return {
    agents: [
      { id: "decomposer", name: "decompose_system",
        status: "idle", responsibility: "Break requirements into agent tasks" },
      { id: "executor", name: "execute_workflow",
        status: "idle", responsibility: "Coordinate agents to perform tasks" },
      // ... three more agents
    ],
    taskFlows: [],
    validations: [],
    stateHistory: [],
    artifacts: [],
    systemDesign: { description: "", constraints: [], components: [] },
    execution: { paused: false, stepCount: 0 },
  };
}
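Holding state per instance might look something like the following sketch. The `instances` map and the `updateState` helper are hypothetical names, the `getState` body is a guess at what such a lookup could do, and the initial state is trimmed to a few fields to keep the example short.

```javascript
// Assumption: one state object per canvas instance, keyed by instance id.
// `instances` and `updateState` are hypothetical names for this sketch.
const instances = new Map();

function getState(instanceId) {
  if (!instances.has(instanceId)) {
    // Trimmed-down initial state; the full shape has more fields.
    instances.set(instanceId, {
      agents: [],
      taskFlows: [],
      execution: { paused: false, stepCount: 0 },
    });
  }
  return instances.get(instanceId);
}

// Apply a change, then notify a listener (for example, an SSE broadcaster).
function updateState(instanceId, change, onChange = () => {}) {
  const next = { ...getState(instanceId), ...change };
  instances.set(instanceId, next);
  onChange(next);
  return next;
}

const seen = [];
updateState("canvas-a", { taskFlows: [{ id: "task-1" }] }, (s) => seen.push(s));
console.log(getState("canvas-a").taskFlows.length, getState("canvas-b").taskFlows.length);
```

Routing every mutation through one function gives a single place to trigger the push to the iframe.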
The canvas operates a loopback HTTP server per instance. The iframe connects to an /events endpoint, receiving state updates as they occur—no polling or complicated websockets needed.
if (req.url === "/events") {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache"
  });
  clients.add(res);
  // Stop tracking this client once the iframe disconnects
  req.on("close", () => clients.delete(res));
  // Immediately push current state on connection
  res.write(`data: ${JSON.stringify(getState(instanceId))}\n\n`);
}
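The other half of this mechanism is pushing a new frame to every connected client whenever state changes. Here is a minimal sketch, assuming `clients` is a Set of objects with a `write()` method (such as the responses stored by the handler above); the `toSseFrame` and `broadcast` names are hypothetical.

```javascript
// Format one SSE message: a "data:" line terminated by a blank line.
function toSseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Write the latest state to every connected client.
// `clients` is assumed to be a Set of objects exposing write(),
// such as the response objects added in the /events handler.
function broadcast(clients, state) {
  const frame = toSseFrame(state);
  for (const res of clients) {
    res.write(frame);
  }
}

// Stand-in client for demonstration; a real one would be an HTTP response.
const written = [];
const fakeClients = new Set([{ write: (chunk) => written.push(chunk) }]);
broadcast(fakeClients, { execution: { paused: true } });
console.log(written[0]);
```

On the iframe side, a browser `EventSource` pointed at /events would receive each frame and could re-render from the parsed JSON.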
Every action is accessible via two channels. The developer clicks a button in the iframe, triggering a POST to the local server. Alternatively, the AI agent can call invoke_canvas_action through the SDK. Both methods alter the same state and initiate the same SSE broadcast; neither has privileged access.
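One way to guarantee that both channels behave identically is to funnel them through a single dispatcher. The sketch below is an assumption about how that could be structured: `handlers`, `dispatch`, and the `lastActor` field are hypothetical, and only the pause_resume action is implemented.

```javascript
// Hypothetical dispatcher shared by both channels. `handlers` maps action
// names to functions that take (state, input) and return the next state.
const handlers = new Map([
  [
    "pause_resume",
    (state) => ({
      ...state,
      execution: { ...state.execution, paused: !state.execution.paused },
    }),
  ],
]);

function dispatch(state, action, input, source) {
  const handler = handlers.get(action);
  if (!handler) {
    throw new Error(`Unknown action: ${action}`);
  }
  const next = handler(state, input);
  // Record who triggered the action; both sources take the same code path.
  return { ...next, lastActor: source };
}

let current = { execution: { paused: false, stepCount: 0 } };
current = dispatch(current, "pause_resume", {}, "human"); // e.g. button click via POST
current = dispatch(current, "pause_resume", {}, "agent"); // e.g. invoke_canvas_action
console.log(current.execution.paused, current.lastActor);
```

Because the HTTP route and the SDK handler would both call `dispatch`, neither path can drift from the other.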
The canvas registers with the Copilot SDK by using createCanvas, outlining its identity, description, and all agent-callable actions, with JSON Schema validation for inputs:
createCanvas({
  id: "multi-agent-dev",
  displayName: "Multi-Agent Dev Canvas",
  description: "Runtime observability and control plane for multi-agent development",
  actions: [
    {
      name: "decompose_system",
      description: "Break requirements into agent tasks",
      inputSchema: {
        type: "object",
        properties: {
          requirements: { type: "string" },
          components: { type: "array", items: { type: "string" } }
        },
        required: ["requirements"]
      },
      handler: async (ctx) => { /* ... */ },
    },
    // ... six more actions
  ],
  open: async (ctx) => { /* start server, return URL */ },
  onClose: async (ctx) => { /* clean up */ },
});
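To see what the inputSchema above actually constrains, the following sketch checks an input against it by hand. This is a deliberately simplified stand-in, not the SDK's validator: the hypothetical `checkInput` function covers only the `required` keyword and top-level `type` of each property.

```javascript
// Simplified stand-in for JSON Schema validation: handles only `required`
// and the top-level `type` of each property. A real validator covers far more.
function checkInput(schema, input) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in input)) errors.push(`missing required property: ${key}`);
  }
  for (const [key, rule] of Object.entries(schema.properties ?? {})) {
    if (!(key in input)) continue;
    const value = input[key];
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== rule.type) errors.push(`${key} should be ${rule.type}`);
  }
  return errors;
}

// Same schema as the decompose_system action above.
const decomposeSchema = {
  type: "object",
  properties: {
    requirements: { type: "string" },
    components: { type: "array", items: { type: "string" } },
  },
  required: ["requirements"],
};

const ok = checkInput(decomposeSchema, { requirements: "Review PRs", components: [] });
const bad = checkInput(decomposeSchema, { components: "pr-ingestion" });
console.log(ok, bad);
```

Declaring the schema alongside the action means a malformed call from an agent can be rejected before the handler runs.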
The Multi-Agent Dev Canvas supports four development scenarios that traditional tools can’t handle:
Instruct the agent to “Create an AI-powered code review system.” Observe how it decomposes the requirement, directs tasks to specialised agents, executes the workflow, and validates outputs—all visible in real time. You can make changes by refining constraints or components and re-running processes.
Observe how agents delegate tasks to one another. The flow graph illustrates which agent produced what output, identifies pending tasks, and highlights bottlenecks. This level of observability is crucial for resolving issues in multi-agent orchestration, yet wouldn’t feature in a production UI.
Utilise inject_failure to deliberately put an agent into an error state. Watch the system’s recovery processes. Do downstream tasks handle failures gracefully? This chaos engineering approach during development, with real-time visibility, allows you to catch integration errors before they progress to production.
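A failure-injection action could be as small as a state transformation. The sketch below shows one possible policy, assumed for illustration: the hypothetical `injectFailure` function marks the target agent as errored and fails any pending task routed to it, so the effect is immediately visible in the flow graph.

```javascript
// Hypothetical reducer for inject_failure: mark one agent as errored and
// fail any pending task routed to it. The policy is an assumption.
function injectFailure(state, agentId) {
  return {
    ...state,
    agents: state.agents.map((agent) =>
      agent.id === agentId ? { ...agent, status: "error" } : agent,
    ),
    taskFlows: state.taskFlows.map((flow) =>
      flow.to === agentId && flow.status === "pending"
        ? { ...flow, status: "fail" }
        : flow,
    ),
  };
}

const before = {
  agents: [
    { id: "decomposer", status: "idle" },
    { id: "executor", status: "running" },
  ],
  taskFlows: [
    { id: "task-1", to: "executor", status: "pending" },
    { id: "task-2", to: "executor", status: "pass" },
  ],
};
const after = injectFailure(before, "executor");
console.log(after.agents, after.taskFlows);
```

Because the function returns a new state instead of mutating the old one, the pre-failure snapshot remains available for comparison.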
Define your test criteria, run the validation, observe any failures, update the system design, and re-test. The validation panel isn’t part of a separate CI pipeline; it’s integrated into your development efforts, creating a seamless feedback loop between design choices and tangible outcomes.
To create your own Canvas extension, follow these straightforward steps:
- Refer to the SDK documentation: run extensions_manage({ operation: "guide" }) in GitHub Copilot CLI to access the authoritative documentation.
- Scaffold your project: run extensions_manage({ operation: "scaffold", kind: "canvas", name: "my-canvas", location: "project" }) to generate the template.
- Implement your logic: edit extension.mjs to add your canvas specifics (state model, actions, renderer HTML, and SSE updates).
- Reload your extension: run extensions_reload to apply your changes.
- Use your extension: open it with open_canvas, apply actions using invoke_canvas_action, and continue iterating.
Your canvas extension is located at .github/extensions/your-canvas/extension.mjs for project-specific extensions or in your user extensions directory for personal use. No package.json is necessary; the github/copilot-sdk import resolves automatically.
- Canvas acts as a development runtime, not merely a UI framework. It’s not about building Canvas over your UI; it’s about utilising Canvas to explore, test, and evolve the UI and system while building it.
- Canvas addresses issues that your final UI should never reveal. The observability of agents, fault injection, live state changes, and validation feedback are development matters, not user-facing concerns.
- Canvas fosters Human-to-AI-to-System collaboration. Both the developer and the AI agent work on the same surface, state, and running system. It offers a collaborative environment similar to Figma, but with AI agents and real execution.
- Canvas transforms debugging, testing, and execution into a continuous visual feedback cycle. Rather than toggling between an editor, terminal, test runner, and monitoring dashboard, you have one cohesive surface where the system exists and evolves.
- Canvas extensions are lightweight. A single extension.mjs file with no dependencies, using a loopback HTTP server with SSE, allows you to focus on building your system without additional complexity.
Canvas shifts the focus from writing static code to orchestrating running systems. Developers and AI agents co-create, observe, and refine solutions live. Rather than only crafting UIs for users, we build interactive environments for agents, turning debugging, testing, and execution into a single visual feedback loop.
The Multi-Agent Dev Canvas example is just one instance of this approach. This pattern is applicable anywhere you’re creating agent-driven systems, such as AI orchestration, workflow automation, data pipelines, and autonomous services. If you want to observe, manage, and validate a complex system while it operates, that’s where Canvas shines.
