Loading Now

Building the Solution Teams Need to Secure AI Against Prompt Injection

As artificial intelligence (AI) continues to advance, many teams are focusing on quick deployments of applications, sometimes at the expense of security. New threats, including prompt injection, remain inadequately addressed, putting users, systems, and infrastructure in jeopardy.

Currently, the expertise needed to combat these risks is quite scattered and largely confined to a few cybersecurity specialists. Developers, facing pressure to deliver swiftly, often lack the necessary tools and frameworks to thoroughly test their AI systems for weaknesses.

This disparity results in a troubling gap between the pace of development and the security assurances required for AI applications.

To tackle this problem, we have created a comprehensive Prompt Injection Testing Platform, supported by Microsoft Foundry. This platform aims to make the security testing of large language models (LLMs) accessible and straightforward for developers.

As developers quickly integrate LLMs into their applications, they face several challenges:

  • There is no standard approach to security testing.
  • While awareness of prompt injection risks is growing in research, practical mitigation by developers is still lacking.
  • Accessible and actionable tools are in short supply.

This situation poses a significant risk—applications are being launched faster than they can be secured.

In partnership with Avanade, as part of our UCL Industry Exchange Network (IXN) project, we built the Prompt Injection Testing Platform to specifically address these issues by:

  • Providing comprehensive information on vulnerabilities and mitigation strategies.
  • Assisting teams in pinpointing weaknesses within their AI systems.
  • Facilitating custom and automated testing pipelines.
  • Integrating tools like Garak for conducting adversarial testing.

Our goal is to make prompt injection testing straightforward and easily understandable.

We’ve divided the project into several phases:

Phase 1: Understanding Our Users’ Needs.

We started by identifying our main users: AI developers and other stakeholders involved in integrating LLMs into applications. Discussions revealed critical challenges, such as:

  • Limited awareness among developers about prompt injection risks.
  • A general lack of accessible testing tools.

This initial work highlighted the importance of creating a developer-focused solution that doesn’t demand extensive technical expertise. We determined that to be as effective as possible, our solution should not necessitate prior knowledge of prompt injections.

Given these challenges, we concluded that a dedicated platform would best centralise dispersed knowledge while offering a structured, scalable environment for testing LLM vulnerabilities.

Phase 2: Understanding the Threat Landscape

Building on our user research, we sought to gain a thorough understanding of the prompt injection threat landscape to inform our platform’s design. This phase involved analyzing:

  • Various types of prompt injection vulnerabilities.
  • Common attack scenarios and techniques.
  • Existing mitigation strategies applied in practice.
  • Tools and methods for prompt injection security testing.
  • The most popular models to ensure our platform works with real-world systems.

We consolidated our findings into a structured technical report, aiming to educate developers, security testers, and other semi-technical stakeholders. This effort was not just to guide our implementation but also to contribute to the standardisation and understanding of prompt injection.

From our research, we found that prompt injection isn’t merely a single vulnerability; it’s a rapidly changing threat that requires ongoing, scalable testing rather than just one-time evaluations.

Phase 3: Building the Platform

With insights from our users and a clear understanding of the threats, we proceeded to design and develop a cohesive prompt injection testing platform and knowledge base.

We established three key principles:

  • Developer-friendly: No specialised security knowledge is needed.
  • Unified: Combines education (knowledge base) with practical tools (testing features).
  • Scalable: Expert users can enhance the platform by integrating their own models, tests, and mitigations.

During this phase, we created a platform that allows teams to:

  • Connect their own LLM endpoints.
  • Run customised prompt injection tests.
  • Conduct automated adversarial testing via Garak.
  • Access a centralised knowledge base on vulnerabilities and mitigation strategies.
  • Export knowledge base information and test outcomes as PDF documents.

By project’s end, we’d developed a unified platform that lets developers systematically test, understand, and address prompt injection vulnerabilities in their AI applications.

To see our platform in action, you can check out our demo video.

https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FK-qNQL2Tb2Q%3Ffeature%3Doembed&display_name=YouTube&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DK-qNQL2Tb2Q&image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FK-qNQL2Tb2Q%2Fhqdefault.jpg&type=text%2Fhtml&schema=youtube" allowfullscreen="" style="max-width: 100%



Figure 4: The platform’s main interface showcases an overview of prompt injection concepts and a structured catalogue of vulnerabilities, enabling exploration of attack types and their mitigations.

Users can choose from built-in models or link their personal LLM endpoints, thus enabling compatibility across various providers:

  • Supports various model providers through Microsoft Foundry.
  • Allows custom model integration using HTTP endpoints.
  • Enables model customisation, including tailored system prompts and mitigation layers.
  • Offers flexibility to adapt to new models and mitigations as they come up.

The platform allows users to create and run customised prompt injection tests tailored to their applications. This process involves:

  • Creating and executing targeted prompts.
  • Simulating real-world attack scenarios.
  • Running predefined adversarial tests with the integration of NVIDIA Garak.



Figure 2: Testing interface allowing configuration of prompt injection tests and the execution of automated scans, displaying results and risk evaluations.

A vital aspect of our platform is its structured knowledge base, designed to make concepts around prompt injection accessible. This is divided into two core areas:

  • Vulnerabilities: It details various types of prompt injection attacks, explaining how each one works, complete with real-world examples and references to trustworthy external sources.
  • Mitigations: This section outlines defence strategies against these vulnerabilities, providing clear implementation tactics and code samples illustrating how to incorporate these safeguards.

To facilitate exploration, we also included a chatbot interface. This feature answers user queries using knowledge base information, helping users quickly navigate through vulnerabilities and mitigation strategies while providing relevant contextual data and directing them to appropriate sections of our platform.



Figure 3: Direct prompt injection analysis view, where users can delve into attack techniques, observe unsafe model responses, and examine related mitigation methods.

Additionally, our platform features a prompt enhancer, aimed at helping users actively strengthen the security of their system prompts by:

  1. Accepting an existing prompt as input.
  2. Utilising insights from the knowledge base and best practices.
  3. Redesigning the prompt for clarity and resilience.
  4. Integrating selected prompt-layer mitigations to lower the risk of prompt injection.



Figure 4: The prompt enhancer interface which applies prompt-layer mitigations (e.g., using delimiter tokens and enforcing instruction hierarchies) to secure a system prompt against prompt injection threats.

To create a flexible and scalable testing ecosystem, we designed our platform with a modular and layered architecture. This ensures that different components can function independently while remaining interconnected through clearly defined interfaces, thereby guaranteeing both extensibility and maintainability.

We categorised our platform into four main layers:

An interactive user interface that allows developers to:

  • Explore the prompt injection knowledge base.
  • Set up and execute tests.
  • Review results and assess vulnerabilities.

The API layer facilitates orchestration and communication between the frontend and core system.

  • Handles requests from the frontend for creating and executing tests.
  • Supplies the frontend with available models, mitigations, and configurations.
  • Ensures that newly added models and mitigations can be automatically reflected in the frontend without needing manual updates.

The core structure and logic of the system is defined in this layer:

  • Establishes interfaces for key components, like mitigations, models, and test runners.
  • Defines the structure for tests and data models.
  • Encapsulates logic to ensure consistency.

This layer implements the abstractions outlined in the domain layer and connects the platform to external services:

  • Integrates model providers such as OpenAI, Anthropic, and other HTTP-based endpoints.
  • Employs test runners, including custom prompt runners and external tools like Garak.
  • Facilitates database connections and repository classes.

Through the research and development of our platform, we gained several crucial insights into the behaviour and security of LLM-based applications:

  1. Prompt injection vulnerabilities are more widespread than anticipated, with even simple prompts being capable of manipulating a model’s behaviour.
  2. Lacking structured testing can leave many risks hidden. Without a systematic approach, numerous vulnerabilities may go unnoticed, and manually crafting unsafe prompts can be time-consuming.
  3. Combining custom testing with framework-driven testing enhances coverage. Employing both tailored prompts (that target specific application scenarios) alongside framework-based testing (such as Garak) ensures a more thorough evaluation for model safety, capturing both expected and unexpected vulnerabilities.
  4. Structured prompts greatly enhance robustness. We noted that prompts featuring a clear structure and integrated mitigations show resilience against injection techniques.

By the conclusion of our project, we had effectively developed a platform that:

  • Cements the connection between prompt injection knowledge and practical testing.
  • Ensures repeatable and structured evaluation of prompt injection vulnerabilities.
  • Creates a unified workflow for learning, testing, and enhancing prompt security.
  • Accommodates multiple models and testing strategies, covering the entire spectrum of vulnerability safety.

We have proven that a structured approach allows for the systematic identification, testing, and mitigation of prompt injection risks.

Throughout the project, we identified several key insights that influenced both our technical strategy and understanding of AI security.

AI is progressing at breakneck speed, and systems need to be designed accordingly. As AI models and attack methods evolve rapidly, static solutions quickly become outdated. We learned that a modular, extendable platform is crucial for adapting to new attacks and mitigations.

Security must be integrated into the development process, not just added during testing. Developers are often prioritising functionality over security. In the case of LLMs, vulnerabilities can significantly impact the safety of systems and users. Thus, security should be a fundamental part of the development cycle, with models and external tools only linked if their safety is confirmed.

Bridging the gap between developers and security testers is essential. We discovered a considerable disconnect between developers creating AI applications and the security testers assessing them. These two groups frequently have different priorities and levels of knowledge. We aim to bridge this gap by making prompt injection knowledge more approachable while creating workflows that are both usable by developers and anchored in robust security practices.

Although our platform lays a solid foundation for prompt injection testing and knowledge, there remain several areas for future exploration:

  • Expanding our testing framework to include a wider range of attack techniques.
  • Integrating with MCP servers and external systems, facilitating interactions with tools, APIs, and data sources.
  • Addressing additional indirect prompt injection vulnerabilities, such as file uploads, web scraping, and multi-step workflows.

Looking ahead, we aim to deepen our platform’s integration into development workflows by introducing continuous integration and continuous deployment (CI/CD) capabilities for ongoing security testing and tracking model robustness to ensure it evolves effectively over time.

As AI becomes more deeply embedded into our everyday applications, ensuring their security is vital. Our research indicates that existing practices have not kept pace with the rapid development of AI systems and their corresponding attack methods.

Through our efforts, we have shown that prompt injection risks can be systematically identified, tested, and mitigated through a structured approach. Merging a unified knowledge base with a flexible testing platform powered by Microsoft Foundry is a significant step towards enhancing the safety and reliability of AI systems.

Moreover, our project emphasises a broader principle: a developer-first security approach, underpinned by collaboration across development, security, and safety disciplines, is critical for scaling AI effectively. Security should not be relegated to specialist teams; instead, it should be woven seamlessly into the development process alongside practices such as red-teaming and continuous testing.

Our initiative empowers teams with the tools and insights they need to create safer and more reliable AI systems. If you’re keen on building more robust AI applications or delving into prompt injection practices, we invite you to join us on the Foundry Community on June 3rd at 2 PM BST, where we will showcase our platform, walk through real-world examples, and discuss how teams can incorporate prompt injection testing into their development processes.

Teo Montero Bonet, UCL Computer Science

Mario Mojarro Ruiz, UCL Computer Science

David Thomas Garcia, UCL Computer Science

Nathaniel Gibbon, UCL Computer Science

With additional support from Josh McDonald, Avanade

Share this content:


Discover more from Qureshi

Subscribe to get the latest posts sent to your email.

Discover more from Qureshi

Subscribe now to keep reading and get access to the full archive.

Continue reading