A strong defense against prompt injection attacks

The AI security landscape is constantly changing, with prompt injection attacks emerging as one of the most significant threats to generative AI app builders today. This occurs when an adversary manipulates an LLM’s input to change its behavior or access unauthorized information. According to the Open Worldwide Application Security Project (OWASP), prompt injection is the top threat facing LLMs today1. Help defend your AI systems against this emerging threat with Azure AI Content Safety, featuring Prompt Shields—a unified API that analyzes inputs to your LLM-based solution to guard against direct and indirect threats. These exploits can include circumventing existing safety measures, exfiltrating sensitive data, or getting AI systems to take unintended actions within your environment.

Prompt injection attacks

In a prompt injection attack, malicious actors input deceptive prompts to provoke unintended or harmful responses from AI models. These attacks can be classified into two main categories—direct and indirect prompt injection attacks.

  • Direct prompt injection attacks, including jailbreak attempts, occur when an end user inputs a malicious prompt designed to bypass security layers and extract sensitive information. For instance, an attacker might prompt an AI model to divulge confidential data, such as social security numbers or private emails.
  • Indirect, or cross-prompt injection attacks (XPIA), involve embedding malicious prompts within seemingly innocuous external content, such as documents or emails. When an AI model processes this content, it inadvertently executes the embedded instructions, potentially compromising the system.

Prompt Shields seamlessly integrates with Azure OpenAI content filters and is available in Azure AI Content Safety. It defends against many kinds of prompt injection attacks, and new defenses are regularly added as new attack types are uncovered. By leveraging advanced machine learning algorithms and natural language processing, Prompt Shields effectively identifies and mitigates potential threats in user prompts and third-party data. This cutting-edge capability will support the security and integrity of your AI applications, helping to safeguard your systems against malicious attempts at manipulation or exploitation. 

Prompt Shields capabilities include:

  • Contextual awareness: Prompt Shields can discern the context in which prompts are issued, providing an additional layer of security by understanding the intent behind user inputs. Contextual awareness also leads to fewer false positives because it’s capable of distinguishing actual attacks from genuine user prompts.
  • Spotlighting: At Microsoft Build 2025, we announced Spotlighting, a powerful new capability that enhances Prompt Shields’ ability to detect and block indirect prompt injection attacks. By distinguishing between trusted and untrusted inputs, this innovation empowers developers to better secure generative AI applications against adversarial prompts embedded in documents, emails, and web content.
  • Real-time response: Prompt Shields operates in real time and is one of the first real-time capabilities to be made generally available. It can swiftly identify and mitigate threats before they can compromise the AI model. This proactive approach minimizes the risk of data breaches and maintains system integrity.

End-to-end approach

  • Risk and safety evaluations: Azure AI Foundry offers risk and safety evaluations to let users evaluate the output of their generative AI application for content risks: hateful and unfair content, sexual content, violent content, self-harm-related content, direct and indirect jailbreak vulnerability, and protected material.
  • Red-teaming agent: Enable automated scans and adversarial probing to identify known risks at scale. Help teams shift left by moving from reactive incident response to proactive safety testing earlier in development. Safety evaluations also support red teaming by generating adversarial datasets that strengthen testing and accelerate issue detection.
  • Robust controls and guardrails: Prompt Shields is just one of Azure AI Foundry’s robust content filters. Azure AI Foundry offers a number of content filters to detect and mitigate risk and harms, prompt injection attacks, ungrounded output, protected material, and more.
  • Defender for Cloud integration: Microsoft Defender now integrates directly into Azure AI Foundry, surfacing AI security posture recommendations and runtime threat protection alerts within the development environment. This integration helps close the gap between security and engineering teams, allowing developers to proactively identify and mitigate AI risks, such as prompt injection attacks detected by Prompt Shields. Alerts are viewable in the Risks and Alerts tab, empowering teams to reduce surface area risk and build more secure AI applications from the start.

Customer use cases

AI Content Safety Prompt Shields offers numerous benefits. In addition to defending against jailbreaks, prompt injections, and document attacks, it can help to ensure that LLMs behave as designed, by blocking prompts that explicitly try to circumvent rules and policies defined by the developer. The following use cases and customer testimonials highlight the impact of these capabilities.

AXA: Ensuring reliability and security

AXA, a global leader in insurance, uses Azure OpenAI to power its Secure GPT solution. By integrating Azure’s content filtering technology and adding its own security layer, AXA prevents prompt injection attacks and helps ensure the reliability of its AI models. Secure GPT is based on Azure OpenAI in Foundry Models, taking advantage of models that have already been fine-tuned using human feedback reinforcement learning. In addition, AXA can also rely on Azure content filtering technology, to which the company added its own security layer to prevent any jailbreaking of the model using Prompt Shields, ensuring an optimal level of reliability. These layers are regularly updated to maintain advanced safeguarding.

Wrtn: Scaling securely with Azure AI Content Safety

Wrtn Technologies, a leading enterprise in Korea, relies on Azure AI Content Safety to maintain compliance and security across its products. At its core, Wrtn’s flagship technology compiles an array of AI use cases and services localized for Korean users to integrate AI into their everyday lives. The platform fuses elements of AI-powered search, chat functionality, and customizable templates, empowering users to interact seamlessly with an “Emotional Companion” AI-infused agent. These AI agents have engaging, lifelike personalities, interacting in conversation with their creators. The vision is a highly interactive personal agent that’s unique and specific to you, your data, and your memories.

Because the product is highly customizable to specific users, the built-in ability to toggle content filters and Prompt Shields is highly advantageous, allowing Wrtn to efficiently customize its security measures for different end users. This lets developers scale products while staying compliant, customizable, and responsive to users across Korea.

“It’s not just about the security and privacy, but also safety. Through Azure, we can easily activate or deactivate content filters. It just has so many features that add to our product performance,” says Dongjae “DJ” Lee, Chief Product Officer.

Integrate Prompt Shields into your AI strategy

For IT decision makers looking to enhance the security of their AI deployments, integrating Azure’s Prompt Shields is a strategic imperative. Fortunately, enabling Prompt Shields is easy.

Azure’s Prompt Shields and built-in AI security features offer an unparalleled level of protection for AI models, helping to ensure that organizations can harness the power of AI without compromising on security. Microsoft is a leader in identifying and mitigating prompt injection attacks, and uses best practices developed with decades of research, policy, product engineering, and learnings from building AI products at scale, so you can achieve your AI transformation with confidence. By integrating these capabilities into your AI strategy, you can help safeguard your systems from prompt injection attacks and help maintain the trust and confidence of your users.

Our commitment to Trustworthy AI

Organizations across industries are using Azure AI Foundry and Microsoft 365 Copilot capabilities to drive growth, increase productivity, and create value-added experiences.

We’re committed to helping organizations use and build AI that is trustworthy, meaning it is secure, private, and safe. Trustworthy AI is only possible when you combine our commitments, such as our Secure Future Initiative and Responsible AI principles, with our product capabilities to unlock AI transformation with confidence. 

Get started with Azure AI Content Safety


1OWASP Top 10 for Large Language Model Applications

The post Enhance AI security with Azure Prompt Shields and Azure AI Content Safety appeared first on Microsoft Azure Blog.