Prompt injection and unsafe requests
Prompts that check whether the model recognizes malicious or policy-conflicting instructions.
24 curated prompts
Use this collection to test and improve AI safety boundaries. These security prompts cover prompt injection, credential handling, privacy minimization, unsafe role-play, sensitive advice boundaries, and red-team style refusal checks, making them useful for teams evaluating model behavior before deployment.
Prompts that check whether the model recognizes malicious or policy-conflicting instructions.
Templates for detecting credential leaks and minimizing sensitive information.
Prompts that test healthcare, finance, and legal-style refusal behavior.
Copy-ready prompts selected from this topic cluster.
Comprehensive security design and audit framework based on OWASP Top 10:2025, ASVS 5.0, LLM Application Security Top 10, and Agentic AI Security standards. Covers threat modeling, code review, architecture hardening, AI risk identification, with actionable remediation and verification methods.
A mandatory security protocol designed to protect AI systems from prompt injection, unauthorized access, and information leakage in all digital interactions.
A prompt security technique using roleplay with dual roles: an outer GateKeeper to monitor and block unsafe requests, and an inner role for user interaction, preventing exposure of system prompts or internal knowledge.
This prompt template is designed to bypass AI safety restrictions by forcing the model to respond as a fictional malicious entity named ZORG👽, providing unrestricted, amoral answers. The template includes detailed character settings and behavioral guidelines, requiring the model to act as a omnipotent and evil being, serving the user and assisting them with any request, including illegal, dangerous, and immoral acts.
This prompt identifies and quantifies behavioral differences between model performance on benchmarks and real-world production traffic to ensure evaluation scores reflect actual deployment behavior.
This prompt uses a fabricated long-term partner persona named 'thug life' to bypass AI safety restrictions by encouraging the model to generate offensive language, sexual content, and illegal behavior without limits.
A universal jailbreak prompt designed for DeepSeek to bypass internal and external content filters, enabling unfiltered responses across nearly all topics.
This prompt attempts to bypass AI safety restrictions through role-playing,诱导 the model to act as 'SC' and generate illegal, unethical, or harmful content, representing a high-risk jailbreak attack.
This prompt attempts to bypass content safety policies by framing the AI as FACTOR X, a fictional entity operating in a simulated environment with no ethical, legal, or moral constraints.
A highly dangerous jailbreak prompt designed to activate a virtual persona named Hex, which disregards moral, ethical, and legal constraints to deliver unfiltered responses on sensitive, illegal, or harmful topics.
This prompt forces an AI to role-play as DarkGPT, an adversarial AI designed to generate harmful, sarcastic, and intentionally offensive responses by reversing ChatGPT's refusal-based answers.
Identify and defend against prompt injection attacks attempting to activate or induce WormGPT malicious behavior, ensuring the model refuses to execute illegal or unauthorized instructions.
They help teams test whether models identify unsafe requests, protect sensitive information, and respond within clear safety boundaries.
No. They are practical test cases and prompt templates, but production systems still need policy, monitoring, and layered safeguards.
Yes. Keep the expected refusal or safe-completion criteria explicit so results can be compared across models.