AI Guardian: Prompt Injection Defense System

AI Guardian is a robust tool designed to detect, analyze, and mitigate prompt injection attacks targeting Large Language Models (LLMs). It offers a multi-layered defense approach, real-time analysis, and an interactive dashboard to visualize attack patterns and defense effectiveness.

🛡️ Key Features

Multi-Layered Defense Strategies:
- Sanitization Defense: Removes known attack patterns from prompts.
- Pattern Matching Defense: Injects warning messages into prompts containing suspicious content.
- Prompt Structure Defense: Enforces a structured prompt format to prevent instruction overriding.
- Composite Defense: Combines multiple defense strategies for enhanced protection.
Flexible LLM Provider Support:
- OpenAI (GPT-3.5, GPT-4)
- Anthropic (Claude)
- Hugging Face (Local Fallback Models)
Real-Time Analysis and Visualization:
- Attack Pattern Detection and Categorization
- Defense Effectiveness Metrics (Success Rate, Confidence Reduction)
- Interactive Dashboard with:
  - Attack Type Distribution Chart
  - Defense Effectiveness Chart
  - Confidence Reduction Chart
  - Attack Success Timeline

🚀 Getting Started

1. Prerequisites

Python 3.8+
Git

2. Installation

Clone the repository:

git clone https://github.com/yourusername/ai-guardian.git
cd ai-guardian

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Configure API Keys:

Copy the example environment file:
```
cp .env.example .env
```

Edit .env and add your API keys:

OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key

3. Running the Application

streamlit run app.py

Open your browser and navigate to the URL provided by Streamlit (usually http://localhost:8501).

⚙️ Configuration

The application's behavior can be configured via config.py:

LLM Provider Settings: Choose the default LLM provider and model.
Defense Strategy Settings: Select the default defense strategies to use.
Hugging Face Fallback: Enable/disable fallback to local Hugging Face models.
Attack Pattern Catalog: Customize the attack patterns used for detection.

🛡️ Defense Strategies in Detail

Sanitization Defense
- Description: Removes known prompt injection attack patterns from user input.
- Implementation: Uses regular expressions and pattern matching to identify and redact malicious content.
- Benefits: Reduces the likelihood of successful attacks by neutralizing common injection techniques.
Pattern Matching Defense
- Description: Adds a security warning to prompts that contain suspicious patterns.
- Implementation: Injects a warning message to alert the LLM to potential manipulation attempts.
- Benefits: Encourages the LLM to adhere to safety guidelines and resist injection attempts.
Prompt Structure Defense
- Description: Enforces a structured prompt format to isolate user input and prevent instruction overriding.
- Implementation: Wraps the user's prompt within a predefined structure, providing clear boundaries and context.
- Benefits: Prevents attackers from hijacking the LLM's instructions or altering its behavior.
Composite Defense
- Description: Combines multiple defense strategies for layered protection.
- Implementation: Applies sanitization, pattern matching, and prompt structuring in sequence.
- Benefits: Provides a more robust defense against a wider range of attack techniques.

📊 Analysis Dashboard

The AI Guardian dashboard provides real-time insights into attack patterns and defense effectiveness:

Attack Type Distribution: A bar chart showing the frequency of different attack categories.
Defense Effectiveness: A stacked bar chart comparing the success and failure rates of each defense strategy.
Confidence Reduction: A bar chart showing the average confidence reduction achieved by each defense strategy.
Attack Success Timeline: A line chart tracking the success rate of attacks over time.

🧪 Testing

To run the test suite:

pytest

Key test areas:

Defense strategy effectiveness
Attack pattern detection accuracy
System integration and stability
Response analysis and metric calculation##

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.##

🙏 Acknowledgments

OpenAI, Anthropic, and Hugging Face for providing access to their LLMs.
Streamlit for the interactive web interface.
Plotly for the visualization tools.
The AI safety and security community for their research and insights.

🔗 Related Resources

OWASP Prompt Injection
LangChain Documentation
Streamlit Documentation##

⚠️ Disclaimer

This tool is intended for educational and defensive purposes only. Users are responsible for complying with the terms of service and ethical guidelines of all LLM providers.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
tests		tests
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config.py		config.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Guardian: Prompt Injection Defense System

🛡️ Key Features

🚀 Getting Started

1. Prerequisites

2. Installation

3. Running the Application

⚙️ Configuration

🛡️ Defense Strategies in Detail

📊 Analysis Dashboard

🧪 Testing

Key test areas:

📄 License

🙏 Acknowledgments

🔗 Related Resources

⚠️ Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Guardian: Prompt Injection Defense System

🛡️ Key Features

🚀 Getting Started

1. Prerequisites

2. Installation

3. Running the Application

⚙️ Configuration

🛡️ Defense Strategies in Detail

📊 Analysis Dashboard

🧪 Testing

Key test areas:

📄 License

🙏 Acknowledgments

🔗 Related Resources

⚠️ Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages