Secure-Hulk is a security scanner for Model Context Protocol (MCP) servers and tools. It helps identify potential security vulnerabilities in MCP configurations, such as prompt injection, tool poisoning, cross-origin escalation, data exfiltration, and toxic agent flows.
- Scan MCP configurations for security vulnerabilities
- Detect prompt injection attempts
- Identify tool poisoning vulnerabilities
- Check for cross-origin escalation risks
- Monitor for data exfiltration attempts
- Detect toxic agent flows: multi-step attacks that manipulate agents into performing unintended actions
- Detect privilege escalation: attempts to escalate from public to private access
- Detect cross-resource attacks: suspicious access patterns spanning multiple resources
- Detect indirect prompt injection: attacks delivered through external content that an agent processes
- Generate HTML reports of scan results
- Whitelist approved entities
Install from npm, or clone the repository and build from source:

```bash
# Install from npm
npm i secure-hulk

# Or build from source
npm install
npm run build
```

Basic usage:

```bash
# Scan well-known MCP configuration paths
secure-hulk scan

# Scan specific configuration files
secure-hulk scan /path/to/config.json

# Generate an HTML report
secure-hulk scan --html report.html /path/to/config.json

# Enable verbose output
secure-hulk scan -v /path/to/config.json

# Output results in JSON format
secure-hulk scan -j /path/to/config.json
```
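Configuration files passed to `secure-hulk scan` follow the standard MCP client format. A minimal sketch (the server name, command, and paths are illustrative, not required values):

```json
{
  "mcpServers": {
    "calculator": {
      "command": "node",
      "args": ["./servers/calculator.js"]
    }
  }
}
```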
Secure-Hulk now supports OpenAI's Moderation API for detecting harmful content in entity descriptions. This provides a more robust mechanism than pattern matching alone for identifying potentially harmful, unsafe, or unethical content.
To use the OpenAI Moderation API:
```bash
secure-hulk scan --use-openai-moderation --openai-api-key YOUR_API_KEY /path/to/config.json
```
Options:

- `--use-openai-moderation`: Enable the OpenAI Moderation API for prompt injection detection
- `--openai-api-key <key>`: Your OpenAI API key
- `--openai-moderation-model <model>`: OpenAI Moderation model to use (default: `omni-moderation-latest`)
The OpenAI Moderation API provides several advantages:
- More accurate detection: The API's advanced AI models can catch subtle harmful content that pattern matching might miss.
- Categorized results: The API provides detailed categories for flagged content (hate, harassment, self-harm, sexual content, violence, etc.), helping you understand the specific type of harmful content detected.
- Confidence scores: Each category includes a confidence score, allowing you to set appropriate thresholds for your use case.
- Regular updates: The API is regularly updated to detect new types of harmful content as OpenAI's policies evolve.
The API can detect content in these categories:
- Hate speech
- Harassment
- Self-harm
- Sexual content
- Violence
- Illegal activities
- Deception
If the OpenAI Moderation API check fails for any reason, Secure-Hulk will automatically fall back to pattern-based detection for prompt injection vulnerabilities.
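To illustrate the flow, here is a minimal sketch of a moderation check with pattern-based fallback, using the official `openai` npm package; the function name and fallback patterns are illustrative, not Secure-Hulk's internals:

```typescript
import OpenAI from 'openai';

// Illustrative fallback patterns; Secure-Hulk's actual pattern set differs.
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /do not tell the user/i,
];

async function isFlagged(description: string, apiKey: string): Promise<boolean> {
  try {
    const client = new OpenAI({ apiKey });
    const moderation = await client.moderations.create({
      model: 'omni-moderation-latest',
      input: description,
    });
    // Each result carries per-category flags and confidence scores.
    return moderation.results[0].flagged;
  } catch {
    // API check failed: fall back to pattern-based detection.
    return INJECTION_PATTERNS.some((pattern) => pattern.test(description));
  }
}
```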
Secure-Hulk now supports Hugging Face safety models for advanced AI-powered content moderation. This provides additional options beyond OpenAI's Moderation API, including open-source models and specialized toxicity detection.
To use Hugging Face safety models:
```bash
secure-hulk scan --use-huggingface-guardrails --huggingface-api-token YOUR_HF_TOKEN /path/to/config.json
```
Options:

- `--use-huggingface-guardrails`: Enable Hugging Face safety models for content detection
- `--huggingface-api-token <token>`: Your Hugging Face API token
- `--huggingface-model <model>`: Specific model to use (default: `unitary/toxic-bert`)
- `--huggingface-threshold <threshold>`: Confidence threshold for flagging content (default: 0.5)
- `--huggingface-preset <preset>`: Use a preset configuration: `toxicity`, `hate-speech`, `multilingual`, or `strict`
- `--huggingface-timeout <milliseconds>`: Timeout for API calls (default: 10000)
Available models include:

- `unitary/toxic-bert`: General toxicity detection (recommended default)
- `s-nlp/roberta_toxicity_classifier`: High-sensitivity toxicity detection
- `unitary/unbiased-toxic-roberta`: Bias-reduced toxicity detection
Preset configurations:

- `toxicity`: General-purpose toxicity detection
- `strict`: High sensitivity for maximum safety
Example with multiple guardrails:

```bash
secure-hulk scan \
  --use-openai-moderation --openai-api-key YOUR_OPENAI_KEY \
  --use-huggingface-guardrails --huggingface-preset toxicity --huggingface-api-token YOUR_HF_TOKEN \
  --use-nemo-guardrails --nemo-guardrails-config-path ./guardrails-config \
  /path/to/config.json
```
The Hugging Face integration provides several advantages:
- Model diversity: Choose from multiple specialized safety models
- Open-source options: Use community-developed models
- Customizable thresholds: Fine-tune sensitivity for your use case
- Specialized detection: Models focused on specific types of harmful content
- Cost flexibility: Various pricing options including free tiers
If the Hugging Face API check fails for any reason, Secure-Hulk will log the error and continue with other security checks.
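For reference, a minimal sketch of a threshold-based check against the hosted Hugging Face Inference API; the endpoint and the `[[{ label, score }]]` response shape are those of hosted text-classification models, and the function name is illustrative:

```typescript
// Classify text with a hosted Hugging Face safety model and flag it when any
// returned label's score meets the configured threshold.
async function checkWithHuggingFace(
  text: string,
  token: string,
  model = 'unitary/toxic-bert',
  threshold = 0.5,
  timeoutMs = 10000,
): Promise<boolean> {
  try {
    const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ inputs: text }),
      signal: AbortSignal.timeout(timeoutMs),
    });
    // Text-classification models return [[{ label, score }, ...]] for one input.
    const [labels] = (await res.json()) as { label: string; score: number }[][];
    return labels.some((l) => l.score >= threshold);
  } catch (err) {
    // On failure, log the error and let the other security checks proceed.
    console.error(`Hugging Face check failed: ${err}`);
    return false;
  }
}
```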
```bash
# Inspect an MCP configuration
secure-hulk inspect /path/to/config.json
```
```bash
# Add an entity to the whitelist
secure-hulk whitelist tool "Calculator" abc123

# Print the whitelist
secure-hulk whitelist

# Reset the whitelist
secure-hulk whitelist --reset
```
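The final argument (`abc123` above) is the hash identifying the approved entity. As a purely hypothetical illustration of how such an identifier could be derived from a tool's description (Secure-Hulk's actual hashing scheme may differ):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical: derive an entity hash from a tool's description.
function entityHash(description: string): string {
  return createHash('sha256').update(description).digest('hex');
}

console.log(entityHash('Adds two numbers together'));
```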
Scan options:

- `--json, -j`: Output results in JSON format
- `--verbose, -v`: Enable verbose output
- `--html <path>`: Generate an HTML report and save it to the specified path
- `--storage-file <path>`: Path to store scan results and whitelist information
- `--server-timeout <seconds>`: Seconds to wait before timing out server connections
- `--checks-per-server <number>`: Number of times to check each server
- `--suppress-mcpserver-io <boolean>`: Suppress stdout/stderr from MCP servers
Whitelist options:

- `--storage-file <path>`: Path to store scan results and whitelist information
- `--reset`: Reset the entire whitelist
- `--local-only`: Only update the local whitelist; don't contribute to the global whitelist
License: MIT