⏱️ Get running in 2-15 minutes depending on deployment mode
This guide provides copy-paste-friendly instructions for deploying redd-archiver. Choose your deployment mode below and follow the steps.
Run these commands to verify your system is ready:
# Check Docker version (need 24.0+)
docker --version
# Check Docker Compose version (need v2.0+)
docker compose version
# Check available ports
sudo lsof -i :80 # Should be empty
sudo lsof -i :443 # Should be empty (for HTTPS)
sudo lsof -i :5432 # Should be empty
📌 Important Note: Redd-Archiver supports two modes:
- Offline Browsing: Generated HTML files work without a server (browse via sorted index pages, no search)
- Server Deployment (below): Required for full-text search functionality (PostgreSQL FTS)
| Mode | Time | Prerequisites | Use Case | Search |
|---|---|---|---|---|
| Local Testing | 5 min | Docker only | Development, testing | ✅ Yes |
| Tor Homelab | 2 min | Docker only | Share archives privately, no networking config | ✅ Yes |
| Production HTTPS | 15 min | Docker + Domain + DNS | Public archives | ✅ Yes |
| Dual-Mode | 17 min | Docker + Domain + DNS | Public + private access | ✅ Yes |
Perfect for trying out redd-archiver on your local machine.
# Clone repository
git clone https://github.com/19-84/redd-archiver.git
cd redd-archiver
# Create required directories
mkdir -p data output/.postgres-data logs tor-public
# Create environment file
cp .env.example .env
# Edit configuration (change YOUR_SECURE_PASSWORD)
nano .env
Required changes in .env:
# Change the password in BOTH places:
POSTGRES_PASSWORD=YOUR_SECURE_PASSWORD
DATABASE_URL=postgresql://reddarchiver:YOUR_SECURE_PASSWORD@/reddarchiver?host=/var/run/postgresql
Important: The password must match in both POSTGRES_PASSWORD and DATABASE_URL!
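Because a mismatched password is an easy mistake to make here, a small helper can catch it before you start the services. This is a sketch, not part of redd-archiver; it assumes the exact DATABASE_URL format shown above (user `reddarchiver`, password between `:` and `@`):

```shell
# check_env_passwords: warn if POSTGRES_PASSWORD and the password embedded
# in DATABASE_URL disagree. Hypothetical helper, not shipped with the project.
check_env_passwords() {
  env_file="$1"
  pg_pass=$(grep '^POSTGRES_PASSWORD=' "$env_file" | cut -d= -f2-)
  # Extract the password between "reddarchiver:" and "@" in DATABASE_URL
  url_pass=$(grep '^DATABASE_URL=' "$env_file" | sed 's|.*reddarchiver:\([^@]*\)@.*|\1|')
  if [ "$pg_pass" = "$url_pass" ]; then
    echo "OK: passwords match"
  else
    echo "MISMATCH: POSTGRES_PASSWORD and DATABASE_URL differ" >&2
    return 1
  fi
}
```

Run it as `check_env_passwords .env` before `docker compose up`.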
# Start all services
docker compose up -d
# Wait for services to become healthy
sleep 30
# Verify services are running
docker compose ps
# All services should show "healthy"
# Test nginx
curl http://localhost/health
# Expected: OK
# Test search server
curl http://localhost:5000/health
# Expected: {"status":"healthy"}
# Test database connection
docker compose exec postgres pg_isready -U reddarchiver
# Expected: reddarchiver:5432 - accepting connections
# Visit in browser (will show placeholder until archive generated)
# Open: http://localhost/
# Expected: "Redd-Archiver Deployment Successful" page
Note: You'll see a placeholder page until you generate an archive (Step 4).
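The fixed `sleep 30` above is a guess; polling a health endpoint until it actually responds is more reliable on slow machines. A sketch (the URL and retry count are parameters; the endpoint paths are the ones tested above):

```shell
# wait_for_url: retry a URL with curl until it answers or attempts run out.
# Usage: wait_for_url http://localhost/health 30
wait_for_url() {
  url="$1"; tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fs "$url" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "timeout: $url" >&2
  return 1
}
```

`docker compose ps` still shows per-container health; this just gates scripted steps on the endpoints answering.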
Redd-Archiver processes data dumps from multiple platforms:
| Platform | Format | Data Sources |
|---|---|---|
| Reddit | .zst JSON Lines | Pushshift Complete Dataset (magnet link below) |
| Voat | SQL dumps | Voat Archive 2021 (22,637 subverses, 3.8M posts, 24M comments) |
| Ruqqus | .7z JSON Lines | Ruqqus Archive 2021 (6,217 guilds, complete archive) |
Reddit Magnet Link:
magnet:?xt=urn:btih:3e3f64dee22dc304cdd2546254ca1f8e8ae542b4&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Size: 3.28 TB compressed (2005-06 through 2025-12) | Content: 2.38B posts, 40K subreddits
Place downloaded files in the ./data/ directory before running Step 4.
Platform Auto-Detection: Redd-Archiver automatically detects the platform from file extensions:
- .zst → Reddit (Pushshift format)
- .sql / .sql.gz → Voat (SQL dumps)
- .7z → Ruqqus (JSON Lines)
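If you want to pre-check a data directory before importing, the same extension rules can be mirrored in a few lines of shell. This is a sketch of the mapping above, not the actual detection code in reddarc.py:

```shell
# detect_platform: map a dump filename to its platform by extension,
# following the auto-detection table above.
detect_platform() {
  case "$1" in
    *.zst)          echo "reddit" ;;
    *.sql|*.sql.gz) echo "voat" ;;
    *.7z)           echo "ruqqus" ;;
    *)              echo "unknown" ;;
  esac
}
```

Example: `detect_platform banned_comments.zst` prints `reddit`.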
Reddit (.zst files):
docker compose exec reddarchiver-builder python reddarc.py /data \
--subreddit YOUR_SUBREDDIT_NAME \
--comments-file /data/YOUR_SUBREDDIT_comments.zst \
--submissions-file /data/YOUR_SUBREDDIT_submissions.zst \
--output /output/ \
--min-score 5 \
--min-comments 2
Voat (SQL dumps):
Option 1: Using Pre-Split Files (Recommended - 2-5 minutes)
# Import from pre-split Voat files (1000x faster than full dump)
docker compose exec reddarchiver-builder python reddarc.py /data/voat_split/submissions/ \
--subverse privacy \
--comments-file /data/voat_split/comments/privacy_comments.sql.gz \
--submissions-file /data/voat_split/submissions/privacy_submissions.sql.gz \
--platform voat \
--output /output/ \
--import-only
# Generate HTML
docker compose exec reddarchiver-builder python reddarc.py /data \
--output /output/ \
--export-from-database
💡 Tip: Pre-split files import in 2-5 minutes vs 30+ minutes for the full dump. See Voat Splitter Tool for details.
Option 2: Full Dump (Slower - imports all subverses)
# Import from complete Voat dump (scans all 22,637 subverses)
docker compose exec reddarchiver-builder python reddarc.py /data/voat/ \
--subverse voatdev,pics \
--output /output/ \
--import-only
Ruqqus (.7z files):
# Import Ruqqus data (p7zip included in Docker - no manual setup required)
docker compose exec reddarchiver-builder python reddarc.py /data/ruqqus/ \
--guild technology \
--comments-file /data/ruqqus/comments.fx.2021-10-30.txt.sort.2021-11-08.7z \
--submissions-file /data/ruqqus/submissions.f1.2021-10-30.txt.sort.2021-11-10.7z \
--platform ruqqus \
--output /output/ \
--import-only
# Generate HTML
docker compose exec reddarchiver-builder python reddarc.py /data \
--output /output/ \
--export-from-database
📦 Note: The Docker image includes p7zip for .7z decompression. Explicit file paths ensure the correct files are used.
Multi-Platform Archive (all three platforms):
# Import Reddit
docker compose exec reddarchiver-builder python reddarc.py /data/reddit/ \
--subreddit banned \
--comments-file /data/reddit/banned_comments.zst \
--submissions-file /data/reddit/banned_submissions.zst \
--output /output/multi-platform/ \
--import-only
# Import Voat (pre-split recommended)
docker compose exec reddarchiver-builder python reddarc.py /data/voat_split/submissions/ \
--subverse privacy \
--comments-file /data/voat_split/comments/privacy_comments.sql.gz \
--submissions-file /data/voat_split/submissions/privacy_submissions.sql.gz \
--platform voat \
--output /output/multi-platform/ \
--import-only
# Import Ruqqus
docker compose exec reddarchiver-builder python reddarc.py /data/ruqqus/ \
--guild technology \
--comments-file /data/ruqqus/comments.fx.2021-10-30.txt.sort.2021-11-08.7z \
--submissions-file /data/ruqqus/submissions.f1.2021-10-30.txt.sort.2021-11-10.7z \
--platform ruqqus \
--output /output/multi-platform/ \
--import-only
# Export unified HTML archive with all three platforms
docker compose exec reddarchiver-builder python reddarc.py /data \
--output /output/multi-platform/ \
--export-from-database \
--base-url https://archive.example.com \
--site-name "Multi-Platform Archive"
🌐 Multi-Platform: All three platforms coexist in one PostgreSQL database. Search works across all platforms with correct prefixes (r/, v/, g/).
- ✅ http://localhost/health returns "OK"
- ✅ http://localhost/ shows placeholder page (before archive generation)
- ✅ All containers showing "healthy" status
- ✅ After Step 4: Dashboard with subreddit(s) visible
Perfect for: Sharing archives privately without port forwarding, domain names, or internet exposure.
- ✅ No port forwarding or router configuration
- ✅ No domain name purchase ($0/year saved)
- ✅ Works behind CGNAT and restrictive ISPs
- ✅ Share with friends securely via .onion address
# Clone and configure (same as Local Testing Step 1)
git clone https://github.com/19-84/redd-archiver.git
cd redd-archiver
mkdir -p data output/.postgres-data logs tor-public
cp .env.example .env
nano .env # Change POSTGRES_PASSWORD
# Start with Tor profile
docker compose -f docker-compose.yml -f docker-compose.tor-only.yml --profile tor up -d
# Wait for Tor to generate keys
sleep 60
# Display your .onion address
docker compose logs tor | grep "Your .onion address"
# Or read directly from file
cat tor-hidden-service/hostname
Example output:
abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqr.onion
Important: The .onion address will show a placeholder page until you generate an archive (see Step 5 below).
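A v3 onion hostname is always 56 base32 characters (a-z, 2-7) followed by .onion, so a quick format check can catch a truncated or corrupted hostname file. A hypothetical sanity check, not part of the project (note the example address above is only an illustrative placeholder and would not pass it):

```shell
# is_v3_onion: check that a hostname looks like a v3 onion address
# (exactly 56 base32 chars + ".onion"). Format check only.
is_v3_onion() {
  printf '%s' "$1" | grep -Eq '^[a-z2-7]{56}\.onion$'
}
```

Usage: `is_v3_onion "$(cat tor-hidden-service/hostname)" && echo valid`.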
- Download Tor Browser: https://www.torproject.org/download/
- Open Tor Browser
- Visit: http://YOUR_ONION_ADDRESS.onion
- You should see: "Redd-Archiver Deployment Successful" placeholder page
This is correct! The placeholder appears until you generate an archive (Step 5); after that, the same address shows your archive dashboard.
# Place your .zst files in data/ directory, then:
docker compose exec reddarchiver-builder python reddarc.py /data \
--subreddit YOUR_SUBREDDIT_NAME \
--comments-file /data/YOUR_SUBREDDIT_comments.zst \
--submissions-file /data/YOUR_SUBREDDIT_submissions.zst \
--output /output/ \
--min-score 5 \
--min-comments 2
# After processing: Refresh your .onion address to see the archive
Share your .onion address via:
- Encrypted messaging (Signal, etc.)
- Direct message
Recipients need Tor Browser to access.
- ✅ .onion address generated in tor-hidden-service/hostname
- ✅ Placeholder page accessible via Tor Browser (shows "Deployment Successful")
- ✅ After Step 5: Archive dashboard replaces placeholder
- ✅ All services healthy
Common Issue: If you see 403 Forbidden instead of the placeholder, the output/index.html file may have been created with wrong permissions. Run: chmod 644 output/index.html
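To check for the permission problem above without guessing, you can read the mode bits directly. A sketch using GNU stat (the -c flag is coreutils-specific; BSD/macOS stat uses -f instead):

```shell
# check_web_readable: report whether a file has the 644 mode nginx expects.
# Hypothetical helper for the 403 case described above.
check_web_readable() {
  mode=$(stat -c '%a' "$1")
  if [ "$mode" = "644" ]; then
    echo "OK: $1 is 644"
  else
    echo "fix with: chmod 644 $1 (current mode: $mode)"
  fi
}
```

Usage: `check_web_readable output/index.html`.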
CRITICAL: Backup ./tor-hidden-service/ directory immediately!
# Create encrypted backup
tar -czf tor-keys-backup-$(date +%Y%m%d).tar.gz tor-hidden-service/
gpg --symmetric --cipher-algo AES256 tor-keys-backup-*.tar.gz
# Store encrypted file securely
# If you lose these keys, your .onion address changes forever
For public-facing archives with Let's Encrypt SSL certificates.
- ✅ Domain name pointing to your server (e.g., archive.YOUR_DOMAIN.com)
- ✅ DNS A record configured
- ✅ Ports 80 and 443 open in firewall
# Check your server's public IP
curl ifconfig.me
# Check DNS resolution
dig +short archive.YOUR_DOMAIN.com
# These should match!
# Clone and create .env
git clone https://github.com/19-84/redd-archiver.git
cd redd-archiver
mkdir -p data output/.postgres-data logs tor-public
cp .env.example .env
nano .env
Required changes in .env:
# Change password in BOTH places:
POSTGRES_PASSWORD=YOUR_SECURE_PASSWORD
DATABASE_URL=postgresql://reddarchiver:YOUR_SECURE_PASSWORD@/reddarchiver?host=/var/run/postgresql
# Set domain and email:
DOMAIN=archive.YOUR_DOMAIN.com
EMAIL=YOUR_EMAIL@YOUR_DOMAIN.com
CERTBOT_TEST_CERT=true # Start with staging
Important: The password must match in both POSTGRES_PASSWORD and DATABASE_URL!
# Make script executable
chmod +x docker/scripts/init-letsencrypt.sh
# Run automated setup
./docker/scripts/init-letsencrypt.sh
The script will:
- Verify DNS configuration
- Start services in HTTP mode
- Request staging certificate (for testing)
- Switch to HTTPS mode
- Verify HTTPS works
# Test HTTPS (use -k for staging cert)
curl -k https://archive.YOUR_DOMAIN.com/health
# Expected: OK
# Test HTTP redirect
curl -I http://archive.YOUR_DOMAIN.com/
# Expected: 301 redirect to HTTPS
After verifying staging works:
# Update .env
sed -i 's/CERTBOT_TEST_CERT=true/CERTBOT_TEST_CERT=false/' .env
# Remove staging certificates
sudo docker run --rm -v reddarchiver-certbot-certs:/etc/letsencrypt alpine rm -rf /etc/letsencrypt
# Re-run setup (will use production)
./docker/scripts/init-letsencrypt.sh
- ✅ https://YOUR_DOMAIN.com returns dashboard (no warnings)
- ✅ http://YOUR_DOMAIN.com redirects to HTTPS
- ✅ Certificate valid and trusted
- ✅ SSL Labs test scores A/A+
Certificates auto-renew every 90 days. Monitor with:
docker compose logs certbot
docker compose exec certbot certbot certificates
Combine public HTTPS access with private Tor access.
- Production HTTPS already working (follow steps above)
# Add Tor profile (no downtime)
docker compose --profile production --profile tor up -d
# Wait for Tor keys
sleep 60
# Get .onion address
docker compose logs tor | grep "Your .onion address"
# Test HTTPS
curl https://archive.YOUR_DOMAIN.com/health
# Test Tor (from Tor Browser)
# Visit: http://YOUR_ONION_ADDRESS.onion
- ✅ Archive accessible via HTTPS (clearnet)
- ✅ Archive accessible via .onion (Tor)
- ✅ Both show identical content
- ✅ All services healthy
Use the same data and settings as the example instance (r/banned, r/RedditCensors).
# Place your .zst files in data/
ls data/
# Should show:
# banned_comments.zst
# banned_submissions.zst
# RedditCensors_comments.zst
# RedditCensors_submissions.zst
# Process both subreddits
docker compose exec reddarchiver-builder python reddarc.py /data \
--output /output/ \
--base-url https://YOUR_SITE.github.io/ \
--site-name "YOUR SITE Archive" \
--min-score 5 \
--min-comments 2
After processing:
- Archive in ./output/
- Both subreddits visible on dashboard
- Search functionality working
- User pages generated
- Static files (CSS, JS) present
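Part of the checklist above can be automated with a quick filesystem scan. The expected path below (an index page at the archive root) is an assumption about the output layout, not a documented contract, so extend it to match what your build actually produces:

```shell
# check_archive: minimal structural check of a generated archive directory.
# The expected path here is an assumption, not a documented layout.
check_archive() {
  dir="$1"
  [ -f "$dir/index.html" ] || { echo "missing: index.html"; return 1; }
  echo "archive root looks OK"
}
```

Usage: `check_archive ./output`.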
# Check logs
docker compose logs
# Verify ports available
sudo lsof -i :80
sudo lsof -i :443
sudo lsof -i :5432
# Check resources
docker stats
# Verify PostgreSQL healthy
docker compose ps postgres
# Test connection
docker compose exec reddarchiver-builder \
psql -h /var/run/postgresql -U reddarchiver -d reddarchiver -c "SELECT 1"
# Common issue: Password mismatch in .env
# Verify POSTGRES_PASSWORD and DATABASE_URL have same password
grep POSTGRES_PASSWORD .env
grep DATABASE_URL .env
# The password should match in both lines!
# Verify DNS
dig +short YOUR_DOMAIN
# Check certbot logs
docker compose logs certbot
# Test with staging first
# Set CERTBOT_TEST_CERT=true in .env
# Check Tor logs
docker compose logs tor
# Common issues:
# 1. Directory ownership - must be owned by UID 100 (tor user)
sudo chown -R 100:100 tor-hidden-service/
# 2. Directory permissions - must be 700 (not readable by others)
sudo chmod 700 tor-hidden-service/
# 3. Restart Tor after fixing
docker compose restart tor
sleep 60
cat tor-hidden-service/hostname
# This is NORMAL if you haven't generated an archive yet!
# The deployment creates infrastructure but no content.
# Solution: Generate an archive
docker compose exec reddarchiver-builder python reddarc.py /data \
--subreddit YOUR_SUBREDDIT \
--comments-file /data/SUBREDDIT_comments.zst \
--submissions-file /data/SUBREDDIT_submissions.zst \
--output /output/
# After processing completes, refresh your browser
# You should see the archive dashboard instead of 403
# This means deployment is working correctly!
# Generate your first archive to see actual content:
docker compose exec reddarchiver-builder python reddarc.py /data \
--output /output/
# The placeholder will be replaced with the archive dashboard
After your archive is running, you can enable AI integration with the MCP server:
# Start MCP server alongside other services
docker compose up -d mcp-server
# Or run locally
cd mcp_server/
uv run python server.py --api-url http://localhost:5000
Add to your claude_desktop_config.json:
{
"mcpServers": {
"reddarchiver": {
"command": "uv",
"args": ["--directory", "/path/to/redd-archiver/mcp_server", "run", "python", "server.py"],
"env": { "REDDARCHIVER_API_URL": "http://localhost:5000" }
}
}
}
Restart Claude Desktop and you should have access to 29 MCP tools for querying your archive.
See MCP Server Documentation for complete setup and tool reference.
For users who need more control over the archive generation process.
--min-score N # Minimum post score threshold
--min-comments N # Minimum comment count threshold
--hide-deleted-comments # Hide [deleted]/[removed] comments
--no-user-pages # Skip user page generation (saves memory)
--dry-run # Preview discovered files without processing
--force-rebuild # Ignore resume state and rebuild from scratch
--force-parallel-users # Override auto-detection for parallel processing
--log-file <path> # Custom log file location
--log-level DEBUG # Set verbosity (DEBUG, INFO, WARNING, ERROR, CRITICAL)
For advanced users optimizing large archives:
--debug-memory-limit 8.0 # Override memory limit in GB
--debug-max-connections 8 # Override DB connection pool size
--debug-max-workers 4 # Override parallel workers
# Required
DATABASE_URL=postgresql://user:pass@host:5432/reddarchiver
# Optional (auto-detected if not set)
REDDARCHIVER_MAX_DB_CONNECTIONS=8
REDDARCHIVER_MAX_PARALLEL_WORKERS=4
REDDARCHIVER_USER_BATCH_SIZE=2000
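You can confirm or override the auto-detected values from the shell before a run; a sketch showing the fallback behavior (the fallback values mirror the example values above and are illustrative, not documented defaults):

```shell
# show_limits: print the effective tuning values, falling back to the
# example values above when the variables are unset.
show_limits() {
  echo "connections=${REDDARCHIVER_MAX_DB_CONNECTIONS:-8} workers=${REDDARCHIVER_MAX_PARALLEL_WORKERS:-4} batch=${REDDARCHIVER_USER_BATCH_SIZE:-2000}"
}
show_limits
```

Export any of the three variables before `docker compose up` (or set them in .env) to pin a value instead of relying on auto-detection.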
See also: FAQ for common questions about these options.
- Detailed Configuration: See docker/README.md for advanced options
- Tor Security Guide: See docs/TOR_DEPLOYMENT.md for operational security
- Static Hosting: See docs/STATIC_DEPLOYMENT.md for GitHub/Codeberg Pages
- REST API Reference: See docs/API.md for 30+ API endpoints
- MCP Server Setup: See mcp_server/README.md for AI integration
- Performance Tuning: See .env.example for all configuration options
Last Tested: 2025-12-27 (Local HTTP deployment successful, all services verified healthy)
Test Environment: Docker 28.5.2, Docker Compose v2.40.3
Compose file format: 3.8
PostgreSQL: 18-alpine
Python: 3.12
Test Status: ✅ Local HTTP deployment verified working. All verification tests passed (nginx, search-server, PostgreSQL).