This project uses Redis 8, which is the redis:8 docker image. Do not use Redis Stack or other earlier versions of Redis.
Get started in a new environment by installing uv:
pip install uv # Install uv (once)
make setup # Create .venv, sync deps, install pre-commit hooks
make sync # Sync latest dependenciesYou MUST always activate the virtualenv before running commands:
source .venv/bin/activateAlways run tests before committing. You MUST have 100% of the tests in the code basepassing to commit.
Run all tests like this, including tests that require API keys in the environment:
make test-apimake pre-commit # Run the exact formatting/lint hooks used in CI
make verify # Run full local verification (pre-commit + API tests)uv add # Add a dependency to pyproject.toml and update lock file uv remove # Remove a dependency from pyproject.toml and update lock file
uv run agent-memory api # Start REST API server (default port 8000) uv run agent-memory mcp # Start MCP server (stdio mode) uv run agent-memory mcp --mode sse --port 9000 # Start MCP server (SSE mode)
uv run agent-memory rebuild-index # Rebuild Redis search index uv run agent-memory migrate-memories # Run memory migrations
uv run agent-memory task-worker # Start background task worker
uv run agent-memory schedule-task "agent_memory_server.long_term_memory.compact_long_term_memories"
docker-compose up # Start full stack (API, MCP, Redis) docker-compose up redis # Start only Redis Stack docker-compose down # Stop all services
### Committing Changes
IMPORTANT: This project uses `pre-commit`. You should run `pre-commit`
before committing:
```bash
make pre-commit
- REST API: Traditional HTTP endpoints for web applications (
api.py) - MCP Server: Model Context Protocol for AI agent integration (
mcp.py) - Both interfaces share the same core memory management logic
# Two-tier memory system
Working Memory (Session-scoped) → Long-term Memory (Persistent)
↓ ↓
- Messages - Semantic search
- Context - Topic modeling
- Structured memories - Entity recognition
- Metadata - DeduplicationCRITICAL: Always use RedisVL query types instead of direct redis-py client access for searches:
# Correct - Use RedisVL queries
from redisvl.query import VectorQuery, FilterQuery
query = VectorQuery(vector=embedding, vector_field_name="vector", return_fields=["text"])
# Avoid - Direct redis client searches
# redis.ft().search(...) # Don't do this- All core operations are async
- Background task processing with Docket
- Async Redis connections throughout
Place all imports at the top of modules, not inside functions. Inline imports should only be used when strictly necessary (e.g., avoiding circular dependencies, optional dependencies, or significant startup performance concerns).
- PRODUCTION: Never set
DISABLE_AUTH=truein production - DEVELOPMENT: Use
DISABLE_AUTH=truefor local testing only - JWT/OAuth2 authentication required for all endpoints except
/health,/docs,/openapi.json
- Working memory automatically promotes structured memories to long-term storage
- Conversations are summarized when exceeding window size
- Use model-aware token limits for context window management
Always use RedisVL query types for any search operations. This is a project requirement.
The project uses pytest with testcontainers for Redis integration testing:
make test- Run the standard test suitemake test-api- Run all tests including API-key-dependent testsmake test-unit- Unit tests onlymake test-integration- Integration tests (require Redis)make test-cov- Run tests with coverage
agent_memory_server/
├── main.py # FastAPI application entry point
├── api.py # REST API endpoints
├── mcp.py # MCP server implementation
├── config.py # Configuration management
├── auth.py # OAuth2/JWT authentication
├── models.py # Pydantic data models
├── working_memory.py # Session-scoped memory management
├── long_term_memory.py # Persistent memory with semantic search
├── messages.py # Message handling and formatting
├── summarization.py # Conversation summarization
├── extraction.py # Topic and entity extraction
├── filters.py # Search filtering logic
├── llm/ # LLM client package (LiteLLM-based)
│ ├── __init__.py # Re-exports for clean imports
│ ├── client.py # LLMClient class with chat/embedding methods
│ ├── types.py # ChatCompletionResponse, EmbeddingResponse, LLMBackend
│ └── exceptions.py # LLMClientError, ModelValidationError, APIKeyMissingError
├── migrations.py # Database schema migrations
├── docket_tasks.py # Background task definitions
├── cli.py # Command-line interface
├── dependencies.py # FastAPI dependency injection
├── healthcheck.py # Health check endpoint
├── logging.py # Structured logging setup
├── client/ # Client libraries
└── utils/ # Utility modules
├── redis.py # Redis connection and setup
├── keys.py # Redis key management
└── api_keys.py # API key utilities
- Working Memory: Session-scoped storage with automatic summarization
- Long-term Memory: Persistent storage with semantic search capabilities
- Memory Promotion: Automatic migration from working to long-term memory
- Deduplication: Prevents duplicate memories using content hashing
- Semantic Search: Vector-based similarity search using embeddings
- Filtering System: Advanced filtering by session, namespace, topics, entities, timestamps
- Hybrid Search: Combines semantic similarity with metadata filtering
- RedisVL Integration: All search operations use RedisVL query builders
- Topic Modeling: Automatic topic extraction using BERTopic or LLM
- Entity Recognition: BERT-based named entity recognition
- Summarization: Conversation summarization when context window exceeded
- Multi-LLM Support: OpenAI, Anthropic, and other providers
- OAuth2/JWT: Industry-standard authentication with JWKS validation
- Multi-Provider: Auth0, AWS Cognito, Okta, Azure AD support
- Role-Based Access: Fine-grained permissions using JWT claims
- Development Mode:
DISABLE_AUTHfor local development
- Docket Tasks: Redis-based task queue for background operations
- Memory Indexing: Asynchronous embedding generation and indexing
- Compaction: Periodic cleanup and optimization of stored memories
Key environment variables:
# Redis
REDIS_URL=redis://localhost:6379
# Authentication (Production)
OAUTH2_ISSUER_URL=https://your-auth-provider.com
OAUTH2_AUDIENCE=your-api-audience
DISABLE_AUTH=false # Never true in production
# Development
DISABLE_AUTH=true # Local development only
LOG_LEVEL=DEBUG
# AI Services
OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
GENERATION_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
# Memory Configuration
LONG_TERM_MEMORY=true
ENABLE_TOPIC_EXTRACTION=true
ENABLE_NER=true- Session management (
/v1/working-memory/) - Working memory operations (
/v1/working-memory/{id}) - Long-term memory search (
/v1/long-term-memory/search) - Memory hydration (
/v1/memory/prompt)
create_long_term_memories- Store persistent memoriessearch_long_term_memory- Semantic search with filteringmemory_prompt- Hydrate queries with relevant contextset_working_memory- Manage session memory
- Install uv:
pip install uvto get started with uv - Setup:
make setup - Redis: Start Redis Stack via
docker-compose up redis - Development: Use
DISABLE_AUTH=truefor local testing - Testing: Run
make verifybefore committing - Linting:
make pre-commitmatches the CI lint gate exactly - Background Tasks: Start worker with
uv run agent-memory task-worker
- API docs available at
/docswhen server is running - OpenAPI spec at
/openapi.json - Authentication examples in README.md
- System architecture diagram in
diagram.png