---
name: redis-vector-search
description: Redis vector search guidance covering HNSW vs FLAT algorithm choice, vector index configuration (dims, distance metric, datatype), filtered hybrid search combining vector similarity with TAG or NUMERIC filters, and the RAG retrieval pattern with RedisVL. Use when defining a VECTOR field in FT.CREATE, integrating embeddings (OpenAI, Cohere, sentence-transformers), tuning HNSW parameters (M, EF_CONSTRUCTION, EF_RUNTIME), building a retrieval-augmented generation pipeline, or filtering vector results by attribute.
license: MIT
metadata:
  author: Redis, Inc.
  version: "0.1.0"
---

# Redis Vector Search

Guidance for storing and searching embeddings in Redis. Covers index configuration, algorithm selection, hybrid filtering, and the RAG retrieval pattern with RedisVL.

## When to apply

- Defining a `VECTOR` field in `FT.CREATE` (raw Redis Query Engine, RQE) or a RedisVL `IndexSchema`.
- Choosing HNSW vs FLAT and tuning HNSW parameters.
- Adding category, date, or tenant filters to a vector query.
- Building a retrieval-augmented generation (RAG) pipeline on top of Redis.

This skill builds on the `redis-query-engine` skill: vector fields live inside RQE indexes and share the same `FT.CREATE` / `FT.SEARCH` machinery.
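
## 1. Configure the vector index properly

Three settings must match the embedding model:

- **`DIM`**: the model's output dimensionality (e.g. 1536 for OpenAI `text-embedding-3-small`). A mismatch is easy to miss: the write itself still succeeds, but a document whose vector has the wrong length is not indexed (check `hash_indexing_failures` in `FT.INFO`), and a query vector of the wrong size is rejected with an error.
- **`DISTANCE_METRIC`**: use the metric the model was trained for. `COSINE` is the usual choice for text embeddings and ignores vector magnitude; `IP` (inner product) suits models trained for dot-product similarity, where magnitude carries meaning; `L2` is Euclidean distance. For unit-length vectors all three produce the same ranking.
- **`TYPE`** (raw RQE) / **`datatype`** (RedisVL): usually `FLOAT32`. Use `FLOAT16`/`BFLOAT16` (half the memory per value) or quantized variants only when memory cost is a hard constraint, since reduced precision can lower recall.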
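
To make the `DIM` and `TYPE` rules concrete, here is a minimal standard-library sketch of what a HASH vector field expects: the embedding packed as `DIM` little-endian 32-bit floats, i.e. `DIM * 4` bytes. The helper names `to_float32_blob` and `cosine_distance` are illustrative, not part of redis-py or RedisVL; in practice `numpy.asarray(vec, dtype=numpy.float32).tobytes()` is the common way to build the same blob on little-endian hosts. The distance function follows the `1 - cosine similarity` definition of `COSINE`.

```python
import math
import struct

INDEX_DIM = 1536  # assumption: must equal DIM in FT.CREATE / "dims" in RedisVL


def to_float32_blob(vector, dim=INDEX_DIM):
    """Pack an embedding as `dim` little-endian FLOAT32 values (dim * 4 bytes)."""
    if len(vector) != dim:
        # Fail loudly on the client instead of letting Redis skip the document.
        raise ValueError(f"embedding has {len(vector)} dims, index expects {dim}")
    return struct.pack(f"<{dim}f", *vector)


def cosine_distance(a, b):
    """COSINE distance: 1 - (a . b) / (|a| |b|); vector magnitude does not matter."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


blob = to_float32_blob([0.01] * INDEX_DIM)
print(len(blob))  # 6144 bytes = 1536 dims * 4 bytes per FLOAT32

# Scaling a vector leaves its cosine distance unchanged (same direction).
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))  # ~0.0
```

Validating the length on the client turns a quiet indexing failure into an immediate exception at write time.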

Raw RQE (the `6` after `HNSW` is the number of attribute arguments that follow, counting names and values: three pairs):

```
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
  SCHEMA
    content TEXT
    embedding VECTOR HNSW 6
      TYPE FLOAT32
      DIM 1536
      DISTANCE_METRIC COSINE
```
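
Because that count must change whenever attributes are added (for example `M` or `EF_CONSTRUCTION`), generating the argument list avoids off-by-N errors. The sketch below is plain Python with a hypothetical helper `hnsw_create_args`; the resulting list is shaped for redis-py's generic `execute_command`, which is an assumption about how you send it (RedisVL builds this command for you).

```python
def hnsw_create_args(index, prefix, field, dim, metric="COSINE", dtype="FLOAT32", **hnsw):
    """Build FT.CREATE arguments for a HASH index with one TEXT and one HNSW vector field.

    Extra HNSW attributes (e.g. M=16, EF_CONSTRUCTION=200) are passed as keyword args.
    """
    attrs = {"TYPE": dtype, "DIM": dim, "DISTANCE_METRIC": metric, **hnsw}
    flat = []
    for name, value in attrs.items():
        flat += [name, str(value)]
    # The number after HNSW counts every attribute token: 2 per name/value pair.
    return ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
            "SCHEMA", "content", "TEXT",
            field, "VECTOR", "HNSW", str(len(flat)), *flat]


args = hnsw_create_args("idx:docs", "doc:", "embedding", 1536, M=32, EF_CONSTRUCTION=200)
print(" ".join(args))
# With a redis-py client `r` (hypothetical here), this could be sent as:
# r.execute_command(*args)
```

With the two tuning parameters added, the count becomes `10`; with none, it stays `6` as in the command above.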

RedisVL:

```python
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema

schema = IndexSchema.from_dict({
    "index": {"name": "idx:docs", "prefix": "doc:"},  # storage type defaults to HASH
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "embedding", "type": "vector", "attrs": {
            "dims": 1536, "algorithm": "HNSW",
            "datatype": "FLOAT32", "distance_metric": "COSINE",
        }},
    ]
})

# Assumes a local Redis on the default port; adjust redis_url for your deployment.
index = SearchIndex(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)  # issues FT.CREATE, replacing an existing index definition
```

See [references/index-creation.md](references/index-creation.md) for redis-py and RedisVL variants.

## 2. HNSW vs FLAT

| Algorithm | Speed | Accuracy (recall) | Memory | Best for |
|---|---|---|---|---|
| **HNSW** | Fast (approximate) | Typically ~95%+ (tunable) | Higher (stores graph links per vector) | Large datasets (>10k vectors), latency-sensitive |
| **FLAT** | Slower (exact brute-force scan; latency grows linearly with corpus size) | 100% | Lower | Small datasets (<10k), accuracy-critical |
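
Recall here means the fraction of the true nearest neighbors (what FLAT returns) that an approximate search also returns. A minimal plain-Python sketch of that measurement, assuming you have already collected result ids from an HNSW query and either a FLAT query or the brute-force search below for the same query vector; `exact_top_k` and `recall_at_k` are illustrative names, not library functions.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def exact_top_k(query, vectors, k):
    """Brute-force KNN, as FLAT does: score every vector, keep the k closest ids."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Share of the exact top-k that the approximate search also found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
truth = exact_top_k([1.0, 0.05], corpus, k=2)
print(truth)                           # ['a', 'b']
print(recall_at_k(["a", "d"], truth))  # 0.5: one of the two true neighbors was missed
```

Averaging `recall_at_k` over a sample of real queries is a practical way to check whether the HNSW settings below meet your accuracy target.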

Default to **HNSW** for production-scale workloads. Tuning levers:

- `M`: maximum edges per node in each graph layer (default 16; 16–64 is a common range). Higher = better recall, more memory.
- `EF_CONSTRUCTION`: candidate-list size while building the graph (default 200; 100–500 is a common range). Higher = better index quality, slower build.
- `EF_RUNTIME`: candidate-list size at query time (default 10). Higher = better recall, slower queries. It can also be overridden per query in the KNN clause.
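
The per-query override goes inside the KNN clause of the `FT.SEARCH` query string. A small Python sketch that assembles it; the helper name `knn_query` is hypothetical, and the commented redis-py call is one assumed way to send it (the `=>[KNN ...]` syntax needs query dialect 2 or later).

```python
def knn_query(field, k, ef_runtime=None, pre_filter="*"):
    """Build FT.SEARCH query text for a KNN search, optionally overriding EF_RUNTIME."""
    clause = f"KNN {k} @{field} $vec"
    if ef_runtime is not None:
        clause += f" EF_RUNTIME {ef_runtime}"  # per-query override of the index default
    base = "*" if pre_filter == "*" else f"({pre_filter})"
    return f"{base}=>[{clause} AS vector_distance]"


q = knn_query("embedding", 10, ef_runtime=100)
print(q)  # *=>[KNN 10 @embedding $vec EF_RUNTIME 100 AS vector_distance]

# Hypothetical redis-py call (needs a running server and a FLOAT32 blob in `blob`):
# r.execute_command("FT.SEARCH", "idx:docs", q, "PARAMS", "2", "vec", blob,
#                   "SORTBY", "vector_distance", "DIALECT", "2")
```

Raising `EF_RUNTIME` only for queries that need higher recall keeps the index default (and typical latency) low.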

Use **FLAT** when the corpus is small and you need exact results (e.g. semantic dedup over a few thousand items).
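
As an illustration of that exact-results case, a plain-Python sketch of greedy semantic deduplication: each item is compared against every item already kept, the same exhaustive comparison FLAT performs. The `0.05` cosine-distance threshold and the helper name `dedupe` are assumptions to tune for your model and data; against Redis, the inner loop would become a KNN query on a FLAT index per item.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def dedupe(items, threshold=0.05):
    """Keep an item only if no already-kept item is within `threshold` cosine distance."""
    kept = []
    for item_id, vec in items:
        # Exhaustive (exact) comparison: fine for a few thousand items, O(n^2) overall.
        if all(cosine_distance(vec, kept_vec) > threshold for _, kept_vec in kept):
            kept.append((item_id, vec))
    return [item_id for item_id, _ in kept]


items = [("q1", [1.0, 0.0]), ("q1-dup", [0.99, 0.02]), ("q2", [0.0, 1.0])]
print(dedupe(items))  # ['q1', 'q2']
```

An approximate index could miss a near-duplicate here, which is why exactness matters more than speed for this job.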

See [references/algorithm-choice.md](references/algorithm-choice.md).

## 3. Hybrid search: filter before vector

Put attribute filters (TAG / NUMERIC) inside the vector query so the engine restricts the KNN search to matching documents (it chooses between brute-force scoring of the filtered set and batched HNSW traversal based on how selective the filter is). Don't fetch a wide result set and then filter client-side: it transfers more data, and post-filtering a top-K list can leave fewer than K matches, or none, when the filter is selective.
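
The accuracy problem with client-side filtering shows up even in a toy example. This plain-Python sketch (hypothetical data, no Redis involved) takes the top K=2 documents by distance and only then applies a category filter, compared with filtering first.

```python
# Hypothetical (id, category, distance-to-query) rows, already sorted by distance.
docs = [
    ("d1", "sports", 0.10),
    ("d2", "sports", 0.12),
    ("d3", "technology", 0.20),
    ("d4", "technology", 0.25),
    ("d5", "finance", 0.30),
]
K = 2

# Post-filter: fetch top-K by distance, then drop non-matching categories client-side.
post = [d for d in docs[:K] if d[1] == "technology"]

# Pre-filter: restrict to matching documents first, then take the K nearest.
pre = [d for d in docs if d[1] == "technology"][:K]

print([d[0] for d in post])  # []: both nearest hits were filtered away
print([d[0] for d in pre])   # ['d3', 'd4']
```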

```python
from redisvl.query import VectorQuery
from redisvl.query.filter import Num, Tag

# Assumes the schema also defines `category` as a TAG field and `date` as a NUMERIC
# field (here a year); `index` is the SearchIndex from section 1 and `query_embedding`
# comes from the same model used at indexing time.
filters = (Tag("category") == "technology") & (Num("date") >= 2024)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["content", "category", "date"],
    num_results=10,
    filter_expression=filters,
)
results = index.query(query)  # list of dicts, nearest first
```
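
For reference, the same filter can be written as a raw RQE query string. This Python sketch assembles it; `tag_filter`, `num_filter`, and `hybrid_query` are hypothetical helpers, the field names are carried over from the example above, and the string RedisVL actually generates may differ in formatting. Tag values containing spaces or punctuation would need escaping, which the sketch does not handle.

```python
def tag_filter(field, value):
    return f"@{field}:{{{value}}}"  # TAG match, e.g. @category:{technology}


def num_filter(field, low="-inf", high="+inf"):
    return f"@{field}:[{low} {high}]"  # inclusive NUMERIC range


def hybrid_query(filters, field, k):
    """Space-separated clauses are ANDed; the KNN considers only matching documents."""
    return f"({' '.join(filters)})=>[KNN {k} @{field} $vec AS vector_distance]"


q = hybrid_query([tag_filter("category", "technology"), num_filter("date", 2024)],
                 "embedding", 10)
print(q)
# (@category:{technology} @date:[2024 +inf])=>[KNN 10 @embedding $vec AS vector_distance]
```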

For **text + vector fusion** (BM25-weighted text scoring combined with vector similarity), use `HybridQuery` on Redis ≥ 8.4 with redis-py ≥ 7.1, or `AggregateHybridQuery` on older Redis. This is a different kind of "hybrid" from the filtered vector search above: both signals contribute to the ranking, instead of one acting only as a filter.
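
To show what combining both signals means, here is a plain-Python sketch of reciprocal rank fusion (RRF), one common way to merge a text ranking and a vector ranking. It is a generic illustration, not the scoring formula used by RedisVL's `HybridQuery` or `AggregateHybridQuery` (check their documentation for the exact weighting); `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list adds 1 / (k + rank) to an id's score (rank from 1)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


text_hits = ["d2", "d1", "d4"]    # e.g. BM25 order from a full-text query
vector_hits = ["d1", "d3", "d2"]  # e.g. KNN order from a vector query
print(reciprocal_rank_fusion([text_hits, vector_hits]))
# ['d1', 'd2', 'd3', 'd4']: d1 and d2 rank highly in both lists
```

Rank-based fusion sidesteps the fact that BM25 scores and vector distances live on different scales.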

See [references/hybrid-search.md](references/hybrid-search.md).

## 4. RAG pattern

Standard pipeline: embed the user query → vector search Redis → pass top-K context to the LLM.

```python
import numpy as np
from redisvl.query import VectorQuery

# Placeholders (assumptions): `index` is the SearchIndex from section 1 with a `source`
# field added to its schema; `documents` have .content and .source; `embed_model` is a
# sentence-transformers-style model whose .encode() returns NumPy arrays of size DIM;
# `user_question` is the incoming question; `llm.generate` stands in for your LLM client.

# Index documents with embeddings. With HASH storage (RedisVL's default), vectors are
# stored as FLOAT32 bytes; JSON storage would take plain lists of floats instead.
embeddings = np.asarray(embed_model.encode([doc.content for doc in documents]),
                        dtype=np.float32)  # one batched encode call, shape (n, DIM)
records = [{"content": doc.content,
            "embedding": emb.tobytes(),
            "source": doc.source}
           for doc, emb in zip(documents, embeddings)]
index.load(records)  # pipelined batch write instead of one call per record

# Retrieve relevant context for a user question
q_emb = embed_model.encode(user_question)
results = index.query(VectorQuery(
    vector=q_emb.tolist(),
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5,
))

# Generate with retrieved context
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
```
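
The generation step above concatenates every retrieved chunk. A slightly more careful sketch (plain Python, hypothetical `build_prompt` helper) numbers each chunk with its `source` so the model can cite it, and stops adding chunks once a rough character budget is reached; the 4,000-character default is an arbitrary assumption, and a real pipeline would more likely budget in model tokens.

```python
def build_prompt(question, results, max_chars=4000):
    """Assemble a RAG prompt from retrieved rows (dicts with 'content' and 'source')."""
    blocks, used = [], 0
    for i, row in enumerate(results, start=1):
        block = f"[{i}] ({row['source']}) {row['content']}"
        if used + len(block) > max_chars:
            break  # rows arrive nearest-first, so the least relevant chunks are dropped
        blocks.append(block)
        used += len(block)
    context = "\n".join(blocks)
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


rows = [{"content": "HNSW is approximate.", "source": "doc:1"},
        {"content": "FLAT is exact.", "source": "doc:2"}]
print(build_prompt("Which index is exact?", rows))
```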

Practical tips:

- **Match metric to model.** Use the metric the embedding model was trained for; for most modern text embedding models that is `COSINE`.
- **Chunk long documents** before indexing: retrieval over 200–500-token chunks usually beats indexing whole pages.
- **Batch inserts** with `index.load([...])` instead of one call per record.
- **Pre-filter with attributes** (tenant, recency, document type) inside the vector query rather than after it.
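
A minimal chunking sketch in plain Python. It splits on whitespace and uses word counts as a rough stand-in for tokens (an assumption; your embedding model's tokenizer gives exact counts), with a small overlap so text near a boundary appears in both neighboring chunks. `chunk_words` and its defaults are illustrative.

```python
def chunk_words(text, size=300, overlap=50):
    """Split text into chunks of about `size` words, repeating `overlap` words between chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(text)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [300, 300, 200]
```

Each chunk would then be embedded and loaded as its own record, typically with the parent document's `source` copied onto it.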

See [references/rag-pattern.md](references/rag-pattern.md).

## References

- [Redis: Vectors](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
- [Redis: RAG quickstart](https://redis.io/docs/latest/develop/get-started/rag/)
- [RedisVL documentation](https://docs.redisvl.com/en/latest/)