|
2 | 2 |
|
3 | 3 | # Elasticsearch connector for Microsoft Semantic Kernel
|
4 | 4 |
|
5 |
| -Repository for `Elastic.SemanticKernel.Connectors.Elasticsearch` the official Elasticsearch connector for |
6 |
| -Microsoft Semantic Kernel. |
| 5 | +Repository for `Elastic.SemanticKernel.Connectors.Elasticsearch`, the official Elasticsearch [Vector Store Connector](https://learn.microsoft.com/en-us/semantic-kernel/concepts/vector-store-connectors/?pivots=programming-language-csharp) for |
| 6 | +[Microsoft Semantic Kernel](https://learn.microsoft.com/en-us/semantic-kernel/overview/). |
7 | 7 |
|
8 |
| -. |
| 8 | +## Introduction |
9 | 9 |
|
10 |
| -. |
| 10 | +[Semantic Kernel](https://learn.microsoft.com/en-us/semantic-kernel/overview/) is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. Semantic Kernel achieves this by allowing you to define plugins that can be chained together in just a few lines of code. |
11 | 11 |
|
12 |
| -. |
| 12 | +Semantic Kernel and .NET provide an abstraction for interacting with Vector Stores and a list of out-of-the-box connectors that implement this abstraction. Features include creating, listing, and deleting collections of records, and uploading, retrieving, and deleting records. The abstraction makes it easy to experiment with a free or locally hosted Vector Store and then switch to a service when you need to scale up. |
| 13 | + |
| 14 | +This repository contains the official Elasticsearch Vector Store Connector implementation for Semantic Kernel. |
| 15 | + |
| 16 | +## Overview |
| 17 | + |
| 18 | +The Elasticsearch Vector Store connector can be used to access and manage data in Elasticsearch. The connector has the following characteristics. |
| 19 | + |
| 20 | +| Feature Area | Support | |
| 21 | +|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------| |
| 22 | +| Collection maps to | Elasticsearch index | |
| 23 | +| Supported key property types | string | |
| 24 | +| Supported data property types | All types that are supported by System.Text.Json (either built-in or by using a custom converter) | |
| 25 | +| Supported vector property types | <ul><li>ReadOnlyMemory\<float\></li><li>IEnumerable\<float\></li></ul> | |
| 26 | +| Supported index types | <ul><li>HNSW (32, 8, or 4 bit)</li><li>FLAT (32, 8, or 4 bit)</li></ul> | |
| 27 | +| Supported distance functions | <ul><li>CosineSimilarity</li><li>DotProductSimilarity</li><li>EuclideanDistance</li><li>MaxInnerProduct</li></ul> | |
| 28 | +| Supports multiple vectors in a record | Yes | |
| 29 | +| IsFilterable supported? | Yes | |
| 30 | +| IsFullTextSearchable supported? | Yes | |
| 31 | +| StoragePropertyName supported? | No, use `JsonSerializerOptions` and `JsonPropertyNameAttribute` instead. [See here for more info.](#data-mapping) | |
13 | 32 |
|
14 | 33 | ## Getting Started
|
15 | 34 |
|
16 |
| -TBD |
| 35 | +### Setting up Elasticsearch |
| 36 | + |
| 37 | +The simplest way to get set up with Elasticsearch is to create a managed deployment on Elastic Cloud. [Sign up for a free trial](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=semantickernel&utm_content=documentation). |
| 38 | + |
| 39 | +If you prefer to install and manage Elasticsearch yourself and run with authentication, you can download the latest version from the Elastic |
| 40 | +[downloads page](https://www.elastic.co/downloads/elasticsearch). |
| 41 | + |
| 42 | +To [run Elasticsearch locally](https://www.elastic.co/guide/en/elasticsearch/reference/current/run-elasticsearch-locally.html) for local development or testing, run the `start-local` script with one command: |
| 43 | + |
| 44 | +```bash |
| 45 | +curl -fsSL https://elastic.co/start-local | sh |
| 46 | +``` |
| 47 | + |
| 48 | +### Using the Elasticsearch Vector Store Connector |
| 49 | + |
| 50 | +Add the Elasticsearch Vector Store connector NuGet package to your project. |
| 51 | + |
| 52 | +```dotnetcli |
| 53 | +dotnet add package Elastic.SemanticKernel.Connectors.Elasticsearch --prerelease |
| 54 | +``` |
| 55 | + |
| 56 | +You can add the vector store to the dependency injection container available on the `KernelBuilder` or to the `IServiceCollection` dependency injection container using extension methods provided by Semantic Kernel. |
| 57 | + |
| 58 | +```csharp |
| 59 | +using Microsoft.SemanticKernel; |
| 60 | +using Elastic.Clients.Elasticsearch; |
| 61 | + |
| 62 | +// Using Kernel Builder. |
| 63 | +var kernelBuilder = Kernel |
| 64 | + .CreateBuilder() |
| 65 | + .AddElasticsearchVectorStore(new ElasticsearchClientSettings(new Uri("http://localhost:9200"))); |
| 66 | +``` |
| 67 | + |
| 68 | +```csharp |
| 69 | +using Microsoft.SemanticKernel; |
| 70 | +using Elastic.Clients.Elasticsearch; |
| 71 | + |
| 72 | +// Using IServiceCollection with ASP.NET Core. |
| 73 | +var builder = WebApplication.CreateBuilder(args); |
| 74 | +builder.Services.AddElasticsearchVectorStore(new ElasticsearchClientSettings(new Uri("http://localhost:9200"))); |
| 75 | +``` |
| 76 | + |
| 77 | +Extension methods that take no parameters are also provided. These require an instance of the `Elastic.Clients.Elasticsearch.ElasticsearchClient` class to be separately registered with the dependency injection container. |
| 78 | + |
| 79 | +```csharp |
| 80 | +using Microsoft.Extensions.DependencyInjection; |
| 81 | +using Microsoft.SemanticKernel; |
| 82 | +using Elastic.Clients.Elasticsearch; |
| 83 | + |
| 84 | +// Using Kernel Builder. |
| 85 | +var kernelBuilder = Kernel.CreateBuilder(); |
| 86 | +kernelBuilder.Services.AddSingleton<ElasticsearchClient>(sp => |
| 87 | + new ElasticsearchClient(new ElasticsearchClientSettings(new Uri("http://localhost:9200")))); |
| 88 | +kernelBuilder.AddElasticsearchVectorStore(); |
| 89 | +``` |
| 90 | + |
| 91 | +```csharp |
| 92 | +using Microsoft.Extensions.DependencyInjection; |
| 93 | +using Microsoft.SemanticKernel; |
| 94 | +using Elastic.Clients.Elasticsearch; |
| 95 | + |
| 96 | +// Using IServiceCollection with ASP.NET Core. |
| 97 | +var builder = WebApplication.CreateBuilder(args); |
| 98 | +builder.Services.AddSingleton<ElasticsearchClient>(sp => |
| 99 | + new ElasticsearchClient(new ElasticsearchClientSettings(new Uri("http://localhost:9200")))); |
| 100 | +builder.Services.AddElasticsearchVectorStore(); |
| 101 | +``` |
| 102 | + |
| 103 | +You can construct an Elasticsearch Vector Store instance directly. |
| 104 | + |
| 105 | +```csharp |
| 106 | +using Elastic.SemanticKernel.Connectors.Elasticsearch; |
| 107 | +using Elastic.Clients.Elasticsearch; |
| 108 | + |
| 109 | +var vectorStore = new ElasticsearchVectorStore( |
| 110 | + new ElasticsearchClient(new ElasticsearchClientSettings(new Uri("http://localhost:9200")))); |
| 111 | +``` |
| 112 | + |
| 113 | +It is possible to construct a direct reference to a named collection. |
| 114 | + |
| 115 | +```csharp |
| 116 | +using Elastic.SemanticKernel.Connectors.Elasticsearch; |
| 117 | +using Elastic.Clients.Elasticsearch; |
| 118 | + |
| 119 | +var collection = new ElasticsearchVectorStoreRecordCollection<Hotel>( |
| 120 | + new ElasticsearchClient(new ElasticsearchClientSettings(new Uri("http://localhost:9200"))), |
| 121 | + "skhotels"); |
| 122 | +``` |
| 123 | + |
| 124 | +## Data mapping |
| 125 | + |
| 126 | +The Elasticsearch connector uses `System.Text.Json.JsonSerializer` for mapping. |
| 127 | +Since Elasticsearch stores documents with a separate key/id and value, the mapper will serialize all properties except for the key to a JSON object |
| 128 | +and use that as the value. |
| 129 | + |
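The key/value split described above can be illustrated with plain `System.Text.Json`. This is only a minimal sketch of the idea, not the connector's actual mapper: the `KeyValueSplitSketch` class, its `Split` method, and the trimmed-down `Hotel` record are hypothetical names introduced for this example.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class KeyValueSplitSketch
{
    // Hypothetical record type used only for this illustration.
    public sealed record Hotel(string HotelId, string HotelName);

    // Serializes the record to a JSON object, removes the key property, and
    // returns the key next to the remaining object (what would become `_source`).
    public static (string Id, JsonObject Source) Split(Hotel hotel)
    {
        JsonObject source = JsonSerializer.SerializeToNode(hotel)!.AsObject();
        string id = source[nameof(Hotel.HotelId)]!.GetValue<string>();
        source.Remove(nameof(Hotel.HotelId));
        return (id, source);
    }

    public static void Main()
    {
        var (id, source) = Split(new Hotel("h1", "Hotel Happy"));
        Console.WriteLine(id);                    // h1
        Console.WriteLine(source.ToJsonString()); // {"HotelName":"Hotel Happy"}
    }
}
```

In this sketch the key travels separately from the JSON body, which mirrors how an Elasticsearch document carries `_id` outside of `_source`.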
| 130 | +Usage of the `JsonPropertyNameAttribute` is supported if the storage name must differ from the |
| 131 | +data model property name. It is also possible to use a custom `JsonSerializerOptions` instance with a customized property naming policy. To enable this, |
| 132 | +a custom source serializer must be configured. |
| 133 | + |
| 134 | +```csharp |
| 135 | +using Elastic.SemanticKernel.Connectors.Elasticsearch; |
| 136 | +using Elastic.Clients.Elasticsearch; |
| 137 | +using Elastic.Clients.Elasticsearch.Serialization; |
| 138 | +using Elastic.Transport; |
| 139 | + |
| 140 | +var nodePool = new SingleNodePool(new Uri("http://localhost:9200")); |
| 141 | +var settings = new ElasticsearchClientSettings( |
| 142 | + nodePool, |
| 143 | + sourceSerializer: (defaultSerializer, settings) => |
| 144 | + new DefaultSourceSerializer(settings, options => |
| 145 | + options.PropertyNamingPolicy = System.Text.Json.JsonNamingPolicy.SnakeCaseUpper)); |
| 146 | +var client = new ElasticsearchClient(settings); |
| 147 | + |
| 148 | +var collection = new ElasticsearchVectorStoreRecordCollection<Hotel>( |
| 149 | + client, |
| 150 | + "skhotelsjson"); |
| 151 | +``` |
| 152 | + |
| 153 | +As an alternative, the `DefaultFieldNameInferrer` lambda function can be configured to achieve the same result, or to customize property naming further based on dynamic conditions. |
| 154 | + |
| 155 | +```csharp |
| 156 | +using Elastic.SemanticKernel.Connectors.Elasticsearch; |
| 157 | +using Elastic.Clients.Elasticsearch; |
| 158 | + |
| 159 | +var settings = new ElasticsearchClientSettings(new Uri("http://localhost:9200")); |
| 160 | +settings.DefaultFieldNameInferrer(name => System.Text.Json.JsonNamingPolicy.SnakeCaseUpper.ConvertName(name)); |
| 161 | +var client = new ElasticsearchClient(settings); |
| 162 | + |
| 163 | +var collection = new ElasticsearchVectorStoreRecordCollection<Hotel>( |
| 164 | + client, |
| 165 | + "skhotelsjson"); |
| 166 | +``` |
| 167 | + |
| 168 | +Because the snake case upper naming policy was chosen, the following example shows a data type and how a record of that type is stored in Elasticsearch. |
| 169 | +Also note the use of `JsonPropertyNameAttribute` on the `Description` property to further customize the storage naming. |
| 170 | + |
| 171 | +```csharp |
| 172 | +using System.Text.Json.Serialization; |
| 173 | +using Microsoft.Extensions.VectorData; |
| 174 | + |
| 175 | +public class Hotel |
| 176 | +{ |
| 177 | + [VectorStoreRecordKey] |
| 178 | + public string HotelId { get; set; } |
| 179 | + |
| 180 | + [VectorStoreRecordData(IsFilterable = true)] |
| 181 | + public string HotelName { get; set; } |
| 182 | + |
| 183 | + [JsonPropertyName("HOTEL_DESCRIPTION")] |
| 184 | + [VectorStoreRecordData(IsFullTextSearchable = true)] |
| 185 | + public string Description { get; set; } |
| 186 | + |
| 187 | + [VectorStoreRecordVector(Dimensions: 4, DistanceFunction.CosineSimilarity, IndexKind.Hnsw)] |
| 188 | + public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; } |
| 189 | +} |
| 190 | +``` |
| 191 | + |
| 192 | +```json |
| 193 | +{ |
| 194 | + "_index" : "skhotelsjson", |
| 195 | + "_id" : "h1", |
| 196 | + "_source" : { |
| 197 | + "HOTEL_NAME" : "Hotel Happy", |
| 198 | + "HOTEL_DESCRIPTION" : "A place where everyone can be happy.", |
| 199 | + "DESCRIPTION_EMBEDDING" : [ |
| 200 | + 0.9, |
| 201 | + 0.1, |
| 202 | + 0.1, |
| 203 | + 0.1 |
| 204 | + ] |
| 205 | + } |
| 206 | +} |
| 207 | +``` |
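To see where the field names in the stored document come from, the naming rules can be reproduced with `System.Text.Json` alone, without a running Elasticsearch instance. The following is a minimal sketch that assumes .NET 8 or later (where `JsonNamingPolicy.SnakeCaseUpper` is available). The `NamingPolicySketch` class and the trimmed-down `HotelFields` type are hypothetical and omit the connector attributes.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public static class NamingPolicySketch
{
    // Hypothetical stand-in for the Hotel model, reduced to two data properties.
    public sealed class HotelFields
    {
        public string HotelName { get; set; } = "";

        // An explicit JsonPropertyName takes precedence over the naming policy.
        [JsonPropertyName("HOTEL_DESCRIPTION")]
        public string Description { get; set; } = "";
    }

    // Converts a .NET property name to its snake case upper storage name.
    public static string ToStorageName(string propertyName) =>
        JsonNamingPolicy.SnakeCaseUpper.ConvertName(propertyName);

    // Serializes the fields with the same naming policy as the examples above.
    public static string Serialize(HotelFields fields) =>
        JsonSerializer.Serialize(fields, new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseUpper
        });

    public static void Main()
    {
        Console.WriteLine(ToStorageName("DescriptionEmbedding")); // DESCRIPTION_EMBEDDING
        Console.WriteLine(Serialize(new HotelFields
        {
            HotelName = "Hotel Happy",
            Description = "A place where everyone can be happy."
        }));
    }
}
```

Without the attribute, `Description` would be stored as `DESCRIPTION`. The attribute is what produces `HOTEL_DESCRIPTION` in the stored document.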
17 | 208 |
|
18 | 209 | ## License
|
19 | 210 |
|
|