Add embeddings, segmenter, and tokenizer as MCP tools #24

Open
SBALAVIGNESH123 wants to merge 1 commit into jina-ai:main from SBALAVIGNESH123:feat/add-embeddings-segmenter-tokenizer-tools

@SBALAVIGNESH123

Hey, I noticed that the embeddings and segment APIs are already used internally (embeddings for the dedup tools, segment for the token guardrail) but aren't exposed as tools that MCP clients can call directly.

This adds three new tools:

  • generate_embeddings - calls /v1/embeddings so users can generate vectors for their own text. Supports model selection, task type, dimensions, and normalization.
  • segment_text - calls /v1/segment with return_chunks and splits text into semantic chunks with token counts. Useful for RAG prep.
  • count_tokens - also calls /v1/segment, but returns only token counts. Accepts a single string or an array of strings; array inputs are processed in parallel.
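
To illustrate the count_tokens behaviour described above (single string or array, processed in parallel), here is a minimal sketch. It is not the code from this PR. The `Segmenter` type, `countTokens`, and `fakeSegment` are hypothetical names, and the `num_tokens` response field is an assumption about the /v1/segment response. The segmenter is injected so the example runs without a network call or API key.

```typescript
// Assumed shape of the part of a /v1/segment response used here.
interface SegmentResult {
  num_tokens: number;
}

// Any function that sends one text to the segmenter and resolves with its result.
// In a real tool this would wrap the HTTP call to /v1/segment.
type Segmenter = (text: string) => Promise<SegmentResult>;

// Count tokens for one string or many. Array inputs are dispatched
// concurrently with Promise.all, and results keep the input order.
async function countTokens(
  input: string | string[],
  segment: Segmenter,
): Promise<{ text: string; num_tokens: number }[]> {
  const texts = Array.isArray(input) ? input : [input];
  return Promise.all(
    texts.map(async (text) => ({
      text,
      num_tokens: (await segment(text)).num_tokens,
    })),
  );
}

// Stand-in segmenter for local experiments: counts whitespace-separated
// words, which is NOT how the real tokenizer counts tokens.
const fakeSegment: Segmenter = async (text) => ({
  num_tokens: text.split(/\s+/).filter(Boolean).length,
});
```

Because `Promise.all` rejects as soon as any request fails, one bad input fails the whole batch in this sketch. `Promise.allSettled` would be the alternative if partial results are preferred.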

Everything follows the same pattern as the existing tools (getProps, checkBearerToken, handleApiError, yamlStringify responses, createErrorResponse in catch). Added them to ALL_TOOLS, created a new 'embedding' tag in TOOL_TAGS, updated the server instructions so LLMs know when to pick these tools, and added descriptions to the root endpoint.
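
For readers unfamiliar with tag-based tool filtering, the sketch below shows roughly how a new 'embedding' tag could select the three tools. The shapes of `ALL_TOOLS` and `TOOL_TAGS` here are assumptions made for illustration, `toolsForTags` is a hypothetical helper, and `search_web` under a `search` tag is only a placeholder for the existing tools. The actual definitions in the repo may look different.

```typescript
// Hypothetical registry shapes; the real ALL_TOOLS and TOOL_TAGS may differ.
const ALL_TOOLS = [
  "search_web", // placeholder for the tools that already exist
  "generate_embeddings",
  "segment_text",
  "count_tokens",
] as const;

type ToolName = (typeof ALL_TOOLS)[number];

// Each tag maps to the tools it enables.
const TOOL_TAGS: Record<string, ToolName[]> = {
  search: ["search_web"],
  embedding: ["generate_embeddings", "segment_text", "count_tokens"],
};

// Resolve a list of tags to the tools they enable, in ALL_TOOLS order.
// Unknown tags are ignored rather than treated as errors.
function toolsForTags(tags: string[]): ToolName[] {
  const selected = new Set<ToolName>();
  for (const tag of tags) {
    for (const tool of TOOL_TAGS[tag] ?? []) {
      selected.add(tool);
    }
  }
  return ALL_TOOLS.filter((tool) => selected.has(tool));
}
```

With this layout, a client that asks only for the `embedding` tag would see the three new tools and nothing else.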

No new dependencies, no new files. Bumped version to 1.5.0.

Closes #2

Expose Jina embeddings and segmenter APIs as first-class MCP tools.
These APIs were already used internally (embeddings in dedup tools,
segment in token guardrail) but never surfaced to MCP clients.

- generate_embeddings: vector embeddings via /v1/embeddings
- segment_text: semantic chunking via /v1/segment
- count_tokens: lightweight token counting via /v1/segment

Registered under new 'embedding' tag for tool filtering.
No new dependencies. Closes jina-ai#2.

Development

Successfully merging this pull request may close these issues.

Could you please add classifier, segmenter, embeddings, and tokenizer to the MCP tools?