Skip to content

[Search] add longer timeouts to ingest examples for semantic_text #221315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

TattdCodeMonkey
Copy link
Contributor

@TattdCodeMonkey TattdCodeMonkey commented May 22, 2025

Summary

When following the example code for ingestion with a semantic_text mapping you will usually get a client timeout with the defaults. This is because on the first ingest you have to wait for the model to deploy. To try and avoid this I am updating our example code to explicitly set a longer client timeout with a comment in the example about why.

Screenshots

Python
image

Javascript
image

Checklist

@TattdCodeMonkey TattdCodeMonkey requested a review from a team as a code owner May 22, 2025 20:08
@TattdCodeMonkey TattdCodeMonkey added release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting Team:Search v9.1.0 labels May 22, 2025
},
});
// Set the ingestion timeout to 5 minutes, to allow for semantic ingestion
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leemthompo copy check here, I'm thinking about how I can i18n this, but that would mean it would have to be single line vs this cleaner multi-line comment.

sampleDocuments,
}) => `from elasticsearch import Elasticsearch, helpers

# Set the ingestion timeout to 5 minutes, to allow time for semantic ingestion
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leemthompo essentially the same comment as above is here.

elasticsearchURL,
sampleDocuments,
indexName,
}) => `const { Client } = require('@elastic/elasticsearch');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated all of the imports to use require() syntax to make this copy/paste directly into node v20

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full ESM support is coming, but I am but one person so it hasn't happened yet. 😄

indexName,
}) => `const { Client } = require('@elastic/elasticsearch');
const client = new Client({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshMock while working on this I noticed the client now has an option for serverMode: 'serverless' I think it's worth updating these examples to show setting that when in serverless correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It updates some defaults to optimize for the serverless environment.

@TattdCodeMonkey TattdCodeMonkey force-pushed the search/explicit-client-timeouts-for-semantic-examples branch from 7ae9d44 to eab2728 Compare May 22, 2025 20:49
client = Elasticsearch(
"${elasticsearchURL}",
api_key="${apiKey ?? API_KEY_PLACEHOLDER}",
request_timeout=ingestion_timeout,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now looking at the notebooks I think maybe I should update to set this with client.options() when calling helpers.bulk()

Copy link

@miguelgrinberg miguelgrinberg May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this client is going to be used for other things besides this potentially slow bulk call, then yes, maybe you should initialize the client with the default timeout (which I believe is 10s), and then pass client.options(request_timeout=300) on the call to bulk(). But from looking at this file this is the only thing you do with this client, so it would really have the same effect as your current solution.

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
searchIndices 180.2KB 182.0KB +1.8KB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Search v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants