-
Notifications
You must be signed in to change notification settings - Fork 8.4k
[Search] add longer timeouts to ingest examples for semantic_text #221315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Search] add longer timeouts to ingest examples for semantic_text #221315
Conversation
}, | ||
}); | ||
// Set the ingestion timeout to 5 minutes, to allow for semantic ingestion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leemthompo copy check here, I'm thinking about how I can i18n this, but that would mean it would have to be single line vs this cleaner multi-line comment.
sampleDocuments, | ||
}) => `from elasticsearch import Elasticsearch, helpers | ||
|
||
# Set the ingestion timeout to 5 minutes, to allow time for semantic ingestion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leemthompo essentially the same comment as above is here.
elasticsearchURL, | ||
sampleDocuments, | ||
indexName, | ||
}) => `const { Client } = require('@elastic/elasticsearch'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated all of the imports to use require()
syntax to make this copy/paste directly into node v20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full ESM support is coming, but I am but one person so it hasn't happened yet. 😄
indexName, | ||
}) => `const { Client } = require('@elastic/elasticsearch'); | ||
const client = new Client({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshMock while working on this I noticed the client now has an option for serverMode: 'serverless'
I think it's worth updating these examples to show setting that when in serverless correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. It updates some defaults to optimize for the serverless environment.
7ae9d44
to
eab2728
Compare
client = Elasticsearch( | ||
"${elasticsearchURL}", | ||
api_key="${apiKey ?? API_KEY_PLACEHOLDER}", | ||
request_timeout=ingestion_timeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now looking at the notebooks I think maybe I should update to set this with client.options()
when calling helpers.bulk()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this client is going to be used for other things besides this potentially slow bulk
call, then yes, maybe you should initialize the client with the default timeout (which I believe is 10s), and then pass client.options(request_timeout=300)
on the call to bulk()
. But from looking at this file this is the only thing you do with this client, so it would really have the same effect as your current solution.
💚 Build Succeeded
Metrics [docs]Async chunks
|
Summary
When following the example code for ingestion with a
semantic_text
mapping you will usually get a client timeout with the defaults. This is because on the first ingest you have to wait for the model to deploy. To try and avoid this I am updating our example code to explicitly set a longer client timeout with a comment in the example about why.Screenshots
Python

Javascript

Checklist
release_note:*
label is applied per the guidelines