docs/plugins/inputs/elasticsearch.asciidoc
Lines changed: 39 additions & 27 deletions
@@ -109,33 +109,45 @@ Common causes are:
109
109
110
110
NOTE: experimental:[] `tracking_field` and related settings are experimental and subject to change in the future
111
111
112
-
It is sometimes desirable to track the value of a particular field between two jobs:
113
-
* avoid re-processing the entire result set of a long query after an unplanned restart
114
-
* only grab new data from an index instead of processing the entire set on each job
112
+
.Technical Preview: Tracking a field's value
113
+
****
114
+
The feature that allows tracking a field's value across runs is in _Technical Preview_.
115
+
Configuration options and implementation details are subject to change in minor releases without being preceded by deprecation warnings.
116
+
****
115
117
116
-
For this, the Elasticsearch input plugin provides the <<plugins-{type}s-{plugin}-tracking_field>> and <<plugins-{type}s-{plugin}-tracking_field_seed>> options.
117
-
When <<plugins-{type}s-{plugin}-tracking_field>> is set, the plugin will record the value of that field for the last document retrieved in a run into
118
-
a file (location defaults to <<plugins-{type}s-{plugin}-last_run_metadata_path>>).
118
+
Some use cases require tracking the value of a particular field between two jobs.
119
+
Examples include:
119
120
120
-
The user can then inject this value in the query using the placeholder `:last_value`. The value will be injected into the query
121
-
before execution, and the updated after the query completes if new data was found.
121
+
* avoiding the need to re-process the entire result set of a long query after an unplanned restart
122
+
* grabbing only new data from an index instead of processing the entire set on each job
123
+
124
+
The Elasticsearch input plugin provides the <<plugins-{type}s-{plugin}-tracking_field>> and <<plugins-{type}s-{plugin}-tracking_field_seed>> options.
125
+
When <<plugins-{type}s-{plugin}-tracking_field>> is set, the plugin records the value of that field for the last document retrieved in a run into
126
+
a file.
127
+
(The file location defaults to <<plugins-{type}s-{plugin}-last_run_metadata_path>>.)
128
+
129
+
You can then inject this value into the query using the placeholder `:last_value`.
130
+
The value will be injected into the query before execution, and then updated after the query completes if new data was found.
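
As a rough illustration of the placeholder idea (not the plugin's actual code), the substitution can be thought of as a plain text replacement performed on the query before it is sent. The helper name `inject_last_value`, the query template, and the timestamp below are hypothetical and chosen only for this sketch.

```ruby
require "json"

# Hypothetical helper, NOT the plugin's implementation: replaces every
# `:last_value` placeholder in a query template with the tracked value.
def inject_last_value(query_template, last_value)
  # Block form avoids any special handling of backslashes in the replacement.
  query_template.gsub(":last_value") { last_value.to_s }
end

# Assumed example query template and tracked value.
template = '{"query":{"range":{"event.ingested":{"gt":":last_value"}}}}'
query = inject_last_value(template, "2024-01-01T00:00:00.000000000Z")

# The result is still valid JSON with the placeholder filled in.
parsed = JSON.parse(query)
puts parsed["query"]["range"]["event.ingested"]["gt"]
```

How the plugin stores and formats the tracked value may differ; the sketch only shows the "inject before execution" half of the cycle.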
122
131
123
132
This feature works best when:
124
133
125
-
. the query sorts by the tracking field;
126
-
. the timestamp field is added by {es};
127
-
. the field type has enough resolution so that two events are unlikely to have the same value.
134
+
* the query sorts by the tracking field,
135
+
* the timestamp field is added by {es}, and
136
+
* the field type has enough resolution so that two events are unlikely to have the same value.
128
137
129
-
It is recommended to use a tracking field whose type is https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html[date nanoseconds].
130
-
If the tracking field is of this data type, an extra placeholder called `:present` can be used to inject the nano-second based value of "now-30s".
138
+
Consider using a tracking field whose type is https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html[date nanoseconds].
139
+
If the tracking field is of this data type, you can use an extra placeholder called `:present` to inject the nanosecond-based value of "now-30s".
131
140
This placeholder is useful as the right-hand side of a range filter, allowing the collection of
132
-
new data but leaving partially-searcheable bulk request data to the next scheduled job.
141
+
new data but leaving partially-searchable bulk request data to the next scheduled job.
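
To make the "now-30s" idea concrete, here is a minimal sketch of computing such an upper bound with nanosecond precision and using it as the right-hand side of a range filter. The helper name `present_value`, the ISO 8601 string format, and the sample values are assumptions made for illustration; the plugin may represent the value differently.

```ruby
# Hypothetical helper, NOT the plugin's implementation: formats
# "now minus 30 seconds" as a UTC timestamp with nanosecond precision.
def present_value(now = Time.now.utc, lag_seconds = 30)
  (now - lag_seconds).utc.strftime("%Y-%m-%dT%H:%M:%S.%9NZ")
end

# A fixed clock (2024-01-01T00:01:00.123456789Z) keeps the example deterministic.
fixed_now = Time.at(1_704_067_260, 123_456_789, :nsec).utc

# Assumed previously tracked value.
last_value = "2024-01-01T00:00:00.000000000Z"

# The lagged value bounds the range on the right, leaving the most recent
# 30 seconds of data to the next scheduled job.
range_filter = {
  "range" => {
    "event.ingested" => { "gt" => last_value, "lt" => present_value(fixed_now) }
  }
}
puts range_filter
```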
133
142
134
-
Below is a series of steps to help set up the "tailing" of data being written to a set of indices, using a date nanosecond field
135
-
added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
143
+
[id="plugins-{type}s-{plugin}-tracking-sample"]
144
+
===== Sample configuration: Track field value across runs
136
145
137
-
. create ingest pipeline that adds Elasticsearch's `_ingest.timestamp` field to the documents as `event.ingested`:
146
+
This section contains a series of steps to help you set up the "tailing" of data being written to a set of indices, using a date nanosecond field
147
+
added by an Elasticsearch ingest pipeline and the `tracking_field` capability of this plugin.
138
148
149
+
. Create an ingest pipeline that adds Elasticsearch's `_ingest.timestamp` field to the documents as `event.ingested`:
150
+
+
139
151
[source, json]
140
152
PUT _ingest/pipeline/my-pipeline
141
153
{
@@ -150,8 +162,7 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability o
150
162
}
151
163
152
164
[start=2]
153
-
. create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
154
-
165
+
. Create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
+
155
166
[source, json]
156
167
PUT /_template/my_template
157
168
{
@@ -174,8 +185,8 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability o
174
185
}
175
186
176
187
[start=3]
177
-
. define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter since the last value seen until present:
178
-
188
+
. Define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter since the last value seen until present:
189
+
+
179
190
[source,json]
180
191
{
181
192
"query": {
@@ -198,8 +209,8 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability o
198
209
}
199
210
200
211
[start=4]
201
-
. configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
202
-
212
+
. Configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
213
+
+
203
214
[source, ruby]
204
215
input {
205
216
elasticsearch {
@@ -215,11 +226,12 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability o
215
226
}
216
227
}
217
228
218
-
With this setup, as new documents are indexed an `test-*` index, the next scheduled run will:
229
+
With this sample setup, new documents are indexed into a `test-*` index.
230
+
The next scheduled run:
219
231
220
-
. select all new documents since the last observed value of the tracking field;
221
-
. use {ref}/point-in-time-api.html#point-in-time-api[Point in time (PIT)] + {ref}/paginate-search-results.html#search-after[Search after] to paginate through all the data;
222
-
. update the value of the field at the end of the pagination.
232
+
* selects all new documents since the last observed value of the tracking field,
233
+
* uses {ref}/point-in-time-api.html#point-in-time-api[Point in time (PIT)] + {ref}/paginate-search-results.html#search-after[Search after] to paginate through all the data, and
234
+
* updates the value of the field at the end of the pagination.
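
The run cycle described above can be simulated in memory to see how the tracked value advances. This is only a sketch under stated assumptions: `run_once` is a hypothetical function, the array `docs` stands in for the `test-*` indices, and simple slicing stands in for PIT plus search-after pagination.

```ruby
# Simulates one scheduled run. NOT the plugin's implementation.
# Timestamps are fixed-width ISO 8601 strings, so string comparison
# orders them chronologically.
def run_once(docs, last_value, page_size: 2)
  # Select documents newer than the tracked value, sorted by the tracking field.
  pending = docs.select { |d| d["event.ingested"] > last_value }
                .sort_by { |d| d["event.ingested"] }

  # Page through the results (a stand-in for PIT + search_after).
  processed = []
  pending.each_slice(page_size) { |page| processed.concat(page) }

  # Update the tracked value only if new data was found.
  new_last = processed.empty? ? last_value : processed.last["event.ingested"]
  [processed, new_last]
end

docs = [
  { "id" => 1, "event.ingested" => "2024-01-01T00:00:01.000000001Z" },
  { "id" => 2, "event.ingested" => "2024-01-01T00:00:02.000000002Z" },
  { "id" => 3, "event.ingested" => "2024-01-01T00:00:03.000000003Z" }
]

# First run starts from an assumed seed value and picks up every document.
first_batch, tracked = run_once(docs, "1970-01-01T00:00:00.000000000Z")

# A second run with no new documents finds nothing and keeps the tracked value.
second_batch, tracked_again = run_once(docs, tracked)
```

The sketch also shows why enough timestamp resolution matters: with a strict "greater than" filter, a document sharing the exact tracked value would be skipped on the next run.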