Commit cce732a (1 parent: 3a2cc9c)
Diff validate view suggestions

1 file changed: docs/plugins/inputs/elasticsearch.asciidoc (+39 additions, -27 deletions)
@@ -109,33 +109,45 @@ Common causes are:
 
 NOTE: experimental:[] `tracking_field` and related settings are experimental and subject to change in the future
 
-It is sometimes desirable to track the value of a particular field between two jobs:
-* avoid re-processing the entire result set of a long query after an unplanned restart
-* only grab new data from an index instead of processing the entire set on each job
+.Technical Preview: Tracking a field's value
+****
+The feature that allows tracking a field's value across runs is in _Technical Preview_.
+Configuration options and implementation details are subject to change in minor releases without being preceded by deprecation warnings.
+****
 
-For this, the Elasticsearch input plugin provides the <<plugins-{type}s-{plugin}-tracking_field>> and <<plugins-{type}s-{plugin}-tracking_field_seed>> options.
-When <<plugins-{type}s-{plugin}-tracking_field>> is set, the plugin will record the value of that field for the last document retrieved in a run into
-a file (location defaults to <<plugins-{type}s-{plugin}-last_run_metadata_path>>).
+Some use cases require tracking the value of a particular field between two jobs.
+Examples include:
 
-The user can then inject this value in the query using the placeholder `:last_value`. The value will be injected into the query
-before execution, and then updated after the query completes if new data was found.
+* avoiding the need to re-process the entire result set of a long query after an unplanned restart
+* grabbing only new data from an index instead of processing the entire set on each job.
+
+The Elasticsearch input plugin provides the <<plugins-{type}s-{plugin}-tracking_field>> and <<plugins-{type}s-{plugin}-tracking_field_seed>> options.
+When <<plugins-{type}s-{plugin}-tracking_field>> is set, the plugin records the value of that field for the last document retrieved in a run into
+a file.
+(The file location defaults to <<plugins-{type}s-{plugin}-last_run_metadata_path>>.)
+
+You can then inject this value in the query using the placeholder `:last_value`.
+The value will be injected into the query before execution, and then updated after the query completes if new data was found.
 
 This feature works best when:
 
-. the query sorts by the tracking field;
-. the timestamp field is added by {es};
-. the field type has enough resolution so that two events are unlikely to have the same value.
+* the query sorts by the tracking field,
+* the timestamp field is added by {es}, and
+* the field type has enough resolution so that two events are unlikely to have the same value.
 
-It is recommended to use a tracking field whose type is https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html[date nanoseconds].
-If the tracking field is of this data type, an extra placeholder called `:present` can be used to inject the nanosecond-based value of "now-30s".
+Consider using a tracking field whose type is https://www.elastic.co/guide/en/elasticsearch/reference/current/date_nanos.html[date nanoseconds].
+If the tracking field is of this data type, you can use an extra placeholder called `:present` to inject the nanosecond-based value of "now-30s".
 This placeholder is useful as the right-hand side of a range filter, allowing the collection of
-new data but leaving partially-searcheable bulk request data to the next scheduled job.
+new data but leaving partially-searchable bulk request data to the next scheduled job.
 
-Below is a series of steps to help set up the "tailing" of data being written to a set of indices, using a date nanosecond field
-added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
+[id="plugins-{type}s-{plugin}-tracking-sample"]
+===== Sample configuration: Track field value across runs
 
-. create ingest pipeline that adds Elasticsearch's `_ingest.timestamp` field to the documents as `event.ingested`:
+This section contains a series of steps to help you set up the "tailing" of data being written to a set of indices, using a date nanosecond field
+added by an Elasticsearch ingest pipeline and the `tracking_field` capability of this plugin.
 
+. Create an ingest pipeline that adds Elasticsearch's `_ingest.timestamp` field to the documents as `event.ingested`:
++
 [source, json]
 PUT _ingest/pipeline/my-pipeline
 {
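The diff view truncates the pipeline body at this point. As an illustrative sketch only (the `set` processor, the `description` text, and the templating syntax below are assumptions, not the commit's actual content), such a pipeline could copy the ingest timestamp into `event.ingested` like this:

```json
PUT _ingest/pipeline/my-pipeline
{
  "description": "Copy the ingest timestamp into event.ingested (illustrative sketch)",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

Documents indexed through such a pipeline would then carry an `event.ingested` value suitable for use as the tracking field.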
@@ -150,8 +162,7 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
 }
 
 [start=2]
-. create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
-
+. Create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline:
++
 [source, json]
 PUT /_template/my_template
 {
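The template body is also cut off by the diff view. A hypothetical sketch of how such a template could map the tracking field as `date_nanos` and attach the pipeline (the index pattern and settings below are assumptions based on the surrounding text, not the commit's content):

```json
PUT /_template/my_template
{
  "index_patterns": ["test-*"],
  "settings": {
    "index.default_pipeline": "my-pipeline"
  },
  "mappings": {
    "properties": {
      "event": {
        "properties": {
          "ingested": { "type": "date_nanos" }
        }
      }
    }
  }
}
```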
@@ -174,8 +185,8 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
 }
 
 [start=3]
-. define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter since the last value seen until present:
-
+. Define a query that looks at all data of the indices, sorted by the tracking field, and with a range filter since the last value seen until present:
++
 [source,json]
 {
 "query": {
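The query body is likewise truncated by the diff view. A hypothetical shape for such a query, assuming `event.ingested` is the tracking field (the plugin substitutes `:last_value` and `:present` before execution, so they appear here as literal placeholders):

```json
{
  "query": {
    "range": {
      "event.ingested": {
        "gt": ":last_value",
        "lt": ":present"
      }
    }
  },
  "sort": [
    { "event.ingested": "asc" }
  ]
}
```

Sorting ascending on the tracking field ensures the last document retrieved in a run carries the highest value seen, which is what the plugin records for the next run.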
@@ -198,8 +209,8 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
 }
 
 [start=4]
-. configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
-
+. Configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
++
 [source, ruby]
 input {
 elasticsearch {
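The input block is truncated here as well. A minimal, illustrative sketch of what such a configuration could contain (the option values, cron schedule, and field-reference syntax below are assumptions, not the commit's text):

```ruby
input {
  elasticsearch {
    index => "test-*"
    query => '{ "query": { "range": { "event.ingested": { "gt": ":last_value", "lt": ":present" }}}, "sort": [{ "event.ingested": "asc" }] }'
    tracking_field => "[event][ingested]"
    schedule => "* * * * *"   # run every minute
  }
}
```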
@@ -215,11 +226,12 @@ added by an Elasticsearch ingest pipeline, and the `tracking_field` capability of this plugin.
 }
 }
 
-With this setup, as new documents are indexed into a `test-*` index, the next scheduled run will:
+With this sample setup, new documents are indexed into a `test-*` index.
+The next scheduled run:
 
-. select all new documents since the last observed value of the tracking field;
-. use {ref}/point-in-time-api.html#point-in-time-api[Point in time (PIT)] + {ref}/paginate-search-results.html#search-after[Search after] to paginate through all the data;
-. update the value of the field at the end of the pagination.
+* selects all new documents since the last observed value of the tracking field,
+* uses {ref}/point-in-time-api.html#point-in-time-api[Point in time (PIT)] + {ref}/paginate-search-results.html#search-after[Search after] to paginate through all the data, and
+* updates the value of the field at the end of the pagination.
 
 [id="plugins-{type}s-{plugin}-options"]
 ==== Elasticsearch Input configuration options
