Why save epoch_millis as string? #2318

Closed
@puppylpg

Description

I've noticed that if a field is Instant with epoch_millis format:

    @Field(type = FieldType.Date, format = DateFormat.epoch_millis)
    private Instant timestamp;

spring-data-elasticsearch serializes this field to JSON as a string value, "timestamp":"1644234181000", rather than as a long value, "timestamp":1644234181000. After digging into the code, I found that DateFormatter#format only returns a string, so the Instant timestamp is written as a string rather than a long.
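To make the difference concrete, here is a minimal sketch in plain Java that prints both renderings for the same Instant. The class `EpochMillisRendering` and its methods are hypothetical helpers written for this illustration; they are not part of spring-data-elasticsearch and do not reproduce its internal code.

```java
import java.time.Instant;

// Hypothetical helper, not part of Spring Data Elasticsearch: it only
// illustrates the two JSON renderings discussed in this issue.
final class EpochMillisRendering {

    // Formats the epoch value to a string first, so it ends up quoted in JSON.
    static String asStringJson(String field, Instant value) {
        return "\"" + field + "\":\"" + value.toEpochMilli() + "\"";
    }

    // Emits the epoch value as a bare JSON number.
    static String asLongJson(String field, Instant value) {
        return "\"" + field + "\":" + value.toEpochMilli();
    }

    public static void main(String[] args) {
        Instant timestamp = Instant.ofEpochMilli(1644234181000L);
        System.out.println(asStringJson("timestamp", timestamp));
        System.out.println(asLongJson("timestamp", timestamp));
    }
}
```

Both forms carry the same instant; only the JSON type of the value differs, which is what matters to consumers that expect one type consistently.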

  1. Although Elasticsearch accepts a string value (a long written as a string literal) for epoch_millis, this is not mentioned in the docs;
  2. Worse, before adopting spring-data-elasticsearch we saved and updated epoch_millis values as longs, so the timestamp field now contains both strings and longs;
  3. Additionally, we use elasticsearch-hadoop to read data from Elasticsearch, and it can read epoch_millis either as long or as string, but not a mix of both.

Would it be possible to support writing epoch_millis and epoch_second date values as long rather than string? Or at least to provide an option to choose between long and string, rather than always using string whatever the real date type is?

Activity

Title changed from "Why save `epoli_millis` as string?" to "Why save `epoch_millis` as string?" on Oct 1, 2022
sothawo (Collaborator) commented on Oct 1, 2022

The documentation you already linked explicitly states:

Dates will always be rendered as strings, even if they were initially supplied as a long in the JSON document.

Elasticsearch stores the values in the _source the way they came in and when returning the _source in a query Elasticsearch will return what came in.

But when fields are retrieved, for example with the fields option or with the docvalue_fields option, they are returned as strings, no matter how they were sent in.

Consider this mapping for two fields with the same date format:

{
  "epoch-millis": {
    "mappings": {
      "properties": {
        "date1": {
          "type": "date",
          "format": "epoch_millis"
        },
        "date2": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}

We store this document:

{
  "date1": 1664641434,
  "date2": "1664641434"
}

Then search it with field values (normally you'd set "_source": false when using fields):

{
  "fields": [
    "date1",
    "date2"
  ],
  "_source": true
}

The response is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "epoch-millis",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "date1": 1664641434,
          "date2": "1664641434"
        },
        "fields": {
          "date2": [
            "1664641434"
          ],
          "date1": [
            "1664641434"
          ]
        }
      }
    ]
  }
}

In the _source the mixed notation is returned, but in the fields the values are returned as strings. Elasticsearch takes whatever it gets and internally uses a numeric instant value, but whenever it returns the date (other than in the _source) it represents it as a string, as documented.

If Spring Data Elasticsearch converted Instant properties to numeric values, it would fail when reading responses where users request only selected fields rather than the full document source, so there's no point in changing that behaviour.

If you have mixed data in the _source of your documents, you'd probably be better off using fields in your queries to get a consistent representation (which would be string).

One possibility would be to add a new format value epoch_millis_long which would explicitly convert to/from a Long value.
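As a rough illustration of what such an explicit Long conversion could look like, here is a sketch in plain Java. The class `EpochMillisLongConverter` and its `write`/`read` methods are hypothetical names chosen for this example; no `epoch_millis_long` format is confirmed to exist by this thread. The read side deliberately accepts both a number and a string, on the assumption that _source may contain either and that values from the fields option arrive as strings.

```java
import java.time.Instant;

// Hypothetical sketch of an explicit Long conversion for epoch_millis.
// Names are illustrative and not taken from Spring Data Elasticsearch.
final class EpochMillisLongConverter {

    // Writes the Instant as a Long so that a JSON serializer emits a number.
    static Object write(Instant value) {
        return value.toEpochMilli();
    }

    // Reads a value back into an Instant. Accepts a Number (as it may appear
    // in _source) or a String (as returned by the fields option).
    static Instant read(Object raw) {
        if (raw instanceof Number) {
            return Instant.ofEpochMilli(((Number) raw).longValue());
        }
        if (raw instanceof String) {
            return Instant.ofEpochMilli(Long.parseLong(((String) raw).trim()));
        }
        throw new IllegalArgumentException("Unsupported epoch_millis value: " + raw);
    }
}
```

For example, `EpochMillisLongConverter.read(1664641434)` and `EpochMillisLongConverter.read("1664641434")` yield the same Instant, so documents with mixed notation would read consistently. Spring Data Elasticsearch also has a property-level value converter mechanism (`@ValueConverter`) that might be a place to plug in logic like this; that wiring is an assumption and is not shown here.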

puppylpg (Contributor, Author) commented on Oct 3, 2022

Thanks very much for your detailed response! It really helps me a lot.

Does spring-data-elasticsearch support queries with the fields/stored_fields/docvalue_fields options? I haven't found anything about that in the docs or the code so far.

sothawo (Collaborator) commented on Oct 3, 2022

Support for fields has been in Spring Data Elasticsearch from the beginning; since 4.4 it is available on every QueryBuilder with the withFields() methods. In older versions I think you had to set it directly on the Query instance.

Support for stored_fields was added in 4.4 (#2004) to the NativeSearchQuery. In version 5 this is moved to the BaseQueryBuilder (#2250), so it's available for all queries then.

For docvalue_fields there is the open issue #2316.

puppylpg (Contributor, Author) commented on Oct 3, 2022

Thanks~ I'll consider using these in the future.

Appreciated!

      Why save `epoch_millis` as string? · Issue #2318 · spring-projects/spring-data-elasticsearch