Fluentd when restarts after data push, There is delay in observing data on OS dashboard for 45 mins - 1 hr. #158

@Sameer0998

Description

Problem statement: We push data from fluentd to two OpenSearch instances, as shown above. Whenever the fluentd pod in k8s restarts, for example after an update to the fluentd configuration, the data we push does not appear in OpenSearch Dashboards for at least 45 minutes to 1 hour. Only after this delay does the data become visible on the dashboard.

Question 1: Is this expected fluentd behaviour, that is, should data take some time to become visible in OpenSearch after fluentd restarts? If so, why does this happen? (We need an explanation to understand it.)
Question 2: If the answer above is no, how can we avoid the delay? Can you help us with an updated fluentd configuration?
Question 3: When fluentd restarts, we also sometimes observe the loss of 1-2 data points.

Our current fluentd configuration, which pushes the data to OpenSearch, is shown below.

  containers.input.conf: |-
    <source>
      @id edge-tls-http-endpoint
      @type http
      port 4000
      <parse>
        @type json
        time_key nil
      </parse>
    </source>

  output.conf: |-
    <match **>
      @type copy
      <store ignore_error>
        @type relabel
        @label @primary
      </store>
  {{- if .Values.opensearch.sampletestingSecondary.enabled }}
      <store ignore_error>
        @type relabel
        @label @secondary
      </store>
  {{- end }}
    </match>
    
    <label @primary>
      <match **>
        @id elasticsearch_sampletesting_pm
        @type elasticsearch
        @log_level info
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        max_retry_putting_template 5
        fail_on_putting_template_retry_exceed false
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        suppress_type_name true
        host {{ include "getsampletestingEndpointURL" . }}
        port 443
        user {{ required "Required sampletesting username" .Values.opensearch.sampletesting.creds.username | b64dec }}
        password {{ required "Required sampletesting password" .Values.opensearch.sampletesting.creds.password | b64dec }}
        templates { "sampletesting_default_template": "/etc/fluent/template/sampletesting-default-template", "sampletesting_cm_template": "/etc/fluent/template/sampletesting-cm-template", "sampletesting_2_shard_template": "/etc/fluent/template/sampletesting-2-shard-template", "sampletesting_3_shard_template": "/etc/fluent/template/sampletesting-3-shard-template", "sampletesting_4_shard_template": "/etc/fluent/template/sampletesting-4-shard-template", "sampletesting_5_shard_template": "/etc/fluent/template/sampletesting-5-shard-template", "sampletesting_6_shard_template": "/etc/fluent/template/sampletesting-6-shard-template"}
        template_overwrite true
        write_operation upsert
        target_index_key @target_index
        index_name defaulsampletestingtindex
        type_name _doc
        id_key request_id
        remove_keys request_id
        reconnect_on_error true
        reload_on_failure true
        reload_connections false
        request_timeout 45s
        bulk_message_request_threshold -1
        <buffer>
          @type file
          path /var/log/fluentd-edge-sampletesting-pm-buffers/sampletesting.system.buffer.pm
          flush_mode interval
          flush_thread_count 4
          flush_interval 60s
          retry_type periodic
          retry_max_times 20
          retry_wait 20s
          chunk_limit_size 64M
          queued_chunks_limit_size 100
          overflow_action throw_exception
        </buffer>
      </match>
    </label>
  {{- if .Values.opensearch.sampletestingSecondary.enabled }}
    <label @secondary>
      <filter **>
        @type record_modifier
        <record>
          @target_index ${record["@target_index"]}_${(((Time.at(time).strftime("%j").to_i - 1) / 3) * 3 + 1)}
        </record>
      </filter>
      <match **>
        @id secondary_elasticsearch_sampletesting_pm
        @type elasticsearch
        @log_level info
        verify_es_version_at_startup false
        default_elasticsearch_version 7
        max_retry_putting_template 5
        fail_on_putting_template_retry_exceed false
        scheme https
        ssl_verify false
        ssl_version TLSv1_2
        suppress_type_name true
        host {{ include "getsampletestingSecondaryEndpointURL" . }}
        port 443
        user {{ required "Required sampletesting username" .Values.opensearch.sampletestingSecondary.creds.username | b64dec }}
        password {{ required "Required sampletesting password" .Values.opensearch.sampletestingSecondary.creds.password | b64dec }}
        templates { "sampletesting_default_template": "/etc/fluent/template/sampletesting-default-template", "sampletesting_cm_template": "/etc/fluent/template/sampletesting-cm-template", "sampletesting_2_shard_template": "/etc/fluent/template/sampletesting-2-shard-template", "sampletesting_3_shard_template": "/etc/fluent/template/sampletesting-3-shard-template", "sampletesting_4_shard_template": "/etc/fluent/template/sampletesting-4-shard-template", "sampletesting_5_shard_template": "/etc/fluent/template/sampletesting-5-shard-template", "sampletesting_6_shard_template": "/etc/fluent/template/sampletesting-6-shard-template"}
        template_overwrite true
        write_operation upsert
        target_index_key @target_index
        index_name duplicatedefaulsampletestingtindex
        type_name _doc
        id_key request_id
        remove_keys request_id
        reconnect_on_error true
        reload_on_failure true
        reload_connections false
        request_timeout 45s
        bulk_message_request_threshold -1
        <buffer>
          @type file
          path /var/log/fluentd-edge-sampletesting-pm-buffers/small.sampletesting.system.buffer.pm
          flush_mode interval
          flush_thread_count 4
          flush_interval 60s
          retry_type periodic
          retry_max_times 20
          retry_wait 20s
          chunk_limit_size 64M
          queued_chunks_limit_size 100
          overflow_action throw_exception
        </buffer>
      </match>
    </label>
  {{- end }}
{{- end }}
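
To make the `@target_index` suffix in the `@secondary` filter easier to follow, here is a minimal Ruby sketch of the same arithmetic. It assumes integer division, as in the embedded Ruby expression above. The helper names `bucket_start_day` and `secondary_target_index` are hypothetical and exist only for illustration; they are not part of the config or of any plugin.

```ruby
# Mirrors the expression used in the record_modifier filter:
#   ((day_of_year - 1) / 3) * 3 + 1
# Integer division groups days into 3-day buckets, each labelled
# by the first day of the bucket (1, 4, 7, ...).
def bucket_start_day(day_of_year)
  ((day_of_year - 1) / 3) * 3 + 1
end

# Hypothetical helper: builds the suffixed index name from a base
# index name and a Time, the way the filter does for each record.
def secondary_target_index(base_index, time)
  day_of_year = time.strftime("%j").to_i # "%j" is the zero-padded day of year
  "#{base_index}_#{bucket_start_day(day_of_year)}"
end

# Print the suffix for a few sample days of the year.
[1, 2, 3, 4, 6, 7, 365, 366].each do |day|
  puts "day #{day} -> suffix #{bucket_start_day(day)}"
end
```

Days 1 to 3 share the suffix 1, days 4 to 6 share the suffix 4, and so on. The suffix carries no year, so the same suffixes recur every year.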

How to reproduce: restart fluentd while it is pushing data, then observe the data in the OpenSearch dashboard.
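
For reference when reproducing, the following Ruby sketch gives a rough estimate of how long a single chunk could stay in the buffer under the `<buffer>` settings above. The values are copied from the config. The model itself is an assumption: one initial flush attempt plus `retry_max_times` retries, a constant `retry_wait` between them (`retry_type periodic`), and every attempt blocking for the full `request_timeout`. It is not a description of fluentd internals, and `worst_case_seconds` is a hypothetical helper.

```ruby
# Values copied from the config above.
FLUSH_INTERVAL_S  = 60 # flush_interval 60s
RETRY_WAIT_S      = 20 # retry_wait 20s (retry_type periodic)
RETRY_MAX_TIMES   = 20 # retry_max_times 20
REQUEST_TIMEOUT_S = 45 # request_timeout 45s

# Assumed model: wait one flush interval, then make one initial attempt
# plus retry_max_times retries. Each attempt may block for the full
# request timeout, and each retry is preceded by a constant wait.
def worst_case_seconds(flush_interval:, retry_wait:, retry_max_times:, request_timeout:)
  attempts = retry_max_times + 1
  flush_interval + retry_max_times * retry_wait + attempts * request_timeout
end

total = worst_case_seconds(
  flush_interval: FLUSH_INTERVAL_S,
  retry_wait: RETRY_WAIT_S,
  retry_max_times: RETRY_MAX_TIMES,
  request_timeout: REQUEST_TIMEOUT_S
)
puts format("approximate upper bound: %d s (%.1f min)", total, total / 60.0)
```

Under these assumptions the estimate stays below the 45 minutes to 1 hour we observe, so the retry settings alone may not account for the delay.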

Can someone please help here? @daipom
