Skip to content

fix(otlp-grpc-exporter-base): recreate client after 5 consecutive DEADLINE_EXCEEDED#6296

Open
afharo wants to merge 2 commits intoopen-telemetry:mainfrom
afharo:grpc-exporter-transport/recreate-client-after-5-consecutive-deadline-exceeded
Open

fix(otlp-grpc-exporter-base): recreate client after 5 consecutive DEADLINE_EXCEEDED#6296
afharo wants to merge 2 commits intoopen-telemetry:mainfrom
afharo:grpc-exporter-transport/recreate-client-after-5-consecutive-deadline-exceeded

Conversation

@afharo
Copy link

@afharo afharo commented Jan 14, 2026

Which problem is this PR solving?

This PR addresses an issue in which the gRPC exporter continuously fails with the error DEADLINE_EXCEEDED by forcing the recreation of the client after receiving that error 5 consecutive times.

#6231 explains the scenario in which this can happen. And, while the recommended solution is to enable keepalives for long-running processes (grpc/grpc-node#2502), this is not recommended for Serverless environments (grpc/grpc-node#2710).

Given that the DEADLINE_EXCEEDED issue is resolved when the server or the client is restarted, this PR attempts the latter by closing and recreating the client.

Fixes # (issue) While related to #6231 (and this bug motivated that issue), it doesn't fix the request of exposing the keepalive configuration that might be useful for other purposes.

Short description of the changes

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • I didn't find a way to consistently reproduce this bug. Based on my experience, it happens randomly.

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

@afharo afharo requested a review from a team as a code owner January 14, 2026 15:38
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 14, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: afharo / name: Alejandro Fernández Haro (8d68e32, 984b406)

Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for working on this.

Q: do we know what the overhead is of creating a new service client? 🤔 I'm wondering if we should re-create the client right at the first DEADLINE_EXCEEDED instead to solve that problem

cc @trentm: IIUC this is trying to solve what we talked about last week - the grpc exporter getting stuck in an unrecoverable state where everything keeps failing. Would appreciate your input as well 🙂

Client,
ServiceClientConstructor,
} from '@grpc/grpc-js';
import { status } from '@grpc/grpc-js';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delayed import you see below is important to avoid loading @grpc/grpc-js before it can be instrumented. Importing status like this means that @opentelemetry/instrumentation-grpc is likely to stop working.

suggestion(blocking): use a dynamic require/dynamic import to import status from @grpc/grpc-js

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8d68e32.

I went with the constant local declaration instead for now. Please, let me know if you'd rather import it from the library, and I'll do it with a dynamic import.

@afharo
Copy link
Author

afharo commented Jan 19, 2026

Q: do we know what the overhead is of creating a new service client? 🤔 I'm wondering if we should re-create the client right at the first DEADLINE_EXCEEDED instead to solve that problem

I'm not sure about how costly it is.

However, based on my experience, sometimes it recovers itself after one error. That's why I didn't want to recreate it on the first attempt.

afharo added a commit to elastic/kibana that referenced this pull request Jan 21, 2026
## Summary

Bearing in mind the currently known existing bug in the gRPC exporter,
we'd like to rely the `proto` exporter as recommended by the
maintainers.

### Adding a new dependency checklist:

- [x] **Purpose:** Use the official exporter for the `protobuf`
protocol.
- [x] **Justification:** It is the official way. I'm aware that the
latest is `0.210.0`, but it should upgrade with the other deps in one go
(we've seen some performance issues in the past when there were version
mismatches).
- [x] **Alternatives explored:** We've submitted a PR to fix the gRPC
exporter (open-telemetry/opentelemetry-js#6296).
But we'd like to use `proto` while the fix lands.
- [x] **Existing dependencies:** The other exporters (`http` and `grpc`)
provide similar functionality for other protocols. In theory,
performance of each library is expected (sorted from best to worst) as
`grpc`, `proto`, `http`. For this reason, we want to use `proto` over
`http`.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
yuliia-fryshko pushed a commit to yuliia-fryshko/kibana that referenced this pull request Jan 22, 2026
## Summary

Bearing in mind the currently known existing bug in the gRPC exporter,
we'd like to rely the `proto` exporter as recommended by the
maintainers.

### Adding a new dependency checklist:

- [x] **Purpose:** Use the official exporter for the `protobuf`
protocol.
- [x] **Justification:** It is the official way. I'm aware that the
latest is `0.210.0`, but it should upgrade with the other deps in one go
(we've seen some performance issues in the past when there were version
mismatches).
- [x] **Alternatives explored:** We've submitted a PR to fix the gRPC
exporter (open-telemetry/opentelemetry-js#6296).
But we'd like to use `proto` while the fix lands.
- [x] **Existing dependencies:** The other exporters (`http` and `grpc`)
provide similar functionality for other protocols. In theory,
performance of each library is expected (sorted from best to worst) as
`grpc`, `proto`, `http`. For this reason, we want to use `proto` over
`http`.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
qn895 pushed a commit to qn895/kibana that referenced this pull request Jan 22, 2026
## Summary

Bearing in mind the currently known existing bug in the gRPC exporter,
we'd like to rely the `proto` exporter as recommended by the
maintainers.

### Adding a new dependency checklist:

- [x] **Purpose:** Use the official exporter for the `protobuf`
protocol.
- [x] **Justification:** It is the official way. I'm aware that the
latest is `0.210.0`, but it should upgrade with the other deps in one go
(we've seen some performance issues in the past when there were version
mismatches).
- [x] **Alternatives explored:** We've submitted a PR to fix the gRPC
exporter (open-telemetry/opentelemetry-js#6296).
But we'd like to use `proto` while the fix lands.
- [x] **Existing dependencies:** The other exporters (`http` and `grpc`)
provide similar functionality for other protocols. In theory,
performance of each library is expected (sorted from best to worst) as
`grpc`, `proto`, `http`. For this reason, we want to use `proto` over
`http`.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
dennis-tismenko pushed a commit to dennis-tismenko/kibana that referenced this pull request Jan 22, 2026
## Summary

Bearing in mind the currently known existing bug in the gRPC exporter,
we'd like to rely the `proto` exporter as recommended by the
maintainers.

### Adding a new dependency checklist:

- [x] **Purpose:** Use the official exporter for the `protobuf`
protocol.
- [x] **Justification:** It is the official way. I'm aware that the
latest is `0.210.0`, but it should upgrade with the other deps in one go
(we've seen some performance issues in the past when there were version
mismatches).
- [x] **Alternatives explored:** We've submitted a PR to fix the gRPC
exporter (open-telemetry/opentelemetry-js#6296).
But we'd like to use `proto` while the fix lands.
- [x] **Existing dependencies:** The other exporters (`http` and `grpc`)
provide similar functionality for other protocols. In theory,
performance of each library is expected (sorted from best to worst) as
`grpc`, `proto`, `http`. For this reason, we want to use `proto` over
`http`.


### Checklist

Check the PR satisfies following conditions. 

Reviewers should verify this PR satisfies this list as well.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [x] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants