fix(otlp-grpc-exporter-base): recreate client after 5 consecutive DEADLINE_EXCEEDED#6296
Conversation
pichlermarc
left a comment
There was a problem hiding this comment.
Hi, thanks for working on this.
Q: do we know what the overhead is of creating a new service client? 🤔 I'm wondering if we should re-create the client right at the first DEADLINE_EXCEEDED instead to solve that problem
cc @trentm: IIUC this is trying to solve what we talked about last week - the grpc exporter getting stuck in an unrecoverable state where everything keeps failing. Would appreciate your input as well 🙂
| Client, | ||
| ServiceClientConstructor, | ||
| } from '@grpc/grpc-js'; | ||
| import { status } from '@grpc/grpc-js'; |
There was a problem hiding this comment.
The delayed import you see below is important to avoid loading @grpc/grpc-js before it can be instrumented. Importing status like this means that @opentelemetry/instrumentation-grpc is likely to stop working.
suggestion(blocking): use a dynamic require/dynamic import to import status from @grpc/grpc-js
There was a problem hiding this comment.
Addressed in 8d68e32.
I went with the constant local declaration instead for now. Please, let me know if you'd rather import it from the library, and I'll do it with a dynamic import.
I'm not sure about how costly it is. However, based on my experience, sometimes it recovers itself after one error. That's why I didn't want to recreate it on the first attempt. |
## Summary Bearing in mind the currently known existing bug in the gRPC exporter, we'd like to rely the `proto` exporter as recommended by the maintainers. ### Adding a new dependency checklist: - [x] **Purpose:** Use the official exporter for the `protobuf` protocol. - [x] **Justification:** It is the official way. I'm aware that the latest is `0.210.0`, but it should upgrade with the other deps in one go (we've seen some performance issues in the past when there were version mismatches). - [x] **Alternatives explored:** We've submitted a PR to fix the gRPC exporter (open-telemetry/opentelemetry-js#6296). But we'd like to use `proto` while the fix lands. - [x] **Existing dependencies:** The other exporters (`http` and `grpc`) provide similar functionality for other protocols. In theory, performance of each library is expected (sorted from best to worst) as `grpc`, `proto`, `http`. For this reason, we want to use `proto` over `http`. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [x] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary Bearing in mind the currently known existing bug in the gRPC exporter, we'd like to rely the `proto` exporter as recommended by the maintainers. ### Adding a new dependency checklist: - [x] **Purpose:** Use the official exporter for the `protobuf` protocol. - [x] **Justification:** It is the official way. I'm aware that the latest is `0.210.0`, but it should upgrade with the other deps in one go (we've seen some performance issues in the past when there were version mismatches). - [x] **Alternatives explored:** We've submitted a PR to fix the gRPC exporter (open-telemetry/opentelemetry-js#6296). But we'd like to use `proto` while the fix lands. - [x] **Existing dependencies:** The other exporters (`http` and `grpc`) provide similar functionality for other protocols. In theory, performance of each library is expected (sorted from best to worst) as `grpc`, `proto`, `http`. For this reason, we want to use `proto` over `http`. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [x] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary Bearing in mind the currently known existing bug in the gRPC exporter, we'd like to rely the `proto` exporter as recommended by the maintainers. ### Adding a new dependency checklist: - [x] **Purpose:** Use the official exporter for the `protobuf` protocol. - [x] **Justification:** It is the official way. I'm aware that the latest is `0.210.0`, but it should upgrade with the other deps in one go (we've seen some performance issues in the past when there were version mismatches). - [x] **Alternatives explored:** We've submitted a PR to fix the gRPC exporter (open-telemetry/opentelemetry-js#6296). But we'd like to use `proto` while the fix lands. - [x] **Existing dependencies:** The other exporters (`http` and `grpc`) provide similar functionality for other protocols. In theory, performance of each library is expected (sorted from best to worst) as `grpc`, `proto`, `http`. For this reason, we want to use `proto` over `http`. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [x] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
## Summary Bearing in mind the currently known existing bug in the gRPC exporter, we'd like to rely the `proto` exporter as recommended by the maintainers. ### Adding a new dependency checklist: - [x] **Purpose:** Use the official exporter for the `protobuf` protocol. - [x] **Justification:** It is the official way. I'm aware that the latest is `0.210.0`, but it should upgrade with the other deps in one go (we've seen some performance issues in the past when there were version mismatches). - [x] **Alternatives explored:** We've submitted a PR to fix the gRPC exporter (open-telemetry/opentelemetry-js#6296). But we'd like to use `proto` while the fix lands. - [x] **Existing dependencies:** The other exporters (`http` and `grpc`) provide similar functionality for other protocols. In theory, performance of each library is expected (sorted from best to worst) as `grpc`, `proto`, `http`. For this reason, we want to use `proto` over `http`. ### Checklist Check the PR satisfies following conditions. Reviewers should verify this PR satisfies this list as well. - [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) - [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) - [x] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed - [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) - [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. --------- Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Which problem is this PR solving?
This PR addresses an issue in which the gRPC exporter continuously fails with the error
DEADLINE_EXCEEDEDby forcing the recreation of the client after receiving that error 5 consecutive times.#6231 explains the scenario in which this can happen. And, while the recommended solution is to enable keepalives for long-running processes (grpc/grpc-node#2502), this is not recommended for Serverless environments (grpc/grpc-node#2710).
Given that the
DEADLINE_EXCEEDEDissue is resolved when the server or the client is restarted, this PR attempts the latter by closing and recreating the client.Fixes # (issue)While related to #6231 (and this bug motivated that issue), it doesn't fix the request of exposing the keepalive configuration that might be useful for other purposes.Short description of the changes
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Checklist: