Skip to content

feat: OpenTelemetry context activation POC #202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: v0.1.x
Choose a base branch
from

Conversation

bantonsson
Copy link

TL;DR

This PR is opened as a basis for discussion around providing a more seamless interoperability for users of the OpenTelemetry APIs when using frameworks and libraries that use tracing at their core.

An OpenTelemetry Context that mirrors the active tracing Span is now active on the executing thread while in a tracing Span. Having proper OpenTelemetry context activation will not only make interoperability easier for users of the OpenTelemetry API, but will also open up the possibility for simpler log/trace/baggage correlation and in the future profiling/trace correlation among other things.

Motivation

The aim is to provide users of the OpenTelemetry API a more seamless and consistent experience. One such thing is that calling the OpenTelemetry API Context::current() did not give you a Context that contained an OpenTelemetry Span that represented the current tracing Span. Another thing is propagation of Baggage or user defined types that were not picked up either.

A lot of effort has been put into making the existing OpenTelemetrySpanExt API work the same way to maintain maximum backwards compatibility and to avoid fragmentation by building a separate OpenTelemetry and Tokio Tracing bridge.

Long discussion about OpenTelemetry and Tokio Tracing interoperability, and an issue specifically discussing this POC.

Solution

This solution adds a new feature activate_context that will activate/deactivate an OpenTelemetry Context that mirrors the current tracing Span. This feature is currently on by default, but can be turned off to avoid any performance overhead incurred by the context activation and bookkeeping.

The second big change is that the OpenTelemetry SpanBuilder is lazily consumed and a real OpenTelemetry Span is created when necessary, so there is no longer any need for the PreSampledTracer and the special code in OpenTelemetry that it relies on. This has some implications, like you can no longer set the parent_context after you have entered a Span or called the context method on the Span.

Benchmark numbers are available in this comment, and when the activate_context feature is turned off, the performance stays the same. (There has been one more optimization since the last run there, fixing the regression in the many_events benchmarks)

@djc
Copy link
Collaborator

djc commented May 6, 2025

I have been passively maintaining tracing-opentelemetry (mostly by reviewing relatively small PRs) for a few months, but I don't currently have a use case for this crate. Unless some funding for doing this work materializes, I'm not motivated to review larger efforts like this, so I suggest the OpenTelemetry and/or tracing organizations find some other people to move this effort forward.

@bantonsson bantonsson force-pushed the ban/tracing-otel-context-updated branch from ce5421a to 51e471b Compare May 7, 2025 10:56
Copy link
Collaborator

@mladedav mladedav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look at everything (including the previous discussions we've had in the other repo) later, but for now, I think the most useful thing would be to get rid of the feature and possibly other unrelated changes (maybe the patched dependencies?) to make it easier to understand which parts here are left because they are needed and which parts are here just for compatibility with the old way.

Generally, I think this would be a step in the right direction, though I'd still want to explore the possibility of providing parents at span creation time and hopefully starting the context then, which would be closer to what opentelemetry specifies.

.links
.get_or_insert_with(|| Vec::with_capacity(1))
.push(follows_link);
if let Some(builder) = data.builder.as_mut() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to add links after the context is built.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I'll add a test as well.

Comment on lines 66 to -77
impl PreSampledTracer for SdkTracer {
fn sampled_context(&self, data: &mut crate::OtelData) -> OtelContext {
let parent_cx = &data.parent_cx;
let builder = &mut data.builder;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no longer any need for the PreSampledTracer and the special code in OpenTelemetry that it relies on.

If it's not needed anymore, are these changes here because you want to support this as well as the old behavior through the feature?

I think we should really get rid of the feature. For the purpose of this PoC it just adds unnecessary changes and it's not clear whether for example this is or is not needed.

Even outside of the PoC I don't think we'd want the feature, because it adds maintenance burden with the justification being avoiding performance, but later you wrote there's no performance difference between the feature being on and off, so I'd say the feature might just cause confusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regards to the performance, what I meant to say with

Benchmark numbers are available in this comment, and when the activate_context feature is turned off, the performance stays the same. (There has been one more optimization since the last run there, fixing the regression in the many_events benchmarks)

Is that when the activate_context feature is turned off, the performance is the same as before the POC.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not removed the PreSampledTracer yet, but it is not needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does sampling work when the feature is turned off? That would have to use the same mechanism as before wouldn't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So regardless of the activate_context feature, the OTel SpanBuilder is consumed when necessary and the created OTel Span is inserted into an OTel Context that is stored with the tracing Span.

} else {
data.builder.links = Some(vec![follows_link]);
data.parent_cx.span().add_link(follows_context, vec![]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding the link to the parent context instead of the current context?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the names are confusing. This field is the parent_context while the SpanBuilder is being used, and then it becomes the current_context. The name should be changed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just change the struct into an enum so that we can also get rid of the options and make it clear when are things expected to be present and what they are. Now there's for example also an end time inside the builder, but also one in the struct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end_time in the builder API should be removed open-telemetry/opentelemetry-rust#2746

/// A span's parent should only be set _once_, for the purpose described above.
/// Additionally, once a span has been fully built - and the SpanBuilder has been consumed -
/// the parent _cannot_ be mutated.
///
fn set_parent(&self, cx: Context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return a Result to signal whether this actually did something or not. I get that it's technically a breaking change and you tried to keep the api identical, but unless someone actually did let () = span.set_parent(...); which I highly doubt, or someone implemented this trait for their own type (which I doubt and they probably should not have done so either), users will get at most a warning about an unused Result.

Other methods that behave differently based on whether the context was already started or not should do the same thing, but I don't think anything else has this kind of behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's a valid point.

@bantonsson
Copy link
Author

Thanks for the comments @mladedav. I think that getting rid of the feature is a bit risky, since these changes do have performance implications, and it would be good to allow current users to continue.

I'll rebase the PR and try to clean it up a bit.

bantonsson and others added 7 commits June 17, 2025 17:28
Right now this is completely focused on the Spans but there could be a
feature that only propagates the context as well.
Ensure that the we materialize the span and activate the context when accessing the span. This ensures the correct parent child relationships.
@bantonsson bantonsson force-pushed the ban/tracing-otel-context-updated branch from 51e471b to f5c70c3 Compare June 17, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants