Share Span/Trace ID with other eBPF based components#1139
Share Span/Trace ID with other eBPF based components#1139christos68k merged 37 commits intoopen-telemetry:mainfrom
Conversation
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
This comment was marked as outdated.
This comment was marked as outdated.
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
| UnwindState *state = &record->state; | ||
|
|
||
| maybe_add_apm_info(trace); | ||
| maybe_add_otel_span_trace_id(trace); |
There was a problem hiding this comment.
This should take into account maybe_add_apm_info. For example, if maybe_add_apm_info completes successfully then we should not be executing maybe_add_otel_span_trace_id as it could be redundant and will overwrite previous data.
We should also decide which source is the highest-fidelity one in order to resolve such conflicts: is it APM since its data comes from SDK instrumentation and is expected to fail less often?
There was a problem hiding this comment.
Instead of adding complexity, I suggest moving forward with #1153.
maybe_add_apm_info implements a vendor specific option that is not adapted in the OTel community.
There was a problem hiding this comment.
We can treat maybe_add_apm_info as a placeholder for the upcoming OTel thread context specification: the exact same conflict I described here will be present, so while we're still at the step of discussing/reviewing specifications it's good to take that into account in order to come up with a better solution.
Even if we don't plan to merge this PR now (as the OTEP isn't final yet), it's still useful to discuss.
There was a problem hiding this comment.
if maybe_add_apm_info is kept, imho it should be higher fidelity as those info are set directly by the target process/sdk (so it should know better)
There was a problem hiding this comment.
We still can continue with #1153 - its ideas remain in the git history and are not lost by removing apmint. It also leaves the room of discussion which data source is more accurate and should be preferred. From a OTel perspective, apmint is not relevant as it is vendor specific and just adds complexity - both in eBPF space but also in the user space counterpart.
| UnwindState *state = &record->state; | ||
|
|
||
| maybe_add_apm_info(trace); | ||
| maybe_add_otel_span_trace_id(trace); |
There was a problem hiding this comment.
Since this function is OBI-specific maybe we should rename it to reflect OBI for now. There will be multiple OTEL specifications for sharing trace/span ids.
There was a problem hiding this comment.
OBI just happens to be the first to implement the specification open-telemetry/opentelemetry-specification#4855. Other OTel components, like OTel network, are also interested in this approach and plan to implement it.
As the implementation also is based on the OTel proposed specification and not OBI details, I suggest to not make it OBI specific.
There was a problem hiding this comment.
But the specification is OBI-specific, it doesn't work without OBI as that's the source of the data. There is another OTel specification under discussion that is not OBI-specific. A function maybe_add_otel_span_trace_id that only refers to the first but not the latter is confusing so we should do a better job of disambiguating.
There was a problem hiding this comment.
This specification is not OBI-specific, as this map also can be populated by OTel network. Profiling is just the beneficator of data that is provided by these other OTel components.
The required precedence of data source accuracy must be defined and documented once the discussion evolves into a specification, or even earlier, as an integral part of the ongoing discussion.
At the current state, I think, this should not block going forward with this approach.
There was a problem hiding this comment.
I still want to see the fidelity/priority issue clarified before merging this PR so I'll wait until the specification is approved to revisit this point here.
There was a problem hiding this comment.
@christos68k with 8458283 I updated the logic, to only use the OTel span/trace ID if span/trace ID are not yet set. Please let me know if this works for you.
Co-authored-by: Mattia Meleleo <melmat@tuta.io>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
christos68k
left a comment
There was a problem hiding this comment.
For context: @mmat11 reached out to me recently and mentioned that the currently open OTEP PR will be closed.
The specification for sharing this information will be moved to the OBI repository.
This means that we can move forward with the integration here and don't need to wait for the specification to be approved any more.
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
christos68k
left a comment
There was a problem hiding this comment.
Final batch of specification link fixes
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: Roger Coll <roger.coll@elastic.co>
Just wanted to share the PoC for open-telemetry/opentelemetry-specification#4855.
Not sure it should be merged at this point as the specification might still change.
Using open-telemetry/opentelemetry-ebpf-instrumentation#1184 and this PR will make sure that components share span/trace IDs - while the profiler is just a consumer and OBI produces these IDs.
A configuration option for BPF FS is used, as not all deployments use the same (default) location.