Commit 0b4a07d

pavolloffay and mx-psi authored
Add Collector Model Context Protocol (MCP) project proposal (#3128)
* Add Model Context Protocol (MCP) project proposal

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>

* Add engineers and sponsors

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>

* Remove repository

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>

* Apply suggestion from @mx-psi

---------

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
1 parent e2a4401 commit 0b4a07d

2 files changed: +150 -0 lines changed

.cspell.yaml

Lines changed: 6 additions & 0 deletions
@@ -18,11 +18,13 @@ ignoreRegExpList:
  - GitHub Handle in YML
  words:
  - Abinet
+ - agentic
  - Alain
  - Alff
  - Arize
  - Aronoff
  - Ashpole
+ - austinlparker
  - automations
  - Baeyens
  - calendar-localization-ptbr
@@ -195,6 +197,7 @@ words:
  - mjwolf
  - mkorbi
  - molkova
+ - mottibec
  - msomasu
  - Morgado
  - MSYS
@@ -218,6 +221,7 @@ words:
  - opentelemetrybot
  - ossf
  - otel
+ - otelcol
  - otel-agentmanwg
  - otel-comms
  - otel-ebpf
@@ -232,6 +236,7 @@ words:
  - Prometheus
  - paixão
  - pająk
+ - pavolloffay
  - passcodes
  - poncelow
  - proto
@@ -259,6 +264,7 @@ words:
  - severin
  - sguyon
  - sharma
+ - shiftyp
  - shkuro
  - sigelman
  - signup

projects/agentic-workflow.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# OpenTelemetry Collector Agentic Workflows

## Background and description

The OpenTelemetry project consists of a large number of components, including the Collector, SDKs, and instrumentation libraries, which are often configured and managed separately. This distribution of components poses a major operational challenge that is widely recognized by the community [1](https://opentelemetry.io/blog/2025/otel-rocks/), [2](https://www.youtube.com/watch?v=xEu8_Aeo_-o).

Large language models (LLMs) and Agentic Workflows present a significant opportunity to simplify the adoption, implementation, and management of the OpenTelemetry stack. An AI agent could, for example, facilitate configuration changes, resolve deployment issues, or assist with and simplify the instrumentation process.

At the moment, the OpenTelemetry project does not have official support for these workflows. This has led to the creation of several independent, open-source projects (MCP servers) to fill the gap.
The [Can AI instrument OpenTelemetry](https://quesma.com/benchmarks/otel/) benchmark demonstrates the complexity of the instrumentation process and highlights how difficult it still is to use AI agents successfully with OpenTelemetry.

As AI tooling becomes a standard part of developer workflows, users who want to extend their agents with tooling optimized for OpenTelemetry have no easy way to discover what's available in the ecosystem. There's no central place to learn which MCP servers or other tools exist, what capabilities they offer, or where to file issues or requests.

This project is also motivated by the need to support the [Stability Proposal](https://opentelemetry.io/blog/2025/stability-proposal-announcement/) and [[Graduation] OpenTelemetry Graduation Application](https://github.com/cncf/toc/issues/1739). While the [OTEP: Stable by Default](https://github.com/open-telemetry/opentelemetry-specification/pull/4813) initiative aims to default to stable components, a large portion of the ecosystem, including the majority of Collector components, remains in alpha or beta, creating complexity for users around potential breaking changes. This project aims to bridge this gap without adding core functionality or duplicating documentation. Instead, it focuses on making OpenTelemetry easier to use and more stable by enriching the ecosystem with new agentic workflows.

### Existing OpenTelemetry MCP Servers

The proliferation of the following projects demonstrates strong community interest and the clear potential of this technology:

* [open-telemetry/weaver](https://github.com/open-telemetry/weaver): MCP server for OpenTelemetry Weaver.
* [pavolloffay/opentelemetry-mcp-server](https://github.com/pavolloffay/opentelemetry-mcp-server): Focuses on collector configuration.
* [austinlparker/otel-mcp](https://github.com/austinlparker/otel-mcp): Handles collector configuration and data profiling.
* [mottibec/otelcol-mcp](https://github.com/mottibec/otelcol-mcp): Focuses on collector configuration.
* [shiftyp/otel-mcp-server](https://github.com/shiftyp/otel-mcp-server): Provides data profiling, but requires OpenSearch.
* [liatrio-labs/otel-instrumentation-mcp](https://github.com/liatrio-labs/otel-instrumentation-mcp): Manages instrumentation.
* [traceloop/opentelemetry-mcp-server](https://github.com/traceloop/opentelemetry-mcp-server): Provides data profiling by connecting to Jaeger, Tempo, and Traceloop.

Each of these servers uses a different approach, particularly for collector configuration and data profiling.
This fragmentation creates confusion for users regarding installation and configuration. Furthermore, using multiple competing tools is inefficient because their overlapping functionality consumes the agent's context window.

### Current challenges

Adopting OpenTelemetry presents several significant challenges. Many users lack deep observability expertise, and enabling observability is often treated as an afterthought.

The sheer size and velocity of the OpenTelemetry ecosystem add to this difficulty. The project encompasses instrumentation for over 12 languages and includes diverse components like the Collector, OpAMP, and Weaver. Each component is released independently with its own setup requirements and release schedule. For example, the Collector is released every two weeks, while auto-instrumentation libraries follow different schedules.

Maintenance is also complex. The ecosystem evolves rapidly, introducing frequent breaking changes. Our analysis of the Collector changelogs indicates that approximately 29% of changes are breaking. Keeping up with these updates requires significant manual effort to review release notes, update configuration files, and modify code.

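The proposal does not describe how the breaking-change share was derived. As a rough illustration of how such a figure could be computed, the sketch below counts bullet entries per section of a changelog and reports the fraction listed under a breaking-changes heading. The sample changelog, its heading names, and the function name `breaking_share` are assumptions made for this example, not the project's actual analysis or data.

```python
import re

# Hypothetical changelog excerpt; real Collector changelogs may be structured differently.
SAMPLE_CHANGELOG = """\
## v0.0.2

### Breaking changes

- exporter/foo: rename `endpoint` to `address`

### Enhancements

- receiver/bar: add retry support
- processor/baz: reduce allocations
- exporter/foo: add compression option

## v0.0.1

### Breaking changes

- receiver/bar: remove deprecated `legacy_mode` option

### Bug fixes

- exporter/foo: fix panic on empty batch
- processor/baz: handle nil attributes
- receiver/bar: close listener on shutdown
"""


def breaking_share(changelog: str) -> float:
    """Return the fraction of bullet entries listed under a breaking-changes heading."""
    section = None
    total = 0
    breaking = 0
    for line in changelog.splitlines():
        heading = re.match(r"^###\s+(.*)$", line)
        if heading:
            # A third-level heading starts a new category of changes.
            section = heading.group(1).strip().lower()
            continue
        if line.startswith("## "):
            # A new release starts; forget the previous category.
            section = None
            continue
        if line.startswith("- "):
            total += 1
            if section is not None and "breaking" in section:
                breaking += 1
    return breaking / total if total else 0.0


if __name__ == "__main__":
    print(f"{breaking_share(SAMPLE_CHANGELOG):.0%} of entries are breaking")
```

Counting entries is only one possible definition of the share; weighting by component or by release would give different numbers.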
## Project Scope and Architecture

The scope of this project is to enable **Agentic Workflows** that simplify deployment, configuration, and day-2 operations for the OpenTelemetry Collector.
Additional components (e.g., SDKs, instrumentation, semantic conventions) could be added in a phased approach in the future.

To support this workflow, Agents and LLMs need a standardized interface to interact with the OpenTelemetry ecosystem. The project will focus on the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) and [Agent Skills](https://agentskills.io/home) concepts to provide this interface.

The goal of this project is to deliver an initial implementation of MCP server(s) and/or Agent Skills for the OpenTelemetry Collector in coordination with the Collector SIG.

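To make the idea of a standardized interface more concrete, here is a minimal sketch of how a Collector-oriented tool could be described and invoked over MCP. MCP messages are JSON-RPC 2.0, and a tool is advertised with a `name`, a `description`, and a JSON Schema `inputSchema`. The tool `validate_collector_config`, its handler, and the in-process dispatcher are hypothetical and only illustrate the message shapes; the sketch leaves out the initialization handshake and the transport, and it is not the design this project will deliver.

```python
import json

# Hypothetical tool descriptor. The field names (name, description, inputSchema)
# follow the MCP tool definition; the tool itself is invented for illustration.
VALIDATE_TOOL = {
    "name": "validate_collector_config",
    "description": "Check a Collector configuration for undefined pipeline components.",
    "inputSchema": {
        "type": "object",
        "properties": {"config_yaml": {"type": "string"}},
        "required": ["config_yaml"],
    },
}


def run_validate(arguments: dict) -> str:
    """Placeholder handler: a real tool would parse and validate the config."""
    size = len(arguments.get("config_yaml", ""))
    return f"received {size} characters of configuration"


HANDLERS = {VALIDATE_TOOL["name"]: run_validate}


def error_response(request: dict, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response for the given request."""
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": code, "message": message},
    }


def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC 2.0 request for the two tool-related MCP methods."""
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": [VALIDATE_TOOL]}
    elif method == "tools/call":
        params = request.get("params", {})
        handler = HANDLERS.get(params.get("name"))
        if handler is None:
            return error_response(request, -32602, "unknown tool")
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error_response(request, -32601, "method not found")
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "validate_collector_config",
            "arguments": {"config_yaml": "receivers: {}"},
        },
    }
    print(json.dumps(handle_request(call), indent=2))
```

An agent would first call `tools/list` to discover the tool and its input schema, then send `tools/call` with matching arguments.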
### Goals, objectives, and requirements

#### Collector

The Collector follows a fast two-week release cadence, which requires constant maintenance to stay up to date and avoid breaking changes. Additionally, configuring the Collector correctly and writing valid OTTL statements are important for effective usage, but they require domain expertise and aren't always trivial. General-purpose coding agents struggle here because they lack up-to-date knowledge of recent releases and aren't specialized for Collector workflows.

* Enable agents to read and write valid Collector configuration.
* Enable agents to handle breaking API changes (e.g., deprecations, removals, renaming) in the configuration and the Collector Go API.
* Enable agents to upgrade the Collector.
* Enable agents to write valid OpenTelemetry Transformation Language (OTTL).
* Enable agents to troubleshoot Collector issues.

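As one small example of what checking configuration validity can involve, the sketch below verifies that every component referenced in `service.pipelines` is also defined in the matching top-level section, treating connectors as usable on either end of a pipeline. It works on an already-parsed configuration (a plain dictionary) so that it needs no YAML library. The function name `undefined_pipeline_components` and the example configuration are assumptions for illustration; the Collector's own validation covers far more than this single rule.

```python
def undefined_pipeline_components(config: dict) -> list[str]:
    """List pipeline references that have no matching top-level definition."""
    connectors = set(config.get("connectors") or {})
    problems = []
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = set(config.get(kind) or {})
            if kind != "processors":
                # Connectors may act as a receiver or an exporter in a pipeline.
                defined |= connectors
            for component_id in pipeline.get(kind) or []:
                if component_id not in defined:
                    problems.append(
                        f"{pipeline_name}: {kind[:-1]} '{component_id}' is not defined"
                    )
    return problems


# Hypothetical parsed configuration: the traces pipeline references a
# memory_limiter processor that is never defined at the top level.
EXAMPLE_CONFIG = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch", "memory_limiter"],
                "exporters": ["debug"],
            }
        }
    },
}

if __name__ == "__main__":
    for problem in undefined_pipeline_components(EXAMPLE_CONFIG):
        print(problem)
```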
The goals listed above might require enhancements in the Collector repositories. We expect to improve the documentation, as it is the primary source for building skills and a knowledge base for the agents.
Another example is improvements to the Collector configuration schema, which the Collector SIG is already working on.

#### Documentation and distribution

Coherent documentation and distribution of the agentic workflows are required to enable users to install and manage them.

* Introduce documentation for the Agentic Workflows.
* Align distribution and installation of the components with the Agentic Workflows.
* Agentic workflow documentation will be part of the existing [OpenTelemetry documentation](https://opentelemetry.io/docs/) and will not duplicate any existing content.

### Non-Goals

* The project will not implement any telemetry backends.
* The project will not maintain a separate documentation knowledge base; it will leverage existing OpenTelemetry documentation.

## Deliverables

The following deliverables can change based on project progress, community feedback, and validation of the agentic workflows.
They are listed in the priority order assigned by the project team.

* MCP server or agentic skill to facilitate deployment, configuration, and day-2 operations of the Collector.
* MCP server or agentic skill to troubleshoot Collector issues.

## Staffing / Help Wanted

This project requires a blend of expertise in the OpenTelemetry Collector, documentation, and instrumentation, as well as in building MCP server(s).

### SIG

This effort will be hosted in the existing Collector SIG.

Sponsors for this effort are:

* [@dmitryax](https://github.com/dmitryax) (Splunk)
* [@codeboten](https://github.com/codeboten) (Honeycomb)

### Required staffing

#### Project Lead(s)

* [@pavolloffay](https://github.com/pavolloffay) (Red Hat)
* [@niwoerner](https://github.com/niwoerner) (OllyGarden)

#### GC Liaison

Existing Collector SIG liaison.

#### Engineers

* [@adrielp](https://github.com/adrielp)
* [@shiftyp](https://github.com/shiftyp)
* [@johannaojeling](https://github.com/johannaojeling)
* [@vitorvasc](https://github.com/vitorvasc)
* [@nr-nfajardo](https://github.com/nr-nfajardo)

#### Other Staffing

### Industry outreach (Optional)

The following users have built OpenTelemetry MCP servers:

* [@austinlparker](https://github.com/austinlparker) - author of [otel-mcp](https://github.com/austinlparker/otel-mcp)
* [@mottibec](https://github.com/mottibec) - author of [otelcol-mcp](https://github.com/mottibec/otelcol-mcp)
* [@shiftyp](https://github.com/shiftyp) - author of [otel-mcp-server](https://github.com/shiftyp/otel-mcp-server)

There will be an [OpenTelemetry MCP call for contributors post](https://github.com/open-telemetry/opentelemetry.io/pull/8629) to promote the project.

## Expected Timeline

This timeline assumes project approval and resource allocation as outlined in the staffing section. Until staffing is confirmed and expected time commitments are known, this timeline is in flux.

## Labels

`agentic-workflow`, `mcp` for all PRs and issues related to this project.

## GitHub Project (Post-Approval)

TBD

## SIG Meetings, Roadmap, and Other Info (Post-Approval)

All communication will be done in the existing Collector SIG.
