Skip to content

discovery: enable by default on agents >= 7.78#2915

Open
Yumasi wants to merge 3 commits intomainfrom
guillaume.pagnoux/service-discovery-default-enable
Open

discovery: enable by default on agents >= 7.78#2915
Yumasi wants to merge 3 commits intomainfrom
guillaume.pagnoux/service-discovery-default-enable

Conversation

@Yumasi
Copy link
Copy Markdown
Member

@Yumasi Yumasi commented Apr 20, 2026

What does this PR do?

This PR enables Service Discovery by default on agents >=7.78, which will use the new System-Probe-Lite binary for low-resource overhead. The version is determined based on the image tag, and an unparseable tag is considered to be the latest version.

The change makes the distinction between default enablement, and an explicit one. This allows users of an agent <7.78 to still enable discovery if they truly wants it, via system-probe (much higher overhead).

Motivation

Enable Service Discovery by default.

Additional Notes

This PR uses the fallback mechanism of System-Probe to decide to start System-Probe-Lite. This is done in order to avoid a drift between the operator & the agent in the handling of the fallback decision (i.e we would have to maintain a list of system-probe feature, as proposed #2610, which could be easily missed when adding new features. Missing it could cause wrong behaviors like not enabling system-probe when we actually should for another feature, for example).

Using system-probe to control the fallback decision, like in bare-Linux & Docker installations, avoid this sort of issues, but this means that we will see a spike in memory consumption due to system-probe's size itself (~70MiB). Once the fallback to System-Probe-Lite happens, the memory use is expected to fallback to ~3.4MiB (measured in dogfooding).

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: v7.78.0

Describe your test plan

Fallback behaviour + system-probe container creation tested with Kind and the following cases:

Agent version Config scenario system-probe
container status
Running binary
7.78 Default configuration Present System-Probe-Lite
7.77 Default configuration Absent N/A
7.78 Discovery explicitly disabled Absent N/A
7.77 Discovery explicitly enabled Present System-Probe
7.78 Other system-probe feature enabled Present System-Probe

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@Yumasi Yumasi added this to the v1.27.0 milestone Apr 20, 2026
@Yumasi Yumasi requested a review from a team April 20, 2026 14:31
@Yumasi Yumasi added the enhancement New feature or request label Apr 20, 2026
@Yumasi Yumasi requested a review from a team as a code owner April 20, 2026 14:31
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ceec7f0ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/controller/datadogagent/defaults/datadogagent_default.go Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.94%. Comparing base (6e2c0c9) to head (ebe9bb9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2915      +/-   ##
==========================================
+ Coverage   40.08%   40.94%   +0.86%     
==========================================
  Files         320      320              
  Lines       28075    35329    +7254     
==========================================
+ Hits        11254    14466    +3212     
- Misses      16012    20055    +4043     
+ Partials      809      808       -1     
Flag Coverage Δ
unittests 40.94% <100.00%> (+0.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
internal/controller/datadogagent/ddai.go 63.15% <100.00%> (-2.95%) ⬇️
...ller/datadogagent/defaults/datadogagent_default.go 89.51% <ø> (+2.42%) ⬆️
...r/datadogagent/feature/servicediscovery/feature.go 88.63% <100.00%> (+9.46%) ⬆️

... and 298 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e2c0c9...ebe9bb9. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented Apr 20, 2026

Code Coverage

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 40.28% (+0.14%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ebe9bb9 | Docs | Datadog PR Page | Give us feedback!

Copy link
Copy Markdown
Member

@tbavelier tbavelier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we don't touch the defaults directory like this, and we move this logic instead to the feature itself in Configure:

func (f *serviceDiscoveryFeature) Configure(_ metav1.Object, ddaSpec *v2alpha1.DatadogAgentSpec, _ *v2alpha1.RemoteConfigConfiguration) (reqComp feature.RequiredComponents) {
. Would that be possible ? (I didn't do an in-depth review / tested if it would work, happy to be proven wrong of course)

@Yumasi Yumasi requested a review from a team as a code owner April 20, 2026 15:33
@Yumasi Yumasi force-pushed the guillaume.pagnoux/service-discovery-default-enable branch from f8d22aa to a6d4a92 Compare April 20, 2026 16:07
@Yumasi Yumasi force-pushed the guillaume.pagnoux/service-discovery-default-enable branch from a6d4a92 to 3ff8d0b Compare April 20, 2026 16:28
@Yumasi
Copy link
Copy Markdown
Member Author

Yumasi commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ff8d0b87f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +82 to +83
if version := common.GetAgentVersionFromImage(*nodeAgent.Image); version != "" {
return version
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve the inherited tag for name-only image overrides

When override.nodeAgent.image.name is set to a bare image name such as agent and tag is omitted, the pod override path preserves the inherited/default tag, but this branch gets latest from common.GetAgentVersionFromImage and never falls back to images.AgentLatestVersion. Since shouldEnableServiceDiscoveryByDefault treats unparseable versions as supported, clusters still running the current default Agent tag below 7.78 will unexpectedly get system-probe and DD_DISCOVERY_ENABLED just because they used the documented name-only image override; resolve the version from the actual assembled/overridden image or ignore the synthetic latest tag here.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@Yumasi Yumasi Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want service discovery defaulting to follow the effective node-agent image version, not only explicitly pinned tags. So if no version is provided, including partial image overrides, we intentionally inherit the default agent version. Unparseable provided tags are also treated as latest by design.
Added a comment + tests to ensure this is the intended behaviour.

@Yumasi Yumasi force-pushed the guillaume.pagnoux/service-discovery-default-enable branch from a45a54f to ebe9bb9 Compare April 22, 2026 08:42
@Yumasi Yumasi requested a review from tbavelier April 24, 2026 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants