Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ceec7f0ad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2915 +/- ##
==========================================
+ Coverage 40.08% 40.94% +0.86%
==========================================
Files 320 320
Lines 28075 35329 +7254
==========================================
+ Hits 11254 14466 +3212
- Misses 16012 20055 +4043
+ Partials 809 808 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 298 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
🎯 Code Coverage (details) 🔗 Commit SHA: ebe9bb9 | Docs | Datadog PR Page | Give us feedback! |
f8d22aa to
a6d4a92
Compare
a6d4a92 to
3ff8d0b
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ff8d0b87f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if version := common.GetAgentVersionFromImage(*nodeAgent.Image); version != "" { | ||
| return version |
There was a problem hiding this comment.
Preserve the inherited tag for name-only image overrides
When override.nodeAgent.image.name is set to a bare image name such as agent and tag is omitted, the pod override path preserves the inherited/default tag, but this branch gets latest from common.GetAgentVersionFromImage and never falls back to images.AgentLatestVersion. Since shouldEnableServiceDiscoveryByDefault treats unparseable versions as supported, clusters still running the current default Agent tag below 7.78 will unexpectedly get system-probe and DD_DISCOVERY_ENABLED just because they used the documented name-only image override; resolve the version from the actual assembled/overridden image or ignore the synthetic latest tag here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
We want service discovery defaulting to follow the effective node-agent image version, not only explicitly pinned tags. So if no version is provided, including partial image overrides, we intentionally inherit the default agent version. Unparseable provided tags are also treated as latest by design.
Added a comment + tests to ensure this is the intended behaviour.
a45a54f to
ebe9bb9
Compare
What does this PR do?
This PR enables Service Discovery by default on agents >=7.78, which will use the new System-Probe-Lite binary for low-resource overhead. The version is determined based on the image tag, and an unparseable tag is considered to be the latest version.
The change makes the distinction between default enablement, and an explicit one. This allows users of an agent <7.78 to still enable discovery if they truly wants it, via system-probe (much higher overhead).
Motivation
Enable Service Discovery by default.
Additional Notes
This PR uses the fallback mechanism of System-Probe to decide to start System-Probe-Lite. This is done in order to avoid a drift between the operator & the agent in the handling of the fallback decision (i.e we would have to maintain a list of system-probe feature, as proposed #2610, which could be easily missed when adding new features. Missing it could cause wrong behaviors like not enabling system-probe when we actually should for another feature, for example).
Using system-probe to control the fallback decision, like in bare-Linux & Docker installations, avoid this sort of issues, but this means that we will see a spike in memory consumption due to system-probe's size itself (~70MiB). Once the fallback to System-Probe-Lite happens, the memory use is expected to fallback to ~3.4MiB (measured in dogfooding).
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
Fallback behaviour + system-probe container creation tested with Kind and the following cases:
container status
Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel