@tbavelier
Member

What does this PR do?

This PR makes the DDAI profile merge logic resilient to outdated DDA/DDAI CRDs. When the local SSA merge (managedfields.FieldManager.Apply) fails with a "field not declared in schema" error, we extract the offending dotted field path from the error, remove that field from the merge inputs, and retry the merge. This prevents reconciliation from failing while preserving all other supported fields. A unit test is added to validate the behavior when the installed CRD schema is missing a newly introduced field.

Motivation

  • Users can upgrade the operator without upgrading the CRDs, and in DDAI+Profiles mode we rely on the installed CRD schema to perform a local SSA merge. If the CRD is older than the operator types, the merge fails hard with "field not declared in schema" (e.g. .spec.features.cws.enforcement) and blocks reconciliation. Since the apiserver would prune those fields on older CRDs anyway, stripping them and retrying is a safe backward-compatibility mechanism. This avoids upgrade foot-guns and keeps reconciliation working even when CRD updates lag behind operator updates.
    • This can happen for users running the latest image, which is always pulled, or for users who do not deploy the updated Helm chart or other manifests
    • It can also happen internally on our nightly clusters

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  1. Deploy the operator Helm chart with an old CRD (e.g. datadog-operator-2.17.0-dev.3), with DDAI and profiles enabled
  2. Apply a compatible DDA manifest, for instance with cws.enabled=true
  3. It should reconcile without issues
  4. Update only the operator deployment to a newer version (prior to this PR) that includes a new CRD subfield, e.g. cws.enforcement ([CWS] add enforcement parameter #2465), for instance by editing the image field directly with kubectl edit
  5. You should see a reconcile error in the DatadogAgent status once the lease is acquired, along with logs about the missing field:
    {"level":"ERROR","ts":"2026-01-13T08:23:28.290Z","msg":"Reconciler error","controller":"datadogagent","controllerGroup":"datadoghq.com","controllerKind":"DatadogAgent","DatadogAgent":{"name":"datadog-agent","namespace":"datadog-agent"},"namespace":"datadog-agent","name":"datadog-agent","reconcileID":"f60073e3-dc40-4743-a862-17fd5eea41ad","error":"failed to apply merge: failed to create manager for existing fields: failed to convert new object (datadog-agent/datadog-agent; datadoghq.com/v1alpha1, Kind=DatadogAgentInternal) to smd typed: .spec.features.cws.enforcement: field not declared in schema","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"}
    {"level":"INFO","ts":"2026-01-13T08:23:31.315Z","logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.features.cws.enforcement\""}
  6. Update the operator image once again, this time to a build that includes this PR. The reconcile error should be gone
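When checking steps 5 and 6 against a captured log stream, the schema error can be picked out of the JSON log lines mechanically. The following is a rough sketch only: the helper name undeclaredFieldFromLog and the logEntry struct are hypothetical, the sample ERROR line is a shortened version of the one shown in step 5, and only the level, msg, and error keys visible in that log are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds the subset of log keys this check needs.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
	Error string `json:"error"`
}

const schemaErrSuffix = ": field not declared in schema"

// undeclaredFieldFromLog returns the field path reported by an ERROR log line
// whose error ends with the schema message, or "" if the line is unrelated.
func undeclaredFieldFromLog(line string) (string, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return "", err
	}
	if e.Level != "ERROR" || !strings.HasSuffix(e.Error, schemaErrSuffix) {
		return "", nil
	}
	msg := strings.TrimSuffix(e.Error, schemaErrSuffix)
	// The path is the last space-separated token before the suffix.
	return msg[strings.LastIndex(msg, " ")+1:], nil
}

func main() {
	lines := []string{
		// Shortened form of the "Reconciler error" line from step 5.
		`{"level":"ERROR","msg":"Reconciler error","error":"failed to apply merge: failed to create manager for existing fields: failed to convert new object to smd typed: .spec.features.cws.enforcement: field not declared in schema"}`,
		// The apiserver warning is informational and is ignored by this check.
		`{"level":"INFO","logger":"KubeAPIWarningLogger","msg":"unknown field \"spec.features.cws.enforcement\""}`,
	}
	for _, l := range lines {
		path, err := undeclaredFieldFromLog(l)
		if err != nil {
			fmt.Println("unparseable line:", err)
			continue
		}
		if path != "" {
			fmt.Println("schema error for field:", path)
		}
	}
}
```

After step 6, running the same check over fresh operator logs should report no schema errors.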

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@tbavelier tbavelier added this to the v1.23.0 milestone Jan 13, 2026
@tbavelier tbavelier requested a review from a team as a code owner January 13, 2026 10:44
@tbavelier tbavelier added the bug Something isn't working label Jan 13, 2026
@codecov-commenter

codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 88.73239% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.34%. Comparing base (f2d7517) to head (7e4a43a).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/datadogagent/merge.go 88.73% 4 Missing and 4 partials ⚠️

@@            Coverage Diff             @@
##             main    #2484      +/-   ##
==========================================
+ Coverage   37.98%   38.34%   +0.35%     
==========================================
  Files         298      300       +2     
  Lines       25029    26122    +1093     
==========================================
+ Hits         9508    10016     +508     
- Misses      14796    15357     +561     
- Partials      725      749      +24     
Flag Coverage Δ
unittests 38.34% <88.73%> (+0.35%) ⬆️


Files with missing lines Coverage Δ
internal/controller/datadogagent/merge.go 83.67% <88.73%> (+19.38%) ⬆️

... and 14 files with indirect coverage changes

