Skip to content

[CONTP-1502] chore: Refresh secrets on reconcile instead of callbacks#2916

Open
Mathew-Estafanous wants to merge 8 commits intomainfrom
mathew.estafanous/secrets-refresh-refactor
Open

[CONTP-1502] chore: Refresh secrets on reconcile instead of callbacks#2916
Mathew-Estafanous wants to merge 8 commits intomainfrom
mathew.estafanous/secrets-refresh-refactor

Conversation

@Mathew-Estafanous
Copy link
Copy Markdown
Contributor

@Mathew-Estafanous Mathew-Estafanous commented Apr 20, 2026

What does this PR do?

Replaces the callback-based secret refresh system with per-reconcile credential resolution. Each reconciler now calls CredentialManager.GetAuth() at the start of every reconcile to get a fresh authentication context with the latest credentials, instead of registering callbacks that recreate API clients when credentials change.

The API clients are stateless HTTP wrappers that don't hold credentials, so they no longer need to be recreated. API URL parsing has been moved into CredentialManager using sync.Once for lazy initialization, and the datadogclient package is now a simple factory for stateless API clients.

Motivation

The previous callback system introduced unnecessary complexity. Every reconciler had to implement UpdateDatadogClient(), register a callback at setup time, and recreate entire API clients on credential change — even though only the auth context needed refreshing.

This is a reworked version of #2312, adjusted for the latest version of the codebase.

Describe how you validated your changes

Unit and integration tests were updated to reflect the new API surface. Test_refresh now validates cache updates instead of callback invocations, and TestReconciler_UpdateDatadogClient tests were removed from all four controller packages since the method no longer exists.

Manual QA steps:

  1. Modified config/manager/manager.yaml to mount a secret backend script via a ConfigMap volume and pass --secretBackendCommand and --secretRefreshInterval=30s flags to the operator. Set DD_API_KEY and DD_APP_KEY env vars to ENC[api-key] and ENC[app-key].
manager.yaml diff

diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index f5fea85f..346c3a6f 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -33,15 +33,28 @@ spec:
             "metrics": ["*"]
           }]
     spec:
+      volumes:
+        - name: secret-script
+          configMap:
+            name: secret-backend-script
+            defaultMode: 0755
       containers:
       - command:
         - /manager
         args:
         - --enable-leader-election
         - --pprof
+        - --secretBackendCommand=/usr/local/bin/secret/secret_backend.sh
+        - --secretRefreshInterval=20s
+        - --datadogMonitorEnabled=true
         image: controller:latest
         imagePullPolicy: IfNotPresent
         name: manager
+        volumeMounts:
+          - name: secret-script
+            mountPath: /usr/local/bin/secret/secret_backend.sh
+            subPath: secret_backend.sh
+
         env:
         - name: DD_HOSTNAME
           valueFrom:
@@ -61,6 +74,10 @@ spec:
               fieldPath: metadata.namespace
         - name: DD_TOOL_VERSION
           value: "datadog-operator"
+        - name: DD_API_KEY
+          value: "ENC[api-key]"
+        - name: DD_APP_KEY
+          value: "ENC[app-key]"
         resources:
           limits:
             cpu: 100m

  1. Created a ConfigMap containing a shell script that rotates between two valid API/APP key pairs every minute based on the current time.
secrets_backend.sh script

apiVersion: v1
kind: ConfigMap
metadata:
  name: secret-backend-script
  namespace: system
data:
  secret_backend.sh: |
    #!/bin/bash

    # Secret backend script that rotates between two API/APP key pairs.
    # Replace the placeholder values below with real Datadog API and APP keys.
    #
    # The script rotates every minute based on the current time, cycling
    # through the arrays. This lets you verify that the operator picks up
    # new credentials on each refresh interval.

    API_KEYS=(
      "<KEY_1>"
      "<KEY_2>"
    )

    APP_KEYS=(
      "<KEY_1>"
      "<KEY_2>"
    )

    # Time-based rotation: change every minute, cycle through array
    MINUTE=$(date +%M)
    API_INDEX=$((MINUTE % ${#API_KEYS[@]}))
    APP_INDEX=$((MINUTE % ${#APP_KEYS[@]}))

    # Read JSON input from stdin (required by secret backend protocol)
    read -r INPUT

    # Output JSON in the required format
    cat <<EOF
    {"api-key":{"value":"${API_KEYS[$API_INDEX]}"},"app-key":{"value":"${APP_KEYS[$APP_INDEX]}"}}
    EOF

  1. Deployed the operator with make deploy and created a DatadogMonitor CR to exercise CRUD operations.
  2. Verified in operator logs that Credentials have changed, cache updated appears at rotation boundaries.
    . Confirmed both API keys show recent activity in the Datadog API Keys page.
image 5. Updated and deleted the `DatadogMonitor` CR to verify reconciliation works correctly across key rotations.

@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented Apr 27, 2026

Code Coverage

Fix all issues with BitsAI

🛑 Gate Violations

🎯 1 Code Coverage issue detected

A Patch coverage percentage gate may be blocking this PR.

Patch coverage: 48.41% (threshold: 80.00%)

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 48.41%
Overall Coverage: 41.04% (+0.01%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f516736 | Docs | Datadog PR Page | Give us feedback!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 48.38710% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.90%. Comparing base (78490c3) to head (f516736).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/datadogmonitor/controller.go 21.05% 12 Missing and 3 partials ⚠️
internal/controller/setup.go 6.25% 15 Missing ⚠️
pkg/datadogclient/client.go 0.00% 13 Missing ⚠️
internal/controller/datadogdashboard/controller.go 31.25% 10 Missing and 1 partial ⚠️
internal/controller/datadogslo/controller.go 44.44% 8 Missing and 2 partials ⚠️
...al/controller/datadoggenericresource/controller.go 40.00% 8 Missing and 1 partial ⚠️
...nternal/controller/datadoggenericresource/utils.go 0.00% 7 Missing ⚠️
pkg/config/creds.go 91.17% 6 Missing ⚠️
...rnal/controller/datadogagentinternal/controller.go 0.00% 2 Missing ⚠️
internal/controller/datadogdashboard_controller.go 0.00% 2 Missing ⚠️
... and 5 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2916      +/-   ##
==========================================
- Coverage   40.91%   40.90%   -0.01%     
==========================================
  Files         324      324              
  Lines       28743    28648      -95     
==========================================
- Hits        11760    11719      -41     
+ Misses      16129    16080      -49     
+ Partials      854      849       -5     
Flag Coverage Δ
unittests 40.90% <48.38%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
cmd/main.go 6.68% <ø> (+0.03%) ⬆️
...nal/controller/datadoggenericresource/finalizer.go 88.88% <100.00%> (-0.59%) ⬇️
internal/controller/datadogmonitor_controller.go 37.50% <100.00%> (+4.16%) ⬆️
...rnal/controller/datadogagentinternal_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogdashboard/finalizer.go 45.45% <50.00%> (ø)
...al/controller/datadoggenericresource_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogmonitor/finalizer.go 25.00% <50.00%> (ø)
...rnal/controller/datadogagentinternal/controller.go 0.00% <0.00%> (ø)
internal/controller/datadogdashboard_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogslo_controller.go 0.00% <0.00%> (ø)
... and 8 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78490c3...f516736. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Mathew-Estafanous Mathew-Estafanous force-pushed the mathew.estafanous/secrets-refresh-refactor branch 2 times, most recently from 6c63ce8 to 4cf448a Compare April 27, 2026 18:06
@Mathew-Estafanous Mathew-Estafanous marked this pull request as ready for review April 27, 2026 18:08
@Mathew-Estafanous Mathew-Estafanous requested a review from a team April 27, 2026 18:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4cf448a31f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/controller/datadogmonitor/controller.go
@Mathew-Estafanous Mathew-Estafanous force-pushed the mathew.estafanous/secrets-refresh-refactor branch from a3bcee6 to db84b1e Compare April 27, 2026 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants