Skip to content

Conversation

@watson
Copy link
Collaborator

@watson watson commented Jan 29, 2026

What does this PR do?

Fixes a race condition in AgentProxyCiVisibilityExporter where _isGzipCompatible and evpProxyPrefix could be read before being set, causing intermittent test failures in CI. Also adds proper test cleanup to prevent nock mock state leakage.

Motivation

Four tests in the agent-proxy test suite were failing intermittently in CI:

  • _isGzipCompatible - should set to true if newest version is v4 or newer
  • _isGzipCompatible - should set to false if newest version is v3 or older
  • evpProxyPrefix - should set to v2 if newest version is v3
  • evpProxyPrefix - should set to v4 if newest version is v4

Root Cause: A race condition in the production code, amplified by commit 6eed53f which added caching to fetchAgentInfo. When the cache is hit, the callback runs via process.nextTick (extremely fast), making the timing window between promise resolution and property assignment visible.

The Race Condition:

// OLD CODE (buggy):
const isGzipCompatible = latestEvpProxyVersion >= 4
// ... create writers ...
this._resolveCanUseCiVisProtocol(isEvpCompatible)  // Promise resolves - tests wake up here!
this.exportUncodedTraces()
this.exportUncodedCoverages()
this._isGzipCompatible = isGzipCompatible  // Set AFTER promise resolves - too late!

// NEW CODE (fixed):
this._isGzipCompatible = latestEvpProxyVersion >= 4  // Set immediately, early in callback
// ... create writers ...
this._resolveCanUseCiVisProtocol(isEvpCompatible)  // Promise resolves - property already set!

Fixes Applied:

  1. Production code fix (packages/dd-trace/src/ci-visibility/exporters/agent-proxy/index.js):

    • Set this._isGzipCompatible directly and early in the fetchAgentInfo callback (line 50)
    • Ensures the property is set before this._resolveCanUseCiVisProtocol() resolves the promise
    • Tests and production code can now safely read the property after awaiting _canUseCiVisProtocolPromise
    • Matches the pattern in AgentlessCiVisibilityExporter which sets _isGzipCompatible synchronously in the constructor
  2. Test cleanup fix (packages/dd-trace/test/ci-visibility/exporters/agent-proxy/agent-proxy.spec.js):

    • Add nock.cleanAll() to beforeEach hook to prevent HTTP mock state leakage between tests
    • Without cleanup, nock interceptors from previous tests could interfere with subsequent tests
    • Follows the pattern used in exporter.spec.js and other test files

The flakiness only manifested in CI due to parallel test execution, process reuse, and timing variations that don't occur in local development.

@watson watson requested a review from a team as a code owner January 29, 2026 09:01
Copy link
Collaborator Author

watson commented Jan 29, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 29, 2026

Overall package size

Self size: 4.46 MB
Deduped: 5.3 MB
No deduping: 5.3 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 2.0.3 | 76.87 kB | 808.03 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@watson watson self-assigned this Jan 29, 2026
@codecov
Copy link

codecov bot commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.16%. Comparing base (d119cf2) to head (3e692b1).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7375      +/-   ##
==========================================
- Coverage   86.16%   86.16%   -0.01%     
==========================================
  Files         515      515              
  Lines       22246    22245       -1     
==========================================
- Hits        19169    19168       -1     
  Misses       3077     3077              
Flag Coverage Δ
aiguard-macos 99.09% <ø> (ø)
aiguard-ubuntu 99.09% <ø> (ø)
aiguard-windows 99.09% <ø> (ø)
apm-capabilities-tracing-macos 57.84% <100.00%> (-0.01%) ⬇️
apm-capabilities-tracing-ubuntu 57.83% <100.00%> (-0.01%) ⬇️
apm-capabilities-tracing-windows 57.45% <100.00%> (-0.01%) ⬇️
apm-integrations-child-process 99.19% <ø> (ø)
apm-integrations-couchbase-18 100.00% <ø> (ø)
apm-integrations-couchbase-eol 100.00% <ø> (ø)
appsec-express 62.54% <ø> (ø)
appsec-fastify 58.55% <ø> (ø)
appsec-graphql 53.40% <ø> (ø)
appsec-kafka 43.98% <ø> (ø)
appsec-ldapjs 46.04% <ø> (ø)
appsec-lodash 47.29% <ø> (ø)
appsec-macos 93.76% <ø> (ø)
appsec-mongodb-core 51.82% <ø> (ø)
appsec-mongoose 50.73% <ø> (ø)
appsec-mysql 54.16% <ø> (ø)
appsec-node-serialize 43.92% <ø> (ø)
appsec-passport 48.07% <ø> (ø)
appsec-postgres 54.51% <ø> (ø)
appsec-sourcing 33.80% <ø> (ø)
appsec-template 43.92% <ø> (ø)
appsec-ubuntu 93.76% <ø> (ø)
appsec-windows 93.76% <ø> (ø)
llmobs-ai 52.09% <ø> (ø)
llmobs-anthropic 42.73% <ø> (ø)
llmobs-bedrock 40.06% <ø> (ø)
llmobs-google-genai 45.89% <ø> (ø)
llmobs-langchain 50.15% <ø> (ø)
llmobs-openai 55.62% <ø> (ø)
llmobs-vertex-ai 44.48% <ø> (ø)
platform-core 87.23% <ø> (ø)
platform-instrumentations-misc 89.16% <ø> (ø)
platform-shimmer 98.80% <ø> (ø)
platform-unit-guardrails 89.47% <ø> (ø)
profiling-macos 70.74% <ø> (ø)
profiling-ubuntu 70.74% <ø> (ø)
profiling-windows 74.20% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@watson watson marked this pull request as draft January 29, 2026 09:07
@pr-commenter
Copy link

pr-commenter bot commented Jan 29, 2026

Benchmarks

Benchmark execution time: 2026-01-29 11:56:13

Comparing candidate commit 3e692b1 in PR branch watson/fix-race with baseline commit d119cf2 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 231 metrics, 29 unstable metrics.

juan-fernandez
juan-fernandez previously approved these changes Jan 29, 2026
Fixes a race condition in AgentProxyCiVisibilityExporter where
`_isGzipCompatible` and `evpProxyPrefix` could be read before
being set, causing intermittent test failures in CI.

The race condition was amplified by commit 6eed53f which added
caching to fetchAgentInfo. When the cache is hit, the callback
runs via process.nextTick (extremely fast), making the timing
window between promise resolution and property assignment visible.

Production code fix:
- Move `this._isGzipCompatible = isGzipCompatible` to execute
  before `this._resolveCanUseCiVisProtocol()` in the callback
- This ensures the property is set before tests (or production
  code) can read it after awaiting `_canUseCiVisProtocolPromise`
- Matches the pattern in AgentlessCiVisibilityExporter

Test cleanup fix:
- Add `nock.cleanAll()` to beforeEach hook to prevent HTTP mock
  state leakage between tests
- Without cleanup, nock interceptors from previous tests could
  interfere with subsequent tests
- Follows the pattern in exporter.spec.js and other test files

Fixes 4 flaky tests:
- _isGzipCompatible (v4+ and v3 version checks)
- evpProxyPrefix (v2 and v4 prefix assignments)
@watson watson changed the title test(ci-visibility): fix flaky agent-proxy tests fix(ci-visibility): fix race condition in agent-proxy exporter Jan 29, 2026
Copy link
Collaborator

@juan-fernandez juan-fernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! Thanks

@watson watson marked this pull request as ready for review January 29, 2026 12:03
@watson watson enabled auto-merge (squash) January 29, 2026 12:03
@watson watson merged commit e12c195 into master Jan 29, 2026
816 of 817 checks passed
@watson watson deleted the watson/fix-race branch January 29, 2026 12:05
dd-octo-sts bot pushed a commit that referenced this pull request Jan 29, 2026
Fixes a race condition in AgentProxyCiVisibilityExporter where
`_isGzipCompatible` and `evpProxyPrefix` could be read before
being set, causing intermittent test failures in CI.

The race condition was amplified by commit 6eed53f which added
caching to fetchAgentInfo. When the cache is hit, the callback
runs via process.nextTick (extremely fast), making the timing
window between promise resolution and property assignment visible.

Production code fix:
- Move `this._isGzipCompatible = isGzipCompatible` to execute
  before `this._resolveCanUseCiVisProtocol()` in the callback
- This ensures the property is set before tests (or production
  code) can read it after awaiting `_canUseCiVisProtocolPromise`
- Matches the pattern in AgentlessCiVisibilityExporter

Test cleanup fix:
- Add `nock.cleanAll()` to beforeEach hook to prevent HTTP mock
  state leakage between tests
- Without cleanup, nock interceptors from previous tests could
  interfere with subsequent tests
- Follows the pattern in exporter.spec.js and other test files

Fixes 4 flaky tests:
- _isGzipCompatible (v4+ and v3 version checks)
- evpProxyPrefix (v2 and v4 prefix assignments)
@dd-octo-sts dd-octo-sts bot mentioned this pull request Jan 29, 2026
BridgeAR pushed a commit that referenced this pull request Jan 30, 2026
Fixes a race condition in AgentProxyCiVisibilityExporter where
`_isGzipCompatible` and `evpProxyPrefix` could be read before
being set, causing intermittent test failures in CI.

The race condition was amplified by commit 6eed53f which added
caching to fetchAgentInfo. When the cache is hit, the callback
runs via process.nextTick (extremely fast), making the timing
window between promise resolution and property assignment visible.

Production code fix:
- Move `this._isGzipCompatible = isGzipCompatible` to execute
  before `this._resolveCanUseCiVisProtocol()` in the callback
- This ensures the property is set before tests (or production
  code) can read it after awaiting `_canUseCiVisProtocolPromise`
- Matches the pattern in AgentlessCiVisibilityExporter

Test cleanup fix:
- Add `nock.cleanAll()` to beforeEach hook to prevent HTTP mock
  state leakage between tests
- Without cleanup, nock interceptors from previous tests could
  interfere with subsequent tests
- Follows the pattern in exporter.spec.js and other test files

Fixes 4 flaky tests:
- _isGzipCompatible (v4+ and v3 version checks)
- evpProxyPrefix (v2 and v4 prefix assignments)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants