You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[NO-TICKET] Attempt to fix flaky profiler spec under test-memcheck
**What does this PR do?**
This PR attempts to fix a flaky profiler spec seen in
https://github.com/DataDog/dd-trace-rb/actions/runs/15284503936/job/42991363261:
```
bundle exec rake compile spec:profiling:memcheck
...
Failures:
1) Datadog::Profiling::Scheduler#stop when exporter has data to flush flushes the data and stops the loop
Failure/Error: expect(scheduler.stop).to be true
expected true
got false
# ./spec/datadog/profiling/scheduler_spec.rb:299:in 'block (4 levels) in <top (required)>'
# ./spec/spec_helper.rb:270:in 'block (2 levels) in <top (required)>'
# ./spec/spec_helper.rb:152:in 'block (2 levels) in <top (required)>'
# ./vendor/bundle/ruby/3.4.0/gems/webmock-3.25.1/lib/webmock/rspec.rb:39:in 'block (2 levels) in <top (required)>'
# ./vendor/bundle/ruby/3.4.0/gems/rspec-wait-0.0.10/lib/rspec/wait.rb:47:in 'block (2 levels) in <top (required)>'
# ./spec/support/execute_in_fork.rb:32:in 'ForkableExample#run'
```
The flaky spec was introduced in
#4679.
The intention of the `expect(scheduler.stop).to be true` assertion
is that the scheduler stops cleanly without having to rely on the
forced termination timeout. Without the changes in that PR, this
spec will fail because the scheduler will keep on running until
it's forced to terminate.
I not reproduce the issue locally.
So for the spec to be failing it means that the forced termination
still had to kick in... I've been staring at the code for a while
and I don't see any races or bugs that might have caused this spec
to fail sometimes.
What did occur to me is that this spec has only been seen failing
inside `test-memcheck`'s run with valgrind. And what does running
an app inside valgrind cause? It causes the app to go way way
slower than normal (and to be truly single-threaded -- even at the
native code level).
So perhaps the regular 1 second timeout, which is more than enough
time in regular situations to give the background thread time to
work and exit cleanly is not enough in a slow CI that is
further slowed by valgrind?
To hopefully fix this, I'm raising the timeout from its default
of 1 to 10 seconds. This doesn't change the intention of the
original spec -- this spec will still correctly fail if the
code added in the above PR is commented out, and require the
forced termination after 10 seconds.
**Motivation:**
Zero profiling flaky specs!
**Additional Notes:**
I've also tweaked the test to avoid leaking the scheduler
thread when the test fails. In CI we saw two failures but I
strongly suspect the second one...
```
2) Datadog::Profiling::Scheduler#perform when enabled when perform fails calls the on_failure_proc and logs the error
Failure/Error: expect(Datadog.logger).to receive(:warn).with(/Profiling::Scheduler thread error/)
(#<Datadog::Core::Logger:0x000000002ebd95b0 @Level=1, @progname="datadog", @default_formatter=#<Logger::Formatter:0x000000002ec10560 @datetime_format=nil>, @Formatter=nil, @logdev=#<Logger::LogDevice:0x000000002c098ea0 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDOUT>>, @binmode=false, @reraise_write_errors=[], @skip_header=false, @mon_data=#<Monitor:0x000000002ec10510>, @mon_data_owner_object_id=13560>, @level_override={}>).warn(/Profiling::Scheduler thread error/)
expected: 1 time with arguments: (/Profiling::Scheduler thread error/)
received: 2 times with arguments: (/Profiling::Scheduler thread error/)
# ./spec/datadog/profiling/scheduler_spec.rb:100:in 'block (5 levels) in <top (required)>'
# ./spec/spec_helper.rb:270:in 'block (2 levels) in <top (required)>'
# ./spec/spec_helper.rb:152:in 'block (2 levels) in <top (required)>'
# ./vendor/bundle/ruby/3.4.0/gems/webmock-3.25.1/lib/webmock/rspec.rb:39:in 'block (2 levels) in <top (required)>'
# ./vendor/bundle/ruby/3.4.0/gems/rspec-wait-0.0.10/lib/rspec/wait.rb:47:in 'block (2 levels) in <top (required)>'
# ./spec/support/execute_in_fork.rb:32:in 'ForkableExample#run'
```
was just a side-effect of the first spec failing and
leaking the thread.
**How to test the change?**
Validate that CI is still green, and hope this is the correct fix
for the flakiness :(
0 commit comments