
Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled #302


Open · wants to merge 14 commits into base: main

Conversation

aseembits93
Contributor

@aseembits93 aseembits93 commented Jun 7, 2025

This custom fixture is designed for future use; it is currently not imported anywhere in codeflash.

PR Type

Enhancement, Tests


Description

  • Add custom pytest benchmark fixture plugin

  • Integrate plugin into test runner environments

  • Add marker-based benchmark test case


Changes walkthrough 📝

Relevant files

Tests
test_benchmark_bubble_sort.py - Add marker-based benchmark test
code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py

  • Added test_sort_with_marker with @pytest.mark.benchmark
  • Demonstrates nested benchmark decorator usage
  • +7/-0

Enhancement
custom_plugin.py - Implement custom pytest benchmark fixture
codeflash/benchmarking/plugin/custom_plugin.py

  • New pytest plugin providing a CustomBenchmark fixture
  • Implements no-op __call__, pedantic, and attribute stubs
  • Configures the benchmark marker and real-fixture fallback logic
  • +77/-0

Configuration changes
test_runner.py - Include benchmark plugin in test runner env
codeflash/verification/test_runner.py

  • Updated PYTEST_PLUGINS to include the custom benchmark plugin
  • Applied across behavioral, line profile, and benchmarking runners
  • +9/-3
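Putting the walkthrough together, the shape of the new plugin is roughly the following. This is a hedged sketch reconstructed from the file summary above (no-op `__call__`, `pedantic`, and attribute stubs); the class name mirrors the PR, but the exact signatures in `custom_plugin.py` are assumptions since the file itself is not shown here.

```python
from typing import Any, Callable, Optional


class CustomBenchmark:
    """No-op stand-in for the pytest-benchmark fixture (sketch)."""

    def __call__(self, func: Callable, *args: Any, **kwargs: Any) -> Any:
        # Just call the function once, without measuring anything.
        return func(*args, **kwargs)

    def pedantic(
        self,
        target: Callable,
        args: tuple = (),
        kwargs: Optional[dict] = None,
        iterations: int = 1,
        rounds: int = 1,
        warmup_rounds: int = 0,
        setup: Optional[Callable] = None,
    ) -> Any:
        """Mimics the pedantic method of pytest-benchmark (single call)."""
        if setup:
            setup()
        if kwargs is None:
            kwargs = {}
        return target(*args, **kwargs)

    def __getattr__(self, name: str) -> Callable:
        # Any other attribute (timer, stats, extra_info, ...) becomes
        # a no-op callable so benchmark-style code does not crash.
        return lambda *args, **kwargs: None


bench = CustomBenchmark()
print(bench(sorted, [3, 1, 2]))                # runs once, returns [1, 2, 3]
print(bench.pedantic(sorted, args=([2, 1],)))  # returns [1, 2]
print(bench.extra_info("anything"))            # no-op stub, returns None
```

With this in place, tests written against the real `benchmark` fixture keep running (their target functions execute once) even when pytest-benchmark is disabled; no timing data is collected.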

@aseembits93 aseembits93 marked this pull request as draft June 7, 2025 01:17

    github-actions bot commented Jun 7, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 3c76a95)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Fixture Override

    The benchmark fixture unconditionally returns the custom implementation for all tests, making the @pytest.mark.benchmark check redundant and preventing the real pytest-benchmark fixture from ever being used.

    custom_benchmark = CustomBenchmark()
    if request.node.get_closest_marker("benchmark"):
        # Return our custom benchmark for tests marked with @pytest.mark.benchmark
        return custom_benchmark
    return custom_benchmark
    Weak Assertion

    The test_sort_with_marker test uses assert 1==1, which does not verify the sorting behavior. It should assert that the output matches the expected sorted list.

    @pytest.mark.benchmark
    def test_sort_with_marker(benchmark):
        @benchmark
        def inner_fn():
            return sorter(list(reversed(range(5))))
        assert 1==1


    github-actions bot commented Jun 7, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 3c76a95

    Explore these optional code suggestions:

Category | Suggestion | Impact
    General
    Support decorator usage

    Update call to support decorator usage by returning a wrapper when only a
    callable is passed, so that decorating a function postpones its execution until it’s
    explicitly invoked.

    codeflash/benchmarking/plugin/custom_plugin.py [16-18]

    -def __call__(self, func, *args, **kwargs):  # noqa: ANN204, ANN001, ANN002, ANN003
    +def __call__(self, *args, **kwargs):  # noqa: ANN204, ANN001
    +    # If used as decorator, return a wrapper for later invocation
    +    if len(args) == 1 and callable(args[0]) and not kwargs:
    +        func = args[0]
    +        def wrapper(*a, **k):
    +            return func(*a, **k)
    +        return wrapper
         # Just call the function without measuring anything
    -    return func(*args, **kwargs)
    +    func, *f_args = args
    +    return func(*f_args, **kwargs)
    Suggestion importance[1-10]: 8


    Why: Adding a wrapper for decorator usage prevents the benchmark fixture from immediately invoking the function at decoration time and allows proper deferred execution.

    Medium
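The decorator-support suggestion above can be exercised standalone. Below is a minimal sketch of a class with the suggested `__call__`, assuming the two usage patterns the suggestion distinguishes: `@benchmark` as a decorator (deferred execution) and `benchmark(func, args...)` as a direct call. The class and function names here are illustrative, not the PR's exact code.

```python
from typing import Any, Callable


class CustomBenchmark:
    """Sketch of the suggested __call__ supporting both usage styles."""

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Decorator form: @benchmark wraps the function so execution
        # is postponed until the wrapper is explicitly invoked.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            func = args[0]

            def wrapper(*a: Any, **k: Any) -> Any:
                return func(*a, **k)

            return wrapper
        # Direct form: benchmark(func, arg1, ...) calls immediately.
        func, *f_args = args
        return func(*f_args, **kwargs)


benchmark = CustomBenchmark()


@benchmark
def add_one(x: int) -> int:
    return x + 1


print(add_one(41))             # decorator form: deferred, prints 42
print(benchmark(len, [1, 2]))  # direct form: runs immediately, prints 2
```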
    Possible issue
    Assert actual sorted result

    Replace the dummy assertion with a real check by invoking inner_fn and comparing its
    output to the expected sorted list. This ensures the benchmark‐decorated function
    actually runs and is validated.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 7


    Why: Replacing the dummy assert 1==1 with assert result == list(range(5)) ensures the benchmark‐decorated function actually runs and validates the sort output.

    Medium
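The strengthened assertion can be checked outside pytest with a stand-in for the pieces not shown here: `sorter` (a hypothetical bubble sort, since the real one lives in code_to_optimize) and a no-op benchmark decorator that defers execution, as the PR's fixture is intended to.

```python
from typing import Callable, List


def sorter(arr: List[int]) -> List[int]:
    # Hypothetical stand-in for the sorter under test: a bubble sort.
    a = list(arr)
    for i in range(len(a)):
        for j in range(len(a) - i - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def fake_benchmark(func: Callable) -> Callable:
    # No-op benchmark decorator that defers execution, mirroring the
    # decorator behavior suggested for CustomBenchmark.
    return func


@fake_benchmark
def inner_fn() -> List[int]:
    return sorter(list(reversed(range(5))))


result = inner_fn()
assert result == list(range(5))  # real check instead of assert 1==1
print(result)  # [0, 1, 2, 3, 4]
```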

    Previous suggestions

    Suggestions up to commit 3c76a95
Category | Suggestion | Impact
    Possible issue
    Execute and validate benchmark function

    Ensure the inner benchmark function is actually executed and its output validated.
    Call inner_fn() and assert that it returns the expected sorted list instead of a
    placeholder assertion.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 8


    Why: The test currently asserts a placeholder 1==1 without invoking inner_fn(), so calling it and checking the sorted result fixes test correctness.

    Medium
    Detect plugin with pluginmanager

    Use pytest's plugin manager to detect if the benchmark plugin is active, avoiding
    reliance on request.fixturenames. Replace the check with
    request.config.pluginmanager.hasplugin("pytest_benchmark") before retrieving the
    real fixture.

    codeflash/benchmarking/plugin/custom_plugin.py [67-72]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    +if request.config.pluginmanager.hasplugin("pytest_benchmark"):
         try:
    -        # Try to get the real benchmark fixture
             return request.getfixturevalue("benchmark")
         except (pytest.FixtureLookupError, AttributeError):
             pass
    Suggestion importance[1-10]: 6


    Why: Using pluginmanager.hasplugin("pytest_benchmark") is a more reliable way to detect the real benchmark fixture than inspecting request.fixturenames.

    Low
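The plugin-manager check suggested above only needs `request.config.pluginmanager.hasplugin(...)`. The detection logic can be illustrated with a tiny stand-in for pytest's `request` object; in real code the actual `request` fixture would be used, and `FakeRequest`/`use_real_benchmark` below are illustrative names, not pytest API.

```python
from typing import Iterable


class FakePluginManager:
    """Minimal stand-in for pytest's plugin manager."""

    def __init__(self, plugins: Iterable[str]) -> None:
        self._plugins = set(plugins)

    def hasplugin(self, name: str) -> bool:
        return name in self._plugins


class FakeConfig:
    def __init__(self, plugins: Iterable[str]) -> None:
        self.pluginmanager = FakePluginManager(plugins)


class FakeRequest:
    def __init__(self, plugins: Iterable[str]) -> None:
        self.config = FakeConfig(plugins)


def use_real_benchmark(request: FakeRequest) -> bool:
    # The suggested check: defer to the real fixture only when the
    # pytest-benchmark plugin is actually registered.
    return request.config.pluginmanager.hasplugin("pytest_benchmark")


print(use_real_benchmark(FakeRequest(["pytest_benchmark"])))  # True
print(use_real_benchmark(FakeRequest([])))                    # False
```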
    General
    Add iteration loop in pedantic

    Implement iteration and round loops to more closely mimic pytest-benchmark’s
    pedantic. Loop over iterations (and optionally rounds) calling target, and return
    the last result.

    codeflash/benchmarking/plugin/custom_plugin.py [24-39]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
         kwargs: Optional[dict] = None,
         iterations: int = 1,
         rounds: int = 1,
         warmup_rounds: int = 0,
         setup: Optional[Callable] = None,
     ) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(iterations):
    +        result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 6


    Why: Looping over iterations in pedantic better mimics pytest-benchmark behavior and corrects the current single-call implementation.

    Low
    Support chained no-op calls

    Allow chained calls on the custom benchmark by returning self instead of None from
    no-op methods. This supports patterns like benchmark.timer.start().stop().

    codeflash/benchmarking/plugin/custom_plugin.py [20-22]

     def __getattr__(self, name):
    -    # Return a no-op callable for any attribute
    -    return lambda *args, **kwargs: None
    +    # Return a no-op callable for any attribute that returns self
    +    return lambda *args, **kwargs: self
    Suggestion importance[1-10]: 4

    __

    Why: Allowing __getattr__ to return self enables chained no-op calls but only marginally improves usability.

    Low
    Suggestions up to commit aa33d77
Category | Suggestion | Impact
    General
    Implement full pedantic loops

    To better emulate pytest-benchmark’s pedantic, run the warmup, iterations, and
    rounds loops instead of calling the target only once. Aggregate and return the last
    result.

    codeflash/benchmarking/plugin/custom_plugin.py [20-35]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
    -    kwargs: Optional[dict] = None,  # noqa: FA100
    -    iterations: int = 1,  # noqa: ARG002
    -    rounds: int = 1,  # noqa: ARG002
    -    warmup_rounds: int = 0,  # noqa: ARG002
    -    setup: Optional[Callable] = None,  # noqa: FA100
    -) -> Any:  # noqa: ANN401
    +    kwargs: Optional[dict] = None,
    +    iterations: int = 1,
    +    rounds: int = 1,
    +    warmup_rounds: int = 0,
    +    setup: Optional[Callable] = None,
    +) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
    -    if setup:
    -        setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    # warmup
    +    for _ in range(warmup_rounds):
    +        if setup:
    +            setup()
    +        target(*args, **kwargs)
    +    # timed loops
    +    result = None
    +    for _ in range(rounds):
    +        if setup:
    +            setup()
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 7


    Why: Adding the warmup, iterations, and rounds loops mirrors pytest-benchmark’s intended behavior and correctly applies to the existing pedantic definition without breaking other logic.

    Medium
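The full loop structure from the diff above can be verified with a call counter. This is a standalone sketch of the suggested `pedantic` (warmup, then rounds × iterations, with `setup` run before each round), lifted out of the class for brevity; 1 warmup call plus 3 rounds of 2 iterations should yield 7 target invocations.

```python
from typing import Any, Callable, Optional


def pedantic(
    target: Callable,
    args: tuple = (),
    kwargs: Optional[dict] = None,
    iterations: int = 1,
    rounds: int = 1,
    warmup_rounds: int = 0,
    setup: Optional[Callable] = None,
) -> Any:
    """Sketch of the suggested pedantic loops (warmup, rounds, iterations)."""
    if kwargs is None:
        kwargs = {}
    # Warmup: one call per warmup round, results discarded.
    for _ in range(warmup_rounds):
        if setup:
            setup()
        target(*args, **kwargs)
    # Timed loops: setup once per round, then `iterations` calls.
    result = None
    for _ in range(rounds):
        if setup:
            setup()
        for _ in range(iterations):
            result = target(*args, **kwargs)
    return result


calls = []
pedantic(lambda x: calls.append(x), args=(1,),
         iterations=2, rounds=3, warmup_rounds=1)
print(len(calls))  # 7 calls: 1 warmup + 3 rounds * 2 iterations
```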
    Use plugin manager to detect benchmark plugin

    Rely on pytest’s plugin manager to detect the real benchmark plugin instead of
    inspecting request.fixturenames, which can cause recursion. This ensures you only
    override when the pytest-benchmark plugin is not installed.

    codeflash/benchmarking/plugin/custom_plugin.py [63-68]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    -    try:
    -        # Try to get the real benchmark fixture
    -        return request.getfixturevalue("benchmark")
    -    except (pytest.FixtureLookupError, AttributeError):
    -        pass
    +if not request.config.pluginmanager.hasplugin("benchmark"):
    +    return CustomBenchmark()
    +return request.getfixturevalue("benchmark")
    Suggestion importance[1-10]: 6


    Why: Relying on pluginmanager.hasplugin("benchmark") is a cleaner way to detect the real plugin, but the proposed snippet drops the existing exception fallback which could lead to uncaught errors if the fixture isn’t registered.

    Low
    Suggestions up to commit 56108b5
Category | Suggestion | Impact
    Possible issue
    Fix DummyBenchmark instantiation

    The DummyBenchmark constructor does not accept any arguments, so instantiate it
    without passing request to avoid a TypeError.

    codeflash/benchmarking/plugin/dummy_plugin.py [85]

    -return CodeFlashBenchmarkDummyPlugin.DummyBenchmark(request)
    +return CodeFlashBenchmarkDummyPlugin.DummyBenchmark()
    Suggestion importance[1-10]: 9


    Why: Removing the request argument fixes a real TypeError and restores proper instantiation of DummyBenchmark.

    High
    General
    Simulate iteration loops

    Simulate the configured iterations and rounds to better mimic pytest-benchmark's
    behavior by looping the target call accordingly and returning the last result.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [17-25]

     def pedantic(self, target: Callable, args: tuple = (), kwargs: dict = None,
                  iterations: int = 1, rounds: int = 1, warmup_rounds: int = 0,
                  setup: Callable = None) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(rounds):
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 5


    Why: Adding loops for iterations and rounds correctly improves fidelity to pytest-benchmark without introducing errors.

    Low

    @aseembits93 aseembits93 changed the title Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is diabled Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled Jun 9, 2025
    @aseembits93 aseembits93 requested a review from KRRT7 June 9, 2025 23:00
    @aseembits93 aseembits93 marked this pull request as ready for review June 9, 2025 23:03

    github-actions bot commented Jun 9, 2025

    Persistent review updated to latest commit aa33d77

    @aseembits93 aseembits93 marked this pull request as draft June 10, 2025 01:55
    @aseembits93 aseembits93 marked this pull request as ready for review June 10, 2025 23:20

    Persistent review updated to latest commit 3c76a95

    @aseembits93
    Contributor Author

    closing for now as there are other ways of achieving the same goal

    @aseembits93 aseembits93 reopened this Jun 12, 2025

    Persistent review updated to latest commit 3c76a95

    @aseembits93
    Contributor Author

@misrasaurabh1 the plugin is not used anywhere in codeflash; it's just defined for future use.
