
Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled #302


Open · wants to merge 14 commits into base: main

Conversation

aseembits93
Contributor

@aseembits93 aseembits93 commented Jun 7, 2025

This custom fixture is designed for future use; it is currently not imported anywhere in codeflash.

PR Type

Enhancement, Tests


Description

  • Add custom pytest benchmark fixture plugin

  • Integrate plugin into test runner environments

  • Add marker-based benchmark test case


Changes walkthrough 📝

Relevant files

Tests
test_benchmark_bubble_sort.py - Add marker-based benchmark test
code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py

  • Added test_sort_with_marker with @pytest.mark.benchmark
  • Demonstrates nested benchmark decorator usage
  • +7/-0

Enhancement
custom_plugin.py - Implement custom pytest benchmark fixture
codeflash/benchmarking/plugin/custom_plugin.py

  • New pytest plugin providing a CustomBenchmark fixture
  • Implements no-op __call__, pedantic, and attribute stubs
  • Configures the benchmark marker and real-fixture fallback logic
  • +77/-0

Configuration changes
test_runner.py - Include benchmark plugin in test runner env
codeflash/verification/test_runner.py

  • Updated PYTEST_PLUGINS to include the custom benchmark plugin
  • Applied across behavioral, line profile, and benchmarking runners
  • +9/-3
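Putting the walkthrough together, the shape of the new plugin is roughly the following. This is a hedged sketch reconstructed from the file summary above (no-op `__call__`, `pedantic`, and attribute stubs); the class name mirrors the PR, but the exact signatures in `custom_plugin.py` are assumptions since the file itself is not shown here.

```python
from typing import Any, Callable, Optional


class CustomBenchmark:
    """No-op stand-in for the pytest-benchmark fixture (sketch)."""

    def __call__(self, func: Callable, *args: Any, **kwargs: Any) -> Any:
        # Just call the function once, without measuring anything.
        return func(*args, **kwargs)

    def pedantic(
        self,
        target: Callable,
        args: tuple = (),
        kwargs: Optional[dict] = None,
        iterations: int = 1,
        rounds: int = 1,
        warmup_rounds: int = 0,
        setup: Optional[Callable] = None,
    ) -> Any:
        """Mimics the pedantic method of pytest-benchmark (single call)."""
        if setup:
            setup()
        if kwargs is None:
            kwargs = {}
        return target(*args, **kwargs)

    def __getattr__(self, name: str) -> Callable:
        # Any other attribute (timer, stats, extra_info, ...) becomes
        # a no-op callable so benchmark-style code does not crash.
        return lambda *args, **kwargs: None


bench = CustomBenchmark()
print(bench(sorted, [3, 1, 2]))                # runs once, returns [1, 2, 3]
print(bench.pedantic(sorted, args=([2, 1],)))  # returns [1, 2]
print(bench.extra_info("anything"))            # no-op stub, returns None
```

With this in place, tests written against the real `benchmark` fixture keep running (their target functions execute once) even when pytest-benchmark is disabled; no timing data is collected.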

@aseembits93 aseembits93 marked this pull request as draft June 7, 2025 01:17

    github-actions bot commented Jun 7, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 3c76a95)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Fixture Override

    The benchmark fixture unconditionally returns the custom implementation for all tests, making the @pytest.mark.benchmark check redundant and preventing the real pytest-benchmark fixture from ever being used.

    custom_benchmark = CustomBenchmark()
    if request.node.get_closest_marker("benchmark"):
        # Return our custom benchmark for tests marked with @pytest.mark.benchmark
        return custom_benchmark
    return custom_benchmark
    Weak Assertion

    The test_sort_with_marker test uses assert 1==1, which does not verify the sorting behavior. It should assert that the output matches the expected sorted list.

    @pytest.mark.benchmark
    def test_sort_with_marker(benchmark):
        @benchmark
        def inner_fn():
            return sorter(list(reversed(range(5))))
        assert 1==1


    github-actions bot commented Jun 7, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 3c76a95

    Explore these optional code suggestions:

Category | Suggestion | Impact
    General
    Support decorator usage

    Update call to support decorator usage by returning a wrapper when only a
    callable is passed, so that decorating a function postpones its execution until it’s
    explicitly invoked.

    codeflash/benchmarking/plugin/custom_plugin.py [16-18]

    -def __call__(self, func, *args, **kwargs):  # noqa: ANN204, ANN001, ANN002, ANN003
    +def __call__(self, *args, **kwargs):  # noqa: ANN204, ANN001
    +    # If used as decorator, return a wrapper for later invocation
    +    if len(args) == 1 and callable(args[0]) and not kwargs:
    +        func = args[0]
    +        def wrapper(*a, **k):
    +            return func(*a, **k)
    +        return wrapper
         # Just call the function without measuring anything
    -    return func(*args, **kwargs)
    +    func, *f_args = args
    +    return func(*f_args, **kwargs)
    Suggestion importance[1-10]: 8


    Why: Adding a wrapper for decorator usage prevents the benchmark fixture from immediately invoking the function at decoration time and allows proper deferred execution.

    Medium
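The decorator-support suggestion above can be exercised standalone. Below is a minimal sketch of a class with the suggested `__call__`, assuming the two usage patterns the suggestion distinguishes: `@benchmark` as a decorator (deferred execution) and `benchmark(func, args...)` as a direct call. The class and function names here are illustrative, not the PR's exact code.

```python
from typing import Any, Callable


class CustomBenchmark:
    """Sketch of the suggested __call__ supporting both usage styles."""

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Decorator form: @benchmark wraps the function so execution
        # is postponed until the wrapper is explicitly invoked.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            func = args[0]

            def wrapper(*a: Any, **k: Any) -> Any:
                return func(*a, **k)

            return wrapper
        # Direct form: benchmark(func, arg1, ...) calls immediately.
        func, *f_args = args
        return func(*f_args, **kwargs)


benchmark = CustomBenchmark()


@benchmark
def add_one(x: int) -> int:
    return x + 1


print(add_one(41))             # decorator form: deferred, prints 42
print(benchmark(len, [1, 2]))  # direct form: runs immediately, prints 2
```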
    Possible issue
    Assert actual sorted result

    Replace the dummy assertion with a real check by invoking inner_fn and comparing its
    output to the expected sorted list. This ensures the benchmark‐decorated function
    actually runs and is validated.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 7


    Why: Replacing the dummy assert 1==1 with assert result == list(range(5)) ensures the benchmark‐decorated function actually runs and validates the sort output.

    Medium
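The strengthened assertion can be checked outside pytest with a stand-in for the pieces not shown here: `sorter` (a hypothetical bubble sort, since the real one lives in code_to_optimize) and a no-op benchmark decorator that defers execution, as the PR's fixture is intended to.

```python
from typing import Callable, List


def sorter(arr: List[int]) -> List[int]:
    # Hypothetical stand-in for the sorter under test: a bubble sort.
    a = list(arr)
    for i in range(len(a)):
        for j in range(len(a) - i - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def fake_benchmark(func: Callable) -> Callable:
    # No-op benchmark decorator that defers execution, mirroring the
    # decorator behavior suggested for CustomBenchmark.
    return func


@fake_benchmark
def inner_fn() -> List[int]:
    return sorter(list(reversed(range(5))))


result = inner_fn()
assert result == list(range(5))  # real check instead of assert 1==1
print(result)  # [0, 1, 2, 3, 4]
```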

    Previous suggestions

    Suggestions up to commit 3c76a95
Category | Suggestion | Impact
    Possible issue
    Execute and validate benchmark function

    Ensure the inner benchmark function is actually executed and its output validated.
    Call inner_fn() and assert that it returns the expected sorted list instead of a
    placeholder assertion.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [10-15]

     @pytest.mark.benchmark
     def test_sort_with_marker(benchmark):
         @benchmark
         def inner_fn():
             return sorter(list(reversed(range(5))))
    -    assert 1==1
    +    result = inner_fn()
    +    assert result == list(range(5))
    Suggestion importance[1-10]: 8


    Why: The test currently asserts a placeholder 1==1 without invoking inner_fn(), so calling it and checking the sorted result fixes test correctness.

    Medium
    Detect plugin with pluginmanager

    Use pytest's plugin manager to detect if the benchmark plugin is active, avoiding
    reliance on request.fixturenames. Replace the check with
    request.config.pluginmanager.hasplugin("pytest_benchmark") before retrieving the
    real fixture.

    codeflash/benchmarking/plugin/custom_plugin.py [67-72]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    +if request.config.pluginmanager.hasplugin("pytest_benchmark"):
         try:
    -        # Try to get the real benchmark fixture
             return request.getfixturevalue("benchmark")
         except (pytest.FixtureLookupError, AttributeError):
             pass
    Suggestion importance[1-10]: 6


    Why: Using pluginmanager.hasplugin("pytest_benchmark") is a more reliable way to detect the real benchmark fixture than inspecting request.fixturenames.

    Low
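The plugin-manager check suggested above only needs `request.config.pluginmanager.hasplugin(...)`. The detection logic can be illustrated with a tiny stand-in for pytest's `request` object; in real code the actual `request` fixture would be used, and `FakeRequest`/`use_real_benchmark` below are illustrative names, not pytest API.

```python
from typing import Iterable


class FakePluginManager:
    """Minimal stand-in for pytest's plugin manager."""

    def __init__(self, plugins: Iterable[str]) -> None:
        self._plugins = set(plugins)

    def hasplugin(self, name: str) -> bool:
        return name in self._plugins


class FakeConfig:
    def __init__(self, plugins: Iterable[str]) -> None:
        self.pluginmanager = FakePluginManager(plugins)


class FakeRequest:
    def __init__(self, plugins: Iterable[str]) -> None:
        self.config = FakeConfig(plugins)


def use_real_benchmark(request: FakeRequest) -> bool:
    # The suggested check: defer to the real fixture only when the
    # pytest-benchmark plugin is actually registered.
    return request.config.pluginmanager.hasplugin("pytest_benchmark")


print(use_real_benchmark(FakeRequest(["pytest_benchmark"])))  # True
print(use_real_benchmark(FakeRequest([])))                    # False
```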
    General
    Add iteration loop in pedantic

    Implement iteration and round loops to more closely mimic pytest-benchmark’s
    pedantic. Loop over iterations (and optionally rounds) calling target, and return
    the last result.

    codeflash/benchmarking/plugin/custom_plugin.py [24-39]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
         kwargs: Optional[dict] = None,
         iterations: int = 1,
         rounds: int = 1,
         warmup_rounds: int = 0,
         setup: Optional[Callable] = None,
     ) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(iterations):
    +        result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 6


    Why: Looping over iterations in pedantic better mimics pytest-benchmark behavior and corrects the current single-call implementation.

    Low
    Support chained no-op calls

    Allow chained calls on the custom benchmark by returning self instead of None from
    no-op methods. This supports patterns like benchmark.timer.start().stop().

    codeflash/benchmarking/plugin/custom_plugin.py [20-22]

     def __getattr__(self, name):
    -    # Return a no-op callable for any attribute
    -    return lambda *args, **kwargs: None
    +    # Return a no-op callable for any attribute that returns self
    +    return lambda *args, **kwargs: self
    Suggestion importance[1-10]: 4

    __

    Why: Allowing __getattr__ to return self enables chained no-op calls but only marginally improves usability.

    Low
    Suggestions up to commit aa33d77
Category | Suggestion | Impact
    General
    Implement full pedantic loops

    To better emulate pytest-benchmark’s pedantic, run the warmup, iterations, and
    rounds loops instead of calling the target only once. Aggregate and return the last
    result.

    codeflash/benchmarking/plugin/custom_plugin.py [20-35]

     def pedantic(
         self,
         target: Callable,
         args: tuple = (),
    -    kwargs: Optional[dict] = None,  # noqa: FA100
    -    iterations: int = 1,  # noqa: ARG002
    -    rounds: int = 1,  # noqa: ARG002
    -    warmup_rounds: int = 0,  # noqa: ARG002
    -    setup: Optional[Callable] = None,  # noqa: FA100
    -) -> Any:  # noqa: ANN401
    +    kwargs: Optional[dict] = None,
    +    iterations: int = 1,
    +    rounds: int = 1,
    +    warmup_rounds: int = 0,
    +    setup: Optional[Callable] = None,
    +) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
    -    if setup:
    -        setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    # warmup
    +    for _ in range(warmup_rounds):
    +        if setup:
    +            setup()
    +        target(*args, **kwargs)
    +    # timed loops
    +    result = None
    +    for _ in range(rounds):
    +        if setup:
    +            setup()
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 7


    Why: Adding the warmup, iterations, and rounds loops mirrors pytest-benchmark’s intended behavior and correctly applies to the existing pedantic definition without breaking other logic.

    Medium
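The full loop structure from the diff above can be verified with a call counter. This is a standalone sketch of the suggested `pedantic` (warmup, then rounds × iterations, with `setup` run before each round), lifted out of the class for brevity; 1 warmup call plus 3 rounds of 2 iterations should yield 7 target invocations.

```python
from typing import Any, Callable, Optional


def pedantic(
    target: Callable,
    args: tuple = (),
    kwargs: Optional[dict] = None,
    iterations: int = 1,
    rounds: int = 1,
    warmup_rounds: int = 0,
    setup: Optional[Callable] = None,
) -> Any:
    """Sketch of the suggested pedantic loops (warmup, rounds, iterations)."""
    if kwargs is None:
        kwargs = {}
    # Warmup: one call per warmup round, results discarded.
    for _ in range(warmup_rounds):
        if setup:
            setup()
        target(*args, **kwargs)
    # Timed loops: setup once per round, then `iterations` calls.
    result = None
    for _ in range(rounds):
        if setup:
            setup()
        for _ in range(iterations):
            result = target(*args, **kwargs)
    return result


calls = []
pedantic(lambda x: calls.append(x), args=(1,),
         iterations=2, rounds=3, warmup_rounds=1)
print(len(calls))  # 7 calls: 1 warmup + 3 rounds * 2 iterations
```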
    Use plugin manager to detect benchmark plugin

    Rely on pytest’s plugin manager to detect the real benchmark plugin instead of
    inspecting request.fixturenames, which can cause recursion. This ensures you only
    override when the pytest-benchmark plugin is not installed.

    codeflash/benchmarking/plugin/custom_plugin.py [63-68]

    -if "benchmark" in request.fixturenames and hasattr(request, "_fixturemanager"):
    -    try:
    -        # Try to get the real benchmark fixture
    -        return request.getfixturevalue("benchmark")
    -    except (pytest.FixtureLookupError, AttributeError):
    -        pass
    +if not request.config.pluginmanager.hasplugin("benchmark"):
    +    return CustomBenchmark()
    +return request.getfixturevalue("benchmark")
    Suggestion importance[1-10]: 6


    Why: Relying on pluginmanager.hasplugin("benchmark") is a cleaner way to detect the real plugin, but the proposed snippet drops the existing exception fallback which could lead to uncaught errors if the fixture isn’t registered.

    Low
    Suggestions up to commit 56108b5
Category | Suggestion | Impact
    Possible issue
    Fix DummyBenchmark instantiation

    The DummyBenchmark constructor does not accept any arguments, so instantiate it
    without passing request to avoid a TypeError.

    codeflash/benchmarking/plugin/dummy_plugin.py [85]

    -return CodeFlashBenchmarkDummyPlugin.DummyBenchmark(request)
    +return CodeFlashBenchmarkDummyPlugin.DummyBenchmark()
    Suggestion importance[1-10]: 9


    Why: Removing the request argument fixes a real TypeError and restores proper instantiation of DummyBenchmark.

    High
    General
    Simulate iteration loops

    Simulate the configured iterations and rounds to better mimic pytest-benchmark's
    behavior by looping the target call accordingly and returning the last result.

    code_to_optimize/tests/pytest/benchmarks/test_benchmark_bubble_sort.py [17-25]

     def pedantic(self, target: Callable, args: tuple = (), kwargs: dict = None,
                  iterations: int = 1, rounds: int = 1, warmup_rounds: int = 0,
                  setup: Callable = None) -> Any:
         """Mimics the pedantic method of pytest-benchmark."""
         if setup:
             setup()
         if kwargs is None:
             kwargs = {}
    -    return target(*args, **kwargs)
    +    result = None
    +    for _ in range(rounds):
    +        for _ in range(iterations):
    +            result = target(*args, **kwargs)
    +    return result
    Suggestion importance[1-10]: 5


    Why: Adding loops for iterations and rounds correctly improves fidelity to pytest-benchmark without introducing errors.

    Low

    @aseembits93 aseembits93 changed the title Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is diabled Custom Benchmark fixture for running benchmark tests when the pytest-benchmark plugin is disabled Jun 9, 2025
    @aseembits93 aseembits93 requested a review from KRRT7 June 9, 2025 23:00
    @aseembits93 aseembits93 marked this pull request as ready for review June 9, 2025 23:03

    github-actions bot commented Jun 9, 2025

    Persistent review updated to latest commit aa33d77

    @aseembits93 aseembits93 marked this pull request as draft June 10, 2025 01:55
    @aseembits93 aseembits93 marked this pull request as ready for review June 10, 2025 23:20

    Persistent review updated to latest commit 3c76a95

    @aseembits93
    Contributor Author

    closing for now as there are other ways of achieving the same goal

    @aseembits93 aseembits93 reopened this Jun 12, 2025

    Persistent review updated to latest commit 3c76a95

    @aseembits93
    Contributor Author

@misrasaurabh1 the plugin is not used anywhere in codeflash; it's just defined for future use.
