⚡️ Speed up method FunctionRanker.rank_functions by 13% in PR #384 (trace-and-optimize) #458

Conversation

codeflash-ai[bot]

@codeflash-ai codeflash-ai bot commented Jun 30, 2025

⚡️ This pull request contains optimizations for PR #384

If you approve this dependent PR, these changes will be merged into the original PR branch trace-and-optimize.

This PR will be automatically closed if the original PR is merged.


📄 13% (0.13x) speedup for FunctionRanker.rank_functions in codeflash/benchmarking/function_ranker.py

⏱️ Runtime: 1.84 milliseconds → 1.62 milliseconds (best of 67 runs)

📝 Explanation and details

Here is an optimized rewrite of your FunctionRanker class.
Key speed optimizations applied:

  1. Avoid repeated loading of function stats:
    The original code reloads the function stats for each function being ranked (get_function_ttx_score() is called per function and loads the stats on every call). We prefetch the stats once in rank_functions() and reuse them for all lookups.

  2. Inline and batch lookups:
    We use a helper to batch compute scores directly via a pre-fetched stats dict. This removes per-call overhead from attribute access and creation of possible keys inside the hot loop.

  3. Minimal string operations:
    We precompute the two possible key formats needed for lookup (file:qualified and file:function) for all items only ONCE, instead of per invocation.

  4. Skip list comprehensions in favor of tuple unpacking:
    Use tuple unpacking and generator expressions for lower overhead when building the output.

  5. Fast path with dict.get() lookup:
    Avoid a redundant "if key in dict" check by calling dict.get(key) directly.

  6. No changes to signatures or behavior:
    No classes or functions are renamed, and all logging, ordering, and functionality are preserved. (A sketch of the resulting rank_functions follows the summary below.)

Summary of performance impact:

  • The stats are loaded only once, not per function.
  • String concatenations for keys are only performed twice per function (and not redundantly in both rank_functions and get_function_ttx_score).
  • All lookup and sorting logic remains as in the original so results will match, but runtime (especially for large lists) will be significantly better.
  • A further optimization would be to memoize scores with an LRU cache, but with this design dictionary lookups are already the dominant cost, so this is the lowest-overhead idiomatic Python approach.
  • No imports, function names, or signatures are changed.

Let me know if you need further GPU-based or numpy/pandas-style speedups!
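For concreteness, here is a minimal sketch of a rank_functions rewritten along these lines. It is illustrative only, not the exact merged code: it assumes stats keys of the form "<file>:<qualified_name>" (falling back to "<file>:<function_name>") and a "ttx_score" field, matching the stats dict that load_function_stats builds in the tests below.

```python
# Illustrative sketch of the optimized method (inside FunctionRanker), assuming
# load_function_stats() returns {"<file>:<qualified_name>": {"ttx_score": ...}, ...}.
def rank_functions(self, functions_to_optimize):
    # Load the profiling stats exactly once for the whole ranking pass.
    stats = self.load_function_stats()

    def ttx_score(func):
        # Build both candidate lookup keys once per function.
        file_path = str(func.file_path)
        entry = stats.get(f"{file_path}:{func.qualified_name}")
        if entry is None:
            entry = stats.get(f"{file_path}:{func.function_name}")
        # Functions with no trace data score 0 and therefore sort last.
        return entry["ttx_score"] if entry is not None else 0

    # sorted() is stable, so functions with equal ttX scores keep their input order.
    return sorted(functions_to_optimize, key=ttx_score, reverse=True)
```

The merged PR may differ in detail (for example, whether keys are batch-precomputed up front), but the shape matches the optimizations listed above.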

Correctness verification report:

| Test                           | Status        |
|--------------------------------|---------------|
| ⚙️ Existing Unit Tests         | 🔘 None Found |
| 🌀 Generated Regression Tests  | ✅ 35 Passed  |
| ⏪ Replay Tests                | 🔘 None Found |
| 🔎 Concolic Coverage Tests     | 🔘 None Found |
| 📊 Tests Coverage              | 100.0%        |
🌀 Generated Regression Tests and Runtime
from pathlib import Path

# imports
import pytest
from codeflash.benchmarking.function_ranker import FunctionRanker


class FunctionToOptimize:
    """Minimal stub for FunctionToOptimize."""
    def __init__(self, file_path, function_name, qualified_name=None):
        self.file_path = file_path
        self.function_name = function_name
        self.qualified_name = qualified_name or function_name

    def __repr__(self):
        return f"FunctionToOptimize({self.file_path!r}, {self.function_name!r}, {self.qualified_name!r})"

    def __eq__(self, other):
        return (
            isinstance(other, FunctionToOptimize)
            and self.file_path == other.file_path
            and self.function_name == other.function_name
            and self.qualified_name == other.qualified_name
        )

    def __hash__(self):
        return hash((self.file_path, self.function_name, self.qualified_name))

# FunctionRanker (from above) - already defined

# Patch FunctionRanker to allow injection of stats for testing
class TestableFunctionRanker(FunctionRanker):
    def __init__(self, trace_file_path: Path, injected_stats=None):
        super().__init__(trace_file_path)
        self._injected_stats = injected_stats

    def load_function_stats(self) -> dict[str, dict]:
        if self._function_stats is not None:
            return self._function_stats

        self._function_stats = {}

        # Use injected stats if provided (for testing)
        stats = self._injected_stats if self._injected_stats is not None else {}

        for (filename, line_number, function_name), (
            call_count,
            _num_callers,
            total_time_ns,
            cumulative_time_ns,
            _callers,
        ) in stats.items():
            if call_count <= 0:
                continue

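            # Split dotted names like "MyClass.method" into a class name, the full
            # qualified name, and the bare method name as the base function name.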
            if "." in function_name and not function_name.startswith("<"):
                parts = function_name.split(".", 1)
                if len(parts) == 2:
                    class_name, method_name = parts
                    qualified_name = function_name
                    base_function_name = method_name
                else:
                    class_name = None
                    qualified_name = function_name
                    base_function_name = function_name
            else:
                class_name = None
                qualified_name = function_name
                base_function_name = function_name

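            # ttX score: own time plus time spent in callees weighted by call count,
            # so functions that are both slow and frequently called rank highest.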
            own_time_ns = total_time_ns
            time_in_callees_ns = cumulative_time_ns - total_time_ns
            ttx_score = own_time_ns + (time_in_callees_ns * call_count)

            function_key = f"{filename}:{qualified_name}"
            self._function_stats[function_key] = {
                "filename": filename,
                "function_name": base_function_name,
                "qualified_name": qualified_name,
                "class_name": class_name,
                "line_number": line_number,
                "call_count": call_count,
                "own_time_ns": own_time_ns,
                "cumulative_time_ns": cumulative_time_ns,
                "time_in_callees_ns": time_in_callees_ns,
                "ttx_score": ttx_score,
            }
        return self._function_stats

# Helper to build stats dict in the format expected by ProfileStats
def build_stats(entries):
    """
    entries: list of tuples:
        (filename, line_number, function_name, call_count, total_time_ns, cumulative_time_ns)
    Returns: dict in ProfileStats.stats format
    """
    stats = {}
    for (filename, line_number, function_name, call_count, total_time_ns, cumulative_time_ns) in entries:
        stats[(filename, line_number, function_name)] = (
            call_count,
            0,  # num_callers, not used
            total_time_ns,
            cumulative_time_ns,
            {},  # callers, not used
        )
    return stats
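
# Worked ttX example (illustrative only; mirrors the math in load_function_stats above):
# for an entry ("foo.py", 10, "func_a") -> (call_count=5, 0, total_time_ns=1000,
# cumulative_time_ns=1500, {}), the score is
#   own_time        = 1000
#   time_in_callees = 1500 - 1000 = 500
#   ttx_score       = 1000 + (500 * 5) = 3500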

# ========== BASIC TEST CASES ==========

def test_rank_functions_basic_single_function():
    """Test ranking with a single function."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 5, 1000, 1500),
    ])
    func = FunctionToOptimize("foo.py", "func_a")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func]); ranked = codeflash_output # 82.7μs -> 90.6μs (8.65% slower)

def test_rank_functions_basic_multiple_functions_ordering():
    """Test that functions are ranked in correct ttX order."""
    # func_a: own 1000, callees 500,  calls 5  -> ttx = 1000 + (500*5)   = 3500
    # func_b: own 2000, callees 100,  calls 2  -> ttx = 2000 + (100*2)   = 2200
    # func_c: own 500,  callees 2000, calls 10 -> ttx = 500  + (2000*10) = 20500
    stats = build_stats([
        ("foo.py", 10, "func_a", 5, 1000, 1500),
        ("foo.py", 20, "func_b", 2, 2000, 2100),
        ("foo.py", 30, "func_c", 10, 500, 2500),
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    func_c = FunctionToOptimize("foo.py", "func_c")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b, func_c]); ranked = codeflash_output # 59.7μs -> 58.2μs (2.56% faster)

def test_rank_functions_basic_same_score():
    """Test that functions with the same ttX score preserve input order (stable sort)."""
    # Both functions have ttx = 1000
    stats = build_stats([
        ("foo.py", 10, "func_a", 1, 1000, 1000),
        ("foo.py", 20, "func_b", 2, 500, 1000),
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b]); ranked = codeflash_output # 56.3μs -> 55.1μs (2.18% faster)
    codeflash_output = ranker.rank_functions([func_b, func_a]); ranked_rev = codeflash_output # 37.1μs -> 36.4μs (2.01% faster)

def test_rank_functions_basic_zero_ttX():
    """Test that a function with no timing info gets a score of 0 and is sorted last."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 1, 1000, 1000),
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")  # Not in stats
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b]); ranked = codeflash_output # 54.7μs -> 53.0μs (3.19% faster)

def test_rank_functions_basic_qualified_name_vs_function_name():
    """Test that qualified_name is preferred for lookup if present."""
    stats = build_stats([
        ("foo.py", 10, "MyClass.my_method", 3, 100, 400),  # ttx = 100 + (300*3) = 1000
    ])
    func = FunctionToOptimize("foo.py", "my_method", qualified_name="MyClass.my_method")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func]); ranked = codeflash_output # 54.4μs -> 52.3μs (4.00% faster)

# ========== EDGE TEST CASES ==========

def test_rank_functions_edge_empty_input():
    """Test with an empty list of functions."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 1, 1000, 1000),
    ])
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([]); ranked = codeflash_output # 49.7μs -> 49.5μs (0.365% faster)

def test_rank_functions_edge_no_stats():
    """Test when stats are empty (no timing data at all)."""
    stats = build_stats([])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b]); ranked = codeflash_output # 48.2μs -> 51.1μs (5.74% slower)

def test_rank_functions_edge_zero_call_count():
    """Test that functions with call_count <= 0 are ignored in stats."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 0, 1000, 1000),
        ("foo.py", 20, "func_b", -1, 2000, 2000),
        ("foo.py", 30, "func_c", 2, 500, 700),
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    func_c = FunctionToOptimize("foo.py", "func_c")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b, func_c]); ranked = codeflash_output # 55.4μs -> 52.4μs (5.83% faster)

def test_rank_functions_edge_negative_times():
    """Test negative own_time or cumulative_time (should still compute ttx)."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 2, -100, 100),   # own_time negative
        ("foo.py", 20, "func_b", 1, 100, -200),   # cumulative_time negative
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b]); ranked = codeflash_output # 54.3μs -> 52.5μs (3.38% faster)

def test_rank_functions_edge_duplicate_functions():
    """Test that duplicate FunctionToOptimize objects are all ranked."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 1, 1000, 1000),
    ])
    func1 = FunctionToOptimize("foo.py", "func_a")
    func2 = FunctionToOptimize("foo.py", "func_a")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func1, func2]); ranked = codeflash_output # 54.2μs -> 52.3μs (3.64% faster)

def test_rank_functions_edge_class_method_vs_function():
    """Test that class method and function with same name are distinguished."""
    stats = build_stats([
        ("foo.py", 10, "MyClass.func", 2, 100, 300),    # ttx = 100 + (200*2) = 500
        ("foo.py", 20, "func", 1, 200, 200),            # ttx = 200
    ])
    func_class = FunctionToOptimize("foo.py", "func", qualified_name="MyClass.func")
    func_plain = FunctionToOptimize("foo.py", "func")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_class, func_plain]); ranked = codeflash_output # 55.4μs -> 53.4μs (3.83% faster)

def test_rank_functions_edge_nonexistent_function():
    """Test that a function not present in stats gets a score of 0."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 1, 1000, 1000),
    ])
    func = FunctionToOptimize("foo.py", "not_in_stats")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func]); ranked = codeflash_output # 52.6μs -> 51.7μs (1.82% faster)

# ========== LARGE SCALE TEST CASES ==========

def test_rank_functions_large_many_functions():
    """Test ranking with a large number of functions (performance and correctness)."""
    num_funcs = 500  # Stays within the <1000 element guideline
    stats_entries = []
    funcs = []
    for i in range(num_funcs):
        # own = i, callees = i*2 - i = i, call_count = i+1 -> ttx = i + i*(i+1)
        stats_entries.append(("foo.py", i, f"func_{i}", i+1, i, i*2))
        funcs.append(FunctionToOptimize("foo.py", f"func_{i}"))
    stats = build_stats(stats_entries)
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions(funcs); ranked = codeflash_output

def test_rank_functions_large_some_missing_stats():
    """Test ranking with some functions missing from stats."""
    num_funcs = 100
    stats_entries = []
    funcs = []
    for i in range(num_funcs):
        stats_entries.append(("foo.py", i, f"func_{i}", 1, 1000 + i, 1500 + i))
        funcs.append(FunctionToOptimize("foo.py", f"func_{i}"))
    # Add 10 more functions with no stats
    for j in range(10):
        funcs.append(FunctionToOptimize("foo.py", f"missing_{j}"))
    stats = build_stats(stats_entries)
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions(funcs); ranked = codeflash_output

def test_rank_functions_large_all_zero_scores():
    """Test that all functions with zero ttX are ranked in input order."""
    num_funcs = 200
    stats_entries = []
    funcs = []
    for i in range(num_funcs):
        stats_entries.append(("foo.py", i, f"func_{i}", 1, 0, 0))
        funcs.append(FunctionToOptimize("foo.py", f"func_{i}"))
    stats = build_stats(stats_entries)
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions(funcs); ranked = codeflash_output

def test_rank_functions_large_high_call_counts():
    """Test ttX calculation with large call counts."""
    stats = build_stats([
        ("foo.py", 10, "func_a", 1000, 10, 1010),  # ttx = 10 + (1000*1000) = 1,000,010
        ("foo.py", 20, "func_b", 1, 1000000, 1000000),  # ttx = 1,000,000
    ])
    func_a = FunctionToOptimize("foo.py", "func_a")
    func_b = FunctionToOptimize("foo.py", "func_b")
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions([func_a, func_b]); ranked = codeflash_output # 55.3μs -> 54.0μs (2.49% faster)

def test_rank_functions_large_mixed_qualified_names():
    """Test ranking with a mix of qualified and unqualified names."""
    num_funcs = 50
    stats_entries = []
    funcs = []
    for i in range(num_funcs):
        stats_entries.append(("foo.py", i, f"MyClass.func_{i}", 2, 100, 300))
        funcs.append(FunctionToOptimize("foo.py", f"func_{i}", qualified_name=f"MyClass.func_{i}"))
    stats = build_stats(stats_entries)
    ranker = TestableFunctionRanker(Path("trace.db"), stats)
    codeflash_output = ranker.rank_functions(funcs); ranked = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.benchmarking.function_ranker import FunctionRanker

# ProfileStats is referenced by set_stats() below; assumed import location.
from codeflash.tracing.profile_stats import ProfileStats


# Minimal FunctionToOptimize stub
class FunctionToOptimize:
    def __init__(self, file_path, function_name, qualified_name):
        self.file_path = file_path
        self.function_name = function_name
        self.qualified_name = qualified_name

    def __repr__(self):
        return f"FunctionToOptimize({self.file_path!r}, {self.function_name!r}, {self.qualified_name!r})"

def set_stats(stats):
    """
    Helper to set the ProfileStats.stats for the next FunctionRanker instance.
    """
    # Patch the ProfileStats class used by FunctionRanker
    import sys
    thismod = sys.modules[__name__]
    class PatchedProfileStats(ProfileStats):
        def __init__(self, trace_file_path):
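            # Bypass reading the trace database; expose the injected stats directly.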
            self.stats = stats
    thismod.ProfileStats = PatchedProfileStats

# --- BASIC TEST CASES ---

To edit these changes, `git checkout codeflash/optimize-pr384-2025-06-30T19.14.09` and push.

Codeflash

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 30, 2025
@KRRT7 KRRT7 closed this Jun 30, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr384-2025-06-30T19.14.09 branch June 30, 2025 21:06