⚡️ Speed up method FunctionRanker._get_function_stats by 51% in PR #384 (trace-and-optimize) #466

codeflash-ai[bot]
Contributor

@codeflash-ai codeflash-ai bot commented Jul 1, 2025

⚡️ This pull request contains optimizations for PR #384

If you approve this dependent PR, these changes will be merged into the original PR branch trace-and-optimize.

This PR will be automatically closed if the original PR is merged.


📄 51% (0.51x) speedup for FunctionRanker._get_function_stats in codeflash/benchmarking/function_ranker.py

⏱️ Runtime : 497 microseconds → 330 microseconds (best of 51 runs)
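The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as original runtime divided by optimized runtime, minus one; that formula is inferred from the reported numbers, not stated in this PR.

```python
# Reported best-of-51 runtimes, in microseconds
original_us = 497.0
optimized_us = 330.0

# Assumed definition: how much faster the optimized code is, relative to itself
speedup = original_us / optimized_us - 1

print(f"{speedup:.2f}x speedup, {speedup:.0%}")  # prints "0.51x speedup, 51%"
```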

📝 Explanation and details

Here is an optimized version of your code, focusing on the _get_function_stats function, the performance bottleneck identified by your line profiling.

Optimizations Applied

  1. Avoid Building Unneeded Lists:

    • Creating possible_keys as a list incurs per-call overhead.
    • Instead, directly check both keys in sequence, avoiding the list entirely.
  2. Short-circuit Early Return:

    • Check for the first key (qualified_name) and return immediately if found (no need to compute or check the second unless necessary).
  3. String Formatting Optimization:

    • Use f-strings directly in the condition rather than storing/interpolating beforehand.
  4. Comment Retention:

    • All existing and relevant comments are preserved, though your original snippet has no in-method comments.
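The optimized method body is not reproduced in this description, so the following is only a sketch of the pattern the list above describes. The `"file_path:name"` key format, the plain-dict stats store, and both function names are assumptions made for illustration; only `possible_keys`, `qualified_name`, and `function_name` come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FunctionToOptimize:
    # Hypothetical stand-in mirroring the stub used in the generated tests
    file_path: str
    function_name: str
    qualified_name: str


def get_stats_with_list(function_stats: dict, fto: FunctionToOptimize):
    # Assumed shape of the original: format both keys up front, then loop
    possible_keys = [
        f"{fto.file_path}:{fto.qualified_name}",
        f"{fto.file_path}:{fto.function_name}",
    ]
    for key in possible_keys:
        if key in function_stats:
            return function_stats[key]
    return None


def get_stats_short_circuit(function_stats: dict, fto: FunctionToOptimize):
    # Try the qualified-name key first and return at once on a hit
    stats = function_stats.get(f"{fto.file_path}:{fto.qualified_name}")
    if stats is not None:
        return stats
    # The second key is only formatted when the first lookup misses
    return function_stats.get(f"{fto.file_path}:{fto.function_name}")
```

One caveat of this sketch: if a stored value could itself be `None`, the short-circuit version would fall through to the second key where the list version would not. That only matters if the stats store can hold `None` values, which this PR does not say.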


Rationale

  • No lists or unneeded temporary objects are constructed.
  • Uses .get, which does one dictionary lookup on a hit instead of the two done by in followed by indexing.
  • Returns immediately upon match.
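Whether a single `.get` call beats an `in` test followed by indexing depends on the interpreter version and on how often the key is present, so it is worth measuring rather than assuming. A minimal way to check on your own machine, using a hypothetical 1000-entry dictionary rather than the real stats store:

```python
import timeit

# Hypothetical stats store, similar in size to the large-scale generated tests
stats = {f"file_{i}.py:func_{i}": i for i in range(1000)}
key = "file_500.py:func_500"


def in_then_index():
    # Membership test followed by a subscript: two hash lookups on a hit
    if key in stats:
        return stats[key]
    return None


def single_get():
    # One method call and one hash lookup; returns None on a miss
    return stats.get(key)


if __name__ == "__main__":
    for fn in (in_then_index, single_get):
        elapsed = timeit.timeit(fn, number=200_000)
        print(f"{fn.__name__}: {elapsed:.4f}s for 200,000 calls")
```

Both helpers return the same value for any key, so the comparison isolates lookup cost only.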

This change reduces per-call runtime and avoids allocating a temporary list, which adds up in codebases with many calls to _get_function_stats.
Function signatures and return values are unchanged.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1027 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from pathlib import Path

# imports
import pytest
from codeflash.benchmarking.function_ranker import FunctionRanker


# Minimal stubs for dependencies
class FunctionToOptimize:
    def __init__(self, file_path: str, function_name: str, qualified_name: str):
        self.file_path = file_path
        self.function_name = function_name
        self.qualified_name = qualified_name

class ProfileStats:
    """
    Minimal stub for ProfileStats.
    Accepts a dictionary to simulate the .stats attribute.
    """
    def __init__(self, stats_dict_or_path):
        # If a dict is passed, use it; otherwise, assume it's a file path (simulate empty)
        if isinstance(stats_dict_or_path, dict):
            self.stats = stats_dict_or_path
        else:
            self.stats = {}

# Dummy logger for compatibility
class DummyLogger:
    def debug(self, msg): pass
    def warning(self, msg): pass
logger = DummyLogger()

# ---------------------- UNIT TESTS ----------------------

# Helper to create a FunctionRanker with custom stats
def make_ranker_with_stats(stats):
    # stats: dict of (filename, lineno, funcname): (call_count, num_callers, total_time_ns, cumulative_time_ns, callers)
    profile_stats = ProfileStats(stats)
    return FunctionRanker(trace_file_path=Path("fake/path/file.py"), profile_stats=profile_stats)

# 1. BASIC TEST CASES

from __future__ import annotations

from pathlib import Path

# imports
import pytest
from codeflash.benchmarking.function_ranker import FunctionRanker


# Minimal stub for FunctionToOptimize
class FunctionToOptimize:
    def __init__(self, file_path: str, function_name: str, qualified_name: str):
        self.file_path = file_path
        self.function_name = function_name
        self.qualified_name = qualified_name

# ========== UNIT TESTS ==========

# Helper function to create a FunctionRanker with custom stats
def make_ranker_with_stats(stats):
    # Patch ProfileStats to inject stats
    class DummyProfileStats:
        def __init__(self, path):
            self.stats = stats
    # Patch in our dummy ProfileStats
    orig = FunctionRanker.__init__
    def patched_init(self, trace_file_path):
        self.trace_file_path = trace_file_path
        self._profile_stats = DummyProfileStats(trace_file_path.as_posix())
        self._function_stats = {}
        self.load_function_stats()
    FunctionRanker.__init__ = patched_init
    ranker = FunctionRanker(Path("dummy/path"))
    FunctionRanker.__init__ = orig  # restore
    return ranker

# ---------- BASIC TEST CASES ----------

def test_basic_function_match_by_qualified_name():
    # Test: function is present, match by qualified_name
    stats = {
        ("foo.py", 10, "myfunc"): (5, 1, 100, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "myfunc", "myfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.57μs -> 1.16μs (35.4% faster)

def test_basic_function_match_by_function_name():
    # Test: function is present, only function_name matches (qualified_name does not)
    stats = {
        ("foo.py", 10, "myfunc"): (5, 1, 100, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "myfunc", "not_myfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.64μs -> 1.41μs (16.3% faster)

def test_basic_function_not_found():
    # Test: function is not present
    stats = {
        ("foo.py", 10, "myfunc"): (5, 1, 100, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "otherfunc", "otherfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.39μs -> 1.22μs (14.0% faster)


def test_edge_zero_call_count_function_ignored():
    # Test: function with call_count 0 is ignored
    stats = {
        ("foo.py", 10, "myfunc"): (0, 1, 100, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "myfunc", "myfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.32μs -> 1.24μs (6.52% faster)

def test_edge_negative_call_count_function_ignored():
    # Test: function with negative call_count is ignored
    stats = {
        ("foo.py", 10, "myfunc"): (-1, 1, 100, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "myfunc", "myfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.22μs -> 1.20μs (1.66% faster)

def test_edge_cumulative_less_than_total_time():
    # Test: cumulative_time_ns < total_time_ns (should allow negative time_in_callees)
    stats = {
        ("foo.py", 10, "myfunc"): (3, 1, 200, 150, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "myfunc", "myfunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.33μs -> 1.03μs (29.2% faster)

def test_edge_class_method_name_parsing():
    # Test: function name with class prefix is parsed correctly
    stats = {
        ("foo.py", 15, "MyClass.my_method"): (4, 1, 80, 120, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "my_method", "MyClass.my_method")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.37μs -> 1.05μs (30.5% faster)



def test_edge_function_with_empty_function_name():
    # Test: function with empty function name
    stats = {
        ("foo.py", 40, ""): (1, 1, 10, 10, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", "", "")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.23μs -> 941ns (30.9% faster)

def test_edge_function_with_long_names():
    # Test: function with very long name
    long_name = "a" * 200
    stats = {
        ("foo.py", 50, long_name): (1, 1, 20, 30, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", long_name, long_name)
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.39μs -> 1.15μs (20.9% faster)

def test_edge_function_with_special_characters():
    # Test: function name with special characters
    special_name = "func$#@!"
    stats = {
        ("foo.py", 60, special_name): (1, 1, 5, 5, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", special_name, special_name)
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.19μs -> 932ns (27.9% faster)

def test_edge_multiple_functions_same_name_different_files():
    # Test: same function name in different files
    stats = {
        ("foo.py", 10, "myfunc"): (1, 1, 10, 20, {}),
        ("bar.py", 10, "myfunc"): (2, 1, 20, 30, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto_foo = FunctionToOptimize("foo.py", "myfunc", "myfunc")
    fto_bar = FunctionToOptimize("bar.py", "myfunc", "myfunc")
    codeflash_output = ranker._get_function_stats(fto_foo); result_foo = codeflash_output # 1.21μs -> 992ns (22.2% faster)
    codeflash_output = ranker._get_function_stats(fto_bar); result_bar = codeflash_output # 632ns -> 461ns (37.1% faster)

def test_edge_function_with_non_ascii_name():
    # Test: function name with non-ASCII unicode characters
    unicode_name = "функция"
    stats = {
        ("foo.py", 70, unicode_name): (1, 1, 15, 25, {}),
    }
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("foo.py", unicode_name, unicode_name)
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output # 1.48μs -> 1.18μs (25.5% faster)

def test_edge_function_with_none_stats():
    # Test: ProfileStats.stats is None (should not crash)
    class DummyProfileStats:
        def __init__(self, path):
            self.stats = None
    orig = FunctionRanker.__init__
    def patched_init(self, trace_file_path):
        self.trace_file_path = trace_file_path
        self._profile_stats = DummyProfileStats(trace_file_path.as_posix())
        self._function_stats = {}
        try:
            self.load_function_stats()
        except Exception:
            pass
    FunctionRanker.__init__ = patched_init
    ranker = FunctionRanker(Path("dummy/path"))
    FunctionRanker.__init__ = orig

# ---------- LARGE SCALE TEST CASES ----------

def test_large_scale_many_functions():
    # Test: 1000 functions, ensure correct one is found
    stats = {}
    for i in range(1000):
        stats[(f"file_{i}.py", i, f"func_{i}")] = (i+1, 1, 10+i, 20+i, {})
    ranker = make_ranker_with_stats(stats)
    # Pick a few random indexes to check
    for idx in [0, 100, 500, 999]:
        fto = FunctionToOptimize(f"file_{idx}.py", f"func_{idx}", f"func_{idx}")
        codeflash_output = ranker._get_function_stats(fto); result = codeflash_output

def test_large_scale_lookup_not_found():
    # Test: 1000 functions, lookup for a function not present
    stats = {}
    for i in range(1000):
        stats[(f"file_{i}.py", i, f"func_{i}")] = (i+1, 1, 10+i, 20+i, {})
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("notafile.py", "notafunc", "notafunc")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output

def test_large_scale_all_zero_call_count():
    # Test: 1000 functions, all call_count=0, none should be found
    stats = {}
    for i in range(1000):
        stats[(f"file_{i}.py", i, f"func_{i}")] = (0, 1, 10+i, 20+i, {})
    ranker = make_ranker_with_stats(stats)
    fto = FunctionToOptimize("file_500.py", "func_500", "func_500")
    codeflash_output = ranker._get_function_stats(fto); result = codeflash_output

def test_large_scale_class_methods():
    # Test: 1000 class methods, ensure correct parsing and lookup
    stats = {}
    for i in range(1000):
        stats[(f"file_{i}.py", i, f"Class{i}.method_{i}")] = (i+1, 1, 10+i, 20+i, {})
    ranker = make_ranker_with_stats(stats)
    for idx in [0, 123, 456, 999]:
        fto = FunctionToOptimize(f"file_{idx}.py", f"method_{idx}", f"Class{idx}.method_{idx}")
        codeflash_output = ranker._get_function_stats(fto); result = codeflash_output

def test_large_scale_performance():
    # Test: 1000 functions, ensure lookup is fast (functional test, not timing)
    stats = {}
    for i in range(1000):
        stats[(f"file_{i}.py", i, f"func_{i}")] = (i+1, 1, 10+i, 20+i, {})
    ranker = make_ranker_with_stats(stats)
    # Lookup for all 1000
    for i in range(1000):
        fto = FunctionToOptimize(f"file_{i}.py", f"func_{i}", f"func_{i}")
        codeflash_output = ranker._get_function_stats(fto); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr384-2025-07-01T22.08.43 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 1, 2025
@misrasaurabh1 misrasaurabh1 merged commit 67bd717 into trace-and-optimize Jul 1, 2025
11 of 17 checks passed
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr384-2025-07-01T22.08.43 branch July 1, 2025 22:10