[Chore] get pr number from gh action event json file, fallback to old behavior #354
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
Explore these optional code suggestions:
event_path = os.getenv("GITHUB_EVENT_PATH")
if not event_path:
    return {}
with Path(event_path).open() as f:
⚡️ Codeflash found a 32% (0.32x) speedup for `get_cached_gh_event_data` in `codeflash/code_utils/env_utils.py`
⏱️ Runtime: 2.11 milliseconds → 1.59 milliseconds (best of 107 runs)
📝 Explanation and details
Here’s an optimized rewrite of your code. The main bottleneck in this short program is I/O (reading from disk), and possibly calling `os.getenv` and creating a `Path` object. However, some small speedups are possible:

- Use `open()` directly for a string path; `Path.open()` adds an unnecessary object-creation step.
- Avoid returning an empty dictionary with a different key in the cache for different environments; instead, cache only successful loads.
- Use `os.environ.get` for slightly faster environment access.
- Specify the encoding in `open` for future-proofing and speed.

Here’s the improved version.
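The improved version itself did not survive the page rendering, so here is a reconstruction based on the after-side of the diff shown later in this thread; the `@lru_cache` decorator and the return annotation are assumptions inferred from the surrounding discussion and tests:

```python
import json
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def get_cached_gh_event_data() -> dict:
    # Read the GitHub Actions event payload once; subsequent calls hit the cache.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if not event_path:
        return {}
    with open(event_path, encoding="utf-8") as f:
        return json.load(f)
```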
Changes made:

- Replaced `os.getenv` with the slightly faster `os.environ.get`.
- Used the built-in `open` instead of `Path(event_path).open()` (avoids `Path` object creation).
- Explicit UTF-8 encoding for speed and consistency.
- Eliminated the unused `Path` import.
Beyond these changes, this function is already about as fast as possible given its necessary I/O and JSON parsing. Real-world bottlenecks for this function are dominated by disk and JSON decode times. If repeated calls with a changed environment are required, removing `lru_cache` can improve correctness at a slight cost to speed. If speed is *critical* and the file is excessively large, consider a faster JSON parser (like `orjson`), but this is typically overkill for GitHub event data.

Need more aggressive optimization or C extensions? Let me know!
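As a rough local sanity check of the `open()` versus `Path(...).open()` claim above, a throwaway micro-benchmark along these lines could be used (the file contents and iteration count here are arbitrary, and the absolute numbers will vary by machine):

```python
import json
import tempfile
import timeit
from pathlib import Path

# A small throwaway JSON file to read repeatedly.
tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump({"action": "opened", "number": 1}, tmp)
tmp.close()
path = tmp.name


def via_builtin_open():
    with open(path, encoding="utf-8") as f:
        json.load(f)


def via_path_open():
    with Path(path).open(encoding="utf-8") as f:
        json.load(f)


t_builtin = timeit.timeit(via_builtin_open, number=2000)
t_path = timeit.timeit(via_path_open, number=2000)
print(f"builtin open: {t_builtin:.4f}s, Path.open: {t_path:.4f}s")
```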
✅ Correctness verification report:

Test | Status
---|---
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | ✅ 48 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations
import json
import os
import tempfile
from functools import lru_cache
from pathlib import Path
# imports
import pytest # used for our unit tests
from codeflash.code_utils.env_utils import get_cached_gh_event_data
def write_json_file(path: Path, data: dict):
"""Helper to write JSON data to a file."""
with path.open('w', encoding='utf-8') as f:
json.dump(data, f)
def test_no_env_var(monkeypatch):
"""Test when GITHUB_EVENT_PATH is not set."""
monkeypatch.delenv("GITHUB_EVENT_PATH", raising=False)
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 3.48μs -> 3.31μs (5.14% faster)
def test_env_var_points_to_nonexistent_file(monkeypatch, tmp_path):
"""Test when GITHUB_EVENT_PATH points to a file that does not exist."""
fake_path = tmp_path / "doesnotexist.json"
monkeypatch.setenv("GITHUB_EVENT_PATH", str(fake_path))
with pytest.raises(FileNotFoundError):
get_cached_gh_event_data()
def test_env_var_points_to_invalid_json(monkeypatch, tmp_path):
"""Test when GITHUB_EVENT_PATH points to a file with invalid JSON."""
invalid_json_file = tmp_path / "bad.json"
invalid_json_file.write_text('{"not": "valid",}', encoding="utf-8") # Trailing comma is invalid
monkeypatch.setenv("GITHUB_EVENT_PATH", str(invalid_json_file))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
def test_env_var_points_to_empty_file(monkeypatch, tmp_path):
"""Test when GITHUB_EVENT_PATH points to an empty file."""
empty_file = tmp_path / "empty.json"
empty_file.write_text("", encoding="utf-8")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(empty_file))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
def test_env_var_points_to_valid_json(monkeypatch, tmp_path):
"""Test when GITHUB_EVENT_PATH points to a valid JSON file."""
data = {"action": "opened", "number": 42}
json_file = tmp_path / "event.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.5μs -> 32.1μs (41.8% faster)
def test_env_var_points_to_json_with_non_ascii(monkeypatch, tmp_path):
"""Test when JSON contains non-ASCII characters."""
data = {"message": "café", "emoji": "😀"}
json_file = tmp_path / "unicode.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.3μs -> 32.4μs (42.6% faster)
def test_env_var_points_to_json_with_nested_data(monkeypatch, tmp_path):
"""Test when JSON contains nested structures."""
data = {"outer": {"inner": {"value": [1, 2, 3]}}}
json_file = tmp_path / "nested.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.6μs -> 32.6μs (43.0% faster)
def test_env_var_points_to_json_with_empty_dict(monkeypatch, tmp_path):
"""Test when JSON file contains an empty dict."""
data = {}
json_file = tmp_path / "emptydict.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 44.3μs -> 30.3μs (46.2% faster)
def test_env_var_points_to_json_with_empty_list(monkeypatch, tmp_path):
"""Test when JSON file contains an empty list (should return a list, not dict)."""
data = []
json_file = tmp_path / "emptylist.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.7μs -> 30.3μs (44.2% faster)
def test_env_var_points_to_json_with_non_dict(monkeypatch, tmp_path):
"""Test when JSON file contains a non-dict, non-list value (e.g., int, str, bool)."""
for val in [123, "hello", True, None]:
json_file = tmp_path / f"val_{str(val)}.json"
write_json_file(json_file, val)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 360ns -> 340ns (5.88% faster)
def test_lru_cache_behavior(monkeypatch, tmp_path):
"""Test that lru_cache prevents re-reading the file after first call."""
data1 = {"foo": 1}
data2 = {"bar": 2}
json_file = tmp_path / "event.json"
write_json_file(json_file, data1)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
# First call caches data1
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 521ns -> 471ns (10.6% faster)
# Overwrite file with data2
write_json_file(json_file, data2)
# Second call should still return data1 due to cache
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 521ns -> 471ns (10.6% faster)
def test_cache_cleared_reads_new_data(monkeypatch, tmp_path):
"""Test that clearing the cache causes the function to re-read the file."""
data1 = {"foo": 1}
data2 = {"bar": 2}
json_file = tmp_path / "event.json"
write_json_file(json_file, data1)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 44.4μs -> 30.2μs (47.3% faster)
write_json_file(json_file, data2)
get_cached_gh_event_data.cache_clear()
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 44.4μs -> 30.2μs (47.3% faster)
def test_large_json(monkeypatch, tmp_path):
"""Test with a large JSON object (scalability/performance)."""
data = {f"key_{i}": i for i in range(1000)}
json_file = tmp_path / "large.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 195μs -> 178μs (9.89% faster)
def test_large_nested_json(monkeypatch, tmp_path):
"""Test with a large, deeply nested JSON structure."""
data = current = {}
for i in range(100):
current[f"level_{i}"] = {}
current = current[f"level_{i}"]
json_file = tmp_path / "deep.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 65.5μs -> 47.6μs (37.4% faster)
# Walk down the nested structure to check depth
current = result
for i in range(100):
current = current[f"level_{i}"]
def test_large_list_json(monkeypatch, tmp_path):
"""Test with a large list as the root JSON object."""
data = [i for i in range(1000)]
json_file = tmp_path / "biglist.json"
write_json_file(json_file, data)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 96.5μs -> 82.9μs (16.5% faster)
def test_env_var_points_to_file_with_whitespace(monkeypatch, tmp_path):
"""Test when JSON file contains only whitespace."""
whitespace_file = tmp_path / "whitespace.json"
whitespace_file.write_text(" \n\t ", encoding="utf-8")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(whitespace_file))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
def test_env_var_points_to_file_with_comments(monkeypatch, tmp_path):
"""Test when JSON file contains comments (which are invalid in JSON)."""
comment_file = tmp_path / "comment.json"
comment_file.write_text('{"foo": 1} // comment', encoding="utf-8")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(comment_file))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations
import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path
# imports
import pytest # used for our unit tests
from codeflash.code_utils.env_utils import get_cached_gh_event_data
# --- Basic Test Cases ---
def test_no_env_var_returns_empty_dict(monkeypatch):
# GITHUB_EVENT_PATH is not set
monkeypatch.delenv("GITHUB_EVENT_PATH", raising=False)
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 3.43μs -> 3.41μs (0.558% faster)
def test_env_var_empty_returns_empty_dict(monkeypatch):
# GITHUB_EVENT_PATH is set to empty string
monkeypatch.setenv("GITHUB_EVENT_PATH", "")
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 1.66μs -> 1.59μs (4.39% faster)
def test_valid_json_file(monkeypatch, tmp_path):
# Create a valid JSON file
data = {"action": "opened", "number": 42}
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.6μs -> 32.4μs (40.8% faster)
def test_valid_json_file_non_ascii(monkeypatch, tmp_path):
# JSON with non-ASCII characters
data = {"message": "こんにちは", "user": "测试"}
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps(data), encoding="utf-8")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.2μs -> 32.7μs (41.3% faster)
def test_valid_json_file_empty_dict(monkeypatch, tmp_path):
# JSON file with empty dict
data = {}
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.8μs -> 30.8μs (42.5% faster)
# --- Edge Test Cases ---
def test_env_var_points_to_nonexistent_file(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file that does not exist
file_path = tmp_path / "does_not_exist.json"
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
with pytest.raises(FileNotFoundError):
get_cached_gh_event_data()
def test_env_var_points_to_directory(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a directory, not a file
monkeypatch.setenv("GITHUB_EVENT_PATH", str(tmp_path))
with pytest.raises(IsADirectoryError):
get_cached_gh_event_data()
def test_env_var_points_to_invalid_json(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with invalid JSON
file_path = tmp_path / "bad.json"
file_path.write_text("{not: valid json}")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
def test_env_var_points_to_json_array(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with a JSON array, not a dict
file_path = tmp_path / "array.json"
file_path.write_text(json.dumps([1, 2, 3]))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.4μs -> 31.6μs (43.6% faster)
def test_env_var_points_to_json_null(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with JSON null
file_path = tmp_path / "null.json"
file_path.write_text("null")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.6μs -> 30.4μs (43.4% faster)
def test_env_var_points_to_json_number(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with a JSON number
file_path = tmp_path / "num.json"
file_path.write_text("123")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 44.0μs -> 31.1μs (41.7% faster)
def test_env_var_points_to_json_string(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with a JSON string
file_path = tmp_path / "str.json"
file_path.write_text('"hello"')
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.9μs -> 30.8μs (42.9% faster)
def test_env_var_points_to_empty_file(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to an empty file
file_path = tmp_path / "empty.json"
file_path.write_text("")
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
with pytest.raises(json.JSONDecodeError):
get_cached_gh_event_data()
def test_file_permission_denied(monkeypatch, tmp_path):
# GITHUB_EVENT_PATH points to a file with no read permissions
file_path = tmp_path / "event.json"
file_path.write_text('{"foo": "bar"}')
file_path.chmod(0o000) # Remove all permissions
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
try:
with pytest.raises(PermissionError):
get_cached_gh_event_data()
finally:
# Restore permissions so tmp_path can be cleaned up
file_path.chmod(0o644)
def test_cache_behavior(monkeypatch, tmp_path):
# Ensure lru_cache is working: changing file content after first call has no effect
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps({"a": 1}))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 511ns -> 491ns (4.07% faster)
# Change file content
file_path.write_text(json.dumps({"a": 2}))
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 511ns -> 491ns (4.07% faster)
def test_cache_cleared(monkeypatch, tmp_path):
# After cache_clear, new file content is read
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps({"a": 1}))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 43.4μs -> 30.0μs (44.5% faster)
# Change file content
file_path.write_text(json.dumps({"a": 2}))
get_cached_gh_event_data.cache_clear()
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 43.4μs -> 30.0μs (44.5% faster)
# --- Large Scale Test Cases ---
def test_large_json_file(monkeypatch, tmp_path):
# Test with a large JSON object (under 1000 keys)
data = {f"key_{i}": i for i in range(1000)}
file_path = tmp_path / "large.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 193μs -> 174μs (10.4% faster)
def test_large_json_array(monkeypatch, tmp_path):
# Test with a large JSON array (under 1000 elements)
data = [i for i in range(1000)]
file_path = tmp_path / "large_array.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 96.0μs -> 82.7μs (16.1% faster)
def test_deeply_nested_json(monkeypatch, tmp_path):
# Test with deeply nested JSON (depth ~100)
data = curr = {}
for i in range(100):
curr["nested"] = {}
curr = curr["nested"]
file_path = tmp_path / "deep.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 56.8μs -> 43.4μs (31.1% faster)
# Walk down the nesting to verify structure
curr = result
for _ in range(100):
curr = curr["nested"]
def test_multiple_calls_same_result(monkeypatch, tmp_path):
# Multiple calls return the same object (due to lru_cache)
data = {"foo": "bar"}
file_path = tmp_path / "event.json"
file_path.write_text(json.dumps(data))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 261ns -> 250ns (4.40% faster)
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 261ns -> 250ns (4.40% faster)
def test_multiple_env_paths(monkeypatch, tmp_path):
# Changing GITHUB_EVENT_PATH does not change result due to lru_cache
file1 = tmp_path / "event1.json"
file2 = tmp_path / "event2.json"
file1.write_text(json.dumps({"a": 1}))
file2.write_text(json.dumps({"a": 2}))
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file1))
codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 261ns -> 261ns (0.000% faster)
monkeypatch.setenv("GITHUB_EVENT_PATH", str(file2))
codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 261ns -> 261ns (0.000% faster)
# After cache_clear, new env var is respected
get_cached_gh_event_data.cache_clear()
codeflash_output = get_cached_gh_event_data(); result3 = codeflash_output # 261ns -> 261ns (0.000% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To test or edit this optimization locally, run `git merge codeflash/optimize-pr354-2025-06-20T15.43.29`.
Before:
event_path = os.getenv("GITHUB_EVENT_PATH")
if not event_path:
    return {}
with Path(event_path).open() as f:

After:
event_path = os.environ.get("GITHUB_EVENT_PATH")
if not event_path:
    return {}
with open(event_path, encoding="utf-8") as f:
thanks! this makes it easier to use codeflash :)
User description

This will load the PR number from the event.json file, typically at /home/runner/work/_temp/_github_workflow/event.json, instead of using `$CODEFLASH_PR_NUMBER`.

how I tested this:
PR Type
Enhancement, Documentation
Description
Add GH event JSON fallback for PR number
Update PR number retrieval logic in env_utils.py
Improve error message when PR number missing
Remove manual PR number setting in workflows/docs
Changes walkthrough 📝

File | Change
---|---
codeflash/code_utils/env_utils.py | Add GH event JSON fallback in env utils
.github/workflows/codeflash-optimize.yaml | Remove manual PR number env var
codeflash/cli_cmds/workflows/codeflash-optimize.yaml | Remove manual PR number env var in CLI workflow
docs/docs/getting-started/codeflash-github-actions.md | Remove manual PR number from docs