
Conversation

@odjuricicTT
Collaborator

Summary

Implements issue #759 to read the task field from the model's ModelInfo object instead of hardcoding task strings in tt-xla benchmarks.
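In short, the benchmarks stop hardcoding the task string and instead use the task carried by ModelInfo. A minimal sketch of the pattern (variable names follow the benchmark scripts; the ModelInfo attributes .name and .task are assumed from the changed code reviewed below):

model_info = model_loader.get_model_info(variant=model_variant)
full_model_name = model_info.name      # model name comes from ModelInfo
model_type = str(model_info.task)      # task comes from the forge Task enum, not a hardcoded literal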

Changes

  • benchmark/tt-xla/llm_benchmark.py: Read model_info.task instead of hardcoding "text-generation"
  • benchmark/tt-xla/vision_benchmark.py: Read model_info.task instead of hardcoding "Vision, Random Input Data"
    • Modified setup_model() to return full ModelInfo object instead of just name

Benefits

  • Standardizes task categorization using the forge Task enum
  • Enables consistent querying in Superset by task type
  • Maintains single source of truth in model definitions

Testing

The performance benchmark workflow will be triggered to verify that the task field is correctly populated in the benchmark result JSON.

Fixes #759


🤖 Generated with Claude Code

- Update llm_benchmark.py to read task from model_info.task
- Update vision_benchmark.py to read task from model_info.task
- Modify setup_model() to return full ModelInfo object instead of just name

This standardizes task categorization using the forge Task enum instead of
hardcoded strings, enabling better querying in Superset.

Co-authored-by: Ognjen Djuricic <[email protected]>
@odjuricicTT
Collaborator Author

@claude Can you review this pr?

@claude

claude bot commented Dec 25, 2025

Claude finished @odjuricicTT's task in 1m 39s


Code Review

Todo List

  • Read modified files to understand the changes
  • Review the git diff to see exact modifications
  • Analyze code correctness and quality
  • Provide comprehensive review feedback

Review Summary

I've reviewed the changes in this PR and they look good overall. The implementation correctly reads the task field from ModelInfo instead of hardcoding strings. Here are my findings:

✅ What's Working Well

  1. Consistent pattern across both files: Both llm_benchmark.py and vision_benchmark.py now use str(model_info.task) consistently
  2. Code cleanup in vision_benchmark.py: The setup_model() refactoring is an improvement:
    • Eliminates redundant calls to get_model_info()
    • Returns the full ModelInfo object instead of just the name
    • Removes duplication between the variant/no-variant branches
  3. Proper string conversion: Using str(model_info.task) handles the enum-to-string conversion correctly

🔍 Observations & Potential Issues

1. Minor inconsistency in llm_benchmark.py (lines 293-296)

The benchmark_llm_torch_xla function still validates that task == "text-generation" using the hardcoded string:

if task != "text-generation":
    raise ValueError(
        f"Only 'text-generation' task is supported for llm benchmark. Got: {task}. "
        "Please use -t text-generation"
    )

This validation happens before the task is actually read from model_info.task (line 417). While this doesn't break anything since the task parameter still gets passed in from the test functions, it means:

  • The parameter validation uses a hardcoded string
  • The actual result uses model_info.task

Suggestion: Consider whether this validation is still needed, or if it should validate against the forge Task enum values instead of hardcoded strings.
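A hedged sketch of what enum-based validation could look like, assuming forge exposes a Task enum with a TEXT_GENERATION member (the import path below is illustrative, not verified against the forge package):

# Illustrative import path; adjust to wherever forge actually defines Task.
from forge import Task

if model_info.task != Task.TEXT_GENERATION:
    raise ValueError(
        f"Only the text-generation task is supported for the LLM benchmark. "
        f"Got: {model_info.task}."
    )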

2. Type hint removed in vision_benchmark.py (line 35)

The return type hint was changed from:

def setup_model(...) -> tuple[torch.nn.Module, str]:

to:

def setup_model(...):

This is correct since the function now returns (model, ModelInfo) instead of (model, str), but the type hint could be added back:

def setup_model(...) -> tuple[torch.nn.Module, ModelInfo]:

This would improve type safety and documentation.

3. Verify ModelInfo.task enum values

The PR assumes model_info.task returns the appropriate enum values. Make sure:

  • All LLM models have task = Task.TEXT_GENERATION (or similar)
  • All vision models have appropriate task enum values
  • The str() conversion produces the expected format for Superset queries (see the short example after this list)
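For reference, a small self-contained example of plain Python enum behavior (a generic stand-in, not the actual forge Task definition):

from enum import Enum

class Task(Enum):  # stand-in for illustration only
    TEXT_GENERATION = "text-generation"

print(str(Task.TEXT_GENERATION))   # prints "Task.TEXT_GENERATION" with the default Enum __str__
print(Task.TEXT_GENERATION.value)  # prints "text-generation"

If the real Task enum overrides __str__ (or mixes in str), str(model_info.task) may already yield the short form; a quick check of the benchmark JSON would settle it.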

📝 Recommendations

  1. Consider adding type hints: Add the return type hint back to setup_model() in vision_benchmark.py
  2. Review task validation: The hardcoded task validation in llm_benchmark.py could be updated to be more consistent with using the enum
  3. Documentation: Consider adding a comment explaining that task comes from ModelInfo to help future developers

✨ Overall Assessment

The changes are solid and achieve the stated goal of reading task from ModelInfo. The refactoring in vision_benchmark.py is particularly clean. With the minor observations noted above, this PR successfully:

  • ✅ Standardizes task categorization using the forge Task enum
  • ✅ Maintains single source of truth in model definitions
  • ✅ Enables consistent querying in Superset by task type

The code changes are correct and the implementation is clean. Nice work! 🎯



Copilot AI left a comment


Pull request overview

This PR refactors tt-xla benchmark scripts to read task information from the ModelInfo object instead of hardcoding task strings, aiming to standardize task categorization across benchmarks.

Key Changes

  • Modified setup_model() in vision_benchmark.py to return the full ModelInfo object instead of just the model name string
  • Updated both llm_benchmark.py and vision_benchmark.py to extract task values from model_info.task using string conversion
  • Changed how model name is accessed from model_info (now using .name attribute)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File descriptions:

  • benchmark/tt-xla/vision_benchmark.py: Modified setup_model() to return the ModelInfo object; updated task and name extraction to use ModelInfo attributes
  • benchmark/tt-xla/llm_benchmark.py: Added model_info retrieval and updated task extraction to use the model_info.task attribute


Comment on lines 47 to 54
 if model_variant:
-    print(f"Loading model {model_loader.get_model_info(variant=model_variant).name}...")
+    model_info = model_loader.get_model_info(variant=model_variant)
+    print(f"Loading model {model_info.name}...")
     model = model_loader.load_model()
-    model_info = model_loader.get_model_info(model_variant).name
 else:
-    print(f"Loading model {model_loader.get_model_info().name}...")
+    model_info = model_loader.get_model_info()
+    print(f"Loading model {model_info.name}...")
     model = model_loader.load_model()

Copilot AI Dec 25, 2025


There is duplicated code between the if and else branches. The only difference is whether variant=model_variant is passed to get_model_info(). This duplication can be eliminated by conditionally building the arguments or restructuring the code to reduce redundancy and improve maintainability.
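One way to collapse the branches, sketched under the assumption that calling get_model_info() with no variant argument behaves exactly like the no-variant path above:

model_info = (
    model_loader.get_model_info(variant=model_variant)
    if model_variant
    else model_loader.get_model_info()
)
print(f"Loading model {model_info.name}...")
model = model_loader.load_model()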

-full_model_name = f"{model_info}"
-model_type = "Vision, Random Input Data"
+full_model_name = model_info.name
+model_type = str(model_info.task)

Copilot AI Dec 25, 2025


Converting the task enum to string using str() may not produce the desired output format. Depending on how the Task enum is defined, str() might return something like "Task.TEXT_GENERATION" instead of just "text-generation". Consider using .value or .name attribute of the enum if available, or ensure the enum's __str__ method is properly defined to return the expected format. This affects consistency with how task values are stored in benchmark results.

Suggested change

-model_type = str(model_info.task)
+model_type = getattr(model_info.task, "value", str(model_info.task))

model_type = "text-generation"
model_info = model_loader.get_model_info(variant=model_variant)
full_model_name = model_info.name
model_type = str(model_info.task)

Copilot AI Dec 25, 2025


Converting the task enum to string using str() may not produce the desired output format. Depending on how the Task enum is defined, str() might return something like "Task.TEXT_GENERATION" instead of just "text-generation". Consider using .value or .name attribute of the enum if available, or ensure the enum's __str__ method is properly defined to return the expected format. This affects consistency with how task values are stored in benchmark results.

Suggested change

-model_type = str(model_info.task)
+model_type = getattr(model_info.task, "value", str(model_info.task))



-def setup_model(model_loader, model_variant=None, data_format="bfloat16") -> tuple[torch.nn.Module, str]:
+def setup_model(model_loader, model_variant=None, data_format="bfloat16"):

Copilot AI Dec 25, 2025


The function signature no longer includes a type hint for the return value. The docstring states the function returns "Tuple of (model, model_info)" but the signature should include the return type annotation for better type checking and code documentation. Consider adding -> tuple[torch.nn.Module, ModelInfo] or the appropriate type for model_info.
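A sketch of the annotated signature; the ModelInfo import path is an assumption and would need to match wherever the class actually lives in the forge package:

import torch
from forge import ModelInfo  # illustrative import path, adjust as needed

def setup_model(model_loader, model_variant=None, data_format="bfloat16") -> tuple[torch.nn.Module, ModelInfo]:
    ...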


Development

Successfully merging this pull request may close these issues.

Add ModelTask from forge models as superset field
