
Conversation

@odjuricicTT
Collaborator

Summary

Implements issue #759 to read the task field from the model's ModelInfo object instead of hardcoding task strings in tt-xla benchmarks.
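In short, the benchmarks stop hardcoding the task string and instead use the task carried by ModelInfo. A minimal sketch of the pattern (variable names follow the benchmark scripts; the ModelInfo attributes .name and .task are assumed from the changed code reviewed below):

model_info = model_loader.get_model_info(variant=model_variant)
full_model_name = model_info.name      # model name comes from ModelInfo
model_type = str(model_info.task)      # task comes from the forge Task enum, not a hardcoded literal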

Changes

  • benchmark/tt-xla/llm_benchmark.py: Read model_info.task instead of hardcoding "text-generation"
  • benchmark/tt-xla/vision_benchmark.py: Read model_info.task instead of hardcoding "Vision, Random Input Data"
    • Modified setup_model() to return full ModelInfo object instead of just name

Benefits

  • Standardizes task categorization using the forge Task enum
  • Enables consistent querying in Superset by task type
  • Maintains single source of truth in model definitions

Testing

The performance benchmark workflow will be triggered to verify that the task field is correctly populated in the benchmark result JSON.

Fixes #759


🤖 Generated with Claude Code

- Update llm_benchmark.py to read task from model_info.task
- Update vision_benchmark.py to read task from model_info.task
- Modify setup_model() to return full ModelInfo object instead of just name

This standardizes task categorization using the forge Task enum instead of
hardcoded strings, enabling better querying in Superset.

Co-authored-by: Ognjen Djuricic <[email protected]>
@odjuricicTT
Collaborator Author

@claude Can you review this pr?

@claude

claude bot commented Dec 25, 2025

Claude finished @odjuricicTT's task in 1m 39s


Code Review

Todo List

  • Read modified files to understand the changes
  • Review the git diff to see exact modifications
  • Analyze code correctness and quality
  • Provide comprehensive review feedback

Review Summary

I've reviewed the changes in this PR and they look good overall. The implementation correctly reads the task field from ModelInfo instead of hardcoding strings. Here are my findings:

✅ What's Working Well

  1. Consistent pattern across both files: Both llm_benchmark.py and vision_benchmark.py now use str(model_info.task) consistently
  2. Code cleanup in vision_benchmark.py: The setup_model() refactoring is an improvement:
    • Eliminates redundant calls to get_model_info()
    • Returns the full ModelInfo object instead of just the name
    • Removes duplication between the variant/no-variant branches
  3. Proper string conversion: Using str(model_info.task) handles the enum-to-string conversion correctly

🔍 Observations & Potential Issues

1. Minor inconsistency in llm_benchmark.py (lines 293-296)

The benchmark_llm_torch_xla function still validates that task == "text-generation" using the hardcoded string:

if task != "text-generation":
    raise ValueError(
        f"Only 'text-generation' task is supported for llm benchmark. Got: {task}. "
        "Please use -t text-generation"
    )

This validation happens before the task is actually read from model_info.task (line 417). While this doesn't break anything since the task parameter still gets passed in from the test functions, it means:

  • The parameter validation uses a hardcoded string
  • The actual result uses model_info.task

Suggestion: Consider whether this validation is still needed, or if it should validate against the forge Task enum values instead of hardcoded strings.
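A hedged sketch of what enum-based validation could look like, assuming forge exposes a Task enum with a TEXT_GENERATION member (the import path below is illustrative, not verified against the forge package):

# Illustrative import path; adjust to wherever forge actually defines Task.
from forge import Task

if model_info.task != Task.TEXT_GENERATION:
    raise ValueError(
        f"Only the text-generation task is supported for the LLM benchmark. "
        f"Got: {model_info.task}."
    )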

2. Type hint removed in vision_benchmark.py (line 35)

The return type hint was changed from:

def setup_model(...) -> tuple[torch.nn.Module, str]:

to:

def setup_model(...):

This is correct since the function now returns (model, ModelInfo) instead of (model, str), but the type hint could be added back:

def setup_model(...) -> tuple[torch.nn.Module, ModelInfo]:

This would improve type safety and documentation.

3. Verify ModelInfo.task enum values

The PR assumes model_info.task returns the appropriate enum values. Make sure:

  • All LLM models have task = Task.TEXT_GENERATION (or similar)
  • All vision models have appropriate task enum values
  • The str() conversion produces the expected format for Superset queries (see the short example after this list)
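For reference, a small self-contained example of plain Python enum behavior (a generic stand-in, not the actual forge Task definition):

from enum import Enum

class Task(Enum):  # stand-in for illustration only
    TEXT_GENERATION = "text-generation"

print(str(Task.TEXT_GENERATION))   # prints "Task.TEXT_GENERATION" with the default Enum __str__
print(Task.TEXT_GENERATION.value)  # prints "text-generation"

If the real Task enum overrides __str__ (or mixes in str), str(model_info.task) may already yield the short form; a quick check of the benchmark JSON would settle it.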

📝 Recommendations

  1. Consider adding type hints: Add the return type hint back to setup_model() in vision_benchmark.py
  2. Review task validation: The hardcoded task validation in llm_benchmark.py could be updated to be more consistent with using the enum
  3. Documentation: Consider adding a comment explaining that task comes from ModelInfo to help future developers

✨ Overall Assessment

The changes are solid and achieve the stated goal of reading task from ModelInfo. The refactoring in vision_benchmark.py is particularly clean. With the minor observations noted above, this PR successfully:

  • ✅ Standardizes task categorization using the forge Task enum
  • ✅ Maintains single source of truth in model definitions
  • ✅ Enables consistent querying in Superset by task type

The code changes are correct and the implementation is clean. Nice work! 🎯



Copilot AI left a comment


Pull request overview

This PR refactors tt-xla benchmark scripts to read task information from the ModelInfo object instead of hardcoding task strings, aiming to standardize task categorization across benchmarks.

Key Changes

  • Modified setup_model() in vision_benchmark.py to return the full ModelInfo object instead of just the model name string
  • Updated both llm_benchmark.py and vision_benchmark.py to extract task values from model_info.task using string conversion
  • Changed how model name is accessed from model_info (now using .name attribute)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File descriptions:

  • benchmark/tt-xla/vision_benchmark.py: Modified setup_model() to return the ModelInfo object; updated task and name extraction to use ModelInfo attributes
  • benchmark/tt-xla/llm_benchmark.py: Added model_info retrieval and updated task extraction to use the model_info.task attribute


Comment on lines 47 to 54
 if model_variant:
-    print(f"Loading model {model_loader.get_model_info(variant=model_variant).name}...")
+    model_info = model_loader.get_model_info(variant=model_variant)
+    print(f"Loading model {model_info.name}...")
     model = model_loader.load_model()
-    model_info = model_loader.get_model_info(model_variant).name
 else:
-    print(f"Loading model {model_loader.get_model_info().name}...")
+    model_info = model_loader.get_model_info()
+    print(f"Loading model {model_info.name}...")
     model = model_loader.load_model()

Copilot AI Dec 25, 2025


There is duplicated code between the if and else branches. The only difference is whether variant=model_variant is passed to get_model_info(). This duplication can be eliminated by conditionally building the arguments or restructuring the code to reduce redundancy and improve maintainability.
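One way to collapse the branches, sketched under the assumption that calling get_model_info() with no variant argument behaves exactly like the no-variant path above:

model_info = (
    model_loader.get_model_info(variant=model_variant)
    if model_variant
    else model_loader.get_model_info()
)
print(f"Loading model {model_info.name}...")
model = model_loader.load_model()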

-full_model_name = f"{model_info}"
-model_type = "Vision, Random Input Data"
+full_model_name = model_info.name
+model_type = str(model_info.task)

Copilot AI Dec 25, 2025


Converting the task enum to string using str() may not produce the desired output format. Depending on how the Task enum is defined, str() might return something like "Task.TEXT_GENERATION" instead of just "text-generation". Consider using .value or .name attribute of the enum if available, or ensure the enum's __str__ method is properly defined to return the expected format. This affects consistency with how task values are stored in benchmark results.

Suggested change

-model_type = str(model_info.task)
+model_type = getattr(model_info.task, "value", str(model_info.task))

model_type = "text-generation"
model_info = model_loader.get_model_info(variant=model_variant)
full_model_name = model_info.name
model_type = str(model_info.task)

Copilot AI Dec 25, 2025


Converting the task enum to string using str() may not produce the desired output format. Depending on how the Task enum is defined, str() might return something like "Task.TEXT_GENERATION" instead of just "text-generation". Consider using .value or .name attribute of the enum if available, or ensure the enum's __str__ method is properly defined to return the expected format. This affects consistency with how task values are stored in benchmark results.

Suggested change

-model_type = str(model_info.task)
+model_type = getattr(model_info.task, "value", str(model_info.task))



-def setup_model(model_loader, model_variant=None, data_format="bfloat16") -> tuple[torch.nn.Module, str]:
+def setup_model(model_loader, model_variant=None, data_format="bfloat16"):

Copilot AI Dec 25, 2025


The function signature no longer includes a type hint for the return value. The docstring states the function returns "Tuple of (model, model_info)" but the signature should include the return type annotation for better type checking and code documentation. Consider adding -> tuple[torch.nn.Module, ModelInfo] or the appropriate type for model_info.
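A sketch of the annotated signature; the ModelInfo import path is an assumption and would need to match wherever the class actually lives in the forge package:

import torch
from forge import ModelInfo  # illustrative import path, adjust as needed

def setup_model(model_loader, model_variant=None, data_format="bfloat16") -> tuple[torch.nn.Module, ModelInfo]:
    ...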


Development

Successfully merging this pull request may close these issues.

Add ModelTask from forge models as superset field
