Skip to content

[MCP Integration] Phase 2: Data models and constants#2235

Merged
aniruddh-alt merged 12 commits intoani/mcp-integration-01-scaffoldfrom
ani/mcp-integration-02-models
Mar 24, 2026
Merged

[MCP Integration] Phase 2: Data models and constants#2235
aniruddh-alt merged 12 commits intoani/mcp-integration-01-scaffoldfrom
ani/mcp-integration-02-models

Conversation

@aniruddh-alt
Copy link
Copy Markdown
Contributor

Description

Part of the MCP Integration PR chain (Phase 2 of 10) - Stage: foundation

What changed: Added Pydantic data models (JobRecord, PreFlightSummary, CloudValidation, etc.) and constants (cloud provider configs, resource specs, validation rules) used by all MCP services.

Why: These shared types and constants form the foundation that all subsequent service modules depend on.

Related issues

Before submitting

  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Mar 3, 2026

Important

Upgrade your plan to unlock code review, CI analysis, custom rules, and more.

@aniruddh-alt aniruddh-alt force-pushed the ani/mcp-integration-02-models branch from f38a2e4 to 8e7a6b1 Compare March 4, 2026 00:03
@aniruddh-alt aniruddh-alt force-pushed the ani/mcp-integration-01-scaffold branch from 70e68d6 to b35dc2d Compare March 4, 2026 00:31
@aniruddh-alt aniruddh-alt force-pushed the ani/mcp-integration-02-models branch from cbe787c to 4d3461e Compare March 4, 2026 00:32
@aniruddh-alt aniruddh-alt force-pushed the ani/mcp-integration-01-scaffold branch from b35dc2d to 79de029 Compare March 4, 2026 00:41
Aniruddhan Ramesh and others added 3 commits March 3, 2026 16:42
Add Pydantic models (JobRecord, PreFlightSummary, CloudValidation, etc.)
and constants (cloud provider configs, resource specs, validation rules)
that form the shared foundation for all MCP services.

Part of the MCP integration PR chain (Phase 2 of 10).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing "eval" to DATA_SPLITS constant
- Convert mutable list constants to tuples
- Import TypedDict from typing (Python 3.12+)
@aniruddh-alt aniruddh-alt force-pushed the ani/mcp-integration-02-models branch from 4d3461e to 89c8856 Compare March 4, 2026 00:43
aniruddh-alt and others added 6 commits March 3, 2026 16:51
@aniruddh-alt aniruddh-alt marked this pull request as ready for review March 8, 2026 16:51
"pretraining",
"synthesis",
"quantization",
"",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this empty string here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove them, they were meant to be a fallback but it is not needed.


# The oumi release version that the bundled configs/ directory corresponds to.
# Update this constant whenever the bundled configs are refreshed from a tag.
BUNDLED_OUMI_VERSION = "0.7"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take a dependency on where this is defined in the library or have an automatic sync check to run whenever one of these changes to ensure the other is also changed?

torch_dtype: PyTorch data type (e.g., bfloat16, float32).
"""

learning_rate: NotRequired[float]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our standard is typically to use Optional, is there a difference with this type?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotRequired means that the field doesn't need to be present as opposed to Optional which needs the field to be present but it can be None. It adds convenience but can be updated to Optional to maintain uniformity.

e.g. {"torch": "2.3.0"}.
"""

accelerator_type: str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StrEnum instead?

"""Constants and type definitions for Oumi MCP server."""

from pathlib import Path
from typing import Literal

This comment was marked as outdated.

Co-authored-by: Aniruddhan Ramesh <aniruddhanramesh@Aniruddhans-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@aniruddh-alt aniruddh-alt merged commit c0b57cd into ani/mcp-integration-01-scaffold Mar 24, 2026
1 check passed
),
timeout=30.0,
)
except TimeoutError:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code catches the built-in TimeoutError, but asyncio.wait_for raises asyncio.TimeoutError on Python < 3.11, which is a different, uncaught exception type.
Severity: HIGH

Suggested Fix

In src/oumi/mcp/job_service.py, change the exception handling from except TimeoutError: to except asyncio.TimeoutError: to correctly catch the timeout exception raised by asyncio.wait_for across all supported Python versions.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/oumi/mcp/job_service.py#L519

Potential issue: The `cancel_job_impl` function uses `try...except TimeoutError` to
handle timeouts from `asyncio.wait_for`. In Python versions before 3.11,
`asyncio.wait_for` raises `asyncio.TimeoutError`, which is a distinct exception and does
not inherit from the built-in `TimeoutError`. Consequently, if the 30-second timeout is
reached during a job cancellation on an environment running Python < 3.11, the exception
will not be caught. This will propagate as an unhandled exception, causing the async
handler to crash and potentially leading to a server failure. The codebase shows
awareness of this distinction elsewhere, using `asyncio.TimeoutError` correctly in other
modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants