Model Protocol Strategy

Purpose

This document records the protocol strategy for model integration in the design-to-code project.

Related implementation record:

It exists to answer four practical questions:

  • how providers should be configured
  • how the service layer should isolate protocol differences
  • why the project should not hard-code everything around OpenAI-compatible assumptions
  • how to keep the open-source version simple while still leaving room for future commercial hosting or multi-provider support

This document is intended to guide future refactoring of app/services/model.ts.

Current State

The current generation pipeline already has a useful task-oriented shape.

Current strengths:

  • the business tasks are already explicit: ui-schema, design-image, image-parse, html-code
  • provider-level model lists already exist
  • the main service API is task-oriented rather than vendor-oriented
  • OpenAI-compatible providers can already be wired with provider-specific base URLs and paths

Current limitations:

  • protocol concerns and provider concerns are mixed together in ModelProviderConfig
  • openAICompatible is too coarse to express real protocol differences
  • corsEndpoint currently acts as a generic fallback rather than a clean adapter type
  • the service assumes that many providers can be treated as OpenAI-like even when compatibility is only partial
  • stream support is effectively tied to OpenAI-compatible SSE behavior

The result is acceptable for a first phase, but it will become harder to maintain once the project needs to support more than one family of protocol behavior.

Decision Summary

The project should not continue on a long-term path where every provider is modeled as OpenAI-compatible.

The project should also not try to implement many protocols immediately.

Recommended strategy:

  1. keep OpenAI-compatible support as the primary implementation path for now
  2. refactor the service so OpenAI-compatible support becomes one adapter rather than the architecture itself
  3. separate business tasks from protocol details
  4. make room for future adapters such as Anthropic-style, Gemini-style, or fully custom REST integrations

In short:

  • short term: one stable adapter
  • medium term: adapter-based architecture
  • long term: multi-protocol support without leaking protocol-specific fields into the business layer

Why Not Use OpenAI-Compatible Everywhere

The phrase OpenAI-compatible is useful, but it is not a sufficient architecture boundary.

Different vendors that claim OpenAI compatibility often diverge in practice on one or more of these points:

  • unsupported fields such as response_format
  • differences in multi-modal message content format
  • differences in streaming event structure
  • differences in image generation endpoints and request bodies
  • differences in token limit parameters, sampling parameters, and naming conventions
  • differences in authentication, rate limiting, and error response shape

This means OpenAI-compatible should be treated as a protocol family, not as a guarantee of full interchangeability.
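One way to treat OpenAI-compatible as a family with per-provider variation is shown below. The adapter consults a hypothetical `OpenAICompatQuirks` record while building a chat completions body. The field names `response_format`, `max_tokens`, and `max_completion_tokens` come from OpenAI's Chat Completions API. The quirk flags and the `buildChatBody` helper are illustrative assumptions, not existing project code.

```typescript
// Hypothetical per-provider quirks within the OpenAI-compatible family.
interface OpenAICompatQuirks {
  // Whether the provider accepts OpenAI's response_format field.
  supportsResponseFormat: boolean
  // Which field name the provider expects for the output token limit.
  maxTokensField: 'max_tokens' | 'max_completion_tokens'
}

// Inputs the adapter receives from the task layer (illustrative).
interface ChatRequestInput {
  model: string
  prompt: string
  maxTokens: number
  wantJson: boolean
}

// Builds a chat completions request body, adjusting for provider quirks
// instead of assuming full OpenAI interchangeability.
function buildChatBody(
  input: ChatRequestInput,
  quirks: OpenAICompatQuirks,
): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: input.model,
    messages: [{ role: 'user', content: input.prompt }],
  }
  body[quirks.maxTokensField] = input.maxTokens
  // Only send response_format where the provider is known to accept it;
  // otherwise the prompt itself must ask for JSON output.
  if (input.wantJson && quirks.supportsResponseFormat) {
    body['response_format'] = { type: 'json_object' }
  }
  return body
}

const input: ChatRequestInput = {
  model: 'example-model',
  prompt: 'Return a UI schema as JSON.',
  maxTokens: 2048,
  wantJson: true,
}
const full = buildChatBody(input, { supportsResponseFormat: true, maxTokensField: 'max_tokens' })
const partial = buildChatBody(input, { supportsResponseFormat: false, maxTokensField: 'max_completion_tokens' })
```

The same task input produces two different bodies, and the task layer never has to know which one was sent.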

Why Not Add Many Protocols Immediately

The project is still in an open-source, product-shaping stage.

The current priority is:

  • preserve a working pipeline
  • keep the integration understandable for contributors
  • avoid inflating service complexity too early

Implementing multiple protocols before the architecture is clarified would create unnecessary maintenance cost.

The correct move is to establish a clean adapter boundary first, then add protocols only when a real provider requires them.

Architectural Principle

The architecture should be organized around three layers.

1. Task Layer

This is the business-facing layer.

It should answer questions like:

  • generate UI Schema
  • generate design preview
  • parse image into schema
  • generate HTML plus Tailwind

This layer should not know whether the underlying provider uses OpenAI chat, Anthropic messages, Gemini content generation, or a custom REST gateway.
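Here is a minimal sketch of what "not knowing the protocol" can look like in TypeScript. It assumes hypothetical `TaskRequest`, `TaskResult`, and `TaskExecutor` shapes. The task-facing function depends only on an executor and a task name, and nothing in its signature mentions URLs, headers, or message formats.

```typescript
type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

// Hypothetical task-level request shapes: business inputs only, no protocol fields.
type TaskRequest =
  | { task: 'ui-schema'; prompt: string }
  | { task: 'design-image'; schemaJson: string }
  | { task: 'image-parse'; imageDataUrl: string }
  | { task: 'html-code'; schemaJson: string }

// Hypothetical normalized result returned to the task layer.
interface TaskResult {
  task: TaskType
  text?: string
  imageUrl?: string
}

// The only dependency the task layer sees; provider and protocol
// resolution happen behind this function.
type TaskExecutor = (request: TaskRequest) => Promise<TaskResult>

// Task-facing entry point: asks for a UI schema and parses the JSON text.
async function generateUiSchema(execute: TaskExecutor, prompt: string): Promise<unknown> {
  const result = await execute({ task: 'ui-schema', prompt })
  if (result.text === undefined) {
    throw new Error('ui-schema task returned no text')
  }
  return JSON.parse(result.text)
}
```

A test, a local-key setup, or a hosted deployment can each pass a different executor without touching `generateUiSchema`.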

2. Protocol Adapter Layer

This layer is responsible for translating task requests into provider-specific HTTP or SDK calls.

Examples of adapter families:

  • OpenAI-compatible chat adapter
  • OpenAI-compatible image adapter
  • custom REST adapter
  • future Anthropic adapter
  • future Gemini adapter

Each adapter should own:

  • request body mapping
  • response parsing
  • stream handling
  • protocol-specific error interpretation
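Two of these responsibilities, response parsing and error interpretation, can be sketched for the OpenAI-compatible chat adapter. The `choices[0].message.content` path matches OpenAI's Chat Completions response. The `AdapterError` class, its `kind` values, and the status-to-kind mapping are assumptions chosen for illustration. Normalizing errors this way lets fallback logic decide whether to retry without inspecting vendor-specific payloads.

```typescript
// Hypothetical normalized error categories shared by all adapters.
type AdapterErrorKind = 'auth' | 'rate-limited' | 'bad-request' | 'upstream' | 'malformed-response'

class AdapterError extends Error {
  readonly kind: AdapterErrorKind
  readonly retryable: boolean
  constructor(kind: AdapterErrorKind, message: string, retryable: boolean) {
    super(message)
    this.name = 'AdapterError'
    this.kind = kind
    this.retryable = retryable
  }
}

// Maps an HTTP failure status (4xx/5xx) to a protocol-neutral error (assumed mapping).
function interpretHttpError(httpStatus: number, bodyText: string): AdapterError {
  if (httpStatus === 401 || httpStatus === 403) return new AdapterError('auth', bodyText, false)
  if (httpStatus === 429) return new AdapterError('rate-limited', bodyText, true)
  if (httpStatus >= 500) return new AdapterError('upstream', bodyText, true)
  return new AdapterError('bad-request', bodyText, false)
}

// Extracts the text from an OpenAI-style chat completion response body.
function parseChatResponse(json: unknown): string {
  if (typeof json !== 'object' || json === null) {
    throw new AdapterError('malformed-response', 'response is not an object', false)
  }
  const choices = (json as { choices?: Array<{ message?: { content?: unknown } }> }).choices
  const content = choices?.[0]?.message?.content
  if (typeof content !== 'string') {
    throw new AdapterError('malformed-response', 'missing choices[0].message.content', false)
  }
  return content
}
```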

3. Provider Configuration Layer

This layer describes each concrete provider instance.

It should answer questions like:

  • which protocol family does this provider use
  • which tasks does it support
  • which endpoints does it expose
  • which auth method does it require
  • which models should be used for each task

Recommended Direction For Config Shape

The current ModelProviderConfig should evolve from a mixed structure into a clearer composition.

Recommended conceptual shape:

type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

type ProtocolKind =
  | 'openai-compatible'
  | 'custom-rest'
  | 'anthropic'
  | 'gemini'

type AuthScheme = 'bearer' | 'api-key' | 'custom'

interface ProviderProtocolConfig {
  kind: ProtocolKind
  baseUrl?: string
  authScheme?: AuthScheme
  endpoints?: {
    chat?: string
    image?: string
    vision?: string
    customTask?: string
  }
}

interface ProviderCapabilities {
  streamText?: boolean
  imageInput?: boolean
  imageGeneration?: boolean
  structuredJson?: boolean
}

interface ProviderTaskModels {
  uiSchema: string[]
  designImage: string[]
  imageParse: string[]
  htmlCode: string[]
}

interface ProviderConfig {
  providerId: string
  apiKey: string
  protocol: ProviderProtocolConfig
  capabilities: ProviderCapabilities
  taskModels: ProviderTaskModels
}

This does not require an immediate line-for-line refactor, but it is the target shape to keep in mind.
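To make the target shape concrete, here is an illustrative provider entry plus a small helper that checks whether a provider can serve a task. The following are all assumptions for illustration:

- the `TASK_MODEL_KEY` table;
- the `supportsTask` rules (image parsing needs `imageInput`, design images need `imageGeneration`);
- every value in `exampleProvider`.

```typescript
type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'
type ProtocolKind = 'openai-compatible' | 'custom-rest' | 'anthropic' | 'gemini'
type AuthScheme = 'bearer' | 'api-key' | 'custom'

interface ProviderConfig {
  providerId: string
  apiKey: string
  protocol: {
    kind: ProtocolKind
    baseUrl?: string
    authScheme?: AuthScheme
    endpoints?: { chat?: string; image?: string; vision?: string; customTask?: string }
  }
  capabilities: { streamText?: boolean; imageInput?: boolean; imageGeneration?: boolean; structuredJson?: boolean }
  taskModels: { uiSchema: string[]; designImage: string[]; imageParse: string[]; htmlCode: string[] }
}

// Maps project task names to the camelCase keys used in taskModels.
const TASK_MODEL_KEY: Record<TaskType, keyof ProviderConfig['taskModels']> = {
  'ui-schema': 'uiSchema',
  'design-image': 'designImage',
  'image-parse': 'imageParse',
  'html-code': 'htmlCode',
}

// A provider can serve a task if it lists models for it and (by assumption)
// declares the capability that task needs.
function supportsTask(config: ProviderConfig, task: TaskType): boolean {
  const models = config.taskModels[TASK_MODEL_KEY[task]]
  if (models.length === 0) return false
  if (task === 'image-parse') return config.capabilities.imageInput === true
  if (task === 'design-image') return config.capabilities.imageGeneration === true
  return true
}

// Placeholder values only: provider id, URL, key, and model names are not real.
const exampleProvider: ProviderConfig = {
  providerId: 'example-provider',
  apiKey: 'YOUR_API_KEY',
  protocol: {
    kind: 'openai-compatible',
    baseUrl: 'https://api.example.com/v1',
    authScheme: 'bearer',
    endpoints: { chat: '/chat/completions' },
  },
  capabilities: { streamText: true, imageInput: true, imageGeneration: false, structuredJson: true },
  taskModels: { uiSchema: ['text-model'], designImage: [], imageParse: ['vision-model'], htmlCode: ['text-model'] },
}
```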

Recommended Adapter Boundary

The service layer should eventually dispatch by protocol adapter rather than by boolean flags.

Conceptual direction:

interface ProtocolAdapter {
  kind: ProtocolKind
  invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
  invokeStream?(input: AdapterStreamInput): Promise<ModelRawResponse>
}

The important point is not the exact TypeScript signature.

The important point is:

  • the service chooses a task
  • the task resolves to a provider and model
  • the provider points to an adapter family
  • the adapter handles protocol-specific details

This keeps protocol logic out of the business flow.
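The dispatch step can be sketched as a registry keyed by `ProtocolKind`. `AdapterInvokeInput` and `ModelRawResponse` get minimal placeholder fields here, since this document deliberately leaves their shapes open. `AdapterRegistry` is a hypothetical name.

```typescript
type ProtocolKind = 'openai-compatible' | 'custom-rest' | 'anthropic' | 'gemini'

// Placeholder shapes; the real input and response types are not defined yet.
interface AdapterInvokeInput {
  model: string
  prompt: string
}

interface ModelRawResponse {
  text: string
}

interface ProtocolAdapter {
  kind: ProtocolKind
  invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
}

// Looks up adapters by protocol kind, replacing boolean-flag branching.
class AdapterRegistry {
  private readonly adapters = new Map<ProtocolKind, ProtocolAdapter>()

  register(adapter: ProtocolAdapter): void {
    this.adapters.set(adapter.kind, adapter)
  }

  resolve(kind: ProtocolKind): ProtocolAdapter {
    const adapter = this.adapters.get(kind)
    if (adapter === undefined) {
      throw new Error(`No adapter registered for protocol "${kind}"`)
    }
    return adapter
  }
}
```

Adding a second protocol family then means registering one more adapter; the code that calls `resolve` does not change.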

Recommended Task Flow

The task flow should remain task-oriented.

Example sequence for UI Schema generation:

  1. task layer receives ui-schema
  2. service resolves candidate providers and models
  3. service selects the adapter from provider protocol config
  4. adapter builds the request for that protocol family
  5. response is normalized back into project-level output

Because retry and provider fallback operate on task-level candidates (a provider plus a model) rather than on protocol details, they remain valid even as protocols diversify.
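The fallback loop can be sketched so that it only sees candidates and an invoke function. It therefore behaves the same whichever adapter family a candidate uses. `Candidate`, `ProviderAttempt`, `InvokeCandidate`, and `runWithFallback` are hypothetical names. `invokeCandidate` stands in for steps 3 to 5 of the sequence: select the adapter, build the request, normalize the response.

```typescript
type ProtocolKind = 'openai-compatible' | 'custom-rest' | 'anthropic' | 'gemini'

// Hypothetical candidate: one provider/model pair resolved for a task.
interface Candidate {
  providerId: string
  protocol: ProtocolKind
  model: string
}

// Hypothetical attempt record, independent of protocol details.
interface ProviderAttempt {
  providerId: string
  model: string
  ok: boolean
  error?: string
}

// Stands in for "select the adapter for candidate.protocol and invoke it".
type InvokeCandidate = (candidate: Candidate, prompt: string) => Promise<string>

// Tries candidates in order; returns the first success plus every attempt made.
async function runWithFallback(
  candidates: Candidate[],
  prompt: string,
  invokeCandidate: InvokeCandidate,
): Promise<{ text: string; attempts: ProviderAttempt[] }> {
  const attempts: ProviderAttempt[] = []
  for (const candidate of candidates) {
    try {
      const text = await invokeCandidate(candidate, prompt)
      attempts.push({ providerId: candidate.providerId, model: candidate.model, ok: true })
      return { text, attempts }
    } catch (err) {
      attempts.push({
        providerId: candidate.providerId,
        model: candidate.model,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      })
    }
  }
  throw new Error(`All ${attempts.length} candidates failed`)
}
```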

Streaming Strategy

Streaming is currently the area most tightly coupled to OpenAI-compatible behavior.

This is acceptable for now, but the code should explicitly treat streaming support as capability-dependent.

Required rule:

  • streaming must not be assumed for every provider

Recommended direction:

  • OpenAI-compatible adapter may implement SSE-based streaming first
  • future adapters can either implement their own streaming behavior or declare no stream support
  • the task layer should check the provider's streamText capability before attempting stream mode

This will prevent accidental coupling between task behavior and a single protocol family's stream format.
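A sketch of the capability check follows. It assumes a hypothetical `TextAdapter` whose `invokeStream` is optional and takes an `onChunk` callback; the document's conceptual `invokeStream` signature is left open. When streaming is unavailable, `runText` falls back to a single request and delivers the whole result as one chunk, so callers need no separate code path. That fallback is a design choice for illustration, not existing behavior.

```typescript
// Subset of provider capabilities relevant to streaming.
interface ProviderCapabilities {
  streamText?: boolean
}

// Hypothetical adapter surface: streaming is optional.
interface TextAdapter {
  invoke(prompt: string): Promise<string>
  invokeStream?(prompt: string, onChunk: (chunk: string) => void): Promise<string>
}

type InvocationMode = 'stream' | 'single'

// Streams only when the caller wants it, the provider declares streamText,
// and the adapter actually implements invokeStream.
function selectInvocationMode(
  wantStream: boolean,
  capabilities: ProviderCapabilities,
  adapter: TextAdapter,
): InvocationMode {
  if (wantStream && capabilities.streamText === true && adapter.invokeStream !== undefined) {
    return 'stream'
  }
  return 'single'
}

async function runText(
  wantStream: boolean,
  capabilities: ProviderCapabilities,
  adapter: TextAdapter,
  prompt: string,
  onChunk: (chunk: string) => void,
): Promise<string> {
  if (selectInvocationMode(wantStream, capabilities, adapter) === 'stream' && adapter.invokeStream) {
    return adapter.invokeStream(prompt, onChunk)
  }
  // Non-streaming path: deliver the full result as a single chunk.
  const text = await adapter.invoke(prompt)
  onChunk(text)
  return text
}
```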

Role Of Custom REST

The current corsEndpoint and corsImageEndpoint concepts should be reinterpreted as a protocol adapter family rather than a fallback hack.

Recommended interpretation:

  • custom-rest is a first-class adapter type
  • it is valid for hosted proxies, vendor wrappers, internal bridges, or future commercial routing services

This is important because the project may later want a hosted mode without forcing all providers through an OpenAI-shaped abstraction.
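Custom REST as a first-class adapter can be sketched as a small factory. The sketch assumes a simple bridge contract (POST `{ task, model, input }` as JSON, receive `{ output }`) and bearer authentication. Both the contract and the names `createCustomRestAdapter` and `Transport` are hypothetical; a real bridge would define its own. The transport is injected rather than calling `fetch` directly, so the adapter can be exercised without a network.

```typescript
// Hypothetical transport abstraction so the adapter is testable offline.
interface HttpRequest {
  url: string
  headers: Record<string, string>
  body: string
}

interface HttpResponse {
  status: number
  bodyText: string
}

type Transport = (request: HttpRequest) => Promise<HttpResponse>

// Hypothetical bridge settings.
interface CustomRestConfig {
  endpoint: string
  apiKey: string
}

function createCustomRestAdapter(config: CustomRestConfig, transport: Transport) {
  return {
    kind: 'custom-rest' as const,
    // Sends the task-level request as-is; the bridge decides how to reach a model.
    async invoke(task: string, model: string, input: string): Promise<string> {
      const response = await transport({
        url: config.endpoint,
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${config.apiKey}` },
        body: JSON.stringify({ task, model, input }),
      })
      if (response.status < 200 || response.status >= 300) {
        throw new Error(`custom-rest ${response.status}: ${response.bodyText}`)
      }
      const parsed = JSON.parse(response.bodyText) as { output?: unknown }
      if (typeof parsed.output !== 'string') {
        throw new Error('custom-rest response missing string "output"')
      }
      return parsed.output
    },
  }
}
```

A hosted proxy, a vendor wrapper, or an internal bridge only has to honor this one contract. It does not need to imitate an OpenAI-shaped API.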

Open-Source Route Guidance

For the open-source version, the integration strategy should remain simple.

Recommended rule set:

  • keep OpenAI-compatible integration as the default path
  • document that compatibility is partial and provider-specific
  • expose provider config in a way contributors can understand
  • do not attempt to support every vendor protocol immediately

This keeps onboarding practical while avoiding architectural dead ends.

Commercialization Readiness

This project may later support a hosted or managed model mode.

The protocol strategy should leave room for that without complicating the current open-source flow.

Commercialization-friendly implications:

  • hosted routing should be represented as another adapter family, not as a special case embedded into task logic
  • rate limiting, queueing, caching, and usage tracking belong outside the task layer
  • provider-specific secrets and commercial orchestration should not shape the public task API

This keeps the open-source local-key mode and future hosted mode aligned around the same internal task model.
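One way to keep usage tracking (or rate limiting and caching) outside the task layer is to wrap adapters rather than edit task code. The sketch assumes a hypothetical `UsageEvent` record and a `withUsageTracking` wrapper. The wrapped adapter satisfies the same interface, so the task layer cannot tell the difference.

```typescript
// Minimal placeholder adapter shapes.
interface AdapterInvokeInput {
  model: string
  prompt: string
}

interface ModelRawResponse {
  text: string
}

interface ProtocolAdapter {
  kind: string
  invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
}

// Hypothetical usage record for a hosted mode.
interface UsageEvent {
  adapterKind: string
  model: string
  ok: boolean
  durationMs: number
}

// Wraps any adapter with usage tracking; the inner adapter and task logic stay unchanged.
function withUsageTracking(inner: ProtocolAdapter, record: (event: UsageEvent) => void): ProtocolAdapter {
  return {
    kind: inner.kind,
    async invoke(input: AdapterInvokeInput): Promise<ModelRawResponse> {
      const started = Date.now()
      try {
        const response = await inner.invoke(input)
        record({ adapterKind: inner.kind, model: input.model, ok: true, durationMs: Date.now() - started })
        return response
      } catch (err) {
        // Record the failure, then let fallback logic see the original error.
        record({ adapterKind: inner.kind, model: input.model, ok: false, durationMs: Date.now() - started })
        throw err
      }
    },
  }
}
```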

Practical Refactoring Guidance

The service should be refactored in small steps.

Recommended order:

Phase 1. Clarify Config Semantics

Goals:

  • reduce ambiguity inside ModelProviderConfig
  • make protocol family explicit
  • make capability support explicit where needed

Possible outcome:

  • keep the existing type name temporarily
  • add a protocol field instead of relying only on openAICompatible

Phase 2. Extract OpenAI-Compatible Adapter

Goals:

  • move invokeOpenAICompatible into an OpenAI-compatible adapter module
  • move invokeOpenAICompatibleStream into the same adapter
  • keep current behavior unchanged

Acceptance criteria:

  • no business logic changes
  • service behavior remains equivalent
  • build and existing generation flow still work

Phase 3. Rename Custom CORS Pathing As Adapter Behavior

Goals:

  • replace the idea of generic fallback with explicit custom-rest behavior
  • make intent clearer for future contributors

Acceptance criteria:

  • custom REST is visible as a protocol choice rather than an escape hatch

Phase 4. Prepare For Second Protocol Family

Goals:

  • define the contract needed for a second adapter
  • avoid implementing it until a real provider requires it

Acceptance criteria:

  • the architecture can accept another adapter without rewriting task logic

What Should Stay Stable

These project-level concepts should remain stable even if protocol support evolves:

  • task names
  • output result types
  • retry and fallback flow
  • provider attempt records
  • business prompts
  • UI status stages in Studio

The UI should not need to care whether a response came from OpenAI-compatible chat, a custom REST bridge, or a future non-OpenAI adapter.

What Should Change Later

These elements should evolve when the service layer is refactored:

  • the openAICompatible boolean should no longer be the main protocol discriminator
  • provider config should separate protocol, capability, and model mapping
  • stream support should become capability-aware
  • custom hosted integrations should become explicit adapter types

Recommendation Summary

The project should adopt this position:

  • do not lock the architecture to OpenAI-compatible assumptions
  • do not rush into many protocol implementations
  • keep OpenAI-compatible as the primary adapter for now
  • build around a task layer plus protocol adapter layer plus provider config layer

This is the lowest-risk path for an open-source product that still wants to remain extensible.

Next Implementation Step

When the service layer is next refactored, the first concrete step should be:

  1. extract the current OpenAI-compatible request and stream logic behind an adapter boundary
  2. make provider protocol type explicit in configuration
  3. preserve current task results and UI behavior

Only after that should the project consider adding another protocol family.