This document records the protocol strategy for model integration in the design-to-code project.
Related implementation record:
- docs/large-output-generation-plan.md: strategy for handling large model outputs with chunked generation, continuation recovery, and incremental assembly.
This document exists to answer four practical questions:
- how providers should be configured
- how the service layer should isolate protocol differences
- why the project should not hard-code everything around OpenAI-compatible assumptions
- how to keep the open-source version simple while still leaving room for future commercial hosting or multi-provider support
This document is intended to guide future refactoring of app/services/model.ts.
The current generation pipeline already has a useful task-oriented shape.
Current strengths:
- the business tasks are already explicit: ui-schema, design-image, image-parse, html-code
- provider-level model lists already exist
- the main service API is task-oriented rather than vendor-oriented
- OpenAI-compatible providers can already be wired with provider-specific base URLs and paths
Current limitations:
- protocol concerns and provider concerns are mixed together in ModelProviderConfig
- openAICompatible is too coarse to express real protocol differences
- corsEndpoint currently acts as a generic fallback rather than a clean adapter type
- the service assumes that many providers can be treated as OpenAI-like even when compatibility is only partial
- stream support is effectively tied to OpenAI-compatible SSE behavior
The result is acceptable for a first phase, but it will become harder to maintain once the project needs to support more than one family of protocol behavior.
The project should not continue on a long-term path where every provider is modeled as OpenAI-compatible.
The project should also not try to implement many protocols immediately.
Recommended strategy:
- keep OpenAI-compatible support as the primary implementation path for now
- refactor the service so OpenAI-compatible support becomes one adapter rather than the architecture itself
- separate business tasks from protocol details
- make room for future adapters such as Anthropic-style, Gemini-style, or fully custom REST integrations
In short:
- short term: one stable adapter
- medium term: adapter-based architecture
- long term: multi-protocol support without leaking protocol-specific fields into the business layer
The phrase OpenAI-compatible is useful, but it is not a sufficient architecture boundary.
Different vendors that claim OpenAI compatibility often diverge in practice on one or more of these points:
- unsupported fields such as response_format
- differences in multi-modal message content format
- differences in streaming event structure
- differences in image generation endpoints and request bodies
- differences in token limit parameters, sampling parameters, and naming conventions
- differences in authentication, rate limiting, and error response shape
This means OpenAI-compatible should be treated as a protocol family, not as a guarantee of full interchangeability.
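One way to make this divergence explicit is a small per-provider quirk table that the adapter consults before sending a request. The sketch below is illustrative only: ChatRequestBody, ProviderQuirks, and applyQuirks are hypothetical names rather than existing project code, and the field list is deliberately incomplete.

```typescript
// Hypothetical request body; field names follow the common OpenAI chat format.
interface ChatRequestBody {
  model: string
  messages: { role: string; content: string }[]
  response_format?: { type: string }
  max_tokens?: number
  temperature?: number
}

// Optional fields that a given "compatible" provider might not accept.
type OptionalChatField = 'response_format' | 'max_tokens' | 'temperature'

// Hypothetical quirk record kept next to each provider's config.
interface ProviderQuirks {
  unsupportedFields: OptionalChatField[]
}

// Return a copy of the body without the fields this provider does not accept.
function applyQuirks(body: ChatRequestBody, quirks: ProviderQuirks): ChatRequestBody {
  const copy: ChatRequestBody = { ...body }
  for (const field of quirks.unsupportedFields) {
    delete copy[field]
  }
  return copy
}

// Example: a provider that rejects response_format but accepts everything else.
const strictProvider: ProviderQuirks = { unsupportedFields: ['response_format'] }
const sent = applyQuirks(
  { model: 'example-model', messages: [], response_format: { type: 'json_object' }, temperature: 0.2 },
  strictProvider,
)
```

Keeping the quirks in data rather than in branching code means a newly discovered incompatibility becomes a config change inside the adapter, not a change to task logic.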
The project is still in an open-source, product-shaping stage.
The current priorities are:
- preserve a working pipeline
- keep the integration understandable for contributors
- avoid exploding service complexity too early
Implementing multiple protocols before the architecture is clarified would create unnecessary maintenance cost.
The correct move is to establish a clean adapter boundary first, then add protocols only when a real provider requires them.
The architecture should be organized around three layers.
The task layer is the business-facing layer.
It should expose operations like:
- generate UI Schema
- generate design preview
- parse image into schema
- generate HTML plus Tailwind
This layer should not know whether the underlying provider uses OpenAI chat, Anthropic messages, Gemini content generation, or a custom REST gateway.
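A minimal sketch of what such a task-facing API could look like is shown here. All names (TaskRequest, TaskResult, TaskRunner, createTaskApi) are assumptions made for illustration; the existing service API in app/services/model.ts may be shaped differently.

```typescript
type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

// Hypothetical project-level request and result shapes.
interface TaskRequest {
  task: TaskType
  prompt: string
  imageUrl?: string
}

interface TaskResult {
  task: TaskType
  output: string
}

// Everything below the task layer (provider resolution, adapters) hides behind this function type.
type TaskRunner = (request: TaskRequest) => Promise<TaskResult>

// The task API speaks only in business terms; no endpoint, header, or protocol field appears here.
function createTaskApi(run: TaskRunner) {
  return {
    generateUiSchema: (prompt: string) => run({ task: 'ui-schema', prompt }),
    generateDesignPreview: (prompt: string) => run({ task: 'design-image', prompt }),
    parseImageToSchema: (imageUrl: string, prompt: string) =>
      run({ task: 'image-parse', prompt, imageUrl }),
    generateHtml: (prompt: string) => run({ task: 'html-code', prompt }),
  }
}
```

Because the runner is injected, the same task API can be exercised against a stub in tests and against real adapters in the application.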
The protocol adapter layer is responsible for translating task requests into provider-specific HTTP or SDK calls.
Examples of adapter families:
- OpenAI-compatible chat adapter
- OpenAI-compatible image adapter
- custom REST adapter
- future Anthropic adapter
- future Gemini adapter
Each adapter should own:
- request body mapping
- response parsing
- stream handling
- protocol-specific error interpretation
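The responsibilities listed above can be sketched for the OpenAI-compatible family as three small, transport-free functions. This is an assumption-laden illustration: the function and type names are hypothetical, the request and response shapes follow the common OpenAI chat format, and the retry rule is a simple heuristic rather than project policy.

```typescript
// Hypothetical, transport-free description of an HTTP call.
interface HttpRequestSpec {
  url: string
  headers: Record<string, string>
  body: string
}

interface ChatTextInput {
  model: string
  systemPrompt: string
  userPrompt: string
}

// Request body mapping: project-level input -> OpenAI-style chat payload.
function buildChatRequest(baseUrl: string, path: string, apiKey: string, input: ChatTextInput): HttpRequestSpec {
  return {
    url: baseUrl.replace(/\/+$/, '') + path,
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({
      model: input.model,
      messages: [
        { role: 'system', content: input.systemPrompt },
        { role: 'user', content: input.userPrompt },
      ],
    }),
  }
}

// Response parsing: OpenAI-style chat payload -> plain text.
function parseChatResponse(payload: unknown): string {
  const shaped = payload as { choices?: { message?: { content?: unknown } }[] } | null
  const content = shaped?.choices?.[0]?.message?.content
  if (typeof content !== 'string') {
    throw new Error('Unexpected chat response shape')
  }
  return content
}

// Protocol-specific error interpretation: decide whether another attempt makes sense.
function isRetryableStatus(httpStatus: number): boolean {
  return httpStatus === 429 || httpStatus >= 500
}
```

Keeping these functions free of fetch calls makes them easy to unit test and leaves the actual transport to whatever the service already uses.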
The provider config layer describes each concrete provider instance.
It should answer questions like:
- which protocol family does this provider use
- which tasks does it support
- which endpoints does it expose
- which auth method does it require
- which models should be used for each task
The current ModelProviderConfig should evolve from a mixed structure into a clearer composition.
Recommended conceptual shape:

    type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

    type ProtocolKind =
      | 'openai-compatible'
      | 'custom-rest'
      | 'anthropic'
      | 'gemini'

    type AuthScheme = 'bearer' | 'api-key' | 'custom'

    interface ProviderProtocolConfig {
      kind: ProtocolKind
      baseUrl?: string
      authScheme?: AuthScheme
      endpoints?: {
        chat?: string
        image?: string
        vision?: string
        customTask?: string
      }
    }

    interface ProviderCapabilities {
      streamText?: boolean
      imageInput?: boolean
      imageGeneration?: boolean
      structuredJson?: boolean
    }

    interface ProviderTaskModels {
      uiSchema: string[]
      designImage: string[]
      imageParse: string[]
      htmlCode: string[]
    }

    interface ProviderConfig {
      providerId: string
      apiKey: string
      protocol: ProviderProtocolConfig
      capabilities: ProviderCapabilities
      taskModels: ProviderTaskModels
    }

This is not a requirement to refactor immediately line for line, but it is the target shape to keep in mind.
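One detail of this shape is that task names are kebab-case (ui-schema) while the taskModels keys are camelCase (uiSchema). A small lookup table, sketched below, is one way to keep the two in sync. TASK_MODEL_KEY, modelsForTask, and the model names are hypothetical and used only for illustration.

```typescript
type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

interface ProviderTaskModels {
  uiSchema: string[]
  designImage: string[]
  imageParse: string[]
  htmlCode: string[]
}

// One explicit table maps each task name to its taskModels key.
const TASK_MODEL_KEY: Record<TaskType, keyof ProviderTaskModels> = {
  'ui-schema': 'uiSchema',
  'design-image': 'designImage',
  'image-parse': 'imageParse',
  'html-code': 'htmlCode',
}

// Return the candidate models a provider declares for a task.
// An empty list is treated here as "task not supported by this provider".
function modelsForTask(taskModels: ProviderTaskModels, task: TaskType): string[] {
  return taskModels[TASK_MODEL_KEY[task]]
}

// Placeholder model names, for illustration only.
const exampleModels: ProviderTaskModels = {
  uiSchema: ['example-text-model'],
  designImage: ['example-image-model'],
  imageParse: [],
  htmlCode: ['example-text-model', 'example-code-model'],
}
```

Because the table is typed as Record<TaskType, ...>, adding a new task name without a matching entry becomes a compile-time error.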
The service layer should eventually dispatch by protocol adapter rather than by boolean flags.
Conceptual direction:

    interface ProtocolAdapter {
      kind: ProtocolKind
      invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
      invokeStream?(input: AdapterStreamInput): Promise<ModelRawResponse>
    }

The important point is not the exact TypeScript signature.
The important point is:
- the service chooses a task
- the task resolves to a provider and model
- the provider points to an adapter family
- the adapter handles protocol-specific details
This keeps protocol logic out of the business flow.
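Dispatch by adapter family could be as small as a registry lookup. The sketch below assumes simplified stand-ins for AdapterInvokeInput and ModelRawResponse and uses a stub adapter so it runs without network access; AdapterRegistry and selectAdapter are hypothetical names.

```typescript
type ProtocolKind = 'openai-compatible' | 'custom-rest' | 'anthropic' | 'gemini'

// Simplified stand-ins for the adapter input and output types.
interface AdapterInvokeInput {
  model: string
  prompt: string
}

interface ModelRawResponse {
  text: string
}

interface ProtocolAdapter {
  kind: ProtocolKind
  invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
}

// Registry keyed by protocol family; only implemented adapters are registered.
type AdapterRegistry = Partial<Record<ProtocolKind, ProtocolAdapter>>

// Look up the adapter for a provider's protocol family, failing loudly when none exists.
function selectAdapter(registry: AdapterRegistry, kind: ProtocolKind): ProtocolAdapter {
  const adapter = registry[kind]
  if (!adapter) {
    throw new Error(`No adapter registered for protocol "${kind}"`)
  }
  return adapter
}

// Stub adapter that returns a canned reply instead of calling a provider.
const stubOpenAIAdapter: ProtocolAdapter = {
  kind: 'openai-compatible',
  invoke: async (input) => ({ text: `stub reply from ${input.model}` }),
}

const registry: AdapterRegistry = { 'openai-compatible': stubOpenAIAdapter }
```

With this arrangement, adding a protocol family means registering one more adapter; the lookup and the task flow stay unchanged.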
The task flow should remain task-oriented.
Example sequence for UI Schema generation:
- task layer receives ui-schema
- service resolves candidate providers and models
- service selects the adapter from provider protocol config
- adapter builds the request for that protocol family
- response is normalized back into project-level output
This means retry logic and provider fallback remain valid even as protocols diversify.
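The sequence above can be sketched as a protocol-agnostic fallback loop. This is a minimal illustration under assumed names (Candidate, AttemptRecord, planAttempts, runWithFallback); the project's actual retry and attempt-record logic may differ.

```typescript
interface Candidate {
  providerId: string
  model: string
}

interface AttemptRecord {
  providerId: string
  model: string
  ok: boolean
  error?: string
}

interface ProviderEntry {
  providerId: string
  models: string[]
}

// Flatten providers into an ordered list of (provider, model) attempts.
function planAttempts(providers: ProviderEntry[]): Candidate[] {
  const plan: Candidate[] = []
  for (const provider of providers) {
    for (const model of provider.models) {
      plan.push({ providerId: provider.providerId, model })
    }
  }
  return plan
}

// Try candidates in order; protocol details live entirely inside callModel.
async function runWithFallback<T>(
  plan: Candidate[],
  callModel: (candidate: Candidate) => Promise<T>,
): Promise<{ result: T; attempts: AttemptRecord[] }> {
  const attempts: AttemptRecord[] = []
  for (const candidate of plan) {
    try {
      const result = await callModel(candidate)
      attempts.push({ ...candidate, ok: true })
      return { result, attempts }
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      attempts.push({ ...candidate, ok: false, error: message })
    }
  }
  throw new Error(`All ${attempts.length} attempts failed`)
}
```

Nothing in the loop refers to a protocol, which is why the same fallback flow can survive the addition of new adapter families.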
Streaming is currently the area most tightly coupled to OpenAI-compatible behavior.
This is acceptable for now, but the code should explicitly treat streaming support as capability-dependent.
Required rule:
- streaming must not be assumed for every provider
Recommended direction:
- OpenAI-compatible adapter may implement SSE-based streaming first
- future adapters can either implement their own streaming behavior or declare no stream support
- the task layer should ask whether the provider supports stream text before attempting stream mode
This will prevent accidental coupling between task behavior and a single protocol family's stream format.
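A capability check plus an adapter-local stream parser might look like the sketch below. chooseDeliveryMode and extractTextDeltas are hypothetical names. The parser assumes the OpenAI-style convention of "data:" lines carrying JSON chunks and a final "[DONE]" marker, which other protocol families do not necessarily follow.

```typescript
interface ProviderCapabilities {
  streamText?: boolean
}

type DeliveryMode = 'stream' | 'single'

// Stream only when the caller wants it, the provider declares it, and the adapter implements it.
function chooseDeliveryMode(
  wantsStream: boolean,
  capabilities: ProviderCapabilities,
  adapterHasStream: boolean,
): DeliveryMode {
  return wantsStream && capabilities.streamText === true && adapterHasStream ? 'stream' : 'single'
}

// OpenAI-style SSE only: collect text deltas from complete "data:" lines.
// Simplification: assumes the chunk ends on a line boundary.
function extractTextDeltas(chunk: string): string[] {
  const deltas: string[] = []
  for (const rawLine of chunk.split('\n')) {
    const line = rawLine.trim()
    if (!line.startsWith('data:')) continue
    const data = line.slice('data:'.length).trim()
    if (data === '[DONE]') break
    try {
      const parsed = JSON.parse(data) as { choices?: { delta?: { content?: unknown } }[] }
      const content = parsed.choices?.[0]?.delta?.content
      if (typeof content === 'string') deltas.push(content)
    } catch {
      // Malformed lines are skipped in this sketch; a real adapter would surface them.
    }
  }
  return deltas
}
```

The task layer would call only chooseDeliveryMode; the SSE parsing stays inside the OpenAI-compatible adapter, so a future adapter with a different stream format does not affect task code.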
The current corsEndpoint and corsImageEndpoint concept should be reinterpreted as a protocol adapter family rather than a fallback hack.
Recommended interpretation:
- custom-rest is a first-class adapter type
- it is valid for hosted proxies, vendor wrappers, internal bridges, or future commercial routing services
This is important because the project may later want a hosted mode without forcing all providers through an OpenAI-shaped abstraction.
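To illustrate what a first-class custom-rest adapter could own, the sketch below maps a task onto a bridge-specific request and normalizes the reply. The envelope and response shapes are entirely hypothetical; a real bridge or hosted proxy would define its own contract.

```typescript
type TaskType = 'ui-schema' | 'design-image' | 'image-parse' | 'html-code'

// Hypothetical envelope for a project-controlled REST bridge.
interface CustomRestEnvelope {
  task: TaskType
  model: string
  input: { prompt: string; imageUrl?: string }
}

interface CustomRestRequest {
  url: string
  method: 'POST'
  body: CustomRestEnvelope
}

// Request mapping: the task travels as data, so the bridge decides how to fulfil it.
function buildCustomRestRequest(
  endpoint: string,
  task: TaskType,
  model: string,
  prompt: string,
  imageUrl?: string,
): CustomRestRequest {
  const input = imageUrl === undefined ? { prompt } : { prompt, imageUrl }
  return { url: endpoint, method: 'POST', body: { task, model, input } }
}

// Response parsing: this sketch assumes the bridge replies with { output: string }.
function parseCustomRestResponse(payload: unknown): string {
  const output = (payload as { output?: unknown } | null)?.output
  if (typeof output !== 'string') {
    throw new Error('Unexpected custom REST response shape')
  }
  return output
}
```

Nothing in this mapping imitates the OpenAI chat format, which is the point of treating custom-rest as its own family rather than as a fallback.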
For the open-source version, the integration strategy should remain simple.
Recommended rule set:
- keep OpenAI-compatible integration as the default path
- document that compatibility is partial and provider-specific
- expose provider config in a way contributors can understand
- do not attempt to support every vendor protocol immediately
This keeps onboarding practical while avoiding architectural dead ends.
This project may later support a hosted or managed model mode.
The protocol strategy should leave room for that without complicating the current open-source flow.
Commercialization-friendly implications:
- hosted routing should be represented as another adapter family, not as a special case embedded into task logic
- rate limiting, queueing, caching, and usage tracking belong outside the task layer
- provider-specific secrets and commercial orchestration should not shape the public task API
This keeps the open-source local-key mode and future hosted mode aligned around the same internal task model.
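One way to keep concerns such as usage tracking outside the task layer is to wrap an adapter rather than change task code. The sketch below is an assumption about how that could look; UsageCounter and withUsageTracking are hypothetical names, and the adapter types are simplified stand-ins.

```typescript
// Simplified stand-ins for the adapter input and output types.
interface AdapterInvokeInput {
  model: string
  prompt: string
}

interface ModelRawResponse {
  text: string
}

interface ProtocolAdapter {
  kind: string
  invoke(input: AdapterInvokeInput): Promise<ModelRawResponse>
}

// Hypothetical usage counter owned by the hosting layer, not by task code.
interface UsageCounter {
  calls: number
  byModel: Record<string, number>
}

// Wrap any adapter so each call is counted before it is delegated unchanged.
function withUsageTracking(inner: ProtocolAdapter, usage: UsageCounter): ProtocolAdapter {
  return {
    kind: inner.kind,
    invoke: (input) => {
      usage.calls += 1
      usage.byModel[input.model] = (usage.byModel[input.model] ?? 0) + 1
      return inner.invoke(input)
    },
  }
}
```

The wrapped adapter has the same interface as the original, so the open-source local-key mode can use adapters directly while a hosted mode layers tracking, queueing, or caching around them.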
The service should be refactored in small steps.
Recommended order:
Step 1 goals:
- reduce ambiguity inside ModelProviderConfig
- make protocol family explicit
- make capability support explicit where needed
Possible outcome:
- keep the existing type name temporarily
- add a protocol field instead of relying only on openAICompatible
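During such a transition, a small resolver could prefer the new field and fall back to the legacy flags. The sketch below assumes only the legacy field names mentioned in this document (openAICompatible, corsEndpoint); the real ModelProviderConfig may contain more or different fields, and resolveProtocolKind is a hypothetical helper.

```typescript
type ProtocolKind = 'openai-compatible' | 'custom-rest' | 'anthropic' | 'gemini'

// Assumed subset of the existing config plus the proposed protocol field.
interface LegacyProviderFields {
  openAICompatible?: boolean
  corsEndpoint?: string
  protocol?: ProtocolKind
}

// Prefer the explicit field; otherwise derive the family from the legacy flags.
function resolveProtocolKind(config: LegacyProviderFields): ProtocolKind {
  if (config.protocol) return config.protocol
  if (config.openAICompatible) return 'openai-compatible'
  if (config.corsEndpoint) return 'custom-rest'
  throw new Error('Provider config does not declare a protocol')
}
```

Existing provider entries keep working without edits, while new entries can declare their protocol explicitly.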
Step 2 goals:
- move invokeOpenAICompatible into the OpenAI-compatible adapter
- move invokeOpenAICompatibleStream into the OpenAI-compatible adapter
- keep current behavior unchanged
Acceptance criteria:
- no business logic changes
- service behavior remains equivalent
- build and existing generation flow still work
Step 3 goals:
- replace the idea of generic fallback with explicit custom-rest behavior
- make intent clearer for future contributors
Acceptance criteria:
- custom REST is visible as a protocol choice rather than an escape hatch
Step 4 goals:
- define the contract needed for a second adapter
- avoid implementing it until a real provider requires it
Acceptance criteria:
- the architecture can accept another adapter without rewriting task logic
These project-level concepts should remain stable even if protocol support evolves:
- task names
- output result types
- retry and fallback flow
- provider attempt records
- business prompts
- UI status stages in Studio
The UI should not need to care whether a response came from OpenAI-compatible chat, a custom REST bridge, or a future non-OpenAI adapter.
These elements should evolve when the service layer is refactored:
- the openAICompatible boolean should no longer be the main protocol discriminator
- provider config should separate protocol, capability, and model mapping
- stream support should become capability-aware
- custom hosted integrations should become explicit adapter types
The project should adopt this position:
- do not lock the architecture to OpenAI-compatible assumptions
- do not rush into many protocol implementations
- keep OpenAI-compatible as the primary adapter for now
- build around a task layer plus protocol adapter layer plus provider config layer
This is the lowest-risk path for an open-source product that still wants to remain extensible.
When the service layer is next refactored, the first concrete changes should be:
- extract the current OpenAI-compatible request and stream logic behind an adapter boundary
- make provider protocol type explicit in configuration
- preserve current task results and UI behavior
Only after that should the project consider adding another protocol family.