
LLM Inference for Go


A single interface in Go to get inference from multiple LLM / AI providers using their official SDKs.

Features at a glance

  • Single normalized interface (ProviderSetAPI) for multiple providers. Currently supported: Anthropic Messages, OpenAI Responses, and OpenAI Chat Completions APIs.

  • Normalized data model in spec/:

    • messages (user / assistant / system / developer),
    • text, images, and files (no audio/video content types yet),
    • tools (function, custom, built-in tools like web search),
    • reasoning / thinking content,
    • streaming events (text + thinking),
    • usage accounting.
  • Streaming support:

    • Text streaming for all providers that support it.
    • Reasoning / thinking streaming where the provider exposes it (Anthropic, OpenAI Responses).
  • Client and Server Tools:

    • Client tools are supported via Function Calling.
    • Anthropic server-side web search.
    • OpenAI Responses web search tool.
    • OpenAI Chat Completions web search via web_search_options.
  • HTTP-level debugging:

    • Pluggable CompletionDebugger interface.
    • A built-in, ready-to-use implementation, debugclient.HTTPCompletionDebugger, which:
      • wraps SDK HTTP clients,
      • captures request/response metadata,
      • redacts secrets and sensitive content,
      • attaches a scrubbed debug blob to FetchCompletionResponse.DebugDetails.

Installation

# Go 1.25+
go get github.com/flexigpt/inference-go

Quickstart

Basic pattern (sketched below):

  1. Create a ProviderSetAPI.
  2. Add one or more providers. Set their API keys.
  3. Send a FetchCompletionRequest.
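
A minimal sketch of that pattern. NewProviderSetAPI, spec.ProviderParam, and FetchCompletionRequest appear elsewhere in this README; the AddProvider and FetchCompletion method names and the request/param fields are assumptions, so consult the godoc and examples for the real signatures:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/flexigpt/inference-go"
    "github.com/flexigpt/inference-go/spec"
)

func main() {
    // 1. Create a ProviderSetAPI.
    ps, err := inference.NewProviderSetAPI()
    if err != nil {
        log.Fatal(err)
    }

    // 2. Add a provider and set its API key.
    //    AddProvider and the ProviderParam fields below are assumed names.
    if err := ps.AddProvider(spec.ProviderParam{
        Name:   "anthropic",
        APIKey: "sk-ant-...",
    }); err != nil {
        log.Fatal(err)
    }

    // 3. Send a FetchCompletionRequest.
    //    FetchCompletion is an assumed method name; fill in the model and
    //    messages per the spec package.
    resp, err := ps.FetchCompletion(context.Background(), &spec.FetchCompletionRequest{})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", resp)
}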

Examples

Supported providers

Anthropic Messages API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | User and assistant messages mapped to text blocks. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Redacted thinking is also supported; not streamed to caller. |
| Streaming thinking | yes | |
| Images (input) | yes | Inline base64 (imageData) or remote URLs (imageURL) mapped to Anthropic image blocks. |
| Files / documents (input) | yes | PDFs only, via base64 or URL. Plain-text base64 and other MIME types are currently ignored. |
| Audio/Video input/output | no | |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool use + web search result blocks. |
| Citations | partial | URL citations only. Other stateful citations are not mapped. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached. Anthropic doesn't expose reasoning token usage. |
  • Behavior for conversational + interleaved reasoning message input (see the sketch after this list):
    • Input: No reasoning content in the incoming messages.
      • Action: Build the message list unchanged. If the last user message is a tool_result, force thinking disabled; otherwise, honor the requested thinking setting.
    • Input: All reasoning messages are signed.
      • Action: Build the message list unchanged. If the last user message is a tool_result and the previous assistant message begins with thinking content, force thinking enabled; otherwise, honor the requested thinking setting.
    • Input: Mix of reasoning messages where some include a valid signature thinking and others do not.
      • Action: Retain only the reasoning messages with a valid signature; drop the rest. Apply the above behaviors after this cleanup.
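
As a rough illustration of the mixed-signature cleanup, here is a minimal sketch; ReasoningBlock and its Signature field are hypothetical stand-ins, not the library's internal types:

type ReasoningBlock struct {
    Text      string
    Signature string // empty when the block has no valid signature
}

// keepSignedReasoning mirrors the cleanup rule above: when the input mixes
// signed and unsigned reasoning blocks, only the signed ones are retained;
// otherwise the list passes through unchanged and the thinking toggle is
// decided by the tool_result rules described above.
func keepSignedReasoning(blocks []ReasoningBlock) []ReasoningBlock {
    anySigned, anyUnsigned := false, false
    for _, b := range blocks {
        if b.Signature != "" {
            anySigned = true
        } else {
            anyUnsigned = true
        }
    }
    if !(anySigned && anyUnsigned) {
        return blocks
    }
    kept := make([]ReasoningBlock, 0, len(blocks))
    for _, b := range blocks {
        if b.Signature != "" {
            kept = append(kept, b)
        }
    }
    return kept
}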

OpenAI Responses API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Input/output messages fully supported. |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning items mapped to ReasoningContent, including encrypted content. |
| Streaming thinking | yes | |
| Images (input) | yes | imageData (base64) or imageURL, with detail low/high/auto, mapped to Responses input_image items. |
| Files / documents (input) | yes | fileData (base64) or fileURL mapped to Responses input_file items; works for PDFs and other file MIME types. |
| Audio/Video input/output | no | |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | Server web search tool choice + web search tool call blocks mapped to webSearch tool calls in normalized outputs. |
| Citations | yes | URL citations mapped to spec.CitationKindURL. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Store is explicitly disabled (Store: false). |
| Usage data | yes | Input/Output/Cached/Reasoning. |
  • Behavior for conversational + interleaved reasoning message input
    • Input: No reasoning messages.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: All reasoning messages are encrypted_content.
      • Action: Build the message list unchanged. Honor the requested thinking setting.
    • Input: Mixed reasoning messages: some are signature-based and some are encrypted_content.
      • Action: Keep only the encrypted_content reasoning; drop the signature-based reasoning.

OpenAI Chat Completions API

Feature support

| Area | Supported? | Notes |
| --- | --- | --- |
| Text input/output | yes | Single assistant message per completion (first choice). |
| Streaming text | yes | |
| Reasoning / thinking | yes | Reasoning effort config only; no separate reasoning messages in API. |
| Streaming thinking | no | Not exposed by Chat Completions. |
| Images (input) | yes | imageData (base64) and imageURL are both supported; base64 is sent as a data URL with detail low/high/auto. |
| Files / documents (input) | yes | fileData (base64) only, sent as a data URL; fileURL and stateful file IDs are not used by this adapter. |
| Audio/Video input/output | no | |
| Tools (function/custom) | yes | JSON Schema based. |
| Web search | yes | API doesn't expose a tool; mapped via top-level web_search_options derived from a webSearch ToolChoice. |
| Citations | yes | URL citations mapped from annotations. |
| Metadata / service tiers | opaque | Not exposed in normalized types; available in debug payload. |
| Stateful flows | no | Library focuses on stateless calls only. |
| Usage data | yes | Input/Output/Cached/Reasoning. |
  • Behavior for conversational + interleaved reasoning message input
    • Reasoning effort config is kept as is.
    • All reasoning input/output messages are dropped, as the API doesn't support them.

HTTP debugging

The library exposes a pluggable CompletionDebugger interface:

type CompletionDebugger interface {
    // HTTPClient wraps the provider SDK's base HTTP client so requests
    // and responses can be observed.
    HTTPClient(base *http.Client) *http.Client
    // StartSpan begins a debug span for a single completion call and
    // returns the derived context along with the span.
    StartSpan(ctx context.Context, info *spec.CompletionSpanStart) (context.Context, spec.CompletionSpan)
}
  • Package debugclient includes a ready-to-use implementation, HTTPCompletionDebugger, which:

    • wraps the provider SDK’s *http.Client,
    • captures and scrubs:
      • URL, method, headers (with secret redaction),
      • query params,
      • request/response bodies (optional, scrubbed of LLM text and large base64),
      • curl command for reproduction,
    • attaches a structured HTTPDebugState to FetchCompletionResponse.DebugDetails.
    • You can then inspect resp.DebugDetails for a given call, or just rely on slog output.
  • Use it via WithDebugClientBuilder:

ps, _ := inference.NewProviderSetAPI(
    inference.WithDebugClientBuilder(func(p spec.ProviderParam) spec.CompletionDebugger {
        return debugclient.NewHTTPCompletionDebugger(&debugclient.DebugConfig{
            LogToSlog: false,
        })
    }),
)
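
After a call, the scrubbed debug state rides along on the response. Only the DebugDetails field comes from this README; the fetch call below is a hypothetical method name, and treating DebugDetails as nil-able is an assumption:

resp, err := ps.FetchCompletion(ctx, req) // hypothetical method name
if err != nil {
    log.Fatal(err)
}
// DebugDetails carries the scrubbed HTTPDebugState (redacted headers,
// query params, optional bodies, and a reproduction curl command).
if resp.DebugDetails != nil {
    fmt.Printf("debug: %+v\n", resp.DebugDetails)
}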

Notes

  • Stateless focus. The library handles stateless request/response interactions:

    • no conversation IDs,
    • no file IDs.
  • Opaque / provider‑specific fields.

    • Many provider‑specific fields (error details, service tiers, cache metadata, full raw responses) are only available through the debug payload, not in the normalized spec types.
    • A few commonly needed parameters may be exposed in the normalized types over time, as the need arises.
  • Token counting - Normalized Usage reports what the provider exposes:

    • Anthropic: input vs. cached tokens, output tokens.
    • OpenAI: prompt vs. cached tokens, completion tokens, reasoning tokens where available.
  • Heuristic prompt filtering.

    • ModelParam.MaxPromptLength triggers sdkutil.FilterMessagesByTokenCount, which uses a simple heuristic token counter. It is approximate, not an exact tokenizer; see the sketch below.
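
A sketch of enabling the filter, assuming ModelParam lives in spec and is carried on the request (both are assumptions about the API shape):

// Hypothetical request shape; only ModelParam.MaxPromptLength is named
// in this README.
req := &spec.FetchCompletionRequest{
    ModelParam: spec.ModelParam{
        MaxPromptLength: 8000, // rough token budget, counted heuristically
    },
}
// Messages beyond the budget are trimmed via
// sdkutil.FilterMessagesByTokenCount before the provider call.
_ = req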

Development

  • Formatting follows gofumpt and golines via golangci-lint, which is also used for linting. All rules are in .golangci.yml.
  • Useful scripts are defined in taskfile.yml; requires Task.
  • Bug reports and PRs are welcome:
    • Keep the public API (package inference and spec) small and intentional.
    • Avoid leaking provider‑specific types through the public surface; put them under internal/.
    • Please run tests and linters before sending a PR.

License

Copyright (c) 2026 - Present - Pankaj Pipada

All source code in this repository, unless otherwise noted, is licensed under the MIT License. See LICENSE for details.
