# Observability with Logfire.jl

[Logfire.jl](https://github.com/svilupp/Logfire.jl) provides OpenTelemetry-based observability for your LLM applications built with PromptingTools.jl. It automatically traces all your AI calls with detailed information about tokens, costs, messages, and latency.

## Installation

Logfire.jl is a separate package that provides a PromptingTools extension. Install it along with DotEnv for loading secrets:

```julia
using Pkg
Pkg.add(["Logfire", "DotEnv"])
```

The extension loads automatically once both Logfire and PromptingTools are loaded in the same session; no additional configuration is needed.
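
If you want to confirm the extension is active, Julia's `Base.get_extension` reports whether an extension module has been loaded. The extension module name below is an assumption; check Logfire.jl's `Project.toml` for the actual name:

```julia
using Logfire, PromptingTools

# Hypothetical extension name -- the real one is listed under [extensions]
# in Logfire.jl's Project.toml.
ext = Base.get_extension(Logfire, :LogfirePromptingToolsExt)
ext === nothing && @warn "PromptingTools extension not loaded yet"
```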

## Quick Start

```julia
using DotEnv
DotEnv.load!() # Load LOGFIRE_TOKEN and API keys from .env file

using Logfire, PromptingTools

# 1. Configure Logfire (uses LOGFIRE_TOKEN env var, or pass token directly)
Logfire.configure(service_name = "my-app")

# 2. Instrument all registered models - wraps them with tracing schema
Logfire.instrument_promptingtools!()

# 3. Use PromptingTools as normal - traces are automatic!
aigenerate("What is 2 + 2?"; model = "gpt4om")
```

## How It Works

The integration works by wrapping registered models in a Logfire tracing schema. When you call `instrument_promptingtools!()`, Logfire modifies the model registry so that every call routes through its tracing layer (see the sketch after this list). This means:

- All `ai*` functions work exactly as before
- No code changes are needed in your existing workflows
- Traces are captured automatically with rich metadata
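
One way to see the wrapping is to inspect the model registry before and after instrumentation. This is only a sketch: it assumes the current `MODEL_REGISTRY` layout in PromptingTools (a `registry` Dict of `ModelSpec`s keyed by model name), and the printed wrapper type is an implementation detail of Logfire.jl:

```julia
using Logfire, PromptingTools

# Before: the provider schema registered for the model (e.g. an OpenAI schema)
println(PromptingTools.MODEL_REGISTRY.registry["gpt-4o-mini"].schema)

Logfire.instrument_promptingtools!()

# After: the same entry holds a tracing schema wrapping the original, so
# `aigenerate(...; model = "gpt4om")` is traced without any code changes.
println(PromptingTools.MODEL_REGISTRY.registry["gpt-4o-mini"].schema)
```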

## What Gets Captured

Each AI call creates a span with:

- **Request parameters**: model, temperature, top_p, max_tokens, stop, penalties
- **Usage metrics**: input/output/total tokens, latency, cost estimates
- **Provider metadata**: model returned, status, finish_reason, response_id
- **Conversation**: full message history (roles + content)
- **Cache & streaming**: flags and chunk counts
- **Tool/function calls**: count and payload
- **Errors**: exceptions with span status set to error
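
The usage metrics mirror fields that PromptingTools already exposes on the returned message, so you can sanity-check a trace locally. Span attribute names follow the OpenTelemetry GenAI semantic conventions (for example `gen_ai.usage.input_tokens`); the exact attribute set emitted is determined by Logfire.jl. A minimal check:

```julia
using PromptingTools

msg = aigenerate("What is 2 + 2?"; model = "gpt4om")

# The same values that feed the span's usage metrics
println("Tokens (prompt, completion): ", msg.tokens)
println("Latency (s): ", msg.elapsed)
println("Estimated cost (USD): ", msg.cost)
println("Finish reason: ", msg.finish_reason)
```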

## Extras Field Reference

PromptingTools populates `AIMessage.extras` with detailed metadata that Logfire.jl maps to OpenTelemetry GenAI semantic convention attributes. The fields use unified naming across providers for consistency.

### Provider Metadata

| Extras Key | Type | Description | OpenAI | Anthropic |
|------------|------|-------------|--------|-----------|
| `:model` | String | Actual model used (may differ from requested) | ✓ | ✓ |
| `:response_id` | String | Provider's unique response identifier | ✓ | ✓ |
| `:system_fingerprint` | String | OpenAI system fingerprint for determinism | ✓ | - |
| `:service_tier` | String | Service tier used (e.g., "default", "standard") | ✓ | ✓ |

### Unified Usage Keys

These keys provide cross-provider compatibility. Use these for provider-agnostic code:

| Extras Key | Type | Description | OpenAI Source | Anthropic Source |
|------------|------|-------------|---------------|------------------|
| `:cache_read_tokens` | Int | Tokens read from cache (cache hits) | `prompt_tokens_details.cached_tokens` | `cache_read_input_tokens` |
| `:cache_write_tokens` | Int | Tokens written to cache | - | `cache_creation_input_tokens` |
| `:reasoning_tokens` | Int | Chain-of-thought/reasoning tokens | `completion_tokens_details.reasoning_tokens` | - |
| `:audio_input_tokens` | Int | Audio tokens in input | `prompt_tokens_details.audio_tokens` | - |
| `:audio_output_tokens` | Int | Audio tokens in output | `completion_tokens_details.audio_tokens` | - |
| `:accepted_prediction_tokens` | Int | Predicted tokens that were accepted | `completion_tokens_details.accepted_prediction_tokens` | - |
| `:rejected_prediction_tokens` | Int | Predicted tokens that were rejected | `completion_tokens_details.rejected_prediction_tokens` | - |

### Anthropic-Specific Keys

| Extras Key | Type | Description |
|------------|------|-------------|
| `:cache_write_1h_tokens` | Int | Ephemeral 1-hour cache tokens |
| `:cache_write_5m_tokens` | Int | Ephemeral 5-minute cache tokens |
| `:web_search_requests` | Int | Server-side web search requests |
| `:cache_creation_input_tokens` | Int | Original Anthropic key (backwards compat) |
| `:cache_read_input_tokens` | Int | Original Anthropic key (backwards compat) |

### Raw Provider Dicts

For debugging or advanced use cases, the original nested structures are preserved:

| Extras Key | Provider | Contents |
|------------|----------|----------|
| `:prompt_tokens_details` | OpenAI | `{:cached_tokens, :audio_tokens}` |
| `:completion_tokens_details` | OpenAI | `{:reasoning_tokens, :audio_tokens, :accepted_prediction_tokens, :rejected_prediction_tokens}` |
| `:cache_creation` | Anthropic | `{:ephemeral_1h_input_tokens, :ephemeral_5m_input_tokens}` |
| `:server_tool_use` | Anthropic | `{:web_search_requests}` |

### Example: Accessing Extras

```julia
using PromptingTools

msg = aigenerate("What is 2+2?"; model="gpt4om")

# Provider metadata
println("Model used: ", msg.extras[:model])
println("Response ID: ", msg.extras[:response_id])

# Unified usage (works across providers)
cache_hits = get(msg.extras, :cache_read_tokens, 0)
reasoning = get(msg.extras, :reasoning_tokens, 0)

# Raw OpenAI details (if needed)
if haskey(msg.extras, :prompt_tokens_details)
    details = msg.extras[:prompt_tokens_details]
    println("Cached: ", get(details, :cached_tokens, 0))
end
```

## Instrument Individual Models

You don't have to instrument all models. For selective tracing, wrap only specific models:

```julia
Logfire.instrument_promptingtools_model!("my-local-llm")
```

This reuses the model's registered PromptingTools schema, so provider-specific behavior is preserved.
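
For example, you might register a local Ollama model with PromptingTools and trace only that model. The model name and registration parameters below are illustrative, not prescriptive:

```julia
using Logfire, PromptingTools

# Register a local model under a custom name (assumes an Ollama model tagged
# "my-local-llm" is available on the local Ollama server)
PromptingTools.register_model!(;
    name = "my-local-llm",
    schema = PromptingTools.OllamaSchema(),
    description = "Local model served by Ollama")

# Trace only this model; all other registered models stay un-instrumented
Logfire.instrument_promptingtools_model!("my-local-llm")

aigenerate("Say hi!"; model = "my-local-llm")
```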

## Alternative Backends

You don't have to use Logfire cloud. Send traces to any OpenTelemetry-compatible backend using standard environment variables:

| Variable | Purpose |
|----------|---------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Backend URL (e.g., `http://localhost:4318`) |
| `OTEL_EXPORTER_OTLP_HEADERS` | Custom headers (e.g., `Authorization=Bearer token`) |

### Local Development with Jaeger

```bash
# Start Jaeger
docker run --rm -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest
```

```julia
using Logfire

ENV["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"

Logfire.configure(
    service_name = "my-app",
    send_to_logfire = :always # Export even without Logfire token
)

Logfire.instrument_promptingtools!()
# Now use PromptingTools normally - traces go to Jaeger
```

View traces at: http://localhost:16686

### Using with Langfuse

```julia
ENV["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
ENV["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=Basic <base64-credentials>"

Logfire.configure(service_name = "my-llm-app", send_to_logfire = :always)
```
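
Langfuse's OTLP endpoint expects HTTP Basic auth built from your public and secret keys (base64 of `public_key:secret_key`). A small helper using Julia's standard `Base64` library, assuming you keep the keys in the environment variables shown (the key values are placeholders):

```julia
using Base64

public_key = ENV["LANGFUSE_PUBLIC_KEY"]   # e.g. "pk-lf-..."
secret_key = ENV["LANGFUSE_SECRET_KEY"]   # e.g. "sk-lf-..."
credentials = base64encode("$public_key:$secret_key")

ENV["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=Basic $credentials"
```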

## Recommended: Pydantic Logfire

While you can use any OTLP-compatible backend, we strongly recommend [Pydantic Logfire](https://pydantic.dev/logfire). Their free tier provides hundreds of thousands of traced conversations per month, which is more than enough for most use cases. The UI is purpose-built for LLM observability with excellent visualization of conversations, token usage, and costs.

## Authentication

- Provide your Logfire token via `Logfire.configure(token = "...")` or set `ENV["LOGFIRE_TOKEN"]`
- Use `DotEnv.load!()` to load tokens from a project-local `.env` file (recommended for per-project configuration)
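
A typical project-local `.env` file looks like the sketch below. The provider key names follow PromptingTools' conventions (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`); include only the providers you actually use, and keep the file out of version control:

```
LOGFIRE_TOKEN=your-logfire-write-token
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```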

## Example

See the full example at [`examples/observability_with_logfire.jl`](https://github.com/svilupp/PromptingTools.jl/blob/main/examples/observability_with_logfire.jl).

## Further Reading

- [Logfire.jl Documentation](https://svilupp.github.io/Logfire.jl/dev)
- [Logfire.jl GitHub](https://github.com/svilupp/Logfire.jl)
- [Pydantic Logfire](https://pydantic.dev/logfire)