# latticelm

## Overview

A lightweight LLM proxy gateway written in Go that provides a unified API interface for multiple LLM providers. Similar to LiteLLM, but built natively in Go using each provider's official SDK.

## Purpose

Simplify LLM integration by exposing a single, consistent API that routes requests to different providers:

- **OpenAI** (GPT models)
- **Azure OpenAI** (Azure-deployed models)
- **Anthropic** (Claude)
- **Google Generative AI** (Gemini)

Instead of managing multiple SDK integrations in your application, call one endpoint and let the gateway handle the provider-specific implementations.

## Architecture

```
Client Request
      ↓
latticelm (unified API)
      ↓
      ├─→ OpenAI SDK
      ├─→ Azure OpenAI (OpenAI SDK + Azure auth)
      ├─→ Anthropic SDK
      └─→ Google Gen AI SDK
```

## Key Features

- **Single API interface** for multiple LLM providers
- **Native Go SDKs** for optimal performance and type safety
- **Provider abstraction** - switch providers without changing client code
- **Lightweight** - minimal overhead, fast routing
- **Easy configuration** - manage API keys and provider settings centrally

## Use Cases

- Applications that need multi-provider LLM support
- Cost optimization (route to the cheapest provider for specific tasks)
- Failover and redundancy (fall back to alternative providers)
- A/B testing across different models
- Centralized LLM access for microservices

## 🎉 Status: **WORKING!**

✅ **All four providers integrated with official Go SDKs:**

- OpenAI → `github.com/openai/openai-go/v3`
- Azure OpenAI → `github.com/openai/openai-go/v3` (with Azure auth)
- Anthropic → `github.com/anthropics/anthropic-sdk-go`
- Google → `google.golang.org/genai`

✅ **Compiles successfully** (36MB binary)
✅ **Provider auto-selection** (gpt→Azure/OpenAI, claude→Anthropic, gemini→Google)
✅ **Configuration system** (YAML with env var support)
✅ **Streaming support** (Server-Sent Events for all providers)
✅ **OAuth2/OIDC authentication** (Google, Auth0, any OIDC provider)
✅ **Terminal chat client** (Python with Rich UI, PEP 723)
✅ **Conversation tracking** (`previous_response_id` for efficient context)

## Quick Start

```bash
# 1. Set API keys
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"

# 2. Build
cd latticelm
go build -o gateway ./cmd/gateway

# 3. Run
./gateway

# 4. Test (non-streaming)
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      }
    ]
  }'

# 5. Test streaming
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "stream": true,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Write a haiku about Go"}]
      }
    ]
  }'
```

## API Standard

This gateway implements the **[Open Responses](https://www.openresponses.org)** specification, an open-source, multi-provider API standard for LLM interfaces based on OpenAI's Responses API.
**Why Open Responses:**

- **Multi-provider by default** - one schema that maps cleanly across providers
- **Agentic workflow support** - consistent streaming events, tool invocation patterns, and "items" as atomic units
- **Extensible** - a stable core with room for provider-specific features

By following the Open Responses spec, this gateway ensures:

- Interoperability across different LLM providers
- Standard request/response formats (messages, tool calls, streaming)
- Compatibility with existing Open Responses tooling and ecosystem

For the full specification, see **https://www.openresponses.org**.

## Tech Stack

- **Language:** Go
- **API Specification:** [Open Responses](https://www.openresponses.org)
- **SDKs:**
  - `google.golang.org/genai` (Google Generative AI)
  - Anthropic Go SDK
  - OpenAI Go SDK
- **Transport:** RESTful HTTP (potentially gRPC in the future)

## Getting Started

1. **Copy the example config** and fill in provider API keys:

   ```bash
   cp config.example.yaml config.yaml
   ```

   You can also override API keys via environment variables (`GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`).

2. **Run the gateway** using the default configuration path:

   ```bash
   go run ./cmd/gateway --config config.yaml
   ```

   The server listens on the address configured under `server.address` (defaults to `:8080`).

3. **Call the Open Responses endpoint**:

   ```bash
   curl -X POST http://localhost:8080/v1/responses \
     -H 'Content-Type: application/json' \
     -d '{
       "model": "gpt-4o-mini",
       "input": [
         {"role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}
       ]
     }'
   ```

   Include `"provider": "anthropic"` (or `"google"`, `"openai"`) to pin a provider; otherwise the gateway infers it from the model name.

## Project Structure

- `cmd/gateway`: Entry point that loads configuration, wires providers, and starts the HTTP server.
- `internal/config`: YAML configuration loader with environment overrides for API keys.
- `internal/api`: Open Responses request/response types and validation helpers.
- `internal/server`: HTTP handlers that expose `/v1/responses`.
- `internal/providers`: Provider abstractions plus provider-specific scaffolding in the `google`, `anthropic`, and `openai` subpackages.

## Chat Client

Interactive terminal chat interface with a Rich-based UI:

```bash
# Basic usage
uv run chat.py

# With authentication
uv run chat.py --token "$(gcloud auth print-identity-token)"

# Switch models on the fly
You> /model claude
You> /models   # List all available models
```

The chat client automatically uses `previous_response_id` to reduce token usage by sending only new messages instead of the full conversation history.

See **[CHAT_CLIENT.md](./CHAT_CLIENT.md)** for full documentation.

## Conversation Management

The gateway implements conversation tracking using `previous_response_id` from the Open Responses spec:

- 📉 **Reduced token usage** - only new messages are sent
- ⚡ **Smaller requests** - less bandwidth
- 🧠 **Server-side context** - the gateway maintains history
- ⏰ **Auto-expire** - conversations expire after 1 hour

See **[CONVERSATIONS.md](./CONVERSATIONS.md)** for details.

## Azure OpenAI

The gateway supports Azure OpenAI with the same interface as standard OpenAI:

```yaml
providers:
  azureopenai:
    type: "azureopenai"
    api_key: "${AZURE_OPENAI_API_KEY}"
    endpoint: "https://your-resource.openai.azure.com"

models:
  - name: "gpt-4o"
    provider: "azureopenai"
    provider_model_id: "my-gpt4o-deployment"  # optional: defaults to name
```

```bash
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
./gateway
```

The `provider_model_id` field maps a friendly model name to the actual provider identifier (e.g., an Azure deployment name). If omitted, the model `name` is used directly.

See **[AZURE_OPENAI.md](./AZURE_OPENAI.md)** for the complete setup guide.
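The two routing rules this README describes, inferring a provider from the model-name prefix (Getting Started) and resolving `provider_model_id` with a fallback to `name` (Azure OpenAI section), can be sketched together. The prefix table and the type and function names here are assumptions for illustration; the gateway's real registry lives in `internal/providers`.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelConfig mirrors one entry of the YAML `models:` list above.
type ModelConfig struct {
	Name            string // friendly name clients send, e.g. "gpt-4o"
	Provider        string // e.g. "azureopenai"
	ProviderModelID string // optional, e.g. an Azure deployment name
}

// inferProvider applies the model-name prefix routing summarized in the
// status list (gpt→OpenAI/Azure, claude→Anthropic, gemini→Google).
func inferProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "gpt"):
		return "openai"
	case strings.HasPrefix(model, "claude"):
		return "anthropic"
	case strings.HasPrefix(model, "gemini"):
		return "google"
	}
	return "" // unknown: the client must pin "provider" explicitly
}

// resolveModelID applies the fallback rule for provider_model_id:
// use it when set, otherwise fall back to the friendly name.
func resolveModelID(m ModelConfig) string {
	if m.ProviderModelID != "" {
		return m.ProviderModelID
	}
	return m.Name
}

func main() {
	m := ModelConfig{Name: "gpt-4o", Provider: "azureopenai", ProviderModelID: "my-gpt4o-deployment"}
	fmt.Println(inferProvider("claude-3-5-sonnet-20241022")) // anthropic
	fmt.Println(resolveModelID(m))                           // my-gpt4o-deployment
}
```

A simple prefix match like this keeps routing decisions out of client code: switching a client from Claude to Gemini is just a change of model string.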
## Authentication

The gateway supports OAuth2/OIDC authentication. See **[AUTH.md](./AUTH.md)** for setup instructions.

**Quick example with Google OAuth:**

```yaml
auth:
  enabled: true
  issuer: "https://accounts.google.com"
  audience: "YOUR-CLIENT-ID.apps.googleusercontent.com"
```

```bash
# Get token
TOKEN=$(gcloud auth print-identity-token)

# Make authenticated request
curl -X POST http://localhost:8080/v1/responses \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2.0-flash-exp", ...}'
```

## Next Steps

- ✅ ~~Implement streaming responses~~
- ✅ ~~Add OAuth2/OIDC authentication~~
- ✅ ~~Implement conversation tracking with `previous_response_id`~~
- ⬜ Add structured logging, tracing, and request-level metrics
- ⬜ Support tool/function calling
- ⬜ Persistent conversation storage (Redis/database)
- ⬜ Expand configuration to support routing policies (cost, latency, failover)