# latticelm

## Overview

A lightweight LLM proxy gateway written in Go that provides a unified API interface for multiple LLM providers. Similar to LiteLLM, but built natively in Go using each provider's official SDK.

## Purpose

Simplify LLM integration by exposing a single, consistent API that routes requests to different providers:

- **OpenAI** (GPT models)
- **Azure OpenAI** (Azure-deployed models)
- **Anthropic** (Claude)
- **Google Generative AI** (Gemini)

Instead of managing multiple SDK integrations in your application, call one endpoint and let the gateway handle provider-specific implementations.

## Architecture

```
Client Request
      ↓
latticelm (unified API)
      ↓
      ├─→ OpenAI SDK
      ├─→ Azure OpenAI (OpenAI SDK + Azure auth)
      ├─→ Anthropic SDK
      └─→ Google Gen AI SDK
```

## Key Features

- **Single API interface** for multiple LLM providers
- **Native Go SDKs** for optimal performance and type safety
- **Provider abstraction** - switch providers without changing client code
- **Lightweight** - minimal overhead, fast routing
- **Easy configuration** - manage API keys and provider settings centrally

## Use Cases

- Applications that need multi-provider LLM support
- Cost optimization (route to the cheapest provider for specific tasks)
- Failover and redundancy (fall back to alternative providers)
- A/B testing across different models
- Centralized LLM access for microservices

## 🎉 Status: **WORKING!**

✅ **All four providers integrated with official Go SDKs:**

- OpenAI → `github.com/openai/openai-go/v3`
- Azure OpenAI → `github.com/openai/openai-go/v3` (with Azure auth)
- Anthropic → `github.com/anthropics/anthropic-sdk-go`
- Google → `google.golang.org/genai`

✅ **Compiles successfully** (36MB binary)

✅ **Provider auto-selection** (gpt → Azure/OpenAI, claude → Anthropic, gemini → Google)

✅ **Configuration system** (YAML with env var support)

✅ **Streaming support** (Server-Sent Events for all providers)

✅ **OAuth2/OIDC authentication** (Google, Auth0, any OIDC provider)

✅ **Terminal chat client** (Python with Rich UI, PEP 723)

✅ **Conversation tracking** (`previous_response_id` for efficient context)
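
The auto-selection rule above (gpt → Azure/OpenAI, claude → Anthropic, gemini → Google) amounts to a prefix match on the model name. A minimal sketch, assuming prefix-based routing; the function name `inferProvider` is illustrative, not the gateway's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// inferProvider sketches prefix-based provider auto-selection;
// the gateway's real routing logic may differ in detail.
func inferProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "gpt"):
		return "openai" // or "azureopenai" when an Azure deployment is configured
	case strings.HasPrefix(model, "claude"):
		return "anthropic"
	case strings.HasPrefix(model, "gemini"):
		return "google"
	default:
		return "unknown"
	}
}

func main() {
	for _, m := range []string{"gpt-4o-mini", "claude-3-5-sonnet-20241022", "gemini-2.0-flash-exp"} {
		fmt.Printf("%s -> %s\n", m, inferProvider(m))
	}
}
```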

## Quick Start

```bash
# 1. Set API keys
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"

# 2. Build
cd latticelm
go build -o gateway ./cmd/gateway

# 3. Run
./gateway

# 4. Test (non-streaming)
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      }
    ]
  }'

# 5. Test streaming
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "stream": true,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Write a haiku about Go"}]
      }
    ]
  }'
```

## API Standard

This gateway implements the **[Open Responses](https://www.openresponses.org)** specification, an open-source, multi-provider API standard for LLM interfaces based on OpenAI's Responses API.

**Why Open Responses:**

- **Multi-provider by default** - one schema that maps cleanly across providers
- **Agentic workflow support** - consistent streaming events, tool invocation patterns, and "items" as atomic units
- **Extensible** - stable core with room for provider-specific features

By following the Open Responses spec, this gateway ensures:

- Interoperability across different LLM providers
- Standard request/response formats (messages, tool calls, streaming)
- Compatibility with existing Open Responses tooling and ecosystem

For full specification details, see: **https://www.openresponses.org**

## Tech Stack

- **Language:** Go
- **API Specification:** [Open Responses](https://www.openresponses.org)
- **SDKs:**
  - `google.golang.org/genai` (Google Generative AI)
  - `github.com/anthropics/anthropic-sdk-go` (Anthropic)
  - `github.com/openai/openai-go/v3` (OpenAI and Azure OpenAI)
- **Transport:** RESTful HTTP (potentially gRPC in the future)

## Getting Started

1. **Copy the example config** and fill in provider API keys:

   ```bash
   cp config.example.yaml config.yaml
   ```

   You can also override API keys via environment variables (`GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`).

2. **Run the gateway** using the default configuration path:

   ```bash
   go run ./cmd/gateway --config config.yaml
   ```

   The server listens on the address configured under `server.address` (defaults to `:8080`).

3. **Call the Open Responses endpoint**:

   ```bash
   curl -X POST http://localhost:8080/v1/responses \
     -H 'Content-Type: application/json' \
     -d '{
       "model": "gpt-4o-mini",
       "input": [
         {"role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}
       ]
     }'
   ```

   Include `"provider": "anthropic"` (or `"google"`, `"openai"`) to pin a provider; otherwise the gateway infers it from the model name.

## Project Structure

- `cmd/gateway`: Entry point that loads configuration, wires providers, and starts the HTTP server.
- `internal/config`: YAML configuration loader with environment overrides for API keys.
- `internal/api`: Open Responses request/response types and validation helpers.
- `internal/server`: HTTP handlers that expose `/v1/responses`.
- `internal/providers`: Provider abstractions plus provider-specific scaffolding in `google`, `anthropic`, and `openai` subpackages.

## Chat Client

Interactive terminal chat interface with a Rich UI:

```bash
# Basic usage
uv run chat.py

# With authentication
uv run chat.py --token "$(gcloud auth print-identity-token)"
```

Inside a session, you can switch models on the fly:

```
You> /model claude
You> /models    # List all available models
```

The chat client automatically uses `previous_response_id` to reduce token usage by sending only new messages instead of the full conversation history.

See **[CHAT_CLIENT.md](./CHAT_CLIENT.md)** for full documentation.

## Conversation Management

The gateway implements conversation tracking using `previous_response_id` from the Open Responses spec:

- 📉 **Reduced token usage** - only send new messages
- ⚡ **Smaller requests** - less bandwidth
- 🧠 **Server-side context** - the gateway maintains history
- ⏰ **Auto-expire** - conversations expire after 1 hour

See **[CONVERSATIONS.md](./CONVERSATIONS.md)** for details.

## Azure OpenAI

The gateway supports Azure OpenAI with the same interface as standard OpenAI:

```yaml
providers:
  azureopenai:
    type: "azureopenai"
    api_key: "${AZURE_OPENAI_API_KEY}"
    endpoint: "https://your-resource.openai.azure.com"

models:
  - name: "gpt-4o"
    provider: "azureopenai"
    provider_model_id: "my-gpt4o-deployment"  # optional: defaults to name
```

```bash
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"

./gateway
```

The `provider_model_id` field lets you map a friendly model name to the actual provider identifier (e.g., an Azure deployment name). If omitted, the model `name` is used directly. See **[AZURE_OPENAI.md](./AZURE_OPENAI.md)** for the complete setup guide.

## Authentication

The gateway supports OAuth2/OIDC authentication. See **[AUTH.md](./AUTH.md)** for setup instructions.

**Quick example with Google OAuth:**

```yaml
auth:
  enabled: true
  issuer: "https://accounts.google.com"
  audience: "YOUR-CLIENT-ID.apps.googleusercontent.com"
```

```bash
# Get a token
TOKEN=$(gcloud auth print-identity-token)

# Make an authenticated request
curl -X POST http://localhost:8080/v1/responses \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2.0-flash-exp", ...}'
```

## Next Steps

- ✅ ~~Implement streaming responses~~
- ✅ ~~Add OAuth2/OIDC authentication~~
- ✅ ~~Implement conversation tracking with `previous_response_id`~~
- ⬜ Add structured logging, tracing, and request-level metrics
- ⬜ Support tool/function calling
- ⬜ Persistent conversation storage (Redis/database)
- ⬜ Expand configuration to support routing policies (cost, latency, failover)