
latticelm

Overview

A lightweight LLM proxy gateway written in Go that provides a unified API interface for multiple LLM providers. Similar to LiteLLM, but built natively in Go using each provider's official SDK.

Purpose

Simplify LLM integration by exposing a single, consistent API that routes requests to different providers:

  • OpenAI (GPT models)
  • Azure OpenAI (Azure-deployed models)
  • Anthropic (Claude)
  • Google Generative AI (Gemini)

Instead of managing multiple SDK integrations in your application, call one endpoint and let the gateway handle provider-specific implementations.

Architecture

Client Request
    ↓
latticelm (unified API)
    ↓
├─→ OpenAI SDK
├─→ Azure OpenAI (OpenAI SDK + Azure auth)
├─→ Anthropic SDK
└─→ Google Gen AI SDK

Key Features

  • Single API interface for multiple LLM providers
  • Native Go SDKs for optimal performance and type safety
  • Provider abstraction - switch providers without changing client code
  • Lightweight - minimal overhead, fast routing
  • Easy configuration - manage API keys and provider settings centrally

Use Cases

  • Applications that need multi-provider LLM support
  • Cost optimization (route to cheapest provider for specific tasks)
  • Failover and redundancy (fallback to alternative providers)
  • A/B testing across different models
  • Centralized LLM access for microservices

🎉 Status: WORKING!

All four providers are integrated via their official Go SDKs:

  • OpenAI → github.com/openai/openai-go/v3
  • Azure OpenAI → github.com/openai/openai-go/v3 (with Azure auth)
  • Anthropic → github.com/anthropics/anthropic-sdk-go
  • Google → google.golang.org/genai

  • Compiles successfully (36 MB binary)
  • Provider auto-selection (gpt → Azure/OpenAI, claude → Anthropic, gemini → Google)
  • Configuration system (YAML with env var support)
  • Streaming support (Server-Sent Events for all providers)
  • OAuth2/OIDC authentication (Google, Auth0, any OIDC provider)
  • Terminal chat client (Python with Rich UI, PEP 723)
  • Conversation tracking (previous_response_id for efficient context)
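The model-prefix auto-selection listed above can be sketched in a few lines — the function name and exact prefix handling are illustrative, not the gateway's real implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// inferProvider guesses a provider from the model name prefix,
// mirroring the gpt → Azure/OpenAI, claude → Anthropic,
// gemini → Google routing described above.
func inferProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "gpt"):
		return "openai" // or azureopenai, if that provider is configured
	case strings.HasPrefix(model, "claude"):
		return "anthropic"
	case strings.HasPrefix(model, "gemini"):
		return "google"
	default:
		return "" // caller falls back to an explicit "provider" field
	}
}

func main() {
	for _, m := range []string{"gpt-4o-mini", "claude-3-5-sonnet-20241022", "gemini-2.0-flash-exp"} {
		fmt.Printf("%s -> %s\n", m, inferProvider(m))
	}
}
```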

Quick Start

# 1. Set API keys
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"

# 2. Build
cd latticelm
go build -o gateway ./cmd/gateway

# 3. Run
./gateway

# 4. Test (non-streaming)
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      }
    ]
  }'

# 5. Test streaming
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "stream": true,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Write a haiku about Go"}]
      }
    ]
  }'

API Standard

This gateway implements the Open Responses specification — an open-source, multi-provider API standard for LLM interfaces based on OpenAI's Responses API.

Why Open Responses:

  • Multi-provider by default - one schema that maps cleanly across providers
  • Agentic workflow support - consistent streaming events, tool invocation patterns, and "items" as atomic units
  • Extensible - stable core with room for provider-specific features

By following the Open Responses spec, this gateway ensures:

  • Interoperability across different LLM providers
  • Standard request/response formats (messages, tool calls, streaming)
  • Compatibility with existing Open Responses tooling and ecosystem

For full specification details, see: https://www.openresponses.org

Tech Stack

  • Language: Go
  • API Specification: Open Responses
  • SDKs:
    • google.golang.org/genai (Google Generative AI)
    • Anthropic Go SDK
    • OpenAI Go SDK
  • Transport: RESTful HTTP (potentially gRPC in the future)

Status

🚧 In active development — the core provider integrations above are working; see Next Steps for planned improvements.

Getting Started

  1. Copy the example config and fill in provider API keys:

    cp config.example.yaml config.yaml
    

    You can also override API keys via environment variables (GOOGLE_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY).

  2. Run the gateway using the default configuration path:

    go run ./cmd/gateway --config config.yaml
    

    The server listens on the address configured under server.address (defaults to :8080).

  3. Call the Open Responses endpoint:

    curl -X POST http://localhost:8080/v1/responses \
      -H 'Content-Type: application/json' \
      -d '{
            "model": "gpt-4o-mini",
            "input": [
              {"role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}
            ]
          }'
    

    Include "provider": "anthropic" (or google, openai) to pin a provider; otherwise the gateway infers it from the model name.
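Pulling the steps above together, a config.yaml could look roughly like this. Key names beyond server.address, the providers map, api_key, and the models list shown elsewhere in this README are assumptions — config.example.yaml is the authoritative layout:

```yaml
server:
  address: ":8080"

providers:
  openai:
    type: "openai"
    api_key: "${OPENAI_API_KEY}"
  anthropic:
    type: "anthropic"
    api_key: "${ANTHROPIC_API_KEY}"
  google:
    type: "google"
    api_key: "${GOOGLE_API_KEY}"

models:
  - name: "gpt-4o-mini"
    provider: "openai"
  - name: "claude-3-5-sonnet-20241022"
    provider: "anthropic"
```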

Project Structure

  • cmd/gateway: Entry point that loads configuration, wires providers, and starts the HTTP server.
  • internal/config: YAML configuration loader with environment overrides for API keys.
  • internal/api: Open Responses request/response types and validation helpers.
  • internal/server: HTTP handlers that expose /v1/responses.
  • internal/providers: Provider abstractions plus provider-specific scaffolding in google, anthropic, and openai subpackages.

Chat Client

Interactive terminal chat interface with beautiful Rich UI:

# Basic usage
uv run chat.py

# With authentication
uv run chat.py --token "$(gcloud auth print-identity-token)"

# Switch models on the fly
You> /model claude
You> /models  # List all available models

The chat client automatically uses previous_response_id to reduce token usage by only sending new messages instead of the full conversation history.

See CHAT_CLIENT.md for full documentation.

Conversation Management

The gateway implements conversation tracking using previous_response_id from the Open Responses spec:

  • 📉 Reduced token usage - Only send new messages
  • Smaller requests - Less bandwidth
  • 🧠 Server-side context - Gateway maintains history
  • Auto-expire - Conversations expire after 1 hour

See CONVERSATIONS.md for details.

Azure OpenAI

The gateway supports Azure OpenAI with the same interface as standard OpenAI:

providers:
  azureopenai:
    type: "azureopenai"
    api_key: "${AZURE_OPENAI_API_KEY}"
    endpoint: "https://your-resource.openai.azure.com"

models:
  - name: "gpt-4o"
    provider: "azureopenai"
    provider_model_id: "my-gpt4o-deployment"  # optional: defaults to name
Then export the credentials and run the gateway:

export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"

./gateway

The provider_model_id field lets you map a friendly model name to the actual provider identifier (e.g., an Azure deployment name). If omitted, the model name is used directly. See AZURE_OPENAI.md for complete setup guide.
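The fallback described above is a one-liner. A sketch (the type and function names are illustrative, not the project's internals):

```go
package main

import "fmt"

// ModelConfig mirrors a YAML models entry: a friendly name plus an
// optional provider-side identifier (e.g. an Azure deployment name).
type ModelConfig struct {
	Name            string
	Provider        string
	ProviderModelID string
}

// resolveModelID applies the documented fallback: use
// provider_model_id when set, otherwise the model name itself.
func resolveModelID(m ModelConfig) string {
	if m.ProviderModelID != "" {
		return m.ProviderModelID
	}
	return m.Name
}

func main() {
	azure := ModelConfig{Name: "gpt-4o", Provider: "azureopenai", ProviderModelID: "my-gpt4o-deployment"}
	plain := ModelConfig{Name: "gpt-4o-mini", Provider: "openai"}
	fmt.Println(resolveModelID(azure), resolveModelID(plain))
}
```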

Authentication

The gateway supports OAuth2/OIDC authentication. See AUTH.md for setup instructions.

Quick example with Google OAuth:

auth:
  enabled: true
  issuer: "https://accounts.google.com"
  audience: "YOUR-CLIENT-ID.apps.googleusercontent.com"
Then call the gateway with an identity token:

# Get token
TOKEN=$(gcloud auth print-identity-token)

# Make authenticated request
curl -X POST http://localhost:8080/v1/responses \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2.0-flash-exp", ...}'

Next Steps

  • Add structured logging, tracing, and request-level metrics
  • Support tool/function calling
  • Persistent conversation storage (Redis/database)
  • Expand configuration to support routing policies (cost, latency, failover)