Merge pull request 'Add Chat client to UI' (#5) from push-rtlulrsvzsvl into main

Reviewed-on: #5
2026-03-07 03:30:02 +00:00
24 changed files with 3062 additions and 428 deletions


@@ -1,9 +1,23 @@
# Multi-stage build for Go LLM Gateway
# Stage 1: Build the frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /frontend
# Copy package files for better caching
COPY frontend/admin/package*.json ./
# Full install: devDependencies are needed for `npm run build` below
RUN npm ci
# Copy frontend source and build
COPY frontend/admin/ ./
RUN npm run build
# Stage 2: Build the Go binary
FROM golang:alpine AS builder
# Install build dependencies
RUN apk add --no-cache git ca-certificates tzdata gcc musl-dev
WORKDIR /build
@@ -14,10 +28,12 @@ RUN go mod download
# Copy source code
COPY . .
# Copy pre-built frontend assets from stage 1
COPY --from=frontend-builder /frontend/dist ./internal/admin/dist
# Build the binary with optimizations
# CGO is required for SQLite support
RUN CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build \
-ldflags='-w -s -extldflags "-static"' \
-a -installsuffix cgo \
-o gateway \

README.md

@@ -1,16 +1,47 @@
# latticelm
> A production-ready LLM proxy gateway written in Go with enterprise features
## Table of Contents
- [Overview](#overview)
- [Supported Providers](#supported-providers)
- [Key Features](#key-features)
- [Status](#status)
- [Use Cases](#use-cases)
- [Architecture](#architecture)
- [Quick Start](#quick-start)
- [API Standard](#api-standard)
- [API Reference](#api-reference)
- [Tech Stack](#tech-stack)
- [Project Structure](#project-structure)
- [Configuration](#configuration)
- [Chat Client](#chat-client)
- [Conversation Management](#conversation-management)
- [Observability](#observability)
- [Circuit Breakers](#circuit-breakers)
- [Azure OpenAI](#azure-openai)
- [Azure Anthropic](#azure-anthropic-microsoft-foundry)
- [Admin Web UI](#admin-web-ui)
- [Deployment](#deployment)
- [Authentication](#authentication)
- [Production Features](#production-features)
- [Roadmap](#roadmap)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [License](#license)
## Overview
A production-ready LLM proxy gateway written in Go that provides a unified API interface for multiple LLM providers. Similar to LiteLLM, but built natively in Go using each provider's official SDK with enterprise features including rate limiting, circuit breakers, observability, and authentication.
## Supported Providers
Simplify LLM integration by exposing a single, consistent API that routes requests to different providers:
- **OpenAI** (GPT models)
- **Azure OpenAI** (Azure-deployed OpenAI models)
- **Anthropic** (Claude models)
- **Azure Anthropic** (Microsoft Foundry-hosted Claude models)
- **Google Generative AI** (Gemini models)
- **Vertex AI** (Google Cloud-hosted Gemini models)
Instead of managing multiple SDK integrations in your application, call one endpoint and let the gateway handle provider-specific implementations.
@@ -31,11 +62,24 @@ latticelm (unified API)
## Key Features
### Core Functionality
- **Single API interface** for multiple LLM providers
- **Native Go SDKs** for optimal performance and type safety
- **Provider abstraction** - switch providers without changing client code
- **Lightweight** - minimal overhead, fast routing
- **Easy configuration** - manage API keys and provider settings centrally
- **Streaming support** - Server-Sent Events for all providers
- **Conversation tracking** - Efficient context management with `previous_response_id`
### Production Features
- **Circuit breakers** - Automatic failure detection and recovery per provider
- **Rate limiting** - Per-IP token bucket algorithm with configurable limits
- **OAuth2/OIDC authentication** - Support for Google, Auth0, and any OIDC provider
- **Observability** - Prometheus metrics and OpenTelemetry tracing
- **Health checks** - Kubernetes-compatible liveness and readiness endpoints
- **Admin Web UI** - Built-in dashboard for monitoring and configuration
### Configuration
- **Easy setup** - YAML configuration with environment variable overrides
- **Flexible storage** - In-memory, SQLite, MySQL, PostgreSQL, or Redis for conversations
## Use Cases
@@ -45,43 +89,70 @@ latticelm (unified API)
- A/B testing across different models
- Centralized LLM access for microservices
## Status
**Production Ready** - All core features implemented and tested.
### Provider Integration
✅ All providers use official Go SDKs:
- OpenAI → `github.com/openai/openai-go/v3`
- Azure OpenAI → `github.com/openai/openai-go/v3` (with Azure auth)
- Anthropic → `github.com/anthropics/anthropic-sdk-go`
- Azure Anthropic → `github.com/anthropics/anthropic-sdk-go` (with Azure auth)
- Google Gen AI → `google.golang.org/genai`
- Vertex AI → `google.golang.org/genai` (with GCP auth)
### Features
✅ Provider auto-selection (gpt→OpenAI, claude→Anthropic, gemini→Google)
✅ Streaming responses (Server-Sent Events)
✅ Conversation tracking with `previous_response_id`
✅ OAuth2/OIDC authentication
✅ Rate limiting with token bucket algorithm
✅ Circuit breakers for fault tolerance
✅ Observability (Prometheus metrics + OpenTelemetry tracing)
✅ Health & readiness endpoints
✅ Admin Web UI dashboard
✅ Terminal chat client (Python with Rich UI)
## Quick Start
### Prerequisites
- Go 1.21+ (for building from source)
- Docker (optional, for containerized deployment)
- Node.js 18+ (optional, for Admin UI development)
### Running Locally
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/latticelm.git
cd latticelm
# 2. Set API keys
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"
# 3. Copy and configure settings (optional)
cp config.example.yaml config.yaml
# Edit config.yaml to customize settings
# 4. Build (includes Admin UI)
make build-all
# 5. Run
./bin/llm-gateway
# Gateway starts on http://localhost:8080
# Admin UI available at http://localhost:8080/admin/
```
### Testing the API
**Non-streaming request:**
```bash
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
@@ -92,9 +163,11 @@ curl -X POST http://localhost:8080/v1/chat/completions \
}
]
}'
```
**Streaming request:**
```bash
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-N \
-d '{
@@ -109,6 +182,20 @@ curl -X POST http://localhost:8080/v1/chat/completions \
}'
```
### Development Mode
Run backend and frontend separately for live reloading:
```bash
# Terminal 1: Backend with auto-reload
make dev-backend
# Terminal 2: Frontend dev server
make dev-frontend
```
Frontend runs on `http://localhost:5173` with hot module replacement.
## API Standard
This gateway implements the **[Open Responses](https://www.openresponses.org)** specification — an open-source, multi-provider API standard for LLM interfaces based on OpenAI's Responses API.
@@ -125,64 +212,245 @@ By following the Open Responses spec, this gateway ensures:
For full specification details, see: **https://www.openresponses.org**
## API Reference
### Core Endpoints
#### POST /v1/responses
Create a chat completion response (streaming or non-streaming).
**Request body:**
```json
{
"model": "gpt-4o-mini",
"stream": false,
"input": [
{
"role": "user",
"content": [{"type": "input_text", "text": "Hello!"}]
}
],
"previous_response_id": "optional-conversation-id",
"provider": "optional-explicit-provider"
}
```
**Response (non-streaming):**
```json
{
"id": "resp_abc123",
"object": "response",
"model": "gpt-4o-mini",
"provider": "openai",
"output": [
{
"role": "assistant",
"content": [{"type": "text", "text": "Hello! How can I help you?"}]
}
],
"usage": {
"input_tokens": 10,
"output_tokens": 8
}
}
```
**Response (streaming):**
Server-Sent Events with `data: {...}` lines containing deltas.
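A streaming body can be consumed by scanning for `data:` lines. A minimal parser sketch — it assumes each payload is a JSON object and that an OpenAI-style `[DONE]` sentinel may terminate the stream, which should be verified against the gateway's actual wire format:

```python
import json


def parse_sse_deltas(raw: str) -> list[dict]:
    """Extract JSON payloads from an SSE body made of 'data: {...}' lines.

    Assumes OpenAI-style framing where a literal '[DONE]' sentinel may end
    the stream; adjust to the gateway's actual output if it differs.
    """
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        events.append(json.loads(payload))
    return events
```

In a real client you would apply the same logic incrementally to chunks read from the HTTP response rather than to a complete string.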
#### GET /v1/models
List available models.
**Response:**
```json
{
"object": "list",
"data": [
{"id": "gpt-4o-mini", "provider": "openai"},
{"id": "claude-3-5-sonnet", "provider": "anthropic"},
{"id": "gemini-1.5-flash", "provider": "google"}
]
}
```
### Health Endpoints
#### GET /health
Liveness probe (always returns 200 if server is running).
**Response:**
```json
{
"status": "healthy",
"timestamp": 1709438400
}
```
#### GET /ready
Readiness probe (checks conversation store and providers).
**Response:**
```json
{
"status": "ready",
"timestamp": 1709438400,
"checks": {
"conversation_store": "healthy",
"providers": "healthy"
}
}
```
Returns 503 if any check fails.
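The mapping from individual checks to the overall readiness response can be sketched as follows. The field names mirror the JSON above; the failure status string and the aggregation rule are assumptions, not the gateway's verified behavior:

```python
import time


def readiness_response(checks: dict[str, str]) -> tuple[int, dict]:
    """Aggregate per-subsystem checks into an HTTP status and body.

    Returns 200 with status 'ready' only when every check reports
    'healthy', otherwise 503 — matching the description above.
    The 'not_ready' string is an illustrative assumption.
    """
    all_healthy = all(v == "healthy" for v in checks.values())
    body = {
        "status": "ready" if all_healthy else "not_ready",
        "timestamp": int(time.time()),
        "checks": checks,
    }
    return (200 if all_healthy else 503), body
```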
### Admin Endpoints
#### GET /admin/
Web dashboard (when admin UI is enabled).
#### GET /admin/api/info
System information.
#### GET /admin/api/health
Detailed health status.
#### GET /admin/api/config
Current configuration (secrets masked).
### Observability Endpoints
#### GET /metrics
Prometheus metrics (when observability is enabled).
## Tech Stack
- **Language:** Go
- **API Specification:** [Open Responses](https://www.openresponses.org)
- **Official SDKs:**
- `google.golang.org/genai` (Google Generative AI & Vertex AI)
- `github.com/anthropics/anthropic-sdk-go` (Anthropic & Azure Anthropic)
- `github.com/openai/openai-go/v3` (OpenAI & Azure OpenAI)
- **Observability:**
- Prometheus for metrics
- OpenTelemetry for distributed tracing
- **Resilience:**
- Circuit breakers via `github.com/sony/gobreaker`
- Token bucket rate limiting
- **Transport:** RESTful HTTP with Server-Sent Events for streaming
## Project Structure
```
latticelm/
├── cmd/gateway/ # Main application entry point
├── internal/
│ ├── admin/ # Admin UI backend and embedded frontend
│ ├── api/ # Open Responses types and validation
│ ├── auth/ # OAuth2/OIDC authentication
│ ├── config/ # YAML configuration loader
│ ├── conversation/ # Conversation tracking and storage
│ ├── logger/ # Structured logging setup
│ ├── metrics/ # Prometheus metrics
│ ├── providers/ # Provider implementations
│ │ ├── anthropic/
│ │ ├── azureanthropic/
│ │ ├── azureopenai/
│ │ ├── google/
│ │ ├── openai/
│ │ └── vertexai/
│ ├── ratelimit/ # Rate limiting implementation
│ ├── server/ # HTTP server and handlers
│ └── tracing/ # OpenTelemetry tracing
├── frontend/admin/ # Vue.js Admin UI
├── k8s/ # Kubernetes manifests
├── tests/ # Integration tests
├── config.example.yaml # Example configuration
├── Makefile # Build and development tasks
└── README.md
```
## Configuration
The gateway uses a YAML configuration file with support for environment variable overrides.
### Basic Configuration
```yaml
server:
address: ":8080"
max_request_body_size: 10485760 # 10MB
logging:
format: "json" # or "text" for development
level: "info" # debug, info, warn, error
# Configure providers (API keys can use ${ENV_VAR} syntax)
providers:
openai:
type: "openai"
api_key: "${OPENAI_API_KEY}"
anthropic:
type: "anthropic"
api_key: "${ANTHROPIC_API_KEY}"
google:
type: "google"
api_key: "${GOOGLE_API_KEY}"
# Map model names to providers
models:
- name: "gpt-4o-mini"
provider: "openai"
- name: "claude-3-5-sonnet"
provider: "anthropic"
- name: "gemini-1.5-flash"
provider: "google"
```
### Advanced Configuration
```yaml
# Rate limiting
rate_limit:
enabled: true
requests_per_second: 10
burst: 20
# Authentication
auth:
enabled: true
issuer: "https://accounts.google.com"
audience: "your-client-id.apps.googleusercontent.com"
# Observability
observability:
enabled: true
metrics:
enabled: true
path: "/metrics"
tracing:
enabled: true
service_name: "llm-gateway"
exporter:
type: "otlp"
endpoint: "localhost:4317"
# Conversation storage
conversations:
store: "sql" # memory, sql, or redis
ttl: "1h"
driver: "sqlite3"
dsn: "conversations.db"
# Admin UI
admin:
enabled: true
```
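The `requests_per_second`/`burst` pair in the rate-limit block describes a token bucket. A minimal sketch of the semantics — the gateway's own limiter lives in `internal/ratelimit`; this only illustrates how the two numbers interact:

```python
class TokenBucket:
    """Token bucket: capacity `burst`, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `requests_per_second: 10` and `burst: 20`, up to 20 requests can arrive at once, after which sustained traffic is capped at 10 per second.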
See `config.example.yaml` for complete configuration options with detailed comments.
## Chat Client
Interactive terminal chat interface powered by Python and the Rich library:
```bash
# Basic usage
@@ -196,20 +464,118 @@ You> /model claude
You> /models # List all available models
```
Features:
- **Syntax highlighting** for code blocks
- **Markdown rendering** for formatted responses
- **Model switching** on the fly with `/model` command
- **Conversation history** with automatic `previous_response_id` tracking
- **Streaming responses** with real-time display
See **[CHAT_CLIENT.md](./CHAT_CLIENT.md)** for full documentation.
The chat client uses [PEP 723](https://peps.python.org/pep-0723/) inline script metadata, so `uv run` automatically installs dependencies.
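PEP 723 metadata is a specially formatted comment block at the top of the script. A minimal example of the shape — the dependency list here is illustrative, not the chat client's actual one:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "rich",
#     "httpx",
# ]
# ///

# With a header like the above, `uv run script.py` resolves and installs
# the listed dependencies automatically before executing the script.
GREETING = "hello from a PEP 723 script"
print(GREETING)
```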
## Conversation Management
The gateway implements efficient conversation tracking using `previous_response_id` from the Open Responses spec:
- 📉 **Reduced token usage** - Only send new messages, not full history
- ⚡ **Smaller requests** - Less bandwidth and faster responses
- 🧠 **Server-side context** - Gateway maintains conversation state
- ⏰ **Auto-expire** - Conversations expire after configurable TTL (default: 1 hour)
See **[CONVERSATIONS.md](./CONVERSATIONS.md)** for details.
### Storage Options
Choose from multiple storage backends:
```yaml
conversations:
store: "memory" # "memory", "sql", or "redis"
ttl: "1h" # Conversation expiration
# SQLite (default for sql)
driver: "sqlite3"
dsn: "conversations.db"
# MySQL
# driver: "mysql"
# dsn: "user:password@tcp(localhost:3306)/dbname?parseTime=true"
# PostgreSQL
# driver: "pgx"
# dsn: "postgres://user:password@localhost:5432/dbname?sslmode=disable"
# Redis
# store: "redis"
# dsn: "redis://:password@localhost:6379/0"
```
## Observability
The gateway provides comprehensive observability through Prometheus metrics and OpenTelemetry tracing.
### Metrics
Enable Prometheus metrics to monitor gateway performance:
```yaml
observability:
enabled: true
metrics:
enabled: true
path: "/metrics" # Default endpoint
```
Available metrics include:
- Request counts and latencies per provider and model
- Error rates and types
- Circuit breaker state changes
- Rate limit hits
- Conversation store operations
Access metrics at `http://localhost:8080/metrics` (Prometheus scrape format).
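The scrape output is the plain-text Prometheus exposition format. For quick checks without a Prometheus server, simple unlabeled samples can be pulled out with a few lines — the metric name below is made up for illustration:

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and labeled series.

    Handles only the simple unlabeled case; use a real Prometheus client
    library for anything beyond a smoke test.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blanks
        parts = line.split()
        if len(parts) == 2 and "{" not in parts[0]:
            try:
                samples[parts[0]] = float(parts[1])
            except ValueError:
                continue
    return samples
```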
### Tracing
Enable OpenTelemetry tracing for distributed request tracking:
```yaml
observability:
enabled: true
tracing:
enabled: true
service_name: "llm-gateway"
sampler:
type: "probability" # "always", "never", or "probability"
rate: 0.1 # Sample 10% of requests
exporter:
type: "otlp" # Send to OpenTelemetry Collector
endpoint: "localhost:4317" # gRPC endpoint
insecure: true # plaintext gRPC for local dev; set false and use TLS in production
```
Traces include:
- End-to-end request flow
- Provider API calls
- Conversation store lookups
- Circuit breaker operations
- Authentication checks
Use with Jaeger, Zipkin, or any OpenTelemetry-compatible backend.
## Circuit Breakers
The gateway automatically wraps each provider with a circuit breaker for fault tolerance. When a provider experiences failures, the circuit breaker:
1. **Closed state** - Normal operation, requests pass through
2. **Open state** - Fast-fail after threshold reached, returns errors immediately
3. **Half-open state** - Allows test requests to check if provider recovered
Default configuration (per provider):
- **Max requests in half-open**: 3
- **Interval**: 60 seconds (resets failure count)
- **Timeout**: 30 seconds (open → half-open transition)
- **Failure ratio**: 0.5 (50% failures trips circuit)
Circuit breaker state changes are logged and exposed via metrics.
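The three states can be sketched as a tiny state machine. This is illustrative only — the gateway uses `github.com/sony/gobreaker`, whose trip conditions differ in detail; the thresholds below merely echo the defaults listed:

```python
CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"


class CircuitBreaker:
    """Minimal closed -> open -> half-open sketch of the behavior above."""

    def __init__(self, failure_ratio=0.5, min_requests=4, timeout=30.0):
        self.state = CLOSED
        self.failure_ratio = failure_ratio
        self.min_requests = min_requests  # samples needed before the ratio applies
        self.timeout = timeout            # open -> half-open delay (seconds)
        self.failures = 0
        self.requests = 0
        self.opened_at = 0.0

    def allow(self, now: float) -> bool:
        """Requests pass unless open; open becomes half-open after timeout."""
        if self.state == OPEN and now - self.opened_at >= self.timeout:
            self.state = HALF_OPEN
        return self.state != OPEN

    def record(self, success: bool, now: float) -> None:
        """Record an outcome and update state accordingly."""
        if self.state == HALF_OPEN:
            # One probe decides: recover on success, re-open on failure.
            self.state = CLOSED if success else OPEN
            if not success:
                self.opened_at = now
            self.failures = self.requests = 0
            return
        self.requests += 1
        self.failures += 0 if success else 1
        if (self.requests >= self.min_requests
                and self.failures / self.requests >= self.failure_ratio):
            self.state = OPEN
            self.opened_at = now
            self.failures = self.requests = 0
```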
## Azure OpenAI
@@ -235,7 +601,33 @@ export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
./gateway
```
The `provider_model_id` field lets you map a friendly model name to the actual provider identifier (e.g., an Azure deployment name). If omitted, the model `name` is used directly.
## Azure Anthropic (Microsoft Foundry)
The gateway supports Azure-hosted Anthropic models through Microsoft's AI Foundry:
```yaml
providers:
azureanthropic:
type: "azureanthropic"
api_key: "${AZURE_ANTHROPIC_API_KEY}"
endpoint: "https://your-resource.services.ai.azure.com/anthropic"
models:
- name: "claude-sonnet-4-5"
provider: "azureanthropic"
provider_model_id: "claude-sonnet-4-5-20250514" # optional
```
```bash
export AZURE_ANTHROPIC_API_KEY="..."
export AZURE_ANTHROPIC_ENDPOINT="https://your-resource.services.ai.azure.com/anthropic"
./gateway
```
Azure Anthropic provides Claude models with Azure's compliance, security, and regional deployment options.
## Admin Web UI
@@ -277,11 +669,94 @@ make dev-frontend
Frontend dev server runs on `http://localhost:5173` and proxies API requests to backend.
## Deployment
### Docker
**See the [Docker Deployment Guide](./docs/DOCKER_DEPLOYMENT.md)** for complete instructions on using pre-built images.
Build and run with Docker:
```bash
# Build Docker image (includes Admin UI automatically)
docker build -t llm-gateway:latest .
# Run container
docker run -d \
--name llm-gateway \
-p 8080:8080 \
-e GOOGLE_API_KEY="your-key" \
-e ANTHROPIC_API_KEY="your-key" \
-e OPENAI_API_KEY="your-key" \
llm-gateway:latest
# Check status
docker logs llm-gateway
```
The Docker build uses a multi-stage process that automatically builds the frontend, so you don't need Node.js installed locally.
**Using Docker Compose:**
```yaml
version: '3.8'
services:
llm-gateway:
build: .
ports:
- "8080:8080"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- GOOGLE_API_KEY=${GOOGLE_API_KEY}
restart: unless-stopped
```
```bash
docker-compose up -d
```
The Docker image:
- Uses 3-stage build (frontend → backend → runtime) for minimal size (~50MB)
- Automatically builds and embeds the Admin UI
- Runs as non-root user (UID 1000) for security
- Includes health checks for orchestration
- No need for Node.js or Go installed locally
### Kubernetes
Production-ready Kubernetes manifests are available in the `k8s/` directory:
```bash
# Deploy to Kubernetes
kubectl apply -k k8s/
# Or deploy individual manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
```
Features included:
- **High availability** - 3+ replicas with pod anti-affinity
- **Auto-scaling** - HorizontalPodAutoscaler (3-20 replicas)
- **Security** - Non-root, read-only filesystem, network policies
- **Monitoring** - ServiceMonitor and PrometheusRule for Prometheus Operator
- **Storage** - Redis StatefulSet for conversation persistence
- **Ingress** - TLS with cert-manager integration
See **[k8s/README.md](./k8s/README.md)** for complete deployment guide including:
- Cloud-specific configurations (AWS EKS, GCP GKE, Azure AKS)
- Secrets management (External Secrets Operator, Sealed Secrets)
- Monitoring and alerting setup
- Troubleshooting guide
## Authentication
The gateway supports OAuth2/OIDC authentication for securing API access.
### Configuration
```yaml
auth:
@@ -349,12 +824,109 @@ The readiness endpoint verifies:
- At least one provider is configured
- Returns 503 if any check fails
## Roadmap
### Completed ✅
- ✅ Streaming responses (Server-Sent Events)
- ✅ OAuth2/OIDC authentication
- ✅ Conversation tracking with `previous_response_id`
- ✅ Persistent conversation storage (SQL and Redis)
- ✅ Circuit breakers for fault tolerance
- ✅ Rate limiting
- ✅ Observability (Prometheus metrics and OpenTelemetry tracing)
- ✅ Admin Web UI
- ✅ Health and readiness endpoints
### In Progress 🚧
- ⬜ Tool/function calling support across providers
- ⬜ Request-level cost tracking and budgets
- ⬜ Advanced routing policies (cost optimization, latency-based, failover)
- ⬜ Multi-tenancy with per-tenant rate limits and quotas
- ⬜ Request caching for identical prompts
- ⬜ Webhook notifications for events (failures, circuit breaker changes)
## Documentation
Comprehensive guides and documentation are available in the `/docs` directory:
- **[Docker Deployment Guide](./docs/DOCKER_DEPLOYMENT.md)** - Deploy with pre-built images or build from source
- **[Kubernetes Deployment Guide](./k8s/README.md)** - Production deployment with Kubernetes
- **[Admin UI Documentation](./docs/ADMIN_UI.md)** - Using the web dashboard
- **[Configuration Reference](./config.example.yaml)** - All configuration options explained
See the **[docs directory README](./docs/README.md)** for a complete documentation index.
## Contributing
Contributions are welcome! Here's how you can help:
### Reporting Issues
- **Bug reports**: Include steps to reproduce, expected vs actual behavior, and environment details
- **Feature requests**: Describe the use case and why it would be valuable
- **Security issues**: Email security concerns privately (don't open public issues)
### Development Workflow
1. **Fork and clone** the repository
2. **Create a branch** for your feature: `git checkout -b feature/your-feature-name`
3. **Make your changes** with clear, atomic commits
4. **Add tests** for new functionality
5. **Run tests**: `make test`
6. **Run linter**: `make lint`
7. **Update documentation** if needed
8. **Submit a pull request** with a clear description
### Code Standards
- Follow Go best practices and idioms
- Write tests for new features and bug fixes
- Keep functions small and focused
- Use meaningful variable names
- Add comments for complex logic
- Run `go fmt` before committing
### Testing
```bash
# Run all tests
make test
# Run specific package tests
go test ./internal/providers/...
# Run with coverage
make test-coverage
# Run integration tests (requires API keys)
make test-integration
```
### Adding a New Provider
1. Create provider implementation in `internal/providers/yourprovider/`
2. Implement the `Provider` interface
3. Add provider registration in `internal/providers/providers.go`
4. Add configuration support in `internal/config/`
5. Add tests and update documentation
## License
MIT License - see the repository for details.
## Acknowledgments
- Built with official SDKs from OpenAI, Anthropic, and Google
- Inspired by [LiteLLM](https://github.com/BerriAI/litellm)
- Implements the [Open Responses](https://www.openresponses.org) specification
- Uses [gobreaker](https://github.com/sony/gobreaker) for circuit breaker functionality
## Support
- **Documentation**: Check this README and the files in `/docs`
- **Issues**: Open a GitHub issue for bugs or feature requests
- **Discussions**: Use GitHub Discussions for questions and community support
---
**Made with ❤️ in Go**


@@ -155,6 +155,11 @@ func main() {
// Register admin endpoints if enabled
if cfg.Admin.Enabled {
// Check if frontend dist exists
if _, err := os.Stat("internal/admin/dist"); os.IsNotExist(err) {
log.Fatalf("admin UI enabled but frontend dist not found")
}
buildInfo := admin.BuildInfo{
Version: "dev",
BuildTime: time.Now().Format(time.RFC3339),
@@ -348,23 +353,39 @@ func initConversationStore(cfg config.ConversationConfig, logger *slog.Logger) (
return conversation.NewMemoryStore(ttl), "memory", nil
}
}
type responseWriter struct {
http.ResponseWriter
statusCode int
bytesWritten int
wroteHeader bool
}
func (rw *responseWriter) WriteHeader(code int) {
if rw.wroteHeader {
return
}
rw.wroteHeader = true
rw.statusCode = code
rw.ResponseWriter.WriteHeader(code)
}
func (rw *responseWriter) Write(b []byte) (int, error) {
if !rw.wroteHeader {
rw.wroteHeader = true
rw.statusCode = http.StatusOK
}
n, err := rw.ResponseWriter.Write(b)
rw.bytesWritten += n
return n, err
}
func (rw *responseWriter) Flush() {
if flusher, ok := rw.ResponseWriter.(http.Flusher); ok {
flusher.Flush()
}
}
func loggingMiddleware(next http.Handler, logger *slog.Logger) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()

cmd/gateway/main_test.go

@@ -0,0 +1,57 @@
package main
import (
"net/http"
"net/http/httptest"
"testing"
"github.com/stretchr/testify/assert"
)
var _ http.Flusher = (*responseWriter)(nil)
type countingFlusherRecorder struct {
*httptest.ResponseRecorder
flushCount int
}
func newCountingFlusherRecorder() *countingFlusherRecorder {
return &countingFlusherRecorder{ResponseRecorder: httptest.NewRecorder()}
}
func (r *countingFlusherRecorder) Flush() {
r.flushCount++
}
func TestResponseWriterWriteHeaderOnlyOnce(t *testing.T) {
rec := httptest.NewRecorder()
rw := &responseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.WriteHeader(http.StatusCreated)
rw.WriteHeader(http.StatusInternalServerError)
assert.Equal(t, http.StatusCreated, rec.Code)
assert.Equal(t, http.StatusCreated, rw.statusCode)
}
func TestResponseWriterWriteSetsImplicitStatus(t *testing.T) {
rec := httptest.NewRecorder()
rw := &responseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
n, err := rw.Write([]byte("ok"))
assert.NoError(t, err)
assert.Equal(t, 2, n)
assert.Equal(t, http.StatusOK, rec.Code)
assert.Equal(t, http.StatusOK, rw.statusCode)
assert.Equal(t, 2, rw.bytesWritten)
}
func TestResponseWriterFlushDelegates(t *testing.T) {
rec := newCountingFlusherRecorder()
rw := &responseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.Flush()
assert.Equal(t, 1, rec.flushCount)
}

docs/DOCKER_DEPLOYMENT.md

@@ -0,0 +1,471 @@
# Docker Deployment Guide
> Deploy the LLM Gateway using pre-built Docker images or build your own.
## Table of Contents
- [Quick Start](#quick-start)
- [Using Pre-Built Images](#using-pre-built-images)
- [Configuration](#configuration)
- [Docker Compose](#docker-compose)
- [Building from Source](#building-from-source)
- [Production Considerations](#production-considerations)
- [Troubleshooting](#troubleshooting)
## Quick Start
Pull and run the latest image:
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
-e OPENAI_API_KEY="sk-your-key" \
-e ANTHROPIC_API_KEY="sk-ant-your-key" \
-e GOOGLE_API_KEY="your-key" \
ghcr.io/yourusername/llm-gateway:latest
# Verify it's running
curl http://localhost:8080/health
```
## Using Pre-Built Images
Images are automatically built and published via GitHub Actions on every release.
### Available Tags
- `latest` - Latest stable release
- `v1.2.3` - Specific version tags
- `main` - Latest commit on main branch (unstable)
- `sha-abc1234` - Specific commit SHA
### Pull from Registry
```bash
# Pull latest stable
docker pull ghcr.io/yourusername/llm-gateway:latest
# Pull specific version
docker pull ghcr.io/yourusername/llm-gateway:v1.2.3
# List local images
docker images | grep llm-gateway
```
### Basic Usage
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
--env-file .env \
ghcr.io/yourusername/llm-gateway:latest
```
## Configuration
### Environment Variables
Create a `.env` file with your API keys:
```bash
# Required: At least one provider
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
GOOGLE_API_KEY=your-google-key
# Optional: Server settings
SERVER_ADDRESS=:8080
LOGGING_LEVEL=info
LOGGING_FORMAT=json
# Optional: Features
ADMIN_ENABLED=true
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS_PER_SECOND=10
RATE_LIMIT_BURST=20
# Optional: Auth
AUTH_ENABLED=false
AUTH_ISSUER=https://accounts.google.com
AUTH_AUDIENCE=your-client-id.apps.googleusercontent.com
# Optional: Observability
OBSERVABILITY_ENABLED=false
OBSERVABILITY_METRICS_ENABLED=false
OBSERVABILITY_TRACING_ENABLED=false
```
Run with environment file:
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
--env-file .env \
ghcr.io/yourusername/llm-gateway:latest
```
### Using Config File
For more complex configurations, use a YAML config file:
```bash
# Create config from example
cp config.example.yaml config.yaml
# Edit config.yaml with your settings
# Mount config file into container
docker run -d \
--name llm-gateway \
-p 8080:8080 \
-v $(pwd)/config.yaml:/app/config.yaml:ro \
ghcr.io/yourusername/llm-gateway:latest \
--config /app/config.yaml
```
### Persistent Storage
For persistent conversation storage with SQLite:
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
-v llm-gateway-data:/app/data \
-e OPENAI_API_KEY="your-key" \
-e CONVERSATIONS_STORE=sql \
-e CONVERSATIONS_DRIVER=sqlite3 \
-e CONVERSATIONS_DSN=/app/data/conversations.db \
ghcr.io/yourusername/llm-gateway:latest
```
## Docker Compose
The project includes a production-ready `docker-compose.yaml` file.
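At its core, the compose file wires the gateway to a Redis instance. A minimal sketch of that shape (service names and the volume are assumptions — the checked-in `docker-compose.yaml` is authoritative):

```yaml
services:
  gateway:
    image: ghcr.io/yourusername/llm-gateway:latest
    ports:
      - "8080:8080"
    env_file: .env
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```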
### Basic Setup
```bash
# Create .env file with API keys
cat > .env <<EOF
GOOGLE_API_KEY=your-google-key
ANTHROPIC_API_KEY=sk-ant-your-key
OPENAI_API_KEY=sk-your-key
EOF
# Start gateway + Redis
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f gateway
```
### With Monitoring
Enable Prometheus and Grafana:
```bash
docker-compose --profile monitoring up -d
```
Access services:
- Gateway: http://localhost:8080
- Admin UI: http://localhost:8080/admin/
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
### Managing Services
```bash
# Stop all services
docker-compose down
# Stop and remove volumes (deletes data!)
docker-compose down -v
# Restart specific service
docker-compose restart gateway
# View logs
docker-compose logs -f gateway
# Update to latest image
docker-compose pull
docker-compose up -d
```
## Building from Source
If you need to build your own image:
```bash
# Clone repository
git clone https://github.com/yourusername/latticelm.git
cd latticelm
# Build image (includes frontend automatically)
docker build -t llm-gateway:local .
# Run your build
docker run -d \
--name llm-gateway \
-p 8080:8080 \
--env-file .env \
llm-gateway:local
```
### Multi-Platform Builds
Build for multiple architectures:
```bash
# Setup buildx
docker buildx create --use
# Build and push multi-platform
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t ghcr.io/yourusername/llm-gateway:latest \
--push .
```
## Production Considerations
### Security
**Use secrets management:**
```bash
# Docker secrets (Swarm)
echo "sk-your-key" | docker secret create openai_key -
docker service create \
--name llm-gateway \
--secret openai_key \
-e OPENAI_API_KEY_FILE=/run/secrets/openai_key \
ghcr.io/yourusername/llm-gateway:latest
```
**Run as non-root:**
The image already runs as UID 1000 (non-root) by default.
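In Dockerfile terms, that typically amounts to something like the following (illustrative sketch for an Alpine base; the user name and group are assumptions, not copied from the repository's Dockerfile):

```dockerfile
# Create an unprivileged user and switch to it for runtime.
RUN addgroup -g 1000 app && adduser -D -u 1000 -G app app
USER app
```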
**Read-only filesystem:**
```bash
docker run -d \
--name llm-gateway \
--read-only \
--tmpfs /tmp \
-v llm-gateway-data:/app/data \
-p 8080:8080 \
--env-file .env \
ghcr.io/yourusername/llm-gateway:latest
```
### Resource Limits
Set memory and CPU limits:
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
--memory="512m" \
--cpus="1.0" \
--env-file .env \
ghcr.io/yourusername/llm-gateway:latest
```
### Health Checks
The image includes built-in health checks:
```bash
# Check health status
docker inspect --format='{{.State.Health.Status}}' llm-gateway
# Manual health check
curl http://localhost:8080/health
curl http://localhost:8080/ready
```
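In startup scripts it is often convenient to block until the health endpoint turns green. A generic retry helper (hypothetical; the attempt budget and endpoint are up to you) can express that:

```shell
# Retry a command up to N times, sleeping 1s between attempts.
# Returns non-zero if the command never succeeds.
retry() {
  attempts=$1
  shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$attempts" ] && return 1
    sleep 1
  done
}

# Usage (assumes the gateway is published on port 8080):
#   retry 30 curl -fsS http://localhost:8080/health
```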
### Logging
Configure structured JSON logging:
```bash
docker run -d \
--name llm-gateway \
-p 8080:8080 \
-e LOGGING_FORMAT=json \
-e LOGGING_LEVEL=info \
--log-driver=json-file \
--log-opt max-size=10m \
--log-opt max-file=3 \
ghcr.io/yourusername/llm-gateway:latest
```
### Networking
**Custom network:**
```bash
# Create network
docker network create llm-network
# Run gateway on network
docker run -d \
--name llm-gateway \
--network llm-network \
-p 8080:8080 \
--env-file .env \
ghcr.io/yourusername/llm-gateway:latest
# Run Redis on same network
docker run -d \
--name redis \
--network llm-network \
redis:7-alpine
```
## Troubleshooting
### Container Won't Start
Check logs:
```bash
docker logs llm-gateway
docker logs --tail 50 llm-gateway
```
Common issues:
- Missing required API keys
- Port 8080 already in use (use `-p 9000:8080`)
- Invalid configuration file syntax
### High Memory Usage
Monitor resources:
```bash
docker stats llm-gateway
```
Set limits:
```bash
docker update --memory="512m" llm-gateway
```
### Connection Issues
**Test from inside container:**
```bash
docker exec -it llm-gateway wget -O- http://localhost:8080/health
```
**Check port bindings:**
```bash
docker port llm-gateway
```
**Test provider connectivity:**
```bash
docker exec llm-gateway wget -O- https://api.openai.com
```
### Database Locked (SQLite)
If using SQLite with multiple containers:
```bash
# SQLite doesn't support concurrent writes across containers.
# Use Redis or PostgreSQL instead (note: `--link` is deprecated;
# put both containers on a shared network):
docker network create llm-network
docker run -d \
  --name redis \
  --network llm-network \
  redis:7-alpine
docker run -d \
  --name llm-gateway \
  --network llm-network \
  -p 8080:8080 \
  -e CONVERSATIONS_STORE=redis \
  -e CONVERSATIONS_DSN=redis://redis:6379/0 \
  ghcr.io/yourusername/llm-gateway:latest
```
### Image Pull Failures
**Authentication:**
```bash
# Login to GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin
# Pull image
docker pull ghcr.io/yourusername/llm-gateway:latest
```
**Rate limiting:**
Public images can still be rate-limited by the registry. If you pull frequently (for example from CI), configure a registry mirror or pull-through cache.
### Debugging
**Interactive shell:**
```bash
docker exec -it llm-gateway sh
```
**Inspect configuration:**
```bash
# Check environment variables
docker exec llm-gateway env
# Check config file
docker exec llm-gateway cat /app/config.yaml
```
**Network debugging:**
```bash
docker exec llm-gateway wget --spider http://localhost:8080/health
docker exec llm-gateway ping google.com
```
## Useful Commands
```bash
# Container lifecycle
docker stop llm-gateway
docker start llm-gateway
docker restart llm-gateway
docker rm -f llm-gateway
# Logs
docker logs -f llm-gateway
docker logs --tail 100 llm-gateway
docker logs --since 30m llm-gateway
# Cleanup
docker system prune -a
docker volume prune
docker image prune -a
# Updates
docker pull ghcr.io/yourusername/llm-gateway:latest
docker stop llm-gateway
docker rm llm-gateway
docker run -d --name llm-gateway ... ghcr.io/yourusername/llm-gateway:latest
```
## Next Steps
- **Production deployment**: See [Kubernetes guide](../k8s/README.md) for orchestration
- **Monitoring**: Enable Prometheus metrics and set up Grafana dashboards
- **Security**: Configure OAuth2/OIDC authentication
- **Scaling**: Use Kubernetes HPA or Docker Swarm for auto-scaling
## Additional Resources
- [Main README](../README.md) - Full documentation
- [Kubernetes Deployment](../k8s/README.md) - Production orchestration
- [Configuration Reference](../config.example.yaml) - All config options
- [GitHub Container Registry](https://github.com/yourusername/latticelm/pkgs/container/llm-gateway) - Published images

docs/README.md Normal file
View File

@@ -0,0 +1,74 @@
# Documentation
Welcome to the latticelm documentation. This directory contains detailed guides covering various aspects of the LLM Gateway.
## User Guides
### [Docker Deployment Guide](./DOCKER_DEPLOYMENT.md)
Complete guide to deploying the LLM Gateway using Docker with pre-built images or building from source.
**Topics covered:**
- Using pre-built container images from CI/CD
- Configuration with environment variables and config files
- Docker Compose setup with Redis and monitoring
- Production considerations (security, resources, networking)
- Multi-platform builds
- Troubleshooting and debugging
### [Admin Web UI](./ADMIN_UI.md)
Documentation for the built-in admin dashboard.
**Topics covered:**
- Accessing the Admin UI
- Features and capabilities
- System information dashboard
- Provider status monitoring
- Configuration management
## Developer Documentation
### [Admin UI Specification](./admin-ui-spec.md)
Technical specification and design document for the Admin UI component.
**Topics covered:**
- Component architecture
- API endpoints
- UI mockups and wireframes
- Implementation details
### [Implementation Summary](./IMPLEMENTATION_SUMMARY.md)
Overview of the implementation details and architecture decisions.
**Topics covered:**
- System architecture
- Provider implementations
- Key features and their implementations
- Technology stack
## Deployment Guides
### [Kubernetes Deployment Guide](../k8s/README.md)
Production-grade Kubernetes deployment with high availability, monitoring, and security.
**Topics covered:**
- Deploying with Kustomize and kubectl
- Secrets management (External Secrets Operator, Sealed Secrets)
- Monitoring with Prometheus and OpenTelemetry
- Horizontal Pod Autoscaling and PodDisruptionBudgets
- Security best practices (RBAC, NetworkPolicies, Pod Security)
- Cloud-specific guides (AWS EKS, GCP GKE, Azure AKS)
- Storage options (Redis, PostgreSQL, managed services)
- Rolling updates and rollback strategies
## Additional Resources
For more documentation, see:
- **[Main README](../README.md)** - Overview, quick start, and feature documentation
- **[Configuration Example](../config.example.yaml)** - Detailed configuration options with comments
## Need Help?
- **Issues**: Check the [GitHub Issues](https://github.com/yourusername/latticelm/issues)
- **Discussions**: Use [GitHub Discussions](https://github.com/yourusername/latticelm/discussions) for questions
- **Contributing**: See [Contributing Guidelines](../README.md#contributing) in the main README

View File

@@ -9,6 +9,7 @@
"version": "0.1.0",
"dependencies": {
"axios": "^1.6.0",
"openai": "^6.27.0",
"vue": "^3.4.0",
"vue-router": "^4.2.0"
},
@@ -1438,6 +1439,27 @@
"node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1"
}
},
"node_modules/openai": {
"version": "6.27.0",
"resolved": "https://registry.npmjs.org/openai/-/openai-6.27.0.tgz",
"integrity": "sha512-osTKySlrdYrLYTt0zjhY8yp0JUBmWDCN+Q+QxsV4xMQnnoVFpylgKGgxwN8sSdTNw0G4y+WUXs4eCMWpyDNWZQ==",
"license": "Apache-2.0",
"bin": {
"openai": "bin/cli"
},
"peerDependencies": {
"ws": "^8.18.0",
"zod": "^3.25 || ^4.0"
},
"peerDependenciesMeta": {
"ws": {
"optional": true
},
"zod": {
"optional": true
}
}
},
"node_modules/path-browserify": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/path-browserify/-/path-browserify-1.0.1.tgz",

View File

@@ -9,9 +9,10 @@
"preview": "vite preview"
},
"dependencies": {
"axios": "^1.6.0",
"openai": "^6.27.0",
"vue": "^3.4.0",
"vue-router": "^4.2.0",
"axios": "^1.6.0"
"vue-router": "^4.2.0"
},
"devDependencies": {
"@vitejs/plugin-vue": "^5.0.0",

View File

@@ -1,5 +1,6 @@
import { createRouter, createWebHistory } from 'vue-router'
import Dashboard from './views/Dashboard.vue'
import Chat from './views/Chat.vue'
const router = createRouter({
history: createWebHistory('/admin/'),
@@ -8,6 +9,11 @@ const router = createRouter({
path: '/',
name: 'dashboard',
component: Dashboard
},
{
path: '/chat',
name: 'chat',
component: Chat
}
]
})

View File

@@ -0,0 +1,550 @@
<template>
<div class="chat-page">
<header class="header">
<div class="header-content">
<router-link to="/" class="back-link">← Dashboard</router-link>
<h1>Playground</h1>
</div>
</header>
<div class="chat-container">
<!-- Sidebar -->
<aside class="sidebar">
<div class="sidebar-section">
<label class="field-label">Model</label>
<select v-model="selectedModel" class="select-input" :disabled="modelsLoading">
<option v-if="modelsLoading" value="">Loading...</option>
<option v-for="m in models" :key="m.id" :value="m.id">
{{ m.id }}
</option>
</select>
</div>
<div class="sidebar-section">
<label class="field-label">System Instructions</label>
<textarea
v-model="instructions"
class="textarea-input"
rows="4"
placeholder="You are a helpful assistant..."
></textarea>
</div>
<div class="sidebar-section">
<label class="field-label">Temperature</label>
<div class="slider-row">
<input type="range" v-model.number="temperature" min="0" max="2" step="0.1" class="slider" />
<span class="slider-value">{{ temperature }}</span>
</div>
</div>
<div class="sidebar-section">
<label class="field-label">Stream</label>
<label class="toggle">
<input type="checkbox" v-model="stream" />
<span class="toggle-slider"></span>
</label>
</div>
<button class="btn-clear" @click="clearChat">Clear Chat</button>
</aside>
<!-- Chat Area -->
<main class="chat-main">
<div class="messages" ref="messagesContainer">
<div v-if="messages.length === 0" class="empty-chat">
<p>Send a message to start chatting.</p>
</div>
<div
v-for="(msg, i) in messages"
:key="i"
:class="['message', `message-${msg.role}`]"
>
<div class="message-role">{{ msg.role }}</div>
<div class="message-content" v-html="renderContent(msg.content)"></div>
</div>
<div v-if="isLoading" class="message message-assistant">
<div class="message-role">assistant</div>
<div class="message-content">
<span class="typing-indicator">
<span></span><span></span><span></span>
</span>
{{ streamingText }}
</div>
</div>
</div>
<div class="input-area">
<textarea
v-model="userInput"
class="chat-input"
placeholder="Type a message..."
rows="1"
@keydown.enter.exact.prevent="sendMessage"
@input="autoResize"
ref="chatInputEl"
></textarea>
<button class="btn-send" @click="sendMessage" :disabled="isLoading || !userInput.trim()">
Send
</button>
</div>
</main>
</div>
</div>
</template>
<script setup lang="ts">
import { ref, onMounted, nextTick } from 'vue'
import OpenAI from 'openai'
interface ChatMessage {
role: 'user' | 'assistant'
content: string
}
interface ModelOption {
id: string
provider: string
}
const models = ref<ModelOption[]>([])
const modelsLoading = ref(true)
const selectedModel = ref('')
const instructions = ref('')
const temperature = ref(1.0)
const stream = ref(true)
const userInput = ref('')
const messages = ref<ChatMessage[]>([])
const isLoading = ref(false)
const streamingText = ref('')
const lastResponseId = ref<string | null>(null)
const messagesContainer = ref<HTMLElement | null>(null)
const chatInputEl = ref<HTMLTextAreaElement | null>(null)
const client = new OpenAI({
baseURL: `${window.location.origin}/v1`,
apiKey: 'unused',
dangerouslyAllowBrowser: true,
})
async function loadModels() {
try {
const resp = await fetch('/v1/models')
const data = await resp.json()
models.value = data.data || []
if (models.value.length > 0) {
selectedModel.value = models.value[0].id
}
} catch (e) {
console.error('Failed to load models:', e)
} finally {
modelsLoading.value = false
}
}
function scrollToBottom() {
nextTick(() => {
if (messagesContainer.value) {
messagesContainer.value.scrollTop = messagesContainer.value.scrollHeight
}
})
}
function autoResize(e: Event) {
const el = e.target as HTMLTextAreaElement
el.style.height = 'auto'
el.style.height = Math.min(el.scrollHeight, 150) + 'px'
}
function renderContent(content: string): string {
return content
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/\n/g, '<br>')
}
function clearChat() {
messages.value = []
lastResponseId.value = null
streamingText.value = ''
}
async function sendMessage() {
const text = userInput.value.trim()
if (!text || isLoading.value) return
messages.value.push({ role: 'user', content: text })
userInput.value = ''
if (chatInputEl.value) {
chatInputEl.value.style.height = 'auto'
}
scrollToBottom()
isLoading.value = true
streamingText.value = ''
try {
const params: Record<string, any> = {
model: selectedModel.value,
input: text,
temperature: temperature.value,
stream: stream.value,
}
if (instructions.value.trim()) {
params.instructions = instructions.value.trim()
}
if (lastResponseId.value) {
params.previous_response_id = lastResponseId.value
}
if (stream.value) {
const response = await client.responses.create(params as any)
// The SDK returns an async iterable for streaming
let fullText = ''
for await (const event of response as any) {
if (event.type === 'response.output_text.delta') {
fullText += event.delta
streamingText.value = fullText
scrollToBottom()
} else if (event.type === 'response.completed') {
lastResponseId.value = event.response.id
}
}
messages.value.push({ role: 'assistant', content: fullText })
} else {
const response = await client.responses.create(params as any) as any
lastResponseId.value = response.id
const outputText = response.output
?.filter((item: any) => item.type === 'message')
?.flatMap((item: any) => item.content)
?.filter((part: any) => part.type === 'output_text')
?.map((part: any) => part.text)
?.join('') || ''
messages.value.push({ role: 'assistant', content: outputText })
}
} catch (e: any) {
messages.value.push({
role: 'assistant',
content: `Error: ${e.message || 'Failed to get response'}`,
})
} finally {
isLoading.value = false
streamingText.value = ''
scrollToBottom()
}
}
onMounted(() => {
loadModels()
})
</script>
<style scoped>
.chat-page {
min-height: 100vh;
display: flex;
flex-direction: column;
background-color: #f5f5f5;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 1rem 2rem;
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}
.header-content {
display: flex;
align-items: center;
gap: 1.5rem;
}
.back-link {
color: rgba(255, 255, 255, 0.85);
text-decoration: none;
font-size: 0.95rem;
}
.back-link:hover {
color: white;
}
.header h1 {
font-size: 1.5rem;
font-weight: 600;
}
.chat-container {
flex: 1;
display: flex;
overflow: hidden;
height: calc(100vh - 65px);
}
/* Sidebar */
.sidebar {
width: 280px;
background: white;
border-right: 1px solid #e2e8f0;
padding: 1.5rem;
display: flex;
flex-direction: column;
gap: 1.25rem;
overflow-y: auto;
}
.sidebar-section {
display: flex;
flex-direction: column;
gap: 0.5rem;
}
.field-label {
font-size: 0.8rem;
font-weight: 600;
color: #4a5568;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.select-input {
padding: 0.5rem;
border: 1px solid #e2e8f0;
border-radius: 6px;
font-size: 0.875rem;
background: white;
color: #2d3748;
}
.textarea-input {
padding: 0.5rem;
border: 1px solid #e2e8f0;
border-radius: 6px;
font-size: 0.875rem;
resize: vertical;
font-family: inherit;
color: #2d3748;
}
.slider-row {
display: flex;
align-items: center;
gap: 0.75rem;
}
.slider {
flex: 1;
accent-color: #667eea;
}
.slider-value {
font-size: 0.875rem;
font-weight: 500;
color: #2d3748;
min-width: 2rem;
text-align: right;
}
.toggle {
position: relative;
width: 44px;
height: 24px;
cursor: pointer;
}
.toggle input {
opacity: 0;
width: 0;
height: 0;
}
.toggle-slider {
position: absolute;
inset: 0;
background-color: #cbd5e0;
border-radius: 24px;
transition: 0.2s;
}
.toggle-slider::before {
content: '';
position: absolute;
height: 18px;
width: 18px;
left: 3px;
bottom: 3px;
background-color: white;
border-radius: 50%;
transition: 0.2s;
}
.toggle input:checked + .toggle-slider {
background-color: #667eea;
}
.toggle input:checked + .toggle-slider::before {
transform: translateX(20px);
}
.btn-clear {
margin-top: auto;
padding: 0.5rem;
background: #fed7d7;
color: #742a2a;
border: none;
border-radius: 6px;
font-size: 0.875rem;
font-weight: 500;
cursor: pointer;
}
.btn-clear:hover {
background: #feb2b2;
}
/* Chat Main */
.chat-main {
flex: 1;
display: flex;
flex-direction: column;
min-width: 0;
}
.messages {
flex: 1;
overflow-y: auto;
padding: 1.5rem;
display: flex;
flex-direction: column;
gap: 1rem;
}
.empty-chat {
flex: 1;
display: flex;
align-items: center;
justify-content: center;
color: #a0aec0;
font-size: 1.1rem;
}
.message {
max-width: 80%;
padding: 0.75rem 1rem;
border-radius: 12px;
line-height: 1.5;
}
.message-user {
align-self: flex-end;
background: #667eea;
color: white;
}
.message-user .message-role {
color: rgba(255, 255, 255, 0.7);
}
.message-assistant {
align-self: flex-start;
background: white;
border: 1px solid #e2e8f0;
color: #2d3748;
}
.message-role {
font-size: 0.7rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
margin-bottom: 0.25rem;
color: #a0aec0;
}
.message-content {
font-size: 0.95rem;
word-break: break-word;
}
/* Typing indicator */
.typing-indicator {
display: inline-flex;
gap: 3px;
margin-right: 6px;
}
.typing-indicator span {
width: 6px;
height: 6px;
border-radius: 50%;
background: #a0aec0;
animation: bounce 1.2s infinite;
}
.typing-indicator span:nth-child(2) { animation-delay: 0.2s; }
.typing-indicator span:nth-child(3) { animation-delay: 0.4s; }
@keyframes bounce {
0%, 60%, 100% { transform: translateY(0); }
30% { transform: translateY(-4px); }
}
/* Input Area */
.input-area {
padding: 1rem 1.5rem;
background: white;
border-top: 1px solid #e2e8f0;
display: flex;
gap: 0.75rem;
align-items: flex-end;
}
.chat-input {
flex: 1;
padding: 0.75rem 1rem;
border: 1px solid #e2e8f0;
border-radius: 12px;
font-size: 0.95rem;
font-family: inherit;
resize: none;
color: #2d3748;
line-height: 1.4;
max-height: 150px;
overflow-y: auto;
}
.chat-input:focus {
outline: none;
border-color: #667eea;
box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.15);
}
.btn-send {
padding: 0.75rem 1.5rem;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
border-radius: 12px;
font-size: 0.95rem;
font-weight: 500;
cursor: pointer;
white-space: nowrap;
}
.btn-send:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.btn-send:hover:not(:disabled) {
opacity: 0.9;
}
</style>

View File

@@ -1,7 +1,10 @@
<template>
<div class="dashboard">
<header class="header">
<h1>LLM Gateway Admin</h1>
<div class="header-row">
<h1>LLM Gateway Admin</h1>
<router-link to="/chat" class="nav-link">Playground →</router-link>
</div>
</header>
<div class="container">
@@ -168,11 +171,34 @@ onUnmounted(() => {
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}
.header-row {
display: flex;
justify-content: space-between;
align-items: center;
}
.header h1 {
font-size: 2rem;
font-weight: 600;
}
.nav-link {
color: rgba(255, 255, 255, 0.85);
text-decoration: none;
font-size: 1rem;
font-weight: 500;
padding: 0.5rem 1rem;
border: 1px solid rgba(255, 255, 255, 0.3);
border-radius: 8px;
transition: all 0.2s;
}
.nav-link:hover {
color: white;
border-color: rgba(255, 255, 255, 0.6);
background: rgba(255, 255, 255, 0.1);
}
.container {
max-width: 1400px;
margin: 0 auto;

View File

@@ -6,10 +6,15 @@ export default defineConfig({
base: '/admin/',
server: {
port: 5173,
allowedHosts: ['.coder.ia-innovacion.work', 'localhost'],
proxy: {
'/admin/api': {
target: 'http://localhost:8080',
changeOrigin: true,
},
'/v1': {
target: 'http://localhost:8080',
changeOrigin: true,
}
}
},

View File

@@ -172,9 +172,32 @@ func Load(path string) (*Config, error) {
func (cfg *Config) validate() error {
for _, m := range cfg.Models {
if _, ok := cfg.Providers[m.Provider]; !ok {
providerEntry, ok := cfg.Providers[m.Provider]
if !ok {
return fmt.Errorf("model %q references unknown provider %q", m.Name, m.Provider)
}
switch providerEntry.Type {
case "openai", "anthropic", "google", "azureopenai", "azureanthropic":
if providerEntry.APIKey == "" {
return fmt.Errorf("model %q references provider %q (%s) without api_key", m.Name, m.Provider, providerEntry.Type)
}
}
switch providerEntry.Type {
case "azureopenai", "azureanthropic":
if providerEntry.Endpoint == "" {
return fmt.Errorf("model %q references provider %q (%s) without endpoint", m.Name, m.Provider, providerEntry.Type)
}
case "vertexai":
if providerEntry.Project == "" || providerEntry.Location == "" {
return fmt.Errorf("model %q references provider %q (vertexai) without project/location", m.Name, m.Provider)
}
case "openai", "anthropic", "google":
// No additional required fields.
default:
return fmt.Errorf("model %q references provider %q with unknown type %q", m.Name, m.Provider, providerEntry.Type)
}
}
return nil
}

View File

@@ -103,7 +103,7 @@ server:
address: ":8080"
providers:
azure:
type: azure_openai
type: azureopenai
api_key: azure-key
endpoint: https://my-resource.openai.azure.com
api_version: "2024-02-15-preview"
@@ -113,7 +113,7 @@ models:
provider_model_id: gpt-4-deployment
`,
validate: func(t *testing.T, cfg *Config) {
assert.Equal(t, "azure_openai", cfg.Providers["azure"].Type)
assert.Equal(t, "azureopenai", cfg.Providers["azure"].Type)
assert.Equal(t, "azure-key", cfg.Providers["azure"].APIKey)
assert.Equal(t, "https://my-resource.openai.azure.com", cfg.Providers["azure"].Endpoint)
assert.Equal(t, "2024-02-15-preview", cfg.Providers["azure"].APIVersion)
@@ -126,7 +126,7 @@ server:
address: ":8080"
providers:
vertex:
type: vertex_ai
type: vertexai
project: my-gcp-project
location: us-central1
models:
@@ -135,7 +135,7 @@ models:
provider_model_id: gemini-1.5-pro
`,
validate: func(t *testing.T, cfg *Config) {
assert.Equal(t, "vertex_ai", cfg.Providers["vertex"].Type)
assert.Equal(t, "vertexai", cfg.Providers["vertex"].Type)
assert.Equal(t, "my-gcp-project", cfg.Providers["vertex"].Project)
assert.Equal(t, "us-central1", cfg.Providers["vertex"].Location)
},
@@ -208,6 +208,20 @@ models:
configYAML: `invalid: yaml: content: [unclosed`,
expectError: true,
},
{
name: "model references provider without required API key",
configYAML: `
server:
address: ":8080"
providers:
openai:
type: openai
models:
- name: gpt-4
provider: openai
`,
expectError: true,
},
{
name: "multiple models same provider",
configYAML: `
@@ -283,7 +297,7 @@ func TestConfigValidate(t *testing.T) {
name: "valid config",
config: Config{
Providers: map[string]ProviderEntry{
"openai": {Type: "openai"},
"openai": {Type: "openai", APIKey: "test-key"},
},
Models: []ModelEntry{
{Name: "gpt-4", Provider: "openai"},
@@ -295,7 +309,7 @@ func TestConfigValidate(t *testing.T) {
name: "model references unknown provider",
config: Config{
Providers: map[string]ProviderEntry{
"openai": {Type: "openai"},
"openai": {Type: "openai", APIKey: "test-key"},
},
Models: []ModelEntry{
{Name: "gpt-4", Provider: "unknown"},
@@ -303,6 +317,18 @@ func TestConfigValidate(t *testing.T) {
},
expectError: true,
},
{
name: "model references provider without api key",
config: Config{
Providers: map[string]ProviderEntry{
"openai": {Type: "openai"},
},
Models: []ModelEntry{
{Name: "gpt-4", Provider: "openai"},
},
},
expectError: true,
},
{
name: "no models",
config: Config{
@@ -317,8 +343,8 @@ func TestConfigValidate(t *testing.T) {
name: "multiple models multiple providers",
config: Config{
Providers: map[string]ProviderEntry{
"openai": {Type: "openai"},
"anthropic": {Type: "anthropic"},
"openai": {Type: "openai", APIKey: "test-key"},
"anthropic": {Type: "anthropic", APIKey: "ant-key"},
},
Models: []ModelEntry{
{Name: "gpt-4", Provider: "openai"},

View File

@@ -48,15 +48,30 @@ type metricsResponseWriter struct {
http.ResponseWriter
statusCode int
bytesWritten int
wroteHeader bool
}
func (w *metricsResponseWriter) WriteHeader(statusCode int) {
if w.wroteHeader {
return
}
w.wroteHeader = true
w.statusCode = statusCode
w.ResponseWriter.WriteHeader(statusCode)
}
func (w *metricsResponseWriter) Write(b []byte) (int, error) {
if !w.wroteHeader {
w.wroteHeader = true
w.statusCode = http.StatusOK
}
n, err := w.ResponseWriter.Write(b)
w.bytesWritten += n
return n, err
}
func (w *metricsResponseWriter) Flush() {
if flusher, ok := w.ResponseWriter.(http.Flusher); ok {
flusher.Flush()
}
}

View File

@@ -0,0 +1,65 @@
package observability
import (
"net/http"
"net/http/httptest"
"testing"
"github.com/stretchr/testify/assert"
)
var _ http.Flusher = (*metricsResponseWriter)(nil)
var _ http.Flusher = (*statusResponseWriter)(nil)
type testFlusherRecorder struct {
*httptest.ResponseRecorder
flushCount int
}
func newTestFlusherRecorder() *testFlusherRecorder {
return &testFlusherRecorder{ResponseRecorder: httptest.NewRecorder()}
}
func (r *testFlusherRecorder) Flush() {
r.flushCount++
}
func TestMetricsResponseWriterWriteHeaderOnlyOnce(t *testing.T) {
rec := httptest.NewRecorder()
rw := &metricsResponseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.WriteHeader(http.StatusAccepted)
rw.WriteHeader(http.StatusInternalServerError)
assert.Equal(t, http.StatusAccepted, rec.Code)
assert.Equal(t, http.StatusAccepted, rw.statusCode)
}
func TestMetricsResponseWriterFlushDelegates(t *testing.T) {
rec := newTestFlusherRecorder()
rw := &metricsResponseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.Flush()
assert.Equal(t, 1, rec.flushCount)
}
func TestStatusResponseWriterWriteHeaderOnlyOnce(t *testing.T) {
rec := httptest.NewRecorder()
rw := &statusResponseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.WriteHeader(http.StatusNoContent)
rw.WriteHeader(http.StatusInternalServerError)
assert.Equal(t, http.StatusNoContent, rec.Code)
assert.Equal(t, http.StatusNoContent, rw.statusCode)
}
func TestStatusResponseWriterFlushDelegates(t *testing.T) {
rec := newTestFlusherRecorder()
rw := &statusResponseWriter{ResponseWriter: rec, statusCode: http.StatusOK}
rw.Flush()
assert.Equal(t, 1, rec.flushCount)
}

View File

@@ -72,14 +72,29 @@ func TracingMiddleware(next http.Handler, tp *sdktrace.TracerProvider) http.Hand
// statusResponseWriter wraps http.ResponseWriter to capture the status code.
type statusResponseWriter struct {
http.ResponseWriter
statusCode int
statusCode int
wroteHeader bool
}
func (w *statusResponseWriter) WriteHeader(statusCode int) {
if w.wroteHeader {
return
}
w.wroteHeader = true
w.statusCode = statusCode
w.ResponseWriter.WriteHeader(statusCode)
}
func (w *statusResponseWriter) Write(b []byte) (int, error) {
if !w.wroteHeader {
w.wroteHeader = true
w.statusCode = http.StatusOK
}
return w.ResponseWriter.Write(b)
}
func (w *statusResponseWriter) Flush() {
if flusher, ok := w.ResponseWriter.(http.Flusher); ok {
flusher.Flush()
}
}

View File

@@ -136,6 +136,9 @@ func (r *Registry) Get(name string) (Provider, bool) {
func (r *Registry) Models() []struct{ Provider, Model string } {
var out []struct{ Provider, Model string }
for _, m := range r.modelList {
if _, ok := r.providers[m.Provider]; !ok {
continue
}
out = append(out, struct{ Provider, Model string }{Provider: m.Provider, Model: m.Name})
}
return out
@@ -156,7 +159,9 @@ func (r *Registry) Default(model string) (Provider, error) {
if p, ok := r.providers[providerName]; ok {
return p, nil
}
return nil, fmt.Errorf("model %q is mapped to provider %q, but that provider is not available", model, providerName)
}
return nil, fmt.Errorf("model %q not configured", model)
}
for _, p := range r.providers {

View File

@@ -475,7 +475,7 @@ func TestRegistry_Default(t *testing.T) {
},
},
{
name: "returns first provider for unknown model",
name: "returns error for unknown model",
setupReg: func() *Registry {
reg, _ := NewRegistry(
map[string]config.ProviderEntry{
@@ -490,11 +490,34 @@ func TestRegistry_Default(t *testing.T) {
)
return reg
},
modelName: "unknown-model",
validate: func(t *testing.T, p Provider) {
assert.NotNil(t, p)
// Should return first available provider
modelName: "unknown-model",
expectError: true,
errorMsg: "not configured",
},
{
name: "returns error for model whose provider is unavailable",
setupReg: func() *Registry {
reg, _ := NewRegistry(
map[string]config.ProviderEntry{
"openai": {
Type: "openai",
APIKey: "", // unavailable provider
},
"google": {
Type: "google",
APIKey: "test-key",
},
},
[]config.ModelEntry{
{Name: "gpt-4", Provider: "openai"},
{Name: "gemini-pro", Provider: "google"},
},
)
return reg
},
modelName: "gpt-4",
expectError: true,
errorMsg: "not available",
},
{
name: "returns first provider for empty model name",
@@ -542,6 +565,31 @@ func TestRegistry_Default(t *testing.T) {
}
}
func TestRegistry_Models_FiltersUnavailableProviders(t *testing.T) {
reg, err := NewRegistry(
map[string]config.ProviderEntry{
"openai": {
Type: "openai",
APIKey: "", // unavailable provider
},
"google": {
Type: "google",
APIKey: "test-key",
},
},
[]config.ModelEntry{
{Name: "gpt-4", Provider: "openai"},
{Name: "gemini-pro", Provider: "google"},
},
)
require.NoError(t, err)
models := reg.Models()
require.Len(t, models, 1)
assert.Equal(t, "gemini-pro", models[0].Model)
assert.Equal(t, "google", models[0].Provider)
}
func TestBuildProvider(t *testing.T) {
tests := []struct {
name string

View File

@@ -239,17 +239,17 @@ func (s *GatewayServer) handleSyncResponse(w http.ResponseWriter, r *http.Reques
}
func (s *GatewayServer) handleStreamingResponse(w http.ResponseWriter, r *http.Request, provider providers.Provider, providerMsgs []api.Message, resolvedReq *api.ResponseRequest, origReq *api.ResponseRequest, storeMsgs []api.Message) {
flusher, ok := w.(http.Flusher)
if !ok {
http.Error(w, "streaming not supported", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
w.WriteHeader(http.StatusOK)
responseID := generateID("resp_")
itemID := generateID("msg_")
seq := 0


@@ -0,0 +1,53 @@
package server
import (
"io"
"log/slog"
"net/http"
"net/http/httptest"
"testing"
"github.com/stretchr/testify/assert"
)
type nonFlusherRecorder struct {
recorder *httptest.ResponseRecorder
writeHeaderCalls int
}
func newNonFlusherRecorder() *nonFlusherRecorder {
return &nonFlusherRecorder{recorder: httptest.NewRecorder()}
}
func (w *nonFlusherRecorder) Header() http.Header {
return w.recorder.Header()
}
func (w *nonFlusherRecorder) Write(b []byte) (int, error) {
return w.recorder.Write(b)
}
func (w *nonFlusherRecorder) WriteHeader(statusCode int) {
w.writeHeaderCalls++
w.recorder.WriteHeader(statusCode)
}
func (w *nonFlusherRecorder) StatusCode() int {
return w.recorder.Code
}
func (w *nonFlusherRecorder) BodyString() string {
return w.recorder.Body.String()
}
func TestHandleStreamingResponseWithoutFlusherWritesSingleErrorHeader(t *testing.T) {
s := New(nil, nil, slog.New(slog.NewTextHandler(io.Discard, nil)))
req := httptest.NewRequest(http.MethodPost, "/v1/responses", nil)
w := newNonFlusherRecorder()
s.handleStreamingResponse(w, req, nil, nil, nil, nil, nil)
assert.Equal(t, 1, w.writeHeaderCalls)
assert.Equal(t, http.StatusInternalServerError, w.StatusCode())
assert.Contains(t, w.BodyString(), "streaming not supported")
}

File diff suppressed because it is too large.

Binary file not shown.


@@ -135,6 +135,41 @@ class ChatClient:
return self._stream_response(model)
else:
return self._sync_response(model)
@staticmethod
def _get_attr(obj: Any, key: str, default: Any = None) -> Any:
"""Access object attributes safely for both SDK objects and dicts."""
if obj is None:
return default
if isinstance(obj, dict):
return obj.get(key, default)
return getattr(obj, key, default)
def _extract_stream_error(self, event: Any) -> str:
"""Extract error message from a response.failed event."""
response = self._get_attr(event, "response")
error = self._get_attr(response, "error")
message = self._get_attr(error, "message")
if message:
return str(message)
return "streaming request failed"
def _extract_completed_text(self, event: Any) -> str:
"""Extract assistant output text from a response.completed event."""
response = self._get_attr(event, "response")
output_items = self._get_attr(response, "output", []) or []
text_parts = []
for item in output_items:
if self._get_attr(item, "type") != "message":
continue
for part in self._get_attr(item, "content", []) or []:
if self._get_attr(part, "type") == "output_text":
text = self._get_attr(part, "text", "")
if text:
text_parts.append(str(text))
return "".join(text_parts)
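The `_get_attr` helper above is what lets the extraction code walk `event.response.error.message` without caring whether the SDK returned parsed objects or plain dicts. A minimal standalone sketch of the same pattern (names hypothetical):

```python
from types import SimpleNamespace


def get_attr(obj, key, default=None):
    """Return obj[key] for dicts, getattr(obj, key) for objects, default for None."""
    if obj is None:
        return default
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


# Works the same whether the SDK yields parsed objects or raw JSON dicts.
event_obj = SimpleNamespace(
    response=SimpleNamespace(error=SimpleNamespace(message="boom"))
)
event_dict = {"response": {"error": {"message": "boom"}}}

for ev in (event_obj, event_dict):
    msg = get_attr(get_attr(get_attr(ev, "response"), "error"), "message")
    print(msg)  # boom
```

Because every step passes through `get_attr`, a missing intermediate level short-circuits to the default instead of raising `AttributeError` or `KeyError`.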
def _sync_response(self, model: str) -> str:
"""Non-streaming response with tool support."""
@@ -225,6 +260,7 @@ class ChatClient:
while iteration < max_iterations:
iteration += 1
assistant_text = ""
stream_error = None
tool_calls = {} # Dict to track tool calls by item_id
tool_calls_list = [] # Final list of completed tool calls
assistant_content = []
@@ -244,6 +280,15 @@ class ChatClient:
if event.type == "response.output_text.delta":
assistant_text += event.delta
live.update(Markdown(assistant_text))
elif event.type == "response.completed":
# Some providers may emit final text only in response.completed.
if not assistant_text:
completed_text = self._extract_completed_text(event)
if completed_text:
assistant_text = completed_text
live.update(Markdown(assistant_text))
elif event.type == "response.failed":
stream_error = self._extract_stream_error(event)
elif event.type == "response.output_item.added":
if hasattr(event, 'item') and event.item.type == "function_call":
# Start tracking a new tool call
@@ -270,6 +315,10 @@ class ChatClient:
except json.JSONDecodeError:
self.console.print(f"[red]Error parsing tool arguments JSON[/red]")
if stream_error:
self.console.print(f"[bold red]Error:[/bold red] {stream_error}")
return ""
# Build assistant content
if assistant_text:
assistant_content.append({"type": "output_text", "text": assistant_text})
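Condensed, the streaming logic above — accumulate `output_text` deltas, fall back to the `response.completed` payload when no deltas arrived, and surface `response.failed` errors — can be sketched with plain dict events (a simplification; the real client also tracks tool calls and renders live Markdown):

```python
def collect_text(events):
    """Accumulate delta text; fall back to the completed payload; capture errors."""
    text, error = "", None
    for ev in events:
        if ev["type"] == "response.output_text.delta":
            text += ev["delta"]
        elif ev["type"] == "response.completed" and not text:
            # Some providers emit final text only in response.completed.
            for item in ev["response"].get("output", []):
                if item.get("type") != "message":
                    continue
                for part in item.get("content", []):
                    if part.get("type") == "output_text":
                        text += part.get("text", "")
        elif ev["type"] == "response.failed":
            error = ev["response"]["error"]["message"]
    return text, error


# Delta-only stream:
print(collect_text([
    {"type": "response.output_text.delta", "delta": "Hi"},
    {"type": "response.output_text.delta", "delta": "!"},
]))  # ('Hi!', None)

# Completed-only stream (no deltas seen):
print(collect_text([{
    "type": "response.completed",
    "response": {"output": [{"type": "message",
                             "content": [{"type": "output_text", "text": "done"}]}]},
}]))  # ('done', None)
```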
@@ -485,7 +534,7 @@ def main():
console.print(Markdown(response))
except APIStatusError as e:
console.print(f"[bold red]Error {e.status_code}:[/bold red] {str(e)}")
except Exception as e:
console.print(f"[bold red]Error:[/bold red] {e}")