# Kubernetes Deployment Guide
This directory contains Kubernetes manifests for deploying the LLM Gateway to production.
## Prerequisites

- Kubernetes cluster (v1.24+)
- `kubectl` configured
- Container registry access
- (Optional) Prometheus Operator for monitoring
- (Optional) cert-manager for TLS certificates
- (Optional) nginx-ingress-controller or cloud load balancer
## Quick Start

### 1. Build and Push Docker Image

```bash
# Build the image
docker build -t your-registry/llm-gateway:v1.0.0 .

# Push to registry
docker push your-registry/llm-gateway:v1.0.0
```
### 2. Configure Secrets

**Option A: Using kubectl**

```bash
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
  --from-literal=GOOGLE_API_KEY="your-key" \
  --from-literal=ANTHROPIC_API_KEY="your-key" \
  --from-literal=OPENAI_API_KEY="your-key" \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
```

**Option B: Using External Secrets Operator (Recommended)**

- Uncomment the ExternalSecret in `secret.yaml`
- Configure your SecretStore (AWS Secrets Manager, Vault, etc.)
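Once uncommented, the ExternalSecret would look roughly like the sketch below. The SecretStore name `aws-secrets-manager` and the remote key paths are assumptions for illustration, not values from this repo; `secret.yaml` is authoritative.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # assumed SecretStore name
    kind: SecretStore
  target:
    name: llm-gateway-secrets      # Secret created in-cluster
  data:
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-api-key      # assumed remote path
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-api-key   # assumed remote path
```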
### 3. Update Configuration

Edit `configmap.yaml`:

- Update the Redis connection string if using external Redis
- Configure observability endpoints (Tempo, Prometheus)
- Adjust rate limits as needed
- Set the OIDC issuer and audience

Edit `ingress.yaml`:

- Replace `llm-gateway.example.com` with your domain
- Configure TLS certificate annotations

Edit `kustomization.yaml`:

- Update the image registry and tag
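The image override in `kustomization.yaml` typically uses the standard Kustomize `images` field; a minimal excerpt (the `name` value must match whatever image reference `deployment.yaml` actually uses):

```yaml
# kustomization.yaml (excerpt)
images:
  - name: llm-gateway                  # image name as referenced in deployment.yaml
    newName: your-registry/llm-gateway
    newTag: v1.0.0
```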
### 4. Deploy

**Using Kustomize (Recommended):**

```bash
kubectl apply -k k8s/
```

**Using kubectl directly:**

```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml
kubectl apply -f k8s/networkpolicy.yaml
```

**With Prometheus Operator:**

```bash
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/prometheusrule.yaml
```
### 5. Verify Deployment

```bash
# Check pods
kubectl get pods -n llm-gateway

# Check services
kubectl get svc -n llm-gateway

# Check ingress
kubectl get ingress -n llm-gateway

# View logs
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f

# Check health
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
curl http://localhost:8080/health
```
## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Internet/Clients                     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   Ingress Controller                    │
│                (nginx/ALB/GCE with TLS)                 │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                  LLM Gateway Service                    │
│                     (LoadBalancer)                      │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Gateway    │ │   Gateway    │ │   Gateway    │
│    Pod 1     │ │    Pod 2     │ │    Pod 3     │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│    Redis     │ │  Prometheus  │ │    Tempo     │
│ (Persistent) │ │  (Metrics)   │ │   (Traces)   │
└──────────────┘ └──────────────┘ └──────────────┘
```
## Resource Specifications

### Default Resources

- Requests: 100m CPU, 128Mi memory
- Limits: 1000m CPU, 512Mi memory
- Replicas: 3 (min), 20 (max with HPA)

### Scaling

- HPA scales on CPU (70% target) and memory (80% target) utilization
- PodDisruptionBudget keeps at least 2 replicas available during voluntary disruptions
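The targets above correspond to an `autoscaling/v2` HPA along these lines. This is a sketch consistent with the stated 3–20 replica bounds; `hpa.yaml` is the actual manifest.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
  namespace: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out above 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # scale out above 80% memory
```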
## Configuration Options

### Environment Variables (from Secret)

- `GOOGLE_API_KEY`: Google AI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `OPENAI_API_KEY`: OpenAI API key
- `OIDC_AUDIENCE`: OIDC client ID for authentication

### ConfigMap Settings

See `configmap.yaml` for the full set of options:

- Server address
- Logging format and level
- Rate limiting
- Observability (metrics/tracing)
- Provider endpoints
- Conversation storage
- Authentication
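A trimmed illustration of what such a ConfigMap might contain. The key names below are illustrative assumptions, not taken from this repo; check `configmap.yaml` for the real ones.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
  namespace: llm-gateway
data:
  # Illustrative key names -- configmap.yaml is authoritative
  SERVER_ADDR: ":8080"
  LOG_FORMAT: "json"
  LOG_LEVEL: "info"
  RATE_LIMIT_RPS: "50"
  REDIS_URL: "redis://redis:6379"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://tempo:4317"
```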
## Security

### Security Features

- Non-root container execution (UID 1000)
- Read-only root filesystem
- No privilege escalation
- All capabilities dropped
- Network policies for ingress/egress control
- Seccomp profile: RuntimeDefault
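In the Deployment, these hardening settings map onto pod- and container-level `securityContext` fields roughly as follows (standard Kubernetes fields; see `deployment.yaml` for the real values):

```yaml
# Deployment pod template (excerpt)
securityContext:                   # pod-level
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: gateway
    securityContext:               # container-level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```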
### TLS/HTTPS

- Ingress configured with TLS
- cert-manager for automatic certificate provisioning
- Force-SSL redirect enabled
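With nginx-ingress and cert-manager, the relevant parts of the Ingress typically look like this. The ClusterIssuer name `letsencrypt-prod` and the TLS secret name are assumptions; compare with `ingress.yaml`.

```yaml
# ingress.yaml (excerpt)
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod         # assumed issuer name
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - llm-gateway.example.com
      secretName: llm-gateway-tls                            # assumed secret name
```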
### Secrets Management

**Never commit secrets to git!**

Production options:

1. **External Secrets Operator** (Recommended)
   - AWS Secrets Manager
   - HashiCorp Vault
   - Google Secret Manager
2. **Sealed Secrets**
   - Encrypted secrets stored in git
3. **Manual kubectl secrets**
   - Created outside of git
## Monitoring

### Metrics

- Exposed on the `/metrics` endpoint
- Scraped by Prometheus via a ServiceMonitor
- Key metrics:
  - HTTP request rate, latency, errors
  - Provider request rate, latency, token usage
  - Conversation store operations
  - Rate limiting hits
### Alerts

See `prometheusrule.yaml` for the configured alerts:

- High error rate
- High latency
- Provider failures
- Pod down
- High memory usage
- Rate limit threshold exceeded
- Conversation store errors
### Logs

Structured JSON logs with:

- Request IDs
- Trace context (trace_id, span_id)
- Log levels (debug/info/warn/error)

View logs:

```bash
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f
```
## Maintenance

### Rolling Updates

```bash
# Update image
kubectl set image deployment/llm-gateway gateway=your-registry/llm-gateway:v1.0.1 -n llm-gateway

# Check rollout status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Rollback if needed
kubectl rollout undo deployment/llm-gateway -n llm-gateway
```
### Scaling

```bash
# Manual scale
kubectl scale deployment/llm-gateway --replicas=5 -n llm-gateway

# HPA will auto-scale within min/max bounds (3-20)
```
### Configuration Updates

```bash
# Edit ConfigMap
kubectl edit configmap llm-gateway-config -n llm-gateway

# Restart pods to pick up changes
kubectl rollout restart deployment/llm-gateway -n llm-gateway
```
### Debugging

```bash
# Exec into a pod
kubectl exec -it -n llm-gateway deployment/llm-gateway -- /bin/sh

# Port-forward for local access
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Check events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
```
## Production Considerations

### High Availability

- Minimum 3 replicas spread across availability zones
- Pod anti-affinity rules spread pods across nodes
- PodDisruptionBudget keeps the service available during voluntary disruptions
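Node-level spreading of this kind is commonly expressed as pod anti-affinity in the Deployment; a minimal sketch (see `deployment.yaml` for what this repo actually configures):

```yaml
# Deployment pod template (excerpt)
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # prefer one gateway pod per node
          labelSelector:
            matchLabels:
              app: llm-gateway
```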
### Performance

- Adjust resource limits based on load testing
- Tune HPA thresholds to match traffic patterns
- Use node affinity for GPU nodes if needed
### Cost Optimization

- Use spot/preemptible instances for non-critical workloads
- Set appropriate resource requests/limits
- Monitor token usage and implement quotas

### Disaster Recovery

- Redis persistence (if using a StatefulSet)
- Regular backups of conversation data
- Multi-region deployment for geo-redundancy
- Documented runbooks for incident response
## Cloud-Specific Notes

### AWS EKS

- Use the AWS Load Balancer Controller for ALB
- Configure IRSA for the service account
- Use ElastiCache for Redis
- Store secrets in AWS Secrets Manager

### GCP GKE

- Use GKE Ingress for GCLB
- Configure Workload Identity
- Use Memorystore for Redis
- Store secrets in Google Secret Manager

### Azure AKS

- Use the Azure Application Gateway Ingress Controller
- Configure Azure AD Workload Identity
- Use Azure Cache for Redis
- Store secrets in Azure Key Vault
## Troubleshooting

### Common Issues

**Pods not starting:**

```bash
kubectl describe pod -n llm-gateway -l app=llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway --previous
```

**Health check failures:**

```bash
kubectl port-forward -n llm-gateway deployment/llm-gateway 8080:8080
curl http://localhost:8080/health
curl http://localhost:8080/ready
```
**Provider connection issues:**

- Verify API keys in secrets
- Check that network policies allow egress
- Verify provider endpoints are accessible
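If egress is being blocked, compare `networkpolicy.yaml` against a rule like the sketch below, which permits DNS plus outbound HTTPS to provider APIs. This is an assumption about what the policy should allow, not a copy of the repo's actual policy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-provider-egress
  namespace: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes: ["Egress"]
  egress:
    - ports:
        - protocol: UDP
          port: 53       # DNS resolution
    - ports:
        - protocol: TCP
          port: 443      # HTTPS to LLM provider endpoints
```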
**Redis connection issues:**

```bash
kubectl exec -it -n llm-gateway redis-0 -- redis-cli ping
```