A8065384/latticelm

Anibal Angulo 610b6c3367 Add deployment guides

2026-03-06 21:55:42 +00:00

Kubernetes Deployment Guide

Production-ready Kubernetes manifests for deploying the LLM Gateway with high availability, monitoring, and security.

Quick Start
Prerequisites
Deployment
Configuration
Secrets Management
Monitoring
Storage Options
Scaling
Updates and Rollbacks
Security
Cloud Provider Guides
Troubleshooting

Quick Start

Deploy with default settings using pre-built images:

# Update kustomization.yaml with your image
cd k8s/
vim kustomization.yaml  # Set image to ghcr.io/yourusername/llm-gateway:v1.0.0

# Create secrets
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-your-key" \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-your-key" \
  --from-literal=GOOGLE_API_KEY="your-key" \
  -n llm-gateway

# Deploy
kubectl apply -k .

# Verify
kubectl get pods -n llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway

Prerequisites

Kubernetes: v1.24+ cluster
kubectl: Configured and authenticated
Container images: Access to ghcr.io/yourusername/llm-gateway

Optional but recommended:

Prometheus Operator: For metrics and alerting
cert-manager: For automatic TLS certificates
Ingress Controller: nginx, ALB, or GCE
External Secrets Operator: For secrets management

Deployment

Using Kustomize (Recommended)

# Review and customize
cd k8s/
vim kustomization.yaml  # Update image, namespace, etc.
vim configmap.yaml      # Configure gateway settings
vim ingress.yaml        # Set your domain

# Deploy all resources
kubectl apply -k .

# Deploy with Kustomize overlays
kubectl apply -k overlays/production/

Using kubectl

kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f redis.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f hpa.yaml
kubectl apply -f pdb.yaml
kubectl apply -f networkpolicy.yaml

With Monitoring

If Prometheus Operator is installed:

kubectl apply -f servicemonitor.yaml
kubectl apply -f prometheusrule.yaml

Configuration

Image Configuration

Update kustomization.yaml:

images:
  - name: llm-gateway
    newName: ghcr.io/yourusername/llm-gateway
    newTag: v1.2.3  # Or 'latest', 'main', 'sha-abc123'
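
The same change can be scripted rather than edited by hand. The sketch below assumes the standalone `kustomize` CLI is installed and that it runs from the k8s/ directory; `set_gateway_image` is a hypothetical helper name, not part of the repository.

```shell
# Hypothetical helper: pin the gateway image tag in kustomization.yaml.
# Assumes the standalone `kustomize` binary is on PATH and the current
# directory contains kustomization.yaml.
set_gateway_image() {
  tag="$1"
  if [ -z "$tag" ]; then
    echo "usage: set_gateway_image <tag>" >&2
    return 1
  fi
  kustomize edit set image "llm-gateway=ghcr.io/yourusername/llm-gateway:${tag}"
}
```

For example, `set_gateway_image v1.2.3` rewrites the `images:` entry shown above, after which `kubectl apply -k .` rolls out the new tag.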

Gateway Configuration

Edit configmap.yaml for gateway settings:

apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
data:
  config.yaml: |
    server:
      address: ":8080"

    logging:
      level: info
      format: json

    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20

    observability:
      enabled: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        exporter:
          type: otlp
          endpoint: tempo:4317

    conversations:
      store: redis
      dsn: redis://redis:6379/0
      ttl: 1h
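
How `requests_per_second` and `burst` interact depends on the gateway's limiter, which this guide does not describe. Assuming a conventional token bucket (an assumption, not confirmed by the source), the arithmetic looks roughly like this:

```shell
# Assumption: the limiter is a token bucket refilled at requests_per_second
# and capped at burst. The values mirror the ConfigMap above.
RATE=10    # rate_limit.requests_per_second
BURST=20   # rate_limit.burst

# tokens_after <current_tokens> <elapsed_seconds>
# Prints the tokens available after the bucket refills for the given time.
tokens_after() {
  refilled=$(( $1 + $2 * RATE ))
  if [ "$refilled" -gt "$BURST" ]; then
    refilled=$BURST
  fi
  echo "$refilled"
}
```

Under that assumption, `tokens_after 0 1` prints 10 and `tokens_after 0 5` prints 20: an idle client could send up to 20 requests at once and then sustain 10 per second.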

Resource Limits

Default resources (adjust based on load testing):

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi
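
When sizing nodes or namespace quotas, it helps to multiply these per-pod values by the replica range the autoscaler may use (3 to 20 in the default HPA). A small sketch of that arithmetic, with `footprint` as a hypothetical helper:

```shell
# Per-pod values from the resources block above, in millicores and MiB.
CPU_REQUEST_M=100
CPU_LIMIT_M=1000
MEM_REQUEST_MI=128
MEM_LIMIT_MI=512

# footprint <replicas>
# Prints the aggregate requests and limits for the given replica count.
footprint() {
  replicas="$1"
  echo "cpu requests=$(( replicas * CPU_REQUEST_M ))m limits=$(( replicas * CPU_LIMIT_M ))m"
  echo "memory requests=$(( replicas * MEM_REQUEST_MI ))Mi limits=$(( replicas * MEM_LIMIT_MI ))Mi"
}
```

`footprint 3` reports 300m CPU and 384Mi memory requested; `footprint 20` reports 2000m and 2560Mi, with limits of 20000m and 10240Mi. The scheduler reserves the request figures, so those are what node capacity must cover.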

Ingress Configuration

Edit ingress.yaml for your domain:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm-gateway.yourdomain.com
      secretName: llm-gateway-tls
  rules:
    - host: llm-gateway.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80

Secrets Management

Option 1: kubectl (Development)

kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-..." \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
  --from-literal=GOOGLE_API_KEY="..." \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
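
Kubernetes stores each Secret value base64-encoded under `.data`, and `--from-literal` does that encoding for you. Base64 is an encoding, not encryption. The helpers below (hypothetical names) illustrate the round trip and why a stray trailing newline matters when values are prepared by hand:

```shell
# Hypothetical helpers illustrating how Secret values are encoded.
# printf '%s' avoids the trailing newline that plain `echo` would add,
# which would otherwise become part of the stored key.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Assumes a base64 that accepts -d (GNU coreutils, recent macOS).
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}
```

`encode_secret_value abc` prints `YWJj`, whereas `echo abc | base64` prints `YWJjCg==` because the newline is encoded too. A key stored with that extra byte is a common cause of authentication failures against providers.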

Option 2: External Secrets Operator (Production)

Install the External Secrets Operator (ESO), then create an ExternalSecret:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager  # or vault, gcpsm, etc.
    kind: ClusterSecretStore
  target:
    name: llm-gateway-secrets
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: llm-gateway/openai-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-key
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-key

Option 3: Sealed Secrets

# Encrypt secrets (the namespace must match the one the Secret will be unsealed into)
echo -n "sk-your-key" | kubectl create secret generic llm-gateway-secrets \
  -n llm-gateway --dry-run=client --from-file=OPENAI_API_KEY=/dev/stdin -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml

# Commit sealed-secret.yaml to git
kubectl apply -f sealed-secret.yaml

Monitoring

Metrics

ServiceMonitor for Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Available metrics:

gateway_requests_total - Total requests by provider/model
gateway_request_duration_seconds - Request latency histogram
gateway_provider_errors_total - Errors by provider
gateway_circuit_breaker_state - Circuit breaker state changes
gateway_rate_limit_hits_total - Rate limit violations

Alerts

PrometheusRule with common alerts:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-alerts
spec:
  groups:
    - name: llm-gateway
      interval: 30s
      rules:
        - alert: HighErrorRate
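          # Ratio of 5xx responses to all requests; a bare rate() would compare requests per second against 0.05
          expr: sum(rate(gateway_requests_total{status=~"5.."}[5m])) / sum(rate(gateway_requests_total[5m])) > 0.05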
          for: 5m
          annotations:
            summary: High error rate detected

        - alert: PodDown
          expr: kube_deployment_status_replicas_available{deployment="llm-gateway"} < 2
          for: 5m
          annotations:
            summary: Fewer than 2 gateway pods available

Logging

View logs:

# Tail logs
kubectl logs -n llm-gateway -l app=llm-gateway -f

# Filter by level (-R with fromjson? skips any lines that are not valid JSON)
kubectl logs -n llm-gateway -l app=llm-gateway | jq -R 'fromjson? | select(.level=="error")'

# Search logs
kubectl logs -n llm-gateway -l app=llm-gateway | grep "circuit.*open"

Tracing

Point the gateway's OTLP exporter at your OpenTelemetry-compatible collector:

observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: tempo:4317  # or jaeger-collector:4317

Storage Options

In-Memory (Default)

No persistence; conversations are lost when a pod restarts and are not shared between replicas:

conversations:
  store: memory

Redis (Recommended)

Deploy Redis StatefulSet:

kubectl apply -f redis.yaml

Configure gateway:

conversations:
  store: redis
  dsn: redis://redis:6379/0
  ttl: 1h

External Redis

For production, use managed Redis:

conversations:
  store: redis
  dsn: redis://:password@redis.example.com:6379/0
  ttl: 1h

Cloud providers:

AWS: ElastiCache for Redis
GCP: Memorystore for Redis
Azure: Azure Cache for Redis

PostgreSQL

conversations:
  store: sql
  driver: pgx
  dsn: postgres://user:pass@postgres:5432/llm_gateway?sslmode=require
  ttl: 1h

Scaling

Horizontal Pod Autoscaler

Default HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Monitor HPA:

kubectl get hpa -n llm-gateway
kubectl describe hpa llm-gateway -n llm-gateway
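
To anticipate what the autoscaler will do, the core formula from the Kubernetes documentation can be worked through by hand. The sketch below applies it with the bounds from the manifest above; it deliberately ignores the default 10% tolerance and the stabilization window, and `desired_replicas` is a hypothetical helper name.

```shell
# HPA core formula from the Kubernetes documentation:
#   desired = ceil(current_replicas * current_metric / target_metric)
# clamped to [minReplicas, maxReplicas].
MIN_REPLICAS=3
MAX_REPLICAS=20

# desired_replicas <current_replicas> <current_utilization_%> <target_utilization_%>
desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$MIN_REPLICAS" ]; then
    desired=$MIN_REPLICAS
  fi
  if [ "$desired" -gt "$MAX_REPLICAS" ]; then
    desired=$MAX_REPLICAS
  fi
  echo "$desired"
}
```

With 3 replicas at 90% CPU against the 70% target, `desired_replicas 3 90 70` prints 4. When several metrics are configured, as here with CPU and memory, the HPA computes a proposal per metric and uses the largest.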

Manual Scaling

# Scale to specific replica count
kubectl scale deployment/llm-gateway --replicas=10 -n llm-gateway

# Check status
kubectl get deployment llm-gateway -n llm-gateway

Pod Disruption Budget

Keeps at least two gateway pods running during voluntary disruptions such as node drains and cluster upgrades:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
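
The budget translates into a simple count of pods that may be evicted at once. A minimal sketch of that calculation, with `allowed_disruptions` as a hypothetical helper:

```shell
# allowed_disruptions <healthy_pods> [min_available]
# Prints how many pods may be evicted voluntarily while still satisfying
# minAvailable (default 2, matching the PodDisruptionBudget above).
allowed_disruptions() {
  healthy="$1"
  min_available="${2:-2}"
  allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

At the HPA minimum of 3 replicas, `allowed_disruptions 3` prints 1, so a node drain evicts gateway pods one at a time. The live value appears in the ALLOWED DISRUPTIONS column of `kubectl get pdb llm-gateway -n llm-gateway`.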

Updates and Rollbacks

Rolling Updates

# Update image
kubectl set image deployment/llm-gateway \
  gateway=ghcr.io/yourusername/llm-gateway:v1.2.3 \
  -n llm-gateway

# Watch rollout
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Pause rollout if issues
kubectl rollout pause deployment/llm-gateway -n llm-gateway

# Resume rollout
kubectl rollout resume deployment/llm-gateway -n llm-gateway

Rollback

# Roll back to the previous revision
kubectl rollout undo deployment/llm-gateway -n llm-gateway

# Roll back to a specific revision (list revisions first, then pick one)
kubectl rollout history deployment/llm-gateway -n llm-gateway
kubectl rollout undo deployment/llm-gateway --to-revision=3 -n llm-gateway

Blue-Green Deployment

# Deploy new version with different label
kubectl apply -f deployment-v2.yaml

# Test new version
kubectl port-forward -n llm-gateway deployment/llm-gateway-v2 8080:8080

# Switch service to new version
kubectl patch service llm-gateway -n llm-gateway \
  -p '{"spec":{"selector":{"version":"v2"}}}'

# Delete old version after verification
kubectl delete deployment llm-gateway-v1 -n llm-gateway
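
Because the cutover is only a selector change, the same patch can switch traffic back. The helper below is a hypothetical wrapper around the patch command shown above; it assumes both Deployments label their pods with `version: <name>`.

```shell
# Hypothetical helper: point the Service at the pods labelled with the
# given version. Assumes the pod templates carry a `version` label as in
# the walkthrough above.
switch_traffic() {
  version="$1"
  kubectl patch service llm-gateway -n llm-gateway \
    -p "{\"spec\":{\"selector\":{\"version\":\"${version}\"}}}"
}
```

`switch_traffic v2` performs the cutover and `switch_traffic v1` reverts it, which only works while the old Deployment still exists. Deleting llm-gateway-v1 is therefore best left until the new version has served real traffic for a while.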

Security

Pod Security

The Deployment applies the following pod-level and container-level hardening:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containers:
  - name: gateway
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL

Network Policies

Restrict traffic to/from gateway pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # label set automatically on every namespace
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:  # Allow DNS
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP  # DNS falls back to TCP for large responses
          port: 53
    - to:  # Allow Redis
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
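    # Allow external LLM providers (HTTPS). An ipBlock is needed here:
    # a namespaceSelector only matches in-cluster pods, not internet addresses.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0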
      ports:
        - protocol: TCP
          port: 443

RBAC

ServiceAccount with minimal permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-gateway
subjects:
  - kind: ServiceAccount
    name: llm-gateway

Cloud Provider Guides

AWS EKS

# Install AWS Load Balancer Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster

# Update ingress for ALB
# Add annotations to ingress.yaml:
metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip

IRSA for secrets:

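# Create IAM role and associate with ServiceAccount
# Note: SecretsManagerReadWrite grants more than read access; for production,
# prefer a custom read-only policy scoped to the gateway's secrets.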
eksctl create iamserviceaccount \
  --name llm-gateway \
  --namespace llm-gateway \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
  --approve

ElastiCache Redis:

conversations:
  store: redis
  dsn: redis://my-cluster.cache.amazonaws.com:6379/0

GCP GKE

# Enable Workload Identity
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog

# Create service account with Secret Manager access
gcloud iam service-accounts create llm-gateway

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:llm-gateway@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/secretmanager.secretAccessor"

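# Bind K8s SA to GCP SA: allow the Kubernetes SA to impersonate the GCP SA...
gcloud iam service-accounts add-iam-policy-binding \
  llm-gateway@PROJECT_ID.iam.gserviceaccount.com \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[llm-gateway/llm-gateway]"

# ...then annotate the Kubernetes SA with the GCP SA it maps to
kubectl annotate serviceaccount llm-gateway \
  -n llm-gateway \
  iam.gke.io/gcp-service-account=llm-gateway@PROJECT_ID.iam.gserviceaccount.com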

Memorystore Redis:

conversations:
  store: redis
  dsn: redis://10.0.0.3:6379/0  # Private IP from Memorystore

Azure AKS

# Install Application Gateway Ingress Controller
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons ingress-appgw \
  --appgw-name myApplicationGateway

# Configure Azure AD Workload Identity
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-oidc-issuer \
  --enable-workload-identity

Azure Key Vault with ESO:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://my-vault.vault.azure.net

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n llm-gateway

# Describe pod for events
kubectl describe pod llm-gateway-xxx -n llm-gateway

# Check logs
kubectl logs -n llm-gateway llm-gateway-xxx

# Check previous container logs (if crashed)
kubectl logs -n llm-gateway llm-gateway-xxx --previous

Common issues:

Image pull errors: Check registry credentials
CrashLoopBackOff: Check logs for startup errors
Pending: Check resource quotas and node capacity

Health Check Failures

# Port-forward to test locally
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready

# Check from inside pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- http://localhost:8080/health
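
Right after a rollout the endpoints may fail for a few seconds while pods start, so a single curl can be misleading. A small retry wrapper, sketched here in POSIX shell with the hypothetical name `retry`, makes the check less flaky:

```shell
# retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$(( n + 1 ))
    sleep "$delay"
  done
}
```

With the port-forward above running, `retry 10 3 curl -fsS http://localhost:8080/ready` polls the readiness endpoint up to ten times, three seconds apart, and returns non-zero if it never succeeds.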

Provider Connection Issues

# Test egress from pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- https://api.openai.com

# Check secrets
kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d

# Verify network policies
kubectl get networkpolicy -n llm-gateway
kubectl describe networkpolicy llm-gateway -n llm-gateway
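
The "Check secrets" command above prints the full API key to the terminal. On shared machines or in CI logs it may be safer to confirm only that the right key is present. A minimal sketch, with `mask_secret` as a hypothetical helper:

```shell
# mask_secret <value>
# Prints only the first three characters and the length, so the key
# itself never lands in terminal scrollback or CI logs.
mask_secret() {
  value="$1"
  prefix=$(printf '%s' "$value" | cut -c1-3)
  echo "${prefix}*** (${#value} chars)"
}
```

For example, `mask_secret "$(kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d)"` shows the prefix and length, which is usually enough to spot an empty or placeholder value.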

Redis Connection Issues

# Test Redis connectivity
kubectl exec -n llm-gateway deployment/llm-gateway -- nc -zv redis 6379

# Connect to Redis
kubectl exec -it -n llm-gateway redis-0 -- redis-cli

# Check Redis logs
kubectl logs -n llm-gateway redis-0

Performance Issues

# Check resource usage
kubectl top pods -n llm-gateway
kubectl top nodes

# Check HPA status
kubectl describe hpa llm-gateway -n llm-gateway

# Check for throttling
kubectl describe pod llm-gateway-xxx -n llm-gateway | grep -i throttl

Debug Container

For distroless/minimal images:

# Use ephemeral debug container
kubectl debug -it -n llm-gateway llm-gateway-xxx --image=busybox --target=gateway

# Or use debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -n llm-gateway -- /bin/bash

Useful Commands

# View all resources
kubectl get all -n llm-gateway

# Check deployment status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Tail logs from all pods
kubectl logs -n llm-gateway -l app=llm-gateway -f --max-log-requests=10

# Get events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'

# Check resource quotas
kubectl describe resourcequota -n llm-gateway

# Export current config
kubectl get deployment llm-gateway -n llm-gateway -o yaml > deployment-backup.yaml

# Force pod restart
kubectl rollout restart deployment/llm-gateway -n llm-gateway

# Delete and recreate deployment
kubectl delete deployment llm-gateway -n llm-gateway
kubectl apply -f deployment.yaml

Architecture Overview

┌─────────────────────────────────────────────────┐
│           Internet / Load Balancer              │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Ingress Controller  │
          │    (TLS/SSL)         │
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Gateway Service     │
          │   (ClusterIP:80)     │
          └──────────┬───────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌─────┐      ┌─────┐      ┌─────┐
    │ Pod │      │ Pod │      │ Pod │
    │  1  │      │  2  │      │  3  │
    └──┬──┘      └──┬──┘      └──┬──┘
       │            │            │
       └────────────┼────────────┘
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   ┌──────┐    ┌──────┐    ┌──────┐
   │Redis │    │Prom  │    │Tempo │
   └──────┘    └──────┘    └──────┘
