Files
2026-03-06 21:55:42 +00:00
..
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-06 21:55:42 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00
2026-03-05 06:14:03 +00:00

Kubernetes Deployment Guide

Production-ready Kubernetes manifests for deploying the LLM Gateway with high availability, monitoring, and security.

Table of Contents

Quick Start

Deploy with default settings using pre-built images:

# Update kustomization.yaml with your image
cd k8s/
vim kustomization.yaml  # Set image to ghcr.io/yourusername/llm-gateway:v1.0.0

# Create secrets
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-your-key" \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-your-key" \
  --from-literal=GOOGLE_API_KEY="your-key" \
  -n llm-gateway

# Deploy
kubectl apply -k .

# Verify
kubectl get pods -n llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway

Prerequisites

  • Kubernetes: v1.24+ cluster
  • kubectl: Configured and authenticated
  • Container images: Access to ghcr.io/yourusername/llm-gateway

Optional but recommended:

  • Prometheus Operator: For metrics and alerting
  • cert-manager: For automatic TLS certificates
  • Ingress Controller: nginx, ALB, or GCE
  • External Secrets Operator: For secrets management

Deployment

# Review and customize
cd k8s/
vim kustomization.yaml  # Update image, namespace, etc.
vim configmap.yaml      # Configure gateway settings
vim ingress.yaml        # Set your domain

# Deploy all resources
kubectl apply -k .

# Deploy with Kustomize overlays
kubectl apply -k overlays/production/

Using kubectl

kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f redis.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f hpa.yaml
kubectl apply -f pdb.yaml
kubectl apply -f networkpolicy.yaml

With Monitoring

If Prometheus Operator is installed:

kubectl apply -f servicemonitor.yaml
kubectl apply -f prometheusrule.yaml

Configuration

Image Configuration

Update kustomization.yaml:

images:
  - name: llm-gateway
    newName: ghcr.io/yourusername/llm-gateway
    newTag: v1.2.3  # Or 'latest', 'main', 'sha-abc123'

Gateway Configuration

Edit configmap.yaml for gateway settings:

apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
data:
  config.yaml: |
    server:
      address: ":8080"

    logging:
      level: info
      format: json

    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20

    observability:
      enabled: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        exporter:
          type: otlp
          endpoint: tempo:4317

    conversations:
      store: redis
      dsn: redis://redis:6379/0
      ttl: 1h

Resource Limits

Default resources (adjust based on load testing):

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi

Ingress Configuration

Edit ingress.yaml for your domain:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm-gateway.yourdomain.com
      secretName: llm-gateway-tls
  rules:
    - host: llm-gateway.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80

Secrets Management

Option 1: kubectl (Development)

kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-..." \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
  --from-literal=GOOGLE_API_KEY="..." \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway

Option 2: External Secrets Operator (Production)

Install ESO, then create ExternalSecret:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager  # or vault, gcpsm, etc.
    kind: ClusterSecretStore
  target:
    name: llm-gateway-secrets
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: llm-gateway/openai-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-key
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-key

Option 3: Sealed Secrets

# Encrypt secrets
echo -n "sk-your-key" | kubectl create secret generic llm-gateway-secrets \
  --dry-run=client --from-file=OPENAI_API_KEY=/dev/stdin -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml

# Commit sealed-secret.yaml to git
kubectl apply -f sealed-secret.yaml

Monitoring

Metrics

ServiceMonitor for Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Available metrics:

  • gateway_requests_total - Total requests by provider/model
  • gateway_request_duration_seconds - Request latency histogram
  • gateway_provider_errors_total - Errors by provider
  • gateway_circuit_breaker_state - Circuit breaker state changes
  • gateway_rate_limit_hits_total - Rate limit violations

Alerts

PrometheusRule with common alerts:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-alerts
spec:
  groups:
    - name: llm-gateway
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: rate(gateway_requests_total{status=~"5.."}[5m]) > 0.05
          for: 5m
          annotations:
            summary: High error rate detected

        - alert: PodDown
          expr: kube_deployment_status_replicas_available{deployment="llm-gateway"} < 2
          for: 5m
          annotations:
            summary: Less than 2 gateway pods running

Logging

View logs:

# Tail logs
kubectl logs -n llm-gateway -l app=llm-gateway -f

# Filter by level
kubectl logs -n llm-gateway -l app=llm-gateway | jq 'select(.level=="error")'

# Search logs
kubectl logs -n llm-gateway -l app=llm-gateway | grep "circuit.*open"

Tracing

Configure OpenTelemetry collector:

observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: tempo:4317  # or jaeger-collector:4317

Storage Options

In-Memory (Default)

No persistence, lost on pod restart:

conversations:
  store: memory

Deploy Redis StatefulSet:

kubectl apply -f redis.yaml

Configure gateway:

conversations:
  store: redis
  dsn: redis://redis:6379/0
  ttl: 1h

External Redis

For production, use managed Redis:

conversations:
  store: redis
  dsn: redis://:password@redis.example.com:6379/0
  ttl: 1h

Cloud providers:

  • AWS: ElastiCache for Redis
  • GCP: Memorystore for Redis
  • Azure: Azure Cache for Redis

PostgreSQL

conversations:
  store: sql
  driver: pgx
  dsn: postgres://user:pass@postgres:5432/llm_gateway?sslmode=require
  ttl: 1h

Scaling

Horizontal Pod Autoscaler

Default HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Monitor HPA:

kubectl get hpa -n llm-gateway
kubectl describe hpa llm-gateway -n llm-gateway

Manual Scaling

# Scale to specific replica count
kubectl scale deployment/llm-gateway --replicas=10 -n llm-gateway

# Check status
kubectl get deployment llm-gateway -n llm-gateway

Pod Disruption Budget

Ensures availability during disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway

Updates and Rollbacks

Rolling Updates

# Update image
kubectl set image deployment/llm-gateway \
  gateway=ghcr.io/yourusername/llm-gateway:v1.2.3 \
  -n llm-gateway

# Watch rollout
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Pause rollout if issues
kubectl rollout pause deployment/llm-gateway -n llm-gateway

# Resume rollout
kubectl rollout resume deployment/llm-gateway -n llm-gateway

Rollback

# Rollback to previous version
kubectl rollout undo deployment/llm-gateway -n llm-gateway

# Rollback to specific revision
kubectl rollout history deployment/llm-gateway -n llm-gateway
kubectl rollout undo deployment/llm-gateway --to-revision=3 -n llm-gateway

Blue-Green Deployment

# Deploy new version with different label
kubectl apply -f deployment-v2.yaml

# Test new version
kubectl port-forward -n llm-gateway deployment/llm-gateway-v2 8080:8080

# Switch service to new version
kubectl patch service llm-gateway -n llm-gateway \
  -p '{"spec":{"selector":{"version":"v2"}}}'

# Delete old version after verification
kubectl delete deployment llm-gateway-v1 -n llm-gateway

Security

Pod Security

Deployment includes security best practices:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containers:
  - name: gateway
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL

Network Policies

Restrict traffic to/from gateway pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:  # Allow DNS
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to:  # Allow Redis
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    - to:  # Allow external LLM providers (HTTPS)
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443

RBAC

ServiceAccount with minimal permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-gateway
subjects:
  - kind: ServiceAccount
    name: llm-gateway

Cloud Provider Guides

AWS EKS

# Install AWS Load Balancer Controller
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster

# Update ingress for ALB
# Add annotations to ingress.yaml:
metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip

IRSA for secrets:

# Create IAM role and associate with ServiceAccount
eksctl create iamserviceaccount \
  --name llm-gateway \
  --namespace llm-gateway \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
  --approve

ElastiCache Redis:

conversations:
  store: redis
  dsn: redis://my-cluster.cache.amazonaws.com:6379/0

GCP GKE

# Enable Workload Identity
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog

# Create service account with Secret Manager access
gcloud iam service-accounts create llm-gateway

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:llm-gateway@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/secretmanager.secretAccessor"

# Bind K8s SA to GCP SA
kubectl annotate serviceaccount llm-gateway \
  -n llm-gateway \
  iam.gke.io/gcp-service-account=llm-gateway@PROJECT_ID.iam.gserviceaccount.com

Memorystore Redis:

conversations:
  store: redis
  dsn: redis://10.0.0.3:6379/0  # Private IP from Memorystore

Azure AKS

# Install Application Gateway Ingress Controller
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons ingress-appgw \
  --appgw-name myApplicationGateway

# Configure Azure AD Workload Identity
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-oidc-issuer \
  --enable-workload-identity

Azure Key Vault with ESO:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://my-vault.vault.azure.net

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n llm-gateway

# Describe pod for events
kubectl describe pod llm-gateway-xxx -n llm-gateway

# Check logs
kubectl logs -n llm-gateway llm-gateway-xxx

# Check previous container logs (if crashed)
kubectl logs -n llm-gateway llm-gateway-xxx --previous

Common issues:

  • Image pull errors: Check registry credentials
  • CrashLoopBackOff: Check logs for startup errors
  • Pending: Check resource quotas and node capacity

Health Check Failures

# Port-forward to test locally
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready

# Check from inside pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- http://localhost:8080/health

Provider Connection Issues

# Test egress from pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- https://api.openai.com

# Check secrets
kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d

# Verify network policies
kubectl get networkpolicy -n llm-gateway
kubectl describe networkpolicy llm-gateway -n llm-gateway

Redis Connection Issues

# Test Redis connectivity
kubectl exec -n llm-gateway deployment/llm-gateway -- nc -zv redis 6379

# Connect to Redis
kubectl exec -it -n llm-gateway redis-0 -- redis-cli

# Check Redis logs
kubectl logs -n llm-gateway redis-0

Performance Issues

# Check resource usage
kubectl top pods -n llm-gateway
kubectl top nodes

# Check HPA status
kubectl describe hpa llm-gateway -n llm-gateway

# Check for throttling
kubectl describe pod llm-gateway-xxx -n llm-gateway | grep -i throttl

Debug Container

For distroless/minimal images:

# Use ephemeral debug container
kubectl debug -it -n llm-gateway llm-gateway-xxx --image=busybox --target=gateway

# Or use debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -n llm-gateway -- /bin/bash

Useful Commands

# View all resources
kubectl get all -n llm-gateway

# Check deployment status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Tail logs from all pods
kubectl logs -n llm-gateway -l app=llm-gateway -f --max-log-requests=10

# Get events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'

# Check resource quotas
kubectl describe resourcequota -n llm-gateway

# Export current config
kubectl get deployment llm-gateway -n llm-gateway -o yaml > deployment-backup.yaml

# Force pod restart
kubectl rollout restart deployment/llm-gateway -n llm-gateway

# Delete and recreate deployment
kubectl delete deployment llm-gateway -n llm-gateway
kubectl apply -f deployment.yaml

Architecture Overview

┌─────────────────────────────────────────────────┐
│           Internet / Load Balancer              │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Ingress Controller  │
          │    (TLS/SSL)         │
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Gateway Service     │
          │   (ClusterIP:80)     │
          └──────────┬───────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌─────┐      ┌─────┐      ┌─────┐
    │ Pod │      │ Pod │      │ Pod │
    │  1  │      │  2  │      │  3  │
    └──┬──┘      └──┬──┘      └──┬──┘
       │            │            │
       └────────────┼────────────┘
                    │
       ┌────────────┼────────────┐
       ▼            ▼            ▼
   ┌──────┐    ┌──────┐    ┌──────┐
   │Redis │    │Prom  │    │Tempo │
   └──────┘    └──────┘    └──────┘

Additional Resources