Kubernetes Deployment Guide
Production-ready Kubernetes manifests for deploying the LLM Gateway with high availability, monitoring, and security.
Table of Contents
- Quick Start
- Prerequisites
- Deployment
- Configuration
- Secrets Management
- Monitoring
- Storage Options
- Scaling
- Updates and Rollbacks
- Security
- Cloud Provider Guides
- Troubleshooting
Quick Start
Deploy with default settings using pre-built images:
# Update kustomization.yaml with your image
cd k8s/
vim kustomization.yaml # Set image to ghcr.io/yourusername/llm-gateway:v1.0.0
# Create secrets
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
--from-literal=OPENAI_API_KEY="sk-your-key" \
--from-literal=ANTHROPIC_API_KEY="sk-ant-your-key" \
--from-literal=GOOGLE_API_KEY="your-key" \
-n llm-gateway
# Deploy
kubectl apply -k .
# Verify
kubectl get pods -n llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway
Prerequisites
- Kubernetes: v1.24+ cluster
- kubectl: Configured and authenticated
- Container images: Access to ghcr.io/yourusername/llm-gateway
Optional but recommended:
- Prometheus Operator: For metrics and alerting
- cert-manager: For automatic TLS certificates
- Ingress Controller: nginx, ALB, or GCE
- External Secrets Operator: For secrets management
Deployment
Using Kustomize (Recommended)
# Review and customize
cd k8s/
vim kustomization.yaml # Update image, namespace, etc.
vim configmap.yaml # Configure gateway settings
vim ingress.yaml # Set your domain
# Deploy all resources
kubectl apply -k .
# Or deploy a Kustomize overlay instead
kubectl apply -k overlays/production/
Using kubectl
kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f redis.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f hpa.yaml
kubectl apply -f pdb.yaml
kubectl apply -f networkpolicy.yaml
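The order above matters mainly because the namespace has to exist before anything is created in it. A minimal sketch of a wrapper that applies the files in the listed order and stops at the first failure is shown here; `apply_in_order` is a hypothetical helper name, not something shipped with these manifests.

```shell
# apply_in_order: hypothetical helper (an assumption, not part of the repo).
# Applies each manifest in the order given and stops at the first failure,
# so later resources are never applied after a broken earlier step.
apply_in_order() {
  local manifest
  for manifest in "$@"; do
    echo "applying $manifest"
    if ! kubectl apply -f "$manifest"; then
      echo "failed on $manifest; later manifests were not applied" >&2
      return 1
    fi
  done
}

# Usage (run from k8s/):
# apply_in_order namespace.yaml serviceaccount.yaml secret.yaml configmap.yaml \
#   redis.yaml deployment.yaml service.yaml ingress.yaml hpa.yaml pdb.yaml \
#   networkpolicy.yaml
```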
With Monitoring
If Prometheus Operator is installed:
kubectl apply -f servicemonitor.yaml
kubectl apply -f prometheusrule.yaml
Configuration
Image Configuration
Update kustomization.yaml:
images:
  - name: llm-gateway
    newName: ghcr.io/yourusername/llm-gateway
    newTag: v1.2.3 # Or 'latest', 'main', 'sha-abc123'
Gateway Configuration
Edit configmap.yaml for gateway settings:
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
data:
  config.yaml: |
    server:
      address: ":8080"
    logging:
      level: info
      format: json
    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20
    observability:
      enabled: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        exporter:
          type: otlp
          endpoint: tempo:4317
    conversations:
      store: redis
      dsn: redis://redis:6379/0
      ttl: 1h
Resource Limits
Default resources (adjust based on load testing):
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi
Ingress Configuration
Edit ingress.yaml for your domain:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm-gateway.yourdomain.com
      secretName: llm-gateway-tls
  rules:
    - host: llm-gateway.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80
Secrets Management
Option 1: kubectl (Development)
kubectl create secret generic llm-gateway-secrets \
--from-literal=OPENAI_API_KEY="sk-..." \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
--from-literal=GOOGLE_API_KEY="..." \
--from-literal=OIDC_AUDIENCE="your-client-id" \
-n llm-gateway
Option 2: External Secrets Operator (Production)
Install the External Secrets Operator (ESO), then create an ExternalSecret:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager # or vault, gcpsm, etc.
    kind: ClusterSecretStore
  target:
    name: llm-gateway-secrets
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: llm-gateway/openai-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-key
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-key
Option 3: Sealed Secrets
# Encrypt secrets (sealed secrets are namespace-scoped by default, so set the namespace)
echo -n "sk-your-key" | kubectl create secret generic llm-gateway-secrets \
  --namespace llm-gateway \
  --dry-run=client --from-file=OPENAI_API_KEY=/dev/stdin -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml
# Commit sealed-secret.yaml to git, then apply it
kubectl apply -f sealed-secret.yaml
Monitoring
Metrics
ServiceMonitor for Prometheus Operator:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
Available metrics:
- gateway_requests_total - Total requests by provider/model
- gateway_request_duration_seconds - Request latency histogram
- gateway_provider_errors_total - Errors by provider
- gateway_circuit_breaker_state - Circuit breaker state changes
- gateway_rate_limit_hits_total - Rate limit violations
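To confirm that a running pod actually exposes these metrics, the scrape output can be reduced to a list of metric names. This is a sketch that assumes the endpoint serves the Prometheus text format on `/metrics` (the path used by the ServiceMonitor above); `gateway_metric_names` is a hypothetical helper name.

```shell
# gateway_metric_names: hypothetical helper (an assumption, not part of the repo).
# Reads Prometheus text-format output on stdin and prints the unique
# gateway_* metric names, dropping labels, values, and comment lines.
gateway_metric_names() {
  grep -E '^gateway_' | sed -E 's/[{ ].*$//' | LC_ALL=C sort -u
}

# Usage, with the service port-forwarded in another terminal:
# kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
# curl -s http://localhost:8080/metrics | gateway_metric_names
```

Histograms are exposed as separate `_bucket`, `_sum`, and `_count` series, so `gateway_request_duration_seconds` would show up under those three suffixed names.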
Alerts
PrometheusRule with common alerts:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-alerts
spec:
  groups:
    - name: llm-gateway
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: rate(gateway_requests_total{status=~"5.."}[5m]) > 0.05
          for: 5m
          annotations:
            summary: High error rate detected
        - alert: PodDown
          expr: kube_deployment_status_replicas_available{deployment="llm-gateway"} < 2
          for: 5m
          annotations:
            summary: Fewer than 2 gateway pods available
Logging
View logs:
# Tail logs
kubectl logs -n llm-gateway -l app=llm-gateway -f
# Filter by level
kubectl logs -n llm-gateway -l app=llm-gateway | jq 'select(.level=="error")'
# Search logs
kubectl logs -n llm-gateway -l app=llm-gateway | grep "circuit.*open"
Tracing
Point the gateway at an OpenTelemetry (OTLP) collector endpoint:
observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: tempo:4317 # or jaeger-collector:4317
Storage Options
In-Memory (Default)
No persistence; conversations are lost on pod restart:
conversations:
  store: memory
Redis (Recommended)
Deploy Redis StatefulSet:
kubectl apply -f redis.yaml
Configure gateway:
conversations:
  store: redis
  dsn: redis://redis:6379/0
  ttl: 1h
External Redis
For production, use managed Redis:
conversations:
  store: redis
  dsn: redis://:password@redis.example.com:6379/0
  ttl: 1h
Cloud providers:
- AWS: ElastiCache for Redis
- GCP: Memorystore for Redis
- Azure: Azure Cache for Redis
PostgreSQL
conversations:
  store: sql
  driver: pgx
  dsn: postgres://user:pass@postgres:5432/llm_gateway?sslmode=require
  ttl: 1h
Scaling
Horizontal Pod Autoscaler
Default HPA configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Monitor HPA:
kubectl get hpa -n llm-gateway
kubectl describe hpa llm-gateway -n llm-gateway
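The replica count the HPA aims for follows the rule documented for Kubernetes: desired = ceil(current replicas × current utilization / target utilization), bounded by minReplicas and maxReplicas. Utilization is measured against each container's resource requests, so a 70% CPU target corresponds to an average of 70m per pod when the request is 100m. The sketch below reproduces only that arithmetic for a single metric; it ignores the controller's tolerance band and stabilization windows, and `desired_replicas` is an illustrative name.

```shell
# desired_replicas: illustrative arithmetic only, not the controller itself.
# Arguments: current replicas, current utilization (%), target utilization (%),
#            minReplicas, maxReplicas.
desired_replicas() {
  local current="$1" current_util="$2" target_util="$3" min="$4" max="$5" desired
  # Integer ceiling of current * current_util / target_util.
  desired=$(( (current * current_util + target_util - 1) / target_util ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Usage: 3 pods averaging 90% CPU against the 70% target, bounds 3..20
# desired_replicas 3 90 70 3 20   # prints 4
```

When several metrics are configured, as in the manifest above, the controller computes a proposal per metric and uses the largest one.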
Manual Scaling
# Scale to a specific replica count (if the HPA is deployed, it will adjust this again)
kubectl scale deployment/llm-gateway --replicas=10 -n llm-gateway
# Check status
kubectl get deployment llm-gateway -n llm-gateway
Pod Disruption Budget
Ensures availability during voluntary disruptions such as node drains:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
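For a `minAvailable` budget, the number of pods that may be evicted at a given moment is the count of healthy pods minus `minAvailable`, floored at zero. A small sketch of that arithmetic, with `allowed_disruptions` as an illustrative name:

```shell
# allowed_disruptions: illustrative arithmetic for a minAvailable budget.
# Arguments: healthy pod count, minAvailable.
allowed_disruptions() {
  local healthy="$1" min_available="$2" allowed
  allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}

# Usage: 3 healthy pods (the HPA minimum) with minAvailable: 2
# allowed_disruptions 3 2   # prints 1
#
# The live value is reported by the cluster:
# kubectl get pdb llm-gateway -n llm-gateway -o jsonpath='{.status.disruptionsAllowed}'
```

With the HPA minimum of 3 replicas, this budget lets a node drain evict one gateway pod at a time.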
Updates and Rollbacks
Rolling Updates
# Update image
kubectl set image deployment/llm-gateway \
gateway=ghcr.io/yourusername/llm-gateway:v1.2.3 \
-n llm-gateway
# Watch rollout
kubectl rollout status deployment/llm-gateway -n llm-gateway
# Pause rollout if issues
kubectl rollout pause deployment/llm-gateway -n llm-gateway
# Resume rollout
kubectl rollout resume deployment/llm-gateway -n llm-gateway
Rollback
# Rollback to previous version
kubectl rollout undo deployment/llm-gateway -n llm-gateway
# Rollback to specific revision
kubectl rollout history deployment/llm-gateway -n llm-gateway
kubectl rollout undo deployment/llm-gateway --to-revision=3 -n llm-gateway
Blue-Green Deployment
# Deploy new version with different label
kubectl apply -f deployment-v2.yaml
# Test new version
kubectl port-forward -n llm-gateway deployment/llm-gateway-v2 8080:8080
# Switch service to new version
kubectl patch service llm-gateway -n llm-gateway \
-p '{"spec":{"selector":{"version":"v2"}}}'
# Delete old version after verification
kubectl delete deployment llm-gateway-v1 -n llm-gateway
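The switch step is the risky one, so it can help to gate it on the new Deployment having finished rolling out. The following is a sketch under the same assumptions as the commands above (a `version` label in the Service selector, a Deployment named `llm-gateway-v2`); `switch_service_version` is a hypothetical helper name and the 120s timeout is an arbitrary choice.

```shell
# switch_service_version: hypothetical helper (an assumption, not part of the repo).
# Arguments: namespace, service name, new deployment name, version label value.
switch_service_version() {
  local namespace="$1" service="$2" deployment="$3" version="$4"
  # Leave the Service untouched unless the new Deployment has fully rolled out.
  if ! kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=120s; then
    echo "rollout of $deployment not complete; service left unchanged" >&2
    return 1
  fi
  kubectl patch service "$service" -n "$namespace" \
    -p "{\"spec\":{\"selector\":{\"version\":\"$version\"}}}"
}

# Usage:
# switch_service_version llm-gateway llm-gateway llm-gateway-v2 v2
```

Switching back is the same call with the old Deployment name and `v1`, which is why the old Deployment should stay in place until the new version has been verified.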
Security
Pod Security
The Deployment applies these pod and container security settings:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: gateway
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
Network Policies
Restrict traffic to/from gateway pods:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx # the ingress controller's namespace must carry this label
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to: # Allow DNS
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to: # Allow Redis
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    - to: # Allow external LLM providers (HTTPS)
        - ipBlock: # a namespaceSelector only matches in-cluster pods, not external IPs
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
RBAC
ServiceAccount with minimal permissions:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-gateway
subjects:
  - kind: ServiceAccount
    name: llm-gateway
Cloud Provider Guides
AWS EKS
# Install AWS Load Balancer Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster
# Update ingress for ALB
# Add annotations to ingress.yaml:
metadata:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
IRSA for secrets:
# Create IAM role and associate with ServiceAccount
# (SecretsManagerReadWrite also grants write access; a custom read-only policy is a tighter fit)
eksctl create iamserviceaccount \
  --name llm-gateway \
  --namespace llm-gateway \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
  --approve
ElastiCache Redis:
conversations:
  store: redis
  dsn: redis://my-cluster.cache.amazonaws.com:6379/0
GCP GKE
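# Enable Workload Identity
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog
# Create service account with Secret Manager access
gcloud iam service-accounts create llm-gateway
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:llm-gateway@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/secretmanager.secretAccessor"
# Allow the K8s SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  llm-gateway@PROJECT_ID.iam.gserviceaccount.com \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[llm-gateway/llm-gateway]"
# Annotate the K8s SA with the GCP SA it maps to
kubectl annotate serviceaccount llm-gateway \
  -n llm-gateway \
  iam.gke.io/gcp-service-account=llm-gateway@PROJECT_ID.iam.gserviceaccount.com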
Memorystore Redis:
conversations:
  store: redis
  dsn: redis://10.0.0.3:6379/0 # Private IP from Memorystore
Azure AKS
# Install Application Gateway Ingress Controller
az aks enable-addons \
--resource-group myResourceGroup \
--name myAKSCluster \
--addons ingress-appgw \
--appgw-name myApplicationGateway
# Configure Azure AD Workload Identity
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--enable-oidc-issuer \
--enable-workload-identity
Azure Key Vault with ESO:
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://my-vault.vault.azure.net
Troubleshooting
Pods Not Starting
# Check pod status
kubectl get pods -n llm-gateway
# Describe pod for events
kubectl describe pod llm-gateway-xxx -n llm-gateway
# Check logs
kubectl logs -n llm-gateway llm-gateway-xxx
# Check previous container logs (if crashed)
kubectl logs -n llm-gateway llm-gateway-xxx --previous
Common issues:
- Image pull errors: Check registry credentials
- CrashLoopBackOff: Check logs for startup errors
- Pending: Check resource quotas and node capacity
Health Check Failures
# Port-forward to test locally
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
# Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready
# Check from inside pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- http://localhost:8080/health
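When a pod is still starting, a single curl can fail for reasons that resolve themselves a few seconds later. A minimal polling sketch follows; `wait_for_ready` is a hypothetical helper name, and the default of 30 attempts two seconds apart is an arbitrary assumption rather than a value taken from the probe configuration.

```shell
# wait_for_ready: hypothetical helper (an assumption, not part of the repo).
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
# Returns 0 as soon as the URL answers with a non-error status, 1 otherwise.
wait_for_ready() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP status >= 400.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage, with the port-forward from above running:
# wait_for_ready http://localhost:8080/ready && echo "gateway is ready"
```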
Provider Connection Issues
# Test egress from pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- https://api.openai.com
# Check secrets (this prints the decoded key to the terminal)
kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d
# Verify network policies
kubectl get networkpolicy -n llm-gateway
kubectl describe networkpolicy llm-gateway -n llm-gateway
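The secret check above decodes the key onto the terminal. When the question is only whether the expected keys exist, a variant that never prints values may be preferable. This is a sketch; `missing_secret_keys` is a hypothetical helper name, and it assumes key names made of letters, digits, and underscores so they can be embedded in a jsonpath expression.

```shell
# missing_secret_keys: hypothetical helper (an assumption, not part of the repo).
# Arguments: namespace, secret name, then one or more expected keys.
# Prints "missing: KEY" for each absent or empty key; returns 1 if any is missing.
missing_secret_keys() {
  local namespace="$1" secret="$2" key value missing=0
  shift 2
  for key in "$@"; do
    value=$(kubectl get secret "$secret" -n "$namespace" -o "jsonpath={.data.$key}" 2>/dev/null)
    if [ -z "$value" ]; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# missing_secret_keys llm-gateway llm-gateway-secrets \
#   OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY
```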
Redis Connection Issues
# Test Redis connectivity
kubectl exec -n llm-gateway deployment/llm-gateway -- nc -zv redis 6379
# Connect to Redis
kubectl exec -it -n llm-gateway redis-0 -- redis-cli
# Check Redis logs
kubectl logs -n llm-gateway redis-0
Performance Issues
# Check resource usage
kubectl top pods -n llm-gateway
kubectl top nodes
# Check HPA status
kubectl describe hpa llm-gateway -n llm-gateway
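# Check for throttling (pod events rarely report it; CPU throttling is exposed
# by the cAdvisor metric container_cpu_cfs_throttled_periods_total)
kubectl describe pod llm-gateway-xxx -n llm-gateway | grep -i throttl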
Debug Container
For distroless/minimal images:
# Use ephemeral debug container
kubectl debug -it -n llm-gateway llm-gateway-xxx --image=busybox --target=gateway
# Or use debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -n llm-gateway -- /bin/bash
Useful Commands
# View all resources
kubectl get all -n llm-gateway
# Check deployment status
kubectl rollout status deployment/llm-gateway -n llm-gateway
# Tail logs from all pods
kubectl logs -n llm-gateway -l app=llm-gateway -f --max-log-requests=10
# Get events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
# Check resource quotas
kubectl describe resourcequota -n llm-gateway
# Export current config
kubectl get deployment llm-gateway -n llm-gateway -o yaml > deployment-backup.yaml
# Force pod restart
kubectl rollout restart deployment/llm-gateway -n llm-gateway
# Delete and recreate deployment
kubectl delete deployment llm-gateway -n llm-gateway
kubectl apply -f deployment.yaml
Architecture Overview
┌─────────────────────────────────────────────────┐
│            Internet / Load Balancer             │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Ingress Controller  │
          │      (TLS/SSL)       │
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │   Gateway Service    │
          │    (ClusterIP:80)    │
          └──────────┬───────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
     ┌─────┐      ┌─────┐      ┌─────┐
     │ Pod │      │ Pod │      │ Pod │
     │  1  │      │  2  │      │  3  │
     └──┬──┘      └──┬──┘      └──┬──┘
        │            │            │
        └────────────┼────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
     ┌──────┐     ┌──────┐     ┌──────┐
     │Redis │     │Prom  │     │Tempo │
     └──────┘     └──────┘     └──────┘