# Kubernetes Deployment Guide
This directory contains Kubernetes manifests for deploying the LLM Gateway to production.
## Prerequisites

- Kubernetes cluster (v1.24+)
- `kubectl` configured
- Container registry access
- (Optional) Prometheus Operator for monitoring
- (Optional) cert-manager for TLS certificates
- (Optional) nginx-ingress-controller or cloud load balancer
## Quick Start

### 1. Build and Push Docker Image

```bash
# Build the image
docker build -t your-registry/llm-gateway:v1.0.0 .

# Push to registry
docker push your-registry/llm-gateway:v1.0.0
```
### 2. Configure Secrets

**Option A: Using kubectl**

```bash
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
  --from-literal=GOOGLE_API_KEY="your-key" \
  --from-literal=ANTHROPIC_API_KEY="your-key" \
  --from-literal=OPENAI_API_KEY="your-key" \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
```

**Option B: Using External Secrets Operator (Recommended)**

- Uncomment the ExternalSecret in `secret.yaml`
- Configure your SecretStore (AWS Secrets Manager, Vault, etc.)
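Once uncommented, the ExternalSecret would look roughly like the sketch below. The SecretStore name `aws-secrets-manager` and the remote key paths are assumptions for illustration, not values from this repo; `secret.yaml` is authoritative.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # assumed SecretStore name
    kind: SecretStore
  target:
    name: llm-gateway-secrets      # Secret created in-cluster
  data:
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-api-key      # assumed remote path
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-api-key   # assumed remote path
```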
### 3. Update Configuration

Edit `configmap.yaml`:

- Update the Redis connection string if using external Redis
- Configure observability endpoints (Tempo, Prometheus)
- Adjust rate limits as needed
- Set the OIDC issuer and audience

Edit `ingress.yaml`:

- Replace `llm-gateway.example.com` with your domain
- Configure TLS certificate annotations

Edit `kustomization.yaml`:

- Update the image registry and tag
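The image override in `kustomization.yaml` typically uses the standard Kustomize `images` field; a minimal excerpt (the `name` value must match whatever image reference `deployment.yaml` actually uses):

```yaml
# kustomization.yaml (excerpt)
images:
  - name: llm-gateway                  # image name as referenced in deployment.yaml
    newName: your-registry/llm-gateway
    newTag: v1.0.0
```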
### 4. Deploy

**Using Kustomize (Recommended):**

```bash
kubectl apply -k k8s/
```

**Using kubectl directly:**

```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml
kubectl apply -f k8s/networkpolicy.yaml
```

**With Prometheus Operator:**

```bash
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/prometheusrule.yaml
```
### 5. Verify Deployment

```bash
# Check pods
kubectl get pods -n llm-gateway

# Check services
kubectl get svc -n llm-gateway

# Check ingress
kubectl get ingress -n llm-gateway

# View logs
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f

# Check health
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
curl http://localhost:8080/health
```
## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Internet/Clients                     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   Ingress Controller                    │
│                (nginx/ALB/GCE with TLS)                 │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                  LLM Gateway Service                    │
│                     (LoadBalancer)                      │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Gateway    │ │   Gateway    │ │   Gateway    │
│    Pod 1     │ │    Pod 2     │ │    Pod 3     │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│    Redis     │ │  Prometheus  │ │    Tempo     │
│ (Persistent) │ │  (Metrics)   │ │   (Traces)   │
└──────────────┘ └──────────────┘ └──────────────┘
```
## Resource Specifications

### Default Resources

- Requests: 100m CPU, 128Mi memory
- Limits: 1000m CPU, 512Mi memory
- Replicas: 3 (min), 20 (max with HPA)

### Scaling

- HPA scales on CPU (70% target) and memory (80% target) utilization
- PodDisruptionBudget keeps at least 2 replicas available during voluntary disruptions
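The targets above correspond to an `autoscaling/v2` HPA along these lines. This is a sketch consistent with the stated 3–20 replica bounds; `hpa.yaml` is the actual manifest.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
  namespace: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out above 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # scale out above 80% memory
```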
## Configuration Options

### Environment Variables (from Secret)

- `GOOGLE_API_KEY`: Google AI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `OPENAI_API_KEY`: OpenAI API key
- `OIDC_AUDIENCE`: OIDC client ID for authentication

### ConfigMap Settings

See `configmap.yaml` for the full set of options:

- Server address
- Logging format and level
- Rate limiting
- Observability (metrics/tracing)
- Provider endpoints
- Conversation storage
- Authentication
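A trimmed illustration of what such a ConfigMap might contain. The key names below are illustrative assumptions, not taken from this repo; check `configmap.yaml` for the real ones.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
  namespace: llm-gateway
data:
  # Illustrative key names -- configmap.yaml is authoritative
  SERVER_ADDR: ":8080"
  LOG_FORMAT: "json"
  LOG_LEVEL: "info"
  RATE_LIMIT_RPS: "50"
  REDIS_URL: "redis://redis:6379"
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://tempo:4317"
```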
## Security

### Security Features

- Non-root container execution (UID 1000)
- Read-only root filesystem
- No privilege escalation
- All capabilities dropped
- Network policies for ingress/egress control
- Seccomp profile: RuntimeDefault
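In the Deployment, these hardening settings map onto pod- and container-level `securityContext` fields roughly as follows (standard Kubernetes fields; see `deployment.yaml` for the real values):

```yaml
# Deployment pod template (excerpt)
securityContext:                   # pod-level
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: gateway
    securityContext:               # container-level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```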
### TLS/HTTPS

- Ingress configured with TLS
- cert-manager for automatic certificate provisioning
- Force-SSL redirect enabled
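With nginx-ingress and cert-manager, the relevant parts of the Ingress typically look like this. The ClusterIssuer name `letsencrypt-prod` and the TLS secret name are assumptions; compare with `ingress.yaml`.

```yaml
# ingress.yaml (excerpt)
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod         # assumed issuer name
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - llm-gateway.example.com
      secretName: llm-gateway-tls                            # assumed secret name
```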
### Secrets Management

**Never commit secrets to git!**

Production options:

1. **External Secrets Operator** (Recommended)
   - AWS Secrets Manager
   - HashiCorp Vault
   - Google Secret Manager
2. **Sealed Secrets**
   - Encrypted secrets stored in git
3. **Manual kubectl secrets**
   - Created outside of git
## Monitoring

### Metrics

- Exposed on the `/metrics` endpoint
- Scraped by Prometheus via a ServiceMonitor
- Key metrics:
  - HTTP request rate, latency, errors
  - Provider request rate, latency, token usage
  - Conversation store operations
  - Rate limiting hits
### Alerts

See `prometheusrule.yaml` for the configured alerts:

- High error rate
- High latency
- Provider failures
- Pod down
- High memory usage
- Rate limit threshold exceeded
- Conversation store errors
### Logs

Structured JSON logs with:

- Request IDs
- Trace context (trace_id, span_id)
- Log levels (debug/info/warn/error)

View logs:

```bash
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f
```
## Maintenance

### Rolling Updates

```bash
# Update image
kubectl set image deployment/llm-gateway gateway=your-registry/llm-gateway:v1.0.1 -n llm-gateway

# Check rollout status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Rollback if needed
kubectl rollout undo deployment/llm-gateway -n llm-gateway
```
### Scaling

```bash
# Manual scale
kubectl scale deployment/llm-gateway --replicas=5 -n llm-gateway

# HPA will auto-scale within min/max bounds (3-20)
```
### Configuration Updates

```bash
# Edit ConfigMap
kubectl edit configmap llm-gateway-config -n llm-gateway

# Restart pods to pick up changes
kubectl rollout restart deployment/llm-gateway -n llm-gateway
```
### Debugging

```bash
# Exec into a pod
kubectl exec -it -n llm-gateway deployment/llm-gateway -- /bin/sh

# Port-forward for local access
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Check events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
```
## Production Considerations

### High Availability

- Minimum 3 replicas spread across availability zones
- Pod anti-affinity rules spread pods across nodes
- PodDisruptionBudget keeps the service available during voluntary disruptions
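Node-level spreading of this kind is commonly expressed as pod anti-affinity in the Deployment; a minimal sketch (see `deployment.yaml` for what this repo actually configures):

```yaml
# Deployment pod template (excerpt)
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # prefer one gateway pod per node
          labelSelector:
            matchLabels:
              app: llm-gateway
```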
### Performance

- Adjust resource limits based on load testing
- Tune HPA thresholds to match traffic patterns
- Use node affinity for GPU nodes if needed
### Cost Optimization

- Use spot/preemptible instances for non-critical workloads
- Set appropriate resource requests/limits
- Monitor token usage and implement quotas

### Disaster Recovery

- Redis persistence (if using a StatefulSet)
- Regular backups of conversation data
- Multi-region deployment for geo-redundancy
- Documented runbooks for incident response
## Cloud-Specific Notes

### AWS EKS

- Use the AWS Load Balancer Controller for ALB
- Configure IRSA for the service account
- Use ElastiCache for Redis
- Store secrets in AWS Secrets Manager

### GCP GKE

- Use GKE Ingress for GCLB
- Configure Workload Identity
- Use Memorystore for Redis
- Store secrets in Google Secret Manager

### Azure AKS

- Use the Azure Application Gateway Ingress Controller
- Configure Azure AD Workload Identity
- Use Azure Cache for Redis
- Store secrets in Azure Key Vault
## Troubleshooting

### Common Issues

**Pods not starting:**

```bash
kubectl describe pod -n llm-gateway -l app=llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway --previous
```

**Health check failures:**

```bash
kubectl port-forward -n llm-gateway deployment/llm-gateway 8080:8080
curl http://localhost:8080/health
curl http://localhost:8080/ready
```
**Provider connection issues:**

- Verify API keys in secrets
- Check that network policies allow egress
- Verify provider endpoints are accessible
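If egress is being blocked, compare `networkpolicy.yaml` against a rule like the sketch below, which permits DNS plus outbound HTTPS to provider APIs. This is an assumption about what the policy should allow, not a copy of the repo's actual policy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-provider-egress
  namespace: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes: ["Egress"]
  egress:
    - ports:
        - protocol: UDP
          port: 53       # DNS resolution
    - ports:
        - protocol: TCP
          port: 443      # HTTPS to LLM provider endpoints
```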
**Redis connection issues:**

```bash
kubectl exec -it -n llm-gateway redis-0 -- redis-cli ping
```