latticelm/k8s/README.md
2026-03-06 21:55:42 +00:00


# Kubernetes Deployment Guide
> Production-ready Kubernetes manifests for deploying the LLM Gateway with high availability, monitoring, and security.
## Table of Contents
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Deployment](#deployment)
- [Configuration](#configuration)
- [Secrets Management](#secrets-management)
- [Monitoring](#monitoring)
- [Storage Options](#storage-options)
- [Scaling](#scaling)
- [Updates and Rollbacks](#updates-and-rollbacks)
- [Security](#security)
- [Cloud Provider Guides](#cloud-provider-guides)
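- [Troubleshooting](#troubleshooting)
- [Useful Commands](#useful-commands)
- [Architecture Overview](#architecture-overview)
- [Additional Resources](#additional-resources)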
## Quick Start
Deploy with default settings using pre-built images:
```bash
# Update kustomization.yaml with your image
cd k8s/
vim kustomization.yaml # Set image to ghcr.io/yourusername/llm-gateway:v1.0.0
# Create secrets
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
--from-literal=OPENAI_API_KEY="sk-your-key" \
--from-literal=ANTHROPIC_API_KEY="sk-ant-your-key" \
--from-literal=GOOGLE_API_KEY="your-key" \
-n llm-gateway
# Deploy
kubectl apply -k .
# Verify
kubectl get pods -n llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway
```
## Prerequisites
- **Kubernetes**: v1.24+ cluster
- **kubectl**: Configured and authenticated
- **Container images**: Access to `ghcr.io/yourusername/llm-gateway`

**Optional but recommended:**
- **Prometheus Operator**: For metrics and alerting
- **cert-manager**: For automatic TLS certificates
- **Ingress Controller**: nginx, ALB, or GCE
- **External Secrets Operator**: For secrets management
## Deployment
### Using Kustomize (Recommended)
```bash
# Review and customize
cd k8s/
vim kustomization.yaml # Update image, namespace, etc.
vim configmap.yaml # Configure gateway settings
vim ingress.yaml # Set your domain
# Deploy all resources
kubectl apply -k .
# Or deploy an environment-specific Kustomize overlay instead of the base
kubectl apply -k overlays/production/
```
### Using kubectl
```bash
kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f redis.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f hpa.yaml
kubectl apply -f pdb.yaml
kubectl apply -f networkpolicy.yaml
```
### With Monitoring
If Prometheus Operator is installed:
```bash
kubectl apply -f servicemonitor.yaml
kubectl apply -f prometheusrule.yaml
```
## Configuration
### Image Configuration
Update `kustomization.yaml`:
```yaml
images:
  - name: llm-gateway
    newName: ghcr.io/yourusername/llm-gateway
    newTag: v1.2.3 # Or 'latest', 'main', 'sha-abc123'
```
### Gateway Configuration
Edit `configmap.yaml` for gateway settings:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
data:
  config.yaml: |
    server:
      address: ":8080"
    logging:
      level: info
      format: json
    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20
    observability:
      enabled: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        exporter:
          type: otlp
          endpoint: tempo:4317
    conversations:
      store: redis
      dsn: redis://redis:6379/0
      ttl: 1h
```
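The `rate_limit` keys look like standard token-bucket parameters. `burst` bounds how many requests can pass in a spike, and `requests_per_second` is the sustained refill rate. The sketch below simulates that behavior second by second. It assumes token-bucket semantics, which you should confirm against the gateway source. `simulate_rate_limit` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: simulate a token-bucket limiter (assumed semantics).
# Usage: simulate_rate_limit RPS BURST ARRIVALS_SEC0 [ARRIVALS_SEC1 ...]
# Prints the number of requests allowed in each one-second step.
# Simplification: the bucket starts full and refills once per second;
# real limiters usually refill continuously.
simulate_rate_limit() {
  local rps=$1 burst=$2
  shift 2
  local tokens=$burst arrivals allowed
  local -a results=()
  for arrivals in "$@"; do
    # Admit as many requests as there are tokens in the bucket.
    allowed=$(( arrivals < tokens ? arrivals : tokens ))
    tokens=$(( tokens - allowed ))
    results+=("$allowed")
    # Refill for the next second, capped at the burst size.
    tokens=$(( tokens + rps > burst ? burst : tokens + rps ))
  done
  echo "${results[*]}"
}
```

For example, `simulate_rate_limit 10 20 30 30 30` prints `20 10 10`. The initial spike is cut to the burst size of 20, and throughput then settles at 10 requests per second. The limit may be enforced per pod rather than cluster-wide, which is an assumption to verify. If it is per pod, the effective limit grows with the replica count.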
### Resource Limits
Default resources (adjust based on load testing):
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi
```
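Kubernetes expresses CPU in cores or millicores: `100m` is a tenth of a core and `1000m` is one core. Memory uses binary suffixes such as `Mi` (2^20 bytes). The defaults above therefore request 0.1 CPU and 128 MiB and cap the container at 1 CPU and 512 MiB. When turning load-test measurements into these values, a pair of hypothetical conversion helpers can help. They cover only the common suffixes, not the full Kubernetes quantity syntax (for example, exponent forms).

```bash
# Hypothetical helpers for Kubernetes resource quantities (common suffixes only).

# CPU to millicores: "100m" -> 100, "0.5" -> 500, "1" -> 1000.
# Plain core values are multiplied by 1000 and rounded to the nearest integer.
cpu_millicores() {
  local q=$1
  case $q in
    *m) printf '%s\n' "${q%m}" ;;
    *) awk -v v="$q" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
}

# Memory to bytes: binary suffixes (Ki, Mi, Gi) and decimal suffixes (k, M, G).
# Integer values only.
mem_bytes() {
  local q=$1 n unit
  n=${q%%[A-Za-z]*}   # numeric part, e.g. "128"
  unit=${q#"$n"}      # suffix, e.g. "Mi"
  case $unit in
    "") echo "$n" ;;
    Ki) echo $(( n * 1024 )) ;;
    Mi) echo $(( n * 1024 ** 2 )) ;;
    Gi) echo $(( n * 1024 ** 3 )) ;;
    k) echo $(( n * 1000 )) ;;
    M) echo $(( n * 1000 ** 2 )) ;;
    G) echo $(( n * 1000 ** 3 )) ;;
    *) echo "unsupported suffix: $unit" >&2; return 1 ;;
  esac
}
```

For example, `mem_bytes 128Mi` prints `134217728` and `cpu_millicores 0.5` prints `500`.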
### Ingress Configuration
Edit `ingress.yaml` for your domain:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm-gateway.yourdomain.com
      secretName: llm-gateway-tls
  rules:
    - host: llm-gateway.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80
```
## Secrets Management
### Option 1: kubectl (Development)
```bash
kubectl create secret generic llm-gateway-secrets \
--from-literal=OPENAI_API_KEY="sk-..." \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
--from-literal=GOOGLE_API_KEY="..." \
--from-literal=OIDC_AUDIENCE="your-client-id" \
-n llm-gateway
```
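Typing keys directly into `--from-literal` leaves them in shell history. One alternative is to export them as environment variables first, then check that none are empty before creating the secret. `require_env` below is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: fail (return 1) and list every named variable
# that is unset or empty, so the secret is never created with blank keys.
require_env() {
  local name
  local -a missing=()
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
}
```

1. Read each key without echoing it, for example with `read -rs OPENAI_API_KEY && export OPENAI_API_KEY`.
2. Run `require_env OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY OIDC_AUDIENCE`.
3. Create the secret, passing each value as `--from-literal=OPENAI_API_KEY="$OPENAI_API_KEY"` and so on.

The values are still visible in kubectl's process arguments while it runs, but they no longer end up in your shell history.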
### Option 2: External Secrets Operator (Production)
Install ESO, then create an `ExternalSecret`:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager # or vault, gcpsm, etc.
    kind: ClusterSecretStore
  target:
    name: llm-gateway-secrets
  data:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: llm-gateway/openai-key
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-key
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-key
```
### Option 3: Sealed Secrets
```bash
# Encrypt the secret (by default a SealedSecret only decrypts under the same name and namespace)
echo -n "sk-your-key" | kubectl create secret generic llm-gateway-secrets \
-n llm-gateway --dry-run=client --from-file=OPENAI_API_KEY=/dev/stdin -o yaml | \
kubeseal -o yaml > sealed-secret.yaml
# Commit sealed-secret.yaml to git
kubectl apply -f sealed-secret.yaml
```
## Monitoring
### Metrics
ServiceMonitor for Prometheus Operator:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```
**Available metrics:**
- `gateway_requests_total` - Total requests by provider/model
- `gateway_request_duration_seconds` - Request latency histogram
- `gateway_provider_errors_total` - Errors by provider
- `gateway_circuit_breaker_state` - Circuit breaker state changes
- `gateway_rate_limit_hits_total` - Rate limit violations
### Alerts
PrometheusRule with common alerts:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-alerts
spec:
  groups:
    - name: llm-gateway
      interval: 30s
      rules:
        - alert: HighErrorRate
          # Fraction of all requests that returned 5xx over the last 5 minutes
          expr: |
            sum(rate(gateway_requests_total{status=~"5.."}[5m]))
              / sum(rate(gateway_requests_total[5m])) > 0.05
          for: 5m
          annotations:
            summary: More than 5% of requests are failing with 5xx errors
        - alert: PodDown
          # Requires kube-state-metrics
          expr: kube_deployment_status_replicas_available{deployment="llm-gateway"} < 2
          for: 5m
          annotations:
            summary: Fewer than 2 gateway pods available
```
### Logging
View logs:
```bash
# Tail logs
kubectl logs -n llm-gateway -l app=llm-gateway -f
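# Filter by level (with -l, kubectl returns only the last 10 lines per pod unless --tail=-1 is set)
kubectl logs -n llm-gateway -l app=llm-gateway --tail=-1 | jq -R 'fromjson? | select(.level == "error")'
# Search logs
kubectl logs -n llm-gateway -l app=llm-gateway --tail=-1 | grep "circuit.*open"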
```
### Tracing
Point the gateway's OTLP trace exporter at your collector or tracing backend in `config.yaml`:
```yaml
observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: tempo:4317 # or jaeger-collector:4317
```
## Storage Options
### In-Memory (Default)
Conversations live in each pod's memory. They are lost when the pod restarts and are not shared between replicas:
```yaml
conversations:
  store: memory
```
### Redis (Recommended)
Deploy Redis StatefulSet:
```bash
kubectl apply -f redis.yaml
```
Configure gateway:
```yaml
conversations:
  store: redis
  dsn: redis://redis:6379/0
  ttl: 1h
```
### External Redis
For production, use managed Redis:
```yaml
conversations:
  store: redis
  dsn: redis://:password@redis.example.com:6379/0
  ttl: 1h
```
**Cloud providers:**
- **AWS**: ElastiCache for Redis
- **GCP**: Memorystore for Redis
- **Azure**: Azure Cache for Redis
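If the password in the external Redis DSN above contains reserved characters such as `@`, `:` or `/`, it must be percent-encoded. This assumes the gateway parses DSNs as standard URLs; the `redis://` and `postgres://` forms suggest it does, but confirm against its docs. The same applies to the PostgreSQL DSN below. The following is a minimal ASCII-only sketch using hypothetical `urlencode` and `redis_dsn` helpers.

```bash
# Hypothetical helper: percent-encode every character outside the RFC 3986
# "unreserved" set (A-Z a-z 0-9 - . _ ~). ASCII-only sketch.
urlencode() {
  local LC_ALL=C
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      # "'$c" makes printf emit the character's numeric code, e.g. @ -> %40
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build a password-only Redis DSN (empty username).
# Usage: redis_dsn HOST PORT DB PASSWORD
redis_dsn() {
  printf 'redis://:%s@%s:%s/%s\n' "$(urlencode "$4")" "$1" "$2" "$3"
}
```

For example, `redis_dsn redis.example.com 6379 0 'p@ss'` prints `redis://:p%40ss@redis.example.com:6379/0`.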
### PostgreSQL
```yaml
conversations:
  store: sql
  driver: pgx
  dsn: postgres://user:pass@postgres:5432/llm_gateway?sslmode=require
  ttl: 1h
```
## Scaling
### Horizontal Pod Autoscaler
Default HPA configuration:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
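According to the Kubernetes documentation, the HPA's core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. For `Utilization` targets, the metric is measured against the container's resource *request*, not its limit. With the default 100m request, a 70% target therefore means roughly 70m of CPU per pod. When several metrics are configured, as here with CPU and memory, the HPA uses the largest recommendation. Below is a simplified CPU-only sketch. `hpa_desired_replicas` is a hypothetical helper, and it skips the controller's stabilization windows, scaling policies, and readiness handling.

```bash
# Hypothetical helper: CPU-only HPA recommendation.
# Usage: hpa_desired_replicas CURRENT USAGE_M REQUEST_M TARGET_PCT MIN MAX
#   USAGE_M   average CPU usage per pod, in millicores
#   REQUEST_M CPU request per pod, in millicores (100 for "100m")
# Follows desired = ceil(current * (usage/request) / (target/100))
# with the default 10% tolerance.
hpa_desired_replicas() {
  local current=$1 usage_m=$2 request_m=$3 target_pct=$4 min=$5 max=$6
  # Keep the ratio in integers: ratio = (usage_m * 100) / (request_m * target_pct)
  local num=$(( usage_m * 100 ))
  local den=$(( request_m * target_pct ))
  local diff=$(( num > den ? num - den : den - num ))
  local desired
  if (( diff * 10 <= den )); then
    # Within tolerance (|ratio - 1| <= 0.1): keep the current count.
    desired=$current
  else
    # Integer ceiling of current * num / den.
    desired=$(( (current * num + den - 1) / den ))
  fi
  # Clamp to minReplicas/maxReplicas.
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}
```

For example, three pods averaging 140m make `hpa_desired_replicas 3 140 100 70 3 20` print `6`. An average of 75m keeps the count at 3, because it falls within the 10% tolerance.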
Monitor HPA:
```bash
kubectl get hpa -n llm-gateway
kubectl describe hpa llm-gateway -n llm-gateway
```
### Manual Scaling
```bash
# Scale to a specific replica count (an active HPA will readjust this on its next evaluation)
kubectl scale deployment/llm-gateway --replicas=10 -n llm-gateway
# Check status
kubectl get deployment llm-gateway -n llm-gateway
```
### Pod Disruption Budget
Keeps at least two pods available during voluntary disruptions such as node drains and cluster upgrades:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
```
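Voluntary evictions (for example `kubectl drain`) may remove at most the number of healthy pods minus `minAvailable` at once. At the HPA floor of three replicas, that is one pod at a time. With only two healthy pods, evictions are refused until another pod becomes ready. A hypothetical helper that spells out the arithmetic:

```bash
# Hypothetical helper: evictions a PDB with minAvailable=$2 permits
# when $1 matching pods are currently healthy (never negative).
pdb_allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  echo $(( allowed > 0 ? allowed : 0 ))
}
```

`kubectl get pdb -n llm-gateway` reports the live value in its ALLOWED DISRUPTIONS column.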
## Updates and Rollbacks
### Rolling Updates
```bash
# Update image
kubectl set image deployment/llm-gateway \
gateway=ghcr.io/yourusername/llm-gateway:v1.2.3 \
-n llm-gateway
# Watch rollout
kubectl rollout status deployment/llm-gateway -n llm-gateway
# Pause rollout if issues
kubectl rollout pause deployment/llm-gateway -n llm-gateway
# Resume rollout
kubectl rollout resume deployment/llm-gateway -n llm-gateway
```
### Rollback
```bash
# Rollback to previous version
kubectl rollout undo deployment/llm-gateway -n llm-gateway
# List revisions, then roll back to a specific one
kubectl rollout history deployment/llm-gateway -n llm-gateway
kubectl rollout undo deployment/llm-gateway --to-revision=3 -n llm-gateway
```
### Blue-Green Deployment
```bash
# Deploy the new version as a separate Deployment whose pods carry a version: v2 label
kubectl apply -f deployment-v2.yaml
# Test new version
kubectl port-forward -n llm-gateway deployment/llm-gateway-v2 8080:8080
# Switch service to new version
kubectl patch service llm-gateway -n llm-gateway \
-p '{"spec":{"selector":{"version":"v2"}}}'
# Delete old version after verification
kubectl delete deployment llm-gateway-v1 -n llm-gateway
```
## Security
### Pod Security
The Deployment applies these hardening settings at the pod and container level:
```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: gateway
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
```
### Network Policies
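Restrict traffic to and from the gateway pods. This policy admits ingress only from the ingress controller's namespace. If you use the monitoring setup above, also allow ingress from Prometheus on port 8080 and egress to your tracing backend (OTLP, port 4317):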
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to: # Allow DNS (UDP, plus TCP for large responses)
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to: # Allow Redis
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    - to: # Allow external LLM providers (HTTPS); a namespaceSelector would only match in-cluster pods
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
```
### RBAC
ServiceAccount with minimal permissions:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-gateway
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-gateway
subjects:
  - kind: ServiceAccount
    name: llm-gateway
    namespace: llm-gateway
```
## Cloud Provider Guides
### AWS EKS
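```bash
# Install AWS Load Balancer Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=my-cluster
```
**ALB ingress:** in `ingress.yaml`, change `ingressClassName` from `nginx` to `alb` and add the ALB annotations. If you use the legacy `kubernetes.io/ingress.class` annotation instead, its value must match `spec.ingressClassName`, or the API server rejects the Ingress.
```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
```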
**IRSA for secrets:**
```bash
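# Create IAM role and associate with ServiceAccount
# (SecretsManagerReadWrite is broad; prefer a custom policy with read-only actions such as secretsmanager:GetSecretValue)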
eksctl create iamserviceaccount \
--name llm-gateway \
--namespace llm-gateway \
--cluster my-cluster \
--attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
--approve
```
**ElastiCache Redis:**
```yaml
conversations:
  store: redis
  dsn: redis://my-cluster.cache.amazonaws.com:6379/0
```
### GCP GKE
```bash
# Enable Workload Identity
gcloud container clusters update my-cluster \
--workload-pool=PROJECT_ID.svc.id.goog
# Create service account with Secret Manager access
gcloud iam service-accounts create llm-gateway
gcloud projects add-iam-policy-binding PROJECT_ID \
--member "serviceAccount:llm-gateway@PROJECT_ID.iam.gserviceaccount.com" \
--role "roles/secretmanager.secretAccessor"
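# Allow the K8s ServiceAccount (namespace/name) to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
llm-gateway@PROJECT_ID.iam.gserviceaccount.com \
--role "roles/iam.workloadIdentityUser" \
--member "serviceAccount:PROJECT_ID.svc.id.goog[llm-gateway/llm-gateway]"
# Bind K8s SA to GCP SA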
kubectl annotate serviceaccount llm-gateway \
-n llm-gateway \
iam.gke.io/gcp-service-account=llm-gateway@PROJECT_ID.iam.gserviceaccount.com
```
**Memorystore Redis:**
```yaml
conversations:
  store: redis
  dsn: redis://10.0.0.3:6379/0 # Private IP from Memorystore
```
### Azure AKS
```bash
# Install Application Gateway Ingress Controller
az aks enable-addons \
--resource-group myResourceGroup \
--name myAKSCluster \
--addons ingress-appgw \
--appgw-name myApplicationGateway
# Configure Azure AD Workload Identity
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--enable-oidc-issuer \
--enable-workload-identity
```
**Azure Key Vault with ESO:**
```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://my-vault.vault.azure.net
```
## Troubleshooting
### Pods Not Starting
```bash
# Check pod status
kubectl get pods -n llm-gateway
# Describe pod for events
kubectl describe pod llm-gateway-xxx -n llm-gateway
# Check logs
kubectl logs -n llm-gateway llm-gateway-xxx
# Check previous container logs (if crashed)
kubectl logs -n llm-gateway llm-gateway-xxx --previous
```
**Common issues:**
- Image pull errors: Check registry credentials
- CrashLoopBackOff: Check logs for startup errors
- Pending: Check resource quotas and node capacity
### Health Check Failures
```bash
# Port-forward to test locally
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
# Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready
# Check from inside pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- http://localhost:8080/health
```
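Right after `kubectl port-forward` starts, or while a pod is still warming up, a single `curl` can fail even though the service is fine. A small retry wrapper (fixed delay, no backoff) helps separate a slow start from a real failure. `retry` is a hypothetical helper, not part of kubectl.

```bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

1. Start the port-forward in the background with `kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80 &`.
2. Run `retry 10 1 curl -fsS http://localhost:8080/ready`.

The `-f` flag makes curl exit non-zero on HTTP errors, so a failing readiness check counts as a failed attempt.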
### Provider Connection Issues
```bash
# Test egress from pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- https://api.openai.com
# Check secrets
kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d
# Verify network policies
kubectl get networkpolicy -n llm-gateway
kubectl describe networkpolicy llm-gateway -n llm-gateway
```
### Redis Connection Issues
```bash
# Test Redis connectivity
kubectl exec -n llm-gateway deployment/llm-gateway -- nc -zv redis 6379
# Connect to Redis
kubectl exec -it -n llm-gateway redis-0 -- redis-cli
# Check Redis logs
kubectl logs -n llm-gateway redis-0
```
### Performance Issues
```bash
# Check resource usage
kubectl top pods -n llm-gateway
kubectl top nodes
# Check HPA status
kubectl describe hpa llm-gateway -n llm-gateway
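# Check for CPU throttling (describe does not report it; read nr_throttled from cgroup v2 cpu.stat, needs cat in the image)
kubectl exec -n llm-gateway llm-gateway-xxx -- cat /sys/fs/cgroup/cpu.stat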
```
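On nodes using cgroup v2, each container's `/sys/fs/cgroup/cpu.stat` records two counters. `nr_periods` counts CFS enforcement periods, and `nr_throttled` counts the periods in which the CPU limit was hit. These counters only advance when a CPU limit is set, which the default manifest does (1000m). The hypothetical awk helper below converts them to a throttling percentage.

```bash
# Hypothetical helper: read cgroup v2 cpu.stat text on stdin and print the
# percentage of CFS periods in which the container was throttled.
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"
    }'
}
```

For example, pipe `kubectl exec -n llm-gateway llm-gateway-xxx -- cat /sys/fs/cgroup/cpu.stat` into `throttle_pct`. The counters are cumulative since the container started, so compare readings taken a short interval apart for a current picture. A consistently high percentage suggests the CPU limit is too tight for the load.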
### Debug Container
For distroless/minimal images:
```bash
# Use ephemeral debug container
kubectl debug -it -n llm-gateway llm-gateway-xxx --image=busybox --target=gateway
# Or use debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -n llm-gateway -- /bin/bash
```
## Useful Commands
```bash
# View all resources
kubectl get all -n llm-gateway
# Check deployment status
kubectl rollout status deployment/llm-gateway -n llm-gateway
# Tail logs from all pods
kubectl logs -n llm-gateway -l app=llm-gateway -f --max-log-requests=10
# Get events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
# Check resource quotas
kubectl describe resourcequota -n llm-gateway
# Export current config
kubectl get deployment llm-gateway -n llm-gateway -o yaml > deployment-backup.yaml
# Force pod restart
kubectl rollout restart deployment/llm-gateway -n llm-gateway
# Delete and recreate the deployment (causes downtime; prefer rollout restart)
kubectl delete deployment llm-gateway -n llm-gateway
kubectl apply -f deployment.yaml
```
## Architecture Overview
```
┌─────────────────────────────────────────────────┐
│            Internet / Load Balancer             │
└────────────────────────┬────────────────────────┘
                         ▼
              ┌──────────────────────┐
              │  Ingress Controller  │
              │      (TLS/SSL)       │
              └──────────┬───────────┘
                         ▼
              ┌──────────────────────┐
              │   Gateway Service    │
              │    (ClusterIP:80)    │
              └──────────┬───────────┘
            ┌────────────┼────────────┐
            ▼            ▼            ▼
         ┌─────┐      ┌─────┐      ┌─────┐
         │ Pod │      │ Pod │      │ Pod │
         │  1  │      │  2  │      │  3  │
         └──┬──┘      └──┬──┘      └──┬──┘
            │            │            │
            └────────────┼────────────┘
            ┌────────────┼────────────┐
            ▼            ▼            ▼
         ┌──────┐     ┌──────┐     ┌──────┐
         │Redis │     │ Prom │     │Tempo │
         └──────┘     └──────┘     └──────┘
```
## Additional Resources
- [Main Documentation](../README.md)
- [Docker Deployment](../docs/DOCKER_DEPLOYMENT.md)
- [Kubernetes Best Practices](https://kubernetes.io/docs/concepts/configuration/overview/)
- [Prometheus Operator](https://prometheus-operator.dev/)
- [External Secrets Operator](https://external-secrets.io/)
- [cert-manager](https://cert-manager.io/)