# Kubernetes Deployment Guide

This directory contains Kubernetes manifests for deploying the LLM Gateway to production.

## Prerequisites

- Kubernetes cluster (v1.24+)
- `kubectl` configured
- Container registry access
- (Optional) Prometheus Operator for monitoring
- (Optional) cert-manager for TLS certificates
- (Optional) nginx-ingress-controller or cloud load balancer

## Quick Start

### 1. Build and Push Docker Image

```bash
# Build the image
docker build -t your-registry/llm-gateway:v1.0.0 .

# Push to registry
docker push your-registry/llm-gateway:v1.0.0
```

### 2. Configure Secrets

**Option A: Using kubectl**
```bash
kubectl create namespace llm-gateway

kubectl create secret generic llm-gateway-secrets \
  --from-literal=GOOGLE_API_KEY="your-key" \
  --from-literal=ANTHROPIC_API_KEY="your-key" \
  --from-literal=OPENAI_API_KEY="your-key" \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
```

**Option B: Using External Secrets Operator (Recommended)**
- Uncomment the ExternalSecret in `secret.yaml`
- Configure your SecretStore (AWS Secrets Manager, Vault, etc.)
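
For orientation, an ExternalSecret for Option B might look roughly like the sketch below. It is not the content of `secret.yaml`: the store name, the remote key paths, and the refresh interval are assumptions, and the `apiVersion` depends on the installed operator release (recent releases also serve `external-secrets.io/v1`).

```yaml
# Hypothetical ExternalSecret; only the four secret keys come from this guide.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h                 # assumed sync interval
  secretStoreRef:
    kind: ClusterSecretStore
    name: my-secret-store             # assumed store name
  target:
    name: llm-gateway-secrets         # Kubernetes Secret created by the operator
  data:
    - secretKey: GOOGLE_API_KEY
      remoteRef:
        key: llm-gateway/google-api-key      # assumed path in the backing store
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: llm-gateway/anthropic-api-key   # assumed path
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: llm-gateway/openai-api-key      # assumed path
    - secretKey: OIDC_AUDIENCE
      remoteRef:
        key: llm-gateway/oidc-audience       # assumed path
```

The resulting Secret has the same name and keys as the one created in Option A, so the Deployment should not need to know which option was used.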

### 3. Update Configuration

Edit `configmap.yaml`:
- Update Redis connection string if using external Redis
- Configure observability endpoints (Tempo, Prometheus)
- Adjust rate limits as needed
- Set OIDC issuer and audience

Edit `ingress.yaml`:
- Replace `llm-gateway.example.com` with your domain
- Configure TLS certificate annotations

Edit `kustomization.yaml`:
- Update image registry and tag
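
As a rough illustration of the image override, Kustomize's `images` transformer can rewrite the registry and tag without touching `deployment.yaml`. The fragment below is a sketch to merge into the existing `kustomization.yaml`; the `name` value is an assumption and has to equal the image name that `deployment.yaml` actually references.

```yaml
# Fragment of kustomization.yaml (sketch, not the shipped file)
images:
  - name: llm-gateway                      # assumed image name used in deployment.yaml
    newName: your-registry/llm-gateway     # registry/repository pushed in step 1
    newTag: v1.0.0
```

Running `kubectl kustomize k8s/` prints the rendered manifests, which is a cheap way to confirm the override took effect before applying.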

### 4. Deploy

**Using Kustomize (Recommended):**
```bash
kubectl apply -k k8s/
```

**Using kubectl directly:**
```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml
kubectl apply -f k8s/networkpolicy.yaml
```

**With Prometheus Operator:**
```bash
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/prometheusrule.yaml
```

### 5. Verify Deployment

```bash
# Check pods
kubectl get pods -n llm-gateway

# Check services
kubectl get svc -n llm-gateway

# Check ingress
kubectl get ingress -n llm-gateway

# View logs
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f

# Check health
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
curl http://localhost:8080/health
```

## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Internet/Clients                     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   Ingress Controller                    │
│                (nginx/ALB/GCE with TLS)                 │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   LLM Gateway Service                   │
│                     (LoadBalancer)                      │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Gateway    │ │   Gateway    │ │   Gateway    │
│    Pod 1     │ │    Pod 2     │ │    Pod 3     │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│    Redis     │ │  Prometheus  │ │    Tempo     │
│ (Persistent) │ │  (Metrics)   │ │   (Traces)   │
└──────────────┘ └──────────────┘ └──────────────┘
```

## Resource Specifications

### Default Resources
- **Requests**: 100m CPU, 128Mi memory
- **Limits**: 1000m CPU, 512Mi memory
- **Replicas**: 3 (min), 20 (max with HPA)
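
Expressed in a Deployment, the requests and limits above would look roughly like this fragment. It is a sketch rather than a copy of `deployment.yaml`; the container name `gateway` is taken from the `kubectl set image` example in the Maintenance section.

```yaml
# Fragment of a Deployment pod template (sketch)
spec:
  template:
    spec:
      containers:
        - name: gateway
          resources:
            requests:
              cpu: 100m        # used by the scheduler and as the HPA utilization baseline
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi    # the container is OOM-killed above this
```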

### Scaling
- HPA scales on CPU (70%) and memory (80%) utilization targets
- PodDisruptionBudget keeps at least 2 replicas available during voluntary disruptions (node drains, cluster upgrades)
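
The scaling rules above map onto two small objects. The sketch below assumes the Deployment is named `llm-gateway` and its pods carry the label `app=llm-gateway` (both appear in the commands in this guide); the object names themselves are assumptions, and `hpa.yaml` and `pdb.yaml` remain the source of truth.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway            # assumed object name
  namespace: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the memory request
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway            # assumed object name
  namespace: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
```

Utilization targets are measured against the container requests, so changing the requests also shifts the point at which the HPA scales.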

## Configuration Options

### Environment Variables (from Secret)
- `GOOGLE_API_KEY`: Google AI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `OPENAI_API_KEY`: OpenAI API key
- `OIDC_AUDIENCE`: OIDC client ID for authentication

### ConfigMap Settings
See `configmap.yaml` for full configuration options:
- Server address
- Logging format and level
- Rate limiting
- Observability (metrics/tracing)
- Provider endpoints
- Conversation storage
- Authentication

## Security

### Security Features
- Non-root container execution (UID 1000)
- Read-only root filesystem
- No privilege escalation
- All capabilities dropped
- Network policies for ingress/egress control
- SeccompProfile: RuntimeDefault
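
Most of these features correspond to `securityContext` fields on the pod and container. The fragment below is a sketch of how they are typically declared; whether `deployment.yaml` splits them between pod level and container level in exactly this way is an assumption.

```yaml
# Fragment of a Deployment pod template (sketch)
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: gateway
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
```

With a read-only root filesystem, any path the process writes to (for example `/tmp`) needs an `emptyDir` or other volume mounted over it.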

### TLS/HTTPS
- Ingress configured with TLS
- Uses cert-manager for automatic certificate provisioning
- Force SSL redirect enabled
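
A minimal sketch of an Ingress with these three properties follows, assuming the nginx ingress controller and a cert-manager `ClusterIssuer`. The issuer name, TLS secret name, and ingress class are assumptions; the host, Service name, and port 80 come from elsewhere in this guide.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  namespace: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod          # assumed ClusterIssuer name
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx          # assumed ingress class
  tls:
    - hosts:
        - llm-gateway.example.com
      secretName: llm-gateway-tls  # assumed; cert-manager writes the certificate here
  rules:
    - host: llm-gateway.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80
```

The `nginx.ingress.kubernetes.io/*` annotation only applies to the nginx controller; ALB and GCE ingress controllers use their own annotations for redirects.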

### Secrets Management
**Never commit secrets to git!**

Production options:
1. **External Secrets Operator** (Recommended)
   - AWS Secrets Manager
   - HashiCorp Vault
   - Google Secret Manager

2. **Sealed Secrets**
   - Encrypted secrets in git

3. **Manual kubectl secrets**
   - Created outside of git

## Monitoring

### Metrics
- Exposed on `/metrics` endpoint
- Scraped by Prometheus via ServiceMonitor
- Key metrics:
  - HTTP request rate, latency, errors
  - Provider request rate, latency, token usage
  - Conversation store operations
  - Rate limiting hits
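
For reference, a ServiceMonitor that scrapes `/metrics` could be sketched as below. The Service label, the port name `http`, and the scrape interval are assumptions; check them against `service.yaml` and `servicemonitor.yaml`.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
  namespace: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway     # assumed; must match labels on the Service, not the pods
  endpoints:
    - port: http           # assumed name of the Service port
      path: /metrics
      interval: 30s        # assumed scrape interval
```

If no targets appear in Prometheus, the usual causes are a Service port without a name or a Prometheus instance whose `serviceMonitorSelector` does not match this object's labels.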

### Alerts
See `prometheusrule.yaml` for configured alerts:
- High error rate
- High latency
- Provider failures
- Pod down
- High memory usage
- Rate limit threshold exceeded
- Conversation store errors
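
To show the shape of such a rule, here is a sketch of a "pod down" style alert. It deliberately uses a kube-state-metrics series rather than a gateway metric, because the gateway's own metric names are not listed in this guide; the alert name, threshold, and duration are assumptions and do not reproduce `prometheusrule.yaml`.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-example        # hypothetical name
  namespace: llm-gateway
spec:
  groups:
    - name: llm-gateway.availability
      rules:
        - alert: LLMGatewayReplicasUnavailable   # hypothetical alert name
          # Requires kube-state-metrics to be installed in the cluster.
          expr: |
            kube_deployment_status_replicas_available{namespace="llm-gateway", deployment="llm-gateway"} < 2
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Fewer than 2 llm-gateway replicas have been available for 5 minutes
```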

### Logs
Structured JSON logs with:
- Request IDs
- Trace context (trace_id, span_id)
- Log levels (debug/info/warn/error)

View logs:
```bash
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f
```

## Maintenance

### Rolling Updates
```bash
# Update image
kubectl set image deployment/llm-gateway gateway=your-registry/llm-gateway:v1.0.1 -n llm-gateway

# Check rollout status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Rollback if needed
kubectl rollout undo deployment/llm-gateway -n llm-gateway
```
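
How a rolling update proceeds is controlled by the Deployment's update strategy. The fragment below is a sketch of a conservative setting; the values are assumptions and may differ from what `deployment.yaml` ships.

```yaml
# Fragment of a Deployment spec (sketch)
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one extra pod at a time
      maxUnavailable: 0    # never drop below the desired replica count during a rollout
```

With `maxUnavailable: 0`, an old pod is only removed after its replacement reports ready, so a broken image stalls the rollout instead of reducing capacity.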

### Scaling
```bash
# Manual scale
kubectl scale deployment/llm-gateway --replicas=5 -n llm-gateway

# The HPA keeps adjusting replicas within its min/max bounds (3-20),
# so a manually set count may be overridden
```

### Configuration Updates
```bash
# Edit ConfigMap
kubectl edit configmap llm-gateway-config -n llm-gateway

# Restart pods to pick up changes
kubectl rollout restart deployment/llm-gateway -n llm-gateway
```

### Debugging
```bash
# Exec into pod
kubectl exec -it -n llm-gateway deployment/llm-gateway -- /bin/sh

# Port forward for local access
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Check events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
```

## Production Considerations

### High Availability
- Minimum 3 replicas across availability zones
- Pod anti-affinity rules spread pods across nodes
- PodDisruptionBudget keeps the service available during voluntary disruptions such as node drains
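
Spreading across nodes and zones is usually expressed with anti-affinity and topology spread constraints. The fragment below is a sketch using the well-known `kubernetes.io/hostname` and `topology.kubernetes.io/zone` node labels; the soft (preferred) mode and the weights are assumptions, not necessarily what `deployment.yaml` uses.

```yaml
# Fragment of a Deployment pod template (sketch)
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname      # prefer one gateway pod per node
                labelSelector:
                  matchLabels:
                    app: llm-gateway
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone       # balance pods across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: llm-gateway
```

Soft rules keep pods schedulable when a zone is short on capacity; switching to `requiredDuringSchedulingIgnoredDuringExecution` or `DoNotSchedule` trades that flexibility for a stricter guarantee.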

### Performance
- Adjust resource limits based on load testing
- Configure HPA thresholds based on traffic patterns
- Use node affinity for GPU nodes if needed

### Cost Optimization
- Use spot/preemptible instances for non-critical workloads
- Set appropriate resource requests/limits
- Monitor token usage and implement quotas

### Disaster Recovery
- Redis persistence (if using StatefulSet)
- Regular backups of conversation data
- Multi-region deployment for geo-redundancy
- Document runbooks for incident response

## Cloud-Specific Notes

### AWS EKS
- Use AWS Load Balancer Controller for ALB
- Configure IRSA for service account
- Use ElastiCache for Redis
- Store secrets in AWS Secrets Manager

### GCP GKE
- Use GKE Ingress for GCLB
- Configure Workload Identity
- Use Memorystore for Redis
- Store secrets in Google Secret Manager

### Azure AKS
- Use Azure Application Gateway Ingress Controller
- Configure Azure AD Workload Identity
- Use Azure Cache for Redis
- Store secrets in Azure Key Vault
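
On all three clouds, workload identity is wired up through an annotation on the ServiceAccount. The sketch below shows the EKS form with the GKE and AKS equivalents as commented alternatives; the ServiceAccount name and every identifier value (role ARN, Google service account, client ID) are placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway              # assumed name; see serviceaccount.yaml
  namespace: llm-gateway
  annotations:
    # AWS EKS (IRSA): IAM role assumed by the pods (placeholder ARN)
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/llm-gateway
    # GCP GKE (Workload Identity): use instead of the annotation above
    # iam.gke.io/gcp-service-account: llm-gateway@my-project.iam.gserviceaccount.com
    # Azure AKS (Azure AD Workload Identity): use instead of the annotations above
    # azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
```

On AKS the pods additionally need the label `azure.workload.identity/use: "true"` for the identity webhook to inject credentials.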

## Troubleshooting

### Common Issues

**Pods not starting:**
```bash
kubectl describe pod -n llm-gateway -l app=llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway --previous
```

**Health check failures:**
```bash
kubectl port-forward -n llm-gateway deployment/llm-gateway 8080:8080
curl http://localhost:8080/health
curl http://localhost:8080/ready
```
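
When comparing these responses with what the kubelet sees, it helps to know how the probes are declared. The fragment below is a sketch that assumes `/health` backs the liveness probe and `/ready` backs the readiness probe on container port 8080; the timing values are assumptions.

```yaml
# Fragment of the gateway container spec (sketch)
containers:
  - name: gateway
    ports:
      - name: http
        containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health
        port: http
      initialDelaySeconds: 10   # assumed
      periodSeconds: 10         # assumed
    readinessProbe:
      httpGet:
        path: /ready
        port: http
      periodSeconds: 5          # assumed
```

A failing readiness probe removes the pod from Service endpoints without restarting it, while a failing liveness probe restarts the container, so the two endpoints point to different symptoms.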

**Provider connection issues:**
- Verify API keys in secrets
- Check network policies allow egress
- Verify provider endpoints are accessible
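
For comparison when reviewing `networkpolicy.yaml`, an egress policy that lets gateway pods resolve DNS and reach provider APIs over HTTPS could be sketched as follows. The policy name is hypothetical, and the shipped policy may be stricter, for example by restricting destinations.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway-egress-example   # hypothetical name
  namespace: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
    - Egress
  egress:
    # DNS lookups (no `to` clause: any destination on these ports)
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # HTTPS to provider APIs
    - ports:
        - protocol: TCP
          port: 443
```

Once a pod is selected by an egress policy, anything not listed is blocked, so traffic to Redis and to tracing or metrics backends needs its own rules.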

**Redis connection issues:**
```bash
kubectl exec -it -n llm-gateway redis-0 -- redis-cli ping
```

## Additional Resources

- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
- [cert-manager](https://cert-manager.io/)
- [External Secrets Operator](https://external-secrets.io/)