Add Dockerfile and Manifests
352
k8s/README.md
Normal file
@@ -0,0 +1,352 @@
# Kubernetes Deployment Guide

This directory contains Kubernetes manifests for deploying the LLM Gateway to production.

## Prerequisites

- Kubernetes cluster (v1.24+)
- `kubectl` configured
- Container registry access
- (Optional) Prometheus Operator for monitoring
- (Optional) cert-manager for TLS certificates
- (Optional) nginx-ingress-controller or cloud load balancer

## Quick Start

### 1. Build and Push Docker Image

```bash
# Build the image
docker build -t your-registry/llm-gateway:v1.0.0 .

# Push to registry
docker push your-registry/llm-gateway:v1.0.0
```

### 2. Configure Secrets

**Option A: Using kubectl**
```bash
kubectl create namespace llm-gateway

kubectl create secret generic llm-gateway-secrets \
  --from-literal=GOOGLE_API_KEY="your-key" \
  --from-literal=ANTHROPIC_API_KEY="your-key" \
  --from-literal=OPENAI_API_KEY="your-key" \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
```

**Option B: Using External Secrets Operator (Recommended)**
- Uncomment the ExternalSecret in `secret.yaml`
- Configure your SecretStore (AWS Secrets Manager, Vault, etc.)
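
The commented-out ExternalSecret in `secret.yaml` refers to a SecretStore named `aws-secrets-manager`, which is not included in these manifests. A minimal sketch of such a store for AWS Secrets Manager might look like the following. The region and the choice to authenticate through the `llm-gateway` service account (which assumes IRSA is already set up) are assumptions, so check the External Secrets Operator documentation for your provider before using it.

```yaml
# Hypothetical SecretStore; adjust region and auth to your environment
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager   # must match secretStoreRef.name in the ExternalSecret
  namespace: llm-gateway
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1       # assumption: replace with your region
      auth:
        jwt:
          serviceAccountRef:
            name: llm-gateway # assumption: service account bound to an IAM role
```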

### 3. Update Configuration

Edit `configmap.yaml`:
- Update the Redis connection string (`conversations.dsn`) if using external Redis
- Configure observability endpoints (Tempo, Prometheus)
- Adjust rate limits as needed
- Set the OIDC issuer (`auth.issuer`); the audience is read from the `OIDC_AUDIENCE` secret value
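
As an illustration of the edits listed above, the relevant part of `config.yaml` inside `configmap.yaml` could be changed along these lines. The Redis host and the issuer URL are hypothetical placeholders; the keys themselves are the ones already present in `configmap.yaml`.

```yaml
# Fragment of data."config.yaml" in configmap.yaml
conversations:
  store: "redis"
  ttl: "1h"
  # Hypothetical external Redis endpoint; replace with your own
  dsn: "redis://my-redis.example.internal:6379/0"

auth:
  enabled: true
  issuer: "https://login.example.com"   # hypothetical OIDC issuer
  audience: "${OIDC_AUDIENCE}"          # still injected from the Secret
```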

Edit `ingress.yaml`:
- Replace `llm-gateway.example.com` with your domain
- Configure TLS certificate annotations

Edit `kustomization.yaml`:
- Update image registry and tag
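
For example, to deploy the image built in step 1, the `images` entry in `kustomization.yaml` could be set as sketched below. The registry name is the same placeholder used above, and pinning `v1.0.0` instead of `latest` is a suggestion rather than something the manifests require.

```yaml
# Fragment of kustomization.yaml
images:
  - name: llm-gateway                    # image name used in deployment.yaml
    newName: your-registry/llm-gateway   # placeholder registry
    newTag: v1.0.0                       # tag pushed in step 1
```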

### 4. Deploy

**Using Kustomize (Recommended):**
```bash
kubectl apply -k k8s/
```

`kustomization.yaml` also lists `servicemonitor.yaml` and `prometheusrule.yaml`, so this command needs the Prometheus Operator CRDs; remove those two entries if the operator is not installed.

**Using kubectl directly:**
```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/pdb.yaml
kubectl apply -f k8s/networkpolicy.yaml
```

**With Prometheus Operator:**
```bash
kubectl apply -f k8s/servicemonitor.yaml
kubectl apply -f k8s/prometheusrule.yaml
```

### 5. Verify Deployment

```bash
# Check pods
kubectl get pods -n llm-gateway

# Check services
kubectl get svc -n llm-gateway

# Check ingress
kubectl get ingress -n llm-gateway

# View logs
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f

# Check health (port-forward blocks, so run curl from a second terminal)
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80
curl http://localhost:8080/health
```

## Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Internet/Clients                     │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   Ingress Controller                    │
│                (nginx/ALB/GCE with TLS)                 │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   LLM Gateway Service                   │
│                     (LoadBalancer)                      │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Gateway    │ │   Gateway    │ │   Gateway    │
│    Pod 1     │ │    Pod 2     │ │    Pod 3     │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│    Redis     │ │  Prometheus  │ │    Tempo     │
│ (Persistent) │ │  (Metrics)   │ │   (Traces)   │
└──────────────┘ └──────────────┘ └──────────────┘
```

## Resource Specifications

### Default Resources
- **Requests**: 100m CPU, 128Mi memory
- **Limits**: 1000m CPU, 512Mi memory
- **Replicas**: 3 (min), 20 (max with HPA)

### Scaling
- HPA scales on average CPU utilization (target 70%) and memory utilization (target 80%), both measured against the resource requests
- PodDisruptionBudget keeps at least 2 pods available during voluntary disruptions

## Configuration Options

### Environment Variables (from Secret)
- `GOOGLE_API_KEY`: Google AI API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `OPENAI_API_KEY`: OpenAI API key
- `OIDC_AUDIENCE`: OIDC client ID for authentication

### ConfigMap Settings
See `configmap.yaml` for full configuration options:
- Server address
- Logging format and level
- Rate limiting
- Observability (metrics/tracing)
- Provider endpoints
- Conversation storage
- Authentication

## Security

### Security Features
- Non-root container execution (UID 1000)
- Read-only root filesystem
- No privilege escalation
- All capabilities dropped
- Network policies for ingress/egress control
- SeccompProfile: RuntimeDefault

### TLS/HTTPS
- Ingress configured with TLS
- Uses cert-manager for automatic certificate provisioning
- Force SSL redirect enabled
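
`ingress.yaml` points cert-manager at a ClusterIssuer called `letsencrypt-prod`, which these manifests do not define. A minimal sketch of one, assuming Let's Encrypt with an HTTP-01 solver served through the nginx ingress class, is shown below. The contact email and the account key secret name are placeholders.

```yaml
# Hypothetical ClusterIssuer matching the cert-manager.io/cluster-issuer annotation
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                 # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # placeholder secret for the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx
```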

### Secrets Management
**Never commit secrets to git!**

Production options:
1. **External Secrets Operator** (Recommended)
   - AWS Secrets Manager
   - HashiCorp Vault
   - Google Secret Manager

2. **Sealed Secrets**
   - Encrypted secrets in git

3. **Manual kubectl secrets**
   - Created outside of git

## Monitoring

### Metrics
- Exposed on `/metrics` endpoint
- Scraped by Prometheus via ServiceMonitor
- Key metrics:
  - HTTP request rate, latency, errors
  - Provider request rate, latency, token usage
  - Conversation store operations
  - Rate limiting hits

### Alerts
See `prometheusrule.yaml` for configured alerts:
- High error rate (more than 5% of requests returning 5xx for 5 minutes)
- High latency (P95 above 10s for 5 minutes)
- Provider failures (per-provider error rate above 10% for 5 minutes)
- Pod down (scrape target down for 2 minutes)
- High memory usage (above 85% of the memory limit for 5 minutes)
- Rate limit threshold exceeded (more than 20% of requests returning 429 for 10 minutes)
- Conversation store errors (error rate above 5% for 5 minutes)

### Logs
Structured JSON logs with:
- Request IDs
- Trace context (trace_id, span_id)
- Log levels (debug/info/warn/error)

View logs:
```bash
kubectl logs -n llm-gateway -l app=llm-gateway --tail=100 -f
```

## Maintenance

### Rolling Updates
```bash
# Update image
kubectl set image deployment/llm-gateway gateway=your-registry/llm-gateway:v1.0.1 -n llm-gateway

# Check rollout status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Rollback if needed
kubectl rollout undo deployment/llm-gateway -n llm-gateway
```

### Scaling
```bash
# Manual scale (while the HPA is active it will readjust the replica count)
kubectl scale deployment/llm-gateway --replicas=5 -n llm-gateway

# HPA will auto-scale within min/max bounds (3-20)
```

### Configuration Updates
```bash
# Edit ConfigMap
kubectl edit configmap llm-gateway-config -n llm-gateway

# Restart pods to pick up changes
kubectl rollout restart deployment/llm-gateway -n llm-gateway
```

### Debugging
```bash
# Exec into pod
kubectl exec -it -n llm-gateway deployment/llm-gateway -- /bin/sh

# Port forward for local access
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Check events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'
```

## Production Considerations

### High Availability
- Minimum 3 replicas across availability zones
- Pod anti-affinity rules (preferred, not required) spread pods across nodes
- PodDisruptionBudget ensures service availability during voluntary disruptions such as node drains
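
The anti-affinity rule in `deployment.yaml` uses `kubernetes.io/hostname` as its topology key, so it spreads pods across nodes rather than zones. One way to also spread across availability zones is a topology spread constraint, sketched below as an assumption rather than part of the shipped manifests; it would go under `spec.template.spec` of the Deployment.

```yaml
# Hypothetical addition to spec.template.spec in deployment.yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # standard zone label set by cloud providers
    whenUnsatisfiable: ScheduleAnyway          # soft constraint; use DoNotSchedule to enforce
    labelSelector:
      matchLabels:
        app: llm-gateway
```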

### Performance
- Adjust resource limits based on load testing
- Configure HPA thresholds based on traffic patterns
- Use node affinity for GPU nodes if needed

### Cost Optimization
- Use spot/preemptible instances for non-critical workloads
- Set appropriate resource requests/limits
- Monitor token usage and implement quotas

### Disaster Recovery
- Redis persistence: the bundled `redis.yaml` StatefulSet mounts a 10Gi volume but disables both RDB snapshots and AOF (`save ""`, `appendonly no`), so enable one of them if conversation data must survive restarts
- Regular backups of conversation data
- Multi-region deployment for geo-redundancy
- Document runbooks for incident response

## Cloud-Specific Notes

### AWS EKS
- Use AWS Load Balancer Controller for ALB
- Configure IRSA for service account
- Use ElastiCache for Redis
- Store secrets in AWS Secrets Manager
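
With IRSA, the IAM role is attached through an annotation on the service account that the Deployment uses (`llm-gateway`). A sketch of that annotation follows; the account ID and role name are made-up placeholders, and the role's trust policy still has to be configured on the AWS side.

```yaml
# Hypothetical ServiceAccount annotated for IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
  namespace: llm-gateway
  annotations:
    # Placeholder ARN; replace with the role created for this workload
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/llm-gateway
```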

### GCP GKE
- Use GKE Ingress for GCLB
- Configure Workload Identity
- Use Memorystore for Redis
- Store secrets in Google Secret Manager
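
On GKE, Workload Identity is commonly wired up by annotating the Kubernetes service account with the Google service account it should impersonate. The sketch below assumes the `llm-gateway` service account from `deployment.yaml`; the Google service account name and project ID are placeholders, and the IAM binding between the two accounts must be created separately.

```yaml
# Hypothetical ServiceAccount annotated for GKE Workload Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
  namespace: llm-gateway
  annotations:
    # Placeholder Google service account; replace name and project
    iam.gke.io/gcp-service-account: llm-gateway@my-project.iam.gserviceaccount.com
```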

### Azure AKS
- Use Azure Application Gateway Ingress Controller
- Configure Azure AD Workload Identity
- Use Azure Cache for Redis
- Store secrets in Azure Key Vault

## Troubleshooting

### Common Issues

**Pods not starting:**
```bash
kubectl describe pod -n llm-gateway -l app=llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway --previous
```

**Health check failures:**
```bash
kubectl port-forward -n llm-gateway deployment/llm-gateway 8080:8080
curl http://localhost:8080/health
curl http://localhost:8080/ready
```

**Provider connection issues:**
- Verify the API keys in the `llm-gateway-secrets` Secret
- Check that network policies allow egress to the providers on TCP 443
- Verify the provider endpoints in `configmap.yaml` are reachable from the pods
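
When checking egress rules, keep in mind that a `namespaceSelector` or `podSelector` peer in a NetworkPolicy only matches pods inside the cluster; traffic to external addresses such as the provider APIs needs an `ipBlock` peer. A sketch of an egress rule that permits outbound HTTPS to any address is shown below. The wide-open CIDR is an assumption made for simplicity and can be narrowed if your environment allows it.

```yaml
# Hypothetical fragment of spec.egress in a NetworkPolicy
egress:
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0   # any IPv4 destination; narrow where possible
    ports:
      - protocol: TCP
        port: 443
```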

**Redis connection issues:**
```bash
kubectl exec -it -n llm-gateway redis-0 -- redis-cli ping
```

## Additional Resources

- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
- [cert-manager](https://cert-manager.io/)
- [External Secrets Operator](https://external-secrets.io/)
76
k8s/configmap.yaml
Normal file
@@ -0,0 +1,76 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
  namespace: llm-gateway
  labels:
    app: llm-gateway
data:
  config.yaml: |
    server:
      address: ":8080"

    logging:
      format: "json"
      level: "info"

    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20

    observability:
      enabled: true

      metrics:
        enabled: true
        path: "/metrics"

      tracing:
        enabled: true
        service_name: "llm-gateway"
        sampler:
          type: "probability"
          rate: 0.1
        exporter:
          type: "otlp"
          endpoint: "tempo.observability.svc.cluster.local:4317"
          insecure: true

    providers:
      google:
        type: "google"
        api_key: "${GOOGLE_API_KEY}"
        endpoint: "https://generativelanguage.googleapis.com"
      anthropic:
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        endpoint: "https://api.anthropic.com"
      openai:
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        endpoint: "https://api.openai.com"

    conversations:
      store: "redis"
      ttl: "1h"
      dsn: "redis://redis.llm-gateway.svc.cluster.local:6379/0"

    auth:
      enabled: true
      issuer: "https://accounts.google.com"
      audience: "${OIDC_AUDIENCE}"

    models:
      - name: "gemini-1.5-flash"
        provider: "google"
      - name: "gemini-1.5-pro"
        provider: "google"
      - name: "claude-3-5-sonnet-20241022"
        provider: "anthropic"
      - name: "claude-3-5-haiku-20241022"
        provider: "anthropic"
      - name: "gpt-4o"
        provider: "openai"
      - name: "gpt-4o-mini"
        provider: "openai"
168
k8s/deployment.yaml
Normal file
@@ -0,0 +1,168 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: llm-gateway
  template:
    metadata:
      labels:
        app: llm-gateway
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: llm-gateway
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault

      containers:
        - name: gateway
          image: llm-gateway:latest  # Replace with your registry/image:tag
          imagePullPolicy: IfNotPresent

          ports:
            - name: http
              containerPort: 8080
              protocol: TCP

          env:
            # Provider API Keys from Secret
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-gateway-secrets
                  key: GOOGLE_API_KEY
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-gateway-secrets
                  key: ANTHROPIC_API_KEY
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-gateway-secrets
                  key: OPENAI_API_KEY
            - name: OIDC_AUDIENCE
              valueFrom:
                secretKeyRef:
                  name: llm-gateway-secrets
                  key: OIDC_AUDIENCE

            # Optional: Pod metadata
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP

          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi

          livenessProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3

          readinessProbe:
            httpGet:
              path: /ready
              port: http
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3

          startupProbe:
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 30

          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
            - name: tmp
              mountPath: /tmp

          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            capabilities:
              drop:
                - ALL

      volumes:
        - name: config
          configMap:
            name: llm-gateway-config
        - name: tmp
          emptyDir: {}

      # Affinity rules for better distribution
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - llm-gateway
                topologyKey: kubernetes.io/hostname

      # Tolerations (if needed for specific node pools)
      # tolerations:
      #   - key: "workload-type"
      #     operator: "Equal"
      #     value: "llm"
      #     effect: "NoSchedule"
63
k8s/hpa.yaml
Normal file
@@ -0,0 +1,63 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway

  minReplicas: 3
  maxReplicas: 20

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 4
          periodSeconds: 30
      selectPolicy: Max

  metrics:
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

    # Memory-based scaling
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

    # Custom metrics (requires a custom metrics API adapter, e.g. prometheus-adapter)
    # - type: Pods
    #   pods:
    #     metric:
    #       name: http_requests_per_second
    #     target:
    #       type: AverageValue
    #       averageValue: "1000"
66
k8s/ingress.yaml
Normal file
@@ -0,0 +1,66 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
  annotations:
    # General annotations
    kubernetes.io/ingress.class: "nginx"

    # TLS configuration
    cert-manager.io/cluster-issuer: "letsencrypt-prod"

    # Security headers
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"

    # Rate limiting (supplement application-level rate limiting)
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

    # Request size limit (10MB)
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"

    # Timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"

    # CORS (if needed)
    # nginx.ingress.kubernetes.io/enable-cors: "true"
    # nginx.ingress.kubernetes.io/cors-allow-origin: "https://yourdomain.com"
    # nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
    # nginx.ingress.kubernetes.io/cors-allow-credentials: "true"

    # For AWS ALB Ingress Controller (alternative to nginx)
    # kubernetes.io/ingress.class: "alb"
    # alb.ingress.kubernetes.io/scheme: "internet-facing"
    # alb.ingress.kubernetes.io/target-type: "ip"
    # alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    # alb.ingress.kubernetes.io/ssl-redirect: '443'
    # alb.ingress.kubernetes.io/certificate-arn: "arn:aws:acm:region:account:certificate/xxx"

    # For GKE Ingress (alternative to nginx)
    # kubernetes.io/ingress.class: "gce"
    # kubernetes.io/ingress.global-static-ip-name: "llm-gateway-ip"
    # ingress.gcp.kubernetes.io/pre-shared-cert: "llm-gateway-cert"

spec:
  tls:
    - hosts:
        - llm-gateway.example.com  # Replace with your domain
      secretName: llm-gateway-tls

  rules:
    - host: llm-gateway.example.com  # Replace with your domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-gateway
                port:
                  number: 80
46
k8s/kustomization.yaml
Normal file
@@ -0,0 +1,46 @@
# Kustomize configuration for easy deployment
# Usage: kubectl apply -k k8s/

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: llm-gateway

resources:
  - namespace.yaml
  - serviceaccount.yaml
  - configmap.yaml
  - secret.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - hpa.yaml
  - pdb.yaml
  - networkpolicy.yaml
  - redis.yaml
  - servicemonitor.yaml
  - prometheusrule.yaml

# Common labels applied to all resources
commonLabels:
  app.kubernetes.io/name: llm-gateway
  app.kubernetes.io/component: api-gateway
  app.kubernetes.io/part-of: llm-platform

# Images to be used (customize for your registry)
images:
  - name: llm-gateway
    newName: your-registry/llm-gateway
    newTag: latest

# ConfigMap generator (alternative to configmap.yaml)
# configMapGenerator:
#   - name: llm-gateway-config
#     files:
#       - config.yaml

# Secret generator (for local development only)
# secretGenerator:
#   - name: llm-gateway-secrets
#     envs:
#       - secrets.env
7
k8s/namespace.yaml
Normal file
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Namespace
metadata:
  name: llm-gateway
  labels:
    app: llm-gateway
    environment: production
83
k8s/networkpolicy.yaml
Normal file
@@ -0,0 +1,83 @@
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway

  policyTypes:
    - Ingress
    - Egress

  ingress:
    # Allow traffic from ingress controller (its namespace must carry the label name=ingress-nginx)
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080

    # Allow traffic from within the namespace (for debugging/testing)
    - from:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 8080

    # Allow Prometheus scraping (namespace labeled name=observability, pods labeled app=prometheus)
    - from:
        - namespaceSelector:
            matchLabels:
              name: observability
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8080

  egress:
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

    # Allow Redis access
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379

    # Allow external provider API access (OpenAI, Anthropic, Google); ipBlock is needed because selectors only match in-cluster pods
    - to:
        - ipBlock: { cidr: 0.0.0.0/0 }
      ports:
        - protocol: TCP
          port: 443

    # Allow OTLP tracing export
    - to:
        - namespaceSelector:
            matchLabels:
              name: observability
          podSelector:
            matchLabels:
              app: tempo
      ports:
        - protocol: TCP
          port: 4317
13
k8s/pdb.yaml
Normal file
@@ -0,0 +1,13 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
  unhealthyPodEvictionPolicy: AlwaysAllow  # Needs Kubernetes v1.27+ (v1.26 with the feature gate); remove on older clusters
122
k8s/prometheusrule.yaml
Normal file
@@ -0,0 +1,122 @@
# PrometheusRule for alerting
# Requires Prometheus Operator to be installed

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
    prometheus: kube-prometheus
spec:
  groups:
    - name: llm-gateway.rules
      interval: 30s
      rules:

        # High error rate
        - alert: LLMGatewayHighErrorRate
          expr: |
            (
              sum(rate(http_requests_total{namespace="llm-gateway",status_code=~"5.."}[5m]))
              /
              sum(rate(http_requests_total{namespace="llm-gateway"}[5m]))
            ) > 0.05
          for: 5m
          labels:
            severity: warning
            component: llm-gateway
          annotations:
            summary: "High error rate in LLM Gateway"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"

        # High latency
        - alert: LLMGatewayHighLatency
          expr: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{namespace="llm-gateway"}[5m])) by (le)
            ) > 10
          for: 5m
          labels:
            severity: warning
            component: llm-gateway
          annotations:
            summary: "High latency in LLM Gateway"
            description: "P95 latency is {{ $value }}s (threshold: 10s)"

        # Provider errors
        - alert: LLMProviderHighErrorRate
          expr: |
            (
              sum(rate(provider_requests_total{namespace="llm-gateway",status="error"}[5m])) by (provider)
              /
              sum(rate(provider_requests_total{namespace="llm-gateway"}[5m])) by (provider)
            ) > 0.10
          for: 5m
          labels:
            severity: warning
            component: llm-gateway
          annotations:
            summary: "High error rate for provider {{ $labels.provider }}"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold: 10%)"

        # Pod down
        - alert: LLMGatewayPodDown
          expr: |
            up{job="llm-gateway",namespace="llm-gateway"} == 0
          for: 2m
          labels:
            severity: critical
            component: llm-gateway
          annotations:
            summary: "LLM Gateway pod is down"
            description: "Pod {{ $labels.pod }} has been down for more than 2 minutes"

        # High memory usage
        - alert: LLMGatewayHighMemoryUsage
          expr: |
            (
              container_memory_working_set_bytes{namespace="llm-gateway",container="gateway"}
              /
              container_spec_memory_limit_bytes{namespace="llm-gateway",container="gateway"}
            ) > 0.85
          for: 5m
          labels:
            severity: warning
            component: llm-gateway
          annotations:
            summary: "High memory usage in LLM Gateway"
            description: "Memory usage is {{ $value | humanizePercentage }} (threshold: 85%)"

        # Rate limit threshold
        - alert: LLMGatewayHighRateLimitHitRate
          expr: |
            (
              sum(rate(http_requests_total{namespace="llm-gateway",status_code="429"}[5m]))
              /
              sum(rate(http_requests_total{namespace="llm-gateway"}[5m]))
            ) > 0.20
          for: 10m
          labels:
            severity: info
            component: llm-gateway
          annotations:
            summary: "High rate limit hit rate"
            description: "{{ $value | humanizePercentage }} of requests are being rate limited"

        # Conversation store errors
        - alert: LLMGatewayConversationStoreErrors
          expr: |
            (
              sum(rate(conversation_store_operations_total{namespace="llm-gateway",status="error"}[5m]))
              /
              sum(rate(conversation_store_operations_total{namespace="llm-gateway"}[5m]))
            ) > 0.05
          for: 5m
          labels:
            severity: warning
            component: llm-gateway
          annotations:
            summary: "High error rate in conversation store"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"
131
k8s/redis.yaml
Normal file
@@ -0,0 +1,131 @@
# Simple Redis deployment for conversation storage
# For production, consider using:
# - Redis Operator (e.g., Redis Enterprise Operator)
# - Managed Redis (AWS ElastiCache, GCP Memorystore, Azure Cache for Redis)
# - Redis Cluster for high availability

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: llm-gateway
  labels:
    app: redis
data:
  redis.conf: |
    maxmemory 256mb
    maxmemory-policy allkeys-lru
    save ""
    appendonly no
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: llm-gateway
  labels:
    app: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault

      containers:
        - name: redis
          image: redis:7.2-alpine
          imagePullPolicy: IfNotPresent

          command:
            - redis-server
            - /etc/redis/redis.conf

          ports:
            - name: redis
              containerPort: 6379
              protocol: TCP

          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          livenessProbe:
            tcpSocket:
              port: redis
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

          readinessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3

          volumeMounts:
            - name: config
              mountPath: /etc/redis
            - name: data
              mountPath: /data

          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 999
            capabilities:
              drop:
                - ALL

      volumes:
        - name: config
          configMap:
            name: redis-config

  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: llm-gateway
  labels:
    app: redis
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: redis
      protocol: TCP
46
k8s/secret.yaml
Normal file
@@ -0,0 +1,46 @@
apiVersion: v1
kind: Secret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
  labels:
    app: llm-gateway
type: Opaque
stringData:
  # IMPORTANT: Replace these with actual values or use external secret management
  # For production, use:
  # - kubectl create secret generic llm-gateway-secrets --from-literal=...
  # - External Secrets Operator with AWS Secrets Manager/HashiCorp Vault
  # - Sealed Secrets
  GOOGLE_API_KEY: "your-google-api-key-here"
  ANTHROPIC_API_KEY: "your-anthropic-api-key-here"
  OPENAI_API_KEY: "your-openai-api-key-here"
  OIDC_AUDIENCE: "your-client-id.apps.googleusercontent.com"
---
# Example using External Secrets Operator (commented out)
# apiVersion: external-secrets.io/v1beta1
# kind: ExternalSecret
# metadata:
#   name: llm-gateway-secrets
#   namespace: llm-gateway
# spec:
#   refreshInterval: 1h
#   secretStoreRef:
#     name: aws-secrets-manager
#     kind: SecretStore
#   target:
#     name: llm-gateway-secrets
#     creationPolicy: Owner
#   data:
#     - secretKey: GOOGLE_API_KEY
#       remoteRef:
#         key: prod/llm-gateway/google-api-key
#     - secretKey: ANTHROPIC_API_KEY
#       remoteRef:
#         key: prod/llm-gateway/anthropic-api-key
#     - secretKey: OPENAI_API_KEY
#       remoteRef:
#         key: prod/llm-gateway/openai-api-key
#     - secretKey: OIDC_AUDIENCE
#       remoteRef:
#         key: prod/llm-gateway/oidc-audience
|
||||
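The commented-out ExternalSecret refers to a `SecretStore` named `aws-secrets-manager`, which this commit does not define. A minimal sketch of what such a store could look like is shown below. The region and the choice to authenticate through the `llm-gateway` ServiceAccount (via IRSA) are assumptions made for illustration, not settings taken from this repository.

```yaml
# Hypothetical SecretStore matching the secretStoreRef in the ExternalSecret.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: llm-gateway
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1          # assumption: replace with your region
      auth:
        jwt:
          serviceAccountRef:
            name: llm-gateway    # assumption: ServiceAccount annotated with an IAM role
```

With a store like this in place, the operator would read the `prod/llm-gateway/...` keys and keep the `llm-gateway-secrets` Secret in sync, so the placeholder values in `stringData` never need to be committed.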
40
k8s/service.yaml
Normal file
@@ -0,0 +1,40 @@
apiVersion: v1
kind: Service
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
  annotations:
    # For cloud load balancers (uncomment as needed)
    # service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: llm-gateway
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  sessionAffinity: None
---
# Headless service for pod-to-pod communication (if needed)
apiVersion: v1
kind: Service
metadata:
  name: llm-gateway-headless
  namespace: llm-gateway
  labels:
    app: llm-gateway
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: llm-gateway
  ports:
  - name: http
    port: 8080
    targetPort: http
    protocol: TCP
14
k8s/serviceaccount.yaml
Normal file
@@ -0,0 +1,14 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
  annotations:
    # For GKE Workload Identity
    # iam.gke.io/gcp-service-account: llm-gateway@PROJECT_ID.iam.gserviceaccount.com

    # For EKS IRSA (IAM Roles for Service Accounts)
    # eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/llm-gateway-role
automountServiceAccountToken: true
35
k8s/servicemonitor.yaml
Normal file
@@ -0,0 +1,35 @@
# ServiceMonitor for Prometheus Operator
# Requires Prometheus Operator to be installed
# https://github.com/prometheus-operator/prometheus-operator

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
  namespace: llm-gateway
  labels:
    app: llm-gateway
    prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app: llm-gateway

  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s

    relabelings:
    # Add namespace label
    - sourceLabels: [__meta_kubernetes_namespace]
      targetLabel: namespace

    # Add pod label
    - sourceLabels: [__meta_kubernetes_pod_name]
      targetLabel: pod

    # Add service label
    - sourceLabels: [__meta_kubernetes_service_name]
      targetLabel: service
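A ServiceMonitor is only scraped if a Prometheus instance selects it. The `prometheus: kube-prometheus` label above suggests the cluster's Prometheus resource filters on that label, but that resource is not part of this commit. The fragment below is a sketch of a matching selector, with the resource name and `monitoring` namespace assumed for illustration.

```yaml
# Hypothetical Prometheus resource; name and namespace are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus
  namespace: monitoring
spec:
  # Pick up ServiceMonitors carrying the label used in servicemonitor.yaml
  serviceMonitorSelector:
    matchLabels:
      prometheus: kube-prometheus
  # An empty selector matches ServiceMonitors in every namespace,
  # including llm-gateway
  serviceMonitorNamespaceSelector: {}
```

If the existing Prometheus uses a different selector label, change the `prometheus:` label on the ServiceMonitor to match rather than editing the shared Prometheus resource.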