# Kubernetes Deployment Guide

> Production-ready Kubernetes manifests for deploying the LLM Gateway with high availability, monitoring, and security.

## Table of Contents

- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Deployment](#deployment)
- [Configuration](#configuration)
- [Secrets Management](#secrets-management)
- [Monitoring](#monitoring)
- [Storage Options](#storage-options)
- [Scaling](#scaling)
- [Updates and Rollbacks](#updates-and-rollbacks)
- [Security](#security)
- [Cloud Provider Guides](#cloud-provider-guides)
- [Troubleshooting](#troubleshooting)
- [Useful Commands](#useful-commands)
- [Architecture Overview](#architecture-overview)
- [Additional Resources](#additional-resources)

## Quick Start

Deploy with default settings using pre-built images:

```bash
# Update kustomization.yaml with your image
cd k8s/
vim kustomization.yaml  # Set image to ghcr.io/yourusername/llm-gateway:v1.0.0

# Create secrets
kubectl create namespace llm-gateway
kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-your-key" \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-your-key" \
  --from-literal=GOOGLE_API_KEY="your-key" \
  -n llm-gateway

# Deploy
kubectl apply -k .

# Verify
kubectl get pods -n llm-gateway
kubectl logs -n llm-gateway -l app=llm-gateway
```

## Prerequisites

- **Kubernetes**: v1.24+ cluster
- **kubectl**: Configured and authenticated
- **Container images**: Access to `ghcr.io/yourusername/llm-gateway`

**Optional but recommended:**

- **Prometheus Operator**: For metrics and alerting
- **cert-manager**: For automatic TLS certificates
- **Ingress Controller**: nginx, ALB, or GCE
- **External Secrets Operator**: For secrets management

## Deployment

### Using Kustomize (Recommended)

```bash
# Review and customize
cd k8s/
vim kustomization.yaml  # Update image, namespace, etc.
vim configmap.yaml      # Configure gateway settings
vim ingress.yaml        # Set your domain

# Deploy all resources
kubectl apply -k .
# Or deploy a Kustomize overlay instead
kubectl apply -k overlays/production/
```

### Using kubectl

```bash
kubectl apply -f namespace.yaml
kubectl apply -f serviceaccount.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f redis.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f hpa.yaml
kubectl apply -f pdb.yaml
kubectl apply -f networkpolicy.yaml
```

### With Monitoring

If Prometheus Operator is installed:

```bash
kubectl apply -f servicemonitor.yaml
kubectl apply -f prometheusrule.yaml
```

## Configuration

### Image Configuration

Update `kustomization.yaml`:

```yaml
images:
- name: llm-gateway
  newName: ghcr.io/yourusername/llm-gateway
  newTag: v1.2.3  # Or 'latest', 'main', 'sha-abc123'
```

### Gateway Configuration

Edit `configmap.yaml` for gateway settings:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-gateway-config
data:
  config.yaml: |
    server:
      address: ":8080"
    logging:
      level: info
      format: json
    rate_limit:
      enabled: true
      requests_per_second: 10
      burst: 20
    observability:
      enabled: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        exporter:
          type: otlp
          endpoint: tempo:4317
    conversations:
      store: redis
      dsn: redis://redis:6379/0
      ttl: 1h
```

### Resource Limits

Default resources (adjust based on load testing):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m
    memory: 512Mi
```

### Ingress Configuration

Edit `ingress.yaml` for your domain:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - llm-gateway.yourdomain.com
    secretName: llm-gateway-tls
  rules:
  - host: llm-gateway.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: llm-gateway
            port:
              number: 80
```

## Secrets Management

### Option 1: kubectl (Development)

```bash
kubectl create secret generic llm-gateway-secrets \
  --from-literal=OPENAI_API_KEY="sk-..." \
  --from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
  --from-literal=GOOGLE_API_KEY="..." \
  --from-literal=OIDC_AUDIENCE="your-client-id" \
  -n llm-gateway
```

### Option 2: External Secrets Operator (Production)

Install ESO, then create an ExternalSecret:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: llm-gateway-secrets
  namespace: llm-gateway
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager  # or vault, gcpsm, etc.
    kind: ClusterSecretStore
  target:
    name: llm-gateway-secrets
  data:
  - secretKey: OPENAI_API_KEY
    remoteRef:
      key: llm-gateway/openai-key
  - secretKey: ANTHROPIC_API_KEY
    remoteRef:
      key: llm-gateway/anthropic-key
  - secretKey: GOOGLE_API_KEY
    remoteRef:
      key: llm-gateway/google-key
```

### Option 3: Sealed Secrets

```bash
# Encrypt secrets (sealed secrets are namespace-scoped by default,
# so seal for the namespace the secret will live in)
echo -n "sk-your-key" | kubectl create secret generic llm-gateway-secrets \
  -n llm-gateway \
  --dry-run=client --from-file=OPENAI_API_KEY=/dev/stdin -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml

# sealed-secret.yaml is safe to commit to git; apply it to the cluster
kubectl apply -f sealed-secret.yaml
```

## Monitoring

### Metrics

ServiceMonitor for Prometheus Operator:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llm-gateway
spec:
  selector:
    matchLabels:
      app: llm-gateway
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
```

**Available metrics:**

- `gateway_requests_total` - Total requests by provider/model
- `gateway_request_duration_seconds` - Request latency histogram
- `gateway_provider_errors_total` - Errors by provider
- `gateway_circuit_breaker_state` - Circuit breaker state changes
- `gateway_rate_limit_hits_total` - Rate limit violations

### Alerts

PrometheusRule with common alerts:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-gateway-alerts
spec:
  groups:
  - name: llm-gateway
    interval: 30s
    rules:
    - alert: HighErrorRate
      # Share of 5xx responses among all requests over 5 minutes (5% threshold)
      expr: |
        sum(rate(gateway_requests_total{status=~"5.."}[5m]))
          / sum(rate(gateway_requests_total[5m])) > 0.05
      for: 5m
      annotations:
        summary: High error rate detected
    - alert: PodDown
      expr: kube_deployment_status_replicas_available{deployment="llm-gateway"} < 2
      for: 5m
      annotations:
        summary: Fewer than 2 gateway pods available
```

### Logging

View logs:

```bash
# Tail logs
kubectl logs -n llm-gateway -l app=llm-gateway -f

# Filter by level (requires logging.format: json)
kubectl logs -n llm-gateway -l app=llm-gateway | jq 'select(.level=="error")'

# Search logs
kubectl logs -n llm-gateway -l app=llm-gateway | grep "circuit.*open"
```

### Tracing

Configure the OTLP trace exporter in the gateway's `config.yaml`:

```yaml
observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: tempo:4317  # or jaeger-collector:4317
```

## Storage Options

### In-Memory (Default)

No persistence; conversations are lost on pod restart and are not shared between replicas:

```yaml
conversations:
  store: memory
```

### Redis (Recommended)

Deploy the Redis StatefulSet:

```bash
kubectl apply -f redis.yaml
```

Configure the gateway:

```yaml
conversations:
  store: redis
  dsn: redis://redis:6379/0
  ttl: 1h
```

### External Redis

For production, use managed Redis:

```yaml
conversations:
  store: redis
  dsn: redis://:password@redis.example.com:6379/0
  ttl: 1h
```

**Cloud providers:**

- **AWS**: ElastiCache for Redis
- **GCP**: Memorystore for Redis
- **Azure**: Azure Cache for Redis

### PostgreSQL

```yaml
conversations:
  store: sql
  driver: pgx
  dsn: postgres://user:pass@postgres:5432/llm_gateway?sslmode=require
  ttl: 1h
```

## Scaling

### Horizontal Pod Autoscaler

Default HPA configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

Monitor the HPA:

```bash
kubectl get hpa -n llm-gateway
kubectl describe hpa llm-gateway -n llm-gateway
```

### Manual Scaling

```bash
# Scale to specific replica count
kubectl scale deployment/llm-gateway --replicas=10 -n llm-gateway

# Check status
kubectl get deployment llm-gateway -n llm-gateway
```

### Pod Disruption Budget

Keeps at least two pods available during voluntary disruptions such as node drains:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: llm-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: llm-gateway
```

## Updates and Rollbacks

### Rolling Updates

```bash
# Update image
kubectl set image deployment/llm-gateway \
  gateway=ghcr.io/yourusername/llm-gateway:v1.2.3 \
  -n llm-gateway

# Watch rollout
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Pause rollout if issues
kubectl rollout pause deployment/llm-gateway -n llm-gateway

# Resume rollout
kubectl rollout resume deployment/llm-gateway -n llm-gateway
```

### Rollback

```bash
# Rollback to previous version
kubectl rollout undo deployment/llm-gateway -n llm-gateway

# Rollback to specific revision
kubectl rollout history deployment/llm-gateway -n llm-gateway
kubectl rollout undo deployment/llm-gateway --to-revision=3 -n llm-gateway
```

### Blue-Green Deployment

```bash
# Deploy new version with different label
kubectl apply -f deployment-v2.yaml

# Test new version
kubectl port-forward -n llm-gateway deployment/llm-gateway-v2 8080:8080

# Switch service to new version
kubectl patch service llm-gateway -n llm-gateway \
  -p '{"spec":{"selector":{"version":"v2"}}}'

# Delete old version after verification
kubectl delete deployment llm-gateway-v1 -n llm-gateway
```

## Security

### Pod Security

The Deployment applies these pod-level and container-level security settings:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

containers:
- name: gateway
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
      - ALL
```

### Network Policies

Restrict traffic to/from gateway pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-gateway
spec:
  podSelector:
    matchLabels:
      app: llm-gateway
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically on every namespace
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:  # Allow DNS
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  - to:  # Allow Redis
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  - to:  # Allow external LLM providers (HTTPS); a namespaceSelector only matches in-cluster pods
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
```

### RBAC

ServiceAccount with minimal permissions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-gateway
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-gateway
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-gateway
subjects:
- kind: ServiceAccount
  name: llm-gateway
```

## Cloud Provider Guides

### AWS EKS

```bash
# Install AWS Load Balancer Controller
kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster

# Update ingress for ALB by adding these annotations to ingress.yaml
# (and replace the nginx ingressClassName and annotations shown earlier):
#   metadata:
#     annotations:
#       kubernetes.io/ingress.class: alb
#       alb.ingress.kubernetes.io/scheme: internet-facing
#       alb.ingress.kubernetes.io/target-type: ip
```

**IRSA for secrets:**

```bash
# Create IAM role and associate with ServiceAccount.
# SecretsManagerReadWrite also grants write access; prefer a custom
# read-only policy scoped to the gateway's secrets in production.
eksctl create iamserviceaccount \
  --name llm-gateway \
  --namespace llm-gateway \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite \
  --approve
```

**ElastiCache Redis:**

```yaml
conversations:
  store: redis
  dsn: redis://my-cluster.cache.amazonaws.com:6379/0
```

### GCP GKE

```bash
# Enable Workload Identity
gcloud container clusters update my-cluster \
  --workload-pool=PROJECT_ID.svc.id.goog

# Create service account with Secret Manager access
gcloud iam service-accounts create llm-gateway
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member "serviceAccount:llm-gateway@PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/secretmanager.secretAccessor"

# Allow the K8s SA (namespace/name) to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  llm-gateway@PROJECT_ID.iam.gserviceaccount.com \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[llm-gateway/llm-gateway]"

# Bind K8s SA to GCP SA
kubectl annotate serviceaccount llm-gateway \
  -n llm-gateway \
  iam.gke.io/gcp-service-account=llm-gateway@PROJECT_ID.iam.gserviceaccount.com
```

**Memorystore Redis:**

```yaml
conversations:
  store: redis
  dsn: redis://10.0.0.3:6379/0  # Private IP from Memorystore
```

### Azure AKS

```bash
# Install Application Gateway Ingress Controller
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons ingress-appgw \
  --appgw-name myApplicationGateway

# Configure Azure AD Workload Identity
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-oidc-issuer \
  --enable-workload-identity
```

**Azure Key Vault with ESO:**

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-keyvault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://my-vault.vault.azure.net
```

## Troubleshooting

### Pods Not Starting

```bash
# Check pod status
kubectl get pods -n llm-gateway

# Describe pod for events
kubectl describe pod llm-gateway-xxx -n llm-gateway

# Check logs
kubectl logs -n llm-gateway llm-gateway-xxx

# Check previous container logs (if crashed)
kubectl logs -n llm-gateway llm-gateway-xxx --previous
```

**Common issues:**

- Image pull errors: Check registry credentials
- CrashLoopBackOff: Check logs for startup errors
- Pending: Check resource quotas and node capacity

### Health Check Failures

```bash
# Port-forward to test locally (blocks; run the curl commands in a second terminal)
kubectl port-forward -n llm-gateway svc/llm-gateway 8080:80

# Test endpoints
curl http://localhost:8080/health
curl http://localhost:8080/ready

# Check from inside pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- http://localhost:8080/health
```

### Provider Connection Issues

```bash
# Test egress from pod
kubectl exec -n llm-gateway deployment/llm-gateway -- wget -O- https://api.openai.com

# Check secrets (prints the key in plain text; avoid on shared terminals)
kubectl get secret llm-gateway-secrets -n llm-gateway -o jsonpath='{.data.OPENAI_API_KEY}' | base64 -d

# Verify network policies
kubectl get networkpolicy -n llm-gateway
kubectl describe networkpolicy llm-gateway -n llm-gateway
```

### Redis Connection Issues

```bash
# Test Redis connectivity
kubectl exec -n llm-gateway deployment/llm-gateway -- nc -zv redis 6379

# Connect to Redis
kubectl exec -it -n llm-gateway redis-0 -- redis-cli

# Check Redis logs
kubectl logs -n llm-gateway redis-0
```

### Performance Issues

```bash
# Check resource usage (requires metrics-server)
kubectl top pods -n llm-gateway
kubectl top nodes

# Check HPA status
kubectl describe hpa llm-gateway -n llm-gateway

# Check for throttling
kubectl describe pod llm-gateway-xxx -n llm-gateway | grep -i throttl
```

### Debug Container

For distroless/minimal images, where `wget` and `nc` are not available inside the gateway container:

```bash
# Use ephemeral debug container
kubectl debug -it -n llm-gateway llm-gateway-xxx --image=busybox --target=gateway

# Or use debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -n llm-gateway -- /bin/bash
```

## Useful Commands

```bash
# View all resources
kubectl get all -n llm-gateway

# Check deployment status
kubectl rollout status deployment/llm-gateway -n llm-gateway

# Tail logs from all pods
kubectl logs -n llm-gateway -l app=llm-gateway -f --max-log-requests=10

# Get events
kubectl get events -n llm-gateway --sort-by='.lastTimestamp'

# Check resource quotas
kubectl describe resourcequota -n llm-gateway

# Export current config
kubectl get deployment llm-gateway -n llm-gateway -o yaml > deployment-backup.yaml

# Force pod restart
kubectl rollout restart deployment/llm-gateway -n llm-gateway

# Delete and recreate deployment (causes downtime)
kubectl delete deployment llm-gateway -n llm-gateway
kubectl apply -f deployment.yaml
```

## Architecture Overview

```
┌─────────────────────────────────────────────────┐
│            Internet / Load Balancer             │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Ingress Controller  │
          │      (TLS/SSL)       │
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │   Gateway Service    │
          │    (ClusterIP:80)    │
          └──────────┬───────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
     ┌─────┐      ┌─────┐      ┌─────┐
     │ Pod │      │ Pod │      │ Pod │
     │  1  │      │  2  │      │  3  │
     └──┬──┘      └──┬──┘      └──┬──┘
        │            │            │
        └────────────┼────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌──────┐     ┌──────┐     ┌──────┐
    │Redis │     │Prom  │     │Tempo │
    └──────┘     └──────┘     └──────┘
```

## Additional Resources

- [Main Documentation](../README.md)
- [Docker Deployment](../docs/DOCKER_DEPLOYMENT.md)
- [Kubernetes Best Practices](https://kubernetes.io/docs/concepts/configuration/overview/)
- [Prometheus Operator](https://prometheus-operator.dev/)
- [External Secrets Operator](https://external-secrets.io/)
- [cert-manager](https://cert-manager.io/)
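
## Example: Post-Deploy Smoke Test

The rollout and health-check commands from the Updates and Troubleshooting sections can be combined into one post-deploy check. The following is a minimal sketch and not part of the shipped manifests. The `retry` and `smoke_test` function names, the `NAMESPACE` and `LOCAL_PORT` variables, the retry counts, and the 120-second timeout are assumptions made for illustration; the namespace, Service name, Service port 80, and the `/health` and `/ready` paths are taken from this guide.

```bash
#!/usr/bin/env bash
# Post-deploy smoke test sketch (hypothetical helper, adjust before use).

NAMESPACE="${NAMESPACE:-llm-gateway}"   # assumption: overridable via environment
LOCAL_PORT="${LOCAL_PORT:-8080}"        # assumption: any free local port works

# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# smoke_test: wait for the rollout, then probe /health and /ready
# through a temporary port-forward to the Service.
smoke_test() {
  kubectl rollout status deployment/llm-gateway \
    -n "$NAMESPACE" --timeout=120s || return 1

  # Forward Service port 80 to a local port in the background.
  kubectl port-forward -n "$NAMESPACE" svc/llm-gateway \
    "${LOCAL_PORT}:80" >/dev/null 2>&1 &
  local pf_pid=$!

  local status=0
  # The first attempts may fail while the port-forward is still starting.
  retry 10 1 curl -fsS -o /dev/null "http://localhost:${LOCAL_PORT}/health" || status=1
  retry 10 1 curl -fsS -o /dev/null "http://localhost:${LOCAL_PORT}/ready" || status=1

  # Stop the background port-forward before returning.
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  return "$status"
}
```

One way to use it is to source the file and call `smoke_test` after `kubectl apply -k .`; a non-zero exit status means the rollout timed out or one of the probes kept failing. Keeping `retry` separate from the kubectl calls makes the retry behaviour easy to check without a cluster.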