Files
latticelm/SECURITY_IMPROVEMENTS.md

5.8 KiB

Security Improvements - March 2026

This document summarizes the security and reliability improvements made to the go-llm-gateway project.

Issues Fixed

1. Request Size Limits (Issue #2)

Problem: The server had no limits on request body size, making it vulnerable to DoS attacks via oversized payloads.

Solution: Implemented RequestSizeLimitMiddleware that enforces a maximum request body size.

Implementation Details:

  • Created internal/server/middleware.go with RequestSizeLimitMiddleware
  • Uses http.MaxBytesReader to enforce limits at the HTTP layer
  • Default limit: 10MB (10,485,760 bytes)
  • Configurable via server.max_request_body_size in config.yaml
  • Returns HTTP 413 (Request Entity Too Large) for oversized requests
  • Only applies to POST, PUT, and PATCH requests (not GET/DELETE)

Files Modified:

  • internal/server/middleware.go (new file)
  • internal/server/server.go (added 413 error handling)
  • cmd/gateway/main.go (integrated middleware)
  • internal/config/config.go (added config field)
  • config.example.yaml (documented configuration)

Testing:

  • Comprehensive test suite in internal/server/middleware_test.go
  • Tests cover: small payloads, exact size, oversized payloads, different HTTP methods
  • Integration test verifies middleware chain behavior

2. Panic Recovery Middleware (Issue #4)

Problem: Any panic in HTTP handlers would crash the entire server, causing downtime.

Solution: Implemented PanicRecoveryMiddleware that catches panics and returns proper error responses.

Implementation Details:

  • Created PanicRecoveryMiddleware in internal/server/middleware.go
  • Uses defer recover() pattern to catch all panics
  • Logs full stack trace with request context for debugging
  • Returns HTTP 500 (Internal Server Error) to clients
  • Positioned as the outermost middleware to catch panics from all layers

Files Modified:

  • internal/server/middleware.go (new file)
  • cmd/gateway/main.go (integrated as outermost middleware)

Testing:

  • Tests verify recovery from string panics, error panics, and struct panics
  • Integration test confirms panic recovery works through middleware chain
  • Logs are captured and verified to include stack traces

3. Error Handling Improvements (Bonus)

Problem: Multiple instances of ignored JSON encoding errors could lead to incomplete responses.

Solution: Fixed all ignored json.Encoder.Encode() errors throughout the codebase.

Files Modified:

  • internal/server/health.go (lines 32, 86)
  • internal/server/server.go (lines 72, 217)

All JSON encoding errors are now logged with proper context including request IDs.

Architecture

Middleware Chain Order

The middleware chain is now (from outermost to innermost):

  1. PanicRecoveryMiddleware - Catches all panics
  2. RequestSizeLimitMiddleware - Enforces body size limits
  3. loggingMiddleware - Request/response logging
  4. TracingMiddleware - OpenTelemetry tracing
  5. MetricsMiddleware - Prometheus metrics
  6. rateLimitMiddleware - Rate limiting
  7. authMiddleware - OIDC authentication
  8. routes - Application handlers

This order ensures:

  • Panics are caught from all middleware layers
  • Size limits are enforced before expensive operations
  • All requests are logged, traced, and metered
  • Security checks happen closest to the application

Configuration

Add to your config.yaml:

server:
  address: ":8080"
  max_request_body_size: 10485760  # 10MB in bytes (default)

To customize the size limit:

  • 1MB: 1048576
  • 5MB: 5242880
  • 10MB: 10485760 (default)
  • 50MB: 52428800

If not specified, defaults to 10MB.

Testing

All new functionality includes comprehensive tests:

# Run all tests
go test ./...

# Run only middleware tests
go test ./internal/server -v -run "TestPanicRecoveryMiddleware|TestRequestSizeLimitMiddleware"

# Run with coverage
go test ./internal/server -cover

Test Coverage:

  • internal/server/middleware.go: 100% coverage
  • All edge cases covered (panics, size limits, different HTTP methods)
  • Integration tests verify middleware chain interactions

Production Readiness

These changes significantly improve production readiness:

  1. DoS Protection: Request size limits prevent memory exhaustion attacks
  2. Fault Tolerance: Panic recovery prevents cascading failures
  3. Observability: All errors are logged with proper context
  4. Configurability: Limits can be tuned per deployment environment

Remaining Production Concerns

While these issues are fixed, the following should still be addressed:

  • HIGH: Exposed credentials in .env file (must rotate and remove from git)
  • MEDIUM: Observability code has 0% test coverage
  • MEDIUM: Conversation store has only 27% test coverage
  • LOW: Missing circuit breaker pattern for provider failures
  • LOW: No retry logic for failed provider requests

See the original assessment for complete details.

Verification

Build and verify the changes:

# Build the application
go build ./cmd/gateway

# Run the gateway
./gateway -config config.yaml

# Test with oversized payload (should return 413)
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d "$(python3 -c 'print("{\"data\":\"" + "x"*11000000 + "\"}")')"

Expected response: HTTP 413 Request Entity Too Large

References