int-layer/docs/rag-api-specification.md
2026-02-23 06:51:36 +00:00

RAG API Specification

Overview

This document defines the API contract between the integration layer (capa-de-integracion) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation, using Retrieval-Augmented Generation (RAG) instead.

Base URL

https://your-rag-server.com/api/v1

Authentication

  • Method: API Key (optional)
  • Header: X-API-Key: <your-api-key>

Endpoint: Query

POST /query

Process a user message or notification and return a generated response.

Request

Headers:

  • Content-Type: application/json
  • X-API-Key: <api-key> (optional)
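
Because the API key is optional, a client can add the X-API-Key header only when a key is configured. A minimal sketch of that rule follows; the helper name build_headers is hypothetical and not part of this contract.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the headers for POST /query.

    Content-Type is always sent. X-API-Key is added only when a key
    is configured, since the spec marks it as optional.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return headers
```

Calling build_headers() yields only the Content-Type header, while build_headers("secret") also includes X-API-Key.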

Body:

{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification', default: 'conversation')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}

Field Descriptions:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| phone_number | string | Yes | User's phone number (used by RAG for internal conversation history tracking) |
| text | string | Yes | Obfuscated user input (already processed by DLP in the integration layer) |
| type | string | No | Request type: "conversation" (default) or "notification" |
| notification | object | No | Present only when processing a notification-related query |
| notification.text | string | No | Original notification text (obfuscated) |
| notification.parameters | object | No | Key-value pairs of notification metadata |
| language_code | string | No | Language code (e.g., "es", "en"). Defaults to "es" |
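
The field rules above can be checked on the client before a request is sent. The following is a sketch only: build_query_payload and VALID_TYPES are hypothetical names, and the error messages simply mirror the 400 example in this document.

```python
from typing import Any, Dict, Optional

# Allowed values for the optional "type" field, per the table above.
VALID_TYPES = ("conversation", "notification")


def build_query_payload(
    phone_number: str,
    text: str,
    request_type: str = "conversation",
    notification: Optional[Dict[str, Any]] = None,
    language_code: str = "es",
) -> Dict[str, Any]:
    """Assemble a /query request body, applying the documented defaults."""
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")
    if request_type not in VALID_TYPES:
        raise ValueError(f"Invalid type: {request_type!r}")

    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,
        "type": request_type,
        "language_code": language_code,
    }
    # The notification object is included only when one is provided.
    if notification is not None:
        payload["notification"] = notification
    return payload
```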

Response

Status Code: 200 OK

Body:

{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}

Field Descriptions:

| Field | Type | Description |
|-------|------|-------------|
| response_id | string | Unique identifier for this RAG response (for tracking/logging) |
| response_text | string | The generated response text to send back to the user |
| parameters | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| confidence | number | Optional confidence score (0.0 - 1.0) |
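
A client might parse the 200 response into a typed structure along these lines. This is a sketch under assumptions: RagResponse and parse_rag_response are hypothetical names, and treating response_id and response_text as mandatory is inferred from the table, which marks only parameters and confidence as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RagResponse:
    response_id: str
    response_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_rag_response(body: str) -> RagResponse:
    """Parse a 200 OK body; optional fields fall back to empty or None."""
    data = json.loads(body)
    confidence = data.get("confidence")
    # The spec gives the confidence range as 0.0 - 1.0.
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return RagResponse(
        response_id=data["response_id"],      # KeyError if missing (assumed required)
        response_text=data["response_text"],  # KeyError if missing (assumed required)
        parameters=data.get("parameters") or {},
        confidence=confidence,
    )
```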

Error Responses

400 Bad Request

Invalid request format or missing required fields.

{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}

500 Internal Server Error

The RAG server encountered an error while processing the request (the client also retries on this status).

{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}

503 Service Unavailable

RAG server is temporarily unavailable (triggers retry in client).

{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
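
All three error bodies share the same shape (error, message, status), so a client could convert them into one exception type. The sketch below assumes that shape; RagApiError and error_from_response are hypothetical names, and the fallback for a non-JSON body is an added precaution rather than something this contract specifies.

```python
import json


class RagApiError(Exception):
    """Raised by the client when the RAG server returns a non-200 status."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def error_from_response(status: int, body: str) -> RagApiError:
    """Build a RagApiError from an HTTP status code and response body."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}  # body was not JSON (for example, a proxy error page)
    if not isinstance(data, dict):
        data = {}
    return RagApiError(
        status=status,
        error=data.get("error", "Unknown Error"),
        message=data.get("message", ""),
    )
```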

Example Requests

Example 1: Regular Conversation

POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}

Response:

{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}

Example 2: Notification Flow

POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}

Response:

{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}

Design Decisions

1. RAG Handles Conversation History Internally

  • The RAG server maintains its own conversation history indexed by phone_number
  • The integration layer will continue to store conversation history (redundant for now)
  • This allows a gradual migration with minimal risk

2. No Session ID Required

  • Unlike Dialogflow (complex session paths), RAG uses phone_number as the session identifier
  • Simpler and aligns with RAG's internal tracking

3. Notifications Are Contextual

  • When a notification is active, the integration layer passes both:
    • The user's query (text)
    • The notification context (notification.text and notification.parameters)
  • RAG uses this context to generate relevant responses

4. Minimal Parameter Passing

  • Only essential data is sent to RAG
  • The integration layer can store additional metadata internally without sending it to RAG
  • RAG can return parameters if needed (e.g., extracted entities)

5. Obfuscation Stays in Integration Layer

  • DLP obfuscation happens before calling RAG
  • RAG receives already-obfuscated text
  • This maintains the existing security boundary
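
The ordering described above (obfuscate first, then call RAG) can be sketched as follows. The obfuscate argument is a hypothetical stand-in for the integration layer's DLP step, whose real interface is not described in this document, and build_obfuscated_payload is likewise an illustrative name.

```python
import re
from typing import Any, Callable, Dict, Optional


def build_obfuscated_payload(
    phone_number: str,
    raw_text: str,
    obfuscate: Callable[[str], str],
    notification: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Obfuscate all free text before it is placed in the RAG request."""
    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": obfuscate(raw_text),
        "type": "conversation",
    }
    if notification is not None:
        context: Dict[str, Any] = {
            "parameters": dict(notification.get("parameters", {})),
        }
        if "text" in notification:
            context["text"] = obfuscate(notification["text"])
        payload["type"] = "notification"
        payload["notification"] = context
    return payload


def mask_digits(value: str) -> str:
    """Toy obfuscator for illustration only; NOT the real DLP logic."""
    return re.sub(r"\d", "#", value)
```

With the toy obfuscator, build_obfuscated_payload("573001234567", "mi cuenta 1234", mask_digits) sends "mi cuenta ####" as text, while phone_number is passed through unchanged.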

Non-Functional Requirements

Performance

  • Target Response Time: < 2 seconds (p95)
  • Timeout: 30 seconds (configurable in client)

Reliability

  • Availability: 99.5%+
  • Retry Strategy: Client will retry on 500, 503, 504 errors (exponential backoff)
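
The retry strategy could look roughly like the sketch below. Only the retryable status codes (500, 503, 504) and the use of exponential backoff come from this document; the attempt count, the base delay, and the names call_with_retry and send are assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple

# Status codes the client retries on, per the Reliability requirements.
RETRYABLE_STATUSES = {500, 503, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,      # assumed value, not specified in this document
    base_delay: float = 0.5,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send performs one HTTP request and returns (status_code, body).
    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Injecting sleep keeps the function testable without real waiting. The per-request timeout (30 seconds by default) would be applied inside send.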

Scalability

  • Concurrent Requests: Support 100+ concurrent requests
  • Rate Limiting: None specified yet (to be confirmed with the RAG team)

Migration Notes

What the Integration Layer Will Do:

  • Continue to obfuscate text via DLP before calling RAG
  • Continue to store conversation history in Memorystore + Firestore (redundant but safe)
  • Continue to manage session timeouts (30 minutes)
  • Continue to handle notification storage and retrieval
  • Map DetectIntentRequestDTO → RAG request format
  • Map RAG response → DetectIntentResponseDTO
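
The two mapping steps might be sketched as below. The DTO classes here are hypothetical stand-ins: this document names DetectIntentRequestDTO and DetectIntentResponseDTO but does not list their fields, so every field name on them (obfuscated_text, fulfillment_text, and so on) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DetectIntentRequestDTO:
    """Hypothetical stand-in; the real DTO's fields are not given here."""
    phone_number: str
    obfuscated_text: str
    language_code: str = "es"


@dataclass
class DetectIntentResponseDTO:
    """Hypothetical stand-in; the real DTO's fields are not given here."""
    response_id: str
    fulfillment_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)


def to_rag_request(dto: DetectIntentRequestDTO) -> Dict[str, Any]:
    """Map the internal request DTO to the /query request body."""
    return {
        "phone_number": dto.phone_number,
        "text": dto.obfuscated_text,
        "type": "conversation",
        "language_code": dto.language_code,
    }


def from_rag_response(data: Dict[str, Any]) -> DetectIntentResponseDTO:
    """Map a parsed /query response body back to the internal response DTO."""
    return DetectIntentResponseDTO(
        response_id=data["response_id"],
        fulfillment_text=data["response_text"],
        parameters=data.get("parameters") or {},
    )
```

Keeping both mappings in one place means the controllers and DTO structures can stay unchanged while the backend is swapped.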

What the RAG Server Will Do:

  • Maintain its own conversation history by phone_number
  • Use notification context when provided to generate relevant responses
  • Generate responses using RAG (retrieval + generation)
  • Return structured responses with optional parameters

What We're NOT Changing:

  • External API contracts (controllers remain unchanged)
  • DTO structures (DetectIntentRequestDTO, DetectIntentResponseDTO)
  • Conversation storage logic (Memorystore + Firestore)
  • DLP obfuscation flow
  • Session management (30-minute timeout)
  • Notification storage


Questions for RAG Team

Before implementation:

  1. Endpoint URL: What is the actual RAG server URL?
  2. Authentication: Do we need API key authentication? If yes, what's the header format?
  3. Timeout: What's a reasonable timeout? (We're using 30s as default)
  4. Rate Limiting: Any rate limits we should be aware of?
  5. Conversation History: Does RAG need explicit conversation history, or does it fetch by phone_number internally?
  6. Response Parameters: Will RAG return any extracted parameters, or just response_text?
  7. Health Check: Is there a /health endpoint for monitoring?
  8. Versioning: Should we use /api/v1/query or a different version?

Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |