# RAG API Specification

## Overview

This document defines the API contract between the integration layer (`capa-de-integracion`) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.

## Base URL

```
https://your-rag-server.com/api/v1
```

## Authentication

- Method: API Key (optional)
- Header: `X-API-Key: `

---

## Endpoint: Query

### **POST /query**

Process a user message or notification and return a generated response.

### Request

**Headers:**

- `Content-Type: application/json`
- `X-API-Key: ` (optional)

**Body:**

```json
{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| `text` | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
| `notification` | object | ❌ No | Present only when processing a notification-related query |
| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |

### Response

**Status Code:** `200 OK`

**Body:**

```json
{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
| `response_text` | string | The generated response text to send back to the user |
| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| `confidence` | number | Optional confidence score (0.0 - 1.0) |

---

## Error Responses

### **400 Bad Request**

Invalid request format or missing required fields.

```json
{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}
```

### **500 Internal Server Error**

RAG server encountered an error processing the request.

```json
{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}
```

### **503 Service Unavailable**

RAG server is temporarily unavailable (triggers retry in client).

```json
{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
```

---

## Example Requests

### Example 1: Regular Conversation

**Request:** `POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}
```

**Response:**

```json
{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}
```

### Example 2: Notification Flow

**Request:** `POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}
```

**Response:**

```json
{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}
```

---

## Design Decisions

### 1. **RAG Handles Conversation History Internally**

- The RAG server maintains its own conversation history indexed by `phone_number`
- The integration layer will continue to store conversation history (redundant for now)
- This allows gradual migration without risk

### 2. **No Session ID Required**

- Unlike Dialogflow (complex session paths), RAG uses `phone_number` as the session identifier
- Simpler and aligns with RAG's internal tracking

### 3. **Notifications Are Contextual**

- When a notification is active, the integration layer passes both:
  - The user's query (`text`)
  - The notification context (`notification.text` and `notification.parameters`)
- RAG uses this context to generate relevant responses

### 4. **Minimal Parameter Passing**

- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)

### 5. **Obfuscation Stays in Integration Layer**

- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary

---

## Non-Functional Requirements

### Performance

- **Target Response Time:** < 2 seconds (p95)
- **Timeout:** 30 seconds (configurable in client)

### Reliability

- **Availability:** 99.5%+
- **Retry Strategy:** Client will retry on 500, 503, 504 errors (exponential backoff)

### Scalability

- **Concurrent Requests:** Support 100+ concurrent requests
- **Rate Limiting:** None (or specify if needed)

---

## Migration Notes

### What the Integration Layer Will Do:

- ✅ Continue to obfuscate text via DLP before calling RAG
- ✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
- ✅ Continue to manage session timeouts (30 minutes)
- ✅ Continue to handle notification storage and retrieval
- ✅ Map `DetectIntentRequestDTO` → RAG request format
- ✅ Map RAG response → `DetectIntentResponseDTO`

### What the RAG Server Will Do:

- ✅ Maintain its own conversation history by `phone_number`
- ✅ Use notification context when provided to generate relevant responses
- ✅ Generate responses using RAG (retrieval + generation)
- ✅ Return structured responses with optional parameters

### What We're NOT Changing:

- ❌ External API contracts (controllers remain unchanged)
- ❌ DTO structures (`DetectIntentRequestDTO`, `DetectIntentResponseDTO`)
- ❌ Conversation storage logic (Memorystore + Firestore)
- ❌ DLP obfuscation flow
- ❌ Session management (30-minute timeout)
- ❌ Notification storage

---

## Questions for RAG Team

Before implementation:

1. **Endpoint URL:** What is the actual RAG server URL?
2. **Authentication:** Do we need API key authentication? If yes, what's the header format?
3. **Timeout:** What's a reasonable timeout? (We're using 30s as default)
4. **Rate Limiting:** Any rate limits we should be aware of?
5. **Conversation History:** Does RAG need explicit conversation history, or does it fetch by phone_number internally?
6. **Response Parameters:** Will RAG return any extracted parameters, or just `response_text`?
7. **Health Check:** Is there a `/health` endpoint for monitoring?
8. **Versioning:** Should we use `/api/v1/query` or a different version?

---

## Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |
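
---

## Appendix: Client Sketch

The request body, the 30-second timeout, and the retry strategy (exponential backoff on 500, 503, 504) described in this specification could be exercised by a client along the following lines. This is a minimal sketch, not the integration layer's actual implementation. The function names (`build_query_payload`, `backoff_delays`, `http_send`, `query_rag`), the backoff values (0.5 s base, factor 2, 8 s cap), and the limit of 3 retries are all assumptions made for illustration. The base URL is the placeholder from this document.

```python
import json
import time
import urllib.error
import urllib.request

RETRYABLE_STATUSES = {500, 503, 504}  # from the Reliability section
VALID_TYPES = {"conversation", "notification"}


def build_query_payload(phone_number, text, request_type="conversation",
                        notification=None, language_code="es"):
    """Build the POST /query body, enforcing the two required fields."""
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")
    if request_type not in VALID_TYPES:
        raise ValueError(f"Unsupported type: {request_type!r}")
    payload = {
        "phone_number": phone_number,
        "text": text,  # expected to be obfuscated by DLP before this call
        "type": request_type,
        "language_code": language_code,
    }
    if notification is not None:
        payload["notification"] = notification
    return payload


def backoff_delays(max_retries, base=0.5, factor=2.0, cap=8.0):
    """Exponential backoff schedule in seconds (values are assumptions)."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def _parse_json_object(raw):
    """Decode a JSON object; fall back to {} for empty or non-JSON bodies."""
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except ValueError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def http_send(payload, base_url="https://your-rag-server.com/api/v1",
              api_key=None, timeout=30.0):
    """POST the payload to /query and return (status_code, body_dict)."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # the API key is optional per the Authentication section
        headers["X-API-Key"] = api_key
    request = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, _parse_json_object(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; convert back to a (status, body) pair.
        return err.code, _parse_json_object(err.read())


def query_rag(payload, send=http_send, max_retries=3, sleep=time.sleep):
    """Call `send`, retrying on 500/503/504; return the 200 body or raise."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        status, body = send(payload)
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            raise RuntimeError(
                f"RAG query failed with status {status}: {body.get('message')}"
            )
        sleep(delays[attempt])
```

Passing `send` and `sleep` as parameters keeps the retry logic testable without a live RAG server. A 400 response is raised immediately instead of being retried, since resending an invalid request would not change the outcome.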