Add RAG client

2026-02-22 22:45:47 +00:00
parent 3c1c1a246a
commit 10520012d4
189 changed files with 10690 additions and 31 deletions
--- a/docs/rag-api-specification.md
+++ b/docs/rag-api-specification.md
@@ -0,0 +1,268 @@
+# RAG API Specification
+
+## Overview
+This document defines the API contract between the integration layer (`capa-de-integracion`) and the RAG server.
+
+The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.
+
+## Base URL
+```
+https://your-rag-server.com/api/v1
+```
+
+## Authentication
+- Method: API Key (optional)
+- Header: `X-API-Key: <your-api-key>`
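
Because the API key is optional, a client would only send `X-API-Key` when a key is configured. The following is a minimal sketch of that rule. The helper name `build_headers` is hypothetical and not part of this specification.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict[str, str]:
    """Return request headers; X-API-Key is included only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The spec marks authentication as optional, so the header is omitted
        # entirely when no key is available (assumption: an empty key means "none").
        headers["X-API-Key"] = api_key
    return headers
```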
+
+---
+
+## Endpoint: Query
+
+### **POST /query**
+
+Process a user message or notification and return a generated response.
+
+### Request
+
+**Headers:**
+- `Content-Type: application/json`
+- `X-API-Key: <api-key>` (optional)
+
+**Body:**
+```json
+{
+  "phone_number": "string (required)",
+  "text": "string (required - obfuscated user input or notification text)",
+  "type": "string (optional: 'conversation' (default) or 'notification')",
+  "notification": {
+    "text": "string (optional - original notification text)",
+    "parameters": {
+      "key": "value"
+    }
+  },
+  "language_code": "string (optional, default: 'es')"
+}
+```
+
+**Field Descriptions:**
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
+| `text` | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
+| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
+| `notification` | object | ❌ No | Present only when processing a notification-related query |
+| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
+| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
+| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |
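
The field rules in the table above could be enforced on the client side before a request is sent. This is an illustrative sketch only: the function name `build_query_payload` and its argument names are assumptions, and the validation messages simply mirror the 400 error example in this document.

```python
from typing import Any, Optional


def build_query_payload(
    phone_number: str,
    text: str,
    request_type: str = "conversation",
    notification_text: Optional[str] = None,
    notification_parameters: Optional[dict[str, Any]] = None,
    language_code: str = "es",
) -> dict[str, Any]:
    """Build the JSON body for POST /query from already-obfuscated inputs."""
    # Required fields per the spec table.
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")
    if request_type not in ("conversation", "notification"):
        raise ValueError(f"Unsupported type: {request_type!r}")

    payload: dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,
        "type": request_type,
        "language_code": language_code,
    }

    # The notification object is included only when notification context exists.
    if notification_text is not None or notification_parameters:
        notification: dict[str, Any] = {}
        if notification_text is not None:
            notification["text"] = notification_text
        if notification_parameters:
            notification["parameters"] = dict(notification_parameters)
        payload["notification"] = notification
    return payload
```

Calling `build_query_payload("573001234567", "necesito más información", "notification", notification_text="...")` would yield a body shaped like Example 2 later in this document.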
+
+### Response
+
+**Status Code:** `200 OK`
+
+**Body:**
+```json
+{
+  "response_id": "string (unique identifier for this response)",
+  "response_text": "string (generated response)",
+  "parameters": {
+    "key": "value"
+  },
+  "confidence": 0.95
+}
+```
+
+**Field Descriptions:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
+| `response_text` | string | The generated response text to send back to the user |
+| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
+| `confidence` | number | Optional confidence score (0.0 - 1.0) |
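
A client might parse this response body defensively, since `parameters` and `confidence` are optional. The sketch below assumes `response_id` and `response_text` are always present (the table does not state this explicitly); `RagResponse` and `parse_rag_response` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RagResponse:
    response_id: str
    response_text: str
    parameters: dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_rag_response(body: dict[str, Any]) -> RagResponse:
    """Convert a decoded 200 OK JSON body into a RagResponse."""
    # Assumption: both string fields are mandatory in a successful response.
    for name in ("response_id", "response_text"):
        if not isinstance(body.get(name), str):
            raise ValueError(f"Missing or invalid field: {name}")

    confidence = body.get("confidence")
    if confidence is not None:
        confidence = float(confidence)
        # The spec documents the score range as 0.0 - 1.0.
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")

    return RagResponse(
        response_id=body["response_id"],
        response_text=body["response_text"],
        parameters=dict(body.get("parameters") or {}),  # may be absent or empty
        confidence=confidence,
    )
```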
+
+---
+
+## Error Responses
+
+### **400 Bad Request**
+Invalid request format or missing required fields.
+
+```json
+{
+  "error": "Bad Request",
+  "message": "Missing required field: phone_number",
+  "status": 400
+}
+```
+
+### **500 Internal Server Error**
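+The RAG server encountered an error while processing the request. The client retries on this status (see Retry Strategy under Non-Functional Requirements).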
+
+```json
+{
+  "error": "Internal Server Error",
+  "message": "Failed to generate response",
+  "status": 500
+}
+```
+
+### **503 Service Unavailable**
+The RAG server is temporarily unavailable. The client retries on this status (see Retry Strategy under Non-Functional Requirements).
+
+```json
+{
+  "error": "Service Unavailable",
+  "message": "RAG service is currently unavailable",
+  "status": 503
+}
+```
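
All three error bodies share the same `error` / `message` / `status` shape, so a client could decode them with a single helper. The following is a sketch under that assumption; `RagApiError` and `parse_error_body` are hypothetical names, and the fallback behaviour for non-JSON bodies is a design choice of this example, not something the specification defines.

```python
import json


class RagApiError(Exception):
    """Error returned by the RAG server, carrying the fields of the error body."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def parse_error_body(status_code: int, raw_body: str) -> RagApiError:
    """Build a RagApiError from an HTTP status code and the raw response body."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Fall back to the HTTP status and raw text when the body is not the
    # documented JSON shape (for example, an error page from a proxy).
    return RagApiError(
        status=int(data.get("status", status_code)),
        error=str(data.get("error", "Unknown Error")),
        message=str(data.get("message", raw_body)),
    )
```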
+
+---
+
+## Example Requests
+
+### Example 1: Regular Conversation
+`POST /api/v1/query`
+
+```json
+{
+  "phone_number": "573001234567",
+  "text": "¿Cuál es el estado de mi solicitud?",
+  "type": "conversation",
+  "language_code": "es"
+}
+```
+
+**Response:**
+```json
+{
+  "response_id": "rag-resp-12345-67890",
+  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
+  "parameters": {},
+  "confidence": 0.92
+}
+```
+
+### Example 2: Notification Flow
+`POST /api/v1/query`
+
+```json
+{
+  "phone_number": "573001234567",
+  "text": "necesito más información",
+  "type": "notification",
+  "notification": {
+    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
+    "parameters": {
+      "document_id": "DOC-2025-001",
+      "status": "approved"
+    }
+  },
+  "language_code": "es"
+}
+```
+
+**Response:**
+```json
+{
+  "response_id": "rag-resp-12345-67891",
+  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
+  "parameters": {
+    "document_id": "DOC-2025-001"
+  },
+  "confidence": 0.88
+}
+```
+
+---
+
+## Design Decisions
+
+### 1. **RAG Handles Conversation History Internally**
+- The RAG server maintains its own conversation history indexed by `phone_number`
+- The integration layer will continue to store conversation history as well (redundant for now)
+- This redundancy allows a gradual migration with lower risk
+
+### 2. **No Session ID Required**
+- Unlike Dialogflow CX, which uses complex session paths, the RAG server uses `phone_number` as the session identifier
+- This is simpler and aligns with the RAG server's internal history tracking
+
+### 3. **Notifications Are Contextual**
+- When a notification is active, the integration layer passes both:
+  - The user's query (`text`)
+  - The notification context (`notification.text` and `notification.parameters`)
+- RAG uses this context to generate relevant responses
+
+### 4. **Minimal Parameter Passing**
+- Only essential data is sent to RAG
+- The integration layer can store additional metadata internally without sending it to RAG
+- RAG can return parameters if needed (e.g., extracted entities)
+
+### 5. **Obfuscation Stays in Integration Layer**
+- DLP obfuscation happens before calling RAG
+- RAG receives already-obfuscated text
+- This maintains the existing security boundary
+
+---
+
+## Non-Functional Requirements
+
+### Performance
+- **Target Response Time:** < 2 seconds (p95)
+- **Timeout:** 30 seconds (configurable in client)
+
+### Reliability
+- **Availability:** 99.5%+
+- **Retry Strategy:** Client will retry on 500, 503, 504 errors (exponential backoff)
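
The retry strategy above could look roughly like the sketch below. The specification only states the retryable status codes and that backoff is exponential; the retry count, base delay, and growth factor used here are assumptions, as are the names `backoff_delays` and `call_with_retry`. The `send` callable stands in for whatever HTTP call the client actually makes.

```python
import time
from typing import Callable, Tuple

# Status codes the spec lists as retryable.
RETRYABLE_STATUSES = frozenset({500, 503, 504})


def backoff_delays(max_retries: int = 3, base_delay: float = 0.5, factor: float = 2.0) -> list[float]:
    """Return the wait time before each retry, growing exponentially."""
    return [base_delay * factor**attempt for attempt in range(max_retries)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry on 500/503/504 until success or retries run out.

    send() must return (status_code, body). The last result is returned
    whether or not it succeeded, so the caller decides how to handle it.
    """
    status, body = send()
    for delay in backoff_delays(max_retries, base_delay):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Injecting `sleep` keeps the helper testable without real waiting; a production client would likely also add jitter and respect the configured 30-second timeout per attempt.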
+
+### Scalability
+- **Concurrent Requests:** Support 100+ concurrent requests
+- **Rate Limiting:** None currently specified (to be confirmed with the RAG team; see Questions for RAG Team)
+
+---
+
+## Migration Notes
+
+### What the Integration Layer Will Do:
+- ✅ Continue to obfuscate text via DLP before calling RAG
+- ✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
+- ✅ Continue to manage session timeouts (30 minutes)
+- ✅ Continue to handle notification storage and retrieval
+- ✅ Map `DetectIntentRequestDTO` → RAG request format
+- ✅ Map RAG response → `DetectIntentResponseDTO`
+
+### What the RAG Server Will Do:
+- ✅ Maintain its own conversation history by `phone_number`
+- ✅ Use notification context when provided to generate relevant responses
+- ✅ Generate responses using RAG (retrieval + generation)
+- ✅ Return structured responses with optional parameters
+
+### What We're NOT Changing:
+- ❌ External API contracts (controllers remain unchanged)
+- ❌ DTO structures (`DetectIntentRequestDTO`, `DetectIntentResponseDTO`)
+- ❌ Conversation storage logic (Memorystore + Firestore)
+- ❌ DLP obfuscation flow
+- ❌ Session management (30-minute timeout)
+- ❌ Notification storage
+
+---
+
+## Questions for RAG Team
+
+Before implementation:
+
+1. **Endpoint URL:** What is the actual RAG server URL?
+2. **Authentication:** Do we need API key authentication? If yes, what's the header format?
+3. **Timeout:** What's a reasonable timeout? (We're using 30s as default)
+4. **Rate Limiting:** Any rate limits we should be aware of?
+5. **Conversation History:** Does RAG need explicit conversation history, or does it fetch by phone_number internally?
+6. **Response Parameters:** Will RAG return any extracted parameters, or just `response_text`?
+7. **Health Check:** Is there a `/health` endpoint for monitoring?
+8. **Versioning:** Should we use `/api/v1/query` or a different version?
+
+---
+
+## Changelog
+
+| Version | Date | Changes |
+|---------|------|---------|
+| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |