Add RAG client
docs/rag-api-specification.md
# RAG API Specification

## Overview

This document defines the API contract between the integration layer (`capa-de-integracion`) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.

## Base URL

```
https://your-rag-server.com/api/v1
```

## Authentication

- Method: API Key (optional)
- Header: `X-API-Key: <your-api-key>`

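
A minimal sketch of how a client might attach the optional API key, assuming Python's standard `urllib`. `BASE_URL` is the placeholder from this document, and `build_request` is a hypothetical helper name, not part of any existing client. The request object is only constructed here, never sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Placeholder base URL taken from this specification; the real URL is still open.
BASE_URL = "https://your-rag-server.com/api/v1"


def build_request(
    path: str, payload: Dict[str, Any], api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the RAG server."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Authentication is optional, so the header is added only when a key exists.
        headers["X-API-Key"] = api_key
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method="POST"
    )
```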
---

## Endpoint: Query

### **POST /query**

Processes a user message or notification and returns a generated response.

### Request

**Headers:**
- `Content-Type: application/json`
- `X-API-Key: <your-api-key>` (optional)

**Body:**
```json
{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| `text` | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
| `notification` | object | ❌ No | Present only when processing a notification-related query |
| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |

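
The field rules above can be sketched as a small payload builder. This is an illustration under stated assumptions, not the integration layer's actual code: `build_query_payload` is a hypothetical name, and setting `type` to `"notification"` whenever notification context is supplied is an assumption based on the notification example in this document.

```python
from typing import Any, Dict, Optional


def build_query_payload(
    phone_number: str,
    text: str,
    notification_text: Optional[str] = None,
    notification_parameters: Optional[Dict[str, Any]] = None,
    language_code: str = "es",
) -> Dict[str, Any]:
    """Build the JSON body for POST /query, applying the documented defaults."""
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")

    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,  # expected to be obfuscated by DLP before this point
        "type": "conversation",
        "language_code": language_code,
    }

    # Assumption: any notification context switches the request type.
    if notification_text is not None or notification_parameters:
        notification: Dict[str, Any] = {}
        if notification_text is not None:
            notification["text"] = notification_text
        if notification_parameters:
            notification["parameters"] = dict(notification_parameters)
        payload["type"] = "notification"
        payload["notification"] = notification

    return payload
```

Calling `build_query_payload("573001234567", "necesito más información", notification_text="...", notification_parameters={"status": "approved"})` yields a body with `type` set to `"notification"` and a nested `notification` object, while a call with only the two required arguments omits `notification` entirely.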
### Response

**Status Code:** `200 OK`

**Body:**
```json
{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
| `response_text` | string | The generated response text to send back to the user |
| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| `confidence` | number | Optional confidence score (0.0 - 1.0) |

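
A hedged sketch of how a client could parse this response body. `QueryResponse` and `parse_query_response` are hypothetical names. Treating `response_id` and `response_text` as mandatory is an assumption, since the table above only marks `parameters` and `confidence` as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class QueryResponse:
    response_id: str
    response_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_query_response(raw: str) -> QueryResponse:
    """Parse a 200 OK body, tolerating missing optional fields."""
    body = json.loads(raw)
    if not isinstance(body, dict):
        raise ValueError("Response body must be a JSON object")

    # Assumption: both identifiers are required strings.
    for name in ("response_id", "response_text"):
        if not isinstance(body.get(name), str):
            raise ValueError(f"Missing or invalid field: {name}")

    confidence = body.get("confidence")
    if confidence is not None:
        confidence = float(confidence)
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    return QueryResponse(
        response_id=body["response_id"],
        response_text=body["response_text"],
        parameters=body.get("parameters") or {},
        confidence=confidence,
    )
```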
---

## Error Responses

### **400 Bad Request**
Invalid request format or missing required fields.

```json
{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}
```

### **500 Internal Server Error**
The RAG server encountered an error while processing the request (the client also retries on this status).

```json
{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}
```

### **503 Service Unavailable**
The RAG server is temporarily unavailable (triggers a retry in the client).

```json
{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
```

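
The shared error envelope (`error`, `message`, `status`) can be turned into an exception on the client side. The sketch below is illustrative only: `RagApiError` and `parse_error` are hypothetical names, and falling back to generic values when the body is not valid JSON is an assumption about how defensive a client should be.

```python
import json


class RagApiError(Exception):
    """Client-side representation of a RAG error response."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def parse_error(http_status: int, raw_body: str) -> RagApiError:
    """Build a RagApiError from the HTTP status code and the response body."""
    try:
        body = json.loads(raw_body)
    except ValueError:
        body = {}  # non-JSON body, e.g. a proxy error page
    if not isinstance(body, dict):
        body = {}
    # The HTTP status code is trusted over the "status" field in the body.
    return RagApiError(
        status=http_status,
        error=str(body.get("error", "Unknown Error")),
        message=str(body.get("message", "")),
    )
```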
---

## Example Requests

### Example 1: Regular Conversation

`POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}
```

### Example 2: Notification Flow

`POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}
```

---

## Design Decisions

### 1. **RAG Handles Conversation History Internally**
- The RAG server maintains its own conversation history indexed by `phone_number`
- The integration layer will continue to store conversation history (redundant for now)
- This allows gradual migration without risk

### 2. **No Session ID Required**
- Unlike Dialogflow (complex session paths), RAG uses `phone_number` as the session identifier
- Simpler and aligns with RAG's internal tracking

### 3. **Notifications Are Contextual**
- When a notification is active, the integration layer passes both:
  - The user's query (`text`)
  - The notification context (`notification.text` and `notification.parameters`)
- RAG uses this context to generate relevant responses

### 4. **Minimal Parameter Passing**
- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)

### 5. **Obfuscation Stays in Integration Layer**
- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary

---

## Non-Functional Requirements

### Performance
- **Target Response Time:** < 2 seconds (p95)
- **Timeout:** 30 seconds (configurable in client)

### Reliability
- **Availability:** 99.5%+
- **Retry Strategy:** The client retries on 500, 503, and 504 responses, using exponential backoff

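
The retry strategy above could look roughly like the following. This is a sketch under assumptions: `call_with_retry`, `max_attempts`, and `base_delay` are hypothetical names and values that this specification does not define, and `send` stands in for whatever function performs the HTTP call and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

# Statuses this specification lists as retryable.
RETRYABLE_STATUSES = {500, 503, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,      # assumption: not specified in this document
    base_delay: float = 0.5,    # assumption: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it succeeds, fails non-retryably, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Injecting `sleep` keeps the helper testable without real waiting. A production client would probably also add jitter and cap the total wait so it stays within the 30-second timeout, but neither is specified here.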
### Scalability
- **Concurrent Requests:** Support 100+ concurrent requests
- **Rate Limiting:** None assumed; to be confirmed with the RAG team

---

## Migration Notes

### What the Integration Layer Will Do:
- ✅ Continue to obfuscate text via DLP before calling RAG
- ✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
- ✅ Continue to manage session timeouts (30 minutes)
- ✅ Continue to handle notification storage and retrieval
- ✅ Map `DetectIntentRequestDTO` → RAG request format
- ✅ Map RAG response → `DetectIntentResponseDTO`

### What the RAG Server Will Do:
- ✅ Maintain its own conversation history by `phone_number`
- ✅ Use notification context when provided to generate relevant responses
- ✅ Generate responses using RAG (retrieval + generation)
- ✅ Return structured responses with optional parameters

### What We're NOT Changing:
- ❌ External API contracts (controllers remain unchanged)
- ❌ DTO structures (`DetectIntentRequestDTO`, `DetectIntentResponseDTO`)
- ❌ Conversation storage logic (Memorystore + Firestore)
- ❌ DLP obfuscation flow
- ❌ Session management (30-minute timeout)
- ❌ Notification storage

---

## Questions for RAG Team

Before implementation:

1. **Endpoint URL:** What is the actual RAG server URL?
2. **Authentication:** Do we need API key authentication? If yes, what's the header format?
3. **Timeout:** What's a reasonable timeout? (We're using 30s as default)
4. **Rate Limiting:** Any rate limits we should be aware of?
5. **Conversation History:** Does RAG need explicit conversation history, or does it fetch by `phone_number` internally?
6. **Response Parameters:** Will RAG return any extracted parameters, or just `response_text`?
7. **Health Check:** Is there a `/health` endpoint for monitoring?
8. **Versioning:** Should we use `/api/v1/query` or a different version?

---

## Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |