# RAG API Specification

## Overview

This document defines the API contract between the integration layer (`capa-de-integracion`) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.

## Base URL

```
https://your-rag-server.com/api/v1
```

## Authentication

- Method: API Key (optional)
- Header: `X-API-Key: <your-api-key>`

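Because the API key is optional, a client might attach the header only when a key is configured. The following is a minimal Python sketch of that idea; the helper name `build_headers` is an assumption for illustration and is not part of this specification.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding X-API-Key only when a key is configured.

    `build_headers` is a hypothetical helper, not something the spec defines.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return headers
```

Calling `build_headers()` yields only the `Content-Type` header, while `build_headers("secret")` also includes `X-API-Key`.
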
---

## Endpoint: Query

### **POST /query**

Process a user message or notification and return a generated response.

### Request

**Headers:**
- `Content-Type: application/json`
- `X-API-Key: <api-key>` (optional)

**Body:**
```json
{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| `text` | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
| `notification` | object | ❌ No | Present only when processing a notification-related query |
| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |

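The field rules in the table above can be sketched as a small payload builder on the client side. This is an illustrative Python sketch only: the function name `build_query_payload` and the parameter name `request_type` are assumptions, and the validation shown is one possible client-side check rather than behavior the RAG server is known to enforce.

```python
from typing import Any, Dict, Optional

# Request types listed in the field table.
VALID_TYPES = {"conversation", "notification"}


def build_query_payload(
    phone_number: str,
    text: str,
    request_type: str = "conversation",
    notification: Optional[Dict[str, Any]] = None,
    language_code: str = "es",
) -> Dict[str, Any]:
    """Build a POST /query body, applying the documented defaults."""
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")
    if request_type not in VALID_TYPES:
        raise ValueError(f"Unsupported type: {request_type!r}")

    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,
        "type": request_type,
        "language_code": language_code,
    }
    # The notification object is only sent when notification context exists.
    if notification is not None:
        payload["notification"] = notification
    return payload
```

For a plain conversation turn, `build_query_payload("573001234567", "hola")` produces a body with `type` set to `"conversation"`, `language_code` set to `"es"`, and no `notification` key.
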
### Response

**Status Code:** `200 OK`

**Body:**
```json
{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
| `response_text` | string | The generated response text to send back to the user |
| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| `confidence` | number | Optional confidence score (0.0 - 1.0) |

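Since `parameters` and `confidence` are optional, a client has to tolerate their absence. A possible Python sketch of parsing the response body is shown below; the `QueryResponse` class and `parse_query_response` function are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class QueryResponse:
    """Hypothetical client-side view of a 200 OK body from POST /query."""
    response_id: str
    response_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_query_response(body: str) -> QueryResponse:
    """Parse the JSON body, treating `parameters` and `confidence` as optional."""
    data = json.loads(body)
    return QueryResponse(
        response_id=data["response_id"],
        response_text=data["response_text"],
        parameters=data.get("parameters") or {},
        confidence=data.get("confidence"),
    )
```

A missing or `null` `parameters` value becomes an empty dictionary, and a missing `confidence` becomes `None`, so downstream code can distinguish "no score" from a score of `0.0`.
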
---

## Error Responses

### **400 Bad Request**
Invalid request format or missing required fields.

```json
{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}
```

### **500 Internal Server Error**
RAG server encountered an error processing the request.

```json
{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}
```

### **503 Service Unavailable**
RAG server is temporarily unavailable (triggers retry in client).

```json
{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
```

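The error bodies above share one shape (`error`, `message`, `status`), so a client could map them to a single exception type. The sketch below is one way to do that in Python; `RagApiError` and `error_from_body` are hypothetical names, and the retryable set is taken from the retry strategy stated under Non-Functional Requirements (500, 503, 504).

```python
import json

# Status codes the client retries, per the Reliability requirements.
RETRYABLE_STATUSES = frozenset({500, 503, 504})


class RagApiError(Exception):
    """Hypothetical exception wrapping an error response from the RAG server."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def error_from_body(status: int, body: str) -> RagApiError:
    """Build a RagApiError from the HTTP status and the raw response body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return RagApiError(
        status=status,
        error=data.get("error", "Unknown Error"),
        message=data.get("message", ""),
    )
```

The sketch trusts the HTTP status line rather than the `status` field in the body, and it falls back to a generic error when the body is not valid JSON.
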
---

## Example Requests

### Example 1: Regular Conversation

`POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}
```

### Example 2: Notification Flow

`POST /api/v1/query`

```json
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}
```

---

## Design Decisions

### 1. **RAG Handles Conversation History Internally**
- The RAG server maintains its own conversation history indexed by `phone_number`
- The integration layer will continue to store conversation history (redundant for now)
- This allows gradual migration without risk

### 2. **No Session ID Required**
- Unlike Dialogflow, which requires complex session paths, RAG uses `phone_number` as the session identifier
- This is simpler and aligns with RAG's internal tracking

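To illustrate what "history indexed by `phone_number`" could look like, here is a deliberately simple in-memory Python sketch. It is not the RAG server's implementation, whose storage is not described in this document; the class `ConversationHistory` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Turn = Tuple[str, str]  # (role, text), e.g. ("user", "hola")


class ConversationHistory:
    """Hypothetical in-memory store keyed by phone_number (illustration only)."""

    def __init__(self) -> None:
        self._turns: DefaultDict[str, List[Turn]] = defaultdict(list)

    def append(self, phone_number: str, role: str, text: str) -> None:
        # The phone number acts as the session identifier; no session ID is needed.
        self._turns[phone_number].append((role, text))

    def recent(self, phone_number: str, limit: int = 10) -> List[Turn]:
        """Return up to `limit` most recent turns for this phone number."""
        turns = self._turns.get(phone_number, [])
        return turns[max(len(turns) - limit, 0):]
```

Each phone number gets an independent history, so two users never see each other's turns, and an unknown number simply returns an empty list.
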
### 3. **Notifications Are Contextual**
- When a notification is active, the integration layer passes both:
  - The user's query (`text`)
  - The notification context (`notification.text` and `notification.parameters`)
- RAG uses this context to generate relevant responses

### 4. **Minimal Parameter Passing**
- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)

### 5. **Obfuscation Stays in Integration Layer**
- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary

---

## Non-Functional Requirements

### Performance
- **Target Response Time:** < 2 seconds (p95)
- **Timeout:** 30 seconds (configurable in client)

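The client-side timeout could be applied as in the following Python sketch, which uses only the standard library. The function names `build_query_request` and `send_query` are assumptions, and the base URL is the placeholder from the Base URL section, not a real server.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://your-rag-server.com/api/v1"  # placeholder from this spec
DEFAULT_TIMEOUT_SECONDS = 30.0  # default timeout stated above; configurable


def build_query_request(
    payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(
    payload: Dict[str, Any], timeout: float = DEFAULT_TIMEOUT_SECONDS
) -> Dict[str, Any]:
    """Send the request; urlopen raises if the timeout elapses or on HTTP errors."""
    request = build_query_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the timeout a single, easily configured argument and makes the request shape testable without network access.
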
### Reliability
- **Availability:** 99.5%+
- **Retry Strategy:** Client will retry on 500, 503, 504 errors (exponential backoff)

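The retry strategy above can be sketched as a small wrapper around any function that performs one request. This is an illustrative Python sketch: `call_with_retry`, `max_attempts`, and `base_delay` are assumed names and values, since the specification does not fix the number of attempts or the delays.

```python
import time
from typing import Callable, Tuple

# Status codes the client retries, as listed in the retry strategy.
RETRYABLE_STATUSES = frozenset({500, 503, 504})


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,      # assumed value, not defined by the spec
    base_delay: float = 0.5,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` performs one request and returns (status_code, body).
    The delay doubles after each failed attempt: base, 2*base, 4*base, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # All attempts failed with a retryable status; return the last result.
    return status, body
```

The `sleep` parameter is injectable so the backoff schedule can be verified in tests without waiting. A production client would likely also add jitter and cap the maximum delay.
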
### Scalability
- **Concurrent Requests:** Support 100+ concurrent requests
- **Rate Limiting:** None (or specify if needed)

---

## Migration Notes

### What the Integration Layer Will Do:
- ✅ Continue to obfuscate text via DLP before calling RAG
- ✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
- ✅ Continue to manage session timeouts (30 minutes)
- ✅ Continue to handle notification storage and retrieval
- ✅ Map `DetectIntentRequestDTO` → RAG request format
- ✅ Map RAG response → `DetectIntentResponseDTO`

### What the RAG Server Will Do:
- ✅ Maintain its own conversation history by `phone_number`
- ✅ Use notification context when provided to generate relevant responses
- ✅ Generate responses using RAG (retrieval + generation)
- ✅ Return structured responses with optional parameters

### What We're NOT Changing:
- ❌ External API contracts (controllers remain unchanged)
- ❌ DTO structures (`DetectIntentRequestDTO`, `DetectIntentResponseDTO`)
- ❌ Conversation storage logic (Memorystore + Firestore)
- ❌ DLP obfuscation flow
- ❌ Session management (30-minute timeout)
- ❌ Notification storage

---

## Questions for RAG Team

Before implementation:

1. **Endpoint URL:** What is the actual RAG server URL?
2. **Authentication:** Do we need API key authentication? If yes, what's the header format?
3. **Timeout:** What's a reasonable timeout? (We're using 30s as the default.)
4. **Rate Limiting:** Are there any rate limits we should be aware of?
5. **Conversation History:** Does RAG need explicit conversation history, or does it fetch it by `phone_number` internally?
6. **Response Parameters:** Will RAG return any extracted parameters, or just `response_text`?
7. **Health Check:** Is there a `/health` endpoint for monitoring?
8. **Versioning:** Should we use `/api/v1/query` or a different version?

---

## Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |