RAG API Specification
Overview
This document defines the API contract between the integration layer (capa-de-integracion) and the RAG server.
The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.
Base URL
https://your-rag-server.com/api/v1
Authentication
- Method: API Key (optional)
- Header:
X-API-Key: <your-api-key>
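Because the API key is optional, a client can add the header only when a key is configured. The following is a minimal sketch of that idea; the helper name `build_headers` is hypothetical and not part of this contract.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers; X-API-Key is added only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # None or empty string means "no authentication configured"
        headers["X-API-Key"] = api_key
    return headers
```

Treating an empty string the same as a missing key is an assumption of this sketch; the final behavior depends on the RAG team's answer to the authentication question below.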
Endpoint: Query
POST /query
Process a user message or notification and return a generated response.
Request
Headers:
Content-Type: application/json
X-API-Key: <api-key> (optional)
Body:
{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}
Field Descriptions:
| Field | Type | Required | Description |
|---|---|---|---|
| phone_number | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| text | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
| type | string | ❌ No | Request type: "conversation" (default) or "notification" |
| notification | object | ❌ No | Present only when processing a notification-related query |
| notification.text | string | ❌ No | Original notification text (obfuscated) |
| notification.parameters | object | ❌ No | Key-value pairs of notification metadata |
| language_code | string | ❌ No | Language code (e.g., "es", "en"). Defaults to "es" |
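The field rules above can be checked on the client before a request is sent. This is a sketch only: the function name `build_query_payload` and the choice to raise `ValueError` are assumptions, and the server remains the authority on validation.

```python
from typing import Any, Dict, Optional

ALLOWED_TYPES = {"conversation", "notification"}


def build_query_payload(
    phone_number: str,
    text: str,
    request_type: str = "conversation",
    notification: Optional[Dict[str, Any]] = None,
    language_code: str = "es",
) -> Dict[str, Any]:
    """Build the POST /query body, applying the defaults from the field table."""
    if not phone_number:
        raise ValueError("Missing required field: phone_number")
    if not text:
        raise ValueError("Missing required field: text")
    if request_type not in ALLOWED_TYPES:
        raise ValueError(f"Invalid type: {request_type!r}")

    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,  # expected to be obfuscated by DLP already
        "type": request_type,
        "language_code": language_code,
    }
    # The notification object is sent only when the caller provides one.
    if notification is not None:
        payload["notification"] = notification
    return payload
```

Calling `build_query_payload("573001234567", "hola")` yields a body with `type` set to "conversation" and `language_code` set to "es", matching the documented defaults.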
Response
Status Code: 200 OK
Body:
{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}
Field Descriptions:
| Field | Type | Description |
|---|---|---|
| response_id | string | Unique identifier for this RAG response (for tracking/logging) |
| response_text | string | The generated response text to send back to the user |
| parameters | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| confidence | number | Optional confidence score (0.0 - 1.0) |
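A client might parse the response body along these lines. The sketch assumes that `response_id` and `response_text` are always present (the table marks only `parameters` and `confidence` as optional); the `RagResponse` class and `parse_rag_response` function are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RagResponse:
    response_id: str
    response_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_rag_response(body: str) -> RagResponse:
    """Parse a 200 OK body, tolerating missing optional fields."""
    data = json.loads(body)
    confidence = data.get("confidence")
    # The spec gives the range 0.0 - 1.0; rejecting other values is an assumption.
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return RagResponse(
        response_id=data["response_id"],      # KeyError if absent
        response_text=data["response_text"],  # KeyError if absent
        parameters=data.get("parameters") or {},
        confidence=confidence,
    )
```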
Error Responses
400 Bad Request
Invalid request format or missing required fields.
{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}
500 Internal Server Error
The RAG server encountered an error while processing the request (the client also retries on this status; see Retry Strategy).
{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}
503 Service Unavailable
RAG server is temporarily unavailable (triggers retry in client).
{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
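Since all three error bodies share the `error`, `message`, and `status` fields, a client could convert them into one exception type. This is a sketch under that assumption; `RagApiError` and `error_from_body` are hypothetical names, and the fallback for non-JSON bodies (for example, an HTML page from a proxy) is a defensive choice rather than something this contract specifies.

```python
import json


class RagApiError(Exception):
    """Client-side representation of a non-200 response from the RAG server."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def error_from_body(status: int, body: str) -> RagApiError:
    """Build a RagApiError from the HTTP status and the raw response body."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = {}
    if not isinstance(data, dict):
        data = {}
    return RagApiError(
        status=status,
        error=str(data.get("error", "Unknown Error")),
        message=str(data.get("message", "")),
    )
```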
Example Requests
Example 1: Regular Conversation
POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}
Response:
{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}
Example 2: Notification Flow
POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}
Response:
{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}
Design Decisions
1. RAG Handles Conversation History Internally
- The RAG server maintains its own conversation history indexed by phone_number
- The integration layer will continue to store conversation history (redundant for now)
- This allows gradual migration without risk
2. No Session ID Required
- Unlike Dialogflow (complex session paths), RAG uses phone_number as the session identifier
- Simpler and aligns with RAG's internal tracking
3. Notifications Are Contextual
- When a notification is active, the integration layer passes both:
  - The user's query (text)
  - The notification context (notification.text and notification.parameters)
- RAG uses this context to generate relevant responses
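One way the integration layer might attach notification context is sketched below. The function name `attach_notification_context` and the shape of `active_notification` (a dict with `text` and optional `parameters`) are assumptions for illustration; the real notification model lives in the integration layer.

```python
from typing import Any, Dict, Optional


def attach_notification_context(
    payload: Dict[str, Any],
    active_notification: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of payload, adding notification context when one is active."""
    if active_notification is None:
        # No active notification: a regular conversation request.
        return {**payload, "type": "conversation"}
    return {
        **payload,
        "type": "notification",
        "notification": {
            "text": active_notification["text"],
            "parameters": dict(active_notification.get("parameters", {})),
        },
    }
```

The input payload is left unmodified, so the same base request can be reused whether or not a notification is active.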
4. Minimal Parameter Passing
- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)
5. Obfuscation Stays in Integration Layer
- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary
Non-Functional Requirements
Performance
- Target Response Time: < 2 seconds (p95)
- Timeout: 30 seconds (configurable in client)
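The configurable timeout could be applied on the client roughly as follows. This sketch uses Python's standard `urllib.request`; the function name `send_query` and the injectable `opener` parameter (included so the call can be exercised without a network) are assumptions, and a production client would likely use its own HTTP stack.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

DEFAULT_TIMEOUT_S = 30.0  # default from this spec; configurable per call


def send_query(
    url: str,
    payload: Dict[str, Any],
    timeout_s: float = DEFAULT_TIMEOUT_S,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """POST the payload as JSON and return the decoded JSON response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # If the server does not answer within timeout_s, the opener raises
    # an exception instead of returning a response.
    with opener(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```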
Reliability
- Availability: 99.5%+
- Retry Strategy: Client will retry on 500, 503, 504 errors (exponential backoff)
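The retry strategy can be sketched as a small wrapper around the HTTP call. The retryable status codes come from this section; the attempt limit, the base delay, and the names `call_with_retry` and `send` are assumptions, since the spec does not fix them.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {500, 503, 504}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,        # assumed value, not specified
    base_delay_s: float = 0.5,    # assumed value, not specified
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() returns (status_code, body). The delay doubles after each failed
    attempt: base_delay_s, 2 * base_delay_s, 4 * base_delay_s, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        sleep(base_delay_s * (2 ** (attempt - 1)))
```

After the final attempt the last response is returned as-is, so the caller decides how to surface a persistent 5xx. Adding random jitter to the delay is a common refinement that this sketch leaves out.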
Scalability
- Concurrent Requests: Support 100+ concurrent requests
- Rate Limiting: None (or specify if needed)
Migration Notes
What the Integration Layer Will Do:
✅ Continue to obfuscate text via DLP before calling RAG
✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
✅ Continue to manage session timeouts (30 minutes)
✅ Continue to handle notification storage and retrieval
✅ Map DetectIntentRequestDTO → RAG request format
✅ Map RAG response → DetectIntentResponseDTO
What the RAG Server Will Do:
✅ Maintain its own conversation history by phone_number
✅ Use notification context when provided to generate relevant responses
✅ Generate responses using RAG (retrieval + generation)
✅ Return structured responses with optional parameters
What We're NOT Changing:
❌ External API contracts (controllers remain unchanged)
❌ DTO structures (DetectIntentRequestDTO, DetectIntentResponseDTO)
❌ Conversation storage logic (Memorystore + Firestore)
❌ DLP obfuscation flow
❌ Session management (30-minute timeout)
❌ Notification storage
Questions for RAG Team
Before implementation:
- Endpoint URL: What is the actual RAG server URL?
- Authentication: Do we need API key authentication? If yes, what's the header format?
- Timeout: What's a reasonable timeout? (We're using 30s as default)
- Rate Limiting: Any rate limits we should be aware of?
- Conversation History: Does RAG need explicit conversation history, or does it fetch by phone_number internally?
- Response Parameters: Will RAG return any extracted parameters, or just response_text?
- Health Check: Is there a /health endpoint for monitoring?
- Versioning: Should we use /api/v1/query or a different version?
Changelog
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |