va/int-layer

Anibal Angulo 10520012d4 Add RAG client

2026-02-23 06:51:36 +00:00

RAG API Specification

Overview

This document defines the API contract between the integration layer (capa-de-integracion) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation, and produces its responses with Retrieval-Augmented Generation (RAG).

Base URL

https://your-rag-server.com/api/v1

Authentication

Method: API Key (optional)
Header: X-API-Key: <your-api-key>
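
As a minimal sketch of the optional API key, the request could be built as follows with Python's standard `urllib`. The `build_request` helper is hypothetical, `BASE_URL` is the placeholder from this spec, and appending `/query` assumes the endpoint defined in the next section.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://your-rag-server.com/api/v1"  # placeholder from this spec


def build_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for the query endpoint (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Authentication is optional, so the header is only sent when a key is configured.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=f"{BASE_URL}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(req, timeout=30)`, which matches the 30-second client default listed under Non-Functional Requirements.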

Endpoint: Query

POST /query

Process a user message or notification and return a generated response.

Request

Headers:

Content-Type: application/json
X-API-Key: <api-key> (optional)

Body:

{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification', default: 'conversation')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}

Field Descriptions:

| Field | Type | Required | Description |
|---|---|---|---|
| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| `text` | string | ✅ Yes | Obfuscated user input or notification text (already processed by DLP in the integration layer) |
| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
| `notification` | object | ❌ No | Present only when processing a notification-related query |
| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |
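
The field rules above can be sketched as a small payload builder. The function name `build_query_payload` and its argument names are hypothetical, and deriving `type` from the presence of notification context is an assumption for illustration, not something this spec mandates.

```python
from typing import Any, Dict, Optional


def build_query_payload(
    phone_number: str,
    text: str,
    notification_text: Optional[str] = None,
    notification_parameters: Optional[Dict[str, Any]] = None,
    language_code: str = "es",
) -> Dict[str, Any]:
    """Assemble a /query request body (hypothetical helper)."""
    if not phone_number or not text:
        raise ValueError("phone_number and text are required")

    payload: Dict[str, Any] = {
        "phone_number": phone_number,
        "text": text,  # expected to be obfuscated by DLP before this point
        "type": "conversation",
        "language_code": language_code,
    }
    if notification_text is not None:
        # Assumption: notification context implies type "notification".
        payload["type"] = "notification"
        payload["notification"] = {
            "text": notification_text,
            "parameters": notification_parameters or {},
        }
    return payload
```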

Response

Status Code: 200 OK

Body:

{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}

Field Descriptions:

| Field | Type | Description |
|---|---|---|
| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
| `response_text` | string | The generated response text to send back to the user |
| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| `confidence` | number | Optional confidence score (0.0 - 1.0) |
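
A client might parse the response body along these lines. This is a sketch only: the `RagResponse` dataclass and `parse_rag_response` function are hypothetical names, and treating `response_id` and `response_text` as always present is an assumption, since the table marks only `parameters` and `confidence` as optional.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RagResponse:
    response_id: str
    response_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    confidence: Optional[float] = None


def parse_rag_response(body: Dict[str, Any]) -> RagResponse:
    """Convert a decoded 200 OK body into a RagResponse (hypothetical helper)."""
    confidence = body.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return RagResponse(
        response_id=body["response_id"],      # assumed required
        response_text=body["response_text"],  # assumed required
        parameters=body.get("parameters") or {},
        confidence=confidence,
    )
```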

Error Responses

400 Bad Request

Invalid request format or missing required fields.

{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}

500 Internal Server Error

RAG server encountered an error processing the request (also triggers retry in client).

{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}

503 Service Unavailable

RAG server is temporarily unavailable (triggers retry in client).

{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
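
All three error bodies share the same `error`, `message`, and `status` fields, so a client could map them onto a single exception type. The sketch below assumes that shape; `RagApiError` and `error_from_body` are hypothetical names, and falling back to the HTTP status when the body omits `status` is an assumption.

```python
from typing import Any, Dict


class RagApiError(Exception):
    """Error returned by the RAG server (hypothetical exception type)."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def error_from_body(http_status: int, body: Dict[str, Any]) -> RagApiError:
    """Build a RagApiError from a decoded error body."""
    return RagApiError(
        status=int(body.get("status", http_status)),
        error=str(body.get("error", "Unknown Error")),
        message=str(body.get("message", "")),
    )
```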

Example Requests

Example 1: Regular Conversation

POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}

Response:

{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}

Example 2: Notification Flow

POST /api/v1/query
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}

Response:

{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}

Design Decisions

1. RAG Handles Conversation History Internally

- The RAG server maintains its own conversation history indexed by phone_number
- The integration layer will continue to store conversation history (redundant for now)
- This redundancy allows a gradual migration with minimal risk

2. No Session ID Required

- Unlike Dialogflow (complex session paths), RAG uses phone_number as the session identifier
- This is simpler and aligns with RAG's internal tracking

3. Notifications Are Contextual

- When a notification is active, the integration layer passes both:
  - The user's query (text)
  - The notification context (notification.text and notification.parameters)
- RAG uses this context to generate relevant responses

4. Minimal Parameter Passing

- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)

5. Obfuscation Stays in Integration Layer

- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary

Non-Functional Requirements

Performance

- Target Response Time: < 2 seconds (p95)
- Timeout: 30 seconds (configurable in client)

Reliability

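- Availability: 99.5%+
- Retry Strategy: Client retries on 500, 503, and 504 responses with exponential backoff (504 Gateway Timeout typically comes from a proxy or gateway in front of the RAG server)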
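
The retry strategy could look roughly like the sketch below. The spec fixes only the retryable status codes and the use of exponential backoff; the `call_with_retry` name, the `send` callable returning `(status, body)`, and the defaults of 3 attempts and a 0.5-second base delay are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple

RETRYABLE_STATUSES = frozenset({500, 503, 504})

Response = Tuple[int, Dict[str, Any]]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,      # assumed default, not set by this spec
    base_delay: float = 0.5,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. A production client would probably also add jitter and cap the total wait below the request timeout.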

Scalability

- Concurrent Requests: Support 100+ concurrent requests
- Rate Limiting: None currently defined (to be confirmed with the RAG team)

Migration Notes

What the Integration Layer Will Do:

✅ Continue to obfuscate text via DLP before calling RAG
✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
✅ Continue to manage session timeouts (30 minutes)
✅ Continue to handle notification storage and retrieval
✅ Map DetectIntentRequestDTO → RAG request format
✅ Map RAG response → DetectIntentResponseDTO

What the RAG Server Will Do:

✅ Maintain its own conversation history by phone_number
✅ Use notification context when provided to generate relevant responses
✅ Generate responses using RAG (retrieval + generation)
✅ Return structured responses with optional parameters

What We're NOT Changing:

❌ External API contracts (controllers remain unchanged)
❌ DTO structures (DetectIntentRequestDTO, DetectIntentResponseDTO)
❌ Conversation storage logic (Memorystore + Firestore)
❌ DLP obfuscation flow
❌ Session management (30-minute timeout)
❌ Notification storage

Questions for RAG Team

Before implementation:

1. Endpoint URL: What is the actual RAG server URL?
2. Authentication: Do we need API key authentication? If yes, what's the header format?
3. Timeout: What's a reasonable timeout? (We're using 30s as default)
4. Rate Limiting: Any rate limits we should be aware of?
5. Conversation History: Does RAG need explicit conversation history, or does it fetch by phone_number internally?
6. Response Parameters: Will RAG return any extracted parameters, or just response_text?
7. Health Check: Is there a /health endpoint for monitoring?
8. Versioning: Should we use /api/v1/query or a different version?

Changelog

| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |
