
# RAG API Specification

## Overview
This document defines the API contract between the integration layer (`capa-de-integracion`) and the RAG server.

The RAG server replaces Dialogflow CX for intent detection and response generation using Retrieval-Augmented Generation.

## Base URL
```
https://your-rag-server.com/api/v1
```

## Authentication
- Method: API Key (optional)
- Header: `X-API-Key: <your-api-key>`
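Because the API key is optional, the client should send the header only when a key is actually configured. The sketch below shows that in Java. It assumes the integration layer uses the JDK's `java.net.http` client, which this spec does not specify. `RagAuth` and `withOptionalApiKey` are hypothetical names.

```java
import java.net.http.HttpRequest;

// Hypothetical helper: attaches X-API-Key only when a non-blank key is configured.
// How the key is obtained (config file, secret manager, ...) is an assumption left to the caller.
final class RagAuth {
    static final String API_KEY_HEADER = "X-API-Key"; // header name from this spec

    private RagAuth() {}

    static HttpRequest.Builder withOptionalApiKey(HttpRequest.Builder builder, String apiKey) {
        if (apiKey != null && !apiKey.isBlank()) {
            builder.header(API_KEY_HEADER, apiKey);
        }
        return builder; // unchanged when authentication is disabled
    }
}
```

With this approach, the same call site works whether or not the RAG team enables authentication (see question 2 below).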

---

## Endpoint: Query

### **POST /query**

Process a user message or notification and return a generated response.

### Request

**Headers:**
- `Content-Type: application/json`
- `X-API-Key: <api-key>` (optional)

**Body:**
```json
{
  "phone_number": "string (required)",
  "text": "string (required - obfuscated user input or notification text)",
  "type": "string (optional: 'conversation' or 'notification')",
  "notification": {
    "text": "string (optional - original notification text)",
    "parameters": {
      "key": "value"
    }
  },
  "language_code": "string (optional, default: 'es')"
}
```

**Field Descriptions:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `phone_number` | string | ✅ Yes | User's phone number (used by RAG for internal conversation history tracking) |
| `text` | string | ✅ Yes | Obfuscated user input (already processed by DLP in integration layer) |
| `type` | string | ❌ No | Request type: `"conversation"` (default) or `"notification"` |
| `notification` | object | ❌ No | Present only when processing a notification-related query |
| `notification.text` | string | ❌ No | Original notification text (obfuscated) |
| `notification.parameters` | object | ❌ No | Key-value pairs of notification metadata |
| `language_code` | string | ❌ No | Language code (e.g., `"es"`, `"en"`). Defaults to `"es"` |
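The field table translates directly into a typed request object. The sketch below is a Java record, assuming a Java integration layer. It enforces the two required fields and applies the documented defaults (`type` = `"conversation"`, `language_code` = `"es"`). `RagQueryRequest` is a hypothetical name. Treating a blank string as missing and modelling parameter values as strings are also assumptions.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical typed view of the POST /query body (JSON snake_case mapped to camelCase).
record RagQueryRequest(String phoneNumber, String text, String type,
                       Notification notification, String languageCode) {

    // Optional notification context; parameter values assumed to be strings.
    record Notification(String text, Map<String, String> parameters) {}

    static final Set<String> ALLOWED_TYPES = Set.of("conversation", "notification");

    RagQueryRequest {
        requireNonBlank(phoneNumber, "phone_number");
        requireNonBlank(text, "text");
        // Apply the defaults documented for omitted optional fields.
        type = (type == null) ? "conversation" : type;
        languageCode = (languageCode == null) ? "es" : languageCode;
        if (!ALLOWED_TYPES.contains(type)) {
            throw new IllegalArgumentException("Invalid type: " + type);
        }
    }

    private static void requireNonBlank(String value, String field) {
        if (value == null || value.isBlank()) {
            // Same wording as the 400 Bad Request example in this spec.
            throw new IllegalArgumentException("Missing required field: " + field);
        }
    }
}
```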

### Response

**Status Code:** `200 OK`

**Body:**
```json
{
  "response_id": "string (unique identifier for this response)",
  "response_text": "string (generated response)",
  "parameters": {
    "key": "value"
  },
  "confidence": 0.95
}
```

**Field Descriptions:**

| Field | Type | Description |
|-------|------|-------------|
| `response_id` | string | Unique identifier for this RAG response (for tracking/logging) |
| `response_text` | string | The generated response text to send back to the user |
| `parameters` | object | Optional key-value pairs extracted or computed by RAG (can be empty) |
| `confidence` | number | Optional confidence score (0.0 - 1.0) |
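On the client side, the response can be read into a record that normalizes the optional fields. The sketch below assumes a Java integration layer. It treats `response_id` and `response_text` as required, since only `parameters` and `confidence` are marked optional. It also rejects a `confidence` outside 0.0 to 1.0. `RagQueryResponse` is a hypothetical name, and string-valued parameters are an assumption.

```java
import java.util.Map;

// Hypothetical typed view of a 200 OK body from POST /query.
record RagQueryResponse(String responseId, String responseText,
                        Map<String, String> parameters, Double confidence) {
    RagQueryResponse {
        if (responseId == null || responseId.isBlank()) {
            throw new IllegalArgumentException("response_id is required");
        }
        if (responseText == null) {
            throw new IllegalArgumentException("response_text is required");
        }
        // `parameters` may be omitted or empty: normalize to an immutable (possibly empty) map.
        parameters = (parameters == null) ? Map.of() : Map.copyOf(parameters);
        // `confidence` is optional (null allowed) but must be a probability when present.
        if (confidence != null && (confidence < 0.0 || confidence > 1.0)) {
            throw new IllegalArgumentException("confidence must be within [0.0, 1.0]");
        }
    }
}
```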

---

## Error Responses

### **400 Bad Request**
Invalid request format or missing required fields.

```json
{
  "error": "Bad Request",
  "message": "Missing required field: phone_number",
  "status": 400
}
```

### **500 Internal Server Error**
RAG server encountered an error processing the request.

```json
{
  "error": "Internal Server Error",
  "message": "Failed to generate response",
  "status": 500
}
```

### **503 Service Unavailable**
RAG server is temporarily unavailable (triggers retry in client).

```json
{
  "error": "Service Unavailable",
  "message": "RAG service is currently unavailable",
  "status": 503
}
```

---

## Example Requests

### Example 1: Regular Conversation
**Request:** `POST /api/v1/query`
```json
{
  "phone_number": "573001234567",
  "text": "¿Cuál es el estado de mi solicitud?",
  "type": "conversation",
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67890",
  "response_text": "Tu solicitud está en proceso de revisión. Te notificaremos cuando esté lista.",
  "parameters": {},
  "confidence": 0.92
}
```

### Example 2: Notification Flow
**Request:** `POST /api/v1/query`
```json
{
  "phone_number": "573001234567",
  "text": "necesito más información",
  "type": "notification",
  "notification": {
    "text": "Tu documento ha sido aprobado. Descárgalo desde el portal.",
    "parameters": {
      "document_id": "DOC-2025-001",
      "status": "approved"
    }
  },
  "language_code": "es"
}
```

**Response:**
```json
{
  "response_id": "rag-resp-12345-67891",
  "response_text": "Puedes descargar tu documento aprobado ingresando al portal con tu número de documento DOC-2025-001.",
  "parameters": {
    "document_id": "DOC-2025-001"
  },
  "confidence": 0.88
}
```

---

## Design Decisions

### 1. **RAG Handles Conversation History Internally**
- The RAG server maintains its own conversation history indexed by `phone_number`
- The integration layer will continue to store conversation history (redundant for now)
- This redundancy allows a gradual, lower-risk migration

### 2. **No Session ID Required**
- Unlike Dialogflow CX, which addresses each conversation through a full session resource path, RAG uses `phone_number` as the session identifier
- This is simpler and matches how RAG tracks conversation history internally

### 3. **Notifications Are Contextual**
- When a notification is active, the integration layer passes both:
  - The user's query (`text`)
  - The notification context (`notification.text` and `notification.parameters`)
- RAG uses this context to generate relevant responses
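The rule above can be sketched as a small builder. When a notification is active, it sets `type` to `"notification"` and attaches its text and parameters. Otherwise, it sends a plain `"conversation"` request. This assumes a Java integration layer. `RagPayloads`, `ActiveNotification`, and `buildQueryBody` are hypothetical names, and the map stands in for whatever JSON serializer the service actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class RagPayloads {
    private RagPayloads() {}

    // Hypothetical holder for a notification that is still active for this user.
    record ActiveNotification(String text, Map<String, String> parameters) {}

    /** Builds a JSON-shaped body; notification context is included only when one is active. */
    static Map<String, Object> buildQueryBody(String phoneNumber, String obfuscatedText,
                                              Optional<ActiveNotification> active,
                                              String languageCode) {
        Map<String, Object> body = new LinkedHashMap<>(); // keeps field order readable in logs
        body.put("phone_number", phoneNumber);
        body.put("text", obfuscatedText); // already obfuscated by DLP upstream
        if (active.isPresent()) {
            ActiveNotification n = active.get();
            body.put("type", "notification");
            Map<String, Object> context = new LinkedHashMap<>();
            context.put("text", n.text());
            context.put("parameters", n.parameters());
            body.put("notification", context);
        } else {
            body.put("type", "conversation");
        }
        body.put("language_code", languageCode);
        return body;
    }
}
```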

### 4. **Minimal Parameter Passing**
- Only essential data is sent to RAG
- The integration layer can store additional metadata internally without sending it to RAG
- RAG can return parameters if needed (e.g., extracted entities)

### 5. **Obfuscation Stays in Integration Layer**
- DLP obfuscation happens before calling RAG
- RAG receives already-obfuscated text
- This maintains the existing security boundary

---

## Non-Functional Requirements

### Performance
- **Target Response Time:** < 2 seconds (p95)
- **Timeout:** 30 seconds (configurable in client)
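A per-request timeout keeps a slow RAG call from blocking the integration layer beyond the configured limit. The sketch below builds the `POST /query` request with the 30-second default. It assumes a Java integration layer using `java.net.http`. `RagHttp` and `queryRequest` are hypothetical names, and the base URL is the placeholder from this spec.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class RagHttp {
    // Default client-side timeout from the Performance requirements; meant to be configurable.
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(30);

    private RagHttp() {}

    /** Builds POST {baseUrl}/query with a JSON body and a per-request timeout. */
    static HttpRequest queryRequest(String baseUrl, String jsonBody, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/query"))
                .header("Content-Type", "application/json")
                .timeout(timeout)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The request would then be sent with `HttpClient.send(request, HttpResponse.BodyHandlers.ofString())`. If no response arrives within the timeout, the JDK client throws `java.net.http.HttpTimeoutException`.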

### Reliability
- **Availability:** 99.5%+
- **Retry Strategy:** Client will retry on 500, 503, 504 errors (exponential backoff)
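The retry rule can be sketched as a status check plus a doubling delay. The sketch below assumes a Java integration layer. `RagRetryPolicy` and its methods are hypothetical names. The base delay, the cap, and the attempt limit are illustrative assumptions, since the spec gives none. Jitter is omitted for clarity. The `sleeper` callback is a stand-in for a real wait, such as `Thread.sleep`.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.IntSupplier;

final class RagRetryPolicy {
    // Status codes this spec lists as retryable.
    static final Set<Integer> RETRYABLE_STATUSES = Set.of(500, 503, 504);

    private RagRetryPolicy() {}

    static boolean isRetryable(int status) {
        return RETRYABLE_STATUSES.contains(status);
    }

    /** Delay before the n-th retry (1-based): base * 2^(n-1), capped at max. */
    static Duration backoff(int retry, Duration base, Duration max) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        long factor = 1L << Math.min(retry - 1, 30); // clamp the shift to avoid overflow
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /**
     * Invokes `call` (which returns an HTTP status) until it yields a non-retryable status
     * or maxAttempts calls have been made; `sleeper` performs the wait between attempts.
     */
    static int callWithRetry(IntSupplier call, int maxAttempts, Duration base, Duration max,
                             Consumer<Duration> sleeper) {
        int status = call.getAsInt();
        for (int attempt = 2; attempt <= maxAttempts && isRetryable(status); attempt++) {
            sleeper.accept(backoff(attempt - 1, base, max));
            status = call.getAsInt();
        }
        return status;
    }
}
```

With a 100 ms base, two consecutive 503 responses followed by a 200 would produce waits of 100 ms and then 200 ms before the successful call.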

### Scalability
- **Concurrent Requests:** Support 100+ concurrent requests
- **Rate Limiting:** None defined yet; to be confirmed with the RAG team (see Questions for RAG Team)

---

## Migration Notes

### What the Integration Layer Will Do:
- ✅ Continue to obfuscate text via DLP before calling RAG
- ✅ Continue to store conversation history in Memorystore + Firestore (redundant but safe)
- ✅ Continue to manage session timeouts (30 minutes)
- ✅ Continue to handle notification storage and retrieval
- ✅ Map `DetectIntentRequestDTO` → RAG request format
- ✅ Map RAG response → `DetectIntentResponseDTO`

### What the RAG Server Will Do:
- ✅ Maintain its own conversation history by `phone_number`
- ✅ Use notification context when provided to generate relevant responses
- ✅ Generate responses using RAG (retrieval + generation)
- ✅ Return structured responses with optional parameters

### What We're NOT Changing:
- ❌ External API contracts (controllers remain unchanged)
- ❌ DTO structures (`DetectIntentRequestDTO`, `DetectIntentResponseDTO`)
- ❌ Conversation storage logic (Memorystore + Firestore)
- ❌ DLP obfuscation flow
- ❌ Session management (30-minute timeout)
- ❌ Notification storage

---

## Questions for RAG Team

Before implementation:

1. **Endpoint URL:** What is the actual RAG server URL?
2. **Authentication:** Do we need API key authentication? If yes, what's the header format?
3. **Timeout:** What's a reasonable timeout? (We're using 30s as default)
4. **Rate Limiting:** Any rate limits we should be aware of?
5. **Conversation History:** Does RAG need the conversation history sent explicitly, or does it fetch it internally by `phone_number`?
6. **Response Parameters:** Will RAG return any extracted parameters, or just `response_text`?
7. **Health Check:** Is there a `/health` endpoint for monitoring?
8. **Versioning:** Should we use `/api/v1/query` or a different version?

---

## Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-02-22 | Initial specification based on 3 core requirements |