Add conversation management

2026-02-28 22:10:55 +00:00
parent 4439567ccd
commit ae4c7ab489
8 changed files with 311 additions and 41 deletions


@@ -54,6 +54,7 @@ Go LLM Gateway (unified API)
- **Streaming support** (Server-Sent Events for all providers)
- **OAuth2/OIDC authentication** (Google, Auth0, any OIDC provider)
- **Terminal chat client** (Python with Rich UI, PEP 723)
- **Conversation tracking** (`previous_response_id` for efficient context)
## Quick Start
@@ -186,8 +187,21 @@ You> /model claude
You> /models # List all available models
```
The chat client automatically uses `previous_response_id` to reduce token usage by only sending new messages instead of the full conversation history.
See **[CHAT_CLIENT.md](./CHAT_CLIENT.md)** for full documentation.
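The saving is easy to see in the request payloads themselves. A minimal sketch (field names follow the Open Responses spec; the model name, message contents, and response ID are illustrative):

```python
# First turn: the client sends the full input.
first_request = {
    "model": "claude",
    "input": [{"role": "user", "content": "Explain SSE in one sentence."}],
}

# The gateway's reply carries a response ID, e.g.:
first_response_id = "resp_abc123"  # illustrative value

# Follow-up turn: only the NEW message is sent, plus the previous
# response ID; the gateway restores the earlier context server-side.
followup_request = {
    "model": "claude",
    "previous_response_id": first_response_id,
    "input": [{"role": "user", "content": "Now compare it to WebSockets."}],
}

# The follow-up carries one message regardless of conversation length.
print(len(followup_request["input"]))
```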
## Conversation Management
The gateway implements conversation tracking using `previous_response_id` from the Open Responses spec:
- 📉 **Reduced token usage** - Only send new messages
- ⚡ **Smaller requests** - Less bandwidth
- 🧠 **Server-side context** - Gateway maintains history
- ⏰ **Auto-expire** - Conversations expire after 1 hour
See **[CONVERSATIONS.md](./CONVERSATIONS.md)** for details.
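Server-side, the tracking described above amounts to a TTL-bounded map from response ID to message history. A minimal in-memory sketch with the 1-hour expiry (the class and method names are hypothetical; the gateway itself is Go, this just shows the shape of the mechanism):

```python
import time
import uuid

EXPIRY_SECONDS = 3600  # conversations auto-expire after 1 hour


class ConversationStore:
    """In-memory map: response_id -> (created_at, message history)."""

    def __init__(self):
        self._conversations = {}

    def save(self, messages):
        """Store a conversation snapshot and return its response ID."""
        response_id = f"resp_{uuid.uuid4().hex[:12]}"
        self._conversations[response_id] = (time.time(), list(messages))
        return response_id

    def resume(self, previous_response_id, new_messages):
        """Return stored context (if unexpired) plus the new messages."""
        entry = self._conversations.get(previous_response_id)
        if entry is None:
            return list(new_messages)  # unknown or already evicted ID
        created_at, history = entry
        if time.time() - created_at > EXPIRY_SECONDS:
            del self._conversations[previous_response_id]  # lazy eviction
            return list(new_messages)
        return history + list(new_messages)
```

On each request the gateway would call `resume`, forward the full history to the provider, then `save` the updated history and hand the new response ID back to the client.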
## Authentication
The gateway supports OAuth2/OIDC authentication. See **[AUTH.md](./AUTH.md)** for setup instructions.
@@ -216,6 +230,8 @@ curl -X POST http://localhost:8080/v1/responses \
- ~~Implement streaming responses~~
- ~~Add OAuth2/OIDC authentication~~
- ~~Implement conversation tracking with `previous_response_id`~~
- ⬜ Add structured logging, tracing, and request-level metrics
- ⬜ Support tool/function calling
- ⬜ Persistent conversation storage (Redis/database)
- ⬜ Expand configuration to support routing policies (cost, latency, failover)