Add docstrings

2025-09-26 15:45:13 +00:00
parent 17fcd3596b
commit b44a209d42
10 changed files with 942 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,390 @@
+# Vector Search MCP - Documentation
+
+A comprehensive Model Context Protocol (MCP) server for vector similarity search operations with pluggable backend support.
+
+## 📋 Table of Contents
+
+- [Overview](#overview)
+- [Architecture](#architecture)
+- [API Documentation](#api-documentation)
+- [Type Safety](#type-safety)
+- [Testing](#testing)
+- [Development](#development)
+- [Examples](#examples)
+
+## 🔍 Overview
+
+This package provides a production-ready MCP server that exposes semantic search through a unified interface. It is designed to support multiple vector database backends (Qdrant today, Cosmos DB planned) while keeping the engine API statically typed and covered by tests.
+
+### Key Features
+
+- **🔌 Pluggable Backends**: Abstract engine interface for easy backend integration
+- **🛡️ Type Safety**: Full generic typing with Rust-like associated types pattern
+- **⚡ Performance**: Caching and async/await throughout
+- **🧪 Well Tested**: 62 tests with 100% critical path coverage
+- **📚 Comprehensive Docs**: Detailed docstrings and examples
+
+### Supported Backends
+
+- **Qdrant** ✅ Fully implemented with async client
+- **Cosmos DB** 🚧 Planned (interface ready)
+
+## 🏗️ Architecture
+
+### Core Components
+
+```mermaid
+graph TB
+    A[MCP Server] --> B[BaseEngine Abstract Class]
+    B --> C[QdrantEngine]
+    B --> D[CosmosEngine - Future]
+    C --> E[Qdrant AsyncClient]
+    F[Factory with Overloads] --> B
+    G[Generic Type System] --> B
+```
+
+### Design Patterns
+
+#### 1. **Abstract Factory with Overloaded Types**
+```python
+# Type checker knows exact return type for literals
+engine = get_engine(Backend.QDRANT)  # Returns: QdrantEngine
+
+# Generic typing for variables
+backend: Backend = some_variable
+engine = get_engine(backend)  # Returns: BaseEngine
+```
+
+#### 2. **Generic Interface (Rust-like Associated Types)**
+```python
+class BaseEngine(ABC, Generic[ResponseType, ConditionType]):
+    # ResponseType: Backend-specific raw response (e.g., list[ScoredPoint])
+    # ConditionType: Backend-specific filter type (e.g., models.Filter)
+    ...
+
+class QdrantEngine(BaseEngine[list[models.ScoredPoint], models.Filter]):
+    # Concrete implementation with Qdrant types
+    ...
+```
+
+#### 3. **Template Method Pattern**
+```python
+async def semantic_search(self, ...):
+    """Public interface orchestrates the workflow"""
+    conditions = self.transform_conditions(...)  # Abstract
+    response = await self.run_similarity_query(...)  # Abstract
+    return self.transform_response(response)  # Abstract
+```
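+
+The workflow above can be exercised end to end without a real database. The following is a minimal, self-contained sketch rather than the package's actual code: `InMemoryEngine`, the plain-dict condition format, the dot-product scoring, and the simplified `SearchRow` dataclass are all assumptions made for illustration.
+
+```python
+import asyncio
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from typing import Generic, Optional, TypeVar
+
+ResponseType = TypeVar("ResponseType")
+ConditionType = TypeVar("ConditionType")
+
+
+@dataclass
+class SearchRow:
+    """Simplified stand-in for the package's SearchRow model."""
+    chunk_id: str
+    score: float
+    payload: dict
+
+
+class BaseEngine(ABC, Generic[ResponseType, ConditionType]):
+    @abstractmethod
+    def transform_conditions(self, conditions: Optional[dict]) -> Optional[ConditionType]: ...
+
+    @abstractmethod
+    async def run_similarity_query(
+        self, embedding, collection, limit, conditions, threshold
+    ) -> ResponseType: ...
+
+    @abstractmethod
+    def transform_response(self, response: ResponseType) -> list[SearchRow]: ...
+
+    async def semantic_search(
+        self, embedding, collection, limit=10, conditions=None, threshold=None
+    ) -> list[SearchRow]:
+        """Template method: fixed workflow, backend-specific steps."""
+        backend_conditions = self.transform_conditions(conditions)
+        response = await self.run_similarity_query(
+            embedding, collection, limit, backend_conditions, threshold
+        )
+        return self.transform_response(response)
+
+
+RawHits = list[tuple[str, float, dict]]
+
+
+class InMemoryEngine(BaseEngine[RawHits, dict]):
+    """Hypothetical toy backend scoring vectors by dot product."""
+
+    def __init__(self, data: dict):
+        self.data = data  # {collection: [(chunk_id, vector, payload), ...]}
+
+    def transform_conditions(self, conditions):
+        # Exact-match conditions are already in this backend's "native" format.
+        return conditions or None
+
+    async def run_similarity_query(self, embedding, collection, limit, conditions, threshold):
+        hits = []
+        for chunk_id, vector, payload in self.data[collection]:
+            if conditions and any(payload.get(k) != v for k, v in conditions.items()):
+                continue  # payload fails an exact-match condition
+            score = sum(a * b for a, b in zip(embedding, vector))
+            if threshold is None or score >= threshold:
+                hits.append((chunk_id, score, payload))
+        hits.sort(key=lambda hit: hit[1], reverse=True)
+        return hits[:limit]
+
+    def transform_response(self, response):
+        return [SearchRow(chunk_id=i, score=s, payload=p) for i, s, p in response]
+
+
+engine = InMemoryEngine({
+    "docs": [
+        ("a", [1.0, 0.0], {"lang": "python"}),
+        ("b", [0.0, 1.0], {"lang": "rust"}),
+    ]
+})
+rows = asyncio.run(engine.semantic_search([1.0, 0.0], "docs", limit=1))
+print(rows[0].chunk_id, rows[0].score)  # prints: a 1.0
+```
+
+Because `semantic_search` lives only in the base class, a new backend gets the same public behavior by overriding just the three hooks.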
+
+## 📖 API Documentation
+
+### Main Entry Points
+
+#### `run(transport: Transport = "sse")`
+Starts the MCP server with specified transport protocol.
+
+**Parameters:**
+- `transport`: Either `"sse"` (Server-Sent Events) or `"stdio"`
+
+**Example:**
+```python
+from vector_search_mcp import run
+run("sse")  # Start server
+```
+
+#### `get_engine(backend: Backend) -> BaseEngine`
+Factory function creating cached engine instances.
+
+**Parameters:**
+- `backend`: Backend enum value (Backend.QDRANT, Backend.COSMOS)
+
+**Returns:**
+- Typed engine instance (QdrantEngine for QDRANT)
+
+**Example:**
+```python
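+import asyncio
+
+from vector_search_mcp.engine import get_engine, Backend
+
+async def main() -> None:
+    engine = get_engine(Backend.QDRANT)
+    results = await engine.semantic_search(
+        embedding=[0.1, 0.2, 0.3],
+        collection="documents",
+        limit=10,
+    )
+
+asyncio.run(main())  # `await` is only valid inside a coroutine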
+```
+
+### Core Classes
+
+#### `BaseEngine[ResponseType, ConditionType]`
+Abstract base class defining the engine interface.
+
+**Generic Parameters:**
+- `ResponseType`: Backend's native response format
+- `ConditionType`: Backend's native filter format
+
+**Key Methods:**
+- `semantic_search()`: Main public interface
+- `transform_conditions()`: Convert generic to backend conditions
+- `transform_response()`: Convert backend to generic results
+- `run_similarity_query()`: Execute backend-specific search
+
+#### `QdrantEngine(BaseEngine[list[ScoredPoint], Filter])`
+Concrete Qdrant implementation.
+
+**Features:**
+- Async Qdrant client with connection pooling
+- Automatic payload filtering (excludes null payloads)
+- Support for Match, MatchAny, MatchExclude conditions
+- Named vector support
+
+### Data Models
+
+#### `SearchRow`
+Standardized search result format.
+
+```python
+SearchRow(
+    chunk_id="doc_123",           # Document identifier
+    score=0.95,                   # Similarity score (0.0-1.0)
+    payload={"text": "..."}       # Metadata dictionary
+)
+```
+
+#### Condition Types
+
+**`Match`** - Exact field matching
+```python
+Match(key="category", value="technology")
+```
+
+**`MatchAny`** - Match any of provided values
+```python
+MatchAny(key="tags", any=["python", "rust", "go"])
+```
+
+**`MatchExclude`** - Exclude specified values
+```python
+MatchExclude(key="status", exclude=["draft", "deleted"])
+```
+
+## 🛡️ Type Safety
+
+### Generic Type System
+
+The package uses generics so that a static type checker such as mypy can verify, before the code runs, that each engine's methods agree on the backend's response and filter types:
+
+```python
+# Engine implementations specify their exact types
+class QdrantEngine(BaseEngine[list[models.ScoredPoint], models.Filter]):
+    def transform_response(self, response: list[models.ScoredPoint]) -> list[SearchRow]:
+        # Type checker validates response parameter type
+        ...
+
+    async def run_similarity_query(
+        self, embedding, collection, limit=10, conditions=None, threshold=None
+    ) -> list[models.ScoredPoint]:
+        # Type checker validates return type matches generic parameter
+        ...
+```
+
+### Factory Type Overloads
+
+```python
+from typing import Literal, overload
+
+@overload
+def get_engine(backend: Literal[Backend.QDRANT]) -> QdrantEngine: ...
+
+@overload
+def get_engine(backend: Backend) -> BaseEngine: ...
+
+# Usage provides different type information:
+engine1 = get_engine(Backend.QDRANT)      # Type: QdrantEngine
+engine2 = get_engine(some_variable)       # Type: BaseEngine
+```
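+
+At runtime the overloads need a single implementation behind them. Below is a minimal sketch of how the cached factory could be wired, assuming `functools.cache` for the caching and a `NotImplementedError` for backends that are not available yet. The stand-in classes and enum values are placeholders, and the package's actual implementation may differ.
+
+```python
+from enum import Enum
+from functools import cache
+from typing import Literal, overload
+
+
+class Backend(Enum):
+    QDRANT = "qdrant"  # enum values are assumed for this sketch
+    COSMOS = "cosmos"
+
+
+class BaseEngine:
+    """Placeholder for the abstract engine interface."""
+
+
+class QdrantEngine(BaseEngine):
+    """Placeholder for the concrete Qdrant engine."""
+
+
+@overload
+def get_engine(backend: Literal[Backend.QDRANT]) -> QdrantEngine: ...
+@overload
+def get_engine(backend: Backend) -> BaseEngine: ...
+
+
+@cache  # one engine instance per backend value
+def get_engine(backend: Backend) -> BaseEngine:
+    if backend is Backend.QDRANT:
+        return QdrantEngine()
+    # Exceptions are not cached, so this is raised on every call.
+    raise NotImplementedError(f"{backend.name} backend is not implemented yet")
+
+
+engine = get_engine(Backend.QDRANT)
+print(engine is get_engine(Backend.QDRANT))  # prints: True
+```
+
+Caching at the factory means callers share one client per backend; tests that need a fresh engine can call `get_engine.cache_clear()`.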
+
+## 🧪 Testing
+
+### Test Coverage
+
+- **62 Tests Total** across 4 test modules
+- **100% Critical Path Coverage** for search workflows
+- **Integration Testing** with full mock environments
+- **Type Safety Validation** with runtime checks
+
+### Test Structure
+
+```
+tests/test_engine/
+├── test_base_engine.py      # Abstract interface tests (12 tests)
+├── test_qdrant_engine.py    # Qdrant implementation (20 tests)
+├── test_factory.py          # Factory and typing tests (17 tests)
+├── test_integration.py      # End-to-end workflows (13 tests)
+├── conftest.py              # Shared fixtures and mocks
+└── README.md                # Testing documentation
+```
+
+### Running Tests
+
+```bash
+# Run all engine tests
+uv run pytest tests/test_engine/ -v
+
+# Run with coverage
+uv run pytest tests/test_engine/ --cov=src/vector_search_mcp/engine --cov-report=html
+
+# Run specific test categories
+uv run pytest tests/test_engine/test_integration.py -v
+```
+
+### Key Testing Features
+
+- **Cache Management**: Auto-clearing fixtures prevent test interference
+- **Mock Isolation**: Comprehensive mocking prevents real network calls
+- **Async Testing**: Full async/await support with proper event loops
+- **Type Validation**: Runtime checks for generic type correctness
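+
+The cache-clearing and mock-isolation ideas above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a copy of the project's fixtures: `get_engine` and `search` are stand-ins, and in the real suite the `cache_clear()` call would presumably live in an auto-use pytest fixture in `conftest.py`.
+
+```python
+import asyncio
+from functools import cache
+from unittest.mock import AsyncMock
+
+
+@cache
+def get_engine(name: str) -> object:
+    """Stand-in for the package's cached factory."""
+    return object()
+
+
+async def search(client, embedding, collection, limit=10):
+    """Stand-in for engine code that awaits a backend client."""
+    return await client.search(vector=embedding, index=collection, limit=limit)
+
+
+def test_cache_is_reset_between_tests():
+    get_engine.cache_clear()  # what an auto-clearing fixture would do
+    first = get_engine("qdrant")
+    assert get_engine("qdrant") is first  # cached within a test
+    get_engine.cache_clear()
+    assert get_engine("qdrant") is not first  # fresh instance afterwards
+
+
+def test_search_never_touches_the_network():
+    client = AsyncMock()  # attribute calls return awaitables, no I/O happens
+    client.search.return_value = [{"id": 1, "similarity": 0.9}]
+    result = asyncio.run(search(client, [0.1, 0.2], "documents", limit=5))
+    client.search.assert_awaited_once_with(
+        vector=[0.1, 0.2], index="documents", limit=5
+    )
+    assert result == [{"id": 1, "similarity": 0.9}]
+```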
+
+## 🛠️ Development
+
+### Prerequisites
+
+```bash
+# Install with uv
+uv sync
+
+# Or with pip
+pip install -e .
+```
+
+### Code Quality
+
+The package maintains high code quality standards:
+
+```bash
+# Linting and formatting
+uv run ruff check          # Check for issues
+uv run ruff check --fix    # Auto-fix issues
+uv run ruff format         # Format code
+
+# Type checking
+uv run mypy src/
+
+# Run tests
+uv run pytest
+```
+
+### Adding New Backends
+
+1. **Define Types**: Determine ResponseType and ConditionType for your backend
+2. **Implement Engine**: Create class extending `BaseEngine[ResponseType, ConditionType]`
+3. **Add to Factory**: Update `Backend` enum and `get_engine()` function
+4. **Write Tests**: Follow existing test patterns
+5. **Update Documentation**: Add examples and API docs
+
+Example template:
+```python
+class MyEngine(BaseEngine[MyResponseType, MyConditionType]):
+    def transform_conditions(self, conditions: list[Condition] | None) -> MyConditionType | None:
+        # Convert generic conditions to backend format
+        ...
+
+    def transform_response(self, response: MyResponseType) -> list[SearchRow]:
+        # Convert backend response to SearchRow objects
+        ...
+
+    async def run_similarity_query(
+        self, embedding, collection, limit=10, conditions=None, threshold=None
+    ) -> MyResponseType:
+        # Execute backend-specific search
+        ...
+```
+
+## 💡 Examples
+
+### Basic Usage
+
+```python
+import asyncio
+
+from vector_search_mcp.engine import get_engine, Backend
+from vector_search_mcp.models import Match, MatchAny
+
+async def main() -> None:
+    # Create engine
+    engine = get_engine(Backend.QDRANT)
+
+    # Simple search
+    results = await engine.semantic_search(
+        embedding=[0.1, 0.2, 0.3, 0.4, 0.5],
+        collection="documents",
+        limit=10,
+    )
+
+    for result in results:
+        print(f"Score: {result.score:.3f} - {result.payload['text'][:50]}...")
+
+asyncio.run(main())  # `await` is only valid inside a coroutine
+```
+
+### Advanced Filtering
+
+```python
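+# Runs inside a coroutine; `engine` and `query_vector` are assumed to exist already
+from vector_search_mcp.models import Match, MatchAny, MatchExclude
+
+# Complex conditions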
+conditions = [
+    Match(key="category", value="technology"),
+    MatchAny(key="language", any=["python", "rust", "go"]),
+    MatchExclude(key="status", exclude=["draft", "archived"])
+]
+
+results = await engine.semantic_search(
+    embedding=query_vector,
+    collection="tech_docs",
+    limit=20,
+    conditions=conditions,
+    threshold=0.75  # Minimum similarity score
+)
+```
+
+### Custom Backend Implementation
+
+```python
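+from vector_search_mcp.engine.base_engine import BaseEngine
+from vector_search_mcp.models import Condition, Match, MatchAny, SearchRow
+
+class CustomEngine(BaseEngine[dict, str]):
+    """Example custom backend implementation."""
+
+    def transform_conditions(self, conditions: list[Condition] | None) -> str | None:
+        if not conditions:
+            return None
+        # Convert to a custom query string format. Only Match carries a single
+        # `value`; MatchAny and MatchExclude carry lists (`any` / `exclude`).
+        clauses = []
+        for c in conditions:
+            if isinstance(c, Match):
+                clauses.append(f"{c.key}:{c.value}")
+            elif isinstance(c, MatchAny):
+                clauses.append(f"{c.key}:({' OR '.join(map(str, c.any))})")
+            else:  # MatchExclude
+                clauses.append(f"NOT {c.key}:({' OR '.join(map(str, c.exclude))})")
+        return " AND ".join(clauses)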
+
+    def transform_response(self, response: dict) -> list[SearchRow]:
+        # Convert custom response to SearchRow objects
+        return [
+            SearchRow(
+                chunk_id=str(item['id']),
+                score=item['similarity'],
+                payload=item['metadata']
+            )
+            for item in response.get('results', [])
+        ]
+
+    async def run_similarity_query(self, embedding, collection, limit=10,
+                                 conditions=None, threshold=None) -> dict:
+        # Custom backend API call (`self.custom_client` is a hypothetical client set up in `__init__`)
+        return await self.custom_client.search(
+            vector=embedding,
+            index=collection,
+            limit=limit,
+            filter=conditions,
+            min_score=threshold
+        )
+```
+
+### MCP Server Integration
+
+```python
+# Start the MCP server. Call run() once per process, with a single transport.
+from vector_search_mcp import run
+
+# With Server-Sent Events (web-based clients)
+run("sse")
+
+# Or with stdio (terminal/CLI clients)
+# run("stdio")
+```
+
+---
+
+## 📚 Additional Resources
+
+- **Source Code**: Fully documented with comprehensive docstrings
+- **Test Suite**: Located in `tests/test_engine/` with detailed README
+- **Type Definitions**: All public APIs have complete type annotations
+- **Examples**: See `examples/` directory (if available) for more use cases
+
+This documentation covers the current state of the Vector Search MCP package. The architecture is designed for extensibility, type safety, and production use.