The fastmcp server code is now an optional dependency that can be installed with the "mcp" extra. Core vector search functionality is available without the MCP server dependency.
Vector Search MCP - Documentation
A comprehensive Model Context Protocol (MCP) server for vector similarity search operations with pluggable backend support.
🔍 Overview
This package provides a production-ready MCP server that exposes semantic search through a unified interface. It is designed to support multiple vector database backends (Qdrant is implemented today; Cosmos DB is planned) while maintaining type safety and comprehensive test coverage.
Key Features
- 🔌 Pluggable Backends: Abstract engine interface for easy backend integration
- 🛡️ Type Safety: Full generic typing with Rust-like associated types pattern
- ⚡ Performance: Cached engine instances and async/await throughout
- 🧪 Well Tested: 62+ tests with 100% critical path coverage
- 📚 Comprehensive Docs: Detailed docstrings and examples
Supported Backends
- Qdrant ✅ Fully implemented with async client
- Cosmos DB 🚧 Planned (interface ready)
🏗️ Architecture
Core Components
graph TB
A[MCP Server] --> B[BaseEngine Abstract Class]
B --> C[QdrantEngine]
B --> D[CosmosEngine - Future]
C --> E[Qdrant AsyncClient]
F[Factory with Overloads] --> B
G[Generic Type System] --> B
Design Patterns
1. Abstract Factory with Overloaded Types
# Type checker knows exact return type for literals
engine = get_engine(Backend.QDRANT) # Returns: QdrantEngine
# Generic typing for variables
backend: Backend = some_variable
engine = get_engine(backend) # Returns: BaseEngine
2. Generic Interface (Rust-like Associated Types)
class BaseEngine(ABC, Generic[ResponseType, ConditionType]):
    # ResponseType: Backend-specific raw response (e.g., list[ScoredPoint])
    # ConditionType: Backend-specific filter type (e.g., models.Filter)
    ...

class QdrantEngine(BaseEngine[list[models.ScoredPoint], models.Filter]):
    # Concrete implementation with Qdrant types
    ...
3. Template Method Pattern
async def semantic_search(self, ...):
    """Public interface orchestrates the workflow"""
    conditions = self.transform_conditions(...)       # Abstract
    response = await self.run_similarity_query(...)   # Abstract
    return self.transform_response(response)          # Abstract
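The workflow above can be sketched as a small, self-contained program. This is a minimal illustration, not the package's actual source: the method signatures are simplified, and `InMemoryEngine` is a hypothetical toy backend that returns canned hits instead of querying a database.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

ResponseType = TypeVar("ResponseType")
ConditionType = TypeVar("ConditionType")


@dataclass
class SearchRow:
    chunk_id: str
    score: float
    payload: dict[str, Any]


class BaseEngine(ABC, Generic[ResponseType, ConditionType]):
    """Simplified stand-in for the package's abstract engine."""

    @abstractmethod
    def transform_conditions(self, conditions: list[Any] | None) -> ConditionType | None: ...

    @abstractmethod
    def transform_response(self, response: ResponseType) -> list[SearchRow]: ...

    @abstractmethod
    async def run_similarity_query(
        self,
        embedding: list[float],
        collection: str,
        limit: int,
        conditions: ConditionType | None,
    ) -> ResponseType: ...

    async def semantic_search(
        self,
        embedding: list[float],
        collection: str,
        limit: int = 10,
        conditions: list[Any] | None = None,
    ) -> list[SearchRow]:
        # Template method: the order of steps is fixed here,
        # while each step is supplied by the concrete backend.
        backend_conditions = self.transform_conditions(conditions)
        response = await self.run_similarity_query(
            embedding, collection, limit, backend_conditions
        )
        return self.transform_response(response)


class InMemoryEngine(BaseEngine[list[tuple[str, float]], str]):
    """Hypothetical toy backend: raw response is a list of (id, score) pairs."""

    def __init__(self, hits: list[tuple[str, float]]) -> None:
        self._hits = hits

    def transform_conditions(self, conditions: list[Any] | None) -> str | None:
        return ",".join(map(str, conditions)) if conditions else None

    def transform_response(self, response: list[tuple[str, float]]) -> list[SearchRow]:
        return [SearchRow(chunk_id=cid, score=score, payload={}) for cid, score in response]

    async def run_similarity_query(self, embedding, collection, limit, conditions):
        # Ignores the embedding and conditions; a real backend would query its store here.
        return self._hits[:limit]
```

Callers only ever touch `semantic_search()`; swapping the backend changes the three overridden steps but not the calling code.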
📖 API Documentation
Main Entry Points
run(transport: Transport = "sse")
Starts the MCP server with specified transport protocol.
Parameters:
transport: Either "sse" (Server-Sent Events) or "stdio"
Example:
from vector_search_mcp import run
run("sse") # Start server
get_engine(backend: Backend) -> BaseEngine
Factory function creating cached engine instances.
Parameters:
backend: Backend enum value (Backend.QDRANT, Backend.COSMOS)
Returns:
- Typed engine instance (QdrantEngine for QDRANT)
Example:
from vector_search_mcp.engine import get_engine, Backend
engine = get_engine(Backend.QDRANT)
results = await engine.semantic_search(
    embedding=[0.1, 0.2, 0.3],
    collection="documents",
    limit=10
)
Core Classes
BaseEngine[ResponseType, ConditionType]
Abstract base class defining the engine interface.
Generic Parameters:
- ResponseType: Backend's native response format
- ConditionType: Backend's native filter format
Key Methods:
- semantic_search(): Main public interface
- transform_conditions(): Convert generic to backend conditions
- transform_response(): Convert backend to generic results
- run_similarity_query(): Execute backend-specific search
QdrantEngine(BaseEngine[list[ScoredPoint], Filter])
Concrete Qdrant implementation.
Features:
- Async Qdrant client with connection pooling
- Automatic payload filtering (excludes null payloads)
- Support for Match, MatchAny, MatchExclude conditions
- Named vector support
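As a rough illustration of the payload filtering mentioned above, the sketch below converts raw hits into result rows and drops any hit whose payload is null. `RawHit` is a hypothetical stand-in for a backend result type such as Qdrant's `ScoredPoint`, and `to_search_rows` is an illustrative name; neither is taken from the package.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class RawHit:
    """Hypothetical stand-in for a backend result (id, score, optional payload)."""
    id: Union[int, str]
    score: float
    payload: Optional[dict[str, Any]]


@dataclass
class SearchRow:
    chunk_id: str
    score: float
    payload: dict[str, Any]


def to_search_rows(hits: list[RawHit]) -> list[SearchRow]:
    """Convert raw hits to SearchRow objects, skipping hits without a payload."""
    return [
        SearchRow(chunk_id=str(hit.id), score=hit.score, payload=hit.payload)
        for hit in hits
        if hit.payload is not None
    ]
```

Filtering at this stage means downstream code can rely on `payload` always being a dictionary.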
Data Models
SearchRow
Standardized search result format.
SearchRow(
    chunk_id="doc_123",            # Document identifier
    score=0.95,                    # Similarity score (0.0-1.0)
    payload={"text": "...", ...}   # Metadata dictionary
)
Condition Types
Match - Exact field matching
Match(key="category", value="technology")
MatchAny - Match any of provided values
MatchAny(key="tags", any=["python", "rust", "go"])
MatchExclude - Exclude specified values
MatchExclude(key="status", exclude=["draft", "deleted"])
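To make the semantics of the three condition types concrete, here is a minimal sketch that evaluates them against a plain payload dictionary. The dataclasses mirror the field names shown above but are assumptions about the models' shape, `matches` and `matches_all` are hypothetical helpers, and only scalar payload values are handled. In the real package, conditions are translated into backend filters rather than evaluated in Python.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Match:
    key: str
    value: Any


@dataclass(frozen=True)
class MatchAny:
    key: str
    any: list[Any]


@dataclass(frozen=True)
class MatchExclude:
    key: str
    exclude: list[Any]


Condition = Union[Match, MatchAny, MatchExclude]


def matches(payload: dict[str, Any], condition: Condition) -> bool:
    """Return True if a scalar payload field satisfies one condition."""
    value = payload.get(condition.key)
    if isinstance(condition, Match):
        return value == condition.value
    if isinstance(condition, MatchAny):
        return value in condition.any
    if isinstance(condition, MatchExclude):
        return value not in condition.exclude
    raise TypeError(f"Unsupported condition: {condition!r}")


def matches_all(payload: dict[str, Any], conditions: list[Condition]) -> bool:
    """Conditions are combined with AND in this sketch."""
    return all(matches(payload, c) for c in conditions)
```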
🛡️ Type Safety
Generic Type System
The package uses a generic type system that gives static type checkers (such as mypy) precise type information while keeping the engine interface flexible:
# Engine implementations specify their exact types
class QdrantEngine(BaseEngine[list[models.ScoredPoint], models.Filter]):
    def transform_response(self, response: list[models.ScoredPoint]) -> list[SearchRow]:
        # Type checker validates response parameter type
        ...

    async def run_similarity_query(...) -> list[models.ScoredPoint]:
        # Type checker validates return type matches generic parameter
        ...
Factory Type Overloads
@overload
def get_engine(backend: Literal[Backend.QDRANT]) -> QdrantEngine: ...
@overload
def get_engine(backend: Backend) -> BaseEngine: ...
# Usage provides different type information:
engine1 = get_engine(Backend.QDRANT) # Type: QdrantEngine
engine2 = get_engine(some_variable) # Type: BaseEngine
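A runnable sketch of such a factory is shown below. The overload signatures follow the ones above; the dictionary cache, the stub engine classes, and the `NotImplementedError` for unimplemented backends are assumptions made for illustration, since the package's actual caching mechanism is not shown here.

```python
from enum import Enum
from typing import Literal, overload


class Backend(Enum):
    QDRANT = "qdrant"
    COSMOS = "cosmos"


class BaseEngine:
    """Stub standing in for the abstract engine."""


class QdrantEngine(BaseEngine):
    """Stub standing in for the Qdrant implementation."""


# Assumed cache: one engine instance per backend.
_ENGINES: dict[Backend, BaseEngine] = {}


@overload
def get_engine(backend: Literal[Backend.QDRANT]) -> QdrantEngine: ...
@overload
def get_engine(backend: Backend) -> BaseEngine: ...


def get_engine(backend: Backend) -> BaseEngine:
    """Return the cached engine for a backend, creating it on first use."""
    if backend not in _ENGINES:
        if backend is Backend.QDRANT:
            _ENGINES[backend] = QdrantEngine()
        else:
            # Assumption: planned backends fail loudly until implemented.
            raise NotImplementedError(f"{backend.name} backend is not implemented")
    return _ENGINES[backend]
```

The overloads affect only what the type checker infers; at runtime a single implementation handles every call.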
🧪 Testing
Test Coverage
- 62 Tests Total across 4 test modules
- 100% Critical Path Coverage for search workflows
- Integration Testing with full mock environments
- Type Safety Validation with runtime checks
Test Structure
tests/test_engine/
├── test_base_engine.py # Abstract interface tests (12 tests)
├── test_qdrant_engine.py # Qdrant implementation (20 tests)
├── test_factory.py # Factory and typing tests (17 tests)
├── test_integration.py # End-to-end workflows (13 tests)
├── conftest.py # Shared fixtures and mocks
└── README.md # Testing documentation
Running Tests
# Run all engine tests
uv run pytest tests/test_engine/ -v
# Run with coverage
uv run pytest tests/test_engine/ --cov=src/vector_search_mcp/engine --cov-report=html
# Run specific test categories
uv run pytest tests/test_engine/test_integration.py -v
Key Testing Features
- Cache Management: Auto-clearing fixtures prevent test interference
- Mock Isolation: Comprehensive mocking prevents real network calls
- Async Testing: Full async/await support with proper event loops
- Type Validation: Runtime checks for generic type correctness
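The cache-clearing and mock-isolation ideas above can be sketched with only the standard library. In pytest this would typically be an autouse fixture; plain functions are used here so the example runs without extra dependencies. `Engine`, `get_engine`, `run_isolated`, and `check_search_uses_mock` are hypothetical names for this sketch, not the package's test helpers.

```python
import asyncio
from functools import cache
from unittest.mock import AsyncMock


class Engine:
    """Hypothetical engine that delegates to an injected async client."""

    def __init__(self, client) -> None:
        self.client = client

    async def semantic_search(self, embedding, collection, limit=10):
        return await self.client.search(collection, embedding, limit=limit)


@cache
def get_engine() -> Engine:
    # A real factory would build a network client; the mock keeps tests offline.
    return Engine(client=AsyncMock())


def run_isolated(test_coro_fn):
    """Clear the engine cache before and after a test, like an autouse fixture."""
    get_engine.cache_clear()
    try:
        return asyncio.run(test_coro_fn())
    finally:
        get_engine.cache_clear()


async def check_search_uses_mock():
    engine = get_engine()
    engine.client.search.return_value = [{"id": "doc_1", "score": 0.9}]
    results = await engine.semantic_search([0.1, 0.2], "documents", limit=1)
    # Fails if a previous test's engine (and its call history) leaked through the cache.
    engine.client.search.assert_awaited_once_with("documents", [0.1, 0.2], limit=1)
    return results
```

Because the cache is cleared around each run, every test gets a fresh engine and a fresh mock, so call-count assertions stay reliable.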
🛠️ Development
Prerequisites
# Install with uv (syncs the project environment from pyproject.toml)
uv sync
# Or with pip
pip install -e .
Code Quality
The package maintains high code quality standards:
# Linting and formatting
uv run ruff check # Check for issues
uv run ruff check --fix # Auto-fix issues
uv run ruff format # Format code
# Type checking
uv run mypy src/
# Run tests
uv run pytest
Adding New Backends
- Define Types: Determine ResponseType and ConditionType for your backend
- Implement Engine: Create a class extending BaseEngine[ResponseType, ConditionType]
- Add to Factory: Update the Backend enum and the get_engine() function
- Write Tests: Follow existing test patterns
- Update Documentation: Add examples and API docs
Example template:
class MyEngine(BaseEngine[MyResponseType, MyConditionType]):
    def transform_conditions(self, conditions: list[Condition] | None) -> MyConditionType | None:
        # Convert generic conditions to backend format
        ...

    def transform_response(self, response: MyResponseType) -> list[SearchRow]:
        # Convert backend response to SearchRow objects
        ...

    async def run_similarity_query(...) -> MyResponseType:
        # Execute backend-specific search
        ...
💡 Examples
Basic Usage
from vector_search_mcp.engine import get_engine, Backend
from vector_search_mcp.models import Match, MatchAny, MatchExclude
# Create engine
engine = get_engine(Backend.QDRANT)
# Simple search
results = await engine.semantic_search(
    embedding=[0.1, 0.2, 0.3, 0.4, 0.5],
    collection="documents",
    limit=10
)
for result in results:
    print(f"Score: {result.score:.3f} - {result.payload['text'][:50]}...")
Advanced Filtering
# Complex conditions
conditions = [
    Match(key="category", value="technology"),
    MatchAny(key="language", any=["python", "rust", "go"]),
    MatchExclude(key="status", exclude=["draft", "archived"])
]
results = await engine.semantic_search(
    embedding=query_vector,
    collection="tech_docs",
    limit=20,
    conditions=conditions,
    threshold=0.75  # Minimum similarity score
)
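To show what `limit` and `threshold` mean for a result set, the sketch below runs a brute-force search over an in-memory dictionary of vectors. It assumes cosine similarity as the metric (the real metric depends on how the collection is configured in the backend), and `cosine_similarity` and `brute_force_search` are hypothetical helpers rather than part of the package.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(
    query: list[float],
    vectors: dict[str, list[float]],
    limit: int = 10,
    threshold: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Score every vector, drop those below the threshold, return the top `limit`."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in vectors.items()]
    if threshold is not None:
        scored = [(cid, score) for cid, score in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

A vector database reaches a similar result with an approximate index instead of scoring every vector, which is what makes it practical for large collections.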
Custom Backend Implementation
from vector_search_mcp.engine.base_engine import BaseEngine
from vector_search_mcp.models import SearchRow, Condition

class CustomEngine(BaseEngine[dict, str]):
    """Example custom backend implementation."""

    def transform_conditions(self, conditions: list[Condition] | None) -> str | None:
        if not conditions:
            return None
        # Convert to custom query string format.
        # Note: only Match carries a `value` field; MatchAny (`any`) and
        # MatchExclude (`exclude`) would need their own handling here.
        return " AND ".join([f"{c.key}:{c.value}" for c in conditions])

    def transform_response(self, response: dict) -> list[SearchRow]:
        # Convert custom response to SearchRow objects
        return [
            SearchRow(
                chunk_id=str(item['id']),
                score=item['similarity'],
                payload=item['metadata']
            )
            for item in response.get('results', [])
        ]

    async def run_similarity_query(self, embedding, collection, limit=10,
                                   conditions=None, threshold=None) -> dict:
        # Custom backend API call. `self.custom_client` is a placeholder for
        # your backend's async client, which this example does not define.
        return await self.custom_client.search(
            vector=embedding,
            index=collection,
            limit=limit,
            filter=conditions,
            min_score=threshold
        )
MCP Server Integration
# Start the MCP server
from vector_search_mcp import run
# With Server-Sent Events (web-based clients)
run("sse")
# Or with stdio (terminal/CLI clients)
run("stdio")
📚 Additional Resources
- Source Code: Fully documented with comprehensive docstrings
- Test Suite: Located in tests/test_engine/ with detailed README
- Type Definitions: All public APIs have complete type annotations
- Examples: See examples/ directory (if available) for more use cases
This documentation covers the current state of the Vector Search MCP package. The architecture is designed for extensibility, type safety, and production use.