# AGENTS.md - Guide for Coding Agents

This document provides essential information for coding agents working on the Searchbox MCP (Model Context Protocol) project. It serves as a quick reference to the codebase architecture, testing setup, and development workflows.

## 🏗️ Project Overview

**Searchbox** is an MCP server that provides vector search capabilities using Qdrant as the backend. It allows AI assistants to perform semantic search operations on document collections through a standardized MCP interface.

### Key Components:
- **MCP Server**: FastMCP-based server exposing semantic search tools
- **Vector Engine**: Pluggable backend system (currently supports Qdrant)
- **Client Interface**: High-level API for vector operations
- **Models**: Pydantic data structures for search operations

## 📁 Project Structure

```
qdrant-mcp/
├── src/searchbox/           # Main package (renamed from vector_search_mcp)
│   ├── __init__.py          # Package initialization
│   ├── client.py            # High-level client interface
│   ├── config.py            # Configuration management
│   ├── models.py            # Pydantic models and data structures
│   ├── engine/              # Vector database backends
│   │   ├── __init__.py      # Engine factory and Backend enum
│   │   ├── base_engine.py   # Abstract base engine with generics
│   │   └── qdrant_engine.py # Qdrant implementation
│   └── mcp_server/          # MCP protocol implementation
│       ├── __init__.py      # Server entry point
│       └── server.py        # FastMCP server setup
├── tests/                   # Test suite (97.22% coverage)
│   ├── test_client/         # Client API tests
│   ├── test_engine/         # Engine implementation tests
│   └── test_mcp/            # MCP server integration tests
├── pyproject.toml           # Project configuration
└── README.md                # Project documentation
```

## 🔧 Development Environment

### Prerequisites
- Python 3.13+
- UV package manager
- Qdrant (for integration tests)

### Setup Commands
```bash
# Install dependencies
uv sync --all-extras --dev

# Run tests with coverage
uv run pytest --cov --cov-report=term-missing

# Run linting
uv run ruff check

# Start MCP server
uv run searchbox-mcp
```

### Package Management
- **Main deps**: `qdrant-client`, `vault-settings`
- **Optional**: `fastmcp` (MCP functionality)
- **Dev deps**: `pytest`, `pytest-cov`, `pytest-asyncio`, `ruff`, `fastembed`

## 🧪 Testing Framework

### Test Structure (91 tests, 97.22% coverage)
```
tests/
├── test_client/    # 14 tests - Client API functionality
├── test_engine/    # 68 tests - Engine implementations & factory
└── test_mcp/       # 9 tests - MCP server integration
```

### Key Test Commands
```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov --cov-report=html

# Run specific test module
uv run pytest tests/test_client/

# Run integration tests (starts real MCP server)
uv run pytest tests/test_mcp/
```

### Test Patterns
- **Unit tests**: Mock external dependencies (Qdrant client, settings)
- **Integration tests**: Use real MCP server with in-memory Qdrant
- **Fixtures**: Located in `conftest.py` files for each test package
- **Async testing**: Uses `pytest-asyncio` with `asyncio_mode = "auto"`
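
The unit-test pattern above (mocking the engine so no Qdrant instance is needed) could look roughly like the sketch below. The helper `top_texts` and the dict-shaped rows are hypothetical and exist only for illustration; the real tests in this repository may be structured differently. Under `asyncio_mode = "auto"`, pytest would collect the `async def test_...` function without a marker.

```python
import asyncio
from unittest.mock import AsyncMock


async def top_texts(engine, embedding, collection, limit=3):
    """Hypothetical helper under test: return only the text of each hit."""
    rows = await engine.semantic_search(
        embedding=embedding,
        collection=collection,
        limit=limit,
        conditions=None,
        threshold=None,
    )
    return [row["text"] for row in rows]


async def test_top_texts_uses_mocked_engine():
    # The engine is replaced by a mock, so no vector database is contacted.
    engine = AsyncMock()
    engine.semantic_search.return_value = [{"text": "alpha"}, {"text": "beta"}]

    result = await top_texts(engine, embedding=[0.1, 0.2], collection="docs")

    assert result == ["alpha", "beta"]
    engine.semantic_search.assert_awaited_once()


if __name__ == "__main__":
    # Outside pytest, the coroutine can be driven manually.
    asyncio.run(test_top_texts_uses_mocked_engine())
```
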

## 🏛️ Architecture Patterns

### 1. Generic Engine System
```python
from abc import ABC, abstractmethod

# Base engine uses generics for type safety (PEP 695 syntax)
class BaseEngine[ResponseType, ConditionType, ChunkType](ABC):
    # Abstract methods that subclasses must implement
    @abstractmethod
    def transform_conditions(self, conditions: list[Condition] | None) -> ConditionType | None: ...

    # Template method pattern for semantic search
    async def semantic_search(self, embedding, collection, limit, conditions, threshold):
        # Orchestrates: transform -> query -> transform
        ...
```
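
To make the "transform -> query -> transform" orchestration concrete, here is a minimal, self-contained sketch. It is not the project's `BaseEngine`: the names `SketchEngine` and `ListEngine` are hypothetical, the third type parameter is omitted, and `typing.Generic` is used in place of the PEP 695 syntax so the example also runs on older interpreters. The method names follow the ones mentioned in this guide.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

ResponseT = TypeVar("ResponseT")
ConditionT = TypeVar("ConditionT")


class SketchEngine(ABC, Generic[ResponseT, ConditionT]):
    """Stand-in for the base engine; only the search path is modelled."""

    @abstractmethod
    def transform_conditions(self, conditions): ...

    @abstractmethod
    async def run_similarity_query(self, embedding, collection, limit, conditions, threshold): ...

    @abstractmethod
    def transform_response(self, response): ...

    async def semantic_search(self, embedding, collection, limit=10, conditions=None, threshold=None):
        # Template method: the order of the three steps is fixed here.
        native = self.transform_conditions(conditions)
        response = await self.run_similarity_query(embedding, collection, limit, native, threshold)
        return self.transform_response(response)


class ListEngine(SketchEngine[list, dict]):
    """Toy backend that scores stored vectors by dot product."""

    def __init__(self, points):
        self.points = points  # list of (vector, text) pairs

    def transform_conditions(self, conditions):
        return None if conditions is None else dict(conditions)

    async def run_similarity_query(self, embedding, collection, limit, conditions, threshold):
        scored = [(sum(a * b for a, b in zip(vec, embedding)), text) for vec, text in self.points]
        if threshold is not None:
            scored = [item for item in scored if item[0] >= threshold]
        return sorted(scored, reverse=True)[:limit]

    def transform_response(self, response):
        return [{"text": text, "score": score} for score, text in response]


engine = ListEngine([([1.0, 0.0], "cats"), ([0.0, 1.0], "dogs")])
print(asyncio.run(engine.semantic_search([1.0, 0.0], "docs", limit=1)))
# prints [{'text': 'cats', 'score': 1.0}]
```

Subclasses only supply the three backend-specific steps; callers always go through `semantic_search`, so the ordering cannot drift between backends.
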

### 2. Factory Pattern with Caching
```python
from functools import cache

@cache  # functools.cache for singleton behavior
def get_engine(backend: Backend):
    if backend == Backend.QDRANT:
        return QdrantEngine()
    elif backend == Backend.COSMOS:
        raise NotImplementedError("Cosmos engine not implemented yet")
```
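
The effect of `functools.cache` on the factory can be checked in isolation. The sketch below uses a placeholder `FakeQdrantEngine` and assumed string values for the `Backend` enum; neither is taken from the project. It shows that repeated calls return the same instance and that `cache_clear()` resets that state, which can be useful in test fixtures.

```python
from enum import Enum
from functools import cache


class Backend(Enum):
    # Enum values are assumptions for this sketch.
    QDRANT = "qdrant"
    COSMOS = "cosmos"


class FakeQdrantEngine:
    """Placeholder for the real QdrantEngine."""


@cache
def get_engine(backend: Backend):
    if backend is Backend.QDRANT:
        return FakeQdrantEngine()
    raise NotImplementedError(f"{backend.name} engine not implemented yet")


first = get_engine(Backend.QDRANT)
second = get_engine(Backend.QDRANT)
assert first is second  # cached: one instance per backend

get_engine.cache_clear()  # drop cached engines, e.g. between tests
assert get_engine(Backend.QDRANT) is not first
```

Because exceptions are not cached, a call for an unimplemented backend raises `NotImplementedError` every time rather than only on the first call.
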

### 3. Client Facade Pattern
```python
@final
class Client:
    """High-level interface abstracting engine complexity"""
    def __init__(self, backend: Backend, collection: str):
        self.engine = get_engine(backend)  # Factory usage
        self.collection = collection
```


## 📊 Data Models

### Core Models (in `models.py`)
```python
# Search conditions
class Match(BaseModel): ...         # Exact match filter
class MatchAny(BaseModel): ...      # Any-of match filter
class MatchExclude(BaseModel): ...  # Exclusion filter

# Document data
class ChunkData(BaseModel): ...     # Document content + metadata
class Chunk(BaseModel): ...         # Vector + ChunkData
class SearchRow(BaseModel): ...     # Search result format
```

### Backend-Specific Models
- **Qdrant**: Uses `qdrant_client.models` (PointStruct, Filter, etc.)
- **Generic**: Abstractions in `base_engine.py`
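
As an illustration of how generic conditions might map onto a backend filter, the sketch below converts the three condition kinds into a plain dict that mirrors the JSON form of Qdrant filters (`value`, `any`, `except` under `match`). Everything here is an assumption made for the example: the field names (`key`, `value`, `values`), the use of dataclasses instead of Pydantic, and the function `to_qdrant_filter`. The real engine presumably builds `qdrant_client.models` objects instead.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Match:
    key: str
    value: Any


@dataclass
class MatchAny:
    key: str
    values: list


@dataclass
class MatchExclude:
    key: str
    values: list


def to_qdrant_filter(conditions):
    """Build a Qdrant-style filter dict, or None when there is nothing to filter."""
    if not conditions:
        return None
    must = []
    for cond in conditions:
        if isinstance(cond, Match):
            match = {"value": cond.value}
        elif isinstance(cond, MatchAny):
            match = {"any": list(cond.values)}
        elif isinstance(cond, MatchExclude):
            match = {"except": list(cond.values)}
        else:
            raise TypeError(f"Unsupported condition: {cond!r}")
        must.append({"key": cond.key, "match": match})
    return {"must": must}


print(to_qdrant_filter([Match("lang", "en"), MatchExclude("status", ["draft"])]))
```
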

## 🔌 MCP Integration

### Server Setup (`mcp_server/server.py`)
```python
from fastmcp import FastMCP
from ..engine import Backend, get_engine

mcp = FastMCP("Vector Search MCP")
engine = get_engine(Backend.QDRANT)

# Auto-register semantic_search as MCP tool
_ = mcp.tool(engine.semantic_search)
```

### Entry Point (`mcp_server/__init__.py`)
```python
def run(transport: Transport = "sse"):
    """Start MCP server with specified transport"""
    mcp.run(transport=transport)
```

## 🎯 Common Development Tasks

### Adding a New Vector Backend
1. **Create engine**: Inherit from `BaseEngine[ResponseType, ConditionType, ChunkType]`
2. **Implement abstracts**: `transform_conditions`, `transform_response`, `run_similarity_query`, etc.
3. **Update factory**: Add new backend to `Backend` enum and `get_engine()`
4. **Add tests**: Create test file following existing patterns
5. **Update docs**: Add configuration examples
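
While working through step 2, it can help to list which abstract methods a new engine class still lacks. The sketch below does that with the standard `__abstractmethods__` attribute that `abc` maintains. `EngineContract` and `HalfDoneEngine` are hypothetical stand-ins; the contract only lists the three methods named above, while the real `BaseEngine` may declare more.

```python
from abc import ABC, abstractmethod


class EngineContract(ABC):
    """Stand-in listing only the abstract methods named in step 2."""

    @abstractmethod
    def transform_conditions(self, conditions): ...

    @abstractmethod
    def transform_response(self, response): ...

    @abstractmethod
    async def run_similarity_query(self, embedding, collection, limit, conditions, threshold): ...


class HalfDoneEngine(EngineContract):
    """Hypothetical new backend with one method implemented so far."""

    def transform_conditions(self, conditions):
        return conditions


def missing_abstracts(engine_cls):
    """Return the abstract method names a class has not implemented yet."""
    return sorted(getattr(engine_cls, "__abstractmethods__", frozenset()))


print(missing_abstracts(HalfDoneEngine))
# prints ['run_similarity_query', 'transform_response']
```

A test asserting that `missing_abstracts(NewEngine)` is empty fails with a readable list, instead of the `TypeError` raised when an incomplete class is instantiated.
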

### Modifying Search Functionality
- **Models**: Extend condition types in `models.py`
- **Base Engine**: Update abstract methods if needed
- **Implementations**: Update concrete engines (`qdrant_engine.py`)
- **Tests**: Add test cases for new functionality

### Extending MCP Tools
- **Server**: Add new tools in `mcp_server/server.py`
- **Client**: Extend `Client` class with new methods
- **Integration**: Add MCP integration tests

## 🐛 Debugging & Troubleshooting

### Common Issues
1. **Import errors**: Look for stale imports of `vector_search_mcp`; the package is now named `searchbox`
2. **Test failures**: Ensure the MCP server is started with the correct script name, `searchbox-mcp`
3. **Coverage gaps**: Focus on branch coverage, not just line coverage
4. **Async issues**: With `asyncio_mode = "auto"`, `async def` tests are collected automatically; add `pytest.mark.asyncio` only where auto mode is not in effect

### Debugging Tools
```bash
# Verbose test output
uv run pytest -v --tb=long

# Coverage with missing lines
uv run pytest --cov --cov-report=term-missing

# Run single test
uv run pytest tests/path/to/test.py::TestClass::test_method -v
```

### MCP Server Testing
- Integration tests start real server on `http://localhost:8000/sse`
- Server startup takes ~5 seconds (configured in test fixtures)
- Uses in-memory Qdrant with pre-seeded test data
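
When debugging startup timing, one option is to poll the port rather than rely on a fixed delay. The helper below is a sketch of that idea using only the standard library; `wait_for_port` is a hypothetical name and is not part of the project's fixtures, which may handle startup differently.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Example: check the address used by the integration tests.
    print(wait_for_port("localhost", 8000, timeout=1.0))
```

A successful TCP connection only shows that the port is open; it does not confirm that the MCP endpoint itself is ready to serve requests.
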

## 📋 Code Quality Standards

### Linting Configuration (Ruff)
```toml
[tool.ruff.lint]
extend-select = ["I", "D", "ERA", "UP", "FURB", "TRY", "PERF"]
ignore = ["D203", "D213"]
```
- **I**: Import sorting
- **D**: Docstring conventions (Google style)
- **ERA**: Flags commented-out code
- **UP**: Python upgrade syntax
- **FURB**: Refurb suggestions
- **TRY**: Exception handling
- **PERF**: Performance

### Coverage Standards
- **Target**: >95% coverage (currently 97.22%)
- **Branch coverage**: Enabled via `tool.coverage.run.branch = true`
- **Exclusions**: Abstract methods, defensive code patterns
- **HTML reports**: Generated in `htmlcov/` directory

### Documentation Requirements
- **Docstrings**: All public classes/methods (Google style)
- **Type hints**: Full typing for all functions
- **Examples**: Include usage examples in docstrings
- **Generics**: Document type parameters clearly

## 🚀 Deployment & Distribution

### Package Configuration
- **Name**: `searchbox` (internal), distributed as `vector-search-mcp`
- **Entry point**: `searchbox-mcp` command
- **Build system**: `uv_build`
- **Python requirement**: `>=3.13`

### Dependencies
- **Core**: Minimal (qdrant-client, vault-settings)
- **MCP**: Optional extras (`[mcp]`)
- **Dev**: Comprehensive testing and linting tools

## 🔗 Key Files to Understand

### Must-read files for new contributors:
1. **`src/searchbox/engine/base_engine.py`** - Core abstractions and patterns
2. **`src/searchbox/client.py`** - Main user-facing API
3. **`src/searchbox/models.py`** - Data structures and types
4. **`conftest.py` files under `tests/`** - Test configuration and fixtures
5. **`pyproject.toml`** - Project configuration and dependencies

### Reference implementations:
- **`src/searchbox/engine/qdrant_engine.py`** - Complete backend implementation
- **`tests/test_client/test_client.py`** - Comprehensive testing patterns
- **`src/searchbox/mcp_server/server.py`** - MCP integration example

This guide should give a coding agent enough context to understand the project structure, make meaningful contributions, and maintain the code quality standards established in this codebase.