5.1 KiB
Refactoring Summary
High-ROI Refactorings Completed
1. Eliminated Code Duplication - Session Management ✅
Problem: The _get_aio_session() method was duplicated identically in both GoogleCloudFileStorage and GoogleCloudVectorSearch classes.
Solution:
- Created a new
BaseGoogleCloudClientbase class that encapsulates shared session management logic - Both
GoogleCloudFileStorageandGoogleCloudVectorSearchnow inherit from this base class - Added a
close()method to properly clean up resources
Files Changed:
src/knowledge_search_mcp/main.py:25-80- Added base classsrc/knowledge_search_mcp/main.py:83- GoogleCloudFileStorage inherits from basesrc/knowledge_search_mcp/main.py:219- GoogleCloudVectorSearch inherits from base
Impact: Reduced ~24 lines of duplicated code, improved maintainability
2. Fixed Resource Cleanup ✅
Problem: aiohttp sessions were never explicitly closed, leading to potential resource leaks and warnings.
Solution:
- Added
close()method toBaseGoogleCloudClientto properly close aiohttp sessions - Extended
close()inGoogleCloudVectorSearchto also close the storage client's session - Modified
lifespan()function's finally block to callvs.close()on shutdown
Files Changed:
src/knowledge_search_mcp/main.py:74-78- Base close methodsrc/knowledge_search_mcp/main.py:228-231- VectorSearch close overridesrc/knowledge_search_mcp/main.py:699-707- Cleanup in lifespan finally block
Impact: Prevents resource leaks, eliminates aiohttp warnings on shutdown
3. Implemented LRU Cache with Size Limits ✅
Problem: The _cache dictionary in GoogleCloudFileStorage grew indefinitely, potentially causing memory issues with large document sets.
Solution:
- Created a new
LRUCacheclass with configurable max size (default: 100 items) - Automatically evicts least recently used items when cache is full
- Maintains insertion order and tracks access patterns
Files Changed:
src/knowledge_search_mcp/main.py:28-58- New LRUCache classsrc/knowledge_search_mcp/main.py:85-87- Updated GoogleCloudFileStorage to use LRUCachesrc/knowledge_search_mcp/main.py:115-122- Updated cache access patternssrc/knowledge_search_mcp/main.py:147-148- Updated cache write patternstests/test_search.py- Updated tests to work with LRUCache interface
Impact: Bounded memory usage, prevents cache from growing indefinitely
4. Broke Down Large Functions ✅
a. Extracted Validation Functions from lifespan()
Problem: The lifespan() function was 225 lines with repetitive validation logic.
Solution: Extracted three helper functions:
_validate_genai_access()- Validates GenAI embedding API access_validate_gcs_access()- Validates GCS bucket access_validate_vector_search_access()- Validates vector search endpoint access
Files Changed:
src/knowledge_search_mcp/main.py:424-587- New validation functionssrc/knowledge_search_mcp/main.py:644-693- Simplified lifespan function
Impact: Reduced lifespan() from 225 to ~65 lines, improved readability and testability
b. Extracted Helper Functions from knowledge_search()
Problem: The knowledge_search() function was 149 lines mixing multiple concerns.
Solution: Extracted three helper functions:
_generate_query_embedding()- Handles embedding generation with error handling_filter_search_results()- Applies similarity thresholds and filtering_format_search_results()- Formats results as XML-like documents
Files Changed:
src/knowledge_search_mcp/main.py:717-766- _generate_query_embeddingsrc/knowledge_search_mcp/main.py:769-793- _filter_search_resultssrc/knowledge_search_mcp/main.py:796-810- _format_search_resultssrc/knowledge_search_mcp/main.py:814-876- Simplified knowledge_search function
Impact: Reduced knowledge_search() from 149 to ~63 lines, improved testability, added input validation for empty queries
Additional Improvements
Input Validation
- Added validation for empty/whitespace-only queries in
_generate_query_embedding()
Code Organization
- Moved
import timefrom inline to module-level imports
Test Updates
- Updated all tests to work with the new LRUCache interface
- All 11 tests passing
Metrics
| Metric | Before | After | Change |
|---|---|---|---|
| Total lines (main.py) | 809 | 876 | +67 (more modular code) |
| Longest function | 225 lines | 65 lines | -71% |
| Code duplication instances | 2 major | 0 | -100% |
| Resource leaks | Yes | No | Fixed |
| Cache memory bound | No | Yes (100 items) | Fixed |
| Test coverage | 11 tests | 11 tests | Maintained |
What's Left for Future Work
Medium Priority (Not Done)
- Move magic numbers to Settings configuration
- Update outdated DockerfileConnector
- Review and adjust logging levels
- Add dependency injection for tighter coupling issues
Lower Priority (Not Done)
- Add integration tests for end-to-end flows
- Add performance tests
- Introduce abstraction layers for cloud services
- Standardize on f-strings (one %-format remaining)