Contributing to CrossVector
Thank you for your interest in contributing to CrossVector!
Getting Started
Prerequisites
- Python 3.11+
- Git
- uv (recommended for fast package management)
Development Setup
- Clone the repository:
- Install dependencies with uv:
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install project with all dependencies (dev + all backends/embeddings)
uv pip install -e ".[dev,all]"
# Or install specific extras
uv pip install -e ".[dev,pgvector,gemini]" # Just PgVector + Gemini
- Set up pre-commit hooks:
# Install pre-commit hooks
pre-commit install
# (Optional) Run on all files to test
pre-commit run --all-files
- Configure environment:
Development Workflow
Code Style
CrossVector follows PEP 8 and uses:
- Ruff for fast linting and formatting (replaces Black, isort, flake8)
- pre-commit for automated code quality checks
- mypy for type checking (optional, can be enabled in .pre-commit-config.yaml)
Automatic formatting with pre-commit:
Pre-commit hooks run automatically on every commit. To run them manually:
# Run all hooks on staged files
pre-commit run
# Run all hooks on all files
pre-commit run --all-files
# Run specific hook
pre-commit run ruff --all-files
Manual formatting and linting:
# Format code with ruff
ruff format src/ tests/ scripts/
# Lint and auto-fix issues
ruff check src/ tests/ scripts/ --fix
# Type checking (optional)
mypy src/
Type Hints
All code must include type hints:
from typing import List, Dict, Any, Optional
from crossvector import VectorDocument

def process_documents(
    docs: List[VectorDocument],
    filters: Optional[Dict[str, Any]] = None
) -> List[VectorDocument]:
    """Process documents with optional filters."""
    pass
Testing
Running Tests
All tests:
Specific test file:
With coverage:
Integration tests with real backends:
# Run all integration tests
pytest scripts/tests/ -v
# Specific backend
pytest scripts/tests/test_pgvector.py -v
Benchmarking
Before submitting performance-related changes, run benchmarks to measure impact:
# Quick benchmark (10 docs)
python scripts/benchmark.py --num-docs 10
# Full benchmark (1000 docs) - before and after your changes
python scripts/benchmark.py --output benchmark_before.md
# ... make your changes ...
python scripts/benchmark.py --output benchmark_after.md
# Compare specific backend
python scripts/benchmark.py --backends pgvector --num-docs 100
The benchmark tool tests:
- Bulk and individual create operations
- Vector search performance
- Metadata-only search
- Query DSL operators (10 operators)
- Update and delete operations
Results are saved as Markdown reports for easy comparison. See the Benchmarking Guide for details.
Writing Tests
Test structure:
import pytest
from crossvector import VectorEngine
from crossvector.dbs.pgvector import PgVectorAdapter
from crossvector.embeddings.gemini import GeminiEmbeddingAdapter

class TestVectorEngine:
    @pytest.fixture
    def engine(self):
        """Create test engine."""
        return VectorEngine(
            db=PgVectorAdapter(),
            embedding=GeminiEmbeddingAdapter(),
            collection_name="test_collection"
        )

    def test_create_document(self, engine):
        """Test document creation."""
        doc = engine.create("Test content")
        assert doc.id is not None
        assert doc.text == "Test content"
        assert len(doc.vector) == 1536

    def test_search(self, engine):
        """Test vector search."""
        engine.create("Python tutorial")
        results = engine.search("python", limit=10)
        assert len(results) > 0
Use fixtures:
@pytest.fixture(scope="module")
def test_data():
    """Create test data."""
    return [
        {"text": "Document 1", "metadata": {"category": "tech"}},
        {"text": "Document 2", "metadata": {"category": "science"}},
    ]

def test_with_fixture(engine, test_data):
    """Test using fixture data."""
    created = engine.bulk_create(test_data)
    assert len(created) == 2
Test Coverage
Aim for >90% code coverage. Check coverage:
Adding Features
New Database Adapter
- Create adapter class:
# src/crossvector/dbs/newdb.py
from crossvector.abc import VectorDBAdapter
from typing import List, Dict, Any, Optional
from crossvector import VectorDocument

class NewDBAdapter(VectorDBAdapter):
    """Adapter for NewDB vector database."""

    def __init__(self, host: str = "localhost", port: int = 9000):
        self.host = host
        self.port = port
        self._client = None

    def add_collection(
        self,
        collection_name: str,
        dimension: int,
        **kwargs
    ) -> bool:
        """Create collection."""
        pass

    def insert(
        self,
        collection_name: str,
        documents: List[VectorDocument],
        **kwargs
    ) -> List[VectorDocument]:
        """Insert documents."""
        pass

    def search(
        self,
        collection_name: str,
        query_vector: List[float],
        where: Optional[Dict[str, Any]] = None,
        limit: int = 10,
        **kwargs
    ) -> List[VectorDocument]:
        """Search documents."""
        pass

    # Implement other required methods...
- Create where compiler:
# src/crossvector/querydsl/compilers/newdb.py
from crossvector.querydsl.compilers.base import BaseWhere
from typing import Dict, Any

class NewDBWhereCompiler(BaseWhere):
    """Compile filters for NewDB."""

    # Capability flags
    SUPPORTS_NESTED = True  # Supports nested fields
    REQUIRES_VECTOR = False  # Can search metadata-only
    REQUIRES_AND_WRAPPER = False  # Multiple fields use implicit AND

    _OP_MAP = {
        "$eq": "==",
        "$ne": "!=",
        "$gt": ">",
        "$gte": ">=",
        "$lt": "<",
        "$lte": "<=",
        "$in": "in",
        "$nin": "not in",
    }

    def to_where(self, where: Dict[str, Any]) -> str:
        """Compile to NewDB filter format."""
        pass

    def to_expr(self, where: Dict[str, Any]) -> str:
        """Convert to expression string."""
        pass
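To make the role of `_OP_MAP` concrete, here is a minimal standalone sketch of how a compiler might turn a filter dictionary into an expression string. It is not the real `BaseWhere` contract: the function name `compile_where`, the output syntax, the treatment of a bare value as `$eq`, and the use of `repr()` for values are all assumptions made for illustration, and nested fields and logical operators are left out.

```python
from typing import Any

# Same operator table as in the NewDBWhereCompiler skeleton.
_OP_MAP = {
    "$eq": "==",
    "$ne": "!=",
    "$gt": ">",
    "$gte": ">=",
    "$lt": "<",
    "$lte": "<=",
    "$in": "in",
    "$nin": "not in",
}


def compile_where(where: dict[str, Any]) -> str:
    """Compile {"field": {"$op": value}} into a hypothetical expression string."""
    clauses: list[str] = []
    for field_name, condition in where.items():
        # Assumption: a bare value is shorthand for {"$eq": value}.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op, value in condition.items():
            if op not in _OP_MAP:
                raise ValueError(f"Unsupported operator: {op}")
            clauses.append(f"{field_name} {_OP_MAP[op]} {value!r}")
    # Multiple clauses are joined with an implicit AND.
    return " and ".join(clauses)
```

A real compiler would also need to escape values according to the target database's rules rather than rely on `repr()`.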
- Add tests:
# tests/test_newdb.py
import pytest
from crossvector import VectorEngine
from crossvector.dbs.newdb import NewDBAdapter

class TestNewDB:
    @pytest.fixture
    def engine(self):
        return VectorEngine(
            db=NewDBAdapter(),
            embedding=...,
            collection_name="test"
        )

    def test_create(self, engine):
        """Test document creation."""
        pass

    def test_search(self, engine):
        """Test vector search."""
        pass
- Update documentation:
  - Add to docs/adapters/databases.md
  - Update feature comparison tables
  - Add configuration examples
New Embedding Provider
- Create adapter class:
# src/crossvector/embeddings/newprovider.py
from crossvector.abc import EmbeddingAdapter
from typing import List

class NewProviderEmbeddingAdapter(EmbeddingAdapter):
    """Adapter for NewProvider embeddings."""

    def __init__(
        self,
        api_key: str,
        model_name: str = "default-model"
    ):
        self.api_key = api_key
        super().__init__(model_name=model_name, dim=768)

    def get_embeddings(self, texts: List[str]) -> List[List[float]]:
        """Generate embeddings for texts."""
        # Implementation
        pass
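While the real provider call is still being written, or in unit tests that should not hit a network API, a deterministic stand-in for `get_embeddings` can be handy. The function below is a sketch under that assumption: `fake_embeddings` is a hypothetical helper that is not part of CrossVector, and the vectors it returns carry no semantic meaning. They only have the right shape and are stable across runs.

```python
import hashlib


def fake_embeddings(texts: list[str], dim: int = 768) -> list[list[float]]:
    """Return deterministic pseudo-embeddings derived from a hash of each text."""
    vectors: list[list[float]] = []
    for text in texts:
        values: list[float] = []
        counter = 0
        # Keep hashing "text:counter" until at least `dim` values are collected.
        while len(values) < dim:
            digest = hashlib.sha256(f"{text}:{counter}".encode("utf-8")).digest()
            # Map each byte (0-255) linearly into the range [-1.0, 1.0].
            values.extend(byte / 127.5 - 1.0 for byte in digest)
            counter += 1
        vectors.append(values[:dim])
    return vectors
```

The same text always yields the same vector, so assertions on shape and equality stay reproducible. Tests of search quality still need a real embedding model.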
- Add tests:
# tests/test_newprovider_embeddings.py
import pytest
from crossvector.embeddings.newprovider import NewProviderEmbeddingAdapter

def test_embeddings():
    """Test embedding generation."""
    adapter = NewProviderEmbeddingAdapter(api_key="test")
    vectors = adapter.get_embeddings(["test text"])
    assert len(vectors) == 1
    assert len(vectors[0]) == 768
- Update documentation:
  - Add to docs/adapters/embeddings.md
  - Add configuration examples
  - Update comparison tables
Documentation
Writing Documentation
Documentation lives in the docs/ directory and is written in Markdown:
docs/
├── index.md # Main page
├── installation.md # Installation guide
├── quickstart.md # Quick start tutorial
├── api.md # API reference
├── schema.md # Data models
├── querydsl.md # Query DSL guide
├── configuration.md # Configuration reference
└── adapters/
    ├── databases.md # Database adapters
    └── embeddings.md # Embedding adapters
Building docs:
Documentation Guidelines
- Use clear, concise language
- Include code examples
- Add type hints to examples
- Show both success and error cases
- Update all affected docs when changing features
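As an illustration of these guidelines, a documentation example might pair a type-hinted function with one call that succeeds and one that fails. `validate_dimension` below is a hypothetical helper invented for this sketch. It is not a CrossVector API.

```python
def validate_dimension(vector: list[float], expected_dim: int) -> list[float]:
    """Return the vector unchanged if it has the expected dimension.

    Raises:
        ValueError: If the vector length does not match expected_dim.
    """
    if len(vector) != expected_dim:
        raise ValueError(f"Expected dimension {expected_dim}, got {len(vector)}")
    return vector


# Success case: the dimensions match, so the vector is returned as-is.
ok = validate_dimension([0.1, 0.2, 0.3], expected_dim=3)

# Error case: the dimensions differ, so the error is caught and its message kept.
try:
    validate_dimension([0.1, 0.2], expected_dim=3)
except ValueError as exc:
    message = str(exc)
```

Showing the failing call next to the successful one tells readers which exception to expect and what its message looks like.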
Pull Request Process
Before Submitting
- Run tests:
- Format and lint code:
# Let pre-commit handle it automatically
pre-commit run --all-files
# Or manually
ruff format src/ tests/ scripts/
ruff check src/ tests/ scripts/ --fix
- Update documentation:
  - Add/update docstrings
  - Update relevant .md files
  - Add examples if needed
- Update CHANGELOG.md:
## [Unreleased]
### Added
- New feature X with Y capability
### Changed
- Modified Z to improve performance
### Fixed
- Bug in A causing B
Submitting PR
- Create feature branch:
- Commit changes:
Use conventional commits:
  - feat: New feature
  - fix: Bug fix
  - docs: Documentation changes
  - test: Test additions/changes
  - refactor: Code refactoring
  - perf: Performance improvements
- Push branch:
- Create Pull Request:
  - Go to GitHub repository
  - Click "New Pull Request"
  - Fill in template:
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Tests pass locally
- [ ] Added new tests for feature
- [ ] Updated documentation
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Comments added for complex code
- [ ] Documentation updated
- [ ] No new warnings generated
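The conventional commit prefixes listed under "Commit changes" can be checked mechanically before pushing. The sketch below is one possible check, not a hook that ships with the project: `is_conventional` is a hypothetical helper, and accepting an optional scope such as `fix(pgvector):` is an assumption borrowed from the Conventional Commits convention.

```python
import re

# Prefixes taken from the list under "Commit changes".
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf")

# Pattern: type, optional "(scope)", colon, one space, then a non-empty subject.
_PATTERN = re.compile(r"^(" + "|".join(COMMIT_TYPES) + r")(\([\w-]+\))?: \S")


def is_conventional(message: str) -> bool:
    """Return True if the first line of the message starts with a known prefix."""
    lines = message.splitlines()
    return bool(lines) and _PATTERN.match(lines[0]) is not None
```

For example, `is_conventional("feat: add NewDB adapter")` is true, while `is_conventional("update stuff")` is false.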
Code Review
- Respond to reviewer feedback
- Make requested changes
- Re-request review after changes
Release Process
Version Numbering
Follow Semantic Versioning (SemVer):
- MAJOR (1.0.0): Breaking changes
- MINOR (0.1.0): New features, backward compatible
- PATCH (0.0.1): Bug fixes, backward compatible
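The bump rules above can be expressed as a small function. This is a sketch for plain `MAJOR.MINOR.PATCH` strings only: `bump_version` is a hypothetical helper, not part of the project's release tooling, and it does not handle pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part ("major", "minor" or "patch") bumped."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part}")
```

For instance, a new backward-compatible feature on top of 0.1.0 gives `bump_version("0.1.0", "minor")`, which returns `"0.2.0"`.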
Creating Release
- Update version:
- Update CHANGELOG.md:
## [0.2.0] - 2024-01-15
### Added
- Feature X
- Feature Y
### Changed
- Improved Z performance
### Fixed
- Bug in A
- Create release:
- Publish to PyPI:
Community
Communication
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Questions and general discussion
- Pull Requests: Code contributions
Getting Help
- Check existing documentation
- Search issues
- Ask in discussions
Reporting Bugs
Use the bug report template:
## Bug Description
Clear description of the bug
## Steps to Reproduce
1. Step 1
2. Step 2
3. Error occurs
## Expected Behavior
What should happen
## Actual Behavior
What actually happens
## Environment
- CrossVector version: 0.1.0
- Python version: 3.11
- OS: macOS 14
- Backend: PgVector
## Additional Context
Any other relevant information
Code of Conduct
Our Standards
- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Accept responsibility for mistakes
- Prioritize community benefit
Enforcement
Violations can be reported to maintainers. All complaints will be reviewed and investigated promptly and fairly.
License
By contributing, you agree that your contributions will be licensed under the same license as the project (see LICENSE file).
Questions?
Feel free to ask questions in:
- GitHub Issues (for bugs)
- GitHub Discussions (for general questions)
- Pull Request comments (for specific code questions)
Thank you for contributing to CrossVector!