Benchmarking Guide
This guide explains how to use CrossVector's benchmarking tool to measure and compare performance across different database backends and embedding providers.
Overview
The benchmark tool (scripts/benchmark.py) provides comprehensive performance testing for:
- 4 Database Backends: PgVector, AstraDB, Milvus, ChromaDB
- 2 Embedding Providers: OpenAI, Gemini
- 7 Operation Types: Bulk create, individual create, vector search, metadata search, Query DSL operators, updates, deletes
Quick Start
Basic Usage
# Test all backends with both embeddings (10 documents)
python scripts/benchmark.py --num-docs 10
# Full benchmark with 1000 documents
python scripts/benchmark.py
# Test specific configuration
python scripts/benchmark.py --backends pgvector milvus --embedding-providers openai --num-docs 100
Command Line Options
python scripts/benchmark.py [OPTIONS]
Options:
--num-docs INT Number of documents to test (default: 1000)
--backends NAME [NAME ...] Specific backends: pgvector, astradb, milvus, chroma
--embedding-providers NAME Embedding providers: openai, gemini
--skip-slow Skip slow cloud backends (astradb, milvus) for faster testing
--search-limit INT Number of results to return in search operations (default: 100)
--collection-name STR      Custom collection name (default: auto-generated with UUID8)
--timeout INT Timeout per backend test in seconds (default: 60)
--output PATH Output file path (default: benchmark.md)
--use-fixtures PATH Path to pre-generated fixtures JSON file
--add-vectors Generate and add vectors to fixture documents
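For example, several of these options can be combined in a single run (the specific values below are illustrative):
# PgVector with Gemini, 50 documents, smaller result sets, longer per-backend timeout
python scripts/benchmark.py --backends pgvector --embedding-providers gemini --num-docs 50 --search-limit 50 --timeout 120 --output pgvector_gemini.md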
What Gets Measured
1. Upsert Performance
Measures throughput for batch document upsert with automatic embedding generation.
Metrics: - Duration (seconds) - Throughput (docs/sec)
2. Individual Create Performance
Tests single document creation with embedding generation.
Metrics: - Average duration per document
3. Vector Search Performance
Semantic similarity search using embedded queries.
Metrics: - Average query duration (10 queries tested) - Queries per second
4. Metadata-Only Search
Filtering without vector similarity (if supported by backend).
Metrics: - Average query duration - Support status
5. Query DSL Operators
Tests all 10 Query DSL operators:
- Comparison: eq, ne, gt, gte, lt, lte
- Array: in, nin
- Logical: and (&), or (|)
Metrics: - Average operator query duration - Number of operators successfully tested
6. Update Operations
Document update performance.
Metrics: - Average update duration (100 updates tested)
7. Delete Operations
Batch deletion throughput.
Metrics: - Duration - Throughput (docs/sec)
Prerequisites
Required Environment Variables
Embedding Providers (at least one required):
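The variable names below assume the standard OpenAI and Google SDK conventions; check CrossVector's configuration documentation for the exact names it reads:
# OpenAI (assumed standard variable name)
export OPENAI_API_KEY=sk-...
# Gemini (assumed standard variable name)
export GEMINI_API_KEY=...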
Database Backends (optional; unconfigured backends are skipped):
# PgVector
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DATABASE=vector_db
export PGVECTOR_USER=postgres
export PGVECTOR_PASSWORD=postgres
# Or use connection string:
export PGVECTOR_CONNECTION_STRING=postgresql://user:pass@host:port/db
# AstraDB
export ASTRADB_API_ENDPOINT=https://...apps.astra.datastax.com
export ASTRADB_APPLICATION_TOKEN=AstraCS:...
# Milvus
export MILVUS_API_ENDPOINT=https://...
export MILVUS_API_TOKEN=...
# ChromaDB (optional for cloud)
export CHROMA_HOST=api.trychroma.com
export CHROMA_API_KEY=ck-...
export CHROMA_TENANT=...
export CHROMA_DATABASE=Test
Running Benchmarks
Recommended Workflow
Step 1: Quick Verification (1-2 minutes)
Test that everything is configured correctly:
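# Small, fast run (for example); --skip-slow omits the slower cloud backends (astradb, milvus)
python scripts/benchmark.py --num-docs 10 --skip-slow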
Step 2: Fast Comparison (5-10 minutes)
Compare all backends with small dataset:
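# All 4 backends with both embedding providers, 10 documents each
python scripts/benchmark.py --num-docs 10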
This runs 8 combinations (4 backends × 2 embeddings) with 10 documents each.
Step 3: Production Benchmark (30-60 minutes)
Full performance test with larger dataset:
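# 1000 documents (the default) across all configured backends and providers
python scripts/benchmark.py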
Note: This will: - Make ~1000+ API calls to embedding providers - Take 30-60 minutes depending on network and API rate limits - Cost approximately $0.10-0.20 in API fees
Targeted Benchmarks
Test Specific Backend
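# Benchmark a single backend (for example AstraDB) with both embedding providers
python scripts/benchmark.py --backends astradb --num-docs 100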
Test Specific Embedding
# All backends with only OpenAI
python scripts/benchmark.py --embedding-providers openai --num-docs 100
Compare Two Backends
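# Head-to-head comparison, for example PgVector vs Milvus
python scripts/benchmark.py --backends pgvector milvus --num-docs 100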
Understanding Results
Output Format
Results are saved as a markdown file (default: benchmark.md) with:
- Performance Summary Table - Quick comparison across all combinations
- Detailed Results - Individual metrics for each backend+embedding pair
- Notes - Configuration and methodology details
Example Output
## Performance Summary
| Backend | Embedding | Model | Dim | Upsert | Search (avg) | Update (avg) | Delete (batch) | Status |
|---------|-----------|-------|-----|--------|--------------|--------------|----------------|--------|
| pgvector | openai | text-embedding-3-small | 1536 | 7.06s | 21.26ms | 6.21ms | 22.63ms | OK |
| astradb | openai | text-embedding-3-small | 1536 | 18.89s | 23.86s | 1.11s | 15.15s | OK |
| milvus | openai | text-embedding-3-small | 1536 | 7.94s | 654.43ms | 569.52ms | 2.17s | OK |
| chroma | openai | text-embedding-3-small | 1536 | 17.08s | 654.76ms | 1.23s | 4.73s | OK |
| pgvector | gemini | models/gemini-embedding-001 | 1536 | 6.65s | 18.72ms | 6.40ms | 20.25ms | OK |
| astradb | gemini | models/gemini-embedding-001 | 1536 | 11.25s | 6.71s | 903.37ms | 15.05s | OK |
| milvus | gemini | models/gemini-embedding-001 | 1536 | 6.14s | 571.90ms | 561.38ms | 1.91s | OK |
| chroma | gemini | models/gemini-embedding-001 | 1536 | 18.93s | 417.28ms | 1.24s | 4.63s | OK |
Interpreting Metrics
Bulk Create: - Lower duration = better - Higher throughput (docs/sec) = better - Gemini typically slower due to API rate limits
Search: - Lower average duration = better - Milvus typically fastest for vector search - Gemini often faster than OpenAI for search (especially when configured with smaller vectors)
Updates & Deletes: - Lower duration = better - PgVector typically fast for updates due to SQL efficiency
Query DSL Operators: - Should test 10/10 operators successfully - Duration typically <1ms for metadata operations
Performance Tips
For Better Results
- Stable Network: Run benchmarks on a stable network connection
- Isolated Environment: Avoid running other heavy processes
- Warm-up: The first run may be slower due to cold starts
- Multiple Runs: Run 2-3 times and use median values for important decisions
API Rate Limits
Be aware of rate limits: - OpenAI: 3,500 requests/min (Tier 2) - Gemini: 1,500 requests/min (free tier)
For large benchmarks (--num-docs 1000+), the tool will automatically pace requests.
Avoiding quota errors
- If you see Gemini RESOURCE_EXHAUSTED errors, rerun with --embedding-providers openai or reduce --num-docs.
- To avoid embedding API calls entirely, provide fixtures with vectors (e.g., --use-fixtures scripts/benchmark/data/openai_3.json --add-vectors) or let the tool generate static vectors when configured.
- Keep long runs to a single backend to reduce concurrent calls (e.g., --backends pgvector); see the example below.
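For example, a long run that avoids live embedding calls and touches only one backend could look like this:
# Reuse pre-generated fixtures with vectors and restrict the run to PgVector
python scripts/benchmark.py --use-fixtures scripts/benchmark/data/openai_3.json --add-vectors --backends pgvector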
Comparing Before/After Changes
When optimizing performance:
# Before changes
python scripts/benchmark.py --num-docs 100 --output benchmark_before.md
# Make your changes to code
# After changes
python scripts/benchmark.py --num-docs 100 --output benchmark_after.md
# Compare the two markdown files
diff benchmark_before.md benchmark_after.md
Or use a markdown diff tool for better visualization.
Troubleshooting
Backend Not Available
If a backend is reported as not available, its required environment variables are missing.
Solution: Set the environment variables listed under Prerequisites, or the backend will be skipped.
Embedding API Errors
If you see rate limit errors:
Solutions:
- Reduce --num-docs
- Wait and retry
- Check API quota/billing
Slow Performance
If benchmarks are unexpectedly slow:
- Check network latency to database
- Verify database is not under load
- Check API rate limits aren't being hit
- Try reducing --num-docs for faster iterations
Advanced Usage
Custom Test Data
Modify scripts/benchmark.py to use custom test data:
# In generate_documents() function
SAMPLE_TEXTS = [
"Your custom text 1",
"Your custom text 2",
# ...
]
Adding Custom Metrics
Extend benchmark_backend() method to add custom metrics:
# In BenchmarkRunner.benchmark_backend()
# After existing benchmarks, add:
# Custom metric (engine.custom_operation() is a placeholder for whatever call you want to time)
print("\nCustom Metric...")
duration, result = benchmark_operation("custom", lambda: engine.custom_operation())
results["custom_metric"] = {"duration": duration}
Cost Estimation
Approximate costs for running benchmarks:
| Documents | OpenAI Cost | Gemini Cost | Total Time |
|---|---|---|---|
| 10 | $0.001 | Free | 2-5 min |
| 100 | $0.01 | Free | 10-15 min |
| 1000 | $0.10 | Free | 30-60 min |
Note: Costs are approximate and depend on: - Embedding model used - Document text length - Current API pricing
For Gemini, the free tier typically covers benchmarking needs.
Best Practices
- Start Small: Always test with --num-docs 10 first
- Document Results: Save benchmark outputs with timestamps (see the example after this list)
- Consistent Environment: Run on same machine/network for comparisons
- Version Control: Commit benchmark results with code changes
- CI/CD Integration: Consider running small benchmarks in CI for regression testing
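As a sketch of the last two points, a small timestamped run is usually enough for regression checks in CI (the filename pattern is only a suggestion):
# Quick, CI-friendly benchmark with a timestamped report
python scripts/benchmark.py --num-docs 10 --skip-slow --output "benchmark_$(date +%Y%m%d).md"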
Examples
Example 1: Quick Backend Comparison
# Compare PgVector and Milvus with 50 docs
python scripts/benchmark.py --backends pgvector milvus --num-docs 50
Example 2: Embedding Provider Comparison
# Test which embedding is faster for your use case
python scripts/benchmark.py --backends pgvector --num-docs 200
Example 3: Pre-Release Validation
# Full benchmark before major release
python scripts/benchmark.py --num-docs 1000 --output release_v1.0_benchmark.md
Example 4: Query Performance Focus
# Test with more documents to stress search performance
python scripts/benchmark.py --backends milvus --num-docs 5000
Contributing
Found a performance issue or want to add a new benchmark metric? See Contributing Guide.