Query DSL Guide
Comprehensive guide to CrossVector's Query DSL for building filters.
Overview
CrossVector provides two ways to build filters:
- Q Objects - Composable, Pythonic query builders
- Universal Filters - Dict-based filter format
Both compile to backend-specific filter syntax automatically.
Q Objects
Basic Usage
from crossvector.querydsl.q import Q
# Simple equality
Q(category="tech")
# Comparison operators
Q(score__gte=0.8)
Q(price__lt=100)
# IN operator
Q(status__in=["active", "pending"])
# NOT IN operator
Q(role__nin=["guest", "banned"])
Combining Filters
# AND (using &)
Q(category="tech") & Q(level="beginner")
# OR (using |)
Q(featured=True) | Q(score__gte=0.9)
# NOT (using ~)
~Q(archived=True)
# Complex combinations
(Q(category="tech") & Q(level="beginner")) | Q(featured=True)
Nested Metadata
Access nested fields using double underscore (__):
# Nested object
Q(user__role="admin")
Q(author__verified=True)
# Deep nesting
Q(post__stats__views__gte=1000)
Q(config__settings__enabled=True)
Backend Support:
| Backend | Nested Metadata |
|---|---|
| AstraDB | Full support |
| PgVector | Full support |
| Milvus | Full support |
| ChromaDB | Via dot notation |
Supported Operators
CrossVector supports 8 universal operators that work across all backends:
Field Operators
These 8 operators work on field values and are compiled to backend-specific syntax:
| Operator | Usage | Example |
|---|---|---|
eq |
Equal | Q(status="active") or Q(status__eq="active") |
ne |
Not equal | Q(status__ne="deleted") |
gt |
Greater than | Q(score__gt=0.8) |
gte |
Greater than or equal | Q(score__gte=0.8) |
lt |
Less than | Q(price__lt=100) |
lte |
Less than or equal | Q(stock__lte=10) |
in |
In array | Q(status__in=["active", "pending"]) |
nin |
Not in array | Q(status__nin=["deleted", "banned"]) |
Boolean Operators
These are used to combine Q objects (not field operators):
| Operator | Symbol | Example |
|---|---|---|
| AND | & |
Q(category="tech") & Q(level="beginner") |
| OR | \| |
Q(featured=True) \| Q(score__gte=0.9) |
| NOT | ~ |
~Q(archived=True) |
Filter format: {"status": {"$ne": "deleted"}}
Comparison Operators
gt - Greater Than
Filter format: {"score": {"$gt": 0.5}}
Note: Requires numeric values. PgVector automatically casts text to numeric.
gte - Greater Than or Equal
Filter format: {"score": {"$gte": 0.8}}
lt - Less Than
Filter format: {"price": {"$lt": 100}}
lte - Less Than or Equal
Filter format: {"stock": {"$lte": 10}}
in - In Array
Q(status__in=["active", "pending", "review"])
Q(category__in=["tech", "science"])
Q(priority__in=[1, 2, 3])
Filter format: {"status": {"$in": ["active", "pending", "review"]}}
nin - Not In Array
Filter format: {"status": {"$nin": ["deleted", "banned"]}}
Universal Filter Format
Alternative dict-based filter syntax.
Basic Filters
# Equality
where = {"status": {"$eq": "active"}}
# Comparison
where = {"score": {"$gt": 0.8}}
# IN
where = {"role": {"$in": ["admin", "moderator"]}}
Multiple Conditions
Multiple conditions in same dict are combined with AND:
where = {
"category": {"$eq": "tech"},
"level": {"$eq": "beginner"},
"score": {"$gte": 0.5}
}
# Equivalent to: category="tech" AND level="beginner" AND score>=0.5
Nested Fields
Use dot notation for nested metadata:
Complete Examples
Search with Filters
from crossvector import VectorEngine
from crossvector.querydsl.q import Q
engine = VectorEngine(db=..., embedding=...)
# Q object filters
results = engine.search(
"python tutorials",
where=Q(category="tech") & Q(level="beginner"),
limit=10
)
# Universal filter
results = engine.search(
"machine learning",
where={
"category": {"$eq": "ai"},
"difficulty": {"$lte": 3}
},
limit=20
)
# Complex Q filters
results = engine.search(
"database design",
where=(Q(featured=True) | Q(score__gte=0.9)) & ~Q(archived=True),
limit=15
)
Metadata-Only Search
Some backends support filtering without vector search:
# AstraDB, PgVector, ChromaDB, Milvus support metadata-only search
results = engine.search(
query=None, # No vector search
where=Q(status="published") & Q(category="tech"),
limit=50
)
# Always check backend support
if engine.supports_metadata_only:
results = engine.search(query=None, where={"status": {"$eq": "active"}})
Get Document with Filters
# Get by metadata (must return exactly one)
doc = engine.get(status="draft", author="user123")
# Using Q
doc = engine.get(Q(slug="my-post") & Q(published=False))
# Universal filter
doc = engine.get(**{"status": {"$eq": "draft"}, "author": {"$eq": "user123"}})
Operator Examples by Type
String Values
# Equality
Q(category="tech")
Q(author__name="John Doe")
# NOT equality
Q(status__ne="deleted")
# IN
Q(language__in=["python", "javascript", "rust"])
# NOT IN
Q(category__nin=["spam", "nsfw"])
Numeric Values
# Comparison
Q(score__gt=0.5)
Q(price__gte=100)
Q(age__lt=65)
Q(stock__lte=10)
# Equality
Q(count=42)
Q(version__eq=3)
# IN
Q(priority__in=[1, 2, 3])
Boolean Values
Nested Objects
# Single level
Q(user__role="admin")
Q(author__verified=True)
# Multiple levels
Q(post__meta__featured=True)
Q(config__db__host="localhost")
# With operators
Q(user__stats__posts__gte=10)
Q(author__rating__gt=4.5)
Array Fields
# Check if array contains value
Q(tags__in=["python"]) # Has "python" tag
# Check if field is one of values
Q(status__in=["active", "pending"])
Backend-Specific Compilation
Different backends compile filters differently:
AstraDB
Pass-through universal operators directly:
Q(score__gte=0.8)
# Compiles to: {"score": {"$gte": 0.8}}
Q(user__role="admin")
# Compiles to: {"user.role": {"$eq": "admin"}}
ChromaDB
Flattens nested metadata with dot notation:
Q(user__role="admin")
# Compiles to: {"user.role": {"$eq": "admin"}}
Q(score__gte=0.8) & Q(category="tech")
# Compiles to: {"$and": [{"score": {"$gte": 0.8}}, {"category": {"$eq": "tech"}}]}
Milvus
Boolean expression syntax:
Q(category="tech")
# Compiles to: 'category == "tech"'
Q(score__gt=0.8) & Q(price__lt=100)
# Compiles to: '(score > 0.8) and (price < 100)'
Q(status__in=["active", "pending"])
# Compiles to: 'status in ["active", "pending"]'
PgVector
PostgreSQL WHERE clause with JSONB operators:
Q(category="tech")
# Compiles to: "metadata->>'category' = 'tech'"
Q(score__gt=0.8)
# Compiles to: "(metadata->>'score')::numeric > 0.8"
Q(user__role="admin")
# Compiles to: "metadata #>> '{user,role}' = 'admin'"
Advanced Usage
Dynamic Filter Building
def build_filter(category=None, min_score=None, featured=None):
filters = []
if category:
filters.append(Q(category=category))
if min_score is not None:
filters.append(Q(score__gte=min_score))
if featured is not None:
filters.append(Q(featured=featured))
# Combine with AND
if filters:
result = filters[0]
for f in filters[1:]:
result = result & f
return result
return None
# Use in search
where = build_filter(category="tech", min_score=0.8, featured=True)
results = engine.search("query", where=where)
Complex Queries
# Featured OR high score, but not archived
where = (Q(featured=True) | Q(score__gte=0.9)) & ~Q(archived=True)
# Tech category with beginner or intermediate level
where = Q(category="tech") & Q(level__in=["beginner", "intermediate"])
# Published articles by verified authors
where = (
Q(type="article") &
Q(status="published") &
Q(author__verified=True) &
Q(author__rating__gte=4.0)
)
results = engine.search("query", where=where)
Conditional Filters
# Build filter based on user role
user_role = "admin"
if user_role == "admin":
where = None # See all documents
elif user_role == "moderator":
where = ~Q(status="draft") # See all except drafts
else:
where = Q(status="published") # Only published
results = engine.search("query", where=where)
Error Handling
Unsupported Operators
from crossvector.exceptions import InvalidFieldError
try:
# Regex not supported
results = engine.search("query", where={"text": {"$regex": "pattern"}})
except InvalidFieldError as e:
print(f"Error: {e.message}")
print(f"Operator: {e.details['operator']}")
Type Mismatches
# Correct: numeric comparison with number
Q(score__gt=0.8)
# Incorrect: numeric comparison with string (backend-dependent)
Q(score__gt="0.8") # May fail on some backends
Best Practice: Use correct types for comparisons:
# Numbers
Q(score__gte=0.8, price__lt=100, count=42)
# Strings
Q(category="tech", status="active")
# Booleans
Q(featured=True, archived=False)
Performance Tips
Index-Friendly Queries
# Good: Simple equality on indexed field
Q(category="tech")
# Good: Range on indexed numeric field
Q(created_at__gte=timestamp)
# Slower: Complex nested queries
Q(user__profile__settings__theme="dark")
Limit Result Sets
Pagination
# Page 1
results = engine.search("query", limit=20, offset=0)
# Page 2
results = engine.search("query", limit=20, offset=20)
# Page 3
results = engine.search("query", limit=20, offset=40)
Testing Queries
Check Backend Support
engine = VectorEngine(db=..., embedding=...)
# Check capabilities
print(f"Metadata-only: {engine.supports_metadata_only}")
print(f"Backend: {engine.db.__class__.__name__}")
# Test query
where = Q(category="tech") & Q(score__gte=0.8)
print(f"Filter: {where.to_dict()}")
Debug Compiled Filters
from crossvector.querydsl.q import Q
# Build query
q = Q(category="tech") & Q(level__in=["beginner", "intermediate"])
# See universal format
print(q.to_dict())
# {'$and': [{'category': {'$eq': 'tech'}}, {'level': {'$in': ['beginner', 'intermediate']}}]}
# Compile for specific backend
from crossvector.querydsl.compilers.pgvector import PgVectorWhereCompiler
compiler = PgVectorWhereCompiler()
compiled = compiler.compile(q.to_dict())
print(compiled)
# "metadata->>'category' = 'tech' AND metadata->>'level' IN ('beginner', 'intermediate')"
Migration Guide
From Dict Filters
Before:
After (Q objects):
From Raw Backend Queries
Before (PgVector):
After (Universal):
Next Steps
- API Reference - Complete API documentation
- Schema Reference - Data models
- Database Adapters - Backend-specific features
- Examples - Practical examples