Filtering Strategies¶
Filtering is essential for creating efficient optimization models. LumiX provides multiple levels of filtering to control which variables and constraints are created.
Why Filter?¶
Benefits:
Reduce model size (fewer variables/constraints)
Improve solve time
Create sparse models (only valid combinations)
Express business rules naturally
Without filtering:
# Creates variables for ALL combinations, including invalid ones
# 100 drivers × 365 days = 36,500 variables
assignment = (
    LXVariable[Tuple[Driver, Date], int]("assignment")
    .indexed_by_product(
        LXIndexDimension(Driver, lambda d: d.id).from_data(drivers),
        LXIndexDimension(Date, lambda dt: dt.date).from_data(dates)
    )
)
With filtering:
# Creates variables only for valid combinations
# ~20,000 variables after filtering
assignment = (
    LXVariable[Tuple[Driver, Date], int]("assignment")
    .indexed_by_product(
        LXIndexDimension(Driver, lambda d: d.id)
        .where(lambda d: d.is_active)         # Per-dimension filter
        .from_data(drivers),
        LXIndexDimension(Date, lambda dt: dt.date)
        .where(lambda dt: dt.date >= today)   # Per-dimension filter
        .from_data(dates)
    )
    .where_multi(lambda d, dt:                # Cross-dimension filter
        dt.weekday() not in d.days_off
    )
)
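To see where figures like these come from, the following plain-Python sketch counts index combinations before and after filtering. It does not use the LumiX API: the `Driver` dataclass, the synthetic data (every fifth driver inactive, weekends off), and the use of plain `datetime.date` objects instead of a `Date` model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from itertools import product


@dataclass(frozen=True)
class Driver:
    # Hypothetical stand-in for the Driver model used above
    id: int
    is_active: bool
    days_off: frozenset = field(default_factory=frozenset)  # weekday numbers, Monday = 0


# Synthetic data: 100 drivers, every fifth one inactive, weekends off
drivers = [
    Driver(id=i, is_active=(i % 5 != 0), days_off=frozenset({5, 6}))
    for i in range(100)
]
start = date(2025, 1, 1)
dates = [start + timedelta(days=n) for n in range(365)]

# Size of the unfiltered Cartesian product
total = len(drivers) * len(dates)

# Per-dimension filter on drivers, then cross-dimension filter on pairs
active = [d for d in drivers if d.is_active]
valid_pairs = [
    (d, dt)
    for d, dt in product(active, dates)
    if dt.weekday() not in d.days_off
]

print(f"All combinations:   {total}")
print(f"Active drivers:     {len(active)}")
print(f"Valid combinations: {len(valid_pairs)}")
```

With this synthetic data the product shrinks from 36,500 combinations to 20,880 (80 active drivers × 261 weekdays in 2025). Real data will give different numbers.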
Filtering Levels¶
Level 1: Data Filtering¶
Filter data before creating dimensions:
# Filter in Python before LumiX
active_drivers = [d for d in all_drivers if d.is_active]
dim = LXIndexDimension(Driver, lambda d: d.id).from_data(active_drivers)
Level 2: Dimension Filtering¶
Filter within a dimension using .where():
dim = (
    LXIndexDimension(Driver, lambda d: d.id)
    .from_data(all_drivers)
    .where(lambda d: d.is_active and d.years_experience >= 2)
)
When to use: Simple conditions on a single dimension
Level 3: Cross-Dimension Filtering¶
Filter combinations using .where_multi():
assignment = (
    LXVariable[Tuple[Driver, Date], int]("assignment")
    .indexed_by_product(driver_dim, date_dim)
    .where_multi(lambda driver, date:
        date.weekday() not in driver.days_off and
        driver.can_work_on(date)
    )
)
When to use: Relationships between dimensions
Level 4: Variable Filtering¶
Filter on single-dimension variables using .where():
production = (
    LXVariable[Product, float]("production")
    .continuous()
    .where(lambda p: p.is_active and p.stock > 0)
    .from_data(products)
)
When to use: Simple single-model filtering
Filtering Strategies¶
Strategy 1: Filter Early¶
Apply the most restrictive filters first:
# Good: Filter at dimension level (early)
dim = (
    LXIndexDimension(Driver, lambda d: d.id)
    .where(lambda d: d.is_active)  # Reduces from 100 to 80 drivers
    .from_data(drivers)
)
# Then cross-dimension filter (fragment, chained on the variable definition)
.where_multi(lambda d, dt: ...)  # Operates on 80 drivers, not 100

# Bad: Only cross-dimension filter (late)
.where_multi(lambda d, dt: d.is_active and ...)  # Checks all 100 drivers
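The difference between early and late filtering can be made concrete by counting predicate calls. The sketch below is plain Python and does not call LumiX. It assumes a cross-dimension predicate runs once per candidate combination, and it uses hypothetical data (100 drivers, 80 of them active, days numbered 0 to 364) with a placeholder weekend rule.

```python
from itertools import product

# Hypothetical sizes matching the comments above: 100 drivers, 80 of them active
drivers = [{"id": i, "is_active": i < 80} for i in range(100)]
days = list(range(365))

calls = {"early": 0, "late": 0}


def cross_ok_early(driver, day):
    """Cross filter used after a dimension filter has removed inactive drivers."""
    calls["early"] += 1
    return day % 7 not in (5, 6)  # placeholder relationship rule


def cross_ok_late(driver, day):
    """Cross filter that also has to carry the per-driver condition."""
    calls["late"] += 1
    return driver["is_active"] and day % 7 not in (5, 6)


# Early: dimension filter first, then cross filter over the smaller product
active = [d for d in drivers if d["is_active"]]
early = [(d["id"], day) for d, day in product(active, days) if cross_ok_early(d, day)]

# Late: a single cross filter over the full product
late = [(d["id"], day) for d, day in product(drivers, days) if cross_ok_late(d, day)]

assert early == late  # same surviving combinations either way
print(calls)          # number of cross-filter calls per approach
```

Both approaches keep the same combinations. The early variant simply evaluates the cross filter on 80 × 365 pairs instead of 100 × 365.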
Strategy 2: Separate Static vs Dynamic¶
# Static filters (data-based) at dimension level
driver_dim = (
    LXIndexDimension(Driver, lambda d: d.id)
    .where(lambda d: d.is_active and d.certification_valid)
    .from_data(drivers)
)

# Dynamic filters (relationship-based) at cross-dimension level
assignment = (
    LXVariable[Tuple[Driver, Date], int]("assignment")
    .indexed_by_product(driver_dim, date_dim)
    .where_multi(lambda d, dt: dt.weekday() not in d.days_off)
)
Strategy 3: Combine Simple Conditions¶
# Good: Single where() with combined conditions
dim = (
    LXIndexDimension(Product, lambda p: p.sku)
    .where(lambda p:
        p.in_stock and
        p.price > 0 and
        not p.discontinued and
        p.category in ["A", "B", "C"]
    )
)

# Avoid: Multiple where() calls (last one wins)
dim = (
    LXIndexDimension(Product, lambda p: p.sku)
    .where(lambda p: p.in_stock)   # This is lost
    .where(lambda p: p.price > 0)  # Only this applies
)
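When the conditions already exist as separate named functions, one way to keep a single `.where()` call is to merge them into one predicate first. The helper below is a plain-Python sketch. `all_of` is a hypothetical name, not part of LumiX, and the dictionary-based products are stand-ins for real model objects.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def all_of(*predicates: Callable[[T], bool]) -> Callable[[T], bool]:
    """Return a predicate that is true only when every given predicate is true."""
    def combined(item: T) -> bool:
        # all() stops at the first predicate that returns False
        return all(pred(item) for pred in predicates)
    return combined


# Hypothetical single-purpose rules
def in_stock(p):
    return p["in_stock"]


def has_price(p):
    return p["price"] > 0


is_sellable = all_of(in_stock, has_price)

products = [
    {"sku": "A1", "in_stock": True, "price": 9.5},
    {"sku": "B2", "in_stock": True, "price": 0},
    {"sku": "C3", "in_stock": False, "price": 4.0},
]
kept = [p["sku"] for p in products if is_sellable(p)]
print(kept)
```

The combined function `is_sellable` could then be passed to one `.where()` call, so no earlier condition is overwritten.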
Common Filtering Patterns¶
Time-Based Filtering¶
from datetime import date, timedelta

today = date.today()
next_month = today + timedelta(days=30)

date_dim = (
    LXIndexDimension(Date, lambda dt: dt.date)
    .where(lambda dt: today <= dt.date <= next_month)
    .from_data(all_dates)
)
Availability Filtering¶
assignment = (
    LXVariable[Tuple[Worker, Task], int]("assignment")
    .indexed_by_product(worker_dim, task_dim)
    .where_multi(lambda w, t:
        # Worker has required skills
        all(skill in w.skills for skill in t.required_skills) and
        # Worker is available during task period
        w.available_from <= t.start_date <= w.available_until and
        # Worker not already assigned to conflicting task
        not w.has_conflict(t)
    )
)
Capacity Filtering¶
shipment = (
    LXVariable[Tuple[Warehouse, Customer], float]("shipment")
    .indexed_by_product(warehouse_dim, customer_dim)
    .where_multi(lambda w, c:
        # Warehouse has sufficient capacity
        w.remaining_capacity >= c.order_size and
        # Warehouse can serve customer region
        c.region in w.service_regions and
        # Distance is acceptable
        calculate_distance(w, c) <= MAX_DISTANCE
    )
)
Business Rule Filtering¶
route = (
    LXVariable[Tuple[Origin, Destination], float]("route")
    .indexed_by_product(origin_dim, destination_dim)
    .where_multi(lambda o, d:
        # No self-loops
        o.id != d.id and
        # Route must be operational
        is_route_operational(o, d) and
        # Comply with regulations
        meets_regulations(o, d) and
        # Within service network
        are_connected(o, d)
    )
)
Performance Optimization¶
Measure Filter Impact¶
# Check how many variables are created
print(f"Potential combinations: {len(drivers) * len(dates)}")

# With dimension filters only (keep the items, not just the counts,
# so they can be iterated again below)
filtered_drivers = [d for d in drivers if dim_filter(d)]
filtered_dates = [dt for dt in dates if date_filter(dt)]
print(f"After dimension filters: {len(filtered_drivers) * len(filtered_dates)}")

# After cross-dimension filters
actual_count = sum(
    1 for d in filtered_drivers
    for dt in filtered_dates
    if cross_filter(d, dt)
)
print(f"After cross-dimension filters: {actual_count}")
Optimize Filter Order¶
Place the most restrictive conditions first so that short-circuit evaluation skips the remaining checks as often as possible:
# Good: Most restrictive first
.where_multi(lambda d, dt:
    d.is_certified and       # Filters out 50% (check first)
    dt.is_weekday and        # Filters out 28% of remaining
    d.can_work_overtime      # Filters out 10% of remaining
)

# Less optimal: Least restrictive first
.where_multi(lambda d, dt:
    d.can_work_overtime and  # Only filters 10%
    dt.is_weekday and        # Then filters 28%
    d.is_certified           # Then filters 50%
)
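The effect of condition order comes from Python's short-circuit `and`: once one condition is false, the rest are not evaluated. The following plain-Python sketch counts how often each check runs under the two orderings. The data (10 drivers, half certified; 14 days, 10 of them weekdays) and the `counted` helper are hypothetical, and nothing here calls LumiX.

```python
from collections import Counter
from itertools import product

counts = Counter()


def counted(name, value):
    """Record that the check `name` ran, then return its result unchanged."""
    counts[name] += 1
    return value


# Hypothetical data: 10 drivers (half certified), 14 days (10 weekdays)
drivers = [{"certified": i % 2 == 0} for i in range(10)]
days = [{"is_weekday": n % 7 < 5} for n in range(14)]


def restrictive_first(d, dt):
    # The 50% check runs first, so the weekday check is skipped for half the pairs
    return (counted("A:certified", d["certified"])
            and counted("A:weekday", dt["is_weekday"]))


def restrictive_last(d, dt):
    # The weekday check passes more often, so the second check runs more often
    return (counted("B:weekday", dt["is_weekday"])
            and counted("B:certified", d["certified"]))


pairs = list(product(drivers, days))
kept_a = [p for p in pairs if restrictive_first(*p)]
kept_b = [p for p in pairs if restrictive_last(*p)]

assert kept_a == kept_b  # ordering changes the work done, not the result
for name in sorted(counts):
    print(name, counts[name])
```

In this sketch the restrictive-first ordering performs 210 checks in total and the other ordering 240. When checks differ a lot in cost, putting the cheap ones first can matter more than putting the most restrictive ones first.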
Cache Complex Computations¶
# Bad: Expensive computation in filter
.where_multi(lambda w, c:
    calculate_distance(w.location, c.location) <= MAX_DISTANCE
)

# Good: Pre-compute distances
distances = {
    (w.id, c.id): calculate_distance(w.location, c.location)
    for w in warehouses
    for c in customers
}

.where_multi(lambda w, c:
    distances.get((w.id, c.id), float('inf')) <= MAX_DISTANCE
)
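An alternative to building the full distance table up front is to memoize the distance function, so each pair is computed the first time it is needed and reused afterwards. The sketch below swaps in the standard library's `functools.lru_cache` for the dictionary shown above. The coordinates, the Euclidean distance, and the `MAX_DISTANCE` value are hypothetical, and the filter is applied in plain Python instead of through LumiX.

```python
import math
from functools import lru_cache

MAX_DISTANCE = 5.0  # hypothetical threshold

# Hypothetical coordinates keyed by id
warehouse_xy = {"W1": (0.0, 0.0), "W2": (10.0, 0.0)}
customer_xy = {"C1": (3.0, 4.0), "C2": (9.0, 1.0)}


@lru_cache(maxsize=None)
def distance(warehouse_id: str, customer_id: str) -> float:
    """Euclidean distance, computed at most once per (warehouse, customer) pair."""
    return math.dist(warehouse_xy[warehouse_id], customer_xy[customer_id])


def within_range(warehouse_id: str, customer_id: str) -> bool:
    return distance(warehouse_id, customer_id) <= MAX_DISTANCE


pairs = [(w, c) for w in warehouse_xy for c in customer_xy]
first_pass = [p for p in pairs if within_range(*p)]
second_pass = [p for p in pairs if within_range(*p)]  # answered from the cache

print(first_pass)
print(distance.cache_info())
```

The cached function takes ids (hashable strings) as arguments because `lru_cache` requires hashable arguments. Caching pays off only when the same pair is looked up more than once, for example when several filters or constraints reuse the distances.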
Debugging Filters¶
Count Filtered Items¶
# Before filtering
print(f"Total drivers: {len(drivers)}")

# Create dimension with filter
dim = (
    LXIndexDimension(Driver, lambda d: d.id)
    .where(lambda d: d.is_active and d.certified)
    .from_data(drivers)
)

# Check filtered count
filtered = dim.get_instances()
print(f"Filtered drivers: {len(filtered)}")

# Inspect filtered items
print("Active certified drivers:")
for driver in filtered:
    print(f"  - {driver.name}")
Test Filters Separately¶
# Test dimension filter
dimension_filter = lambda d: d.is_active and d.years_experience >= 5
passed_dim = [d for d in drivers if dimension_filter(d)]
print(f"Passed dimension filter: {len(passed_dim)}")

# Test cross-dimension filter
cross_filter = lambda d, dt: dt.weekday() not in d.days_off
passed_cross = [
    (d, dt) for d in passed_dim for dt in dates
    if cross_filter(d, dt)
]
print(f"Passed cross filter: {len(passed_cross)}")
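Because filters are ordinary Python callables, they can also be checked with small assertion-based tests before they are attached to a model. The sketch below assumes a hypothetical `Driver` dataclass and plain `datetime.date` values as stand-ins for the real models. The two filters mirror the ones above, written as named functions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Driver:
    # Hypothetical stand-in for the real Driver model
    name: str
    is_active: bool
    years_experience: int
    days_off: set = field(default_factory=set)  # weekday numbers, Monday = 0


def dimension_filter(d: Driver) -> bool:
    return d.is_active and d.years_experience >= 5


def cross_filter(d: Driver, dt: date) -> bool:
    return dt.weekday() not in d.days_off


def test_dimension_filter_requires_active_and_experienced():
    assert dimension_filter(Driver("Ana", True, 5))
    assert not dimension_filter(Driver("Ben", True, 4))    # too little experience
    assert not dimension_filter(Driver("Cy", False, 10))   # inactive


def test_cross_filter_respects_days_off():
    driver = Driver("Ana", True, 6, days_off={5, 6})       # weekends off
    assert cross_filter(driver, date(2025, 1, 6))          # a Monday
    assert not cross_filter(driver, date(2025, 1, 4))      # a Saturday


# Run directly, or let a test runner such as pytest collect the test_ functions
test_dimension_filter_requires_active_and_experienced()
test_cross_filter_respects_days_off()
print("all filter tests passed")
```

Edge cases such as the experience boundary (exactly 5 years) are cheap to pin down this way and are hard to spot from filtered counts alone.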
Best Practices¶
Filter at the right level
# Data-based filters: Dimension level
.where(lambda d: d.is_active)

# Relationship-based filters: Cross-dimension level
.where_multi(lambda d, dt: dt not in d.blackout_dates)
Combine conditions efficiently
# Good: Short-circuit evaluation
.where(lambda p: p.in_stock and p.price > 0 and expensive_check(p))

# Bad: Always evaluates expensive_check
.where(lambda p: expensive_check(p) and p.in_stock)
Document complex filters
def is_valid_assignment(driver: Driver, date: Date) -> bool:
    """Check if driver can be assigned to date.

    Rules:
    - Driver must be active and certified
    - Date must not be in driver's blackout dates
    - Driver must not exceed monthly hours
    """
    return (
        driver.is_active and
        driver.is_certified and
        date not in driver.blackout_dates and
        driver.remaining_hours_this_month >= 8
    )

assignment = (
    LXVariable[Tuple[Driver, Date], int]("assignment")
    .indexed_by_product(driver_dim, date_dim)
    .where_multi(is_valid_assignment)
)
Next Steps¶
Index Dimensions - Learn more about index dimensions
Multi-Model Indexing - Apply filtering to multi-model problems
Variables Guide - Variable filtering details
Examples - See filtering in real examples