Indexing Architecture

Deep dive into the indexing module’s architecture and design patterns.

Design Philosophy

The indexing module follows these key principles:

  1. Late Binding: Dimensions store metadata; expansion happens during solving

  2. Type Safety: Full generic type support with Generic[TModel]

  3. Composability: Dimensions combine via cartesian products

  4. Lazy Evaluation: Data retrieved only when needed

Architecture Overview

classDiagram
    class LXIndexDimension {
        +model_type: Type[TModel]
        +key_func: Callable
        +filter_func: Optional[Callable]
        +_data: Optional[List]
        +_session: Optional[Any]
        +from_data(data)
        +from_model(session)
        +get_instances()
        +where(predicate)
    }

    class LXCartesianProduct {
        +dimensions: List[LXIndexDimension]
        +_cross_filter: Optional[Callable]
        +add_dimension(dim)
        +where(predicate)
    }

    class LXVariable {
        +_cartesian: Optional[LXCartesianProduct]
        +indexed_by_product(dim1, dim2)
        +where_multi(predicate)
        +get_instances()
    }

    LXCartesianProduct --> LXIndexDimension
    LXVariable --> LXCartesianProduct
    

Component Details

LXIndexDimension

Purpose: Represents a single dimension in multi-dimensional indexing.

Key Features:

  • Generic type parameter TModel

  • Flexible data sources (direct data or ORM)

  • Per-dimension filtering

  • Lazy data retrieval

Implementation:

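from dataclasses import dataclass
from typing import Any, Callable, Generic, List, Optional, Type, TypeVar

TModel = TypeVar("TModel")

@dataclass
class LXIndexDimension(Generic[TModel]):
    model_type: Type[TModel]
    key_func: Callable[[TModel], Any]
    filter_func: Optional[Callable[[TModel], bool]] = None
    _data: Optional[List[TModel]] = None
    _session: Optional[Any] = None

    def get_instances(self) -> List[TModel]:
        # 1. Retrieve data from _data, else from _session (SQLAlchemy-style query assumed)
        instances: List[TModel] = []
        if self._data is not None:
            instances = list(self._data)
        elif self._session is not None:
            instances = self._session.query(self.model_type).all()
        # 2. Apply filter_func if present; 3. return the filtered instances
        if self.filter_func is not None:
            instances = [m for m in instances if self.filter_func(m)]
        return instances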

Data Flow:

  1. User creates dimension with model type and key function

  2. User provides data source (.from_data() or .from_model())

  3. Optionally adds filter (.where())

  4. During solving, get_instances() retrieves and filters data
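The four steps above can be sketched with a minimal stand-in class. `MiniDimension`, its method bodies, and the `Driver` dataclass are hypothetical illustrations, not the library's code. The sketch only shows that steps 1 to 3 record configuration and step 4 does the work. The builder methods return `self` so that calls can be chained, as they are in the unit test later on this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class MiniDimension(Generic[T]):
    """Hypothetical stand-in for LXIndexDimension (direct-data path only)."""
    model_type: Type[T]
    key_func: Callable[[T], Any]
    filter_func: Optional[Callable[[T], bool]] = None
    _data: Optional[List[T]] = None

    def from_data(self, data: List[T]) -> "MiniDimension[T]":
        self._data = data  # step 2: store a reference, no copy
        return self

    def where(self, predicate: Callable[[T], bool]) -> "MiniDimension[T]":
        self.filter_func = predicate  # step 3: store the predicate only
        return self

    def get_instances(self) -> List[T]:
        # step 4: retrieve and filter only when asked
        data = self._data or []
        pred = self.filter_func
        if pred is None:
            return list(data)
        return [m for m in data if pred(m)]


@dataclass
class Driver:  # hypothetical model
    id: int
    is_active: bool


drivers = [Driver(1, True), Driver(2, False), Driver(3, True)]
dim = MiniDimension(Driver, lambda d: d.id).from_data(drivers).where(lambda d: d.is_active)
print([d.id for d in dim.get_instances()])  # [1, 3]
```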

LXCartesianProduct

Purpose: Combines multiple dimensions into a multi-dimensional index space.

Key Features:

  • Stores list of dimensions

  • Supports 2D, 3D, N-dimensional products

  • Cross-dimension filtering

  • Lazy combination generation

Implementation:

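from typing import Any, Callable, Generic, List, Optional, TypeVar

TModel1 = TypeVar("TModel1")
TModel2 = TypeVar("TModel2")

class LXCartesianProduct(Generic[TModel1, TModel2]):
    def __init__(self, dim1, dim2):
        # Store dimension references only; nothing is expanded here
        self.dimensions: List[Any] = [dim1, dim2]
        self._cross_filter: Optional[Callable[..., bool]] = None

    def add_dimension(self, dim):
        # Grow to 3D, 4D, ... products; returns self for chaining
        self.dimensions.append(dim)
        return self

    def where(self, predicate):
        # predicate receives one instance per dimension, in dimension order
        self._cross_filter = predicate
        return self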

Expansion Logic (in LXVariable.get_instances()):

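import itertools

def expand(dimensions, cross_filter=None):
    # Standalone helper for illustration; the real logic lives in LXVariable.get_instances()
    # 1. Get instances from each dimension
    dimension_instances = [dim.get_instances() for dim in dimensions]

    # 2. Generate cartesian product (an iterator; combinations are produced lazily)
    combinations = itertools.product(*dimension_instances)

    # 3. Apply cross-filter if present
    if cross_filter:
        return [c for c in combinations if cross_filter(*c)]
    return list(combinations)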
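This self-contained sketch runs the expansion end to end with three dimensions. `StaticDim` and `expand` are hypothetical names used only for this illustration. `expand` follows the three steps above, and the cross-filter is called with one instance per dimension, in the order the dimensions were added.

```python
import itertools
from typing import Any, Callable, List, Optional, Tuple


class StaticDim:
    """Hypothetical dimension whose instances are a fixed list."""

    def __init__(self, items: List[Any]):
        self.items = items

    def get_instances(self) -> List[Any]:
        return list(self.items)


def expand(dimensions: List[StaticDim],
           cross_filter: Optional[Callable[..., bool]] = None) -> List[Tuple[Any, ...]]:
    # Per-dimension instances, then the product, then the optional cross-filter
    pools = [dim.get_instances() for dim in dimensions]
    combos = itertools.product(*pools)
    if cross_filter is None:
        return list(combos)
    return [c for c in combos if cross_filter(*c)]


drivers = StaticDim(["alice", "bob"])
days = StaticDim(["mon", "tue"])
shifts = StaticDim(["am", "pm"])

# 2 x 2 x 2 = 8 combinations before filtering
print(len(expand([drivers, days, shifts])))  # 8

# Drop every combination where bob works the pm shift
no_bob_pm = expand([drivers, days, shifts],
                   lambda d, day, s: not (d == "bob" and s == "pm"))
print(len(no_bob_pm))  # 6
```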

Integration with Core

The indexing module integrates with the core module’s LXVariable:

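import itertools
from typing import Optional

class LXVariable:
    _cartesian: Optional["LXCartesianProduct"] = None
    _data: Optional[list] = None  # single-model data, if any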

    def indexed_by_product(self, dim1, dim2, *extra_dims):
        self._cartesian = LXCartesianProduct(dim1, dim2)
        for dim in extra_dims:
            self._cartesian.add_dimension(dim)
        return self

    def get_instances(self):
        if self._cartesian:
            # Multi-model indexing path
            dimension_instances = [
                dim.get_instances() for dim in self._cartesian.dimensions
            ]
            combinations = itertools.product(*dimension_instances)
            if self._cartesian._cross_filter:
                combinations = [
                    c for c in combinations
                    if self._cartesian._cross_filter(*c)
                ]
            return list(combinations)
        elif self._data:
            # Single-model indexing path
            return self._data
        # ...

Type System

Generics for Type Safety

from typing import Generic, Type, TypeVar

TModel = TypeVar("TModel")

class LXIndexDimension(Generic[TModel]):
    model_type: Type[TModel]
    # ...

Benefits:

  • IDE autocomplete in lambdas

  • mypy type checking

  • Self-documenting code

Usage:

# TModel = Driver
dim = LXIndexDimension(Driver, lambda d: d.id)
#                                     ^ IDE knows 'd' is Driver

# TModel1 = Driver, TModel2 = Date
product = LXCartesianProduct(driver_dim, date_dim)

Tuple Types for Multi-Indexing

from typing import Tuple

# Variable knows it's indexed by (Driver, Date)
duty = (
    LXVariable[Tuple[Driver, Date], int]("duty")
    .indexed_by_product(driver_dim, date_dim)
    # Lambda receives both models with full type information
    .cost_multi(lambda driver, date: driver.daily_rate * date.multiplier)
    #                  ^^^^^^  ^^^^ IDE provides autocomplete
)
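Each instance of a multi-indexed variable is a tuple such as `(driver, date)`. A two-argument callback like the one passed to `.cost_multi()` can therefore be applied by unpacking that tuple. The sketch below assumes this is how per-instance coefficients are produced. The `Driver` and `Date` dataclasses, their fields, and `apply_multi` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Driver:  # hypothetical model
    name: str
    daily_rate: float


@dataclass(frozen=True)
class Date:  # hypothetical model
    label: str
    multiplier: float


def apply_multi(instances: List[Tuple[Driver, Date]],
                func: Callable[[Driver, Date], float]) -> Dict[Tuple[Driver, Date], float]:
    # Unpack each (driver, date) tuple into the callback's two parameters
    return {combo: func(*combo) for combo in instances}


instances = [(Driver("alice", 100.0), Date("mon", 1.0)),
             (Driver("alice", 100.0), Date("sun", 1.5))]
coefficients = apply_multi(instances, lambda driver, date: driver.daily_rate * date.multiplier)
print(sorted(coefficients.values()))  # [100.0, 150.0]
```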

Data Flow

Model Building Phase

sequenceDiagram
    participant User
    participant Dimension
    participant CartesianProduct
    participant Variable

    User->>Dimension: Create with model type & key func
    User->>Dimension: from_data(drivers)
    Note over Dimension: Stores data reference
    User->>Dimension: where(lambda d: d.is_active)
    Note over Dimension: Stores filter predicate

    User->>CartesianProduct: Create(driver_dim, date_dim)
    Note over CartesianProduct: Stores dimension list
    User->>CartesianProduct: where(lambda d, dt: ...)
    Note over CartesianProduct: Stores cross-filter

    User->>Variable: indexed_by_product(...)
    Note over Variable: Stores CartesianProduct reference
    

Key Point: No data expansion happens yet; only metadata is stored.
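A dimension keeps a reference to its data rather than a copy, so changes made between model building and solving are picked up at solve time. The sketch below demonstrates this with a hypothetical `RefDimension` stand-in. The reference semantics are taken from the diagram above ("Stores data reference") and have not been checked against the library's `from_data()`.

```python
from typing import Callable, List, Optional


class RefDimension:
    """Hypothetical stand-in: keeps a data reference and a predicate, nothing else."""

    def __init__(self) -> None:
        self._data: Optional[List[int]] = None
        self.filter_func: Optional[Callable[[int], bool]] = None

    def from_data(self, data: List[int]) -> "RefDimension":
        self._data = data  # a reference, not a copy
        return self

    def where(self, predicate: Callable[[int], bool]) -> "RefDimension":
        self.filter_func = predicate  # stored, not evaluated
        return self

    def get_instances(self) -> List[int]:
        # Read the data and apply the filter only now (solve time)
        data = self._data or []
        pred = self.filter_func
        if pred is None:
            return list(data)
        return [x for x in data if pred(x)]


driver_ids = [1, 2, 3]
dim = RefDimension().from_data(driver_ids).where(lambda i: i != 2)  # model building

driver_ids.append(4)  # the data changes after the model is built

print(dim.get_instances())  # [1, 3, 4]
```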

Solving Phase

sequenceDiagram
    participant Solver
    participant Variable
    participant CartesianProduct
    participant Dimension

    Solver->>Variable: get_instances()
    Variable->>CartesianProduct: Get dimensions
    loop For each dimension
        CartesianProduct->>Dimension: get_instances()
        Dimension->>Dimension: Retrieve data (_data or _session)
        Dimension->>Dimension: Apply filter_func
        Dimension-->>CartesianProduct: Filtered instances
    end
    CartesianProduct->>CartesianProduct: Generate cartesian product
    CartesianProduct->>CartesianProduct: Apply _cross_filter
    CartesianProduct-->>Variable: Filtered combinations
    Variable-->>Solver: Final instance list
    

Key Point: Expansion and filtering happen here, not during model building.
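One way to check the split between the two phases is to log every predicate call. In this sketch, a plain dict stands in for the built model and a hypothetical `solve` function stands in for `get_instances()`. The log stays empty until the solve step runs.

```python
import itertools
from typing import List, Tuple

calls: List[str] = []


def is_active(driver: str) -> bool:
    calls.append("dim")
    return driver != "carol"


def can_work(driver: str, day: str) -> bool:
    calls.append("cross")
    return not (driver == "bob" and day == "sun")


# Model building phase: record data and predicates, evaluate nothing
model = {"drivers": ["alice", "bob", "carol"], "days": ["sat", "sun"],
         "dim_filter": is_active, "cross_filter": can_work}
assert calls == []


def solve(m) -> List[Tuple[str, str]]:
    # Solving phase: dimension filter, then product, then cross-filter
    kept = [d for d in m["drivers"] if m["dim_filter"](d)]
    return [c for c in itertools.product(kept, m["days"]) if m["cross_filter"](*c)]


result = solve(model)
print(result)  # [('alice', 'sat'), ('alice', 'sun'), ('bob', 'sat')]
print(calls.count("dim"), calls.count("cross"))  # 3 4
```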

Performance Considerations

Late Binding Overhead

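Trade-off: Late binding repeats data retrieval, filtering, and expansion at solve time. This costs some work on every solve, but it lets the same model definition be solved against different or updated data.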

Mitigation:

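  • Data is retrieved once per solve

  • Filters are applied efficiently: an instance or combination is dropped as soon as its predicate returns False

  • The cartesian product uses itertools.product, which yields combinations lazily, so only combinations that pass the cross-filter are stored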
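A quick check shows that `itertools.product` returns an iterator, so combinations are produced one at a time rather than built up front. It does read each input iterable fully into a tuple first. The saving therefore applies to the combinations, not to the per-dimension inputs.

```python
import itertools

# 1,000 x 1,000 x 1,000 = 10**9 combinations, but none are built yet
combos = itertools.product(range(1000), range(1000), range(1000))

# Pulling the first few combinations is immediate; the last input varies fastest
first_three = list(itertools.islice(combos, 3))
print(first_three)  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```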

Memory Usage

Dimension Storage:

  • Dimensions store references, not copies

  • Filter predicates are small (lambda closures)

Cartesian Product:

  • Not materialized until needed

  • Can be large: O(n1 × n2 × … × nN) combinations before filtering, where ni is the number of instances in dimension i

  • Filters reduce size before variable creation
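Here is a worked example of the size formula. The unfiltered product size is the product of the dimension sizes, and the cross-filter decides how many combinations survive. The numbers (50 drivers, 30 days, 3 shifts) and the rule "driver i may only work shift i % 3" are made up for illustration.

```python
import itertools
import math

sizes = {"drivers": 50, "days": 30, "shifts": 3}  # illustrative sizes
print(math.prod(sizes.values()))  # 4500 combinations before filtering

# Illustrative cross-filter: driver i may only work shift i % 3
kept = sum(
    1
    for d, day, s in itertools.product(range(50), range(30), range(3))
    if s == d % 3
)
print(kept)  # 1500: exactly one shift per (driver, day) survives
```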

Optimization:

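import itertools

# drivers, dates and cross_filter are assumed to be defined elsewhere

# Bad: materializes every combination before any filtering
all_combos = list(itertools.product(drivers, dates))  # Large list

# Good: filters while generating, so only valid pairs are stored
combos = [
    (d, dt) for d in drivers for dt in dates
    if cross_filter(d, dt)
]  # Smaller list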

Filtering Performance

Filter Order Matters:

# Good: Dimension filters first (reduces product size)
driver_dim = LXIndexDimension(...).where(lambda d: expensive_check(d))
# expensive_check runs once per driver: 100 calls for 100 drivers

# The cartesian product is then built from the survivors only
# (filtered_drivers × dates)

# Bad: Only cross-filter
product.where(lambda d, dt: expensive_check(d) and ...)
# expensive_check runs once per (driver, date) pair: 100 × len(dates) calls
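The difference can be measured by counting calls to the expensive check. In this sketch, `expensive_check` is a hypothetical stand-in that only counts its invocations. With 100 drivers and 30 dates, filtering at the dimension level calls it 100 times. Filtering only in the cross-filter calls it once per pair, and both give the same result.

```python
import itertools

calls = 0


def expensive_check(driver: int) -> bool:
    # Hypothetical stand-in for a costly per-driver test; it only counts calls
    global calls
    calls += 1
    return driver % 2 == 0


drivers, dates = range(100), range(30)

# Good: filter the driver dimension first, then take the product
calls = 0
good = list(itertools.product([d for d in drivers if expensive_check(d)], dates))
good_calls = calls

# Bad: the same check inside a cross-filter runs once per (driver, date) pair
calls = 0
bad = [(d, dt) for d, dt in itertools.product(drivers, dates) if expensive_check(d)]
bad_calls = calls

assert good == bad  # identical result either way
print(good_calls, bad_calls)  # 100 3000
```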

Extension Points

Custom Dimension Types

Subclass for specialized dimensions:

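from datetime import datetime, timedelta

class LXTimeDimension(LXIndexDimension[datetime]):
    """Dimension over evenly spaced time periods, with a business-hours filter helper."""

    def __init__(self, start: datetime, end: datetime, interval: timedelta):
        # generate_periods is a helper not shown here (assumed to return a list of datetimes)
        periods = generate_periods(start, end, interval)
        super().__init__(
            datetime,
            lambda dt: dt.isoformat(),  # key: ISO-8601 timestamp string
        )
        self.from_data(periods)

    def business_hours_only(self):
        # Keep periods whose hour is 9 through 16 (09:00 up to, not including, 17:00)
        return self.where(lambda dt: 9 <= dt.hour < 17)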
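`generate_periods` is not shown on this page. The sketch below is a minimal version paired with the business-hours predicate. It assumes the helper returns timestamps from `start` (inclusive) to `end` (exclusive) in steps of `interval`. Both the name and these semantics are assumptions.

```python
from datetime import datetime, timedelta
from typing import List


def generate_periods(start: datetime, end: datetime, interval: timedelta) -> List[datetime]:
    # Assumed semantics: start inclusive, end exclusive, fixed step
    periods = []
    current = start
    while current < end:
        periods.append(current)
        current += interval
    return periods


def is_business_hour(dt: datetime) -> bool:
    # Same predicate as business_hours_only(): 09:00 up to (not including) 17:00
    return 9 <= dt.hour < 17


day = generate_periods(datetime(2024, 1, 8), datetime(2024, 1, 9), timedelta(hours=1))
business = [dt for dt in day if is_business_hour(dt)]
print(len(day), len(business))  # 24 8
print(business[0].isoformat())  # 2024-01-08T09:00:00
```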

Custom Cartesian Products

Subclass for specialized products:

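class LXSparseCartesianProduct(LXCartesianProduct):
    """2D cartesian product that keeps only pairs allowed by a sparsity matrix."""

    def __init__(self, dim1, dim2, sparsity_matrix):
        super().__init__(dim1, dim2)
        # Assumed: dict mapping (m1.id, m2.id) -> bool; missing pairs are excluded
        self.sparsity = sparsity_matrix
        # Apply the sparsity check even if where() is never called
        self._cross_filter = self._allowed

    def _allowed(self, m1, m2):
        return self.sparsity.get((m1.id, m2.id), False)

    def where(self, predicate):
        # Combine sparsity matrix with user predicate (sparsity checked first)
        def combined(m1, m2):
            return self._allowed(m1, m2) and predicate(m1, m2)
        self._cross_filter = combined
        return self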
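The combined filter can be tested on its own. This sketch assumes the sparsity matrix is a dict keyed by `(m1.id, m2.id)` pairs, with missing pairs treated as not allowed, which matches the `.get(..., False)` call above. `Item` and `make_combined` are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Item:  # hypothetical model with an id
    id: int


def make_combined(sparsity: Dict[Tuple[int, int], bool],
                  predicate: Callable[[Item, Item], bool]) -> Callable[[Item, Item], bool]:
    def combined(m1: Item, m2: Item) -> bool:
        # Sparsity is checked first; `and` skips the predicate for disallowed pairs
        return sparsity.get((m1.id, m2.id), False) and predicate(m1, m2)
    return combined


sparsity = {(1, 10): True, (1, 20): True, (2, 10): False}
combined = make_combined(sparsity, lambda a, b: b.id > 15)

print(combined(Item(1), Item(20)))  # True: allowed pair, predicate holds
print(combined(Item(1), Item(10)))  # False: allowed pair, predicate fails
print(combined(Item(2), Item(20)))  # False: pair missing from the matrix
```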

Testing Strategy

Unit Tests

Test individual components:

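import itertools

def test_dimension_filtering():
    # driver1..driver3 are assumed fixtures, two of which are active
    dim = (
        LXIndexDimension(Driver, lambda d: d.id)
        .from_data([driver1, driver2, driver3])
        .where(lambda d: d.is_active)
    )
    instances = dim.get_instances()
    assert len(instances) == 2
    assert all(d.is_active for d in instances)

def test_cartesian_product():
    product = LXCartesianProduct(driver_dim, date_dim)
    product.where(lambda d, dt: dt.weekday() not in d.days_off)
    # Expand manually, mirroring the logic in LXVariable.get_instances()
    pools = [dim.get_instances() for dim in product.dimensions]
    combos = [c for c in itertools.product(*pools) if product._cross_filter(*c)]
    assert all(dt.weekday() not in d.days_off for d, dt in combos)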

Integration Tests

Test with variables:

from typing import Tuple

def test_multi_indexed_variable():
    duty = (
        LXVariable[Tuple[Driver, Date], int]("duty")
        .binary()
        .indexed_by_product(driver_dim, date_dim)
        .where_multi(lambda d, dt: is_valid(d, dt))
    )
    instances = duty.get_instances()
    # Every expanded (driver, date) pair must satisfy the multi-filter
    assert all(is_valid(d, dt) for d, dt in instances)

Type Tests

Use mypy for static type checking:

mypy src/lumix/indexing

Next Steps