A production-ready, object-oriented data pipeline for processing NYC yellow taxi Parquet files. Built with modern Python libraries including Polars, Pydantic, PyArrow, DuckDB, and dbt.
turboquant-py implements the TurboQuant and QJL vector quantization algorithms from Google Research (ICLR 2026 / AISTATS 2026). It compresses high-dimensional floating-point vectors to 1-4 bits per ...
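The core idea behind QJL (quantized Johnson-Lindenstrauss) is easiest to see in code. What follows is a minimal pure-Python sketch, not turboquant-py's API. It assumes an asymmetric setup: a stored vector is reduced to one sign bit per Gaussian random projection plus its L2 norm, and a full-precision query is compared against those bits. All function names are hypothetical. The sketch does not reproduce TurboQuant's multi-bit scheme or any of the library's optimizations.

```python
import math
import random


def make_projection(m: int, d: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical helper: draw an m x d matrix of i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def project(S: list[list[float]], x: list[float]) -> list[float]:
    """Compute the matrix-vector product S @ x."""
    return [sum(s_j * x_j for s_j, x_j in zip(row, x)) for row in S]


def quantize_key(S: list[list[float]], k: list[float]) -> tuple[list[int], float]:
    """Keep one sign bit (+1/-1) per projected coordinate, plus ||k||."""
    bits = [1 if v >= 0 else -1 for v in project(S, k)]
    norm = math.sqrt(sum(v * v for v in k))
    return bits, norm


def estimate_inner_product(
    S: list[list[float]], q: list[float], bits: list[int], norm: float
) -> float:
    """Unbiased estimate of <q, k> from the sign bits of S @ k.

    For a Gaussian row s: E[sign(<s, k>) * <s, q>] = sqrt(2/pi) * <q, k> / ||k||,
    so scaling the average by sqrt(pi/2) * ||k|| removes the bias.
    The query q is not quantized (asymmetric estimator).
    """
    m = len(S)
    acc = sum(b * v for b, v in zip(bits, project(S, q)))
    return math.sqrt(math.pi / 2) * norm * acc / m


# Usage: compare the 1-bit estimate with the exact inner product.
d, m = 8, 20000
rng = random.Random(1)
q = [rng.gauss(0.0, 1.0) for _ in range(d)]
k = [rng.gauss(0.0, 1.0) for _ in range(d)]
S = make_projection(m, d, seed=42)
bits, norm = quantize_key(S, k)
estimate = estimate_inner_product(S, q, bits, norm)
exact = sum(a * b for a, b in zip(q, k))
print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

The estimator's standard deviation shrinks roughly as 1/sqrt(m), so more projection bits trade storage for accuracy.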