Tuplex: Data Science in Python at Native Code Speed

Tuplex: Blazing Fast Python Data Science

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex offers Python APIs similar to those of Apache Spark and Dask, but rather than invoking the Python interpreter, it generates optimized LLVM bytecode for the given pipeline and input data set. Under the hood, Tuplex builds on data-driven compilation and dual-mode processing, two key techniques that let it deliver speed comparable to a pipeline written in hand-optimized C++.

Native Code Speed

Because Tuplex compiles data science pipelines with inline Python to native code, it runs them 5–91x faster than systems that call into a Python interpreter.

Easy to Use

Tuplex makes wrangling data easy: it works interactively in the Python toplevel, integrates with Jupyter Notebooks, and provides familiar APIs, all backed by its data-driven compiler. Tuplex jobs never crash on malformed inputs because Tuplex's dual-mode execution model separates the common-case inputs from exception-case inputs (e.g., malformed data, wrong types) and reports them separately.
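
The split between common-case and exception-case inputs can be pictured with a small plain-Python sketch. This is only a toy illustration of the idea and not Tuplex's implementation or API: the names `infer_common_type` and `run_dual_mode`, the `sample_size` parameter, and the choice to treat "most frequent type in a sample" as the common case are all assumptions made for this example. In Tuplex the common-case path would be compiled native code; here both paths simply call the Python function.

```python
from collections import Counter


def infer_common_type(sample):
    """Return the most frequent Python type in a sample of the input (hypothetical helper)."""
    return Counter(type(row) for row in sample).most_common(1)[0][0]


def run_dual_mode(rows, udf, sample_size=100):
    """Apply udf to rows, collecting failing rows separately instead of crashing.

    Toy model only: the 'normal' path stands in for specialized compiled code,
    the 'exception' path for a slower general fallback.
    """
    if not rows:
        return [], []

    common_type = infer_common_type(rows[:sample_size])
    results, failed = [], []

    for row in rows:
        # Rows matching the sampled common type take the normal path;
        # everything else is routed to the exception path.
        path = "normal" if type(row) is common_type else "exception"
        try:
            results.append(udf(row))
        except Exception as exc:
            # Record the offending row and error rather than aborting the job.
            failed.append((row, type(exc).__name__, path))

    return results, failed


if __name__ == "__main__":
    results, failed = run_dual_mode([1, 2, None, 4], lambda x: (x, x * x))
    print("results:", results)
    print("failed rows:", failed)
```

The job finishes even though one input is `None`: the well-formed rows produce results, and the malformed row is returned in a separate list together with the error it triggered.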

Start using Tuplex

Getting started with Tuplex is easy: we provide a Python package, Docker image, and instructions to build from source.

Linux, Python 3.7-3.9:

$ pip install tuplex

macOS, Catalina or later:

$ docker run -p 8888:8888 tuplex/tuplex

Development version from our GitHub repository:

$ git clone https://github.com/tuplex/tuplex

Publications

	Leonhard F. Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska. Tuplex: Data Science in Python at Native Code Speed. Proceedings of SIGMOD 2021, June 2021. URL: https://doi.org/10.1145/3448016.3457244.
	Leonhard F. Spiegelberg and Tim Kraska. Tuplex: robust, efficient analytics when Python rules (Demo paper). Proceedings of the VLDB Endowment, 12(12):1958–1961, August 2019. URL: https://doi.org/10.14778/3352063.3352109.

Team

Contributors and Alumni:

Andrew Wei	Andy Ly	Benjamin Givertz
Colby Anderson	Yunzhi Shao	Raghu Nimmagadda
William Riley

Getting Involved

If you're a Brown student interested in systems research, please check out our guide and starter projects.