Tuplex is a new framework for processing larger-than-memory datasets using a monadic programming paradigm like that of Apache Spark or Apache Flink. Under the hood, it uses whole-stage code generation to speed up processing, reaching speeds comparable to a hand-written C pipeline compiled to a native executable. Furthermore, it lets users handle exceptions in a novel way, which improves productivity and makes it easier to run complex, data-intensive ETL pipelines. Tuplex is currently being developed within the Database Management Group at Brown University.
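As a rough, dependency-free illustration of two ideas above (chaining operators over a dataset, and resolving exceptions per row instead of aborting the whole job), the hypothetical `ToyPipeline` class below runs eagerly in memory. It does no code generation and is not Tuplex's API; its method names (`parallelize`, `map`, `resolve`, `collect`, `failed_count`) are assumptions chosen only for this sketch.

```python
from typing import Any, Callable, Iterable, List, Tuple, Type

# Each slot is either ("ok", value) or ("err", input_row, exception).
Slot = Tuple[Any, ...]


class ToyPipeline:
    """Hypothetical eager, in-memory pipeline (NOT Tuplex's API).

    Mimics chained operators plus per-row exception resolution:
    a failing row is set aside instead of failing the whole job.
    """

    def __init__(self, slots: List[Slot], unresolved: int = 0):
        self._slots = slots
        # Rows dropped because no resolver handled their exception.
        self.unresolved = unresolved

    @classmethod
    def parallelize(cls, rows: Iterable[Any]) -> "ToyPipeline":
        return cls([("ok", r) for r in rows])

    def map(self, fn: Callable[[Any], Any]) -> "ToyPipeline":
        out: List[Slot] = []
        dropped = self.unresolved
        for slot in self._slots:
            if slot[0] == "err":
                dropped += 1  # the previous operator's error was never resolved
                continue
            try:
                out.append(("ok", fn(slot[1])))
            except Exception as exc:
                # Keep the input row so a later resolve() can retry it.
                out.append(("err", slot[1], exc))
        return ToyPipeline(out, dropped)

    def resolve(self, exc_type: Type[BaseException],
                handler: Callable[[Any], Any]) -> "ToyPipeline":
        """Apply handler to input rows whose preceding map raised exc_type."""
        out: List[Slot] = []
        for slot in self._slots:
            if slot[0] == "err" and isinstance(slot[2], exc_type):
                out.append(("ok", handler(slot[1])))
            else:
                out.append(slot)
        return ToyPipeline(out, self.unresolved)

    def collect(self) -> List[Any]:
        return [slot[1] for slot in self._slots if slot[0] == "ok"]

    def failed_count(self) -> int:
        pending = sum(1 for slot in self._slots if slot[0] == "err")
        return self.unresolved + pending


if __name__ == "__main__":
    result = (ToyPipeline.parallelize([4, 0, 2, "x"])
              .map(lambda x: 8 // x)                    # 0 and "x" raise
              .resolve(ZeroDivisionError, lambda x: 0)  # fix only the 0 row
              .map(lambda y: y + 1))
    print(result.collect())       # [3, 1, 5]
    print(result.failed_count())  # 1 ("x" raised TypeError, never resolved)
```

The design choice here is that a failing row keeps its input so a resolver attached to that operator can recompute it, while rows nobody resolves are counted rather than crashing the run.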
Leonhard F. Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska. Tuplex: Data Science in Python at Native Code Speed. Preprint.
Leonhard F. Spiegelberg and Tim Kraska. Tuplex: Robust, Efficient Analytics When Python Rules. Proc. VLDB Endow., 12(12):1958–1961, August 2019. URL: https://doi.org/10.14778/3352063.3352109.