Tuplex is a new framework for processing larger-than-memory datasets in a monadic programming paradigm, as in Apache Spark or Apache Flink. Under the hood, it uses whole-stage code generation to speed up processing and provides native speed comparable to that of a pipeline written in C and compiled to a native executable. It also lets users handle exceptions in a novel way, which improves overall productivity and makes it easier to run complex, data-intensive ETL pipelines. Tuplex is currently developed within the Database Management Group at Brown University.
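To make the chained style and the idea of per-operator exception handling more concrete, here is a minimal pure-Python sketch. It is not Tuplex's API or implementation: the class `MiniPipeline` and its methods `map`, `resolve`, and `collect` are hypothetical names chosen for illustration, and the code interprets each row in plain Python rather than generating native code. It only shows how a pipeline might keep running when individual rows fail, repairing the rows it has a resolver for and setting the rest aside.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


class MiniPipeline:
    """Toy chained pipeline (hypothetical; NOT the Tuplex API)."""

    def __init__(self, rows: Iterable[Any]) -> None:
        self.rows: List[Any] = list(rows)
        # Each stage is a function plus resolvers keyed by exception type.
        self.stages: List[Tuple[Callable[[Any], Any], Dict[Type[BaseException], Callable[[Any], Any]]]] = []

    def map(self, fn: Callable[[Any], Any]) -> "MiniPipeline":
        """Record a transformation; nothing runs until collect()."""
        self.stages.append((fn, {}))
        return self

    def resolve(self, exc_type: Type[BaseException], resolver: Callable[[Any], Any]) -> "MiniPipeline":
        """Attach a fallback for exc_type to the most recent map stage."""
        if not self.stages:
            raise ValueError("resolve() must follow a map() call")
        self.stages[-1][1][exc_type] = resolver
        return self

    def collect(self) -> Tuple[List[Any], List[Tuple[Any, str]]]:
        """Run all stages; return (results, rows that failed unresolved)."""
        results: List[Any] = []
        failed: List[Tuple[Any, str]] = []
        for row in self.rows:
            value = row
            try:
                for fn, resolvers in self.stages:
                    try:
                        value = fn(value)
                    except Exception as exc:
                        resolver = resolvers.get(type(exc))
                        if resolver is None:
                            raise  # no resolver: handled by the outer except
                        value = resolver(value)  # repair using the stage input
            except Exception as exc:
                # Set the bad row aside instead of aborting the whole run.
                failed.append((row, type(exc).__name__))
                continue
            results.append(value)
        return results, failed


# Divide the first field by the second; resolve division by zero only.
pipeline = (
    MiniPipeline([(1, 0), (2, 1), (3, "x")])
    .map(lambda r: r[0] / r[1])
    .resolve(ZeroDivisionError, lambda r: 0.0)
)
results, failed = pipeline.collect()
print(results)  # [0.0, 2.0]
print(failed)   # [((3, 'x'), 'TypeError')]
```

In this sketch the row `(1, 0)` is repaired by the resolver, `(2, 1)` succeeds normally, and `(3, "x")` is reported as failed because no resolver was registered for `TypeError`. Matching resolvers by exact exception type is a simplification made here for brevity.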
Leonhard F. Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska. Tuplex: Data Science in Python at Native Code Speed. Preprint.
Leonhard F. Spiegelberg and Tim Kraska. Tuplex: robust, efficient analytics when Python rules. Proc. VLDB Endow., 12(12):1958–1961, August 2019. URL: https://doi.org/10.14778/3352063.3352109.