Configuration

Tuplex provides several mechanisms to configure the framework. To quickly try out different settings, you can pass them directly to the Context object. For an overview of the available configuration keys, consult the API documentation of the Context object.

This can be done either by passing a dictionary

from tuplex import *
c = Context(conf={"executorMemory" : "2G"})

or using keyword arguments.

from tuplex import *
c = Context(executorMemory="2G")

Furthermore, Tuplex can be configured with a YAML file by passing its path as conf. If no file is specified, Tuplex looks for a file named conf.yaml in the working directory and, if found, attempts to load it as the configuration file.

from tuplex import *
c = Context(conf="/conf/tuplex.yaml")

Tuplex uses its internal defaults for any keys the user does not supply. An example configuration file is

# FastETL configuration file
# 	created 2019-02-17 16:45:09.940033 UTC
tuplex:
    -   allowUndefinedBehavior: false
    -   autoUpcast: false
    -   csv:
            -   comments: ["#", "~"]
            -   generateParser: true
            -   maxDetectionMemory: 256KB
            -   maxDetectionRows: 100
            -   quotechar: "\""
            -   selectionPushdown: true
            -   separators: [",", ";", "|", "\t"]
    -   driverMemory: 1GB
    -   executorCount: 4
    -   executorMemory: 1GB
    -   logDir: .
    -   normalcaseThreshold: 0.9
    -   partitionSize: 1MB
    -   runTimeLibrary: tuplex_runtime
    -   runTimeMemory: 32MB
    -   runTimeMemoryBlockSize: 4MB
    -   scratchDir: /tmp
    -   useLLVMOptimizer: true
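
A file in this layout can also be generated from a script, which is convenient when trying out several settings. The following is a minimal sketch, not part of Tuplex: the helper names render_tuplex_yaml and write_conf are hypothetical, the keys are taken from the example file above, and only flat (non-nested) settings are handled.

```python
import os
import tempfile


def render_tuplex_yaml(settings):
    """Render flat settings in the list layout of the example file above."""
    lines = ["tuplex:"]
    for key in sorted(settings):
        # Each setting becomes one "-   key: value" entry under "tuplex:".
        lines.append("    -   {}: {}".format(key, settings[key]))
    return "\n".join(lines) + "\n"


def write_conf(settings, directory):
    """Write the rendered settings to <directory>/tuplex.yaml and return the path."""
    path = os.path.join(directory, "tuplex.yaml")
    with open(path, "w") as f:
        f.write(render_tuplex_yaml(settings))
    return path


# Keys are taken from the example file; the values are arbitrary.
settings = {"executorCount": 2, "executorMemory": "2GB", "partitionSize": "1MB"}
conf_path = write_conf(settings, tempfile.mkdtemp())
```

With Tuplex installed, the resulting path can then be passed in the same way as in the example above, i.e. Context(conf=conf_path).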

Note

The same YAML file used to configure Tuplex can also store application-specific configuration details. Tuplex only requires that its own keys start with tuplex in the YAML file.
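
As a rough illustration of sharing one file, the sketch below separates the Tuplex section from application settings after the file has been parsed. It makes several assumptions: the dictionary stands in for what a YAML parser would typically return for the list layout shown above, the section name myapp and its inputPath key are hypothetical, and merge_entries is an illustrative helper rather than a Tuplex function.

```python
def merge_entries(entries):
    """Collapse a list of single-key mappings into one dictionary."""
    merged = {}
    for entry in entries:
        for key, value in entry.items():
            # Nested sections such as csv use the same list-of-mappings layout.
            if value and isinstance(value, list) and all(isinstance(v, dict) for v in value):
                value = merge_entries(value)
            merged[key] = value
    return merged


# Stand-in for a parsed YAML document; "myapp" is a hypothetical application section.
parsed = {
    "tuplex": [
        {"executorCount": 4},
        {"csv": [{"maxDetectionRows": 100}, {"comments": ["#", "~"]}]},
    ],
    "myapp": {"inputPath": "data/input.csv"},
}

# Settings under the tuplex key, flattened into nested dictionaries.
tuplex_settings = merge_entries(parsed["tuplex"])

# Everything outside the tuplex key belongs to the application.
app_settings = {k: v for k, v in parsed.items() if k != "tuplex"}
```

Keeping the application settings under their own top-level key means they never collide with keys that start with tuplex.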