Configuration
Tuplex provides several mechanisms to configure the framework. To quickly try out different settings,
you can configure the Context
object directly. For an overview of the available configuration keys, consult the API documentation of the Context
object.
This can be done either by passing a dictionary
from tuplex import *
c = Context(conf={"executorMemory" : "2G"})
or by using keyword arguments.
from tuplex import *
c = Context(executorMemory="2G")
Furthermore, Tuplex can be configured through a YAML file. If no file is specified, Tuplex looks for a file named conf.yaml
in the working directory.
If this file is found, Tuplex attempts to load it as the configuration file. To use a different file, pass its path explicitly:
from tuplex import *
c = Context(conf="/conf/tuplex.yaml")
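The lookup rule described above (an explicitly passed file, otherwise conf.yaml in the working directory) can be sketched in plain Python. This is only an illustration of the rule as stated here, assuming an explicit path always wins. The helper name resolve_config_file and its parameters are hypothetical and not part of the Tuplex API.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_config_file(explicit: Optional[Union[str, Path]] = None,
                        working_dir: Optional[Union[str, Path]] = None) -> Optional[Path]:
    """Return the YAML file to load, or None if there is none.

    Hypothetical helper for illustration only; not part of the Tuplex API.
    """
    if explicit is not None:
        # Assumption: a path passed by the user takes precedence.
        return Path(explicit)
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    candidate = base / "conf.yaml"
    # Fall back to conf.yaml in the working directory, if it exists.
    return candidate if candidate.is_file() else None
```

Calling resolve_config_file() without arguments returns None when the working directory contains no conf.yaml, which corresponds to the case where only internal defaults apply.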
For any key the user does not supply, Tuplex uses its internal default. An example config file is:
# FastETL configuration file
# created 2019-02-17 16:45:09.940033 UTC
tuplex:
  - allowUndefinedBehavior: false
  - autoUpcast: false
  - csv:
      - comments: ["#", "~"]
      - generateParser: true
      - maxDetectionMemory: 256KB
      - maxDetectionRows: 100
      - quotechar: "\""
      - selectionPushdown: true
      - separators: [",", ;, "|", "\t"]
  - driverMemory: 1GB
  - executorCount: 4
  - executorMemory: 1GB
  - logDir: .
  - normalcaseThreshold: 0.9
  - partitionSize: 1MB
  - runTimeLibrary: tuplex_runtime
  - runTimeMemory: 32MB
  - runTimeMemoryBlockSize: 4MB
  - scratchDir: /tmp
  - useLLVMOptimizer: true
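The fallback to internal defaults can be pictured as a simple merge in which user-supplied values override defaults. The sketch below is a conceptual model only, not how Tuplex implements it. The DEFAULTS table copies a few values from the example file and the function effective_config is hypothetical. The real defaults are defined inside Tuplex and may differ.

```python
# Illustrative defaults, copied from a few keys of the example file.
# Assumption: the real defaults live inside Tuplex and may differ.
DEFAULTS = {
    "executorCount": 4,
    "executorMemory": "1GB",
    "driverMemory": "1GB",
    "partitionSize": "1MB",
}


def effective_config(user_conf=None, **kwargs):
    """Merge user settings over defaults (hypothetical helper, not Tuplex API)."""
    merged = dict(DEFAULTS)           # start from a copy of the defaults
    merged.update(user_conf or {})    # dictionary-style settings
    merged.update(kwargs)             # keyword-style settings
    return merged


# Only executorMemory is overridden; every other key keeps its default.
conf = effective_config({"executorMemory": "2G"})
```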
Note
The YAML file passed to configure Tuplex can also be used to store application-specific configuration details. Tuplex only requires that its own keys start with tuplex
in the YAML file.
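As a sketch of how an application might share one file with Tuplex, the snippet below separates the keys that start with tuplex from everything else. It assumes the file has already been parsed into a Python mapping by a YAML loader of your choice. The function split_config and the myapp section are made up for illustration and are not part of Tuplex.

```python
def split_config(parsed):
    """Split a parsed YAML mapping into (tuplex_part, application_part).

    Hypothetical helper; `parsed` is the top-level mapping returned
    by whichever YAML loader the application uses.
    """
    tuplex_part = {k: v for k, v in parsed.items() if str(k).startswith("tuplex")}
    app_part = {k: v for k, v in parsed.items() if not str(k).startswith("tuplex")}
    return tuplex_part, app_part


parsed = {
    "tuplex": [{"executorCount": 4}, {"executorMemory": "1GB"}],
    "myapp": {"inputPath": "data/*.csv"},  # made-up application section
}
tuplex_part, app_part = split_config(parsed)
```

The application reads its own settings from app_part, while the same file path can still be passed to Context.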