Design of the preparation
The design of a data preparation is always done on an in-memory sample of the data. See Sampling for more information.
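As a rough illustration of designing on a sample, the sketch below loads only the first few records into memory and previews one preparation step on them. It is generic Python, not DSS code: `head_sample`, `normalize_city`, and the inline data are hypothetical names invented for this example, and taking the first N rows is only one possible sampling strategy.

```python
import csv
import io
from itertools import islice


def head_sample(rows, max_rows):
    """Materialize at most max_rows records in memory (hypothetical helper)."""
    return list(islice(rows, max_rows))


def normalize_city(record):
    """Hypothetical preparation step: trim and title-case the 'city' column."""
    cleaned = dict(record)
    cleaned["city"] = record["city"].strip().title()
    return cleaned


# Inline stand-in for a larger dataset (assumption for the example).
raw = io.StringIO("id,city\n1, paris \n2,LYON\n3,nice\n4, lille\n")

# Only the sample is held in memory; the step is previewed on it.
sample = head_sample(csv.DictReader(raw), max_rows=2)
preview = [normalize_city(r) for r in sample]
```

The preview reflects only the sampled rows, so a step that looks correct here may still meet unseen values when it runs on the whole dataset.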
Execution in analysis
In an analysis, the preparation is executed on the whole dataset in two cases:
- Exporting the prepared data
- Running a machine learning model
In both cases, this uses a streaming engine: all data goes through the DSS server but does not need to be in memory.
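The idea of streaming, where every record passes through one process without the dataset being held in memory, can be sketched in plain Python. This is a conceptual illustration only, not how DSS implements its engine: `stream_prepare`, `uppercase_name`, and the inline data are hypothetical.

```python
import csv
import io


def stream_prepare(src, dst, step):
    """Apply step to each record read from src and write it to dst at once.

    Only one record is held in memory at a time (hypothetical helper).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for record in reader:
        writer.writerow(step(record))
        count += 1
    return count


def uppercase_name(record):
    """Hypothetical preparation step: upper-case the 'name' column."""
    return {**record, "name": record["name"].upper()}


# In-memory text buffers stand in for the input and output datasets.
source = io.StringIO("id,name\n1,ada\n2,grace\n3,alan\n")
target = io.StringIO()
rows_written = stream_prepare(source, target, uppercase_name)
```

Because records are read, transformed, and written one at a time, memory use stays roughly constant regardless of dataset size, while throughput is bounded by the single process handling the stream.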
Execution of the recipe
For execution of the recipe, DSS provides three execution engines:
With the streaming engine, all data goes through the DSS server but does not need to be in memory.
When both the input and output datasets of a data preparation recipe are supported HDFS datasets, the recipe can run fully on Hadoop, as a MapReduce job.
To enable this behavior, go to the Settings / Build tab of the data preparation recipe and check “Run on Hadoop”. You do not need to fill in the “Split size” parameter.