Features handling

Note

You can change the settings for feature processing under Models > Settings > Features tab.

Most of the machine learning engines in DSS visual machine learning can only process numerical features with no missing values.

DSS allows users to specify pre-processing of variables before model training.

Rescaling numeric variables

Numeric features can be rescaled prior to training, which can improve model performance in some instances. Standard rescaling transforms the feature so that it has a mean of zero and a standard deviation of one. Min-max rescaling maps the minimum value of the feature to zero and the maximum value to one.

Rescale numeric variables if there are large differences in scale between the features.
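
The two rescaling methods described above can be sketched in plain Python. This is only an illustration of the arithmetic, not DSS's implementation: the function names and the sample data are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standard_rescale(values):
    """Shift the values to a mean of zero and scale them to a standard deviation of one."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no spread; map everything to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_rescale(values):
    """Map the minimum value to zero and the maximum value to one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; map everything to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature whose scale is much larger than, say, an age column.
incomes = [20000.0, 35000.0, 50000.0, 95000.0]

standardized = standard_rescale(incomes)
scaled = min_max_rescale(incomes)
```

After either transformation, the feature sits on a scale comparable to other rescaled features, which is the situation the recommendation above is aimed at.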

Encoding categorical variables

Generating Features from Text

Missing values

DSS has facilities for handling missing data prior to model training. First, the user must decide whether to discard rows with missing data.

Avoid discarding rows, unless missing data is extremely rare.

The user must also decide whether to treat “missing” as a regular value. Structurally missing data is data that is impossible to measure, e.g. the US state for an address in Canada. In contrast, randomly missing data is missing due to random noise.

Treat “missing” as a regular value when data is structurally missing. Impute when data is randomly missing.
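
The three options discussed above (discarding rows, treating “missing” as a regular value, and imputing) can be sketched in plain Python. This is an illustrative sketch rather than DSS's implementation: the helper names, the `__missing__` label, the sample rows, and the choice of mean imputation are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"country": "US", "state": "CA", "age": 34.0},
    {"country": "CA", "state": None, "age": 51.0},  # structurally missing: no US state
    {"country": "US", "state": "NY", "age": None},  # randomly missing
]


def drop_rows_with_missing(rows):
    """Keep only the rows in which every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def missing_as_regular_value(rows, column, label="__missing__"):
    """Replace missing entries in a column with a dedicated category label."""
    return [{**r, column: label if r[column] is None else r[column]} for r in rows]


def impute_mean(rows, column):
    """Replace missing entries in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Structurally missing column: keep "missing" as its own value.
# Randomly missing column: impute.
cleaned = impute_mean(missing_as_regular_value(rows, "state"), "age")
```

In this small example, discarding rows would keep only one of the three rows, which illustrates why dropping is best reserved for cases where missing data is rare.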