Feature roles and types

Roles

A feature’s role determines how it’s used during machine learning.

  • Reject means that the feature is not used.

  • Input means that the feature is used to build a model, either as a potential predictor for a target or for clustering.

  • Use for display only means that the feature is not used to build a model, but is used to label model output. This role is currently only used by cluster models.
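
As a rough illustration of the three roles, the sketch below splits one record into the values a model would see and the values kept only for labeling output. The role names (`REJECT`, `INPUT`, `DISPLAY_ONLY`), the `ROLES` mapping, and `split_by_role` are hypothetical names chosen for this example, not part of any product API.

```python
# Hypothetical role assignment per feature (illustrative names only).
ROLES = {
    "customer_id": "REJECT",          # not used at all
    "age": "INPUT",                   # used to build the model
    "country": "INPUT",               # used to build the model
    "customer_name": "DISPLAY_ONLY",  # only labels the model output
}


def split_by_role(row, roles):
    """Return (model_inputs, display_labels) for one record.

    Features with the REJECT role, or with no role at all, are dropped.
    """
    inputs = {k: v for k, v in row.items() if roles.get(k) == "INPUT"}
    labels = {k: v for k, v in row.items() if roles.get(k) == "DISPLAY_ONLY"}
    return inputs, labels


row = {"customer_id": 17, "age": 42, "country": "FR", "customer_name": "Ada"}
inputs, labels = split_by_role(row, ROLES)
```

Here `inputs` holds only `age` and `country`, while `labels` holds `customer_name`; `customer_id` is discarded.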

Variable type

A feature’s variable type determines the feature handling options during machine learning.

  • Categorical variables take one of an enumerated list of values. The goal of categorical feature handling is to encode the values of a categorical variable so that they can be treated as numeric.

  • Numerical variables take values that can be added, subtracted, multiplied, and so on. It is sometimes useful to treat a numerical variable with a limited number of values as categorical.

  • Text variables are arbitrary blocks of text. If a text variable takes a limited number of values, it may be useful to treat it as categorical.

  • Vector variables are arrays of numerical values, all of the same length.

  • Image variables are available for Deep learning. See Using image features for more information.
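
To make "encode the values of a categorical variable so that they can be treated as numeric" concrete, here is a minimal sketch of one common encoding, one-hot (dummy) encoding, in plain Python. It is only an example of the idea and is not necessarily the encoding applied by default; the `one_hot` helper is a hypothetical name introduced here.

```python
def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors.

    Returns the sorted list of distinct categories and, for each input
    value, a vector with a single 1 at the position of its category.
    """
    categories = sorted(set(values))  # the enumerated list of values
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return categories, encoded


categories, encoded = one_hot(["red", "green", "red", "blue"])
```

Each distinct value becomes its own numeric column, which is why this approach suits variables with a limited number of values, including numerical or text variables treated as categorical.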

Note

For MLflow Models, string and boolean features are treated as Categorical.
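
The rule in this note can be sketched as a small lookup. Only the string and boolean cases come from the note itself; mapping the numeric type names to Numerical is an assumption made for illustration, and `variable_type_for` is a hypothetical helper, not a product or MLflow function.

```python
# Type names follow common schema conventions; treat them as assumptions.
CATEGORICAL_TYPES = {"string", "boolean"}                   # stated in the note
NUMERICAL_TYPES = {"integer", "long", "float", "double"}    # assumption


def variable_type_for(declared_type):
    """Map a declared column type to a variable type, or None if unknown."""
    t = declared_type.lower()
    if t in CATEGORICAL_TYPES:
        return "Categorical"
    if t in NUMERICAL_TYPES:
        return "Numerical"
    return None  # no rule given for other types


schema = {"is_member": "boolean", "country": "string", "age": "long"}
variable_types = {name: variable_type_for(t) for name, t in schema.items()}
```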