Genetic Algorithm for Feature Generation and Selection

This capability provides recipes to create and select features with a genetic algorithm based on the DEAP framework. It is provided by the “Feature generation and selection with genetic algorithms” plugin, which you need to install. Please see Installing plugins.

Overview

Genetic algorithms are inspired by evolution through natural selection. They are often used in high-dimensional search spaces where grid or random search would be prohibitively expensive.

Genetic algorithms encode the space to explore with genes and proceed by generations. For each generation:

  • individuals forming the current population are evaluated (fitness)

  • the best individuals are chosen to mix their genes together (crossover)

  • independent random changes are performed (mutation)
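The generation cycle described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the plugin's implementation: it does not use DEAP, and the function name next_generation, the tournament selection of size 3, the one-point crossover and the toy fitness (number of active genes) are all assumptions made for the example. It requires at least 3 individuals with at least 2 genes each.

```python
import random


def next_generation(population, fitness, cx_prob, mut_prob, rng):
    """Produce the next population from the current one (one generation)."""
    # 1. Fitness: score every individual of the current population.
    scores = [fitness(ind) for ind in population]

    # 2. Selection (assumed scheme): tournament of 3, the fittest entrant is copied.
    def select():
        entrants = rng.sample(range(len(population)), 3)
        winner = max(entrants, key=lambda i: scores[i])
        return list(population[winner])

    offspring = [select() for _ in population]

    # 3. Crossover: with probability cx_prob, neighbouring pairs swap their tails.
    for a, b in zip(offspring[::2], offspring[1::2]):
        if rng.random() < cx_prob:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]

    # 4. Mutation: every gene is flipped independently with probability mut_prob.
    for ind in offspring:
        for i in range(len(ind)):
            if rng.random() < mut_prob:
                ind[i] = not ind[i]
    return offspring


# Toy usage: 10 individuals of 8 boolean genes, fitness = number of active genes.
rng = random.Random(0)
population = [[rng.random() < 0.5 for _ in range(8)] for _ in range(10)]
new_population = next_generation(population, fitness=sum, cx_prob=0.5, mut_prob=0.1, rng=rng)
```

In a real feature selection setting, the fitness function would train and score a model on the features selected by each individual, which is far more costly than the toy fitness used here.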

This plugin uses genetic algorithms for feature creation and selection. Starting from a dataset with features and a target, it automatically selects among both the original features and their combinations (products, sums and differences). In this setting, an individual is a boolean array with one value per feature (original or combination), indicating whether that feature is selected as an input for the model to train.
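To make the encoding concrete, the sketch below builds a candidate feature set and applies one individual (a boolean mask) to it. Several points are assumptions made for the illustration and are not stated by the plugin: combinations are taken over unordered pairs of distinct features, the difference is computed in one direction only, and the names build_candidates, apply_mask and the "a*b" naming scheme are hypothetical.

```python
from itertools import combinations


def build_candidates(rows, names):
    """Return (candidate_names, candidate_rows): originals plus pairwise combinations."""
    pairs = list(combinations(range(len(names)), 2))
    cand_names = list(names)
    for i, j in pairs:
        cand_names += [
            f"{names[i]}*{names[j]}",
            f"{names[i]}+{names[j]}",
            f"{names[i]}-{names[j]}",
        ]
    cand_rows = []
    for row in rows:
        out = list(row)
        for i, j in pairs:
            out += [row[i] * row[j], row[i] + row[j], row[i] - row[j]]
        cand_rows.append(out)
    return cand_names, cand_rows


def apply_mask(cand_names, cand_rows, mask):
    """Keep only the candidate features whose gene is True in the individual."""
    keep = [k for k, active in enumerate(mask) if active]
    return [cand_names[k] for k in keep], [[row[k] for k in keep] for row in cand_rows]


names = ["a", "b", "c"]
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
cand_names, cand_rows = build_candidates(rows, names)

# 3 originals + 3 pairs x 3 operations = 12 candidates, so an individual has 12 genes.
mask = [True, False, False, True] + [False] * 8  # selects "a" and "a*b"
selected_names, selected_rows = apply_mask(cand_names, cand_rows, mask)
```

Under these assumptions the number of candidates grows quadratically with the number of original features, which is why an exhaustive search over all masks quickly becomes impractical.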

Usage - Fit Transform Recipe

Parameters

  • Target is the target of the machine learning prediction task.

  • Population is the number of individuals considered at the first generation.

  • Crossover probability is the probability that a random exchange of genes will happen for two individuals selected to form the next generation.

  • Mutation probability is the probability that a random flip occurs for each feature of an individual (from active to inactive or vice versa).

  • Number of generations is the number of iterations of the evolutionary process (fitness, crossover and mutation) over the whole population.
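The parameters above can be grouped and sanity-checked as in the following sketch. The class GeneticParams, its field names and the example values are hypothetical and are not the plugin's defaults; the only property taken from the description is that mutation is applied per feature, so the expected number of flips per individual is the mutation probability times the number of candidate features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneticParams:
    """Hypothetical container mirroring the recipe parameters."""
    target: str
    population: int
    crossover_probability: float
    mutation_probability: float
    n_generations: int

    def __post_init__(self):
        if self.population < 2:
            raise ValueError("population must contain at least 2 individuals")
        if self.n_generations < 1:
            raise ValueError("n_generations must be at least 1")
        for name in ("crossover_probability", "mutation_probability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def expected_flips(self, n_candidates):
        """Average number of features flipped per individual in one mutation pass."""
        return self.mutation_probability * n_candidates


# Illustrative values only.
params = GeneticParams(
    target="price",
    population=50,
    crossover_probability=0.5,
    mutation_probability=0.05,
    n_generations=20,
)
flips = params.expected_flips(12)
```

A mutation probability that is high relative to the number of candidate features makes each generation close to a random restart, while a very low value leaves exploration almost entirely to crossover.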

Input

  • A dataset of features from which we want to select and create new features. Features should be numerical, and contain no missing values.
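Before running the recipe, the input requirement can be checked with a small helper such as the one below. This is a sketch on plain Python lists, not part of the plugin: the function name check_features and the convention that None or NaN marks a missing value are assumptions.

```python
import math
from numbers import Real


def check_features(rows, names):
    """Return a list of (row_index, feature_name, problem) for invalid cells."""
    problems = []
    for r, row in enumerate(rows):
        for name, value in zip(names, row):
            if value is None:
                problems.append((r, name, "missing"))
            elif not isinstance(value, Real):
                problems.append((r, name, "not numerical"))
            elif math.isnan(value):
                problems.append((r, name, "missing"))
    return problems


names = ["a", "b"]
rows = [[1.0, 2], [float("nan"), "x"], [None, 3.5]]
problems = check_features(rows, names)
```

An empty result means every cell is numerical and present; otherwise the offending cells need to be imputed, converted or dropped upstream.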

Outputs

  • A dataset of selected and created features.

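  • A folder containing a JSON file with the information that the Transform recipe needs to reapply the same selection and creation to another dataset.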

Usage - Transform Recipe

Input

  • A dataset of features to which we want to apply the selection/creation pipeline.

  • The output folder of a previous Fit Transform recipe, which contains the information JSON.

Outputs

  • A dataset of selected and created features.
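The split between the two recipes follows a fit-then-apply pattern: the first run records which features were kept, and later runs replay that record on new data. The sketch below illustrates the idea only. The layout of the plugin's JSON is not documented here, so the file name pipeline.json, the schema (name, op, left, right) and the functions save_pipeline and transform are all hypothetical.

```python
import json
import operator
import tempfile
from pathlib import Path

# Combination operators named in the overview: product, sum and difference.
OPS = {"*": operator.mul, "+": operator.add, "-": operator.sub}


def save_pipeline(folder, selected):
    """Fit side: write the selected feature definitions to a JSON file (hypothetical schema)."""
    path = Path(folder) / "pipeline.json"
    path.write_text(json.dumps({"selected": selected}, indent=2))
    return path


def transform(folder, rows, names):
    """Transform side: rebuild the selected features on a new dataset."""
    spec = json.loads((Path(folder) / "pipeline.json").read_text())["selected"]
    col = {name: i for i, name in enumerate(names)}
    out_rows = []
    for row in rows:
        out = []
        for feat in spec:
            left = row[col[feat["left"]]]
            if feat["op"] is None:
                out.append(left)  # original feature, kept as is
            else:
                out.append(OPS[feat["op"]](left, row[col[feat["right"]]]))
        out_rows.append(out)
    return [feat["name"] for feat in spec], out_rows


selected = [
    {"name": "a", "op": None, "left": "a", "right": None},
    {"name": "a*b", "op": "*", "left": "a", "right": "b"},
]
with tempfile.TemporaryDirectory() as folder:
    save_pipeline(folder, selected)
    out_names, out_rows = transform(folder, [[4.0, 5.0, 6.0]], ["a", "b", "c"])
```

Storing feature definitions rather than column positions lets the new dataset have a different column order, as long as the original feature names are present.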