Model architecture

When creating a deep learning model, you need to define the architecture of the neural network. To do so, fill in the two Python functions defined in the “Architecture” tab of the settings.

[Screenshot: the “Architecture” tab]

Build Keras model

The build_model function needs to return an instance of the Keras Model class.

This function takes two parameters:

  • input_shapes is a dictionary of the shapes of the input tensors
  • n_classes is the number of classes to predict (for classification models only)

input_shapes

Advanced models can have multiple inputs (see Multiple inputs for more information). Each input has a name.

input_shapes is a dictionary indexed by input name. In most cases, the shape of each input is not known before preprocessing, because preprocessing creates a variable number of columns (through dummification, vectorization, and so on). The shapes are therefore provided to you so that you can build your model.

If you haven’t used the multi-input features, you only have a main input. To get the shape of your input tensor, simply use input_shapes["main"].

For example,

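from keras.layers import Input, Dense

# The input layer takes its shape from the "main" entry of input_shapes
input_main = Input(shape=input_shapes["main"])
x = Dense(64, activation="relu")(input_main)
...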

n_classes

For multiclass classification problems, you may not know the number of target classes in advance, so it is provided to you through n_classes.
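The two parameters can be combined in a single function. The following is a minimal sketch of a build_model function for a multiclass problem, not the exact code generated by DSS: the default value n_classes=None, the layer size of 64, and the single hidden layer are illustrative assumptions.

```python
from keras.layers import Dense, Input
from keras.models import Model


def build_model(input_shapes, n_classes=None):
    # "main" is the only input when multi-input features are not used.
    input_main = Input(shape=input_shapes["main"])

    # A single hidden layer; its size (64) is an arbitrary choice.
    x = Dense(64, activation="relu")(input_main)

    # One output unit per target class, with softmax so that the
    # outputs form a probability distribution.
    predictions = Dense(n_classes, activation="softmax")(x)

    # build_model must return an instance of the Keras Model class.
    return Model(inputs=input_main, outputs=predictions)
```

In this sketch, the shape of the input layer and the size of the output layer are both taken from the parameters rather than hard-coded, so the same function keeps working when preprocessing changes the number of columns.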

Layer dimensions

You need to be careful with the dimension of the last layer of your network, as it determines the output of the model.

  • For regression, the last layer needs to have a dimension equal to 1, and it should not have any activation unless the variable to predict has particular constraints
  • For binary classification, the last layer should have either a dimension equal to 1 with a sigmoid activation, or a dimension equal to 2 with a softmax activation
  • For multiclass classification, the last layer should have a dimension equal to the number of target classes, and a softmax activation.

If this is not respected, training will either fail (dimension mismatch) or give inconsistent results (if the activation is not appropriate, the output may not be a probability distribution).
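The rules above can be summarized in a small helper. This is only a sketch: output_layer_spec and the prediction type labels "regression", "binary" and "multiclass" are hypothetical names introduced for illustration and are not part of DSS or Keras. For binary classification, it picks the first of the two valid options (one unit with sigmoid).

```python
def output_layer_spec(prediction_type, n_classes=None):
    """Return (units, activation) for the last layer of the network.

    Hypothetical helper: the prediction type labels are illustrative.
    """
    if prediction_type == "regression":
        # A single output with no activation (linear output).
        return 1, None
    if prediction_type == "binary":
        # One unit with sigmoid; two units with softmax is also valid.
        return 1, "sigmoid"
    if prediction_type == "multiclass":
        if n_classes is None or n_classes < 2:
            raise ValueError("multiclass requires n_classes >= 2")
        # One unit per target class, with softmax.
        return n_classes, "softmax"
    raise ValueError("unknown prediction type: %r" % (prediction_type,))
```

The returned pair can then be passed to the last layer, for example Dense(units, activation=activation), where an activation of None leaves the output linear.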

Compile the model

The compile_model function takes the previously created model as input and must compile it. The reason it is separated from the build_model function is that DSS may need to manipulate the model between creation and compilation, in particular for multi-GPU training.

The compile function usually takes a list of metrics to track during training. In the DSS context, it is not necessary to specify them, because DSS already computes the metrics listed in the “Metrics” tab, and they will be available in TensorBoard afterwards.

The two arguments that need to be carefully filled in are:

  • optimizer, which indicates the method used to optimize the model
  • loss, which is the function optimized during training. Loss functions are problem-dependent (regression, classification), and by default DSS selects one that “works” for the prediction type guessed in the analysis.
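As an illustration, a compile_model function for a multiclass model with a softmax output could look like the sketch below. The optimizer and loss shown here are assumptions chosen for the example, not necessarily the values DSS selects, and returning the model at the end is also assumed. No metrics argument is passed, since DSS computes the metrics itself.

```python
def compile_model(model):
    # "rmsprop" is one of the optimizers Keras accepts by name.
    # "categorical_crossentropy" suits a softmax output with one-hot
    # encoded targets; a regression model would use a loss such as "mse".
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    return model
```

Because the function only receives the model and calls compile on it, it does not need to know how the model was built, which is consistent with DSS manipulating the model between creation and compilation.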