Troubleshooting

Here is a list of known issues and limitations of the Visual Deep Learning.

Using pre-trained models from Keras

Keras provides [applications](https://keras.io/applications/), which are state-of-the-art architectures trained on millions of data points that can be reused, for example, to do transfer learning.

For example, for the [ResNet 50](https://keras.io/applications/#resnet50) applications, it works as follows:

from keras.applications import ResNet50
model = ResNet50(weights="imagenet")

where weights can be “imagenet” (or None), and it means that they are weights on the architecture trained on Imagenet. It is the most interesting use case because these applications are big networks that require a lot of training data. In that case, it needs to download the weights of the model, which is only possible if you have access to the internet.

If you don’t have access to the internet, the code throws Exception: URL fetch failure.

To work around this issue, you need to have the file containing the weights of your model available locally on the server.

Each application has the possibility to load the model with a top (i.e. last fully connected layers for the classification and the corresponding weights) or without these last layers. It corresponds to the include_top parameter for each application. It means that for each application, there are two possible sets of weights: one for architecture with top and one without.

All weights files are available here

Once you have downloaded the appropriate files, you have two options:

  • manually put them at ~/.keras/models/. In that case, be sure to respect Keras naming for files.

  • put them inside a managed folder and then load them in the model with code like:

    import dataiku
    import os
    from keras.applications import ResNet50
    
    model = ResNet50(weights=None)
    
    weights_mf_path = dataiku.Folder("folder_containing_weights").get_path()
    weights_path = os.path.join(weights_mf_path, "resnet_weights.h5")
    
    model.load_weights(weights_path)
    

Code environment lineage

As for the Python backend, the Keras backend “attaches” the code environment used to train a model for every action performed afterwards. This means that the code environment used a train time will be also used for Scoring/Evaluating/Retraining a model, and also if deployed as an endpoint on the API Node.

In particular, it means that if a model is trained with a code environment that has no access to GPU (without tensorflow-gpu), it will not be possible to use a GPU for the following operations.

On the contrary, if the model is trained with a GPU-aware code environment, it means that all the following operations will have access to GPU(s). In particular:

  • for Retrain recipe, the same options as for the training will be used
  • for the Scoring/Evaluation recipes, you can choose in the Advanced tab whether to use a GPU to run the recipe
  • for the API node, the DSS server needs to have access to a GPU, otherwise the code environment will fail to install. Then, for scoring, the model will be loaded on GPU (not possible to run it on CPU), but will only take the required space (thanks to the allow_growth option of TensorFlow).

TensorFlow session

TensorFlow provides sessions to manage the environment on which the operations are performed. In particular, when we use GPU(s), we can define in theses Sessions how we want to handle the computation and split into our different devices (CPU, GPU #0, GPU #1, …).

In the context of Visual Deep Learning, it is DSS that handles the session, so you should not use it inside its architecture. Otherwise, it could result in unwanted behaviors.

You can “modify” the session in the UI, when selecting the GPU(s). If you modify the “Memory allocation rate per GPU”, it will impact the config.gpu_options.per_process_gpu_memory_fraction of the session.

We also set the allow_soft_placement parameter to True. As per TensorFlow documentation:

If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn’t exist, you can set allow_soft_placement to True in the configuration option when creating the session.

In the context of DSS, it is useful when using multi GPUs where the model is put to different GPUs then averaged.

ML API

DSS provides an Machine learning API to interact with models created in the visual machine learning part of the software. However, it is currently not available for Deep Learning.

Number of outputs in the model

Keras allows you to build multi output architectures. For example, to perform object detection inside images or videos, you want to be able to predict the type of object and its position in the image.

Currently, the Deep Learning is only available in the Prediction part of the Visual Machine Learning, which can treat various problems: Regression, Binary Classification and Multiclass Classification. All those types of problems require one output. Therefore it is currently not possible to have several outputs in an architecture.

Enforced code environment for Project

DSS allows you to select a preferred code environment for a Project, and even allows you to enforce this code environment to be used in every code recipe/Visual ML. However, it currently conflicts with the process of pre-selecting a code environment when creating a new Deep Learning analysis, that will always override this behaviour by selecting a code environment with the appropriate packages.

If you want to enforce the usage of a code environment in your project, you would need to install the appropriate packages on it for being able to run deep-learning (see Runtime and GPU support). Then, when creating a new analysis, you should go to the Python Environment tab, and select Inherit project default (and ignore the warning that is displayed).