Runtime and GPU support
Keras models can be trained and scored on a CPU or on one or more GPUs. Training on GPUs is much faster, especially when images are involved.
Deep Learning in DSS relies on Python libraries (such as Keras and TensorFlow) that are not shipped with the DSS built-in Python environment, so a dedicated code environment is required.
Before training your first deep learning model, you must create a code environment with the required packages. See Code environments for more information.
To simplify this, click “Add additional packages” in the “Packages to install” section of the code environment.
Several package lists are available, depending on the setup of the instance where DSS is installed.
If you have GPUs and your machine has a fully working CUDA + cuDNN installation (see below), you can select the appropriate GPU package set.
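Once the GPU packages are installed, you may want to confirm from a notebook that uses this code environment that TensorFlow was built with CUDA and actually sees the GPUs. The sketch below assumes TensorFlow 2.1 or later (for `tf.config.list_physical_devices`); `describe_gpu_setup` is a hypothetical helper name, and it reports rather than fails when TensorFlow is not installed.

```python
def describe_gpu_setup():
    """Report whether the TensorFlow in the current code env can use GPUs.

    Hypothetical helper; assumes TensorFlow >= 2.1 when it is installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": False, "built_with_cuda": False, "gpus": []}
    return {
        "tensorflow": True,
        # True only for TensorFlow builds compiled against CUDA.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Names of the GPU devices TensorFlow can see, e.g. for CUDA/cuDNN checks.
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(describe_gpu_setup())
```

An empty GPU list on a machine that has GPUs suggests that the CUDA/cuDNN installation or the installed TensorFlow build does not match.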
Once the proper environment is set up, you can create a Deep Learning model. DSS looks for an environment that has the required packages and selects it by default: it first looks for a GPU code environment and, if it does not find one, for a CPU code environment. If neither exists, a warning is displayed, and you will need to create a proper code env before you can train.
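The default-selection order described above can be summarized as a short sketch. The `CodeEnv` class, the `REQUIRED` package set, `select_default_env`, and `pick_env_or_warn` are hypothetical names for illustration only. This is not the DSS implementation, and the real package requirements depend on your instance.

```python
import warnings
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical minimal requirement; the real package lists depend on the instance.
REQUIRED = frozenset({"keras", "tensorflow"})


@dataclass(frozen=True)
class CodeEnv:
    name: str
    packages: frozenset
    gpu: bool  # True for an env built with the GPU package set


def select_default_env(envs: Iterable[CodeEnv], required=REQUIRED) -> Optional[CodeEnv]:
    """Prefer a GPU env with all required packages, then a CPU one, else None."""
    envs = list(envs)
    for want_gpu in (True, False):
        for env in envs:
            if env.gpu == want_gpu and required <= env.packages:
                return env
    return None


def pick_env_or_warn(envs: Iterable[CodeEnv]) -> Optional[CodeEnv]:
    env = select_default_env(envs)
    if env is None:
        # Mirrors the warning: a proper code env must be created before training.
        warnings.warn("No code environment with the deep learning packages was found.")
    return env
```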
You can select a different code environment at your own risk.
Selection of GPU
If the DSS instance has access to a GPU, you can choose to train the model on one or more GPUs when you click Train.
When you deploy a scoring or evaluation recipe, you can also choose to score or evaluate on one or more GPUs from the “Advanced” tab of the recipe.
If a model trained on a GPU code environment is deployed as a service endpoint on an API node, the endpoint will require access to a GPU on the API node, and will automatically use GPU resources.
For the time being, we force the model to use TensorFlow's “allow growth” option, i.e. to allocate only the GPU memory it needs instead of reserving all of it up front (see the TensorFlow documentation).
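DSS applies this setting itself for the models it trains. If you also load or train Keras models in your own notebooks or recipes with the same code environment, a minimal way to request the same behavior in TensorFlow 2.x is sketched below. In TensorFlow 1.x, the equivalent is setting `gpu_options.allow_growth = True` on a `tf.ConfigProto`. This is not necessarily how DSS configures it internally, and `enable_gpu_memory_growth` is a hypothetical helper name.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow 2.x to allocate GPU memory on demand ("allow growth").

    Returns the number of GPUs configured (0 if TensorFlow is missing).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before any GPU is initialized (i.e. before building
        # models or running ops), otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print(f"memory growth enabled on {enable_gpu_memory_growth()} GPU(s)")
```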
Using multiple GPUs for training
If you have access to GPUs, either on the DSS server itself or in available containers, you can train your model on them. You do not need to change the code of your architecture: DSS relies on Keras and TensorFlow to use the GPUs appropriately.
If several GPUs are available, a model can be trained on all of them to speed up training. DSS puts a copy of the architecture on each GPU, splits each batch equally between the GPUs, and sends each GPU its share of the data to compute the gradient. It then gathers the results to update the model, sends the new weights to each GPU, and repeats for the next batch. This relies on the multi_gpu_model utility of Keras.
This means that on each GPU, the actual batch_size is batch_size / n_gpus. Therefore, you should use a batch_size that is a multiple of the number of GPUs.
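To make the split, compute, gather, and update cycle concrete, here is a minimal pure-Python sketch of one synchronous data-parallel step. The "GPUs" are simulated as list shards and the model has a single weight. The function names are hypothetical, and this illustrates the general technique only. It is not how Keras's multi_gpu_model is implemented internally. Raising an error when the batch size is not a multiple of the number of GPUs is a choice made for this sketch.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def split_batch(xs, ys, n_gpus):
    """Split one batch into n_gpus equal shards, one per simulated GPU."""
    if len(xs) % n_gpus != 0:
        raise ValueError(f"batch_size {len(xs)} is not a multiple of n_gpus {n_gpus}")
    k = len(xs) // n_gpus  # per-GPU batch size = batch_size / n_gpus
    return [(xs[i * k:(i + 1) * k], ys[i * k:(i + 1) * k]) for i in range(n_gpus)]


def data_parallel_step(w, xs, ys, n_gpus, lr=0.01):
    """One synchronous update in which every replica starts from the same weight w."""
    shards = split_batch(xs, ys, n_gpus)
    # Each simulated GPU computes a gradient on its own shard.
    shard_grads = [mse_grad(w, sx, sy) for sx, sy in shards]
    # Gather: with equal-sized shards, the mean of the shard gradients
    # equals the gradient over the whole batch.
    grad = sum(shard_grads) / n_gpus
    # Update once; the new weight is what gets sent back to every replica.
    return w - lr * grad
```

Because the shards are equal in size, one step on two simulated GPUs produces the same weight as one step on a single device over the same batch.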
According to the Keras documentation, this “induces quasi-linear speedup on up to 8 GPUs”.
However, when comparing the speed of two trainings, always compare trainings with the same per-GPU batch_size. For example, if the first training runs on one GPU with a batch_size of 32 and the second on two GPUs, the second should use a batch_size of 64.
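The comparison rule above can be expressed as a small helper that refuses to compare runs with different per-GPU batch sizes and reports speedup and scaling efficiency. `Run`, `per_gpu_batch_size`, and `compare_runs` are hypothetical names. The epoch durations are measurements you supply, and an efficiency close to 1.0 corresponds to the quasi-linear speedup mentioned above.

```python
from typing import NamedTuple, Tuple


class Run(NamedTuple):
    n_gpus: int
    batch_size: int       # global batch_size set for the training
    epoch_seconds: float  # measured wall-clock time of one epoch


def per_gpu_batch_size(run: Run) -> int:
    if run.batch_size % run.n_gpus != 0:
        raise ValueError("batch_size should be a multiple of the number of GPUs")
    return run.batch_size // run.n_gpus


def compare_runs(baseline: Run, candidate: Run) -> Tuple[float, float]:
    """Return (speedup, efficiency) of candidate relative to baseline."""
    if per_gpu_batch_size(baseline) != per_gpu_batch_size(candidate):
        raise ValueError("compare trainings with the same per-GPU batch_size")
    speedup = baseline.epoch_seconds / candidate.epoch_seconds
    # Perfectly linear scaling gives speedup equal to the ratio of GPU counts.
    efficiency = speedup / (candidate.n_gpus / baseline.n_gpus)
    return speedup, efficiency
```

With the example configurations above, a 1-GPU run with batch_size 32 and a 2-GPU run with batch_size 64 are accepted, while a 2-GPU run with batch_size 32 raises a ValueError.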