Interaction between DSS and containers

There are three parts to the interaction of DSS and container engines:

Running DSS as a container

This is about running the entirety of a DSS design or automation node as a Docker container


Note: This section of the documentation is not about running DSS itself as a container. Please see /installation/docker


Running DSS itself as a container (either by running Docker directly, or through Kubernetes) is, generally speaking, incompatible with the ability to leverage containers as a processing engine

Running scalable API nodes as containers in Kubernetes

The DSS API node can run as multiple containers orchestrated by Kubernetes.


Note: This section of the documentation is not about running the API node in Kubernetes. Please see API Node & API Deployer: Real-time APIs

Containers as a processing engine

This is about running parts of the processing tasks of the DSS design and automation nodes on one or several hosts, powered by Docker or Kubernetes


Note: This is what this section of the documentation covers

Capabilities and benefits

DSS can run the following kinds of processing as containers:

  • Python and R recipes
  • Plugin-provided recipes (if they are written in Python or R)
  • Initial training of in-memory machine learning models, when using the “in-memory” engine (see In-memory Python (Scikit-learn / XGBoost))
  • Retraining of in-memory machine learning models
  • Scoring of in-memory machine learning models, when NOT using the “Optimized engine” (see Scoring engines); the optimized engine runs on Spark

DSS can run these containers either:

  • directly against one or several Docker daemons
  • by scheduling them in Kubernetes clusters

Running DSS processing in containers gives several key advantages:

  • Ability to scale processing of “local” code beyond the single DSS design/automation node machine. This is especially true when using Kubernetes
  • Ability to leverage processing nodes that may have different computing capabilities. This notably allows you to leverage remote machines that provide GPUs even though the DSS machine itself doesn’t. This is especially useful for deep learning (See Deep Learning)
  • Ability to restrict the resources used (CPU, memory, …), either per container (when using Docker directly) or globally (by using the resource management capabilities of Kubernetes)

Docker or Kubernetes?

DSS can run its workloads either directly using Docker, or through Kubernetes.

A Docker-only setup is much easier to set up, as any recent OS comes with full Docker execution capabilities. However, Docker itself is single-machine: while DSS can leverage multiple Docker daemons (see below), each workload must explicitly target a single machine.

With Docker, you can manage the resources used by each container, but cannot globally restrict resources used by the sum of all containers (or all containers of a user)
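As a sketch of this per-container control, plain Docker resource flags might look like the following (the image name, recipe script, and limit values are illustrative, not taken from DSS):

```shell
# Cap a single container at 2 CPUs and 4 GiB of RAM.
# Docker enforces these limits per container only; nothing here
# restricts the total used across all containers of a user.
# "my-dss-base-image" and "my_recipe.py" are illustrative names.
docker run --rm \
  --cpus 2 \
  --memory 4g \
  my-dss-base-image:latest \
  python my_recipe.py
```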

A Kubernetes setup offers much more flexibility:

  • Native ability to run on a cluster of machines. Kubernetes automatically places containers on machines depending on resource availability
  • Ability to globally control resource usage

While setting up a Kubernetes cluster can be a challenging task, all large cloud providers offer managed Kubernetes services.
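To illustrate the global resource control mentioned above, a Kubernetes ResourceQuota can cap the aggregate consumption of a namespace, regardless of how many containers are scheduled into it (the namespace and quota names and the values below are illustrative):

```shell
# Create an illustrative namespace and cap the total CPU and
# memory that all pods in it may request combined.
# "dss-workloads" and "dss-team-quota" are made-up names.
kubectl create namespace dss-workloads
kubectl create quota dss-team-quota \
  --namespace dss-workloads \
  --hard=requests.cpu=20,requests.memory=64Gi
```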

Container execution configurations

Each activity (recipe, machine learning, …) that you run on containers targets a specific “Container execution configuration”.


Docker execution configurations indicate:

  • Which base image to use (see below)
  • The host of the Docker daemon (by default, runs on the local Docker daemon)
  • Resource restriction keys (as specified by Docker)
  • Permissions: it is possible to restrict which user groups have the right to use a specific Docker execution configuration
  • Optionally, the image registry URL
  • Optionally, the Docker “runtime” (this is used for advanced use cases like GPUs)
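To give a rough idea of the last two items, the registry URL and runtime settings map to plain `docker` invocations along these lines (the registry host and image name are made up; `--runtime=nvidia` assumes the NVIDIA container runtime is installed on the host):

```shell
# Pull the base image from a private registry and run it with
# the NVIDIA runtime so the container can see the host's GPUs.
# "registry.example.com" and the image name are illustrative.
docker run --rm \
  --runtime=nvidia \
  registry.example.com/dss/dss-base-gpu:latest \
  nvidia-smi
```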


Kubernetes execution configurations indicate:

  • Which base image to use (see below)
  • The “context” for the kubectl command. This allows you to target multiple Kubernetes clusters or to use multiple sets of Kubernetes credentials
  • Resource restriction keys (as specified by Kubernetes)
  • The Kubernetes resource namespace for overall resource quota
  • The image registry URL
  • Permissions: it is possible to restrict which user groups have the right to use a specific Kubernetes execution configuration
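The “context” item above corresponds to standard kubectl configuration: each context in the local kubeconfig pairs a cluster with a set of credentials. For example (the context name is illustrative):

```shell
# List the contexts (cluster + credentials pairs) known to the
# local kubeconfig, switch to one, then confirm the selection.
# "prod-cluster" is an illustrative context name.
kubectl config get-contexts
kubectl config use-context prod-cluster
kubectl config current-context
```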

Multiple execution configurations

Since each execution configuration specifies resource restrictions, you can use multiple ones to provide differentiated container sizes and quotas to users.

Base images

DSS uses one or multiple Docker images that must be built prior to running any workload.

In the vast majority of cases, you’ll only have a single Docker base image, used for all container-based executions. At build time, you can choose whether the image includes:

  • R support
  • CUDA support for execution on GPUs
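Building the base image typically goes through the dssadmin tool shipped in the DSS data directory. The exact subcommand and flag names below are assumptions to verify against the documentation of your DSS version; the path is a placeholder:

```shell
# Build the container-exec base image from the DSS data directory,
# with R support enabled. The --type, --with-r and --with-cuda
# flags are assumptions; check your DSS version's documentation.
cd /path/to/DATA_DIR
./bin/dssadmin build-base-image --type container-exec --with-r
```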

For advanced use cases, you can build multiple base images, for example:

  • one base image with CUDA support, and one without
  • base images with additional system packages preinstalled

Support of code environments

Docker/Kubernetes execution capabilities are fully compatible with multiple code environments. You simply need to indicate for which Container execution configuration(s) your code environment must be made available.

See Using code envs with container execution