Concepts

Interaction between DSS and containers

There are three ways in which DSS can work with container engines.

  1. Parts of the processing tasks of the DSS design and automation nodes can run on one or several hosts, powered by Docker or Kubernetes. This section of the documentation, Running in containers, covers this scenario.
  2. The DSS API node can run as multiple containers orchestrated by Kubernetes. For details of this scenario, please see API Node & API Deployer: Real-time APIs.
  3. You can run the entirety of a DSS design or automation node as a Docker container. For details on this scenario, please see Running DSS as a Docker container.

Note

Running DSS itself as a container (either by running Docker directly, or through Kubernetes) is, generally speaking, incompatible with leveraging containers as a processing engine.

Capabilities and benefits

DSS can run the following kinds of processing as containers:

  • Python and R recipes
  • Plugin-provided recipes (if they are written in Python or R)
  • Initial training of in-memory machine learning models (when using the “in-memory” engine; see In-memory Python (Scikit-learn / XGBoost))
  • Retraining of in-memory machine learning models
  • Scoring of in-memory machine learning models when NOT using the “Optimized engine” (See Scoring engines). The optimized engine can run on Spark.

DSS can run these containers either:

  • directly against one or several Docker daemons
  • by scheduling them in Kubernetes clusters

Running DSS processing in containers gives several key advantages:

  • Ability to scale processing of “local” code beyond the single DSS design/automation node machine. This is especially true when using Kubernetes
  • Ability to leverage processing nodes that may have different computing capabilities. This notably allows you to leverage remote machines that provide GPUs even though the DSS machine itself doesn’t. This is especially useful for deep learning (See Deep Learning)
  • Ability to restrict the used resources (cpu, memory, …) either per container or globally by using the resource management capabilities of Kubernetes. If using Docker directly, you can specify those restrictions per container in DSS.
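
As an illustration of the per-container restrictions mentioned above, these map to standard Docker resource flags. The following is a hypothetical manual invocation, not DSS’s actual command line (the image name, limits and script are made up):

```shell
# Hypothetical example (image name and values are illustrative):
# cap the container at 2 CPU cores and 4 GiB of memory, similar to
# the per-container restrictions DSS can apply when using Docker.
docker run --rm --cpus=2 --memory=4g \
  my-dss-base-image python my_recipe.py
```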

Warning

The base image for containers only contains the basic Python packages required by DSS, and does not include packages that were manually added to DSS’s built-in Python environment. If you need those packages, use a code environment (recommended) or a custom base image.

Docker or Kubernetes?

DSS can run its workloads either directly using Docker, or through Kubernetes.

A Docker-only setup is much easier to set up, as any recent OS comes with full Docker execution capabilities. However, Docker itself is single-machine: while DSS can leverage multiple Docker daemons (see below), each workload must explicitly target a single machine.

With Docker, you can manage the resources used by each container, but you cannot globally restrict the resources used by all containers combined (or by all containers of a given user).

A Kubernetes setup offers much more flexibility:

  • Native ability to run on a cluster of machines. Kubernetes automatically places containers on machines depending on resources availability.
  • Ability to globally control resource usage.
  • Managed cloud Kubernetes services can have auto-scaling capabilities.

While setting up a Kubernetes cluster can be a challenging task, all large cloud providers offer managed Kubernetes services.
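
The global resource control mentioned above typically relies on standard Kubernetes objects such as ResourceQuota. A minimal sketch, assuming a hypothetical `dss` namespace and made-up limit values (not DSS-generated configuration):

```shell
# Hypothetical example: cap the aggregate resources of all containers
# running in the "dss" namespace (names and values are illustrative).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dss-containers-quota
  namespace: dss
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 128Gi
EOF
```

Such a quota applies to every container scheduled in the namespace, regardless of which user or workload created it.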

Container execution configurations

Each activity (recipe, machine learning, …) that you run on containers targets a specific “Container execution configuration”.

Docker

Docker execution configurations indicate:

  • Which base image to use (see below)
  • The host of the Docker daemon (by default, runs on the local Docker daemon)
  • Resource restriction keys (as specified by Docker)
  • Permissions: it is possible to restrict which user groups have the right to use a specific Docker execution configuration
  • Optionally, the image registry URL
  • Optionally, the Docker “runtime” (this is used for advanced use cases like GPUs)
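
To illustrate what the daemon host and “runtime” settings correspond to at the Docker level, here is a hypothetical manual equivalent (host address, runtime and image name are made up; this is not DSS’s actual invocation):

```shell
# Hypothetical example: target a remote Docker daemon and a specific
# runtime (e.g. for GPUs), as a Docker execution configuration does.
DOCKER_HOST=tcp://gpu-host.example.com:2376 \
  docker run --rm --runtime=nvidia my-dss-cuda-image nvidia-smi
```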

Kubernetes

Kubernetes execution configurations indicate:

  • Which base image to use (see below)
  • The “context” for the kubectl command. This allows you to target multiple Kubernetes clusters or to use multiple sets of Kubernetes credentials
  • Resource restriction keys (as specified by Kubernetes)
  • The Kubernetes resource namespace for overall resource quota
  • The image registry URL
  • Permissions: it is possible to restrict which user groups have the right to use a specific Kubernetes execution configuration
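
A kubectl “context” bundles a cluster, a namespace and a set of credentials. The commands below show how contexts are listed and selected with standard kubectl; the context and namespace names are hypothetical:

```shell
# List the contexts available in the local kubeconfig
kubectl config get-contexts

# Switch the default context (context name is illustrative)
kubectl config use-context my-cluster

# kubectl can also target a context and namespace per command
kubectl --context my-cluster --namespace dss get pods
```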

Multiple execution configurations

Since each execution configuration specifies resource restrictions, you can use multiple ones to provide differentiated container sizes and quotas to users.

Base images

DSS uses one or multiple Docker images that must be built prior to running any workload.

In the vast majority of cases, you’ll only have a single Docker base image, used for all container-based executions. At build time, you can choose whether the image includes:

  • R support
  • CUDA support for execution on GPUs

For advanced use cases, you can build multiple base images. This is useful, for example:

  • to have one base image with CUDA support, and one without
  • if you require additional base system packages
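
As a sketch, base images are built from the DSS data directory with the DSS admin tool. The exact command and flag names may vary by DSS version and should be checked against your installation’s documentation:

```shell
# Sketch: build the base image for container execution from the DSS
# data directory. Flags shown here (--with-r, --with-cuda) are
# illustrative of the R/CUDA build-time options and may differ by version.
./bin/dssadmin build-base-image --type container-exec
./bin/dssadmin build-base-image --type container-exec --with-r --with-cuda
```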

Support of code environments

Docker/Kubernetes execution capabilities are fully compatible with multiple code environments. You simply need to indicate for which Container execution configuration(s) your code environment must be made available.

See Using code envs with container execution