Using cgroups for resource control

Note

If you are using a Dataiku Cloud Stacks installation, cgroups are managed for you automatically, and you do not need to follow these instructions.

DSS can automatically classify a large number of its local subprocesses in Linux cgroups for resource control and limitation.

Using this feature, you can restrict the memory, CPU and other resources used by most processes. The cgroups integration in DSS is very flexible and allows you to devise multiple resource allocation strategies:

  • Limiting resources for all processes from all users

  • Limiting resources by process type (i.e. a resource limit for notebooks, another one for webapps, …)

  • Limiting resources by user

  • Limiting resources by project key

cgroups resource limitation is available whether or not the User Isolation Framework is enabled.

Warning

This requires some understanding of the Linux cgroups mechanism.

Warning

cgroups support is only available for Linux and is not available for macOS.

Prerequisites

  • You need to have a Linux machine with cgroups enabled (this is the default on all recent DSS-supported distributions)

  • The DSS service account needs to have write access to one or several cgroups in which you want the DSS processes to be placed. This normally requires some action to be performed at system boot before DSS startup, and can be handled by the DSS-provided service startup script.
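The write-access prerequisite can be checked from the shell before touching the DSS configuration. The following is a minimal sketch, not a DSS-provided tool: `check_cgroup_write_access` is a hypothetical helper name, and the two directories are the ones assumed in the configuration examples of this page. It is meant to be run as the DSS service account.

```shell
# Hypothetical helper: report, for each directory given as argument, whether
# it exists and is writable by the account running this script.
check_cgroup_write_access() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "OK            $dir"
        else
            echo "NOT WRITABLE  $dir"
            status=1
        fi
    done
    return "$status"
}

# Assumed paths, taken from the examples on this page.
check_cgroup_write_access /sys/fs/cgroup/memory/DSS /sys/fs/cgroup/cpu/DSS \
    || echo "Some cgroup directories are missing or not writable"
```

A non-zero return status means that at least one directory still needs to be created or have its ownership changed by root.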

Applicability

cgroups restriction applies to:

  • Python and R recipes

  • PySpark, SparkR and sparklyr recipes (this only applies to the driver; executors are covered by the cluster manager and Spark-level configuration keys)

  • Python and R recipes from plugins

  • Python, R and Scala notebooks (not differentiated, same limits for all 3 types)

  • In-memory visual machine learning and deep learning (scikit-learn, computer vision and Keras backends; the MLlib backend is covered by the cluster manager and Spark-level configuration keys)

  • Webapps (Shiny, Bokeh, Dash and Python backend of HTML webapps, not differentiated, same limits for all 4 types)

  • Interactive statistics

  • Statistics recipes (for univariate analysis, PCA and statistical test recipes)

cgroups restrictions do not apply to:

Note

cgroups do not apply to recipes and machine learning that use containerized execution. See the containerized execution documentation for more information about processes and controlling memory usage for containers.

Configuration

All configuration for cgroups integration is done by the DSS administrator in Administration > Settings > Resource control.

You need to configure:

  • The global root (mount point) for the cgroups hierarchy on the DSS Linux host. On recent Linux systems this is typically /sys/fs/cgroup (on older systems such as RedHat 6, it might be /cgroup).

  • For each kind of process, the list of cgroups in which you want it placed, relative to this root. Each entry of the list can refer to some variables for dynamic placement

  • For each cgroup (which can also refer to some variables), the limits to apply. Refer to Linux cgroups documentation for available limits.

Configuration example 1

Suppose you want to implement the following policy (it is purely illustrative; such a policy would be unusual in practice):

  • The total memory for each user (counting notebooks and recipes) may not exceed 3 GB

  • The memory for the notebooks of each user may not exceed 1 GB

  • The CPU for all notebooks in aggregate may not exceed 100% (i.e. one core)

In most Linux distributions, the “cpu” and “memory” controllers are mounted in different hierarchies, generally /sys/fs/cgroup/memory and /sys/fs/cgroup/cpu.

You will first need to make sure that you have write access to a cgroup within each of these hierarchies. Let’s say that the DSS user has write access to /sys/fs/cgroup/memory/DSS and /sys/fs/cgroup/cpu/DSS.

Global cgroups configuration

  • Check Enable cgroups support

  • Configure Hierarchies mount root to /sys/fs/cgroup

Placements configuration

Under cgroups placements, configure the following:

Placement of notebooks

Add the following target cgroups to Jupyter kernels (Python, R, Scala):

  • memory/DSS/${user}/notebooks

  • cpu/DSS/notebooks

When user u1 starts a notebook, its process will be placed in both /sys/fs/cgroup/memory/DSS/u1/notebooks and /sys/fs/cgroup/cpu/DSS/notebooks.

Placement of Python and R recipes

Add the following target cgroups to Python + R recipes:

  • memory/DSS/${user}/recipes

  • cpu/DSS/recipes

When user u1 starts a Python or R recipe, its processes will be placed in both /sys/fs/cgroup/memory/DSS/u1/recipes and /sys/fs/cgroup/cpu/DSS/recipes.
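The way a placement entry turns into a directory can be illustrated with a few lines of shell. This is only a sketch of the substitution described above, not the DSS implementation: `resolve_placement` is a hypothetical helper, and it assumes the user name contains no characters that are special to sed (such as `|` or `&`).

```shell
# Hypothetical helper: join the hierarchies mount root and a placement
# template, replacing every literal ${user} with the given user name.
resolve_placement() {
    root=$1
    template=$2
    login=$3
    printf '%s/%s\n' "$root" "$template" | sed "s|[$]{user}|${login}|g"
}

# Single quotes keep ${user} literal so that the shell does not expand it.
resolve_placement /sys/fs/cgroup 'memory/DSS/${user}/notebooks' u1
resolve_placement /sys/fs/cgroup 'memory/DSS/${user}/recipes' u1
resolve_placement /sys/fs/cgroup 'cpu/DSS/recipes' u1
```

The three calls print /sys/fs/cgroup/memory/DSS/u1/notebooks, /sys/fs/cgroup/memory/DSS/u1/recipes and /sys/fs/cgroup/cpu/DSS/recipes. Entries without a variable, such as the cpu ones, resolve to the same directory for every user.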

Limits configuration

Now that processes are placed in cgroups, we need to implement the desired resource limitations.

Under cgroups limits, configure the following:

Global per-user memory restriction

  • Path template: memory/DSS/${user}

  • Limits:

    • memory.limit_in_bytes : 3G

When placing a process in a given cgroup, DSS evaluates all limit configuration rules and applies those which match the target cgroup or one of its parents.

Here, we put all user processes below /sys/fs/cgroup/memory/DSS/${user}, so the global per-user limit is enforced.

Per-user notebooks memory restriction

  • Path template: memory/DSS/${user}/notebooks

  • Limits:

    • memory.limit_in_bytes : 1G

Here, we put all notebook processes for this user in /sys/fs/cgroup/memory/DSS/${user}/notebooks, so the notebook-specific limit is enforced in addition to the above per-user limit.
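The "target cgroup or one of its parents" rule can be sketched in shell. This is an illustration of the matching logic only, under stated assumptions: the helper names are hypothetical, the rules are written with ${user} already resolved to u1, and the "path limit" text format is invented for the sketch and is not how DSS stores its settings.

```shell
# Print a cgroup path followed by each of its parents, one per line.
cgroup_and_parents() {
    cg_path=$1
    while [ -n "$cg_path" ]; do
        echo "$cg_path"
        case "$cg_path" in
            */*) cg_path=${cg_path%/*} ;;
            *)   cg_path= ;;
        esac
    done
}

# Limit rules of this example, resolved for user u1 (assumed format:
# one "path limit" pair per line).
rules='memory/DSS/u1 memory.limit_in_bytes=3G
memory/DSS/u1/notebooks memory.limit_in_bytes=1G
cpu/DSS/notebooks cpu.cfs_quota_us=1000000'

# Print every rule whose path is the target cgroup or one of its parents.
matching_limits() {
    cgroup_and_parents "$1" | while read -r candidate; do
        printf '%s\n' "$rules" | while read -r rule_path limit; do
            if [ "$rule_path" = "$candidate" ]; then
                echo "$rule_path: $limit"
            fi
        done
    done
}

matching_limits memory/DSS/u1/notebooks
```

For the notebook cgroup of u1, both the 1G notebook rule and the 3G per-user rule are printed, which is why a notebook is subject to both limits. For memory/DSS/u1/recipes, only the 3G rule would match.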

CPU restrictions for all notebooks

  • Path template: cpu/DSS/notebooks

  • Limits:

    • cpu.cfs_period_us : 1000000

    • cpu.cfs_quota_us : 1000000

Since we placed all notebooks at the same level in the CPU hierarchy, we can limit them directly: a quota equal to the period corresponds to 100% of one core.
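The relationship between the two CPU settings is a simple ratio: the cgroup may run for cpu.cfs_quota_us microseconds of CPU time in every window of cpu.cfs_period_us microseconds. The sketch below works through the arithmetic; both function names are hypothetical and it uses integer arithmetic only.

```shell
# Hypothetical helper: CPU limit, as a percentage of one core, implied by a
# cpu.cfs_quota_us / cpu.cfs_period_us pair.
cpu_limit_percent() {
    quota_us=$1
    period_us=$2
    echo $(( quota_us * 100 / period_us ))
}

# Hypothetical helper: quota to configure for a whole number of cores.
quota_for_cores() {
    cores=$1
    period_us=$2
    echo $(( cores * period_us ))
}

cpu_limit_percent 1000000 1000000   # prints 100: the limit configured above
cpu_limit_percent 500000 1000000    # prints 50: half a core
quota_for_cores 2 1000000           # prints 2000000: quota for two cores
```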

Configuration example 2

To implement the following policy:

  • The total memory by all notebooks across the system may not exceed 25 GB (to protect other critical resources)

In most recent Linux distributions, the “memory” controller is mounted at /sys/fs/cgroup/memory.

You will first need to make sure that you have write access to a cgroup within this hierarchy. Let’s say that the DSS user has write access to /sys/fs/cgroup/memory/DSS.

Global cgroups configuration

  • Check Enable cgroups support

  • Configure Hierarchies mount root to /sys/fs/cgroup

Placements configuration

Under cgroups placements, configure the following:

Placement of notebooks

Add the following target cgroup to Jupyter kernels (Python, R, Scala):

  • memory/DSS/notebooks/${user}

When user u1 starts a notebook, its process will be placed in /sys/fs/cgroup/memory/DSS/notebooks/u1

Limits configuration

Now that processes are placed in cgroups, we need to implement the desired resource limitations.

Under cgroups limits, configure the following:

Global notebooks memory restriction

  • Path template: memory/DSS/notebooks

  • Limits:

    • memory.limit_in_bytes : 25G

When placing a process in a given cgroup, DSS evaluates all limit configuration rules and applies those which match the target cgroup or one of its parents.

Here, you may have noticed that we have actually placed the notebooks in /sys/fs/cgroup/memory/DSS/notebooks/${user} but restricted /sys/fs/cgroup/memory/DSS/notebooks (effectively limiting the cumulative memory consumption of notebooks for all users). We could have simply placed all notebooks in the same cgroup as the one we’re limiting.

However, creating one cgroup per user for notebooks allows for better accounting. cgroups can be used to implement limitations, but each cgroup also contains accounting files that show how much memory the notebooks of each user are consuming, all while respecting the global limit.
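As an illustration of this accounting, the sketch below lists the current memory usage of each per-user notebook cgroup. It assumes the layout of this example and the memory.usage_in_bytes file exposed by the cgroup memory controller; `report_memory_usage` is a hypothetical helper, not part of DSS.

```shell
# Hypothetical helper: for each child cgroup of the given directory, print
# its name (here, the user) and its current memory usage in bytes.
report_memory_usage() {
    parent=$1
    for cg in "$parent"/*/; do
        usage_file="${cg}memory.usage_in_bytes"
        # Skip entries that are not cgroups or that cannot be read.
        [ -r "$usage_file" ] || continue
        printf '%s %s\n' "$(basename "$cg")" "$(cat "$usage_file")"
    done
}

# Assumed path: the notebooks cgroup limited in this example.
report_memory_usage /sys/fs/cgroup/memory/DSS/notebooks
```

Reading memory.usage_in_bytes directly in /sys/fs/cgroup/memory/DSS/notebooks gives the aggregate figure that the 25G limit applies to.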

Creating DSS-specific cgroups at boot time

DSS requires write access to those subdirectories of the global cgroup hierarchies for which you have configured placement or resource limitation rules.

Because the cgroup hierarchy is writable only by root by default, you will need to create these subdirectories and change their permissions accordingly before DSS can use them. Moreover, these subdirectories do not persist across system reboots, so you would typically configure a boot-time action for this, to be run before DSS startup.

For example, assuming that:

  • the global cgroup root on your system is /sys/fs/cgroup

  • you have configured a rule placing some processes into memory/DSS

  • the DSS service account is dataiku

you would need to issue the following commands as root:

mkdir -p /sys/fs/cgroup/memory/DSS
chown -Rh dataiku /sys/fs/cgroup/memory/DSS

Note

To avoid conflicts with other parts of the system which manage cgroups (e.g. systemd, Docker), it is advised to configure dedicated subdirectories within the cgroup hierarchies for DSS use. These subdirectories may be located at the top level of their respective controller hierarchies (cpu, memory, etc.) or be subdirectories of systemd-managed directories.

The DSS-provided service management script which can be installed as described here can optionally create cgroup directories for DSS before starting DSS itself. To configure this, edit the service configuration file (at /etc/default/dataiku[.INSTANCE_NAME] or /etc/sysconfig/dataiku[.INSTANCE_NAME]) and add the following variable definitions:

  • DIP_CGROUP_ROOT [optional] : global hierarchies mount root for your system (default /sys/fs/cgroup)

  • DIP_CGROUPS : colon-separated list of cgroup directories to create, relative to the global hierarchies mount root

Example:

# Service configuration file for Dataiku DSS instance dataiku
DIP_HOME="/data/dataiku/dss_datadir"
DIP_USER="dataiku"

# Create /sys/fs/cgroup/cpu/DSS and /sys/fs/cgroup/memory/DSS on startup
DIP_CGROUPS="cpu/DSS:memory/DSS"
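To see which directories a given DIP_CGROUPS value refers to, the colon-separated list can be expanded against the mount root. The following is a sketch of that expansion only; it is not the code of the DSS service script, `expand_cgroup_dirs` is a hypothetical helper, and it assumes that entries contain no whitespace or glob characters.

```shell
# Hypothetical helper: print the absolute directory for each entry of a
# colon-separated list, relative to the hierarchies mount root.
expand_cgroup_dirs() {
    root=$1
    list=$2
    old_ifs=$IFS
    IFS=:
    for entry in $list; do
        if [ -n "$entry" ]; then
            printf '%s/%s\n' "$root" "$entry"
        fi
    done
    IFS=$old_ifs
}

# Values from the example service configuration file above.
expand_cgroup_dirs /sys/fs/cgroup "cpu/DSS:memory/DSS"
```

This prints /sys/fs/cgroup/cpu/DSS and /sys/fs/cgroup/memory/DSS, the two directories named in the comment of the example file.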

Additional setup for User Isolation Framework deployments

When DSS is configured with User Isolation Framework enabled, the cgroup hierarchies which are under control of DSS should be added to the additional_allowed_file_dirs configuration key under section [dirs] of the /etc/dataiku-security/INSTALL_ID/security-config.ini configuration file (you can find the INSTALL_ID in DATADIR/install.ini).

Example:

[dirs]
dss_datadir = /data/dataiku/dss_datadir
additional_allowed_file_dirs = /sys/fs/cgroup/cpu/DSS;/sys/fs/cgroup/memory/DSS