Using cgroups for resource control¶
Note
If you are using a Dataiku Cloud Stacks installation, cgroups are automatically managed for you, and you do not need to follow these instructions.
DSS can automatically classify a large number of its local subprocesses in Linux cgroups for resource control and limitation.
Using this feature, you can restrict the memory, CPU, and other resources used by most processes. The cgroups integration in DSS is very flexible and allows you to devise multiple resource allocation strategies:
Limiting resources for all processes from all users
Limiting resources by process type (e.g. one resource limit for notebooks, another one for webapps, …)
Limiting resources by user
Limiting resources by project key
Resource limitation with cgroups is possible whether or not the User Isolation Framework is enabled.
Warning
This requires some understanding of the Linux cgroups mechanism.
Warning
cgroups support is only available for Linux and is not available for macOS.
Prerequisites¶
You need to have a Linux machine with cgroups enabled (this is the default on all recent DSS-supported distributions).
The DSS service account needs to have write access to one or several cgroups in which you want the DSS processes to be placed. This normally requires some action to be performed at system boot before DSS startup, and can be handled by the DSS-provided service startup script.
Applicability¶
cgroups restriction applies to:
Python and R recipes
PySpark, SparkR and sparklyr recipes (only applies to the driver part, executors are covered by the cluster manager and Spark-level configuration keys)
Python and R recipes from plugins
Python, R and Scala notebooks (not differentiated, same limits for all 3 types)
In-memory visual machine learning and deep learning (scikit-learn, computer vision and Keras backends; the MLlib backend is covered by the cluster manager and Spark-level configuration keys)
Webapps (Shiny, Bokeh, Dash and Python backend of HTML webapps, not differentiated, same limits for all 4 types)
Interactive statistics
cgroups restrictions do not apply to:
The DSS backend itself. For memory tuning of the backend, see Tuning and controlling memory usage
Execution of jobs with the DSS engine (prepare recipe and others). For memory tuning of the jobs, see Tuning and controlling memory usage
The DSS public API, which runs as part of the backend
Custom Python steps and triggers in scenarios
Note
cgroups do not apply to recipes and machine learning tasks that use containerized execution. See the containerized execution documentation for more information about processes and controlling memory usage for containers.
Configuration¶
All configuration for cgroups integration is done by the DSS administrator in Administration > Settings > Resource control.
You need to configure:
The global root (mount point) for the cgroups hierarchy on the DSS Linux host. On recent Linux systems this is typically /sys/fs/cgroup (on earlier systems, e.g. RedHat 6, it might be /cgroup).
For each kind of process, the list of cgroups in which you want it placed, relative to this root. Each entry of the list can refer to some variables for dynamic placement.
For each cgroup (which can also refer to some variables), the limits to apply. Refer to the Linux cgroups documentation for available limits.
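To make the relationship between the mount root, a placement entry and a variable concrete, here is a minimal Python sketch. It is not how DSS performs the substitution internally; the helper name resolve_cgroup_path is hypothetical, and the only variable used is ${user}, which appears in the examples of this page.

```python
from pathlib import PurePosixPath
from string import Template


def resolve_cgroup_path(mount_root, template, variables):
    """Expand ${...} variables in a relative cgroup template and
    join the result to the hierarchies mount root.

    Hypothetical helper for illustration only; DSS does this internally.
    """
    relative = Template(template).substitute(variables)
    return str(PurePosixPath(mount_root) / relative)


# A placement entry relative to the root, resolved for user "u1"
print(resolve_cgroup_path("/sys/fs/cgroup", "memory/DSS/${user}/notebooks", {"user": "u1"}))
```

With the values above, the resolved directory is /sys/fs/cgroup/memory/DSS/u1/notebooks.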
Configuration example 1¶
Suppose you want to implement the following policy (this is purely illustrative; such a policy would be unusual in practice):
The total memory for each user (counting notebooks and recipes) may not exceed 3 GB
The memory for the notebooks of each user may not exceed 1 GB
The CPU for all notebooks in aggregate may not exceed 100% (i.e. one core)
In most Linux distributions, the “cpu” and “memory” controllers are mounted in different hierarchies, generally /sys/fs/cgroup/memory and /sys/fs/cgroup/cpu.
You will first need to make sure that you have write access to a cgroup within each of these hierarchies. Let’s say that the DSS user has write access to /sys/fs/cgroup/memory/DSS and /sys/fs/cgroup/cpu/DSS.
Global cgroups configuration¶
Check Enable cgroups support
Configure Hierarchies mount root to /sys/fs/cgroup
Placements configuration¶
Under cgroups placements, configure the following:
Placement of notebooks¶
Add the following target cgroups to Jupyter kernels (Python, R, Scala):
memory/DSS/${user}/notebooks
cpu/DSS/notebooks
When user u1 starts a notebook, its process will be placed in /sys/fs/cgroup/memory/DSS/u1/notebooks and /sys/fs/cgroup/cpu/DSS/notebooks.
Placement of Python and R recipes¶
Add the following target cgroups to Python + R recipes:
memory/DSS/${user}/recipes
cpu/DSS/recipes
When user u1 starts a Python or R recipe, its processes will be placed in /sys/fs/cgroup/memory/DSS/u1/recipes and /sys/fs/cgroup/cpu/DSS/recipes.
Limits configuration¶
Now that processes are placed in cgroups, we need to implement the desired resource limitations.
Under cgroups limits, configure the following:
Global per-user memory restriction¶
Path template:
memory/DSS/${user}
Limits:
memory.limit_in_bytes: 3G
When placing a process in a given cgroup, DSS evaluates all limit configuration rules and applies those which match the target cgroup or one of its parents.
Here, we put all user processes below /sys/fs/cgroup/memory/DSS/${user}, so the global per-user limit is enforced.
Per-user notebooks memory restriction¶
Path template:
memory/DSS/${user}/notebooks
Limits:
memory.limit_in_bytes: 1G
When placing a process in a given cgroup, DSS evaluates all limit configuration rules and applies those which match the target cgroup or one of its parents.
Here, we put all notebook processes for this user in /sys/fs/cgroup/memory/DSS/${user}/notebooks, so the notebook-specific limit is enforced in addition to the above per-user limit.
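The rule "limits apply when they match the target cgroup or one of its parents" can be sketched in Python as follows. This is one plausible reading of that sentence, not the actual DSS implementation; the function name applicable_limits and the RULES structure are hypothetical, and the two rules are the memory rules of this example.

```python
from string import Template

# The two memory limit rules of this example: (path template, limits)
RULES = [
    ("memory/DSS/${user}", {"memory.limit_in_bytes": "3G"}),
    ("memory/DSS/${user}/notebooks", {"memory.limit_in_bytes": "1G"}),
]


def applicable_limits(target, variables, rules=RULES):
    """Return the rules whose resolved path is the target cgroup itself
    or one of its parent directories (assumed matching semantics)."""
    matches = {}
    for template, limits in rules:
        rule_path = Template(template).substitute(variables)
        if target == rule_path or target.startswith(rule_path + "/"):
            matches[rule_path] = limits
    return matches


# A notebook of u1 picks up both the per-user and the notebook-specific limit
print(applicable_limits("memory/DSS/u1/notebooks", {"user": "u1"}))
# A recipe of u1 only picks up the per-user limit
print(applicable_limits("memory/DSS/u1/recipes", {"user": "u1"}))
```

Comparing on rule_path + "/" rather than on a bare string prefix avoids treating memory/DSS/u1 as a parent of memory/DSS/u10.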
CPU restrictions for all notebooks¶
Path template:
cpu/DSS/notebooks
Limits:
cpu.cfs_period_us: 1000000
cpu.cfs_quota_us: 1000000
Since we placed all notebooks at the same level in the CPU hierarchy, we can limit them directly.
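The two CFS values work as a ratio: the cgroup may consume cpu.cfs_quota_us microseconds of CPU time per cpu.cfs_period_us microseconds of wall-clock time, so quota divided by period gives the number of cores. The following Python sketch shows that arithmetic; the function names are hypothetical and are not part of DSS.

```python
def cores_allowed(quota_us, period_us):
    """Number of CPU cores a cgroup may use in aggregate.

    In cgroups v1 a negative quota (-1) means "no limit", reported here as None.
    """
    if quota_us < 0:
        return None
    return quota_us / period_us


def quota_for_cores(cores, period_us=1_000_000):
    """cpu.cfs_quota_us value that caps a cgroup at `cores` cores."""
    return int(cores * period_us)


print(cores_allowed(1_000_000, 1_000_000))  # the values used above: one core
print(quota_for_cores(2))                   # quota for a two-core cap
print(quota_for_cores(0.5))                 # quota for half a core
```

With the one-second period used in this example, capping all notebooks at two cores would mean setting cpu.cfs_quota_us to 2000000.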
Configuration example 2¶
To implement the following policy:
The total memory used by all notebooks across the system may not exceed 25 GB (to protect other critical resources)
In most recent Linux distributions, the “memory” controller is mounted at /sys/fs/cgroup/memory.
You will first need to make sure that you have write access to a cgroup within this hierarchy. Let’s say that the DSS user has write access to /sys/fs/cgroup/memory/DSS.
Global cgroups configuration¶
Check Enable cgroups support
Configure Hierarchies mount root to /sys/fs/cgroup
Placements configuration¶
Under cgroups placements, configure the following:
Placement of notebooks¶
Add the following target cgroup to Jupyter kernels (Python, R, Scala):
memory/DSS/notebooks/${user}
When user u1 starts a notebook, its process will be placed in /sys/fs/cgroup/memory/DSS/notebooks/u1.
Limits configuration¶
Now that processes are placed in cgroups, we need to implement the desired resource limitations.
Under cgroups limits, configure the following:
Global notebooks memory restriction¶
Path template:
memory/DSS/notebooks
Limits:
memory.limit_in_bytes: 25G
When placing a process in a given cgroup, DSS evaluates all limit configuration rules and applies those which match the target cgroup or one of its parents.
Here, you may have noticed that we have actually placed the notebooks in /sys/fs/cgroup/memory/DSS/notebooks/${user} but restricted /sys/fs/cgroup/memory/DSS/notebooks (effectively limiting the cumulative memory consumption of notebooks for all users). We could have simply placed all notebooks in the same cgroup as the one we’re limiting.
However, creating one cgroup for all notebooks of each user allows for better accounting: cgroups can be used to implement limitations, but each cgroup also contains accounting files that allow us to know how much memory the notebooks of each user are consuming, all while respecting the global limit.
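As an illustration of this accounting, the Python sketch below reads the memory.usage_in_bytes file (the cgroups v1 accounting file for current memory usage) from each per-user subdirectory. The function name is hypothetical, and the directory layout is assumed to be the one configured in this example.

```python
from pathlib import Path


def notebook_memory_by_user(notebooks_dir):
    """Map each per-user cgroup directory name to its current memory usage in bytes."""
    usage = {}
    for child in sorted(Path(notebooks_dir).iterdir()):
        usage_file = child / "memory.usage_in_bytes"
        # Skip the parent's own control files and any directory without accounting data
        if child.is_dir() and usage_file.is_file():
            usage[child.name] = int(usage_file.read_text().strip())
    return usage


# Path from this example; only read it if it exists on this machine
root = Path("/sys/fs/cgroup/memory/DSS/notebooks")
if root.is_dir():
    for user, used in notebook_memory_by_user(root).items():
        print(f"{user}: {used / 1024 ** 2:.1f} MiB")
```

Reading the same file directly in /sys/fs/cgroup/memory/DSS/notebooks gives the cumulative usage that is checked against the 25G limit.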
Creating DSS-specific cgroups at boot time¶
DSS requires write access to those subdirectories of the global cgroup hierarchies for which you have configured placement or resource limitation rules.
By default, the cgroup hierarchy is only writable by root, so you need to create these subdirectories and change their permissions accordingly before DSS can use them. Moreover, these subdirectories do not persist across system reboots, so you would typically configure a boot-time action to create them before DSS startup.
For example, assuming that:
the global cgroup root on your system is /sys/fs/cgroup
you have configured a rule placing some processes into memory/DSS
the DSS service account is dataiku
you would need to issue the following commands as root:
mkdir -p /sys/fs/cgroup/memory/DSS
chown -Rh dataiku /sys/fs/cgroup/memory/DSS
Note
To avoid conflicts with other parts of the system which manage cgroups (e.g. systemd, Docker), it is advised to configure dedicated subdirectories within the cgroup hierarchies for DSS use. These subdirectories may be located at the top level of their respective controller hierarchies (cpu, memory, etc.) or be subdirectories of systemd-managed directories.
The DSS-provided service management script which can be installed as described here can optionally create cgroup directories for DSS before starting DSS itself.
To configure this, edit the service configuration file (at /etc/default/dataiku[.INSTANCE_NAME] or /etc/sysconfig/dataiku[.INSTANCE_NAME]) and add the following variable definitions:
DIP_CGROUP_ROOT [optional]: global hierarchies mount root for your system (default /sys/fs/cgroup)
DIP_CGROUPS: colon-separated list of cgroup directories to create, relative to the global hierarchies mount root
Example:
# Service configuration file for Dataiku DSS instance dataiku
DIP_HOME="/data/dataiku/dss_datadir"
DIP_USER="dataiku"
# Create /sys/fs/cgroup/cpu/DSS and /sys/fs/cgroup/memory/DSS on startup
DIP_CGROUPS="cpu/DSS:memory/DSS"
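To check which directories a given DIP_CGROUPS value refers to, the expansion can be sketched in Python as below. This only mirrors the description above (colon-separated entries, relative to the mount root); it is not the service script itself, and the function name is hypothetical.

```python
def cgroup_dirs_to_create(dip_cgroups, dip_cgroup_root="/sys/fs/cgroup"):
    """Expand a colon-separated DIP_CGROUPS value into absolute directory paths."""
    root = dip_cgroup_root.rstrip("/")
    # Ignore empty entries such as those left by a trailing colon
    return [f"{root}/{entry}" for entry in dip_cgroups.split(":") if entry]


print(cgroup_dirs_to_create("cpu/DSS:memory/DSS"))
```

For the example configuration file, this yields /sys/fs/cgroup/cpu/DSS and /sys/fs/cgroup/memory/DSS, matching the comment in that file.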
Additional setup for User Isolation Framework deployments¶
When DSS is configured with User Isolation Framework enabled, the cgroup hierarchies which are under control of DSS should be added to the additional_allowed_file_dirs configuration key under section [dirs] of the /etc/dataiku-security/INSTALL_ID/security-config.ini configuration file (you can find the INSTALL_ID in DATADIR/install.ini).
Example:
[dirs]
dss_datadir = /data/dataiku/dss_datadir
additional_allowed_file_dirs = /sys/fs/cgroup/cpu/DSS;/sys/fs/cgroup/memory/DSS
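A quick consistency check between the cgroup directories DSS uses and this key can be sketched in Python as follows. It assumes, based on the example above, that the value is a semicolon-separated list, and it further assumes that an allowed directory also covers its subdirectories. The function name is hypothetical and the check is not part of DSS.

```python
import configparser


def uncovered_cgroup_dirs(security_config_text, cgroup_dirs):
    """Return the cgroup directories not covered by additional_allowed_file_dirs."""
    parser = configparser.ConfigParser()
    parser.read_string(security_config_text)
    raw = parser.get("dirs", "additional_allowed_file_dirs", fallback="")
    allowed = [d.strip().rstrip("/") for d in raw.split(";") if d.strip()]
    missing = []
    for directory in cgroup_dirs:
        directory = directory.rstrip("/")
        # Assumption: an allowed directory also covers everything below it
        if not any(directory == a or directory.startswith(a + "/") for a in allowed):
            missing.append(directory)
    return missing


EXAMPLE = """\
[dirs]
dss_datadir = /data/dataiku/dss_datadir
additional_allowed_file_dirs = /sys/fs/cgroup/cpu/DSS;/sys/fs/cgroup/memory/DSS
"""

print(uncovered_cgroup_dirs(EXAMPLE, ["/sys/fs/cgroup/cpu/DSS", "/sys/fs/cgroup/memory/DSS"]))
```

An empty result means every directory passed in is listed in the configuration file, either directly or through a parent directory.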