Operations (Python)

Note

You need to have specific permissions to create, modify and use code environments. If you do not have these permissions, contact your DSS administrator.

Create a code environment

  • Go to Administration > Code envs

  • Click on “New Python env”

  • Give an identifier to your code environment. Only use A-Z, a-z, digits and hyphens

Note

Code environment identifiers must be globally unique to the DSS instance, so use a complete and descriptive identifier

  • Choose the Python version that you want to use. DSS is compatible with Python versions 2.7, 3.6, 3.7, 3.8 and 3.9.

Note

  • The requested version of Python must be installed on your system (by your system administrator)

  • In most cases, you also need the Python development headers packages in order to install packages with pip. Depending on the OS, this system package (to be installed by the system administrator) is called “libpython-dev” or “python-devel”

  • Click on “Create”

  • DSS creates the code environment and installs the minimal set of packages

Note

To use Visual Machine Learning, Visual Deep Learning or Time series forecasting, additional packages are required. They can be added in the “Packages to install” tab after code env creation by clicking “Add sets of packages”.

Warning

  • Visual Machine Learning and Visual Deep Learning code envs are compatible with Python 2.7, 3.6 and 3.7.

  • Time series code envs are compatible with Python 3.6 and 3.7.

  • You are taken to the new environment page

Manage packages

You can manage the list of packages to install by clicking on the “Packages to install” tab.

You see here two lists:

  • A non-editable list of the “Base Packages”. These are packages that are required by your current settings. These packages cannot be removed, and you cannot modify their version. For more information, see Base packages

  • An editable list of “Requested Packages”. This is where you write the list of packages that you want in your virtual environment. To quickly add the required packages for visual machine learning and deep learning on CPU or GPU, click on “Add Sets of Packages” and make your selections. The required packages will be added to the Requested Packages list.

The list of requested packages is in the requirements.txt file format (see the documentation about the format of requirements.txt). Each line must be a package name, optionally with constraints information.

For example:

  • tabulate

  • sklearn==0.18.2

  • sklearn>0.19

Once you have written the packages you want, click on Save and update. DSS downloads and installs the newly required packages

Afterwards, you can inspect the exact installed versions in the “Actually installed packages” tab.

Installing packages not available through pip

Some packages aren’t directly available from pip and need to be installed from the source code. To install such a package in a code environment, you should:

  • download the source code of the package on the DSS server

  • in the “Packages to install” section of your code environment, fill the “Requested packages” field with:

    • /path/to/package/source.zip for zipped or gzipped packages

    • -e /path/to/package/source for unzipped packages where source is a directory that contains a setup.py file

  • click on “Save and update”.

Warning

  • This operation is not possible for a combined use with containerized execution and model API deployment on Kubernetes.

  • For automation/API nodes, the package must exist at the same path on the server.

Managed code environment resources directory

The resources directory allows you to download/upload resources to a directory managed by the code environment, and set environment variables that will be loaded at runtime. This makes those resources available to all the recipes, notebooks, etc. that use this code environment.

Note

The typical use case is to download heavy pre-trained deep learning models to the resources directory, by settings the framework’s cache directory environment variable (e.g. TFHUB_CACHE_DIR for TensorFlow, TORCH_HOME for PyTorch, HF_HOME for Hugging Face etc).

Manage the code environment resources directory in the “Resources” tab:

  • Write the resources initialization script (code samples for common deep learning frameworks are available). This script is executed when updating the code environment, if the “Run resources init script” option is active.

  • Choose whether the resources initialization script will be executed or not when building this code environment on an API node.

  • View the environment variables to load at runtime (set by the initialization script).

  • Explore the resources directory.

Warning

  • Code environment resources require the core packages to be installed.

  • Code environment resources are not supported on external conda code environments.

  • Code environment resources are not made available to Spark executors.

Using custom package repositories

On the Design or Automation Nodes, custom repositories can be set via GUI by defining extra “options for ‘pip install’” at Admin > Settings > Misc. under the “Code Envs” section. Each option must be added on a separate line.

For example:

  • --index-url=http://custom.pip.repo

  • --extra-index-url=http://custom.pip.repo/sample

  • --trusted-host=custom.pip.repo

On the API node, custom repositories can be set by editing the config/server.json file. The codeEnvsSettings field contains pipInstallExtraOptions where you can set required options.

For example:

"codeEnvsSettings": {
    "preventOverrideFromImportedEnvs": true,
    "condaInstallExtraOptions": [],
    "condaCreateExtraOptions": [],
    "pipInstallExtraOptions": [
        "--index-url", "http://custom.pip.repo",
        "--extra-index-url", "http://custom.pip.repo/sample",
        "--trusted-host", "custom.pip.repo"
    ],
    "virtualenvCreateExtraOptions": [],
    "cranMirrorURL": "https://your.cran.mirror"
}

Containerized execution

When running DSS processes in containers, you can specify which containers should include this code environment.

Code Environments Management On Dataiku Cloud

On Dataiku Cloud the management of Python environements is accessible in the Launchpad in the Code Environments tab.

For more details see see the dedicated documentation.