Python integration

DSS comes with native Python integration. The DSS installation phase creates an initial “builtin” Python environment, which is used to run all Python-based internal DSS operations, and is also used as a default environment to run user-provided Python code.

This builtin Python environment comes with a default set of packages, suitable for this version of DSS. These are set up by the DSS installer and updated accordingly on DSS upgrades. The builtin environment may be based on Python version 2.7, 3.6 or 3.7, to be chosen at installation time. See Customizing setup of the builtin Python environment.

Warning

Python 2.7 support is deprecated.

In addition to this builtin environment, DSS can dynamically build and manage multiple additional Python environments, to run user-provided Python code. These can be built with different versions of Python, and different sets of installed packages. See Code environments.

Installing Python packages

Warning

Installing Python packages in the builtin environment is highly discouraged. It can lead to package dependency conflicts with the mandatory set of packages provided by the DSS installer, and may complicate later DSS upgrades.

Prefer managed code environments instead. See Installing Python packages for details.

Customizing setup of the builtin Python environment

Warning

It is very rare to need to customize this. We strongly recommend that you only do it under instructions from Dataiku Support.

Rebuilding the builtin Python environment

It is possible to rebuild the builtin Python virtual environment, if necessary. This is the case if you moved or renamed DSS’s data directory, as Python virtual environments embed their full directory name. This may also be the case if you want to reset the virtualenv to a pristine state following installation or uninstallation of additional packages.
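To illustrate what "embed their full directory name" means in practice, here is a minimal sketch that checks whether a virtualenv was created under the path it currently lives in. It assumes the virtualenv sits in DATA_DIR/pyenv and that its bin/pip script starts with a "#!" line holding the absolute interpreter path, which is typical for virtualenvs but not guaranteed (very long paths or symlinked data directories may look different). The function name pyenv_path_is_current is hypothetical and not part of DSS.

```shell
#!/bin/sh
# Sketch only: detect a virtualenv whose scripts still point at an old path.
# Assumption: DATA_DIR/pyenv/bin/pip begins with "#!<absolute path>/python".

# Usage: pyenv_path_is_current DATA_DIR
# Returns 0 if the pip shebang points inside DATA_DIR/pyenv, 1 otherwise.
pyenv_path_is_current() {
    data_dir=$1
    pip_script="$data_dir/pyenv/bin/pip"

    # No pip script at all: treat the environment as unusable.
    [ -f "$pip_script" ] || return 1

    # Read only the first line, which carries the interpreter path.
    IFS= read -r shebang < "$pip_script" || [ -n "$shebang" ]

    case $shebang in
        "#!$data_dir/pyenv/bin/"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A non-zero result after moving or renaming the data directory would suggest that a rebuild is needed, for example: pyenv_path_is_current /path/to/DATA_DIR || echo "rebuild needed".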

The builtin Python virtualenv is automatically created by the installer when it is not present. The sequence of operations to reinitialize it thus consists of removing the virtualenv and rerunning the DSS installer in upgrade mode.

Warning

Doing this will automatically recreate the Python environment with Python 3.6 or Python 3.7 depending on your system. If your previous builtin environment was running Python 2.7, user code still using the builtin environment may need to be changed, and ML models trained with the builtin environment may need to be retrained.

# Stop DSS
DATA_DIR/bin/dss stop

# Remove the virtualenv, keeping backup
mv DATA_DIR/pyenv DATA_DIR/pyenv.backup

# Reinstall DSS (upgrade mode), choosing the underlying base Python to use
dataiku-dss-VERSION/installer.sh -d DATA_DIR -u

# Start DSS
DATA_DIR/bin/dss start

# When everything is considered stable, remove the backup
rm -rf DATA_DIR/pyenv.backup
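
Because the rebuild may change the Python version of the builtin environment, it can help to record the version before removing the virtualenv and to compare it afterwards. The following is a minimal sketch under the assumption that the virtualenv exposes its interpreter as DATA_DIR/pyenv/bin/python; the function name pyenv_python_version is hypothetical and not part of DSS.

```shell
#!/bin/sh
# Sketch only: print the Python version of the builtin virtualenv.
# Assumption: the interpreter is available as DATA_DIR/pyenv/bin/python.

# Usage: pyenv_python_version DATA_DIR
# Prints the version number reported by the interpreter.
pyenv_python_version() {
    # Python 2.7 prints its version on stderr, Python 3 on stdout,
    # so both streams are merged before extracting the second field.
    "$1/pyenv/bin/python" --version 2>&1 | awk '{ print $2 }'
}
```

Running pyenv_python_version DATA_DIR before stopping DSS and again after restarting it shows whether user code and models relying on the builtin environment are affected by a version change.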

Advanced: using a fully custom Python environment

Warning

It is very rare to need to customize this. We strongly recommend that you only do it under instructions from Dataiku Support.

For non-standard needs, you can force DSS to use an externally-maintained Python installation by defining the DKUPYTHONBIN environment variable.

When this variable is defined, the precompiled third-party Python packages shipped with DSS are not used. You must make sure that the interpreter started by $DKUPYTHONBIN contains all packages required by DSS. Please refer to the script INSTALL_DIR/scripts/install/install-python-packages.sh, found in the DSS installation directory, for this purpose.
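
As a rough way to verify that a custom interpreter provides the packages DSS needs, the sketch below tries to import a list of modules with a given interpreter and reports the ones that fail. The function name check_python_imports is hypothetical, and the module names passed to it are up to you: the authoritative list is the one in install-python-packages.sh, which this sketch does not parse.

```shell
#!/bin/sh
# Sketch only: report modules that a given Python interpreter cannot import.
# Assumption: each argument after the interpreter is an importable module name.

# Usage: check_python_imports PYTHON_BIN MODULE...
# Prints "missing: MODULE" for each failed import; returns 1 if any failed.
check_python_imports() {
    python_bin=$1
    shift
    missing=0
    for module in "$@"; do
        if ! "$python_bin" -c "import $module" >/dev/null 2>&1; then
            echo "missing: $module"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_python_imports "$DKUPYTHONBIN" numpy pandas would check two illustrative modules. A successful import does not confirm that the installed version matches what DSS expects.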