Python integration
DSS comes with native Python integration. The DSS installation phase creates an initial “builtin” Python environment, which is used to run all Python-based internal DSS operations, and is also used as a default environment to run user-provided Python code.
This builtin Python environment comes with a default set of packages suited to the installed version of DSS. These are set up by the DSS installer and updated on DSS upgrades. The builtin environment may be based on Python 2.7, 3.6 or 3.7, chosen at installation time. See Customizing setup of the builtin Python environment.
Warning
Python 2.7 support is deprecated.
In addition to this builtin environment, DSS can dynamically build and manage multiple additional Python environments, to run user-provided Python code. These can be built with different versions of Python, and different sets of installed packages. See Code environments.
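To confirm which environment a given piece of code actually runs in, a short check like the following can help. This is a minimal sketch using only the standard library; the function name `describe_interpreter` is illustrative and is not part of any DSS API.

```python
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this code."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # sys.prefix points at the root directory of the active environment.
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("{}: {}".format(key, value))
```

Running the same snippet in the builtin environment and in a managed code environment should show different `executable` and `prefix` values.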
Installing Python packages
Warning
Installing Python packages in the builtin environment is strongly discouraged. It can lead to dependency conflicts with the mandatory set of packages provided by the DSS installer, and may complicate later DSS upgrades.
Use managed code environments instead. See Installing Python packages for details.
Customizing setup of the builtin Python environment
Warning
It is very rare to need to customize this. We strongly recommend that you only do it under instructions from Dataiku Support.
Rebuilding the builtin Python environment
It is possible to rebuild the builtin Python virtual environment if necessary. This is needed if you moved or renamed the DSS data directory, as Python virtual environments embed their full directory path. It may also be useful if you want to reset the virtualenv to a pristine state after installing or uninstalling additional packages.
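One visible symptom of an embedded path is a script whose shebang line still names the old interpreter location. The sketch below scans a directory for such scripts. It assumes the standard virtualenv layout, with scripts under a `bin` subdirectory of the environment; `find_stale_shebangs` is a hypothetical helper name. It only inspects the first line of each file, so it will not catch every way a path can be embedded.

```python
import os


def find_stale_shebangs(bin_dir):
    """Return (script, interpreter) pairs whose shebang path no longer exists."""
    stale = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            # Cap the read so binary files without newlines stay cheap.
            first_line = handle.readline(512)
        if not first_line.startswith(b"#!"):
            continue  # binary file, or script without a shebang
        parts = first_line[2:].decode("utf-8", "replace").split()
        interpreter = parts[0] if parts else ""
        if os.path.isabs(interpreter) and not os.path.exists(interpreter):
            stale.append((name, interpreter))
    return stale
```

For example, `find_stale_shebangs("DATA_DIR/pyenv/bin")` returning a non-empty list after a move would suggest the environment needs rebuilding.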
The installer automatically creates the builtin Python virtualenv when it is not present. Reinitializing it therefore consists of removing the virtualenv and rerunning the DSS installer in upgrade mode.
Warning
Doing this will automatically recreate the Python environment with Python 3.6 or Python 3.7, depending on your system. If your previous builtin environment was running Python 2.7, user code still using the builtin environment may need to be changed, and ML models trained with the builtin environment may need to be retrained.
# Stop DSS
DATA_DIR/bin/dss stop
# Remove the virtualenv, keeping backup
mv DATA_DIR/pyenv DATA_DIR/pyenv.backup
# Reinstall DSS (upgrade mode), choosing the underlying base Python to use
dataiku-dss-VERSION/installer.sh -d DATA_DIR -u
# Start DSS
DATA_DIR/bin/dss start
# When everything is considered stable, remove the backup
rm -rf DATA_DIR/pyenv.backup
Advanced: using a fully custom Python environment
Warning
It is very rare to need to customize this. We strongly recommend that you only do it under instructions from Dataiku Support.
For non-standard needs, you can force DSS to use an externally maintained Python installation by defining the DKUPYTHONBIN environment variable.
When this variable is defined, the precompiled third-party Python packages shipped with DSS are not used. You must make sure that the interpreter started by $DKUPYTHONBIN contains all packages required by DSS. Please refer to the script INSTALL_DIR/scripts/install/install-python-packages.sh, found in the DSS installation directory, for this purpose.
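Before pointing DSS at an external interpreter, it can be useful to check that the interpreter can import the modules you expect. The following is a minimal sketch, not a DSS tool: `missing_packages` is a hypothetical helper, and the module names passed to it are placeholders. The actual requirements are defined by the install-python-packages.sh script mentioned above.

```python
import subprocess


def missing_packages(python_bin, module_names):
    """Return the modules from module_names that python_bin cannot import."""
    missing = []
    for module in module_names:
        # Run each import in the target interpreter, not the current one.
        exit_code = subprocess.call(
            [python_bin, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if exit_code != 0:
            missing.append(module)
    return missing
```

A typical call would pass the value of the DKUPYTHONBIN environment variable as `python_bin`, for example `missing_packages(os.environ["DKUPYTHONBIN"], ["json"])`, where the module list is illustrative only. An empty result means every listed module imported successfully in that interpreter.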