Python integration

DSS comes with Python integration builtin.

Additional prerequisites

As usual with Python package installation on Linux, you may need to install additional system dependencies if the target Python packages include native code. In particular, you may need the system development tools (“build-essential” on Debian/Ubuntu, “@Development tools” on RedHat/CentOS) and the Python interpreter header files (“python-dev” on Debian/Ubuntu, “python27-devel” on RedHat/CentOS 6.x, “python-devel” on RedHat/CentOS 7.x).
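As a rough illustration, the helper below prints the install command matching the package names listed above. The family labels ("debian", "redhat6", "redhat7") and the function name are made up for this sketch, and the commands are only printed, not executed; run them as root after checking them against your distribution.

```shell
# Print the install command for the build prerequisites of one distribution family.
# The family labels are hypothetical names chosen for this sketch.
prereq_command() {
    case "$1" in
        debian)  echo 'apt-get install build-essential python-dev' ;;
        redhat6) echo 'yum install "@Development tools" python27-devel' ;;
        redhat7) echo 'yum install "@Development tools" python-devel' ;;
        *)       echo "unsupported family: $1" >&2; return 1 ;;
    esac
}

prereq_command debian
```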

Builtin environment setup details

DSS requires a Python 2.7 interpreter. As part of the standard DSS installation, the installer checks for the distribution's default Python 2.7 packages and, if necessary, the dependency installation phase pulls them in.


On CentOS and RedHat 6.x, where the system’s version of Python is 2.6, Python 2.7 is pulled from the additional IUS repository.

The installation script locates the Python interpreter to use by looking up python2.7 in the standard PATH. It then proceeds to build a Python virtual environment on top of this interpreter, containing the standard Python packages shipped with DSS.
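To see in advance which interpreter such a PATH lookup would select, you can reproduce the lookup by hand. This is a minimal sketch that only mimics the lookup described above; it is not the installer's own code, and the function name is an assumption.

```shell
# Report which executable a PATH lookup of the given command name resolves to.
# Defaults to python2.7, the name the installer is described as looking up.
find_interpreter() {
    name="${1:-python2.7}"
    if path=$(command -v "$name" 2>/dev/null); then
        echo "$path"
    else
        echo "$name not found in PATH" >&2
        return 1
    fi
}

# Run this as the DSS Unix user so that the same PATH applies.
find_interpreter python2.7 || true
```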

DSS uses this builtin environment to run the Python code necessary for the proper working of DSS. User code can run either in the builtin environment or in a code environment.

The DATA_DIR/bin/pip command can be used to list or otherwise manage the contents of the builtin virtual environment.

For testing purposes, the Python interpreter of the builtin virtual environment used by DSS can be launched with DATA_DIR/bin/python.


If several Python 2.7 installations are available on your server, you can control which one DSS uses by adjusting the PATH environment variable of the DSS Unix user account so that the python2.7 command resolves to the desired interpreter. You should NOT use the DKUPYTHONBIN environment variable for this, as it would switch DSS to a different, advanced installation mode, described below.
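For instance, the following lines, placed in the profile of the DSS Unix user, put one particular installation first in the lookup order. The directory /opt/python27/bin is a hypothetical location used only for illustration.

```shell
# Hypothetical directory holding the preferred python2.7 binary.
PY27_DIR=/opt/python27/bin

# Prepend it so that a PATH lookup of "python2.7" finds this installation first.
PATH="$PY27_DIR:$PATH"
export PATH

# Show what the lookup now resolves to.
command -v python2.7 || echo "python2.7 not found in PATH"
```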


The native libraries of the standard Python packages shipped with DSS are built using UCS-4 Unicode characters. Make sure the default Python interpreter used by DSS has been built with --enable-unicode=ucs4. This is the default on most recent Linux distributions, but it is not the default when building Python interpreters directly from source.
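One way to check how an interpreter was built is to read sys.maxunicode, which is 1114111 on a UCS-4 build of Python 2.7 and 65535 on a UCS-2 build. The sketch below classifies that value; the function name is an assumption of this example.

```shell
# Classify the value of sys.maxunicode reported by a Python 2.7 interpreter.
unicode_width() {
    case "$1" in
        1114111) echo ucs4 ;;
        65535)   echo ucs2 ;;
        *)       echo unknown ;;
    esac
}

# Query the interpreter found in PATH, if there is one.
if command -v python2.7 >/dev/null 2>&1; then
    unicode_width "$(python2.7 -c 'import sys; print(sys.maxunicode)')"
fi
```

An interpreter reported as ucs2 is not compatible with the native libraries shipped with DSS.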

Rebuilding the builtin Python environment

It is possible to rebuild the builtin Python virtual environment, if necessary. This is the case if you moved or renamed DSS’s data directory, as Python virtual environments embed their full directory name. This may also be the case if you want to reset the virtualenv to a pristine state following installation or uninstallation of additional packages.

The builtin Python virtualenv is automatically created by the installer when it is not present. Reinitializing it thus consists of removing the virtualenv and reinstalling DSS, keeping track of any local packages that you want to reinstall afterwards:

# Stop DSS
DATADIR/bin/dss stop
# Save the list of locally-installed packages
DATADIR/bin/pip freeze -l >dss-local-packages.txt
# Remove the virtualenv, keeping backup
mv DATADIR/pyenv DATADIR/pyenv.backup
# Reinstall DSS (upgrade mode)
dataiku-dss-VERSION/ -d DATADIR -u
# Review and possibly edit the list of locally-installed packages
vi dss-local-packages.txt
# Reinstall local packages
DATADIR/bin/pip install -r dss-local-packages.txt
# Start DSS
DATADIR/bin/dss start
# When everything is considered stable, remove the backup
rm -rf DATADIR/pyenv.backup
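Before removing the backup, it can be useful to confirm that every package from the saved list is present in the rebuilt environment. The helper below is a sketch of such a comparison; the function name and the second file name (dss-packages-after.txt) are assumptions of this example.

```shell
# Print the requirement lines present in file $1 but absent from file $2.
missing_packages() {
    before=$(mktemp)
    after=$(mktemp)
    sort "$1" >"$before"
    sort "$2" >"$after"
    # comm -23 keeps the lines that appear only in the first sorted file.
    comm -23 "$before" "$after"
    rm -f "$before" "$after"
}

# Typical use after the rebuild:
#   DATADIR/bin/pip freeze -l >dss-packages-after.txt
#   missing_packages dss-local-packages.txt dss-packages-after.txt
```

An empty output means that every line of the saved list was reinstalled with the same version.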

Advanced: using a fully custom Python environment

For non-standard needs, you can force DSS to use an externally-maintained Python 2.7 installation by defining the DKUPYTHONBIN environment variable for the Linux user account running the Studio.


Using this mode is not officially supported and not recommended.

This variable points to the Python binary to use. It should be defined before running the installer, and for all subsequent runs of the Studio startup or management scripts. You would typically define it as follows:

$ echo "export DKUPYTHONBIN=/usr/local/bin/python" >>$HOME/.profile

When this variable is defined, the precompiled third-party Python packages shipped with DSS are not used. You must make sure that the interpreter started by $DKUPYTHONBIN contains all packages required by DSS. Please refer to the script INSTALL_DIR/scripts/install/, found in the DSS installation directory, for this purpose.

Using Anaconda Python

DSS supports using Anaconda Python instead of standard system-provided Python for the builtin environment. In that mode, the DSS installer builds an Anaconda environment, containing the standard set of packages required by DSS, instead of a virtualenv-based environment, and uses it for all Python-based tasks.


conda package repositories tend to be very bleeding-edge and move quickly, with frequent backwards-incompatible changes.

Various incompatibilities may occur, and Dataiku can only provide best-effort support for the setup and usage of conda-based DSS installations.

For these reasons, Dataiku does not generally recommend using conda for the builtin Python environment. We recommend that you only use conda if there are reasons for which you cannot use the native virtualenv and R packages systems.

As for virtualenv-based installations, it is possible but not recommended to manually add supplementary packages to this environment, for use in recipes and notebooks.


You can install individual code environments using conda while still using the regular virtualenv for the builtin environment.


  • You must have a 64-bit version of Anaconda or Miniconda installed on the DSS host.
  • Anaconda/Miniconda are supported in version 4.3.27 or later only.
  • The binary directory for Anaconda must be in the PATH for the DSS user account. In particular, the conda command must be accessible to this user.
  • You must have access to a repository of standard Anaconda packages, either through an outgoing Internet connection (direct or using a proxy), or through a local mirror. See Offline installation for a possible workaround.
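The PATH and version requirements above can be checked from the DSS user account before running the installer. This is a minimal sketch: the version_ge helper is an assumption of this example and relies on "sort -V" from GNU coreutils.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (GNU coreutils), assumed to be available on the host.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v conda >/dev/null 2>&1; then
    # "conda --version" prints a line such as "conda 4.3.27"; keep the last field.
    # stderr is merged in because some releases print the version there.
    installed=$(conda --version 2>&1 | awk '{print $NF}')
    if version_ge "$installed" 4.3.27; then
        echo "conda $installed: OK"
    else
        echo "conda $installed: older than 4.3.27"
    fi
else
    echo "conda not found in PATH"
fi
```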


The DSS installer switches to Anaconda mode when given the -C flag:

dataiku-dss-VERSION/ -d DATADIR -p PORT -C

It will then download all required packages/versions from the Anaconda repository (plus a few custom ones which are provided directly from the DSS installation directory), and build an Anaconda environment from them in directory DATADIR/condaenv.

Once an Anaconda environment is built in DATADIR/condaenv it is used instead of the standard virtualenv in DATADIR/pyenv.
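The rule above can be expressed as a small check that tells you which environment a given data directory is using. This sketch only restates that rule for illustration; it is not how DSS itself makes the decision, and the function name is an assumption.

```shell
# Print the builtin environment directory for the data directory given as $1:
# DATADIR/condaenv when that directory exists, DATADIR/pyenv otherwise.
builtin_env_dir() {
    if [ -d "$1/condaenv" ]; then
        echo "$1/condaenv"
    else
        echo "$1/pyenv"
    fi
}

# Typical use, replacing DATADIR with the actual data directory:
#   builtin_env_dir DATADIR
```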

Upgrading an Anaconda-based DSS installation installs the new set of required packages/versions in the DSS Anaconda environment, preserving manually-installed additional packages, or upgrading them in case of versioning conflicts.

Offline installation

If the DSS host has neither an outgoing Internet connection nor access to a local mirror, you can create a local repository containing the packages DSS needs in order to install properly. To do so, you need access to a host that has an Internet connection or a local mirror. This host must also have conda installed. From this host, download DSS and run the following script:


The script reports its progress as it runs. Once finished, it produces a directory called dataiku-dss-VERSION-conda-python-offline-mirror containing the packages. Copy this directory to the DSS host by any means at your disposal. Then, from the DSS host, run the following commands:

conda config --add channels file:///FULL/PATH/TO/dataiku-dss-VERSION-conda-python-offline-mirror
conda config --remove channels defaults

Then run the installer:

dataiku-dss-VERSION/ -d DATADIR -p PORT -C

Further operations

Adding / removing / listing additional packages from the DSS-managed Anaconda environment can be done using the standard conda commands:

conda list -p DATADIR/condaenv
conda install -p DATADIR/condaenv PACKAGE


  • Uninstalling / upgrading / downgrading the standard packages installed by DSS is not supported and may lead to subtle compatibility problems.
  • It is recommended to use code envs for user packages instead. See Installing Python packages.

Adding / removing / listing additional packages may be done through the pip command, when the required packages are not available as conda packages:

DATADIR/bin/pip list
DATADIR/bin/pip install PACKAGE

For testing purposes, it is possible to run the DSS Anaconda environment outside DSS using:


It is possible to migrate a virtualenv-based DSS installation to Anaconda mode by running the installer in “upgrade” mode and adding the -C flag:

dataiku-dss-VERSION/ -d DATADIR -u -C

It is possible to migrate an Anaconda-based DSS installation back to standard virtualenv mode by moving away the conda environment and re-running the installer in “upgrade” mode:

mv DATADIR/condaenv DATADIR/condaenv.BAK
dataiku-dss-VERSION/ -d DATADIR -u

Mixed conda / virtualenv support

DSS can simultaneously use conda and non-conda (i.e. virtualenv) Python environments when:

  • the base DSS environment is installed with virtualenv (default installer option) but one wishes to build conda-based managed code environments as well
  • the base DSS environment is installed with conda (installer option “-C”) but one wishes to build virtualenv-based managed code environments as well

In both these cases, DSS needs to be able to locate both the conda command (to build conda environments) and the non-conda python2.7 command (to build non-conda environments with virtualenv), and does so by looking up both commands in the PATH environment variable of the DSS user account.

For this to work, it is typically necessary to configure the PATH variable for DSS with the conda binaries after the system binaries, so that “which python2.7” resolves to the system Python (which supports virtualenv) and not to the conda Python (which does not), as in: