Installing a new DSS instance¶
Note
This documentation describes how to perform a Custom Dataiku install of a new Dataiku DSS instance on a Linux server.
Other installation options are available (Dataiku Cloud Stacks, macOS, Windows, AWS sandbox, Azure sandbox, or Virtual Machine). Please see Installing and setting up.
Pre-requisites¶
To install Dataiku DSS, you need:
the installation tar.gz file
a server that meets the installation Requirements.
Installation folders¶
A Dataiku DSS installation spans two folders:
The installation directory, which contains the code of Dataiku DSS. This is the directory where the Dataiku DSS tarball is unpacked (denoted as “INSTALL_DIR”).
The data directory (which will later be named “DATA_DIR”).
The data directory contains:
The configuration of Dataiku DSS, including all user-generated configuration (datasets, recipes, insights, models, …)
Log files for the server components
Log files of job executions
Various caches and temporary files
A Python virtual environment dedicated to running the Python components of Dataiku DSS, including any user-installed supplementary packages
Dataiku DSS startup and shutdown scripts and command-line tools
Depending on your configuration, the data directory can also contain some managed datasets. Managed datasets can also be created outside of the data directory with some additional configuration.
It is highly recommended that you reserve at least 100 GB of space for the data directory.
The data directory should be entirely contained within a single mount and be a regular folder. Foreign mounts within the data directory, and symlinks from parts of the data directory to foreign mounts, are not supported.
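The space and "regular folder" recommendations above can be checked before installing. The following is a minimal sketch, not part of the DSS distribution: the helper names `free_gb` and `is_regular_dir` are hypothetical, and the check does not detect foreign mounts nested inside the directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-installation helpers (not shipped with DSS).

# free_gb PATH: print the available space, in whole GB, on the
# filesystem that contains PATH.
free_gb() {
    # df -P forces one output line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# is_regular_dir PATH: succeed if PATH is a directory and not a symlink.
is_regular_dir() {
    [ -d "$1" ] && [ ! -L "$1" ]
}

# Example (paths are placeholders):
#   free_gb /home/dataiku            # compare the result with the recommended 100
#   is_regular_dir /home/dataiku/dss_data || echo "not a regular directory"
```

If DATA_DIR does not exist yet, run `free_gb` on its parent directory instead.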
Installation¶
Unpack¶
Unpack the tar.gz in the location you have chosen for the installation directory.
cd SOMEDIR
tar xzf /PATH/TO/dataiku-dss-VERSION.tar.gz
# This creates a directory named dataiku-dss-VERSION in the current directory
# which contains DSS code for this version (no user file is written to it by DSS).
# This directory is referred to as INSTALL_DIR in this document.
Install Dataiku DSS¶
From the user account which will be used to run Dataiku DSS, enter the following command:
dataiku-dss-VERSION/installer.sh -d /path/to/DATA_DIR -p PORT [-l LICENSE_FILE]
Where:
DATA_DIR is the location of the data directory that you want to use. If the directory already exists, it must be empty.
PORT is the base TCP port. Dataiku DSS will use several ports between PORT and PORT+10.
LICENSE_FILE is your Dataiku DSS license file.
Warning
DATA_DIR must be outside of the installation directory (i.e. DATA_DIR must not be within dataiku-dss-VERSION).
Note
If you don’t enter a license file at this point, DSS will start as a Free Edition. You can enter a license file at any time.
The installer automatically checks for missing system dependencies. If any are missing, it prints the command to run, with superuser privileges, to install them. Once the dependencies are installed, rerun the Dataiku DSS installer with the same command as above.
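The constraints on DATA_DIR and PORT described in this section can be verified before running the installer. This is a sketch under stated assumptions: the function names are hypothetical, `is_inside` compares absolute paths as strings without resolving symlinks, and `port_range` treats the range as inclusive of both PORT and PORT+10.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with DSS).

# dir_is_empty_or_absent PATH: succeed if PATH does not exist
# or is an empty directory.
dir_is_empty_or_absent() {
    [ ! -e "$1" ] && return 0
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# is_inside CHILD PARENT: succeed if CHILD equals PARENT or lies beneath it.
# Both arguments are expected to be absolute paths.
is_inside() {
    case "${1%/}/" in
        "${2%/}/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# port_range BASE: print the candidate ports from BASE to BASE+10.
port_range() {
    seq "$1" "$(( $1 + 10 ))"
}

# Example (paths are placeholders):
#   dir_is_empty_or_absent /path/to/DATA_DIR || echo "DATA_DIR is not empty"
#   is_inside /path/to/DATA_DIR /path/to/dataiku-dss-VERSION && echo "DATA_DIR is inside INSTALL_DIR"
#   port_range 10000
```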
(Optional) Enable startup at boot time¶
At the end of installation, Dataiku DSS will give you the optional command to run with superuser privileges to configure automatic boot-time startup:
sudo -i INSTALL_DIR/scripts/install/install-boot.sh DATA_DIR USER_ACCOUNT
Note
This configures a boot-time system service with a default name of “dataiku”. If several DSS instances are installed on the same host and more than one should start automatically at boot time, give each a distinct, non-default name so that independent boot-time system services are configured, as follows:
# Defines system service "dataiku.dev" for DSS design instance
sudo -i DESIGN_INSTALL_DIR/scripts/install/install-boot.sh -n dev DESIGN_DATA_DIR DESIGN_USER_ACCOUNT
# Defines system service "dataiku.prod" for DSS automation instance
sudo -i AUTOMATION_INSTALL_DIR/scripts/install/install-boot.sh -n prod AUTOMATION_DATA_DIR AUTOMATION_USER_ACCOUNT
Complete installation example¶
The following shows a transcript from a complete installation sequence:
# Start from the home directory of user account "dataiku"
# which will be used to run Dataiku DSS
# We will install DSS using data directory: /home/dataiku/dss_data
$ pwd
/home/dataiku
$ ls -l
-rw-rw-r-- 1 dataiku dataiku 159284660 Feb 4 15:20 dataiku-dss-VERSION.tar.gz
-r-------- 1 dataiku dataiku 786 Jan 31 07:42 license.json
# Unpack distribution kit
$ tar xzf dataiku-dss-VERSION.tar.gz
# Run installer, with data directory $HOME/dss_data and base port 10000
# This fails because of missing system dependencies
$ dataiku-dss-VERSION/installer.sh -d /home/dataiku/dss_data -l /home/dataiku/license.json -p 10000
# Install dependencies with elevated privileges, using the command shown by the previous step
$ sudo -i "/home/dataiku/dataiku-dss-VERSION/scripts/install/install-deps.sh"
# Rerun installer script, which will succeed this time
$ dataiku-dss-VERSION/installer.sh -d /home/dataiku/dss_data -l /home/dataiku/license.json -p 10000
# Configure boot-time startup, using the command shown by the previous step
$ sudo -i "/home/dataiku/dataiku-dss-VERSION/scripts/install/install-boot.sh" "/home/dataiku/dss_data" dataiku
# Manually start DSS, using the command shown by the installer step
$ /home/dataiku/dss_data/bin/dss start
# Connect to Dataiku DSS by opening the following URL in a web browser:
# http://HOSTNAME:10000
# Initial credentials: username = "admin" / password = "admin"
Manual dependency installation¶
The Dataiku DSS installer includes a dependency management script, to be run with superuser privileges, which automatically installs the additional Linux packages required for your particular configuration.
In some cases, however, it may be necessary to install these dependencies manually, for instance when the person installing DSS does not have administrative privileges, or when the server cannot reach the required package repositories.
If you manually pre-installed all the dependencies that would have been selected by the automated script, you can continue installing Dataiku DSS using standard procedures. If that is not the case (because you explicitly chose to leave a component missing, or you installed some component from an alternate source) you must then run the DSS installer with the “-n” flag, to disable the default dependency checks.
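To see which dependencies are still missing after a manual installation, the system package database can be queried. The sketch below is an illustration only: `missing_packages` is a hypothetical helper, and the query command is passed in as an argument so that the same function can wrap `rpm -q` on RPM-based distributions or `dpkg -s` on Debian and Ubuntu. Package names should be taken from the tables on this page for your distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DSS).

# missing_packages QUERY PKG...: print each PKG for which "QUERY PKG" fails.
# QUERY is a command line such as "rpm -q" or "dpkg -s".
missing_packages() {
    local query=$1 pkg
    shift
    for pkg in "$@"; do
        # $query is intentionally unquoted so that "rpm -q" splits into words.
        $query "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Example on an RPM-based system:
#   missing_packages "rpm -q" acl expat git nginx unzip zip
# Example on Debian or Ubuntu:
#   missing_packages "dpkg -s" acl curl git libexpat1 libncurses5 nginx unzip zip
```

If this prints nothing, the listed packages are known to the package manager. Components installed from an alternate source will still be reported as missing, which is the situation where the “-n” installer flag applies.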
Red Hat / CentOS / Oracle Linux distributions¶
You may need to configure the following additional repositories:
Name | Address | Notes |
---|---|---|
EPEL | | for nginx [RedHat/CentOS v7.x] and R |
IUS | | [RedHat/CentOS v6.x] for Python 2.7 and 3.6 |
nginx | | [RedHat/CentOS v6.x] for nginx |
On RedHat and Oracle Linux 6.x and 7.x, you may also need to enable the vendor’s “optional” repository, for indirect dependencies required by R.
On RedHat and CentOS 8.x, you may also need to enable the “CodeReady” or “PowerTools” repositories, for indirect dependencies required by R.
Dataiku DSS depends on the following packages:
Name | Notes |
---|---|
acl | For User Isolation Framework support |
expat git nginx unzip zip | Mandatory |
ncurses-compat-libs | [RedHat/CentOS v8.x] Mandatory |
java-1.8.0-openjdk | See “Java” note below |
python27 python36 freetype libgfortran libgomp | [RedHat/CentOS v6.x] for built-in Python packages. See “Python” note below |
python27-devel python36-devel | [RedHat/CentOS v6.x] See “Additional Python packages” note below |
python python3 freetype libgfortran libgomp | [RedHat/CentOS v7.x] for built-in Python packages. See “Python” note below |
python-devel python3-devel | [RedHat/CentOS v7.x] See “Additional Python packages” note below |
python2 python36 freetype libgfortran libgomp | [RedHat/CentOS v8.x] for built-in Python packages. See “Python” note below |
python2-devel python36-devel | [RedHat/CentOS v8.x] See “Additional Python packages” note below |
bzip2 mesa-libGL libSM libXrender libgomp alsa-lib | For Anaconda-Python-based installations, see note below |
R-core-devel libicu-devel libcurl-devel openssl-devel libxml2-devel | See “R” note below |
Debian / Ubuntu Linux distributions¶
You may need to configure the following additional repository:
Name | Address | Notes |
---|---|---|
CRAN | | [Debian 9.x, 10.x, Ubuntu 16.04, 18.04] For R |
Dataiku DSS depends on the following packages:
Name | Notes |
---|---|
acl | For User Isolation Framework support |
curl git libexpat1 libncurses5 nginx unzip zip | Mandatory |
default-jre-headless | See “Java” note below |
python2.7 libpython2.7 libfreetype6 libgomp1 | For built-in Python packages. See “Python” note below |
python2.7-dev | See “Additional Python packages” note below |
python3.6 python3-distutils | [Ubuntu 18.04] For Python 3.6 |
python3.6-dev | [Ubuntu 18.04] See “Additional Python packages” note below |
bzip2 libgl1-mesa-glx libsm6 libxrender1 libgomp1 libasound2 | For Anaconda-Python-based installations, see note below |
r-base-dev libicu-dev libcurl4-openssl-dev libssl-dev libxml2-dev pkg-config | See “R” note below |
SUSE Linux Enterprise Server distributions¶
You may need to configure the following additional repositories:
Name | Address | Notes |
---|---|---|
nginx | | For nginx |
R | obs://devel:languages:R:patched | For R |
Dataiku DSS depends on the following packages:
Name | Notes |
---|---|
acl | For User Isolation Framework support |
git-core libexpat1 libncurses5 nginx unzip zip | Mandatory |
java-1_8_0-openjdk-headless | See “Java” note below |
python python-xml libfreetype6 libgomp1 | For built-in Python packages. See “Python” note below |
python36 libgfortran3 | [Suse 12.x] For Python 3.6 |
python3 | [Suse 15.x] For Python 3.6 |
bzip2 Mesa-libGL1 libSM6 libXrender1 libgomp1 | For Anaconda-Python-based installations, see note below |
gcc-fortran R-core-devel libicu-devel libcurl-devel libopenssl-devel libxml2-devel | See “R” note below |
Base development tools | See “R” note below |
Additional notes¶
- Java
The suggested dependency package is the platform default, but DSS can use other Java runtime environments. See Advanced Java runtime configuration for details.
- Python
The dependencies listed above are required to use the precompiled set of Python packages provided with DSS. This does not apply when using custom-built Python libraries. See Python integration for details.
- Python 3.6
Starting with DSS 6.0, built-in Python packages are provided for both Python 2.7 and Python 3.6, and the base DSS Python environment can be built with either version. The DSS dependency installer pulls Python 2.7 on all Linux distributions, and Python 3.6 on all distributions where it is readily available. This can be overridden if needed; see Python integration for details.
- Additional Python packages
Installing additional Python packages that include native code requires the dependencies listed above and the system development tools (typically C/C++ compilers and headers), in addition to any package-specific dependencies.
- Anaconda Python
The dependencies listed above for Anaconda-Python-based installations replace those listed for system-Python-based installations.
- R
The dependencies listed above are only necessary to enable R integration in DSS. Note that the system development tools and additional dependencies are usually needed in order to build the required R packages. DSS requires R version 3.2.5 or later.
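The minimum R version stated above can be compared against the installed one with a version-aware sort. This is a minimal sketch, assuming GNU `sort` (for the `-V` option) is available; `version_at_least` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DSS).

# version_at_least HAVE WANT: succeed if version HAVE is equal to or
# newer than version WANT, using GNU sort's version ordering.
version_at_least() {
    # After a version sort, WANT comes first exactly when WANT <= HAVE.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example, assuming Rscript is on the PATH:
#   have=$(Rscript -e 'cat(as.character(getRversion()))')
#   version_at_least "$have" 3.2.5 || echo "R $have is older than 3.2.5"
```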