Installing a new DSS instance

Note

This is the documentation to perform a Custom Dataiku install of a new Dataiku DSS instance on a Linux server

Other installation options are available (Dataiku Cloud Stacks, macOS, Windows, AWS sandbox, Azure sandbox, or Virtual Machine). Please see Installing and setting up.

Pre-requisites

To install Dataiku DSS, you need:

  • the installation tar.gz file

  • to make sure that you meet the installation Requirements.

Installation folders

A Dataiku DSS installation spans over two folders:

  • The installation directory, which contains the code of Dataiku DSS. This is the directory where the Dataiku DSS tarball is unzipped (denoted as “INSTALL_DIR”)

  • The data directory (which will later be named “DATA_DIR”).

The data directory contains :

  • The configuration of Dataiku DSS, including all user-generated configuration (datasets, recipes, insights, models, …)

  • Log files for the server components

  • Log files of job executions

  • Various caches and temporary files

  • A Python virtual environment dedicated to running the Python components of Dataiku DSS, including any user-installed supplementary packages

  • Dataiku DSS startup and shutdown scripts and command-line tools

Depending on your configuration, the data directory can also contain some managed datasets. Managed datasets can also be created outside of the data directory with some additional configuration.

It is highly recommended that you reserve at least 100 GB of space for the data directory.

The data directory should be entirely contained within a single mount and be a regular folder. Having foreign mounts within the data directory, or symlinking parts of the data directory to foreign mounts is not supported.

Installation

Unpack

Unpack the tar.gz in the location you have chosen for the installation directory.

cd SOMEDIR
tar xzf /PATH/TO/dataiku-dss-VERSION.tar.gz
# This creates a directory named dataiku-dss-VERSION in the current directory
# which contains DSS code for this version (no user file is written to it by DSS).
# This directory is referred to as INSTALL_DIR in this document.

Install Dataiku DSS

From the user account which will be used to run Dataiku DSS, enter the following command:

dataiku-dss-VERSION/installer.sh -d /path/to/DATA_DIR -p PORT [-l LICENSE_FILE]

Where:

  • DATA_DIR is the location of the data directory that you want to use. If the directory already exists, it must be empty.

  • PORT is the base TCP port. Dataiku DSS will use several ports between PORT and PORT+10

  • LICENSE_FILE is your Dataiku DSS license file.

Warning

DATA_DIR must be outside of the install dir (i.e. DATA_DIR must not be within dataiku-dss-VERSION)

Note

If you don’t enter a license file at this point, DSS will start as a Free Edition. You can enter a license file at any time.

The installer automatically checks for any missing system dependencies. If any is missing, it will give you the command to run to install them with superuser privileges. After installation of dependencies is complete, you can start the Dataiku DSS installer again, using the same command as above.

(Optional) Enable startup at boot time

At the end of installation, Dataiku DSS will show you the optional command to run with superuser privileges to configure automatic boot-time startup:

sudo INSTALL_DIR/scripts/install/install-boot.sh [-n INSTANCE_NAME] DSS_DATADIR DSS_USER

This configures a systemd-based system service with a default name of “dataiku” (in /etc/systemd/system/dataiku.service), and enables it to automatically start at boot. You can then use standard service management commands to control this DSS instance, as in:

# Start the DSS service
sudo systemctl start dataiku
# Stop the DSS service
sudo systemctl stop dataiku
# Get service status
sudo systemctl status dataiku
# Get service log
sudo journalctl -u dataiku
# Disable boot-time startup
sudo systemctl disable dataiku

Note

If you have several instances of DSS installed on the same host, and want more than one to automatically start at boot time, you need to provide different, non-default names for them so as to configure independent boot-time system services, as follows:

# Defines system service "dataiku.dev" for DSS design instance
sudo DESIGN_INSTALL_DIR/scripts/install/install-boot.sh -n dev DESIGN_DATA_DIR DESIGN_USER_ACCOUNT
# Defines system service "dataiku.prod" for DSS automation instance
sudo AUTOMATION_INSTALL_DIR/scripts/install/install-boot.sh -n prod AUTOMATION_DATA_DIR AUTOMATION_USER_ACCOUNT

This system service is implemented by a helper script installed at /etc/dataiku/INSTANCE_ID/dataiku-boot, where INSTANCE_ID is the unique id of this DSS instance (generated at installation time in DATA_DIR/install.ini).

This script has an associated configuration file dataiku-boot.conf in the same directory (/etc/dataiku/INSTANCE_ID/dataiku-boot.conf). This file can be used to configure the optional creation of resource control cgroups for use by this DSS instance, as described here.

Warning

Versions of Dataiku DSS prior to 13.x were using legacy systemv-based init scripts in /etc/init.d/dataiku[.NAME] for boot-time startup.

In order to migrate an instance to the new systemd-based setup, you need to first remove its legacy startup script if any.

Note that any customization for the legacy script (in /etc/default/dataiku or /etc/sysconfig/dataiku) would have to be reinstalled in the new service configuration file at /etc/dataiku/INSTANCE_ID/dataiku-boot.conf.

Note also that service configuration keys have been renamed from DIP_xxx (legacy syntax) to DSS_xxx (new syntax).

Start Dataiku DSS

To directly start Dataiku DSS, run the following command:

DATA_DIR/bin/dss start

To start the Dataiku DSS system service, run the following command:

# Default DSS service
sudo systemctl start dataiku

# Named DSS service
sudo systemctl start dataiku.INSTANCE_NAME

Warning

Do not mix manual- and system-service-based startup and shutdown. A DSS instance started through systemctl (or at boot) should only be stopped or restarted through systemctl, so the operating system service manager can correctly keep track of the Dataiku DSS service status.

Complete installation example

The following shows a transcript from a complete installation sequence:

# Start from the home directory of user account "dataiku"
# which will be used to run the Dataiku DSS
# We will install DSS using data directory: /home/dataiku/dss_data
$ pwd
/home/dataiku
$ ls -l
-rw-rw-r-- 1 dataiku dataiku 159284660 Feb  4 15:20 dataiku-dss-VERSION.tar.gz
-r-------- 1 dataiku dataiku       786 Jan 31 07:42 license.json

# Unpack the distribution kit
$ tar xzf dataiku-dss-VERSION.tar.gz

# If the User Isolation Framework is to be configured on this instance,
# make sure all user accounts have read-execute permission to the installation directory
$ chmod a+x .
$ umask 22

# Run installer, with data directory $HOME/dss_data and base port 10000
# This fails because of missing system dependencies
$ dataiku-dss-VERSION/installer.sh -d /home/dataiku/dss_data -l /home/dataiku/license.json -p 10000

# Install dependencies with elevated privileges, using the command shown by the previous step
$ sudo -i "/home/dataiku/dataiku-dss-VERSION/scripts/install/install-deps.sh"

# Rerun installer script, which will succeed this time
$ dataiku-dss-VERSION/installer.sh -d /home/dataiku/dss_data -l /home/dataiku/license.json -p 10000

# Manually start DSS, using the command shown by the installer step
$ /home/dataiku/dss_data/bin/dss start

# Connect to Dataiku DSS by opening the following URL in a web browser:
#    http://HOSTNAME:10000
# Initial credentials : username = "admin" / password = "admin"

# [Optional] To finalize the installation, restart as a system-managed service:
#
# Stop the manually-started instance
$ /home/dataiku/dss_data/bin/dss stop
#
# Create a system service, using the command shown by the previous step
$ sudo "/home/dataiku/dataiku-dss-VERSION/scripts/install/install-boot.sh" "/home/dataiku/dss_data" dataiku
#
# Start the system service
$ sudo systemctl start dataiku

Manual dependency installation

The Dataiku DSS installer includes a dependency management script, to be run with superuser privileges, which automatically installs the additional Linux packages required for your particular configuration.

In some cases however, it might be necessary to manually install these dependencies, for instance when the person installing DSS does not have access to administrative privileges, or when the server does not have access to the required package repositories.

You can check for missing packages by running the dependency installer script with option -check. This does not require superuser privileges:

$ dataiku-dss-VERSION/scripts/install/install-deps.sh -check [-with-r]

If you manually pre-installed all the dependencies that would have been selected by the automated script, you can continue installing Dataiku DSS using standard procedures. If that is not the case (because you explicitly chose to leave a component missing, or you installed some component from an alternate source) you must then run the DSS installer with the “-n” flag, to disable the default dependency checks.

# Python 3 has been installed from a custom source instead of the standard system RPM
# Force the DSS installer to run without checking for missing dependencies (option "-n")
$ dataiku-dss-VERSION/installer.sh -n -d /home/dataiku/dss_data -l /home/dataiku/license.json -p 10000

RHEL-compatible distributions

You may need to configure the EPEL additional repository, for R support (and for nginx, on version 7.x systems).

You may need to enable the “CodeReady”, “PowerTools” or “Optional” repositories, for indirect dependencies required by R.

Dataiku DSS depends on the following packages:

Name

Notes

acl

For User Isolation Framework support

expat git nginx unzip zip

Mandatory

java-11-openjdk-headless or java-17-openjdk-headless

See “Java” note below

python3 or python39

For built-in Python packages. See “Python” note below

freetype libgfortran libgomp

Built-in Python packages dependencies

policycoreutils policycoreutils-python-utils

For SELinux support

R-core-devel libicu-devel libcurl-devel openssl-devel libxml2-devel

For R support. See “R” note below

Debian / Ubuntu Linux distributions

You may need to configure the CRAN repository, for R support (https://cran.r-project.org/).

Dataiku DSS depends on the following packages:

Name

Notes

acl

For User Isolation Framework support

curl git libexpat1 nginx unzip zip

Mandatory

openjdk-11-jre-headless or openjdk-17-jre-headless

See “Java” note below

python3.6 or python3.7 or python3.9 or python3.10

For built-in Python packages. See “Python” note below

python3-distutils libfreetype6 libgomp1

Built-in Python packages dependencies

r-base-dev libicu-dev libcurl4-openssl-dev libssl-dev libxml2-dev pkg-config

For R support. See “R” note below

Amazon Linux distributions

On Amazon Linux 2, you may need to enable “extra” repositories for nginx and Java 11, and EPEL for R support.

Dataiku DSS depends on the following packages:

Name

Notes

acl

For User Isolation Framework support

expat git nginx unzip zip

Mandatory

java-11-openjdk-headless or java-17-amazon-corretto-headless

See “Java” note below

python3

For built-in Python packages. See “Python” note below

libgomp

Built-in Python packages dependencies

freetype compat-gcc-48-libgfortran

[Amazon Linux 2] Built-in Python packages dependencies

R-core-devel libicu-devel libcurl-devel openssl-devel libxml2-devel

For R support. See “R” note below

SUSE Linux Enterprise Server distributions

You may need to configure the following additional repositories:

Name

Address

Notes

nginx

https://nginx.org/

[SLES 12.x] For nginx

R

obs://devel:languages:R:patched/<SLES_VERSION>

For R support

Dataiku DSS depends on the following packages:

Name

Notes

acl

For User Isolation Framework support

git-core libexpat1 nginx unzip zip

Mandatory

java-11-openjdk-headless or java-17-openjdk-headless

See “Java” note below

python3 or python36

For built-in Python packages. See “Python” note below

libfreetype6 libgomp1

Built-in Python packages dependencies

libgfortran3

[SLES 12.x] Built-in Python packages dependencies

gcc-fortran R-core-devel libicu-devel libcurl-devel libopenssl-devel libxml2-devel

For R support. See “R” note below

Base development tools

For R support.

Additional notes

Java

DSS supports Java 11 or 17.

The suggested dependency package is the platform default, but DSS can use other Java runtime environments. The Java version to use can be specified with the JAVA_HOME environment variable when running the DSS installer.

See Advanced Java runtime configuration for details.

Python

DSS supports Python 3.6, 3.7, 3.9 and 3.10 for its built-in environment.

One of these versions must be installed on the host, and can be chosen with the -P PYTHONBIN option to the installer.

Additional Python versions may be used for code environments.

Additional python packages

Installing additional Python packages which include native code may require the system development tools to be installed (typically C/C++ compilers and headers), in addition to any package-specific system dependency.

R

DSS requires R 4.x

The dependencies listed above as well as the system development tools are necessary to enable the initial R integration in DSS. Additional dependencies are usually needed in order to build additional R packages.