Java runtime environment

Customizing Java runtime options

The main backend for Data Science Studio is a Java application. Runtime options for this Java process can be customized.

The different Java processes

DSS is made up of four main kinds of Java processes:

  • The “backend” is the main server, which handles all interaction with users, the configuration, and the visual data preparation
  • The “jek” is a process which runs the jobs (i.e., what happens when you use “Build”)
  • The “fek” handles long-running background tasks. It is also responsible for building the data samples
  • The “hproxy” handles interactions with Hive and Pig

What can be customized

All Java options of these 4 kinds of processes can be customized.

For each of these, DSS provides an easy way to:

  • Configure the amount of memory allocated to each process
  • Configure the “permgen” (a specific kind of memory for Java processes)
  • Add custom options

These three kinds of customizations can be done by editing the install.ini file.

More advanced customization (taking precedence over default DSS options) can be done via environment files.

Customizing memory (xmx) and permgen

Most often, you will want to customize “xmx”, the maximum amount of memory allocated to the Java process.

Xmx is configured by setting the <processtype>.xmx setting in the javaopts section of the install.ini file (where <processtype> is one of backend, jek, fek or hproxy).

By default, Xmx is set to 2GB. This might not be enough for DSS instances with a large number of users. If the backend runs out of memory, it may crash, and all users may get disconnected.

Example: Set Xmx of backend to 3g

  • Go to the DSS data directory

Note

On Mac OS X, the DATA_DIR is always: $HOME/Library/DataScienceStudio/dss_home

  • Stop DSS

    ./bin/dss stop
    
  • Edit the install.ini file

  • If it does not exist, add a [javaopts] section

  • Add a line: backend.xmx = 3g

  • Regenerate the runtime config:

    ./bin/dssadmin regenerate-config
    
  • Start DSS

    ./bin/dss start
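The "edit the install.ini file" steps above can also be scripted. The following is a minimal sketch, not a DSS tool: set_javaopt is a hypothetical helper name, and it assumes that install.ini uses plain "key = value" lines and that the section header is written exactly as [javaopts]. The demo at the end runs on a scratch file with placeholder content, not on a real data directory.

```shell
#!/bin/sh
# set_javaopt FILE KEY VALUE   (hypothetical helper, not shipped with DSS)
# Writes "KEY = VALUE" into the [javaopts] section of FILE. An existing KEY
# line is replaced; a missing section is appended at the end of the file.
set_javaopt() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        function emit() { print key " = " value; written = 1 }
        /^\[/ {
            # Leaving [javaopts] without having seen the key: add it first.
            if (in_section && !written) emit()
            in_section = ($0 == "[javaopts]")
            if (in_section) seen_section = 1
            print; next
        }
        # An existing "KEY = ..." line inside [javaopts] is replaced.
        in_section && index($0, key) == 1 && substr($0, length(key) + 1) ~ /^[ \t]*=/ {
            if (!written) emit()
            next
        }
        { print }
        END {
            if (!seen_section) print "[javaopts]"
            if (!written) emit()
        }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file with placeholder content.
demo=$(mktemp)
printf '[example]\nkey = value\n' > "$demo"
set_javaopt "$demo" backend.xmx 3g
set_javaopt "$demo" jek.xmx 2g
set_javaopt "$demo" backend.xmx 4g   # replaces the value set two lines above
cat "$demo"
```

On a real instance, the call would be run from the data directory between ./bin/dss stop and ./bin/dssadmin regenerate-config, for example set_javaopt install.ini backend.xmx 3g. The same helper should work for any other key of the [javaopts] section.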
    

Example install.ini

Here is an example of an install.ini file that configures the Xmx for backend and jek:

[javaopts]
backend.xmx = 3g
jek.xmx = 2g

Memory amounts can be suffixed with “m” or “g” for megabytes and gigabytes.
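To sanity-check a value before writing it to install.ini, a small conversion helper can be useful. This is only a sketch: to_megabytes is a hypothetical name, and the function accepts nothing but the two lowercase suffixes described above (whether DSS also accepts other spellings is not covered here). It uses the JVM convention that one gigabyte is 1024 megabytes.

```shell
#!/bin/sh
# to_megabytes SIZE   (hypothetical helper, not shipped with DSS)
# Prints SIZE ("512m", "3g") as a number of megabytes. Fails for anything
# that is not a plain decimal number followed by a lowercase m or g.
to_megabytes() {
    number=${1%[mg]}         # strip one trailing m or g, if present
    suffix=${1#"$number"}    # whatever was stripped
    case $number in
        "" | *[!0-9]* | 0?*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    case $suffix in
        m) echo "$number" ;;
        g) echo $((number * 1024)) ;;
        *) echo "invalid size: $1" >&2; return 1 ;;
    esac
}

to_megabytes 3g
to_megabytes 512m
```

A typical use is comparing a planned value with the 2GB default, for example [ "$(to_megabytes 3g)" -gt 2048 ].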

Setting the “permgen”

Permgen is a specific kind of Java memory. You will need to increase it if you encounter DSS restarts with “OutOfMemoryError: PermGen space” messages.

To set the permgen, use the <processtype>.permgen setting (where <processtype> is one of backend, jek, fek or hproxy).

Memory amounts can be suffixed with “m” or “g” for megabytes and gigabytes.

The same stop / regenerate-config / start procedure applies.

Adding additional options

Follow the same procedure as above, but add a line like the following:

[javaopts]
backend.additional.opts=-Dmy.option=value

Advanced customization

The full Java runtime options can be configured by setting environment variables in the DATA_DIR/bin/env-site.sh file.

The default runtime options are stored in several environment variables:

  • DKU_BACKEND_JAVA_OPTS
  • DKU_JEK_JAVA_OPTS
  • DKU_FEK_JAVA_OPTS
  • DKU_HPROXY_JAVA_OPTS

The default values for these variables (computed from install.ini) are stored in DATA_DIR/bin/env-default.sh.

Warning

Do not modify DATA_DIR/bin/env-default.sh. It is overwritten at the next Data Science Studio upgrade and after each call to ./bin/dssadmin regenerate-config.

To configure these options:

  • Stop DSS

    ./bin/dss stop
    
  • Open the DATA_DIR/bin/env-default.sh file

  • Copy the line you want to change. The lines look like export DKU_BACKEND_JAVA_OPTS, export DKU_JEK_JAVA_OPTS, …

  • Open the DATA_DIR/bin/env-site.sh file

  • Paste the line and modify it to your needs

  • Start DSS

    ./bin/dss start
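The copy step of this procedure can be sketched as a small shell function. This is an illustration under assumptions, not a DSS tool: copy_opts_line is a hypothetical name, each variable is assumed to be defined on a single line starting with "export NAME=", and the sample env-default.sh created below is a made-up stand-in whose contents will differ from a real one. The copied line still has to be edited by hand afterwards.

```shell
#!/bin/sh
# copy_opts_line VAR DEFAULT_FILE SITE_FILE   (hypothetical helper)
# Appends the "export VAR=..." line found in DEFAULT_FILE to SITE_FILE,
# leaving DEFAULT_FILE untouched.
copy_opts_line() {
    var=$1 default_file=$2 site_file=$3
    # Refuse to add a second definition: edit the existing one instead.
    if [ -f "$site_file" ] && grep -q "^export $var=" "$site_file"; then
        echo "$var is already set in $site_file" >&2
        return 1
    fi
    line=$(grep "^export $var=" "$default_file") || {
        echo "$var not found in $default_file" >&2
        return 1
    }
    printf '%s\n' "$line" >> "$site_file"
}

# Demo in a scratch directory. The file below is a made-up stand-in for
# DATA_DIR/bin/env-default.sh; the real file will contain other values.
scratch=$(mktemp -d)
cat > "$scratch/env-default.sh" <<'EOF'
export DKU_BACKEND_JAVA_OPTS="-Xmx2g"
export DKU_JEK_JAVA_OPTS="-Xmx2g"
EOF
copy_opts_line DKU_BACKEND_JAVA_OPTS "$scratch/env-default.sh" "$scratch/env-site.sh"
cat "$scratch/env-site.sh"
```

Refusing to copy a variable that env-site.sh already defines avoids ending up with two competing definitions of the same variable.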
    

Customizing the JVM

Data Science Studio requires an installation of Java Development Kit version 7 or 8. Supported versions are OpenJDK (http://openjdk.java.net) and Oracle JDK (http://www.oracle.com/technetwork/java/javase/downloads/index.html).

As part of the standard Data Science Studio installation, the installer searches standard locations for a suitable version of Java. If none is found, the dependency installation phase installs the OpenJDK 7 system package appropriate for the distribution.

You can force Data Science Studio to use a specific version of Java (for example, when there are several versions installed on the server, or when you manually installed Java in a non-standard place) by setting the DKUJAVABIN environment variable while running the DSS installer script. This variable should point to the java binary to use. For example:

$ DKUJAVABIN=/usr/local/bin/java dataiku-dss-VERSION/installer.sh <INSTALLER_OPTIONS>

Note that the installer script stores this value in the file DATA_DIR/bin/env-default.sh. You do not need to define it permanently for the Linux user account running the Studio.
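Before passing a path through DKUJAVABIN, it can help to confirm that it really points to a working java binary. A minimal sketch, assuming nothing beyond the standard -version option of the java command; check_java_bin is a hypothetical helper name and is not part of the installer:

```shell
#!/bin/sh
# check_java_bin PATH   (hypothetical helper, not part of the DSS installer)
# Succeeds if PATH is an executable regular file that answers "-version".
check_java_bin() {
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "not an executable file: $1" >&2
        return 1
    fi
    "$1" -version   # java prints its version banner on standard error
}

# Intended use, left commented out because it launches the installer:
# check_java_bin /usr/local/bin/java &&
#     DKUJAVABIN=/usr/local/bin/java dataiku-dss-VERSION/installer.sh <INSTALLER_OPTIONS>
```

The version banner shows whether the binary is a version 7 or 8 Java, as required above.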