Setting up R integration

Due to the large number of additional system dependencies, DSS R integration is not installed by default.

You can install R integration at any time.

Prerequisites (Mac OS X only)

On Mac OS X, you must first install R from http://www.r-project.org/. You might also need to install XQuartz.

Case 1: Automatic installation, if your DSS server has Internet access

This procedure installs the required R packages and configures R integration for DSS. If system dependencies are missing, it prompts you to install them as root. Internet access (direct or through a proxy) is needed to download any missing R packages.

  • Go to the DSS data dir

    cd DATADIR
    
  • Stop DSS

    ./bin/dss stop
    
  • Run the installation script

    ./bin/dssadmin install-R-integration
    

Note

The install-R-integration script automatically checks for missing system dependencies. If any are missing, it prints the command to run, with superuser privileges, to install them. Once the dependencies are installed, run the install-R-integration script again.

  • Start DSS

    ./bin/dss start
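The Case 1 steps above can be sketched as a single shell function. This is a minimal sketch, not a DSS tool: the function name install_r_integration is hypothetical, and it assumes the path of the DSS data directory is passed as its only argument. It stops at the first failure and leaves DSS stopped, so you can run the dependency command reported by install-R-integration and call it again.

```shell
# install_r_integration DATADIR: hypothetical wrapper around the Case 1 steps.
# The body is a subshell "( ... )", so the "cd" below does not affect the caller.
install_r_integration() (
    datadir="$1"
    if [ ! -x "$datadir/bin/dss" ] || [ ! -x "$datadir/bin/dssadmin" ]; then
        echo "error: $datadir does not look like a DSS data directory" >&2
        exit 1
    fi
    cd "$datadir" || exit 1
    # Conservative: abort if DSS cannot be stopped
    ./bin/dss stop || exit 1
    # On missing system dependencies, install-R-integration prints the command
    # to run as root; DSS is left stopped so you can run it and retry
    if ! ./bin/dssadmin install-R-integration; then
        echo "install-R-integration failed; install the reported dependencies and retry" >&2
        exit 1
    fi
    ./bin/dss start
)
```

Usage, as the DSS user: install_r_integration /path/to/DATADIR (the path is an example).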
    

Case 2: If your DSS server does not have Internet access

To help with R package installation when the DSS server has no Internet access (neither directly nor through a proxy), the DSS installation kit includes a standalone script. You run it on a separate, Internet-connected system to download the sources of the required R packages into a directory suitable for offline installation on the DSS server.

  • Check the DSS server for missing system dependencies, including the base R system, the development tools, and the libraries required by the mandatory R packages. If any dependency is missing, install it from a local package repository for your OS distribution.

    dataiku-dss-VERSION/scripts/install/install-deps.sh -check -without-java -without-python -with-r
    
  • Retrieve the standalone download script dataiku-dss-VERSION/scripts/install/download-R-packages.sh and transport it to the system you will use for the download. This system must run Linux or Mac OS X and have an Internet connection, either direct or through a proxy.

  • On this download system, run the download script as follows:

    ./download-R-packages.sh -dir DIR
    

    where DIR is a temporary directory which will hold the downloaded packages.

  • Transport the resulting directory DIR to the DSS server.

  • On the DSS server, install any missing R packages from this download directory, and finish configuring DSS R integration:

    DATADIR/bin/dssadmin install-R-integration -pkgDir DIR
    
  • Restart DSS

    DATADIR/bin/dss restart
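The transport step in Case 2 can use any file transfer method. As one example, the hypothetical helpers below pack the download directory into a single tar archive on the download system and unpack it on the DSS server. The helper names and example paths are illustrative and are not part of the DSS installation kit.

```shell
# Hypothetical helpers for moving the download directory DIR between systems.
# tar -C is supported by both GNU tar (Linux) and bsdtar (Mac OS X).

# pack_pkg_dir DIR ARCHIVE: store DIR in ARCHIVE under its base name
pack_pkg_dir() {
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# unpack_pkg_dir ARCHIVE DEST: extract ARCHIVE into DEST (created if needed)
unpack_pkg_dir() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

For example, after ./download-R-packages.sh -dir /tmp/r-pkgs, run pack_pkg_dir /tmp/r-pkgs r-pkgs.tar, copy r-pkgs.tar to the DSS server (for instance with scp or removable media), run unpack_pkg_dir r-pkgs.tar /tmp there, and pass /tmp/r-pkgs as the -pkgDir argument.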
    

Case 3: Custom installation

Installing DSS R integration consists of the following steps, which you can perform in any way suitable to your environment:

  • Install R on the DSS server (minimum version 3.1.2, recommended version 3.2 or later)

    DSS locates R by looking up the "R" command in the PATH. If needed, you can override this by defining the DKURBIN environment variable in the local customization file DATADIR/bin/env-site.sh.

  • Install the following R packages, either in the global R library, or in the user library of the DSS user account:

    Packages                                  Repository
    httr, RJSONIO, dplyr, data.table, rzmq    CRAN (https://cran.r-project.org)
    repr, IRkernel, IRdisplay                 IRkernel (http://irkernel.github.io/)

  • Configure DSS R integration with the -noDeps option, which skips the default dependency check, then restart DSS:

    cd DATADIR
    ./bin/dssadmin install-R-integration -noDeps
    ./bin/dss restart
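Before running install-R-integration -noDeps, you may want to confirm that the first two steps succeeded. The hypothetical helpers below sketch such a check. They assume that DKURBIN, when set, holds the full path of the R executable, and that Rscript is on the PATH of the DSS user account; neither assumption comes from the DSS documentation.

```shell
# Pre-flight checks for a custom installation (hypothetical helpers).

# version_ge A B: succeed if dotted version A is >= B (numeric components)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# r_version: print the version of the R binary DSS would use, taken from the
# first line of "R --version" ("R version 3.2.1 (...)" -> "3.2.1").
# Assumption: DKURBIN, if set, is the path of the R executable.
r_version() {
    "${DKURBIN:-R}" --version | head -n 1 | awk '{ print $3 }'
}

# missing_r_pkgs PKG...: print each package that R cannot load, one per line.
# Assumption: Rscript is on the PATH of the DSS user account.
missing_r_pkgs() {
    Rscript -e 'for (p in commandArgs(TRUE)) if (!requireNamespace(p, quietly = TRUE)) cat(p, "\n", sep = "")' "$@"
}

# Example (run as the DSS user account):
#   v=$(r_version) && version_ge "$v" 3.1.2 || echo "R is missing or older than 3.1.2"
#   missing_r_pkgs httr RJSONIO dplyr data.table rzmq repr IRkernel IRdisplay
```

Run the package check as the DSS user, since packages installed in that account's user library are only visible to it. Once every package from the table is installed, missing_r_pkgs prints nothing.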
    

Troubleshooting

Mac OS X

Some R versions (notably the one installed through Homebrew) are configured to install source packages by default rather than binary packages. If you keep this setting, automatic installation will fail. You can then either:

  • Install the many additional packages needed to compile rzmq
  • Or disable the option
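To disable the option persistently for the DSS user account on Mac OS X, one approach is to add options(pkgType = "both") to that user's R profile. The helper below is a hypothetical sketch. It assumes the R sessions started by install-R-integration read ~/.Rprofile, which R skips when started with --vanilla or --no-init-file. The "both" setting is intended for platforms where binary packages are available, such as Mac OS X, so do not apply it on Linux.

```shell
# enable_binary_pkgs [PROFILE]: add options(pkgType = "both") to an R profile
# (default: ~/.Rprofile of the current user) so that R prefers binary packages
# when they are available. Running it again does not add a duplicate line.
enable_binary_pkgs() {
    profile="${1:-$HOME/.Rprofile}"
    line='options(pkgType = "both")'
    touch "$profile" || return 1
    # -q: no output, -x: match whole lines, -F: fixed string
    if ! grep -qxF "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```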

If you get an error about rzmq while running install-R-integration, install the IRkernel packages manually. At the R prompt, run:

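    # Prefer binary packages when available and current, otherwise use source
    options(pkgType = "both")
    # repos must be named: the second positional argument of install.packages() is lib
    install.packages(c("repr", "IRkernel", "IRdisplay"),
                     repos = c("http://irkernel.github.io/", "http://cran.rstudio.com/"))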

Then run the install-R-integration command again.