R recipes

R is a language and environment for statistical computing. Data Science Studio provides an advanced integration with this environment, and gives you the ability to write recipes using the R language.

R recipes, like Python recipes, can read and write datasets, whatever their storage backend is. We provide a simple API to read and write them.

Basic R recipe

  • Create a new R recipe by clicking the « R » button in the Recipes page.
  • Go to the Inputs/Outputs tab
  • Add the input datasets that will be used as source data in your recipes.
  • Select or create the output datasets that will be created by your recipe. For more information, see Creating recipes
  • If needed, fill the partition dependencies. For more information, see Working with partitions
  • Give a name and save your Recipe.
  • You can now write your R code.

First of all, you will need to load the Dataiku R library.

library(dataiku)

You will then be able to obtain the dataframe objects corresponding to your inputs.

Reading a dataset in a dataframe

For example, if your recipe has dataset ‘A’ as input, you can use the method read.dataset() to load it into a native R dataframe :

# Load the content of dataset A into a native R dataframe
dataframeA <- read.dataset("A")

Writing a dataframe in a dataset

Once you have used R to manipulate the input dataframe, you generally want to write it into the output dataset.

The Dataiku R API provides the method write.dataset() to do so.

# Write the R dataframe 'my_dataframe' into the dataset 'output_dataset_name'
write.dataset(my_dataframe,"output_dataset_name")

Writing the output schema

Generally, you should declare the schema of the output dataset prior to running the R code. However, it is often impractical to do so, especially when you write dataframes with many columns (or columns that change often). In that case, it can be useful for the R script to actually modify the schema of the dataset.

The Dataiku R API provides a method to set the schema of the output dataset. When doing that, the schema of the dataset is modified each time the R recipe is run. This must obviously be used with caution.

# Set the schema of ‘my_output_dataset’ to match the columns of the dataframe 'my_dataframe'
write.dataset_schema(my_dataframe,"my_output_dataset")

You can also write the schema and the dataframe at the same time:

# Write the schema from the dataframe 'my_dataframe' and write it into 'my_output_dataset'
write.dataset_with_schema(my_dataframe,"my_output_dataset")

For more information, check the R API documentation.