R recipes
R is a language and environment for statistical computing. Data Science Studio integrates with this environment and lets you write recipes in the R language.
Like Python recipes, R recipes can read and write datasets regardless of their storage backend, through a simple API.
Basic R recipe
- Create a new R recipe by clicking the "R" button on the Recipes page.
- Go to the Inputs/Outputs tab.
- Add the input datasets that will be used as source data in your recipe.
- Select or create the output datasets that your recipe will write. For more information, see Creating recipes.
- If needed, fill in the partition dependencies. For more information, see Working with partitions.
- Give the recipe a name and save it.
- You can now write your R code.
First of all, you will need to load the Dataiku R library.
library(dataiku)
You will then be able to obtain the dataframe objects corresponding to your inputs.
Reading a dataset into a dataframe
For example, if your recipe has dataset 'A' as input, you can use the read.dataset() function to load it into a native R dataframe:
# Load the content of dataset A into a native R dataframe
dataframeA <- read.dataset("A")
Writing a dataframe to a dataset
Once you have used R to manipulate the input dataframe, you generally want to write the result to the output dataset.
The Dataiku R API provides the write.dataset() function to do so:
# Write the R dataframe 'my_dataframe' into the dataset 'output_dataset_name'
write.dataset(my_dataframe,"output_dataset_name")
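Putting the read and write calls together, a complete recipe might look like the sketch below. The dataset names 'A' and 'output_dataset_name' come from the examples above. The column names `category` and `amount` and the helper `summarise_by_category` are hypothetical, chosen only for illustration. The Dataiku calls are wrapped in a guard so that the transformation can also be tried in a plain R session; inside a recipe you would normally just call library(dataiku) directly.

```r
# Hypothetical transformation: total 'amount' per 'category'.
# Both column names are assumptions made for this sketch.
summarise_by_category <- function(df) {
  aggregate(amount ~ category, data = df, FUN = sum)
}

# The dataiku library is available inside an R recipe.
# Outside of one, this block is simply skipped.
if (requireNamespace("dataiku", quietly = TRUE)) {
  library(dataiku)

  # Read the input dataset into a native R dataframe
  dataframeA <- read.dataset("A")

  # Transform it with plain R code
  my_dataframe <- summarise_by_category(dataframeA)

  # Write the result into the output dataset
  write.dataset(my_dataframe, "output_dataset_name")
}
```

Keeping the transformation in its own function means it can be checked on a small hand-built dataframe before the recipe is run against real data.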
Writing the output schema
Generally, you should declare the schema of the output dataset prior to running the R code. However, it is often impractical to do so, especially when you write dataframes with many columns (or columns that change often). In that case, it can be useful for the R script to actually modify the schema of the dataset.
The Dataiku R API provides a function to set the schema of the output dataset. When you use it, the schema of the dataset is modified each time the R recipe runs, so use it with caution.
# Set the schema of 'my_output_dataset' to match the columns of the dataframe 'my_dataframe'
write.dataset_schema(my_dataframe,"my_output_dataset")
You can also write the schema and the dataframe at the same time:
# Set the schema of 'my_output_dataset' from the dataframe 'my_dataframe', then write the dataframe into it
write.dataset_with_schema(my_dataframe,"my_output_dataset")
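As an illustration of when this is convenient, the sketch below builds a dataframe whose columns depend on the input, so declaring the output schema by hand beforehand would be impractical. The helper `add_centered_columns` and the `_centered` suffix are hypothetical; only read.dataset() and write.dataset_with_schema() come from the Dataiku R API as described above. The Dataiku calls are guarded so the helper can be tried in a plain R session.

```r
# Hypothetical helper: for every numeric column, append a mean-centered copy
# named '<column>_centered'. The resulting column set depends on the input.
add_centered_columns <- function(df) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  for (col in numeric_cols) {
    df[[paste0(col, "_centered")]] <- df[[col]] - mean(df[[col]], na.rm = TRUE)
  }
  df
}

# The dataiku library is available inside an R recipe.
# Outside of one, this block is simply skipped.
if (requireNamespace("dataiku", quietly = TRUE)) {
  library(dataiku)

  my_dataframe <- add_centered_columns(read.dataset("A"))

  # The columns are not known in advance, so set the schema and write the data together
  write.dataset_with_schema(my_dataframe, "my_output_dataset")
}
```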
For more information, check the R API documentation.