The R API

The Dataiku R API lets you read and write datasets from the R environment.

read.dataset(name, partitions=NULL, sampling=NULL, columns=NULL)

Create a dataframe from the content of a dataset.

Arguments:
  • name (character) –

    Name of the dataset. Can be in either of two formats:

    • projectKey.datasetName
    • datasetName: in this case, the current project will be used.
  • partitions (list) –

    List of partitions to load (notebook-only).

    Default value is NULL. Accepted values are:

    • NULL if the dataset is not partitioned.
    • A list of partition identifiers if the dataset is partitioned.

    This parameter cannot be used in recipe mode; see partition dependencies.

  • sampling

    A sampling method.

    Default value is NULL, which means “no sampling”. A sampling method can be created using one of these helpers:

    • ratio.sampling(ratio)
    • fixed.sampling(limit)
    • column.sampling(column, limit)
    • head.sampling(limit)
    • full.sampling()
  • columns (list) –

    List of columns to read.

    Default value is NULL, which means “keep all columns”.

    This parameter can be used to extract only a subset of the dataset’s columns.
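As a minimal sketch of how these arguments combine, the call below reads a sample of two columns from a dataset. The project key MYPROJECT, the dataset name customers, and the column names are hypothetical, and the sketch assumes the functions documented here are provided by a package named dataiku in the R environment.

```r
# Assumption: the functions documented here come from a package named "dataiku".
library(dataiku)

# Hypothetical project key, dataset name, and column names, for illustration only.
df <- read.dataset(
  "MYPROJECT.customers",
  sampling = head.sampling(1000),             # head.sampling(limit) helper, with limit = 1000
  columns  = list("customer_id", "country")   # read only a subset of the columns
)

# Inspect the structure of the resulting dataframe.
str(df)
```

Leaving sampling and columns at NULL would instead load the whole dataset with all of its columns.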

write.dataset(df, name, partition="")

Write an R dataframe into a dataset.

Arguments:
  • df (data.frame) – The dataframe you want to write.
  • name (character) –

    The name of the target dataset. Can be in either of two formats:

    • projectKey.datasetName
    • datasetName: in this case, the current project will be used.
  • partition (character) –

    Identifier of the target partition (notebook-only).

    • This parameter can be left empty if the target dataset is not partitioned.
    • Otherwise, you have to specify the output partition identifier.

    This parameter cannot be used in recipe mode; see partition dependencies.
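The two cases described for the partition argument might look as follows from a notebook. This is only a sketch: the dataset names, the project key, and the partition identifier "2024-01-01" are hypothetical, and it assumes the functions are provided by a package named dataiku.

```r
# Assumption: the functions documented here come from a package named "dataiku".
library(dataiku)

# A small in-memory dataframe to write (illustrative data).
df <- data.frame(id = c(1, 2, 3), score = c(0.5, 0.7, 0.9))

# Non-partitioned target in the current project:
# the partition argument is left at its default ("").
write.dataset(df, "scores")

# Partitioned target addressed as projectKey.datasetName, notebook only:
# the output partition identifier is given explicitly (hypothetical value).
write.dataset(df, "MYPROJECT.daily_scores", partition = "2024-01-01")
```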

write.dataset_schema(df, name, partition="")

Set the schema of a dataset from an R dataframe. This function does not write any data.

Arguments:
  • df (data.frame) – The dataframe you want to copy the schema from.
  • name (character) –

    The name of the target dataset. Can be in either of two formats:

    • projectKey.datasetName
    • datasetName: in this case, the current project will be used.

write.dataset_with_schema(df, name, partition="")

Set the schema of the target dataset, then write the dataframe into it. This function is a shortcut: it is strictly equivalent to calling write.dataset_schema() followed by write.dataset().

Arguments:
  • df (data.frame) – The dataframe you want to write.
  • name (character) –

    The name of the target dataset. Can be in either of two formats:

    • projectKey.datasetName
    • datasetName: in this case, the current project will be used.
  • partition (character) –

    Identifier of the target partition (notebook-only).

    • This parameter can be left empty if the target dataset is not partitioned.
    • Otherwise, you have to specify the output partition identifier.

    This parameter cannot be used in recipe mode; see partition dependencies.
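To illustrate the equivalence stated above, the sketch below shows the shortcut next to the two-step form. The dataset name scores is hypothetical, the target is assumed to be non-partitioned, and the functions are assumed to be provided by a package named dataiku. In practice you would use one form or the other, not both.

```r
# Assumption: the functions documented here come from a package named "dataiku".
library(dataiku)

# Illustrative dataframe whose column names and types define the schema.
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))

# Option 1: the shortcut sets the schema and writes the data in one call.
write.dataset_with_schema(df, "scores")

# Option 2: the two-step form described as equivalent.
write.dataset_schema(df, "scores")   # sets the schema only, writes no data
write.dataset(df, "scores")          # writes the rows
```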