Interacting with PySpark

dataiku.spark.get_dataframe(sqlContext, dataset)

Opens a DSS dataset as a SparkSQL DataFrame. The dataset argument must be a dataiku.Dataset object.
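A minimal read sketch, assuming a DSS PySpark recipe with an input dataset named "mydataset" (a placeholder); the SparkContext/SQLContext setup is the standard PySpark boilerplate, and the code only runs inside the DSS runtime:

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Standard PySpark setup; DSS generates this at the top of a PySpark recipe
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "mydataset" is a placeholder for the name of an input dataset in the Flow
mydataset = dataiku.Dataset("mydataset")
df = dkuspark.get_dataframe(sqlContext, mydataset)
```

From here, df is an ordinary SparkSQL DataFrame and supports the usual transformations (select, filter, groupBy, and so on).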

dataiku.spark.write_dataframe(dataset, dataframe, delete_first=True)

Saves a SparkSQL DataFrame into an existing DSS dataset. If delete_first is True (the default), any existing data in the dataset is dropped before writing.

dataiku.spark.write_schema_from_dataframe(dataset, dataframe)

Sets the schema of an existing dataset so that it is write-compatible with the given SparkSQL DataFrame.
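Together, write_schema_from_dataframe and write_dataframe form a two-step write: first align the dataset's schema with the DataFrame, then push the rows. A sketch assuming placeholder dataset names ("mydataset", "output") inside the DSS runtime:

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "mydataset" and "output" are placeholder dataset names
df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("mydataset"))

output = dataiku.Dataset("output")
dkuspark.write_schema_from_dataframe(output, df)  # make the dataset schema write-compatible with df
dkuspark.write_dataframe(output, df)              # then write the rows themselves
```

Splitting schema and data writes like this is useful when the schema should be set once while the data is written in a separate step.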

dataiku.spark.write_with_schema(dataset, dataframe, delete_first=True)

Writes a SparkSQL DataFrame into an existing DSS dataset, first overriding the dataset's schema to match that of the DataFrame. In effect, this combines write_schema_from_dataframe and write_dataframe in a single call.
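A typical PySpark recipe ends with this single call. A sketch under the same assumptions as above (placeholder dataset names, a hypothetical column "some_column", and the DSS runtime):

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "mydataset" is a placeholder input dataset name
df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("mydataset"))

# Hypothetical transformation on a placeholder column
df = df.filter(df["some_column"].isNotNull())

# Overrides the schema of "output" (placeholder name) to match df, then writes the data
dkuspark.write_with_schema(dataiku.Dataset("output"), df)
```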