Interaction with PySpark
dataiku.spark.start_spark_context_and_setup_sql_context(load_defaults=True, hive_db='dataiku', conf={})
    Helper to start a SparkContext and a SQLContext “like DSS recipes do”. This helper is provided mainly for informational purposes and is not used by default.
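
A minimal sketch of standalone usage, assuming the helper returns the (SparkContext, SQLContext) pair its name suggests (that return value is not stated here), with a hypothetical Spark setting passed through conf:

    import dataiku.spark as dkuspark

    # Assumption: the helper returns the (SparkContext, SQLContext) pair.
    # load_defaults=True picks up the default DSS Spark configuration;
    # entries in conf override individual Spark settings.
    sc, sqlContext = dkuspark.start_spark_context_and_setup_sql_context(
        load_defaults=True,
        hive_db='dataiku',
        conf={'spark.executor.memory': '2g'},
    )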
dataiku.spark.setup_sql_context(sc, hive_db='dataiku', conf={})
    Helper to start a SQLContext “like DSS recipes do”. This helper is provided mainly for informational purposes and is not used by default.
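
A sketch for the case where a SparkContext already exists, assuming the helper returns the configured SQLContext (also not stated here):

    import dataiku.spark as dkuspark
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # Assumption: returns a SQLContext configured the way DSS recipes
    # configure it, with hive_db naming the Hive database to use.
    sqlContext = dkuspark.setup_sql_context(sc, hive_db='dataiku')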
dataiku.spark.distribute_py_libs(sc)
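
No description is given above; from the name, this presumably makes the dataiku Python libraries available on the Spark executors. A heavily hedged sketch of the call itself:

    import dataiku.spark as dkuspark
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # Assumption based only on the name: distribute the dataiku Python
    # libraries to the executors of this SparkContext.
    dkuspark.distribute_py_libs(sc)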
dataiku.spark.get_dataframe(sqlContext, dataset)
    Opens a DSS dataset as a SparkSQL dataframe. The ‘dataset’ argument must be a dataiku.Dataset object.
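
A sketch of the usual read pattern; "mydataset" is a hypothetical dataset name in the current project:

    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    # "mydataset" is hypothetical; any dataiku.Dataset object works here.
    dataset = dataiku.Dataset("mydataset")
    df = dkuspark.get_dataframe(sqlContext, dataset)
    df.printSchema()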
dataiku.spark.write_schema_from_dataframe(dataset, dataframe)
    Sets the schema on an existing dataset to be write-compatible with the given SparkSQL dataframe.
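
A sketch of aligning an output dataset's schema with a dataframe before writing; "output_dataset" and the sample dataframe are hypothetical:

    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(SparkContext.getOrCreate())
    df = sqlContext.createDataFrame([(1, "a")], ["id", "label"])

    # "output_dataset" is a hypothetical existing dataset. Make its schema
    # write-compatible with df before writing rows into it.
    output = dataiku.Dataset("output_dataset")
    dkuspark.write_schema_from_dataframe(output, df)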
dataiku.spark.write_dataframe(dataset, dataframe, delete_first=True)
    Saves a SparkSQL dataframe into an existing DSS dataset.
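
A sketch of the write itself, assuming the dataset's schema is already write-compatible with the dataframe (for instance via write_schema_from_dataframe above); "output_dataset" is hypothetical:

    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(SparkContext.getOrCreate())
    df = sqlContext.createDataFrame([(1, "a")], ["id", "label"])

    # delete_first=True (the default) clears existing rows before writing.
    output = dataiku.Dataset("output_dataset")
    dkuspark.write_dataframe(output, df, delete_first=True)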
dataiku.spark.write_with_schema(dataset, dataframe, delete_first=True)
    Writes a SparkSQL dataframe into an existing DSS dataset. This first overrides the schema of the dataset to match the schema of the dataframe.
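
A sketch of the one-step equivalent of the previous two calls, first matching the schema and then writing; "output_dataset" is again hypothetical:

    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(SparkContext.getOrCreate())
    df = sqlContext.createDataFrame([(1, "a")], ["id", "label"])

    # Overrides the schema of "output_dataset" to match df, then saves df.
    output = dataiku.Dataset("output_dataset")
    dkuspark.write_with_schema(output, df)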
dataiku.spark.apply_prepare_recipe(df, recipe_name, project_key=None)
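
No description is given above; from the signature, this presumably applies the steps of a DSS prepare recipe to a SparkSQL dataframe, with project_key=None read as "the current project". A heavily hedged sketch (the recipe name is hypothetical):

    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(SparkContext.getOrCreate())
    df = sqlContext.createDataFrame([(1, "a")], ["id", "label"])

    # Assumption based only on the signature: returns df with the steps of
    # the prepare recipe "prepare_mydataset" (hypothetical) applied.
    prepared_df = dkuspark.apply_prepare_recipe(df, "prepare_mydataset")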