Streaming Spark Scala
DSS provides wrappers around Spark Structured Streaming to manipulate streaming endpoints. This implies a micro-batch approach, in which you manipulate regular Spark DataFrames.
Data from streaming endpoints is accessed via getStream():
import com.dataiku.dss.spark._

// Obtain the DSS wrapper around the regular SparkContext
val dkuContext = DataikuSparkContext.getContext(sparkContext)
// Get a streaming DataFrame reading from the "wikipedia" streaming endpoint
val df = dkuContext.getStream("wikipedia")
// manipulate df like a regular DataFrame
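For example, the streaming DataFrame returned by getStream() accepts the usual DataFrame transformations. In this sketch, the column names (title, length) are hypothetical and depend on the actual schema of your streaming endpoint:

import org.apache.spark.sql.functions._

// Hypothetical columns; adapt to the schema of your endpoint
val filtered = df
  .filter(col("length") > 100)
  .select(col("title"), col("length"))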
For Kafka streaming endpoints, DSS automatically uses Spark’s native Kafka integration; for other endpoint types, the data is streamed through the DSS backend.
Writing a streaming DataFrame to a dataset is done with saveStreamingQueryToDataset():
val q = dkuContext.saveStreamingQueryToDataset("dataset", df)
q.awaitTermination() // waits for the sources to stop
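Assuming the returned q is a standard Spark StreamingQuery (consistent with the awaitTermination() call above), the usual query-management methods are available:

// Sketch, assuming q is a org.apache.spark.sql.streaming.StreamingQuery
if (q.isActive) {
  println(s"Query ${q.id} is running")
}
// q.stop() // stop the query manually if needed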
Writing to a streaming endpoint is equally simple:
val q = dkuContext.saveStreamingQueryToStreamingEndpoint("endpoint", df)
q.awaitTermination() // waits for the sources to stop
The awaitTermination() call is required: without it, the recipe would exit immediately, before the streaming query has processed any data.
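Putting the pieces together, a minimal streaming recipe might look like the following sketch. The endpoint names ("wikipedia", "wikipedia_filtered") and the filter column are placeholders to adapt to your project:

import com.dataiku.dss.spark._
import org.apache.spark.sql.functions._

// Read from the input streaming endpoint
val dkuContext = DataikuSparkContext.getContext(sparkContext)
val df = dkuContext.getStream("wikipedia")

// Apply a regular DataFrame transformation (hypothetical column)
val filtered = df.filter(col("length") > 100)

// Write the result to the output streaming endpoint and block until the query stops
val q = dkuContext.saveStreamingQueryToStreamingEndpoint("wikipedia_filtered", filtered)
q.awaitTermination()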