Setting up Hadoop and Spark integration

Data Science Studio can connect to a Hadoop cluster in order to:

  • Read and write HDFS datasets
  • Run Hive queries and scripts
  • Run Impala queries
  • Run Pig scripts
  • Run preparation recipes on Hadoop

In addition, if you set up Spark integration, you can:

  • Run SparkSQL queries
  • Run preparation, join, stack and group recipes on Spark
  • Run PySpark & SparkR scripts
  • Train & use Spark MLlib models

See Setting up Hadoop integration and Setting up Spark integration for details.