Setting up Hadoop and Spark integration

Data Science Studio can connect to a Hadoop cluster in order to:

  • Read and write HDFS datasets

  • Run Hive queries and scripts

  • Run Impala queries

  • Run Pig scripts

  • Run preparation recipes on Hadoop

In addition, if you set up Spark integration, you can:

  • Run SparkSQL queries

  • Run preparation, join, stack and group recipes on Spark

  • Run PySpark & SparkR scripts

  • Train & use Spark MLlib models

For details, see Setting up Hadoop integration and Setting up Spark integration.