Cloudera CDH

CDH includes Spark and Impala.

DSS supports CDH versions 5.5 through 5.11.

Security

  • Connecting to secure clusters is fully supported
  • Multi-user security is supported with Sentry on CDH 5.9 and above
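The version constraints above can be checked programmatically. The following is only a minimal sketch: the function names are hypothetical and not part of DSS, and it assumes that "5.9 and above" means 5.9 up to the last supported release (5.11). It compares versions as integer tuples because a plain string comparison would wrongly rank "5.10" below "5.9".

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '5.11' or '5.11.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_supported_cdh(version: str) -> bool:
    """True if the CDH major.minor version lies in the supported 5.5 to 5.11 range."""
    major_minor = parse_version(version)[:2]
    return (5, 5) <= major_minor <= (5, 11)


def supports_sentry_multi_user(version: str) -> bool:
    """True if multi-user security with Sentry applies (CDH 5.9 and above).

    Assumption: the upper bound is the supported range checked above.
    """
    return is_supported_cdh(version) and parse_version(version)[:2] >= (5, 9)
```

For example, `supports_sentry_multi_user("5.10.1")` is true, while `supports_sentry_multi_user("5.8")` is false even though CDH 5.8 itself is supported.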

DSS regular security and Sentry

When DSS runs in regular security mode and connects to a Sentry-secured cluster, some configuration adjustments are required. See DSS and Hive for more information.

Scala notebook

CDH’s packaging of Spark 1.6 replaces some of the libraries normally used by Spark with older versions. This makes the Spark version bundled with CDH incompatible with the Spark-Scala notebook of DSS.

The only way to get Spark-Scala notebooks working with Spark 1.6 on CDH is to perform a standalone Spark installation. Note that you’ll need to add some configuration keys to your standalone Spark installation to make it work with YARN.

S3 datasets and Spark 2

The CDH version of Spark 2 repackages some libraries, causing some incompatibilities with DSS S3 code.

Trying to access S3 datasets with CDH Spark 2 raises errors such as `AmazonS3Exception: AWS authentication requires a valid Date or x-amz-date header`.

To work around this issue, add the following configuration key to your Spark configurations:

  • Key: `spark.driver.extraClassPath`
  • Value: `INSTALL_DIR/lib/ivy/common-run/joda-time-2.9.2.jar` (replace INSTALL_DIR with the full path to your DSS installation directory)
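
As an illustration of how the value above is assembled, here is a small sketch that expands INSTALL_DIR into the full jar path. The helper name `driver_extra_classpath` and the example directory `/opt/dataiku-dss` are assumptions made for this example; only the key name and the relative jar path come from the text above.

```python
from pathlib import PurePosixPath

# Relative location of the joda-time jar inside the DSS installation directory,
# as given in the configuration value above.
JODA_TIME_JAR = "lib/ivy/common-run/joda-time-2.9.2.jar"


def driver_extra_classpath(install_dir: str) -> str:
    """Return the value for spark.driver.extraClassPath.

    install_dir must be the full (absolute) path to the DSS installation.
    """
    path = PurePosixPath(install_dir)
    if not path.is_absolute():
        raise ValueError("install_dir must be a full (absolute) path")
    return str(path / JODA_TIME_JAR)


if __name__ == "__main__":
    # "/opt/dataiku-dss" is a placeholder; substitute your own installation path.
    value = driver_extra_classpath("/opt/dataiku-dss")
    print(f"spark.driver.extraClassPath {value}")
```

The check for an absolute path reflects the requirement that INSTALL_DIR be the full path; a relative value would be resolved against the Spark driver's working directory and the jar would most likely not be found.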