CDH includes Spark and Impala.
DSS supports CDH 5.9 to 6.3.
CDH 6 is only supported with Java 8.
- Connecting to secure clusters is fully supported
- User isolation is supported with Sentry
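The version constraints above can be turned into a small pre-flight check. The following is a minimal sketch, not part of DSS: the function names `parse_major_minor` and `check_cdh_support` are hypothetical, and it only encodes the two rules stated here (CDH 5.9 to 6.3, and Java 8 for CDH 6).

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '5.15.1'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def check_cdh_support(cdh_version, java_major):
    """Return a list of problems; an empty list means no stated rule is violated."""
    problems = []
    major, minor = parse_major_minor(cdh_version)
    # Rule 1: supported range is CDH 5.9 to 6.3 (compared on major.minor only).
    if not (5, 9) <= (major, minor) <= (6, 3):
        problems.append("CDH %s is outside the supported range 5.9 to 6.3" % cdh_version)
    # Rule 2: CDH 6 requires Java 8.
    if major == 6 and java_major != 8:
        problems.append("CDH 6 is only supported with Java 8, found Java %d" % java_major)
    return problems


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for problem in check_cdh_support("6.1.0", 11):
        print(problem)
```

The check compares only the major and minor components, so a maintenance release such as 6.3.2 is treated as 6.3.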
CDH’s packaging of Spark 1.6 replaces some of the libraries normally used by Spark with older versions. This makes the Spark version bundled with CDH incompatible with the Spark-Scala notebook of DSS.
The only way to get Spark-Scala notebooks working with Spark 1.6 on CDH is to perform a standalone Spark installation. Note that you’ll need to add some configuration keys to your standalone Spark to make it work with YARN.
S3 datasets and Spark 2
The CDH version of Spark 2 repackages some libraries, which causes incompatibilities with the DSS S3 code.
Trying to access S3 datasets with CDH Spark 2 will raise errors like:
`AmazonS3Exception: AWS authentication requires a valid Date or x-amz-date header`
To work around this, you need to add a configuration key to your Spark configurations:
`INSTALL_DIR/lib/ivy/common-run/joda-time-2.9.2.jar` (replace INSTALL_DIR with the full path to your DSS installation directory)
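To avoid typos when substituting INSTALL_DIR, the full JAR path can be built programmatically. This is a minimal sketch under stated assumptions: the helper `joda_time_jar_path` is hypothetical, `/opt/dataiku-dss` is only a placeholder, and the sketch produces the path alone, since the name of the Spark configuration key is not given here.

```python
import posixpath

# Relative location of the JAR inside the DSS installation, as given above.
JODA_TIME_RELATIVE = "lib/ivy/common-run/joda-time-2.9.2.jar"


def joda_time_jar_path(install_dir):
    """Return the absolute JAR path for a given DSS installation directory."""
    if not posixpath.isabs(install_dir):
        raise ValueError("INSTALL_DIR must be a full (absolute) path: %r" % install_dir)
    # normpath drops a trailing slash so the joined path stays clean.
    return posixpath.join(posixpath.normpath(install_dir), JODA_TIME_RELATIVE)


if __name__ == "__main__":
    # "/opt/dataiku-dss" is a placeholder, not a real default location.
    print(joda_time_jar_path("/opt/dataiku-dss"))
```

Rejecting relative paths mirrors the requirement that INSTALL_DIR be the full path to the installation directory.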
On CDH 5.15, if Kerberos is enabled, the following error can appear:
`Server impala/X.X.X.X@KERBEROS_DOMAIN not found in Kerberos database`, where X.X.X.X is an IP address.
This is a known Impala bug (https://issues.apache.org/jira/browse/IMPALA-7298).
To work around it, you need to have `rdns=true` in your Kerberos configuration, as documented in the above link.
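Before changing anything, it can help to check what the Kerberos configuration currently says. The following is a minimal sketch, assuming the setting lives in the `[libdefaults]` section of a `krb5.conf`-style file. The function `rdns_setting` and the sample contents are hypothetical, and the parser is deliberately simplistic (it ignores `include` directives and does not assume any default when the key is absent).

```python
def rdns_setting(krb5_text):
    """Return the rdns value found in [libdefaults] (lowercased), or None if absent."""
    section = None
    value = None
    for raw_line in krb5_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        # Track the current [section] header.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip().lower() == "rdns":
                value = val.strip().lower()  # the last occurrence wins in this sketch
    return value


# Hypothetical file contents, for illustration only.
SAMPLE = """
[libdefaults]
    default_realm = EXAMPLE.COM
    rdns = false

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }
"""

if __name__ == "__main__":
    if rdns_setting(SAMPLE) != "true":
        print("rdns is not explicitly set to true in [libdefaults]")
```

On a real host you would read the file contents first (commonly `/etc/krb5.conf`, though the location is an assumption and varies by system) and pass them to the function.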