Cloudera CDP¶
Dataiku supports Cloudera Data Platform - CDP Private Cloud Base:
7.1.7 (aka 7.1.7.p0)
7.1.7 SP1 (aka 7.1.7.p1000 and above)
7.1.7 SP2 (aka 7.1.7.p2000 and above)
7.1.8
7.1.9
7.1.9 SP1
Spark support¶
Dataiku supports the Spark 3 version provided by CDP
The Spark 2 version provided by CDP is not supported anymore
Connection to Azure Blob (abfs:// URLs) with Spark 3 is not supported
Security¶
Connecting to secure clusters is supported
User isolation is supported with Ranger
Using Knox is not supported. DSS must be deployed within the zone protected by Knox
Known issues¶
DSS 14 requires at least Java 17 to run. If the Java version used for Spark is lower, it should be updated via the properties
spark.executorEnv.JAVA_HOME
andspark.yarn.appMasterEnv.JAVA_HOME
in the Spark Settings to point to a suitable JDKThe Hive version deployed by CDP 7.1.9 doesn’t support unbounded ranges (https://issues.apache.org/jira/browse/HIVE-24905) in analytical function, which impacts Window recipes in DSS if they use an unbounded window frame. It can be worked around by setting the following Hive property:
set hive.vectorized.execution.reduce.enabled=false