Dataiku Documentation
Dataiku DSS
You are viewing the documentation for version 11 of DSS.

Distribution-specific notes

Each supported Hadoop distribution makes its own choices about packaging, the versions of the Hadoop stack components it ships, and the ecosystems it supports.

Each distribution also bundles its own libraries and backports specific bug fixes, which can change the behavior of Hadoop ecosystem components.

As a result, DSS support for each Hadoop distribution has its own specifics, covered in the following sections:

  • Cloudera CDP
    • Spark support
    • Security
  • Cloudera CDH
    • Security
      • DSS regular security and Sentry
    • Scala notebook
    • S3 datasets and Spark 2
    • Impala
  • Hortonworks HDP
    • HDP 3.1 support
    • Limitations
    • Security
      • DSS regular security and Ranger
      • DSS User Isolation Framework and Ranger
    • Migrating to HDP 3.X
  • Amazon Elastic MapReduce
    • Supported versions
    • Security
    • Deployment scenarios
      • Let DSS dynamically manage one or several EMR clusters
      • Connect DSS to an existing EMR cluster
        • DSS running on one of the cluster nodes
        • DSS outside of the cluster
      • Connect DSS to multiple existing EMR clusters
    • Using EMRFS
      • EMRFS credentials
  • Google Cloud Dataproc
    • Security
    • Known limitations
    • Connecting DSS to Cloud Dataproc
      • DSS running on one of the cluster nodes
      • DSS outside of the cluster

© Copyright 2022, Dataiku
