Dataiku Documentation
Dataiku DSS
You are viewing the documentation for version 12 of DSS.

Distribution-specific notes

Each supported Hadoop distribution makes different choices about packaging, the versions of the individual components of the Hadoop stack, and the ecosystems it supports.

Each distribution also bundles its own libraries and backports specific bug fixes, which can modify the behavior of the Hadoop ecosystem components.

As a result, DSS support for each Hadoop distribution has its own specificities, covered in the following pages:

  • Cloudera CDP
    • Spark support
    • Security
    • Known issues
  • Cloudera CDH
    • Security
      • DSS regular security and Sentry
    • Scala notebook
    • S3 datasets and Spark 2
    • Impala
  • Cloudera (ex-Hortonworks) HDP
    • HDP 3.1 support
    • Limitations
    • Security
      • DSS regular security and Ranger
      • DSS User Isolation Framework and Ranger
    • Migrating to HDP 3.X
  • Amazon Elastic MapReduce
    • Supported versions
    • Security
    • Deployment scenarios
      • Let DSS dynamically manage one or several EMR clusters
      • Connect DSS to an existing EMR cluster
        • DSS running on one of the cluster nodes
        • DSS outside of the cluster
      • Connect DSS to multiple existing EMR clusters
    • Using EMRFS
      • EMRFS credentials
  • Google Cloud Dataproc
    • Security
    • Known limitations
    • Connecting DSS to Cloud Dataproc
      • DSS running on one of the cluster nodes
      • DSS outside of the cluster

© Copyright 2024, Dataiku
