DSS and Spark

  • Usage of Spark in DSS
    • SparkSQL recipes
    • Visual recipes
    • Python code
      • Note about Spark code in Python notebooks
    • R code
      • Note about Spark code in R notebooks
    • Scala code
      • Spark Scala, PySpark & SparkR recipes
      • Spark Scala, PySpark & SparkR notebooks
      • Note about Spark code in Scala notebooks
    • Machine Learning with MLLib
    • Machine Learning with H2O Sparkling Water
  • Spark configurations
  • Interacting with DSS datasets
    • Hadoop FS datasets
    • S3 datasets
    • Other
  • Spark pipelines
    • Enabling Spark pipelines
    • Creating a Spark pipeline
    • Configuring behavior for intermediate datasets
    • Limitations
  • Limitations and attention points
  • Setting up Spark integration
    • Unmanaged Spark on Kubernetes
    • Main steps
      • Configure DSS
      • Build your Docker images
      • Create the Spark configuration

Spark is a general-purpose engine for distributed computation. Once Spark integration is set up, DSS offers the option to select Spark as the execution engine for jobs in various components, such as visual recipes, code recipes (SparkSQL, PySpark, SparkR, Spark Scala), and machine learning.
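For example, in a PySpark recipe, DSS datasets are exposed to Spark through the dataiku.spark module. The sketch below reads an input dataset into a Spark DataFrame, applies a transformation, and writes the result back to an output dataset. The dataset names (in_data, out_data) and the filtered column (status) are placeholders; the overall shape follows the code template DSS generates for new PySpark recipes.

    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Spark context provided by the DSS Spark integration
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    # Read a DSS dataset as a Spark DataFrame ("in_data" is a placeholder name)
    in_dataset = dataiku.Dataset("in_data")
    df = dkuspark.get_dataframe(sqlContext, in_dataset)

    # Example transformation: keep only rows where the "status" column
    # is not null (the column name is illustrative)
    df_out = df.filter(df["status"].isNotNull())

    # Write the result back to a DSS dataset ("out_data" is a placeholder name)
    out_dataset = dataiku.Dataset("out_data")
    dkuspark.write_with_schema(out_dataset, df_out)

The Spark configuration used to execute the recipe is selected in the recipe's settings (see Spark configurations), so the same code can run against different Spark setups without modification.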
