DSS and Spark

  • Usage of Spark in DSS
    • SparkSQL recipes
    • Visual recipes
    • Python code
      • Note about Spark code in Python notebooks
    • R code
      • Note about Spark code in R notebooks
    • Scala code
    • Machine Learning with MLLib
  • Spark configurations
  • Interacting with DSS datasets
    • Hadoop FS datasets
    • S3 datasets
    • Other
  • Spark pipelines
    • Enabling Spark pipelines
    • Creating a Spark pipeline
    • Configuring behavior for intermediate datasets
    • Limitations
  • Limitations and attention points
  • Setting up Spark integration
    • Unmanaged Spark on Kubernetes
      • Configure DSS
      • Build your Docker images
      • Create the Spark configuration

Spark is a general-purpose engine for distributed computation. Once Spark integration is set up, DSS offers settings to choose Spark as the execution engine for jobs in various components, such as visual recipes, code recipes, and machine learning.
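
For example, a PySpark recipe running in DSS can read a DSS dataset as a Spark DataFrame, transform it with Spark, and write the result back to another DSS dataset. The following is a minimal sketch assuming a DSS PySpark recipe environment where the dataiku and pyspark packages are available; the dataset names ("transactions_in", "transactions_out") and the "amount" column are hypothetical placeholders.

    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Reuse the Spark context provided by the recipe environment
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    # Read a DSS dataset as a Spark DataFrame
    # ("transactions_in" is a hypothetical dataset name)
    input_dataset = dataiku.Dataset("transactions_in")
    df = dkuspark.get_dataframe(sqlContext, input_dataset)

    # Example transformation, executed by Spark
    # ("amount" is a hypothetical column)
    filtered_df = df.filter(df["amount"] > 0)

    # Write the result back to a DSS dataset, propagating the schema
    output_dataset = dataiku.Dataset("transactions_out")
    dkuspark.write_with_schema(output_dataset, filtered_df)

Because the DataFrame operations run on Spark rather than in the DSS backend, the same recipe scales from a local Spark install to a cluster without code changes; see "Interacting with DSS datasets" below for how reads and writes map to the underlying storage.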

