Dataiku Documentation
Dataiku DSS
You are viewing the documentation for version 13 of DSS.

File formats

Datasets based on files require a file input format.

This section contains detailed information on the supported formats and options.

  • Delimiter-separated values (CSV / TSV)
    • Quoting and escaping styles
      • Excel-style
        • Example
      • Unix-style
        • Example
      • Escaping only
        • Example
      • No escaping, no quoting
    • Usage in datasets
    • Usage in recipes
  • Fixed width
  • Parquet
    • Requirements
    • Applicability
    • Limitations and issues
      • Case-sensitivity
      • Related to Hive
      • Related to Impala
      • Misc
  • Avro
    • Applicability
      • Reading Avro files
      • Reading Avro files / multiple versions
      • Writing Avro files
  • Hive ORCFile
    • Compatibility
    • Limitations
  • XML
    • Handling the structure
      • Selection of the data to load
      • JSON representation
        • Example
    • Using XPath to select data
      • Limitations
      • Selecting values explicitly
        • Example
  • JSON
    • Example
  • Excel
  • ESRI Shapefiles
    • Vecmath library
  • Delta Lake

© Copyright 2025, Dataiku
