Dataiku Documentation
Dataiku DSS
You are viewing the documentation for version 13 of DSS.

Popular Datasets

Basics

At the bottom of the Data Collections home page, you can find a Popular Datasets section containing the popular datasets in your organization’s Dataiku instance.

Popular Datasets are datasets that are considered the most relevant for reuse or publication to data collections, workspaces, or feature stores.

If you have the relevant permissions, you can use a popular dataset in your own projects or publish it into a Workspace, a Data Collection, or the Feature Store.

A dataset is considered popular if it satisfies the following conditions:

  • It has a recent last build date.

  • It has been shared with multiple projects.

  • It is used in at least one downstream recipe in a project it is shared with, and that recipe has been run at least once.

Optionally, DSS administrators can strengthen these conditions by requiring a dataset to be trending or to be part of at least one Data Collection.
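The conditions above can be read as a single predicate. The following is a minimal sketch of that logic, not DSS's actual implementation: `PopularitySettings`, `DatasetUsage`, and `is_popular` are hypothetical names introduced for illustration, and the default values are taken from the Settings table on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class PopularitySettings:
    """Hypothetical mirror of the admin settings; defaults follow this page."""
    max_days_since_rebuild: int = 30
    max_days_since_new_recipe: int = 60
    min_shares: int = 3
    only_from_data_collections: bool = False
    only_trending: bool = False

    def __post_init__(self):
        # The documentation states that these three values cannot be set to 0.
        for name in ("max_days_since_rebuild", "max_days_since_new_recipe", "min_shares"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")


@dataclass(frozen=True)
class DatasetUsage:
    """Hypothetical usage summary for one dataset (not a DSS object)."""
    last_build: datetime
    shared_project_count: int  # excludes the source project
    # Creation date of the newest downstream recipe in a shared project
    # that has run at least once; None if there is no such recipe.
    last_new_downstream_recipe: Optional[datetime]
    in_data_collection: bool = False
    trending: bool = False


def is_popular(usage: DatasetUsage, settings: PopularitySettings, now: datetime) -> bool:
    """Apply the base conditions, then the optional stricter ones."""
    recently_built = now - usage.last_build <= timedelta(days=settings.max_days_since_rebuild)
    shared_enough = usage.shared_project_count >= settings.min_shares
    recently_reused = (
        usage.last_new_downstream_recipe is not None
        and now - usage.last_new_downstream_recipe
        <= timedelta(days=settings.max_days_since_new_recipe)
    )
    if not (recently_built and shared_enough and recently_reused):
        return False
    if settings.only_from_data_collections and not usage.in_data_collection:
        return False
    if settings.only_trending and not usage.trending:
        return False
    return True


if __name__ == "__main__":
    now = datetime(2025, 1, 31)
    usage = DatasetUsage(
        last_build=datetime(2025, 1, 20),
        shared_project_count=4,
        last_new_downstream_recipe=datetime(2025, 1, 5),
    )
    print(is_popular(usage, PopularitySettings(), now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to reason about how a change in a threshold would affect a given dataset.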

Settings

DSS administrators can enable or disable Popular Datasets and tune the settings used for the computation.

To configure Popular Datasets, go to Administration > Settings > Misc.

The following parameters can be configured to drive the conditions a popular dataset must fulfill:

| Parameter | Default value | Description |
|---|---|---|
| Max # days since last rebuild | 30 | The maximum number of days since the last build of your dataset. This parameter cannot be set to 0. |
| Max # days since last used by a new recipe | 60 | The maximum number of days since the dataset has had a new downstream recipe created in a shared project. This parameter cannot be set to 0. This parameter is also used when checking whether a dataset is trending. |
| Min # shares | 3 | The minimum number of projects a dataset must be shared with to be considered popular (excluding the source project). This parameter cannot be set to 0. |
| Only from data collections | false | If true, only consider a dataset as popular if it is part of at least one Data Collection. |
| Only trending datasets | false | If true, only consider a dataset as popular if it is trending. Trending datasets refer to datasets that exhibit an increasing pattern of new recipe creation over specific temporal windows, determined by analyzing historical usage data. |
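This page does not specify how DSS computes whether a dataset is trending. As one possible reading of "an increasing pattern of new recipe creation over specific temporal windows", the sketch below compares the number of new downstream recipes in the most recent window with the number in the window just before it. The function name `is_trending`, the two-window comparison, and the use of 60 days as the window length are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_trending(
    recipe_creation_dates: Iterable[datetime],
    now: datetime,
    window_days: int = 60,  # assumed: reuses the "last used by a new recipe" default
) -> bool:
    """Return True if more recipes were created in the latest window
    than in the window immediately preceding it.

    Illustrative heuristic only; the real DSS computation may differ.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    window = timedelta(days=window_days)
    recent_start = now - window
    previous_start = recent_start - window

    recent = previous = 0
    for created in recipe_creation_dates:
        if recent_start < created <= now:
            recent += 1
        elif previous_start < created <= recent_start:
            previous += 1
        # Dates older than both windows (or in the future) are ignored.
    return recent > previous


if __name__ == "__main__":
    now = datetime(2025, 6, 30)
    dates = [now - timedelta(days=d) for d in (10, 20, 70)]
    print(is_trending(dates, now))
```

A dataset with no recipe activity in either window is not reported as trending here, because the comparison is strict.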

Note

Popular datasets are not detected across multiple DSS instances.


© Copyright 2025, Dataiku
