Dataiku DSS
You are viewing the documentation for version 12 of DSS.

Popular Datasets

Basics

At the bottom of the Data Collections home page, a Popular Datasets section lists the popular datasets in your organization’s Dataiku instance.

Popular datasets are the datasets considered most relevant for reuse, or for publication to Data Collections, Workspaces, or the Feature Store.

If you have the relevant permissions, you can use a popular dataset in your own projects or publish it into a Workspace, a Data Collection, or the Feature Store.

A dataset is considered popular if it satisfies the following conditions:

  • It was last built recently (within 30 days by default).

  • It is shared with multiple other projects (at least three by default, not counting its source project).

  • It is used in at least one downstream recipe in a project it is shared with, and that recipe has been run at least once.

Optionally, DSS administrators can tighten these conditions by also requiring a dataset to be trending, or to be part of at least one Data Collection.
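The conditions above can be sketched as a single predicate. This is a minimal illustration, not how DSS computes popularity internally: the `DatasetUsage` class, its field names, and `is_popular` are hypothetical, and the thresholds use the documented default values from Settings. The check on the age of the newest downstream recipe reflects the "Max # days since last used by a new recipe" parameter; exactly how DSS combines it with the other conditions is an assumption. Trending status is passed in as a flag because the trending computation is not specified here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetUsage:
    """Hypothetical usage summary for one dataset (not a DSS API object)."""
    last_build: date
    shared_project_count: int        # projects it is shared with, source project excluded
    has_run_downstream_recipe: bool  # a downstream recipe in a shared project has run at least once
    last_new_recipe: Optional[date]  # creation date of the newest downstream recipe in a shared project
    in_data_collection: bool = False
    trending: bool = False           # assumed to be computed elsewhere


def is_popular(
    usage: DatasetUsage,
    today: date,
    *,
    max_days_since_rebuild: int = 30,
    max_days_since_new_recipe: int = 60,
    min_shares: int = 3,
    only_from_data_collections: bool = False,
    only_trending: bool = False,
) -> bool:
    """Return True if the dataset meets every (hypothetically modeled) popularity condition."""
    # Recent last build date.
    if (today - usage.last_build).days > max_days_since_rebuild:
        return False
    # Shared with enough projects.
    if usage.shared_project_count < min_shares:
        return False
    # Used by at least one downstream recipe that has run at least once.
    if not usage.has_run_downstream_recipe:
        return False
    # A new downstream recipe was created recently enough.
    if usage.last_new_recipe is None:
        return False
    if (today - usage.last_new_recipe).days > max_days_since_new_recipe:
        return False
    # Optional, stricter conditions enabled by an administrator.
    if only_from_data_collections and not usage.in_data_collection:
        return False
    if only_trending and not usage.trending:
        return False
    return True


today = date(2023, 6, 30)
usage = DatasetUsage(
    last_build=date(2023, 6, 20),
    shared_project_count=4,
    has_run_downstream_recipe=True,
    last_new_recipe=date(2023, 5, 15),
)
print(is_popular(usage, today))                      # True
print(is_popular(usage, today, only_trending=True))  # False
```

Each check returns early, so the first unmet condition decides the result; the optional administrator settings only ever make the predicate stricter.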

Settings

DSS administrators can enable or disable Popular Datasets and tune the settings used for the computation.

To configure Popular Datasets, go to Administration > Settings > Misc.

The following parameters can be configured to drive the conditions a popular dataset must fulfill:

  • Max # days since last rebuild (default: 30): The maximum number of days since the dataset was last built. This parameter cannot be set to 0.

  • Max # days since last used by a new recipe (default: 60): The maximum number of days since a new downstream recipe was last created from the dataset in a shared project. This parameter cannot be set to 0. It is also used when checking whether a dataset is trending.

  • Min # shares (default: 3): The minimum number of projects, excluding the source project, that a dataset must be shared with to be considered popular. This parameter cannot be set to 0.

  • Only from data collections (default: false): If true, a dataset is considered popular only if it is part of at least one Data Collection.

  • Only trending datasets (default: false): If true, a dataset is considered popular only if it is trending. A dataset is trending when new downstream recipes are being created from it at an increasing rate over specific time windows, as determined from its historical usage data.
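The parameter list can be modeled as a small configuration object that enforces the rule that the numeric parameters cannot be set to 0. This is a sketch only: DSS manages these values through Administration > Settings > Misc, and the `PopularDatasetsSettings` class and its field names are hypothetical, chosen to mirror the parameter labels. Rejecting negative values as well is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopularDatasetsSettings:
    """Hypothetical mirror of the Popular Datasets parameters, with the documented defaults."""
    max_days_since_rebuild: int = 30
    max_days_since_new_recipe: int = 60
    min_shares: int = 3
    only_from_data_collections: bool = False
    only_trending: bool = False

    def __post_init__(self) -> None:
        # The documentation states these three parameters cannot be set to 0.
        # Also rejecting negative values is an assumption of this sketch.
        for name in ("max_days_since_rebuild", "max_days_since_new_recipe", "min_shares"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")


defaults = PopularDatasetsSettings()
stricter = PopularDatasetsSettings(min_shares=5, only_from_data_collections=True)
print(stricter.min_shares)  # 5

try:
    PopularDatasetsSettings(min_shares=0)
except ValueError as err:
    print(err)  # min_shares must be at least 1
```

Making the object immutable (`frozen=True`) means an invalid value is rejected when the settings are created, rather than surfacing later during the popularity computation.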

Note

Popular datasets are not detected across multiple DSS instances.


© Copyright 2023, Dataiku

Built with Sphinx using a theme provided by Read the Docs.