A core expectation of MLOps is to accelerate the deployment of models. A key part of this acceleration is to build efficient models faster. This can be achieved by using the most relevant data without heavy preparation, especially if this preparation is repeated. Helping Data Scientists to build, find and use this relevant data is the core notion of a Feature Store.
DSS provides several capabilities to implement this approach:
Feature Storage is handled by Dataiku's extensive Connections library
Data Ingestion and Curation are performed using Recipes in the Flow
Offline serving for batch processing is done using Join Recipes in projects deployed on an Automation node
Online serving for real-time processing is done using Dataset Lookups in API services
Data monitoring is implemented using Metrics & Checks
Automated building and maintenance is managed by Scenarios and Triggers
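To make the difference between offline and online serving concrete, here is a minimal conceptual sketch in plain Python. It does not use any DSS API: the feature rows, the `customer_id` key and the function names are all hypothetical, and in DSS the same operations would be configured as a Join Recipe and a Dataset Lookup rather than written by hand.

```python
# Hypothetical feature group keyed by customer_id (illustrative data only).
FEATURES = [
    {"customer_id": "c1", "avg_basket": 42.0, "orders_90d": 3},
    {"customer_id": "c2", "avg_basket": 17.5, "orders_90d": 1},
]


def build_index(rows, key):
    """Index feature rows by their key so lookups are a single dict access."""
    return {row[key]: row for row in rows}


def offline_join(entities, features, key):
    """Batch enrichment (left join): conceptually what a Join Recipe does."""
    index = build_index(features, key)
    joined = []
    for entity in entities:
        match = index.get(entity[key], {})
        # Copy every feature column except the join key itself.
        extra = {name: value for name, value in match.items() if name != key}
        joined.append({**entity, **extra})
    return joined


def online_lookup(index, key_value):
    """Single-key retrieval: conceptually what a Dataset Lookup does."""
    return index.get(key_value)
```

Entities without a matching feature row are kept unchanged by `offline_join`, and `online_lookup` returns `None` for an unknown key; how missing keys should be handled in a real project is a design decision this sketch does not settle.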
In DSS, the Feature Store section is the central registry of all Feature Groups. A Feature Group is a curated and promoted Dataset containing valuable Features.
If you are interested in building a complete Feature Store solution within Dataiku, you can read our hands-on article in our knowledge base.
Creating a Feature Group
A Feature Group is a curated DSS Dataset that is shared across your entire instance. In order to create Feature Groups:
Create a dataset containing the features, either by direct definition or using recipes
Set this dataset as a feature group
Defining Feature Groups requires the “Manage Feature Store” permission.
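The promotion step can also be scripted. The sketch below assumes the Dataiku Python API client exposes `get_project`, `get_dataset`, `get_settings`, `set_feature_group` and `save` as used here; check these against the API reference for your DSS version. The helper name `promote_to_feature_group` and its arguments are hypothetical.

```python
def promote_to_feature_group(client, project_key, dataset_name):
    """Flag an existing dataset as a Feature Group and persist the change.

    `client` is assumed to be a Dataiku API client whose user holds the
    "Manage Feature Store" permission.
    """
    dataset = client.get_project(project_key).get_dataset(dataset_name)
    settings = dataset.get_settings()
    # Assumed API call: marks the dataset as a Feature Group.
    settings.set_feature_group(True)
    # Nothing is persisted until the settings are saved.
    settings.save()
```

Inside DSS, a client can typically be obtained with `dataiku.api_client()`, for example `promote_to_feature_group(dataiku.api_client(), "MYPROJECT", "customer_features")`, where the project key and dataset name are placeholders.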
To streamline the use of Feature Groups by other teams and projects, it is recommended that, whenever possible, the underlying Datasets either be Quickly Shareable or have Request Access activated (see Shared Objects).
The Feature Store is available through the “nine dots” menu.
From this main screen, you can search and see information on the Feature Groups:
The left panel lets you refine the search using various criteria
The central panel lists the Feature Groups with their main information
When you click a line in the central panel, the right panel shows details on the Feature Group, such as its description, its content and its usage
There may be a delay of a few seconds before a new Feature Group appears in the Feature Store and becomes usable.
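The registry can also be read programmatically, which is useful when a script needs to wait until a newly defined Feature Group is listed. This is a minimal sketch, assuming the Dataiku Python API client provides `get_feature_store()` and `list_feature_groups()`, and that each listed item has an `id` attribute of the form `PROJECTKEY.dataset_name`. The helper names, timeout and polling interval are hypothetical.

```python
import time


def list_feature_group_ids(client):
    """Return the ids of all Feature Groups visible in the Feature Store."""
    feature_store = client.get_feature_store()
    return [item.id for item in feature_store.list_feature_groups()]


def wait_for_feature_group(client, feature_group_id, timeout_s=30.0, poll_s=2.0):
    """Poll the Feature Store until the given id is listed or the timeout expires.

    Returns True if the Feature Group was found, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if feature_group_id in list_feature_group_ids(client):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The registry is always checked at least once, so a `timeout_s` of `0` turns the helper into a single existence check.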
Using a Feature Group
As a user of the Feature Store, you have a “Use” button in the right panel when a Feature Group is selected. This button lets you add that Feature Group to your project.
You will then be invited to select the target project(s) in which the Feature Group should be added as a dataset. As explained above, leveraging the Request Access and Quick Share options makes this easier.
The Feature Group can then be used like any other dataset. It appears in the Flow with a medal overlay in the lower right corner.
Removing a Feature Group
To remove a Feature Group, click on the “Remove” button. This action does not delete the underlying Dataset, and all existing shares of that Dataset keep working. Removing a Feature Group only means that it will no longer be available in the Feature Store for future users.