Managed folders#

Note

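There are two main classes related to managed folder handling in Dataiku’s Python APIs:

  • dataiku.Folder in the dataiku package

  • dataikuapi.dss.managedfolder.DSSManagedFolder in the dataikuapi package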

Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.

For more details on the two packages, see Getting started.

Detailed examples#

This section contains more advanced examples of working with Managed Folders.

Load a model from a remote Managed Folder#

If you have a trained model artifact stored remotely (e.g. using a cloud object storage Connection like AWS S3), you can use it in a code Recipe. To do so, you first need to download the artifact and temporarily store it on the Dataiku instance’s local filesystem. The following code sample uses a TensorFlow serialized model and assumes that it is stored in a Managed Folder, in a directory called spam_detection that contains the following files:

  • saved_model.pb

  • variables/variables.data-00000-of-00001

  • variables/variables.index

import os
import shutil
import tempfile

import dataiku
import tensorflow as tf


# Handle to the managed folder
folder = dataiku.Folder("NvrBgKDk")
# Directory inside the managed folder that holds the model files
model_folder = "spam_detection"

# Create a temporary directory that is deleted when the block exits
with tempfile.TemporaryDirectory() as tmpdirname:
    # Loop through every file in the managed folder and copy it locally
    for file_name in folder.list_paths_in_partition():
        # Returned paths start with "/", so strip it before joining
        local_file_path = os.path.join(tmpdirname, file_name.lstrip("/"))
        # Create the parent directories locally if needed
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)
        # Copy the file from remote storage to the local filesystem
        with folder.get_download_stream(file_name) as f_remote, open(local_file_path, "wb") as f_local:
            shutil.copyfileobj(f_remote, f_local)
    # Load the model from the local copy while the temporary directory still exists
    model = tf.keras.models.load_model(os.path.join(tmpdirname, model_folder))
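
The download step can also be factored into a small reusable helper. The following is a minimal sketch, not part of the Dataiku API: the function name download_folder_prefix and its prefix parameter are hypothetical, and it assumes only that the folder object exposes list_paths_in_partition() and get_download_stream(), as used in the sample above, and that the returned paths start with "/".

```python
import os
import shutil


def download_folder_prefix(folder, dest_dir, prefix="/"):
    """Copy every file whose path starts with `prefix` into `dest_dir`.

    `folder` is assumed to behave like dataiku.Folder: it must provide
    list_paths_in_partition() and get_download_stream(path).
    Returns the list of local file paths that were written.
    """
    written = []
    for remote_path in folder.list_paths_in_partition():
        # Skip files that are outside the requested directory
        if not remote_path.startswith(prefix):
            continue
        # Remote paths are assumed to start with "/", so strip it before joining
        local_path = os.path.join(dest_dir, remote_path.lstrip("/"))
        # Recreate the remote directory layout locally
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Stream the remote file to disk without loading it fully into memory
        with folder.get_download_stream(remote_path) as remote, open(local_path, "wb") as local:
            shutil.copyfileobj(remote, local)
        written.append(local_path)
    return written
```

In a recipe, this could be called as download_folder_prefix(dataiku.Folder("NvrBgKDk"), tmpdirname, prefix="/spam_detection/") inside the tempfile.TemporaryDirectory() block, so that only the model files are copied rather than the whole folder. The trailing slash in the prefix avoids matching sibling directories whose names merely begin with spam_detection.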

Reference documentation#

Use the following class to interact with managed folders in Python recipes and notebooks. For more information see Managed folders and Usage in Python for usage examples of the Folder API.

dataiku.Folder(lookup[, project_key, ...])

Handle to interact with a folder.

Use the following class preferably outside of DSS.