Managed folders

Note

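There are two main classes related to managed folder handling in Dataiku’s Python APIs:

  • dataiku.Folder, in the dataiku package
  • dataikuapi.dss.managedfolder.DSSManagedFolder, in the dataikuapi package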

Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.

For more details on the two packages, please see Python APIs.

This class lets you interact with managed folders in Python recipes and notebooks. See Managed folders for more information.

class dataiku.Folder(lookup, project_key=None, ignore_flow=False)

This is a handle to interact with a managed folder.

Note: this class is also available as dataiku.Folder

get_info(sensitive_info=False)

Get information about the location and settings of this managed folder

Return type:dict

get_partition_info(partition)

Get information about the partitions of this managed folder

Return type:dict

get_path()

Gets the filesystem path of this managed folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.

For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.

is_partitioning_directory_based()

Whether the partitioning of the folder is based on sub-directories

list_paths_in_partition(partition='')

Gets the filesystem paths of the folder for the given partition (or for the entire folder)

list_partitions()

Gets the partitions in the folder

Return type:list
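
As a rough illustration of how list_partitions() and list_paths_in_partition() can be combined, the sketch below builds a mapping from each partition to the paths it contains. The helper name paths_by_partition is hypothetical (it is not part of the Dataiku API), and folder is assumed to be a dataiku.Folder handle on a partitioned folder.

```python
def paths_by_partition(folder):
    """Map each partition identifier to the list of paths it contains.

    `folder` is assumed to be a dataiku.Folder handle. This helper is
    illustrative only and is not provided by the dataiku package.
    """
    return {
        partition: folder.list_paths_in_partition(partition)
        for partition in folder.list_partitions()
    }
```

Within DSS, this could be called as paths_by_partition(dataiku.Folder("my_folder")), where "my_folder" is a placeholder name.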
get_partition_folder(partition)

Gets the filesystem path of the directory corresponding to the partition (if the partitioning is directory-based)

get_id()
get_name()
file_path(filename)

Gets the filesystem path for a given file within the folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.

For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.

Parameters:filename (str) – Name of the file within the folder
read_json(filename)

Reads a JSON file within the folder and returns its parsed content

Parameters:filename (str) – Path of the file within the folder
Return type:list or dict: Depending on the content of the file
write_json(filename, obj)

Writes a JSON-serializable (mostly dict or list) object as JSON to a file within the folder

Parameters:
  • filename (str) – Path of the target file within the folder
  • obj – JSON-serializable object to write (generally dict or list)
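
A minimal sketch of a read-modify-write cycle using read_json and write_json. The helper name update_json is hypothetical, and the sketch assumes the file already exists in the folder and holds a JSON object (a dict) rather than a list.

```python
def update_json(folder, filename, updates):
    """Merge `updates` into a JSON object stored in the managed folder.

    `folder` is assumed to be a dataiku.Folder handle, and `filename` is
    assumed to already exist and to contain a JSON object (dict).
    """
    content = folder.read_json(filename)
    if not isinstance(content, dict):
        raise TypeError("expected a JSON object in {}".format(filename))
    content.update(updates)
    # write_json replaces the file with the merged content
    folder.write_json(filename, content)
    return content
```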
clear()

Removes all files from the folder

clear_partition(partition)

Removes all files from a specific partition of the folder.

clear_path(path)

DEPRECATED - Use delete_path instead

delete_path(path)

Removes a file or directory from the folder

get_path_details(path='/')

Get details about a specific path (file or directory) in the folder

Return type:dict
get_download_stream(path)

Gets a file-like object that allows you to read a single file from this folder.

with folder.get_download_stream("myfile") as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))
Return type:file-like
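
Because the download stream works for any storage backend, it can also be used to copy a file out of the folder to local disk. The sketch below does this chunk by chunk. The helper name download_to_local and the default chunk size are hypothetical choices, and the stream is assumed to support read(size) and use as a context manager, as in the example above.

```python
import shutil


def download_to_local(folder, path, local_path, chunk_size=1024 * 1024):
    """Copy one file of the managed folder to a local file, chunk by chunk.

    `folder` is assumed to be a dataiku.Folder handle. The stream returned by
    get_download_stream is assumed to expose read(size).
    """
    with folder.get_download_stream(path) as stream:
        with open(local_path, "wb") as out:
            shutil.copyfileobj(stream, out, chunk_size)
    return local_path
```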
upload_stream(path, f)

Uploads the content of a file-like object to a specific path in the managed folder. If the file already exists, it will be replaced.

# This copies a local file to the managed folder.
# The file is opened in binary mode so that its bytes are uploaded unchanged.
with open("local_file_to_upload", "rb") as f:
    folder.upload_stream("name_of_file_in_folder", f)
Parameters:
  • path (str) – Target path of the file to write in the managed folder
  • f – file-like object open for reading
upload_file(path, file_path)

Uploads a local file to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters:
  • path (str) – Target path of the file to write in the managed folder
  • file_path – Absolute path to a local file
upload_data(path, data)

Uploads binary data to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters:
  • path (str) – Target path of the file to write in the managed folder
  • data – str or unicode data to upload
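
One possible use of upload_data is to write content that was generated in memory. The sketch below serializes rows to CSV and uploads the encoded result. The helper name upload_rows_as_csv is hypothetical, and the sketch assumes Python 3, where the data is passed as UTF-8 encoded bytes.

```python
import csv
import io


def upload_rows_as_csv(folder, path, header, rows):
    """Serialize rows to CSV in memory and upload the result as UTF-8 bytes.

    `folder` is assumed to be a dataiku.Folder handle. Returns the number of
    bytes uploaded.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    data = buffer.getvalue().encode("utf-8")
    # Any existing file at `path` is replaced
    folder.upload_data(path, data)
    return len(data)
```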
get_writer(path)

Get a writer object to write incrementally to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters:path (str) – Target path of the file to write in the managed folder
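
A sketch of incremental writing, useful when the content is produced piece by piece and should not be held in memory all at once. It assumes that the object returned by get_writer can be used as a context manager and exposes a write method accepting bytes. The helper name write_lines is hypothetical.

```python
def write_lines(folder, path, lines, encoding="utf-8"):
    """Write an iterable of text lines to `path`, one write call per line.

    `folder` is assumed to be a dataiku.Folder handle. The writer is assumed
    to support the `with` statement and a write(bytes) method.
    """
    count = 0
    with folder.get_writer(path) as writer:
        for line in lines:
            writer.write((line + "\n").encode(encoding))
            count += 1
    return count
```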
get_last_metric_values(partition='')

Get the set of last values of the metrics on this folder, as a dataiku.ComputedMetrics object

get_metric_history(metric_lookup, partition='')

Get the set of all values a given metric took on this folder

Parameters:
  • metric_lookup – metric name or unique identifier
  • partition – optionally, the partition for which the values are to be fetched

save_external_metric_values(values_dict)

Save metrics on this folder. The metrics are saved with the type “external”

Parameters:values_dict – the values to save, as a dict. The keys of the dict are used as metric names
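
As an illustration, the sketch below records the number of files in the folder as an external metric. The helper name save_file_count_metric and the metric name "file_count" are hypothetical choices, not names defined by DSS.

```python
def save_file_count_metric(folder, metric_name="file_count"):
    """Record the number of files in the folder as an external metric.

    `folder` is assumed to be a dataiku.Folder handle. The metric name is an
    arbitrary label chosen for this example.
    """
    paths = folder.list_paths_in_partition()
    values = {metric_name: len(paths)}
    folder.save_external_metric_values(values)
    return values
```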
save_external_check_values(values_dict)

Save checks on this folder. The checks are saved with the type “external”

Parameters:values_dict – the values to save, as a dict. The keys of the dict are used as check names

dataikuapi version

Use this class preferably outside of DSS

class dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, odb_id)

A managed folder on the DSS instance

id
delete()

Delete the managed folder

get_definition()

Get the definition of the managed folder

Returns:
the definition, as a JSON object
set_definition(definition)

Set the definition of the managed folder

Args:
definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
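
Since set_definition expects an object previously returned by get_definition, changes are typically applied as a get, modify, set cycle. The sketch below shows that pattern. The helper name update_definition is hypothetical, and the keys passed in changes are up to the caller: this sketch does not assume any particular key exists in the definition.

```python
def update_definition(managed_folder, changes):
    """Apply top-level `changes` to the folder definition and save it.

    `managed_folder` is assumed to be a DSSManagedFolder handle. The
    definition is always fetched first, so that only an object obtained from
    get_definition is ever passed to set_definition.
    """
    definition = managed_folder.get_definition()
    definition.update(changes)
    managed_folder.set_definition(definition)
    return definition
```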
list_contents()

Get the list of files in the managed folder

Returns:
the list of files, as a JSON object
get_file(path)

Get a file from the managed folder

Returns:
the file’s content, as a stream
delete_file(path)

Delete a file from the managed folder

put_file(path, f)

Upload the file to the managed folder

Args:
f: the file contents, as a stream
path: the path of the file
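
A minimal sketch of uploading a local file with put_file. The helper name upload_local_file is hypothetical, and defaulting the target path to the local file name is a choice made for this example only.

```python
import os


def upload_local_file(managed_folder, local_path, target_path=None):
    """Upload a local file to the managed folder through put_file.

    `managed_folder` is assumed to be a DSSManagedFolder handle. When no
    target path is given, the local file name is reused.
    """
    if target_path is None:
        target_path = os.path.basename(local_path)
    # Binary mode, so the stream yields the file's bytes unchanged
    with open(local_path, "rb") as f:
        managed_folder.put_file(target_path, f)
    return target_path
```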
compute_metrics(metric_ids=None, probes=None)

Compute metrics on this managed folder. If the metrics are not specified, the metrics set up on the managed folder are used.

get_last_metric_values()

Get the last values of the metrics on this managed folder

Returns:
a list of metric objects and their value
get_metric_history(metric)

Get the history of the values of the metric on this managed folder

Returns:
an object containing the values of the metric, cast to the appropriate type (double, boolean,…)
get_zone()

Gets the flow zone of this managed folder

Return type:dataikuapi.dss.flow.DSSFlowZone
move_to_zone(zone)

Moves this object to a flow zone

Parameters:zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to move the object
get_usages()

Get the recipes referencing this folder

Returns:
a list of usages
get_object_discussions()

Get a handle to manage discussions on the managed folder

Returns:the handle to manage discussions
Return type:dataikuapi.discussion.DSSObjectDiscussions
copy_to(target, write_mode='OVERWRITE')

Copies the data of this folder to another folder

Parameters:target – a dataikuapi.dss.managedfolder.DSSManagedFolder representing the target of this copy
Returns:a DSSFuture representing the operation
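
Because copy_to returns a future rather than blocking, a caller that needs the copy to be finished has to wait on it. The sketch below assumes the returned DSSFuture exposes a wait_for_result() method. The helper name copy_folder_and_wait is hypothetical.

```python
def copy_folder_and_wait(source, target, write_mode="OVERWRITE"):
    """Copy `source` into `target` and block until the copy has finished.

    Both arguments are assumed to be DSSManagedFolder handles. The returned
    future is assumed to provide wait_for_result().
    """
    future = source.copy_to(target, write_mode=write_mode)
    return future.wait_for_result()
```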