Managed folders

Note

There are two main classes related to managed folder handling in Dataiku’s Python APIs:

  • dataiku.Folder in the dataiku package

  • dataikuapi.dss.managedfolder.DSSManagedFolder in the dataikuapi package

Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.

For more details on the two packages, please see Python APIs.

This class lets you interact with managed folders in Python recipes and notebooks. For more information, see Managed folders. For usage examples of the Folder API, see Usage in Python.

class dataiku.Folder(lookup, project_key=None, ignore_flow=False)

This is a handle to interact with a managed folder.

Note: this class is also available as dataiku.Folder

get_info(sensitive_info=False)

Get information about the location and settings of this managed folder

Return type

dict

get_partition_info(partition)

Get information about the partitions of this managed folder

Return type

dict

get_path()

Gets the filesystem path of this managed folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.

For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.

is_partitioning_directory_based()

Whether the partitioning of the folder is based on sub-directories

list_paths_in_partition(partition='')

Gets the filesystem paths of the folder for the given partition (or for the entire folder)

list_partitions()

Gets the partitions in the folder

Return type

list
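As an illustration, the partition-listing calls can be combined to count the files in each partition. This is only a minimal sketch: count_files_per_partition is a hypothetical helper (not part of the API), and it assumes that list_paths_in_partition returns a list of path strings.

```python
def count_files_per_partition(folder):
    """Map each partition identifier to the number of files it contains.

    `folder` is expected to be a dataiku.Folder handle, or any object
    exposing list_partitions() and list_paths_in_partition().
    """
    counts = {}
    for partition in folder.list_partitions():
        # Assumption: the call returns a list of paths for that partition.
        paths = folder.list_paths_in_partition(partition)
        counts[partition] = len(paths)
    return counts

# Inside DSS, usage might look like this (the folder name is a placeholder):
#   import dataiku
#   counts = count_files_per_partition(dataiku.Folder("my_folder"))
```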

get_partition_folder(partition)

Gets the filesystem path of the directory corresponding to the partition (if the partitioning is directory-based)

get_id()

get_name()

file_path(filename)

Gets the filesystem path for a given file within the folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.

For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.

Parameters

filename (str) – Name of the file within the folder

read_json(filename)

Reads a JSON file within the folder and returns its parsed content

Parameters

filename (str) – Path of the file within the folder

Return type

list or dict: Depending on the content of the file

write_json(filename, obj)

Writes a JSON-serializable (mostly dict or list) object as JSON to a file within the folder

Parameters
  • filename (str) – Path of the target file within the folder

  • obj (object) – JSON-serializable object to write (generally dict or list)
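read_json and write_json can be paired for a read-modify-write cycle on a small JSON file. The sketch below is one possible way to do it: update_json is a hypothetical helper, and it assumes the file holds a JSON object (a dict) rather than a list.

```python
def update_json(folder, filename, updates):
    """Read a JSON object from the folder, merge `updates`, and write it back.

    `folder` is expected to be a dataiku.Folder handle, or any object
    exposing read_json() and write_json().
    """
    content = folder.read_json(filename)
    if not isinstance(content, dict):
        # read_json may also return a list; this helper only handles dicts.
        raise TypeError("expected a JSON object in {}".format(filename))
    content.update(updates)
    folder.write_json(filename, content)
    return content

# Inside DSS, usage might look like this (names are placeholders):
#   import dataiku
#   update_json(dataiku.Folder("my_folder"), "state.json", {"status": "done"})
```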

clear()

Removes all files from the folder

clear_partition(partition)

Removes all files from a specific partition of the folder.

clear_path(path)

DEPRECATED - Use delete_path instead

delete_path(path)

Removes a file or directory from the folder
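For selective cleanup, delete_path can be combined with list_paths_in_partition. The following is a sketch under assumptions: delete_by_suffix is a hypothetical helper, and it assumes the listed paths are strings that delete_path accepts unchanged.

```python
def delete_by_suffix(folder, suffix):
    """Delete every file in the folder whose path ends with `suffix`.

    Returns the list of deleted paths. `folder` is expected to be a
    dataiku.Folder handle, or any object exposing
    list_paths_in_partition() and delete_path().
    """
    deleted = []
    # Called without arguments, the listing covers the entire folder.
    for path in folder.list_paths_in_partition():
        if path.endswith(suffix):
            folder.delete_path(path)
            deleted.append(path)
    return deleted

# Inside DSS, usage might look like this (names are placeholders):
#   import dataiku
#   delete_by_suffix(dataiku.Folder("my_folder"), ".tmp")
```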

get_path_details(path='/')

Get details about a specific path (file or directory) in the folder

Return type

dict

get_download_stream(path)

Gets a file-like object that allows you to read a single file from this folder.

with folder.get_download_stream("myfile") as stream:
    data = stream.readline()
    print("First line of myfile is: {}".format(data))

Return type

file-like
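To read a whole text file rather than a single line, the stream can be consumed with read(). This is a sketch, not a definitive recipe: read_text is a hypothetical helper, and it assumes the stream yields bytes and that the file is small enough to fit in memory.

```python
def read_text(folder, path, encoding="utf-8"):
    """Return the full content of a file in the folder as a string.

    `folder` is expected to be a dataiku.Folder handle, or any object
    whose get_download_stream() returns a file-like context manager.
    """
    with folder.get_download_stream(path) as stream:
        raw = stream.read()  # assumption: the stream yields bytes
    return raw.decode(encoding)

# Inside DSS, usage might look like this (names are placeholders):
#   import dataiku
#   text = read_text(dataiku.Folder("my_folder"), "myfile")
```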

upload_stream(path, f)

Uploads the content of a file-like object to a specific path in the managed folder. If the file already exists, it will be replaced.

# This copies a local file to the managed folder
with open("local_file_to_upload", "rb") as f:
    folder.upload_stream("name_of_file_in_folder", f)

Parameters
  • path (str) – Target path of the file to write in the managed folder

  • f – file-like object open for reading
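Because upload_stream accepts any file-like object, content built in memory can be uploaded without a temporary local file. The sketch below shows one way to do this for CSV data; upload_csv_rows is a hypothetical helper, and UTF-8 encoding is an assumption.

```python
import csv
import io

def upload_csv_rows(folder, path, rows):
    """Serialize `rows` as CSV in memory and upload it to `path`.

    Returns the number of bytes uploaded. `folder` is expected to be a
    dataiku.Folder handle, or any object exposing upload_stream().
    """
    # newline="" lets the csv module control line endings itself.
    text_buffer = io.StringIO(newline="")
    csv.writer(text_buffer).writerows(rows)
    payload = text_buffer.getvalue().encode("utf-8")
    # Wrap the bytes in a file-like object open for reading.
    folder.upload_stream(path, io.BytesIO(payload))
    return len(payload)

# Inside DSS, usage might look like this (names are placeholders):
#   import dataiku
#   upload_csv_rows(dataiku.Folder("my_folder"), "out.csv", [["a", "b"], [1, 2]])
```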

upload_file(path, file_path)

Uploads a local file to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters
  • path (str) – Target path of the file to write in the managed folder

  • file_path – Absolute path to a local file

upload_data(path, data)

Uploads binary data to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters
  • path (str) – Target path of the file to write in the managed folder

  • data – str or unicode data to upload

get_writer(path)

Get a writer object to write incrementally to a specific path in the managed folder. If the file already exists, it will be replaced.

Parameters

path (str) – Target path of the file to write in the managed folder
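A writer is useful when the content is produced piece by piece. The sketch below relies only on the write(b) and close() methods listed for ManagedFolderWriter; write_chunks is a hypothetical helper, and it assumes each chunk is a bytes object.

```python
def write_chunks(folder, path, chunks):
    """Write an iterable of bytes chunks to `path`, then close the writer.

    Returns the total number of bytes written. `folder` is expected to be
    a dataiku.Folder handle, or any object exposing get_writer().
    """
    writer = folder.get_writer(path)
    written = 0
    try:
        for chunk in chunks:
            writer.write(chunk)
            written += len(chunk)
    finally:
        # Always close the writer, even if producing a chunk fails.
        writer.close()
    return written

# Inside DSS, usage might look like this (names are placeholders):
#   import dataiku
#   write_chunks(dataiku.Folder("my_folder"), "out.bin", [b"part1", b"part2"])
```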

get_last_metric_values(partition='')

Get the set of last values of the metrics on this folder, as a dataiku.ComputedMetrics object

get_metric_history(metric_lookup, partition='')

Get the set of all values a given metric took on this folder

Parameters
  • metric_lookup – metric name or unique identifier

  • partition – optionally, the partition for which the values are to be fetched

save_external_metric_values(values_dict)

Save metrics on this folder. The metrics are saved with the type “external”

Parameters

values_dict – the values to save, as a dict. The keys of the dict are used as metric names
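As an example of the expected dict shape, a custom metric can be computed in Python and saved under a name of your choosing. This is a sketch: save_file_count_metric and the metric name "file_count" are hypothetical, not names defined by DSS.

```python
def save_file_count_metric(folder, metric_name="file_count"):
    """Count the files in the folder and save the count as an external metric.

    `folder` is expected to be a dataiku.Folder handle, or any object
    exposing list_paths_in_partition() and save_external_metric_values().
    """
    count = len(folder.list_paths_in_partition())
    # The dict key becomes the metric name; the value is the metric value.
    folder.save_external_metric_values({metric_name: count})
    return count

# Inside DSS, usage might look like this (the folder name is a placeholder):
#   import dataiku
#   save_file_count_metric(dataiku.Folder("my_folder"))
```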

save_external_check_values(values_dict)

Save checks on this folder. The checks are saved with the type “external”

Parameters

values_dict – the values to save, as a dict. The keys of the dict are used as check names

class dataiku.core.managed_folder.ManagedFolderWriter(project_key, folder_id, path)

write(b)

close()

dataikuapi version

Prefer this class when working outside of DSS.

class dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, odb_id)

A managed folder on the DSS instance

property id

delete()

Delete the managed folder

get_definition()

Get the definition of the managed folder

Returns:

the definition, as a JSON object

set_definition(definition)

Set the definition of the managed folder

Args:

definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
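Since set_definition should only receive an object obtained from get_definition, changes are naturally written as a get, modify, set sequence. The sketch below wraps that sequence; update_definition and its mutate callback are hypothetical, and which keys exist in the definition depends on your instance.

```python
def update_definition(folder, mutate):
    """Fetch the folder definition, apply `mutate` to it in place, save it.

    `folder` is expected to be a DSSManagedFolder handle, or any object
    exposing get_definition() and set_definition(). `mutate` is a
    callable that receives the definition and modifies it in place.
    """
    definition = folder.get_definition()
    mutate(definition)
    # Send back the same object that was retrieved, now modified.
    folder.set_definition(definition)
    return definition
```

Keeping the retrieved object and modifying it in place, rather than building a new dict, avoids dropping fields that the caller did not intend to change.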

list_contents()

Get the list of files in the managed folder

Returns:

the list of files, as a JSON object

get_file(path)

Get a file from the managed folder

Returns:

the file’s content, as a stream

delete_file(path)

Delete a file from the managed folder

put_file(path, f)

Upload the file to the managed folder

Args:

path: the path of the file

f: the file contents, as a stream
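A typical use of put_file is sending a local file from a machine outside DSS. The following sketch shows the argument order (target path first, then the stream); upload_local_file is a hypothetical helper, and opening the file in binary mode is an assumption made so that the content is sent unchanged.

```python
def upload_local_file(folder, local_path, target_path):
    """Upload the local file at `local_path` to `target_path` in the folder.

    `folder` is expected to be a DSSManagedFolder handle, or any object
    exposing put_file(path, f).
    """
    # Binary mode avoids any newline or encoding translation.
    with open(local_path, "rb") as f:
        folder.put_file(target_path, f)

# Outside DSS, `folder` would typically be obtained through a dataikuapi
# client; that setup is not covered in this section.
```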

upload_folder(path, folder)

Upload the content of a folder at path in the managed folder.

Note: upload_folder(“target”, “source”) will result in “target” containing the content of “source”, not in “target” containing “source”.

Parameters
  • path (str) – the destination path of the folder in the managed folder (POSIX)

  • folder (str) – path (absolute or relative) of the source folder to upload

compute_metrics(metric_ids=None, probes=None)

Compute metrics on this managed folder. If the metrics are not specified, the metrics set up on the managed folder are used.

get_last_metric_values()

Get the last values of the metrics on this managed folder

Returns:

a list of metric objects and their value

get_metric_history(metric)

Get the history of the values of the metric on this managed folder

Returns:

an object containing the values of the metric, cast to the appropriate type (double, boolean,…)

get_zone()

Gets the flow zone of this managed folder

Return type

dataikuapi.dss.flow.DSSFlowZone

move_to_zone(zone)

Moves this object to a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to move the object

share_to_zone(zone)

Share this object to a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone where to share the object

unshare_from_zone(zone)

Unshare this object from a flow zone

Parameters

zone (object) – a dataikuapi.dss.flow.DSSFlowZone from where to unshare the object

get_usages()

Get the recipes referencing this folder

Returns:

a list of usages

get_object_discussions()

Get a handle to manage discussions on the managed folder

Returns

the handle to manage discussions

Return type

dataikuapi.discussion.DSSObjectDiscussions

copy_to(target, write_mode='OVERWRITE')

Copies the data of this folder to another folder

Parameters

target (dataikuapi.dss.managedfolder.DSSManagedFolder) – the managed folder that is the target of this copy

Returns

a DSSFuture representing the operation
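Because copy_to returns a future rather than blocking, a caller usually has to wait for it before using the target folder. The sketch below assumes the returned DSSFuture offers a wait_for_result() method, which comes from dataikuapi's future class and is not described in this section; copy_and_wait is a hypothetical helper.

```python
def copy_and_wait(source, target, write_mode="OVERWRITE"):
    """Copy `source` into `target` and block until the copy has finished.

    `source` and `target` are expected to be DSSManagedFolder handles.
    Returns whatever the future yields as its result.
    """
    future = source.copy_to(target, write_mode=write_mode)
    # Assumption: the future exposes wait_for_result(), as DSSFuture does.
    return future.wait_for_result()
```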