Managed folders
Note
There are two main classes related to managed folder handling in Dataiku’s Python APIs:
dataiku.Folder
in the dataiku package. It was initially designed for usage within DSS in recipes and Jupyter notebooks.
dataikuapi.dss.managedfolder.DSSManagedFolder
in the dataikuapi package. It was initially designed for usage outside of DSS.
Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.
For more details on the two packages, please see Python APIs.
This class lets you interact with managed folders in Python recipes and notebooks. For more information, see Managed folders; for usage examples of the Folder API, see Usage in Python.
-
class
dataiku.
Folder
(lookup, project_key=None, ignore_flow=False)¶ This is a handle to interact with a managed folder.
Note: this class is also available as
dataiku.Folder
-
get_info
(sensitive_info=False)¶ Get information about the location and settings of this managed folder
- Return type
dict
-
get_partition_info
(partition)¶ Get information about the partitions of this managed folder
- Return type
dict
-
get_path
()¶ Gets the filesystem path of this managed folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.
For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.
-
is_partitioning_directory_based
()¶ Whether the partitioning of the folder is based on sub-directories
-
list_paths_in_partition
(partition='')¶ Gets the filesystem paths of the folder for the given partition (or for the entire folder)
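As an illustration of list_paths_in_partition, the sketch below filters the returned paths by file extension. The helper name list_files_with_suffix and the folder identifier in the comments are placeholders chosen for this example; the helper only assumes that the method returns a list of path strings.

```python
def list_files_with_suffix(folder, suffix, partition=""):
    """Return the paths in `folder` ending with `suffix` (case-insensitive), sorted.

    `folder` is expected to behave like a dataiku.Folder handle.
    """
    paths = folder.list_paths_in_partition(partition)
    return sorted(p for p in paths if p.lower().endswith(suffix.lower()))


# Inside DSS, the handle would typically be obtained like this
# ("my_folder" is a placeholder, not a real folder name):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   csv_paths = list_files_with_suffix(folder, ".csv")
```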
-
list_partitions
()¶ Gets the partitions in the folder
- Return type
list
-
get_partition_folder
(partition)¶ Gets the filesystem path of the directory corresponding to the partition (if the partitioning is directory-based)
-
get_id
()¶
-
get_name
()¶
-
file_path
(filename)¶ Gets the filesystem path for a given file within the folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.
For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.
- Parameters
filename (str) – Name of the file within the folder
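Because file_path only works for folders stored on the local filesystem of the DSS server, code that must work with any storage type can fall back to the streaming API. The following is a minimal sketch of that idea. It assumes that file_path raises an exception for non-filesystem folders (the exact exception type is not stated here, so the sketch catches Exception broadly), and read_file_bytes is a hypothetical helper name.

```python
def read_file_bytes(folder, filename):
    """Read a file from a managed folder, whatever its storage type.

    Tries the local filesystem path first, then falls back to streaming.
    """
    try:
        local_path = folder.file_path(filename)
    except Exception:
        # Assumption: a non-local folder (HDFS, S3, ...) raises here.
        with folder.get_download_stream(filename) as stream:
            return stream.read()
    with open(local_path, "rb") as f:
        return f.read()
```

If portability matters more than direct file access, using get_download_stream unconditionally is the simpler choice, since the read/download APIs are described as the route for non-filesystem folders.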
-
read_json
(filename)¶ Reads a JSON file within the folder and returns its parsed content
- Parameters
filename (str) – Path of the file within the folder
- Return type
list or dict: Depending on the content of the file
-
write_json
(filename, obj)¶ Writes a JSON-serializable (mostly dict or list) object as JSON to a file within the folder
- Parameters
filename (str) – Path of the target file within the folder
obj (object) – JSON-serializable object to write (generally dict or list)
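read_json and write_json can be combined into a read-modify-write step. The sketch below assumes the file holds a JSON object (a dict after parsing); update_json and the file name in the comments are hypothetical names used only for illustration.

```python
def update_json(folder, filename, updates):
    """Merge `updates` into the JSON object stored at `filename` and write it back."""
    current = folder.read_json(filename)
    if not isinstance(current, dict):
        raise TypeError("expected a JSON object in {}".format(filename))
    current.update(updates)
    folder.write_json(filename, current)
    return current


# Inside DSS ("my_folder" and "config.json" are placeholders):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   update_json(folder, "config.json", {"threshold": 0.8})
```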
-
clear
()¶ Removes all files from the folder
-
clear_partition
(partition)¶ Removes all files from a specific partition of the folder.
-
clear_path
(path)¶ DEPRECATED - Use delete_path instead
-
delete_path
(path)¶ Removes a file or directory from the folder
-
get_path_details
(path='/')¶ Get details about a specific path (file or directory) in the folder
- Return type
dict
-
get_download_stream
(path)¶ Gets a file-like object that allows you to read a single file from this folder.
    with folder.get_download_stream("myfile") as stream:
        data = stream.readline()
        print("First line of myfile is: {}".format(data))
- Return type
file-like
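When a whole file is needed locally, for instance to hand it to a library that only accepts filesystem paths, the stream can be copied to a local file in chunks. This is a sketch rather than an official recipe: download_to_local is a hypothetical helper, and it assumes the stream yields bytes.

```python
import shutil


def download_to_local(folder, path, local_path):
    """Copy one file of a managed folder to `local_path` without loading it fully in memory."""
    with folder.get_download_stream(path) as stream, open(local_path, "wb") as out:
        shutil.copyfileobj(stream, out)
    return local_path


# Inside DSS (folder name and paths are placeholders):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   download_to_local(folder, "myfile", "/tmp/myfile")
```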
-
upload_stream
(path, f)¶ Uploads the content of a file-like object to a specific path in the managed folder. If the file already exists, it will be replaced.
    # This copies a local file to the managed folder
    # (the file is opened in binary mode so that its bytes are uploaded unchanged)
    with open("local_file_to_upload", "rb") as f:
        folder.upload_stream("name_of_file_in_folder", f)
- Parameters
path (str) – Target path of the file to write in the managed folder
f – file-like object open for reading
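upload_stream accepts any file-like object open for reading, so data built in memory can be uploaded without a temporary file. The sketch below serializes rows to CSV and uploads them through an io.BytesIO buffer; upload_rows_as_csv and the names in the comments are hypothetical.

```python
import csv
import io


def upload_rows_as_csv(folder, path, rows):
    """Serialize `rows` (an iterable of sequences) to CSV and upload it to `path`."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    # Encode to bytes and wrap in a file-like object positioned at the start.
    data = io.BytesIO(text.getvalue().encode("utf-8"))
    folder.upload_stream(path, data)


# Inside DSS ("my_folder" and "report.csv" are placeholders):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   upload_rows_as_csv(folder, "report.csv", [["name", "count"], ["a", 1]])
```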
-
upload_file
(path, file_path)¶ Uploads a local file to a specific path in the managed folder. If the file already exists, it will be replaced.
- Parameters
path (str) – Target path of the file to write in the managed folder
file_path – Absolute path to a local file
-
upload_data
(path, data)¶ Uploads binary data to a specific path in the managed folder. If the file already exists, it will be replaced.
- Parameters
path (str) – Target path of the file to write in the managed folder
data – str or unicode data to upload
-
get_writer
(path)¶ Get a writer object to write incrementally to a specific path in the managed folder. If the file already exists, it will be replaced.
- Parameters
path (str) – Target path of the file to write in the managed folder
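For output produced piece by piece, the writer returned by get_writer can be fed incrementally and then closed. The sketch below relies only on the write and close methods listed for ManagedFolderWriter; it assumes that write accepts bytes, and write_lines is a hypothetical helper name.

```python
def write_lines(folder, path, lines, encoding="utf-8"):
    """Write an iterable of text lines to `path` in the managed folder, one at a time."""
    writer = folder.get_writer(path)
    try:
        for line in lines:
            writer.write((line + "\n").encode(encoding))
    finally:
        # Always close the writer, even if an error interrupts the loop.
        writer.close()


# Inside DSS ("my_folder" and "output.txt" are placeholders):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   write_lines(folder, "output.txt", ["first line", "second line"])
```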
-
get_last_metric_values
(partition='')¶ Get the set of last values of the metrics on this folder, as a
dataiku.ComputedMetrics
object
-
get_metric_history
(metric_lookup, partition='')¶ Get the set of all values a given metric took on this folder
- Parameters
metric_lookup – metric name or unique identifier
partition – optionally, the partition for which the values are to be fetched
-
save_external_metric_values
(values_dict)¶ Save metrics on this folder. The metrics are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as metric names
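As a small example of save_external_metric_values, the sketch below counts the files in a folder and stores the counts as external metrics. The metric names file_count and csv_file_count, and the helper name, are invented for this illustration.

```python
def save_file_count_metrics(folder, partition=""):
    """Compute simple file counts and save them as external metrics on the folder."""
    paths = folder.list_paths_in_partition(partition)
    values = {
        # The dict keys become the metric names (hypothetical names here).
        "file_count": len(paths),
        "csv_file_count": sum(1 for p in paths if p.lower().endswith(".csv")),
    }
    folder.save_external_metric_values(values)
    return values


# Inside DSS ("my_folder" is a placeholder):
#   import dataiku
#   save_file_count_metrics(dataiku.Folder("my_folder"))
```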
-
save_external_check_values
(values_dict)¶ Save checks on this folder. The checks are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as check names
-
-
class
dataiku.core.managed_folder.
ManagedFolderWriter
(project_key, folder_id, path)¶ -
write
(b)¶
-
close
()¶
-
dataikuapi version
Use this class preferably outside of DSS
-
class
dataikuapi.dss.managedfolder.
DSSManagedFolder
(client, project_key, odb_id)¶ A managed folder on the DSS instance
-
property
id
¶
-
delete
()¶ Delete the managed folder
-
get_definition
()¶ Get the definition of the managed folder
- Returns:
the definition, as a JSON object
-
set_definition
(definition)¶ Set the definition of the managed folder
- Args:
definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
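Since set_definition should only receive an object obtained from get_definition, changes are naturally written as a get, modify, set sequence. The sketch below wraps that sequence; update_definition is a hypothetical helper, and the "description" key, host, API key and identifiers in the comments are placeholders rather than documented values.

```python
def update_definition(folder, mutate):
    """Fetch the folder definition, apply `mutate` to it in place, and save it back."""
    definition = folder.get_definition()
    mutate(definition)
    folder.set_definition(definition)
    return definition


# Outside DSS, a handle might be obtained like this (all values are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com:11200", "my_api_key")
#   folder = client.get_project("MYPROJECT").get_managed_folder("folder_id")
#   update_definition(folder, lambda d: d.update({"description": "Raw uploads"}))
```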
-
list_contents
()¶ Get the list of files in the managed folder
- Returns:
the list of files, as a JSON object
-
get_file
(path)¶ Get a file from the managed folder
- Returns:
the file’s content, as a stream
-
delete_file
(path)¶ Delete a file from the managed folder
-
put_file
(path, f)¶ Upload the file to the managed folder
- Args:
path: the path of the file
f: the file contents, as a stream
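A typical use of put_file is sending a single local file to the managed folder. The sketch below opens the file in binary mode and defaults the target path to the local file name; upload_local_file is a hypothetical helper name.

```python
import os


def upload_local_file(folder, local_path, target_path=None):
    """Upload one local file to the managed folder and return the path used."""
    if target_path is None:
        # Default to the bare file name of the local file.
        target_path = os.path.basename(local_path)
    with open(local_path, "rb") as f:
        folder.put_file(target_path, f)
    return target_path


# With a DSSManagedFolder handle named `folder` (the paths are placeholders):
#   upload_local_file(folder, "/data/exports/report.csv")
```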
-
upload_folder
(path, folder)¶ Upload the content of a folder at path in the managed folder.
Note: upload_folder(“target”, “source”) will result in “target” containing the content of “source”, not in “target” containing “source”.
- Parameters
path (str) – the destination path of the folder in the managed folder (POSIX)
folder (str) – path (absolute or relative) of the source folder to upload
-
compute_metrics
(metric_ids=None, probes=None)¶ Compute metrics on this managed folder. If the metrics are not specified, the metrics set up on the managed folder are used.
-
get_last_metric_values
()¶ Get the last values of the metrics on this managed folder
- Returns:
a list of metric objects and their value
-
get_metric_history
(metric)¶ Get the history of the values of the metric on this managed folder
- Returns:
an object containing the values of the metric, cast to the appropriate type (double, boolean,…)
-
get_zone
()¶ Gets the flow zone of this managed folder
- Return type
dataikuapi.dss.flow.DSSFlowZone
-
move_to_zone
(zone)¶ Moves this object to a flow zone
- Parameters
zone (object) – a
dataikuapi.dss.flow.DSSFlowZone
where to move the object
-
share_to_zone
(zone)¶ Share this object to a flow zone
- Parameters
zone (object) – a
dataikuapi.dss.flow.DSSFlowZone
where to share the object
-
unshare_from_zone
(zone)¶ Unshare this object from a flow zone
- Parameters
zone (object) – a
dataikuapi.dss.flow.DSSFlowZone
from where to unshare the object
-
get_usages
()¶ Get the recipes referencing this folder
- Returns:
a list of usages
-
get_object_discussions
()¶ Get a handle to manage discussions on the managed folder
- Returns
the handle to manage discussions
- Return type
dataikuapi.discussion.DSSObjectDiscussions
-
copy_to
(target, write_mode='OVERWRITE')¶ Copies the data of this folder to another folder
- Parameters
target (object) – a
dataikuapi.dss.managedfolder.DSSManagedFolder
representing the target of this copy
- Returns
a DSSFuture representing the operation
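Because copy_to returns a DSSFuture rather than completing immediately, a caller usually waits on it before using the target folder. The sketch below assumes the future exposes wait_for_result(), as DSSFuture objects in dataikuapi generally do; copy_folder_and_wait is a hypothetical helper name.

```python
def copy_folder_and_wait(source, target, write_mode="OVERWRITE"):
    """Copy the content of `source` into `target` and block until the copy finishes."""
    future = source.copy_to(target, write_mode=write_mode)
    # Assumption: the returned DSSFuture provides wait_for_result().
    return future.wait_for_result()


# With two DSSManagedFolder handles (the identifiers are placeholders):
#   src = project.get_managed_folder("source_id")
#   dst = project.get_managed_folder("target_id")
#   copy_folder_and_wait(src, dst)
```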
-
property