Managed folders
Note
There are two main classes related to managed folder handling in Dataiku’s Python APIs:
- dataiku.Folder in the dataiku package. It was initially designed for usage within DSS in recipes and Jupyter notebooks.
- dataikuapi.dss.managedfolder.DSSManagedFolder in the dataikuapi package. It was initially designed for usage outside of DSS.
Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.
For more details on the two packages, please see Python APIs.
This class lets you interact with managed folders in Python recipes and notebooks. See Managed folders for more information.
-
class
dataiku.
Folder
(lookup, project_key=None, ignore_flow=False)¶ This is a handle to interact with a managed folder.
Note: this class is also available as
dataiku.Folder
-
get_info
(sensitive_info=False)¶ Get information about the location and settings of this managed folder
Return type: dict
-
get_partition_info
(partition)¶ Get information about the partitions of this managed folder
Return type: dict
-
get_path
()¶ Gets the filesystem path of this managed folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.
For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.
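As an illustration of what get_path makes possible, the sketch below walks the local directory of a folder with the standard library. The helper name local_file_sizes is hypothetical, and the code assumes the folder is stored on the local filesystem of the DSS server, as required above.

```python
import os


def local_file_sizes(folder):
    """Return {relative path: size in bytes} for a locally stored folder.

    `folder` is expected to expose get_path(), as dataiku.Folder does.
    """
    root = folder.get_path()
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            sizes[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
    return sizes


# Inside DSS ("my_folder" is a hypothetical folder name):
#   import dataiku
#   print(local_file_sizes(dataiku.Folder("my_folder")))
```

For folders that are not on the local filesystem, the same information would have to come from the read/download APIs instead.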
-
is_partitioning_directory_based
()¶ Whether the partitioning of the folder is based on sub-directories
-
list_paths_in_partition
(partition='')¶ Gets the filesystem paths of the folder for the given partition (or for the entire folder)
-
list_partitions
()¶ Gets the partitions in the folder
Return type: list
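The two listing methods can be combined to get an overview of a partitioned folder. This is a minimal sketch: paths_by_partition is a hypothetical helper, and it only relies on list_partitions and list_paths_in_partition as documented above.

```python
def paths_by_partition(folder):
    """Map each partition identifier to the list of paths it contains."""
    return {
        partition: folder.list_paths_in_partition(partition)
        for partition in folder.list_partitions()
    }


# Inside DSS ("my_partitioned_folder" is a hypothetical folder name):
#   import dataiku
#   folder = dataiku.Folder("my_partitioned_folder")
#   for partition, paths in paths_by_partition(folder).items():
#       print(partition, len(paths))
```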
-
get_partition_folder
(partition)¶ Gets the filesystem path of the directory corresponding to the partition (if the partitioning is directory-based)
-
get_id
()¶
-
get_name
()¶
-
file_path
(filename)¶ Gets the filesystem path for a given file within the folder. This method can only be called for managed folders that are stored on the local filesystem of the DSS server.
For non-filesystem managed folders (HDFS, S3, …), you need to use the various read/download and write/upload APIs.
Parameters: filename (str) – Name of the file within the folder
-
read_json
(filename)¶ Reads a JSON file within the folder and returns its parsed content
Parameters: filename (str) – Path of the file within the folder Return type: list or dict: Depending on the content of the file
-
write_json
(filename, obj)¶ Writes a JSON-serializable (mostly dict or list) object as JSON to a file within the folder
Parameters: - filename (str) – Path of the target file within the folder
- obj (object) – JSON-serializable object to write (generally dict or list)
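read_json and write_json pair naturally for small state or configuration files. The sketch below reads a JSON object, merges new keys into it, and writes it back. The helper name update_json is hypothetical, and the code assumes the file already exists and contains a JSON object rather than a list.

```python
def update_json(folder, filename, updates):
    """Merge `updates` into the JSON object stored at `filename`."""
    content = folder.read_json(filename)
    if not isinstance(content, dict):
        raise TypeError("expected a JSON object in %s" % filename)
    content.update(updates)
    folder.write_json(filename, content)
    return content


# Inside DSS (folder and file names are hypothetical):
#   import dataiku
#   folder = dataiku.Folder("my_folder")
#   update_json(folder, "state.json", {"last_run": "2024-01-01"})
```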
-
clear
()¶ Removes all files from the folder
-
clear_partition
(partition)¶ Removes all files from a specific partition of the folder.
-
clear_path
(path)¶ DEPRECATED - Use delete_path instead
-
delete_path
(path)¶ Removes a file or directory from the folder
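When only some files should go, delete_path can be combined with list_paths_in_partition instead of calling clear. A minimal sketch, where delete_paths_with_suffix is a hypothetical helper:

```python
def delete_paths_with_suffix(folder, suffix):
    """Delete every file whose path ends with `suffix`; return deleted paths."""
    deleted = []
    for path in folder.list_paths_in_partition():
        if path.endswith(suffix):
            folder.delete_path(path)
            deleted.append(path)
    return deleted


# Inside DSS ("my_folder" is a hypothetical folder name):
#   import dataiku
#   delete_paths_with_suffix(dataiku.Folder("my_folder"), ".tmp")
```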
-
get_path_details
(path='/')¶ Get details about a specific path (file or directory) in the folder
Return type: dict
-
get_download_stream
(path)¶ Gets a file-like object that allows you to read a single file from this folder.
    with folder.get_download_stream("myfile") as stream:
        data = stream.readline()
        print("First line of myfile is: {}".format(data))
Return type: file-like
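Because the stream works for any storage type, it is a portable way to load a whole file. The sketch below reads a file fully and decodes it. The helper read_text is hypothetical, and it assumes the stream yields bytes and supports read() like a regular binary file object.

```python
def read_text(folder, path, encoding="utf-8"):
    """Read a whole file from the folder and return it as text."""
    with folder.get_download_stream(path) as stream:
        # Assumption: the stream returns bytes, so decode explicitly.
        return stream.read().decode(encoding)


# Inside DSS (folder and file names are hypothetical):
#   import dataiku
#   text = read_text(dataiku.Folder("my_folder"), "notes.txt")
```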
-
upload_stream
(path, f)¶ Uploads the content of a file-like object to a specific path in the managed folder. If the file already exists, it will be replaced.
    # This copies a local file to the managed folder
    # (opened in binary mode so the bytes are uploaded unchanged)
    with open("local_file_to_upload", "rb") as f:
        folder.upload_stream("name_of_file_in_folder", f)
Parameters: - path (str) – Target path of the file to write in the managed folder
- f – file-like object open for reading
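upload_stream accepts any file-like object open for reading, so data that only exists in memory can be wrapped in an io.BytesIO buffer. A minimal sketch, where upload_text is a hypothetical helper:

```python
import io


def upload_text(folder, path, text, encoding="utf-8"):
    """Upload a text string to `path`; return the number of bytes sent."""
    data = text.encode(encoding)
    folder.upload_stream(path, io.BytesIO(data))
    return len(data)


# Inside DSS (folder and file names are hypothetical):
#   import dataiku
#   upload_text(dataiku.Folder("my_folder"), "report.txt", "all good\n")
```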
-
upload_file
(path, file_path)¶ Uploads a local file to a specific path in the managed folder. If the file already exists, it will be replaced.
Parameters: - path (str) – Target path of the file to write in the managed folder
- file_path – Absolute path to a local file
-
upload_data
(path, data)¶ Uploads binary data to a specific path in the managed folder. If the file already exists, it will be replaced.
Parameters: - path (str) – Target path of the file to write in the managed folder
- data – str or unicode data to upload
-
get_writer
(path)¶ Get a writer object to write incrementally to a specific path in the managed folder. If the file already exists, it will be replaced.
Parameters: path (str) – Target path of the file to write in the managed folder
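A writer is convenient when the content is produced piece by piece. The sketch below writes an iterable of lines. The helper write_lines is hypothetical, and it assumes that the returned writer can be used as a context manager and exposes a write method accepting bytes, which is not stated in the description above.

```python
def write_lines(folder, path, lines, encoding="utf-8"):
    """Write each item of `lines` as one line of the target file."""
    count = 0
    # Assumption: the writer supports `with` and write(bytes).
    with folder.get_writer(path) as writer:
        for line in lines:
            writer.write((line + "\n").encode(encoding))
            count += 1
    return count


# Inside DSS (folder and file names are hypothetical):
#   import dataiku
#   write_lines(dataiku.Folder("my_folder"), "out.txt", ["a", "b"])
```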
-
get_last_metric_values
(partition='')¶ Get the set of last values of the metrics on this folder, as a
dataiku.ComputedMetrics
object
-
get_metric_history
(metric_lookup, partition='')¶ Get the set of all values a given metric took on this folder
Parameters: - metric_lookup – metric name or unique identifier
- partition – optionally, the partition for which the values are to be fetched
-
save_external_metric_values
(values_dict)¶ Save metrics on this folder. The metrics are saved with the type “external”
Parameters: values_dict – the values to save, as a dict. The keys of the dict are used as metric names
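As an example of the values_dict format, the sketch below saves the number of files in the folder as an external metric. The helper save_file_count_metric and the metric name file_count are hypothetical.

```python
def save_file_count_metric(folder, metric_name="file_count"):
    """Save the number of files in the folder as an external metric."""
    count = len(folder.list_paths_in_partition())
    # The dict key becomes the metric name.
    folder.save_external_metric_values({metric_name: count})
    return count


# Inside DSS ("my_folder" is a hypothetical folder name):
#   import dataiku
#   save_file_count_metric(dataiku.Folder("my_folder"))
```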
-
save_external_check_values
(values_dict)¶ Save checks on this folder. The checks are saved with the type “external”
Parameters: values_dict – the values to save, as a dict. The keys of the dict are used as check names
-
dataikuapi version
Use this class preferably outside of DSS
-
class
dataikuapi.dss.managedfolder.
DSSManagedFolder
(client, project_key, odb_id)¶ A managed folder on the DSS instance
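Handles are usually obtained from a project rather than by calling the constructor directly. The sketch below shows one way to do it. The helper get_managed_folder is hypothetical, and the host, API key, project key, and folder id in the usage comment are placeholders.

```python
def get_managed_folder(client, project_key, folder_id):
    """Return a DSSManagedFolder handle from a DSSClient-like object."""
    project = client.get_project(project_key)
    return project.get_managed_folder(folder_id)


# Outside DSS (all values below are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com:11200", "my-api-key")
#   folder = get_managed_folder(client, "MYPROJECT", "abcd1234")
```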
-
id
¶
-
delete
()¶ Delete the managed folder
-
get_definition
()¶ Get the definition of the managed folder
- Returns:
- the definition, as a JSON object
-
set_definition
(definition)¶ Set the definition of the managed folder
- Args:
- definition: the definition, as a JSON object. You should only set a definition object that has been retrieved using the get_definition call.
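The get/modify/set pattern recommended above can be sketched as follows. The helper update_definition is hypothetical, and the key used in the usage comment is an assumption about the definition's content, so check the object returned by get_definition before relying on it.

```python
def update_definition(folder, changes):
    """Fetch the definition, apply top-level `changes`, and save it back."""
    definition = folder.get_definition()
    definition.update(changes)
    folder.set_definition(definition)
    return definition


# Outside DSS, with `folder` a DSSManagedFolder handle
# ("description" is an assumed key, not confirmed by this page):
#   update_definition(folder, {"description": "Raw uploads"})
```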
-
list_contents
()¶ Get the list of files in the managed folder
- Returns:
- the list of files, as a JSON object
-
get_file
(path)¶ Get a file from the managed folder
- Returns:
- the file’s content, as a stream
-
delete_file
(path)¶ Delete a file from the managed folder
-
put_file
(path, f)¶ Upload the file to the managed folder
- Args:
- path: the path of the file
- f: the file contents, as a stream
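A typical use of put_file is sending a local file to the folder. The sketch below opens the file in binary mode and passes the stream through. The helper put_local_file and the paths in the usage comment are hypothetical.

```python
def put_local_file(folder, local_path, target_path):
    """Upload the local file at `local_path` to `target_path` in the folder."""
    with open(local_path, "rb") as f:
        folder.put_file(target_path, f)


# Outside DSS, with `folder` a DSSManagedFolder handle (paths are placeholders):
#   put_local_file(folder, "/tmp/report.csv", "reports/report.csv")
```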
-
compute_metrics
(metric_ids=None, probes=None)¶ Compute metrics on this managed folder. If the metrics are not specified, the metrics setup on the managed folder are used.
-
get_last_metric_values
()¶ Get the last values of the metrics on this managed folder
- Returns:
- a list of metric objects and their value
-
get_metric_history
(metric)¶ Get the history of the values of the metric on this managed folder
- Returns:
- an object containing the values of the metric, cast to the appropriate type (double, boolean,…)
-
get_zone
()¶ Gets the flow zone of this managed folder
Return type: dataikuapi.dss.flow.DSSFlowZone
-
move_to_zone
(zone)¶ Moves this object to a flow zone
Parameters: zone (object) – a dataikuapi.dss.flow.DSSFlowZone
where to move the object
-
get_usages
()¶ Get the recipes referencing this folder
- Returns:
- a list of usages
-
get_object_discussions
()¶ Get a handle to manage discussions on the managed folder
Returns: the handle to manage discussions Return type: dataikuapi.discussion.DSSObjectDiscussions
-
copy_to
(target, write_mode='OVERWRITE')¶ Copies the data of this folder to another folder
Parameters: target (Folder) – a dataikuapi.dss.managedfolder.DSSManagedFolder representing the target of this copy
Returns: a DSSFuture representing the operation
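Since copy_to returns a future, the copy is only known to be finished once the future has been waited on. A minimal sketch, assuming the returned DSSFuture exposes wait_for_result as in the dataikuapi package. The helper copy_folder is hypothetical.

```python
def copy_folder(source, target, write_mode="OVERWRITE"):
    """Copy `source` into `target` and block until the copy has finished."""
    future = source.copy_to(target, write_mode=write_mode)
    return future.wait_for_result()


# Outside DSS, with `project` a DSSProject handle (folder ids are placeholders):
#   source = project.get_managed_folder("abcd1234")
#   target = project.get_managed_folder("efgh5678")
#   copy_folder(source, target)
```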
-