Projects¶
Basic operations¶
The list of projects in the DSS instance can be retrieved with the list_project_keys method.
client = DSSClient(host, apiKey)
dss_projects = client.list_project_keys()
print(dss_projects)
outputs
['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS']
Projects can be created:
new_project = client.create_project('TEST_PROJECT', 'test project', 'tester', description='a simple description')
print(client.list_project_keys())
outputs
['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS', 'TEST_PROJECT']
Or an existing project can be used for later manipulation:
project = client.get_project("PROJECT_KEY")
Creating, listing and getting handles to project items¶
Through various methods on the DSSProject class, you can:
Create most types of project items (datasets, recipes, managed folders, …)
List project items
Get structured handles to interact with each type of project item
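For example, a minimal sketch (the project key and dataset name are hypothetical) that lists the datasets of a project and then gets a structured handle to one of them:
project = client.get_project("MYPROJECT")
# list datasets as lightweight list items
for item in project.list_datasets():
    print(item.name)
# get a structured handle to a dataset and read its settings
dataset = project.get_dataset("mydataset")
settings = dataset.get_settings()
print(settings.get_raw()["type"])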
Modifying project settings¶
Two parts of the project’s settings can be modified directly: the metadata and the permissions. In both cases, it is advised to first retrieve the current settings state with the get_metadata and get_permissions calls, modify the returned object, and then set it back on the DSS instance.
project = client.get_project("PROJECT_KEY")
project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists','readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)
Available permissions to be set:
{
'group': 'data_team',
'admin': False,
'exportDatasetsData': True,
'manageAdditionalDashboardUsers': False,
'manageDashboardAuthorizations': False,
'manageExposedElements': False,
'moderateDashboards': False,
'readDashboards': True,
'readProjectContent': True,
'runScenarios': False,
'writeDashboards': False,
'writeProjectContent': False,
'shareToWorkspaces': False
}
Deleting¶
Projects can also be deleted:
project = client.get_project('TEST_PROJECT')
project.delete()
Exporting¶
Project export is available through the python API in two forms: either as a stream, or exported directly to a file. The data is sent zipped.
project = client.get_project('TEST_PROJECT')
project.export_to_file('exported_project.zip')
with project.get_export_stream() as s:
    ...
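For instance, a minimal sketch (the output file name is arbitrary) that copies the export stream to a local zip file:
import shutil

project = client.get_project('TEST_PROJECT')
with project.get_export_stream() as s, open('exported_project.zip', 'wb') as f:
    # the stream is a file-like object, so it can be copied directly
    shutil.copyfileobj(s, f)
This is essentially what export_to_file does in a single call.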
Importing¶
Projects can be imported directly from a zip file:
with open("myproject.zip", "rb") as f:
    client.prepare_project_import(f).execute()
Duplicating¶
Projects can be duplicated:
project = client.get_project('TEST_PROJECT')
project.duplicate('COPY_TEST_PROJECT', 'Copy of the Test Project')
Reference documentation¶
dataikuapi package API¶
-
class
dataikuapi.dss.project.
DSSProject
(client, project_key)¶ A handle to interact with a project on the DSS instance.
Important
Do not create this class directly, instead use
dataikuapi.DSSClient.get_project()
-
get_summary
()¶ Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS, you must use the other more specific methods of this
dataikuapi.dss.project.DSSProject
object
- Returns
a dict containing a summary of the project. Each dict contains at least a projectKey field
- Return type
dict
-
get_project_folder
()¶ Get the folder containing this project
- Return type
-
move_to_folder
(folder)¶ Moves this project to a project folder
- Parameters
folder (
dataikuapi.dss.projectfolder.DSSProjectFolder
) – destination folder
-
delete
(clear_managed_datasets=False, clear_output_managed_folders=False, clear_job_and_scenario_logs=True, **kwargs)¶ Delete the project
Attention
This call requires an API key with admin rights
- Parameters
clear_managed_datasets (bool) – Should the data of managed datasets be cleared (defaults to False)
clear_output_managed_folders (bool) – Should the data of managed folders used as outputs of recipes be cleared (defaults to False)
clear_job_and_scenario_logs (bool) – Should the job and scenario logs be cleared (defaults to True)
-
get_export_stream
(options=None)¶ Return a stream of the exported project
Warning
You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.
- Parameters
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
- Returns
a stream of the export archive
- Return type
file-like object
-
export_to_file
(path, options=None)¶ Export the project to a file
- Parameters
path (str) – the path of the file in which the exported project should be saved
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportModelEvaluationStores (boolean): Exports the evaluation stores (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
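Usage example, a sketch exporting a project together with the data of its uploaded and managed filesystem datasets (option names are taken from the list above):
project = client.get_project('TEST_PROJECT')
project.export_to_file('exported_project.zip', options={
    'exportUploads': True,     # include data of uploaded datasets
    'exportManagedFS': True,   # include data of managed filesystem datasets
})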
-
duplicate
(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)¶ Duplicate the project
- Parameters
target_project_key (str) – The key of the new project
target_project_name (str) – The name of the new project
duplication_mode (str) – can be one of the following values: MINIMAL, SHARING, FULL, NONE (defaults to MINIMAL)
export_analysis_models (bool) – (defaults to True)
export_saved_models (bool) – (defaults to True)
export_git_repository (bool) – (defaults to True)
export_insights_data (bool) – (defaults to True)
remapping (dict) – dict of connections to be remapped for the new project (defaults to {})
target_project_folder (A
dataikuapi.dss.projectfolder.DSSProjectFolder
) – the project folder where to put the duplicated project (defaults to None)
- Returns
A dict containing the original and duplicated project’s keys
- Return type
dict
-
get_metadata
()¶ Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.
Note
For more information on available metadata, please see https://doc.dataiku.com/dss/api/6.0/rest/
- Returns
the project metadata.
- Return type
dict
-
set_metadata
(metadata)¶ Set the metadata on this project.
Usage example:
project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
- Parameters
metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the
get_metadata()
call.
-
get_settings
()¶ Gets the settings of this project. This does not contain permissions. See
get_permissions()
- Returns
a handle to read, modify and save the settings
- Return type
dataikuapi.dss.project.DSSProjectSettings
-
get_permissions
()¶ Get the permissions attached to this project
- Returns
A dict containing the owner and the permissions, as a list of pairs of group name and permission type
- Return type
dict
-
set_permissions
(permissions)¶ Sets the permissions on this project
Usage example:
project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists', 'readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)
- Parameters
permissions (dict) – a permissions object with the same structure as the one returned by
get_permissions()
call
-
get_interest
()¶ Get the interest of this project. The interest means the number of watchers and the number of stars.
- Returns
a dict object containing the interest of the project with two fields:
starCount: number of stars for this project
watchCount: number of users watching this project
- Return type
dict
-
get_timeline
(item_count=100)¶ Get the timeline of this project. The timeline consists of information about the creation of this project (by whom, and when), the last modification of this project (by whom and when), a list of contributors, and a list of modifications. This list of modifications contains a maximum of item_count elements (defaults to 100). If item_count is greater than the real number of modifications, item_count is adjusted.
- Parameters
item_count (int) – maximum number of modifications to retrieve in the items list
- Returns
a timeline where the top-level fields are :
allContributors: all contributors who have been involved in this project
items: a history of the modifications of the project
createdBy: who created this project
createdOn: when the project was created
lastModifiedBy: who modified this project for the last time
lastModifiedOn: when this modification took place
- Return type
dict
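Usage example, a short sketch reading a few of the documented timeline fields:
timeline = project.get_timeline(item_count=10)
print("Created by %s on %s" % (timeline["createdBy"], timeline["createdOn"]))
print("%d modifications retrieved" % len(timeline["items"]))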
-
list_datasets
(as_type='listitems')¶ List the datasets in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the datasets. If “as_type” is “listitems”, each one as a
dataikuapi.dss.dataset.DSSDatasetListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.dataset.DSSDataset
- Return type
list
-
get_dataset
(dataset_name)¶ Get a handle to interact with a specific dataset
- Parameters
dataset_name (str) – the name of the desired dataset
- Returns
A dataset handle
- Return type
-
create_dataset
(dataset_name, type, params=None, formatType=None, formatParams=None)¶ Create a new dataset in the project, and return a handle to interact with it.
The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use
get_dataset()
to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset()
call. Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
params (dict) – the parameters for the type, as a python dict (defaults to {})
formatType (str) – an optional format to create the dataset with (only for file-oriented datasets)
formatParams (dict) – the parameters to the format, as a python dict (only for file-oriented datasets, default to {})
- Returns
A dataset handle
- Return type
-
create_upload_dataset
(dataset_name, connection=None)¶ Create a new dataset of type ‘UploadedFiles’ in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the upload connection (defaults to None)
- Returns
A dataset handle
- Return type
-
create_filesystem_dataset
(dataset_name, connection, path_in_connection)¶ Create a new filesystem dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
- Returns
A dataset handle
- Return type
-
create_s3_dataset
(dataset_name, connection, path_in_connection, bucket=None)¶ Creates a new external S3 dataset in the project and returns a
dataikuapi.dss.dataset.DSSDataset
to interact with it. The created dataset does not have its format and schema initialized; it is recommended to use
autodetect_settings()
on the returned object
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
bucket (str) – the name of the s3 bucket (defaults to None)
- Returns
A dataset handle
- Return type
-
create_fslike_dataset
(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)¶ Create a new file-based dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
dataset_type (str) – the type of the dataset
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
extra_params (dict) – a python dict of extra parameters (defaults to None)
- Returns
A dataset handle
- Return type
-
create_sql_table_dataset
(dataset_name, type, connection, table, schema)¶ Create a new SQL table dataset in the project, and return a handle to interact with it.
- Parameters
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
connection (str) – the name of the connection
table (str) – the name of the table in the connection
schema (str) – the schema of the table
- Returns
A dataset handle
- Return type
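Usage example, a minimal sketch assuming a PostgreSQL connection named "my_sql_connection" and an existing "customers" table in the "public" schema:
dataset = project.create_sql_table_dataset(
    "customers",           # name of the new DSS dataset
    "PostgreSQL",          # dataset type
    "my_sql_connection",   # connection name (hypothetical)
    "customers",           # table name in the connection
    "public",              # schema containing the table
)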
-
new_managed_dataset_creation_helper
(dataset_name)¶ Caution
Deprecated. Please use
new_managed_dataset()
-
new_managed_dataset
(dataset_name)¶ Initializes the creation of a new managed dataset. Returns a
dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper
or one of its subclasses to complete the creation of the managed dataset. Usage example:
builder = project.new_managed_dataset("my_dataset")
builder.with_store_into("target_connection")
dataset = builder.create()
- Parameters
dataset_name (str) – Name of the dataset to create
- Returns
An object to create the managed dataset
- Return type
-
list_streaming_endpoints
(as_type='listitems')¶ List the streaming endpoints in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the streaming endpoints. If “as_type” is “listitems”, each one as a
dataikuapi.dss.streaming_endpoint.DSSStreamingEndpointListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint
- Return type
list
-
get_streaming_endpoint
(streaming_endpoint_name)¶ Get a handle to interact with a specific streaming endpoint
- Parameters
streaming_endpoint_name (str) – the name of the desired streaming endpoint
- Returns
A streaming endpoint handle
- Return type
-
create_streaming_endpoint
(streaming_endpoint_name, type, params=None)¶ Create a new streaming endpoint in the project, and return a handle to interact with it.
The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use
get_streaming_endpoint()
to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the create_streaming_endpoint()
call. Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
type (str) – the type of the streaming endpoint
params (dict) – the parameters for the type, as a python dict (defaults to {})
- Returns
A streaming endpoint handle
- Return type
-
create_kafka_streaming_endpoint
(streaming_endpoint_name, connection=None, topic=None)¶ Create a new kafka streaming endpoint in the project, and return a handle to interact with it.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
connection (str) – the name of the kafka connection (defaults to None)
topic (str) – the name of the kafka topic (defaults to None)
- Returns
A streaming endpoint handle
- Return type
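Usage example, a minimal sketch assuming a Kafka connection named "my_kafka" and a topic named "events":
endpoint = project.create_kafka_streaming_endpoint(
    "events_stream",         # name of the new streaming endpoint
    connection="my_kafka",   # Kafka connection (hypothetical)
    topic="events",          # Kafka topic (hypothetical)
)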
-
create_httpsse_streaming_endpoint
(streaming_endpoint_name, url=None)¶ Create a new https streaming endpoint in the project, and return a handle to interact with it.
- Parameters
streaming_endpoint_name (str) – the name for the new streaming endpoint
url (str) – the url of the endpoint (defaults to None)
- Returns
A streaming endpoint handle
- Return type
-
new_managed_streaming_endpoint
(streaming_endpoint_name, streaming_endpoint_type=None)¶ Initializes the creation of a new streaming endpoint. Returns a
dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper
to complete the creation of the streaming endpoint
- Parameters
streaming_endpoint_name (str) – Name of the new streaming endpoint - must be unique in the project
streaming_endpoint_type (str) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)
- Returns
An object to create the streaming endpoint
- Return type
-
create_prediction_ml_task
(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)¶ Creates a new prediction task in a new visual analysis lab for a dataset.
- Parameters
input_dataset (str) – the dataset to use for training/testing the model
target_variable (str) – the variable to predict
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE (defaults to DEFAULT)
prediction_type (str) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS (defaults to None)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
A ML task handle of type ‘PREDICTION’
- Return type
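Usage example, a sketch of a typical workflow assuming a "customers" dataset with a "churn" column to predict:
mltask = project.create_prediction_ml_task(
    input_dataset="customers",               # hypothetical dataset
    target_variable="churn",                 # hypothetical target column
    prediction_type="BINARY_CLASSIFICATION",
)
# wait_guess_complete defaults to True, so the task can be trained right away
mltask.train()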
-
create_clustering_ml_task
(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)¶ Creates a new clustering task in a new visual analysis lab for a dataset.
The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.
You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Parameters
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION (defaults to KMEANS)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
A ML task handle of type ‘CLUSTERING’
- Return type
-
create_timeseries_forecasting_ml_task
(input_dataset, target_variable, time_variable, timeseries_identifiers=None, guess_policy='TIMESERIES_DEFAULT', wait_guess_complete=True)¶ Creates a new time series forecasting task in a new visual analysis lab for a dataset.
- Parameters
input_dataset (string) – The dataset to use for training/testing the model
target_variable (string) – The variable to forecast
time_variable (string) – Column to be used as time variable. Should be a Date (parsed) column.
timeseries_identifiers (list) – List of columns to be used as time series identifiers (when the dataset has multiple series)
guess_policy (string) – Policy to use for setting the default parameters. Valid values are: TIMESERIES_DEFAULT, TIMESERIES_STATISTICAL, and TIMESERIES_DEEP_LEARNING
wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns
a dataiku.dss.ml.DSSMLTask
-
list_ml_tasks
()¶ List the ML tasks in this project
- Returns
the list of the ML tasks summaries, each one as a python dict
- Return type
list
-
get_ml_task
(analysis_id, mltask_id)¶ Get a handle to interact with a specific ML task
- Parameters
analysis_id (str) – the identifier of the visual analysis containing the desired ML task
mltask_id (str) – the identifier of the desired ML task
- Returns
A ML task handle
- Return type
-
list_mltask_queues
()¶ List non-empty ML task queues in this project
- Returns
an iterable listing of MLTask queues (each a dict)
- Return type
dataikuapi.dss.ml.DSSMLTaskQueues
-
create_analysis
(input_dataset)¶ Creates a new visual analysis lab for a dataset.
- Parameters
input_dataset (str) – the dataset to use for the analysis
- Returns
A visual analysis handle
- Return type
dataikuapi.dss.analysis.DSSAnalysis
-
list_analyses
()¶ List the visual analyses in this project
- Returns
the list of the visual analyses summaries, each one as a python dict
- Return type
list
-
get_analysis
(analysis_id)¶ Get a handle to interact with a specific visual analysis
- Parameters
analysis_id (str) – the identifier of the desired visual analysis
- Returns
A visual analysis handle
- Return type
dataikuapi.dss.analysis.DSSAnalysis
-
list_saved_models
()¶ List the saved models in this project
- Returns
the list of the saved models, each one as a python dict
- Return type
list
-
get_saved_model
(sm_id)¶ Get a handle to interact with a specific saved model
- Parameters
sm_id (str) – the identifier of the desired saved model
- Returns
A saved model handle
- Return type
-
create_mlflow_pyfunc_model
(name, prediction_type=None)¶ Creates a new external saved model for storing and managing MLFlow models
- Parameters
name (str) – Human readable name for the new saved model in the flow
prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION
- Returns
The created saved model handle
- Return type
-
create_proxy_model
(name, prediction_type=None)¶ EXPERIMENTAL. Creates a new external saved model that can contain proxy models as versions.
This is an experimental API, subject to change.
- Parameters
name (str) – Human readable name for the new saved model in the flow
prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION
-
list_managed_folders
()¶ List the managed folders in this project
- Returns
the list of the managed folders, each one as a python dict
- Return type
list
-
get_managed_folder
(odb_id)¶ Get a handle to interact with a specific managed folder
- Parameters
odb_id (str) – the identifier of the desired managed folder
- Returns
A managed folder handle
- Return type
-
create_managed_folder
(name, folder_type=None, connection_name='filesystem_folders')¶ Create a new managed folder in the project, and return a handle to interact with it
- Parameters
name (str) – the name of the managed folder
folder_type (str) – type of storage (defaults to None)
connection_name (str) – the connection name (defaults to filesystem_folders)
- Returns
A managed folder handle
- Return type
-
list_model_evaluation_stores
()¶ List the model evaluation stores in this project.
- Returns
The list of the model evaluation stores
- Return type
list of
dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore
-
get_model_evaluation_store
(mes_id)¶ Get a handle to interact with a specific model evaluation store
- Parameters
mes_id (str) – the id of the desired model evaluation store
- Returns
A model evaluation store handle
- Return type
-
create_model_evaluation_store
(name)¶ Create a new model evaluation store in the project, and return a handle to interact with it.
- Parameters
name (str) – the name for the new model evaluation store
- Returns
A model evaluation store handle
- Return type
-
list_model_comparisons
()¶ List the model comparisons in this project.
- Returns
The list of the model comparisons
- Return type
list
-
get_model_comparison
(mec_id)¶ Get a handle to interact with a specific model comparison
- Parameters
mec_id (str) – the id of the desired model comparison
- Returns
A model comparison handle
- Return type
dataikuapi.dss.modelcomparison.DSSModelComparison
-
create_model_comparison
(name, prediction_type)¶ Create a new model comparison in the project, and return a handle to interact with it.
- Parameters
name (str) – the name for the new model comparison
prediction_type (str) – one of BINARY_CLASSIFICATION, REGRESSION, MULTICLASS, and TIMESERIES_FORECAST
- Returns
A new model comparison handle
- Return type
dataikuapi.dss.modelcomparison.DSSModelComparison
-
list_jobs
()¶ List the jobs in this project
- Returns
a list of the jobs, each one as a python dict, containing both the definition and the state
- Return type
list
-
get_job
(id)¶ Get a handle to interact with a specific job
- Parameters
id (str) – the id of the desired job
- Returns
A job handle
- Return type
-
start_job
(definition)¶ Create a new job, and return a handle to interact with it
- Parameters
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
- Returns
A job handle
- Return type
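Usage example, a sketch of a definition dict building a single dataset; the "id"/"type" keys of the output items are assumed from typical usage and the dataset name is hypothetical:
job = project.start_job({
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"type": "DATASET", "id": "mydataset"}],
})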
-
start_job_and_wait
(definition, no_fail=False)¶ Starts a new job and waits for it to complete.
- Parameters
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
no_fail (bool) – if true, the function won’t fail even if the job fails or aborts (defaults to False)
- Returns
the final status of the job
- Return type
str
-
new_job
(job_type='NON_RECURSIVE_FORCED_BUILD')¶ Create a job to be run. You need to add outputs to the job (i.e. what you want to build) before running it.
job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
- Parameters
job_type (str) – the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) (defaults to NON_RECURSIVE_FORCED_BUILD)
- Returns
A job handle
- Return type
-
new_job_definition_builder
(job_type='NON_RECURSIVE_FORCED_BUILD')¶ Caution
Deprecated. Please use
new_job()
-
list_jupyter_notebooks
(active=False, as_type='object')¶ List the jupyter notebooks of a project.
- Parameters
active (bool) – if True, only return currently running jupyter notebooks (defaults to False).
as_type (str) – How to return the list. Supported values are “listitems” and “object” (defaults to object).
- Returns
The list of the notebooks. If “as_type” is “listitems”, each one as a
dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
, if “as_type” is “objects”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
- Return type
list of
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
or list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
-
get_jupyter_notebook
(notebook_name)¶ Get a handle to interact with a specific jupyter notebook
- Parameters
notebook_name (str) – The name of the jupyter notebook to retrieve
- Returns
A handle to interact with this jupyter notebook
- Return type
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
-
create_jupyter_notebook
(notebook_name, notebook_content)¶ Create a new jupyter notebook and get a handle to interact with it
- Parameters
notebook_name (str) – the name of the notebook to create
notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use
DSSJupyterNotebook.get_content()
on a similar existing DSSJupyterNotebook object in order to get a sample definition object.
- Returns
A handle to interact with the newly created jupyter notebook
- Return type
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
-
list_continuous_activities
(as_objects=True)¶ List the continuous activities in this project
- Parameters
as_objects (bool) – if True, returns a list of
dataikuapi.dss.continuousactivity.DSSContinuousActivity
objects, else returns a list of python dicts (defaults to True)
- Returns
a list of the continuous activities, each one as a python dict, containing both the definition and the state
- Return type
list
-
get_continuous_activity
(recipe_id)¶ Get a handle to interact with a specific continuous activity
- Parameters
recipe_id (str) – the identifier of the recipe controlled by the continuous activity
- Returns
A continuous activity handle
- Return type
-
get_variables
()¶ Gets the variables of this project.
- Returns
a dictionary containing two dictionaries : “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
- Return type
dict
-
set_variables
(obj)¶ Sets the variables of this project.
Warning
If executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe. Use the internal API dataiku.get_custom_variables() instead if this behavior is needed
- Parameters
obj (dict) – must be a modified version of the object returned by get_variables
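Usage example, the usual read-modify-write pattern for project variables (the variable name is arbitrary):
variables = project.get_variables()
variables["standard"]["my_var"] = "some value"
project.set_variables(variables)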
-
update_variables
(variables, type='standard')¶ Updates a set of variables for this project
- Parameters
variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts
type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project (defaults to standard)
-
list_api_services
()¶ List the API services in this project
- Returns
the list of API services, each one as a python dict
- Return type
list
-
create_api_service
(service_id)¶ Create a new API service, and returns a handle to interact with it. The newly-created service does not have any endpoint.
- Parameters
service_id (str) – the ID of the API service to create
- Returns
An API Service handle
- Return type
-
get_api_service
(service_id)¶ Get a handle to interact with a specific API Service from the API Designer
- Parameters
service_id (str) – The identifier of the API Designer API Service to retrieve
- Returns
A handle to interact with this API Service
- Return type
-
list_exported_bundles
()¶ List all the bundles created in this project on the Design Node.
- Returns
A dictionary of all bundles for a project on the Design node.
- Return type
dict
-
export_bundle
(bundle_id)¶ Creates a new project bundle on the Design node
- Parameters
bundle_id (str) – bundle id tag
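Usage example, a sketch of the Design-node side of a bundle workflow (the bundle id is arbitrary): create the bundle, then publish it to the Project Deployer.
project.export_bundle("v1")
project.publish_bundle("v1")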
-
get_exported_bundle_archive_stream
(bundle_id)¶ Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream.
Warning
The stream must be closed after use. Use a with statement to handle closing the stream at the end of the block by default. For example:
with project.get_exported_bundle_archive_stream('v1') as fp:
    ...  # use fp

# or explicitly close the stream after use
fp = project.get_exported_bundle_archive_stream('v1')
# use fp, then close
fp.close()
- Parameters
bundle_id (str) – the identifier of the bundle
-
download_exported_bundle_archive_to_file
(bundle_id, path)¶ Download a bundle archive that can be deployed in a DSS automation Node into the given output file.
- Parameters
bundle_id (str) – the identifier of the bundle
path (str) – the path of the file in which the bundle archive should be saved; if “-“, will write to /dev/stdout
-
publish_bundle
(bundle_id, published_project_key=None)¶ Publish a bundle on the Project Deployer.
- Parameters
bundle_id (str) – The identifier of the bundle
published_project_key (str) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current
DSSProject
is used.
- Returns
a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.
- Return type
dict
-
list_imported_bundles
()¶ List all the bundles imported for this project, on the Automation node.
- Returns
a dict containing bundle imports for a project, on the Automation node.
- Return type
dict
-
import_bundle_from_archive
(archive_path)¶ Imports a bundle from a zip archive path on the Automation node.
- Parameters
archive_path (str) – A full path to a zip archive, for example /home/dataiku/my-bundle-v1.zip
-
import_bundle_from_stream
(fp)¶ Imports a bundle from a file stream, on the Automation node.
Usage example:
project = client.get_project('MY_PROJECT')
with open('/home/dataiku/my-bundle-v1.zip', 'rb') as f:
    project.import_bundle_from_stream(f)
- Parameters
fp (file-like) – file handler.
-
activate_bundle
(bundle_id, scenarios_to_enable=None)¶ Activates a bundle in this project.
- Parameters
bundle_id (str) – The ID of the bundle to activate
scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False (defaults to {}).
- Returns
A report containing any error or warning messages that occurred during bundle activation
- Return type
dict
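Usage example, a sketch of the Automation-node side: import a bundle from an archive, then activate it (the archive path and bundle id are hypothetical).
project.import_bundle_from_archive("/home/dataiku/my-bundle-v1.zip")
report = project.activate_bundle("v1")
print(report)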
-
preload_bundle
(bundle_id)¶ Preloads a bundle that has been imported on the Automation node
- Parameters
bundle_id (str) – the bundle_id for an existing imported bundle
-
list_scenarios
(as_type='listitems')¶ List the scenarios in this project.
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the scenarios. If “as_type” is “listitems”, each one as a
dataikuapi.dss.scenario.DSSScenarioListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.scenario.DSSScenario
- Return type
list
-
get_scenario
(scenario_id)¶ Get a handle to interact with a specific scenario
- Parameters
scenario_id (str) – the ID of the desired scenario
- Returns
A scenario handle
- Return type
-
create_scenario
(scenario_name, type, definition=None)¶ Create a new scenario in the project, and return a handle to interact with it
- Parameters
scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)
type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’
definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})
- Returns
a
dataikuapi.dss.scenario.DSSScenario
handle to interact with the newly-created scenario
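Usage example, a minimal sketch creating an empty step-based scenario (the name is arbitrary):
scenario = project.create_scenario("My rebuild scenario", "step_based")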
-
list_recipes
(as_type='listitems')¶ List the recipes in this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
The list of the recipes. If “as_type” is “listitems”, each one as a
dataikuapi.dss.recipe.DSSRecipeListItem
. If “as_type” is “objects”, each one as a dataikuapi.dss.recipe.DSSRecipe
- Return type
list
-
get_recipe
(recipe_name)¶ Gets a
dataikuapi.dss.recipe.DSSRecipe
handle to interact with a recipe
- Parameters
recipe_name (str) – The name of the recipe
- Returns
A recipe handle
- Return type
-
create_recipe
(recipe_proto, creation_settings)¶ Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.
Some recipe types require additional parameters in creation_settings:
‘grouping’ : a ‘groupKey’ column name
‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string
- Parameters
recipe_proto (dict) – a prototype for the recipe object. Must contain at least ‘type’ and ‘name’
creation_settings (dict) – recipe-specific creation settings
- Returns
A recipe handle
- Return type
-
new_recipe
(type, name=None)¶ Initializes the creation of a new recipe. Returns a
dataikuapi.dss.recipe.DSSRecipeCreator
or one of its subclasses to complete the creation of the recipe. Usage example:
grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()
- Parameters
type (str) – Type of the recipe
name (str) – Optional, base name for the new recipe.
- Returns
A new DSS Recipe Creator handle
- Return type
-
get_flow
()¶ Get a handle to interact with the flow of the project
- Returns
A Flow handle
- Return type
-
sync_datasets_acls
()¶ Resync permissions on HDFS datasets in this project
Attention
This call requires an API key with admin rights
- Returns
a handle to the task of resynchronizing the permissions
- Return type
-
list_running_notebooks
(as_objects=True)¶ Caution
Deprecated. Use
DSSProject.list_jupyter_notebooks()
List the currently-running notebooks
- Returns
list of notebooks. Each object contains at least a ‘name’ field
- Return type
list
-
list_tags
()¶ List the tags of this project.
- Returns
a dictionary containing the tags with a color
- Return type
dict
-
set_tags
(tags)¶ Set the tags of this project.
- Parameters
tags (dict) – must be a modified version of the object returned by list_tags (defaults to {})
-
list_macros
(as_objects=False)¶ List the macros accessible in this project
- Parameters
as_objects – if True, return the macros as
dataikuapi.dss.macro.DSSMacro
macro handles instead of a list of python dicts (defaults to False)- Returns
the list of the macros
- Return type
list
-
get_macro
(runnable_type)¶ Get a handle to interact with a specific macro
- Parameters
runnable_type (str) – the identifier of a macro
- Returns
A macro handle
- Return type
-
get_wiki
()¶ Get the wiki
- Returns
the wiki associated to the project
- Return type
-
get_object_discussions
()¶ Get a handle to manage discussions on the project
- Returns
the handle to manage discussions
- Return type
-
init_tables_import
()¶ Start an operation to import Hive or SQL tables as datasets into this project
- Returns
a
dataikuapi.dss.project.TablesImportDefinition
to add tables to import- Return type
-
list_sql_schemas
(connection_name)¶ Lists schemas from which tables can be imported in a SQL connection
- Parameters
connection_name (str) – name of the SQL connection
- Returns
an array of schemas names
- Return type
list
-
list_hive_databases
()¶ Lists Hive databases from which tables can be imported
- Returns
an array of databases names
- Return type
list
-
list_sql_tables
(connection_name, schema_name=None)¶ Lists tables to import in a SQL connection
- Parameters
connection_name (str) – name of the SQL connection
schema_name (str) – Optional, name of the schema in the SQL connection in which to list tables.
- Returns
an array of tables
- Return type
list
-
list_hive_tables
(hive_database)¶ Lists tables to import in a Hive database
- Parameters
hive_database (str) – name of the Hive database
- Returns
an array of tables
- Return type
list
-
list_elasticsearch_indices_or_aliases
(connection_name)¶
-
get_app_manifest
()¶ Gets the manifest of the application if the project is an app template or an app instance, fails otherwise.
- Returns
the manifest of the application associated to the project
- Return type
-
setup_mlflow
(managed_folder, host=None)¶ Set up the dss-plugin for MLflow
- Parameters
managed_folder (object) – the managed folder where MLflow artifacts should be stored. Can be either a managed folder id as a string, a
dataikuapi.dss.managedfolder.DSSManagedFolder
, or a dataiku.Folder
host (str) – setup a custom host if the backend used is not DSS (defaults to None).
-
get_mlflow_extension
()¶ Get a handle to interact with the extension of MLflow provided by DSS
- Returns
A Mlflow Extension handle
- Return type
-
list_code_studios
(as_type='listitems')¶ List the code studio objects in this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns
the list of the code studio objects, each one as a python dict
- Return type
list
-
get_code_studio
(code_studio_id)¶ Get a handle to interact with a specific code studio object
- Parameters
code_studio_id (str) – the identifier of the desired code studio object
- Returns
A code studio object handle
- Return type
-
create_code_studio
(name, template_id)¶ Create a new code studio object in the project, and return a handle to interact with it
- Parameters
name (str) – the name of the code studio object
template_id (str) – the identifier of a code studio template
- Returns
A code studio object handle
- Return type
-
get_library
()¶ Get a handle to manage the project library
- Returns
- Return type
-
list_webapps
(as_type='listitems')¶ List the webapp heads of this project
- Parameters
as_type (str) – How to return the list. Supported values are “listitems” and “objects”.
- Returns
The list of the webapps. If “as_type” is “listitems”, each one as a
scenario.DSSWebAppListItem
. If “as_type” is “objects”, each one as a scenario.DSSWebApp
- Return type
list
-
get_webapp
(webapp_id)¶ Get a handle to interact with a specific webapp
- Parameters
webapp_id – the identifier of a webapp
- Returns
A dataikuapi.dss.webapp.DSSWebApp webapp handle
-
dataiku package API¶
-
class
dataiku.
Project
(project_key=None)¶ This is a handle to interact with the current project
Note: this class is also available as
dataiku.Project
-
get_last_metric_values
()¶ Get the set of last values of the metrics on this project, as a
dataiku.ComputedMetrics
object
-
get_metric_history
(metric_lookup)¶ Get the set of all values a given metric took on this project
- Parameters
metric_lookup – metric name or unique identifier
-
save_external_metric_values
(values_dict)¶ Save metrics on this project. The metrics are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as metric names
-
get_last_check_values
()¶ Get the set of last values of the checks on this project, as a
dataiku.ComputedChecks
object
-
get_check_history
(check_lookup)¶ Get the set of all values a given check took on this project
- Parameters
check_lookup – check name or unique identifier
-
set_variables
(variables)¶ Set all variables of the current project
- Parameters
variables (dict) – must be a modified version of the object returned by get_variables
-
get_variables
()¶ Get project variables
- Parameters
typed (bool) – true to try to cast the variable into its original type (eg. int rather than string)
- Returns
A dictionary containing two dictionaries : “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
-
save_external_check_values
(values_dict)¶ Save checks on this project. The checks are saved with the type “external”
- Parameters
values_dict – the values to save, as a dict. The keys of the dict are used as check names
-