Projects

Basic operations

The list of projects in the DSS instance can be retrieved with the list_project_keys method.

client = DSSClient(host, apiKey)
dss_projects = client.list_project_keys()
print(dss_projects)

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS']

Projects can be created:

new_project = client.create_project('TEST_PROJECT', 'test project', 'tester', description='a simple description')
print(client.list_project_keys())

outputs

['IMPALA', 'MYSQL', 'PARTITIONED', 'PLUGINS', 'TEST_PROJECT']

Or an existing project can be used for later manipulation:

project = client.get_project('MYPROJECT')

Creating, listing and getting handles to project items

Through various methods on the DSSProject class, you can:

  • Create most types of project items (datasets, recipes, managed folders, …)

  • List project items

  • Get structured handles to interact with each type of project item (an example follows this list)
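
For example, using the project handle obtained above (the item names below are hypothetical):

# List project items
datasets = project.list_datasets()
recipes = project.list_recipes()
folders = project.list_managed_folders()

# Get structured handles to interact with individual items
dataset = project.get_dataset('mydataset')
recipe = project.get_recipe('compute_mydataset')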

Modifying project settings

Two parts of the project’s settings can be modified directly: the metadata and the permissions. In both cases, it is advised to first retrieve the current settings state with the get_metadata and get_permissions calls, modify the returned object, and then set it back on the DSS instance.

project = client.get_project('MYPROJECT')

project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)

project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists','readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)

Available permissions to be set:

{
        'group': 'data_team',
        'admin': False,
        'exportDatasetsData': True,
        'manageAdditionalDashboardUsers': False,
        'manageDashboardAuthorizations': False,
        'manageExposedElements': False,
        'moderateDashboards': False,
        'readDashboards': True,
        'readProjectContent': True,
        'runScenarios': False,
        'writeDashboards': False,
        'writeProjectContent': False
}

Deleting

Projects can also be deleted:

project = client.get_project('TEST_PROJECT')
project.delete()

Exporting

Project export is available through the python API in two forms: either as a stream, or exported directly to a file. The data is sent zipped.

project = client.get_project('TEST_PROJECT')

project.export_to_file('exported_project.zip')

with project.get_export_stream() as s:
        ...
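
For example, the stream can be copied to a local file (a minimal sketch; the target path is arbitrary):

import shutil

with project.get_export_stream() as s:
    with open('exported_project.zip', 'wb') as f:
        # Copy the zipped export archive to the local file
        shutil.copyfileobj(s, f)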

Duplicating

Projects can be duplicated:

project = client.get_project('TEST_PROJECT')
project.duplicate('COPY_TEST_PROJECT', 'Copy of the Test Project')

Reference documentation

class dataikuapi.dss.project.DSSProject(client, project_key)

A handle to interact with a project on the DSS instance.

Do not create this class directly; instead use dataikuapi.DSSClient.get_project()

get_summary()

Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS; you must use the other, more specific methods of this dataikuapi.dss.project.DSSProject object.

Returns

a dict containing a summary of the project, with at least a ‘projectKey’ field

Return type

dict

get_project_folder()

Returns the dataikuapi.dss.projectfolder.DSSProjectFolder containing this project

Return type

dataikuapi.dss.projectfolder.DSSProjectFolder

move_to_folder(folder)

Moves this project to a project folder

Parameters

folder (dataikuapi.dss.projectfolder.DSSProjectFolder) – the destination project folder

delete(drop_data=False)

Delete the project

This call requires an API key with admin rights

Parameters

drop_data (bool) – Should the data of managed datasets be dropped

get_export_stream(options=None)

Return a stream of the exported project. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Parameters

options (dict) –

Dictionary of export options (defaults to {}). The following options are available:

  • exportUploads (boolean): Exports the data of Uploaded datasets - default False

  • exportManagedFS (boolean): Exports the data of managed Filesystem datasets - default False

  • exportAnalysisModels (boolean): Exports the models trained in analysis - default False

  • exportSavedModels (boolean): Exports the models trained in saved models - default False

  • exportManagedFolders (boolean): Exports the data of managed folders - default False

  • exportAllInputDatasets (boolean): Exports the data of all input datasets - default False

  • exportAllDatasets (boolean): Exports the data of all datasets - default False

  • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders - default False

  • exportGitRepository (boolean): Exports the Git repository history - default False

  • exportInsightsData (boolean): Exports the data of static insights - default False

Returns

a file-like object that is a stream of the export archive

Return type

file-like

export_to_file(path, options=None)

Export the project to a file

Parameters
  • path (str) – the path of the file in which the exported project should be saved

  • options (dict) –

    Dictionary of export options (defaults to {}). The following options are available:

    • exportUploads (boolean): Exports the data of Uploaded datasets - default False

    • exportManagedFS (boolean): Exports the data of managed Filesystem datasets - default False

    • exportAnalysisModels (boolean): Exports the models trained in analysis - default False

    • exportSavedModels (boolean): Exports the models trained in saved models - default False

    • exportManagedFolders (boolean): Exports the data of managed folders - default False

    • exportAllInputDatasets (boolean): Exports the data of all input datasets - default False

    • exportAllDatasets (boolean): Exports the data of all datasets - default False

    • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders - default False

    • exportGitRepository (boolean): Exports the Git repository history - default False

    • exportInsightsData (boolean): Exports the data of static insights - default False
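
For example, to also export the data of uploaded datasets and managed folders (a sketch using the options listed above):

project = client.get_project('TEST_PROJECT')
options = {
    'exportUploads': True,         # include data of uploaded datasets
    'exportManagedFolders': True   # include data of managed folders
}
project.export_to_file('exported_project_with_data.zip', options=options)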

duplicate(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)

Duplicate the project

Parameters
  • target_project_key (string) – The key of the new project

  • target_project_name (string) – The name of the new project

  • duplication_mode (string) – can be one of the following values: MINIMAL, SHARING, FULL, NONE

  • export_analysis_models (bool) –

  • export_saved_models (bool) –

  • export_git_repository (bool) –

  • export_insights_data (bool) –

  • remapping (dict) – dict of connections to be remapped for the new project (defaults to {})

  • target_project_folder (dataikuapi.dss.projectfolder.DSSProjectFolder) – the project folder in which to put the duplicated project

Returns

A dict containing the original and duplicated project’s keys

Return type

ProjectDuplicateResult

get_metadata()

Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.

For more information on available metadata, please see https://doc.dataiku.com/dss/api/6.0/rest/

Returns

a dict object containing the project metadata.

Return type

dict

set_metadata(metadata)

Set the metadata on this project.

Parameters

metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the get_metadata() call.

get_settings()

Gets the settings of this project. This does not contain permissions. See get_permissions()

Returns

a handle to read, modify and save the settings

Return type

DSSProjectSettings
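
A typical read-modify-save cycle, assuming the DSSProjectSettings handle exposes get_raw() and save() like other DSS settings objects:

settings = project.get_settings()
raw = settings.get_raw()              # assumed accessor for the raw settings dict
raw['shortDesc'] = 'Updated summary'  # hypothetical settings field
settings.save()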

get_permissions()

Get the permissions attached to this project

Returns

A dict containing the owner and the permissions, as a list of pairs of group name and permission type

set_permissions(permissions)

Sets the permissions on this project

Parameters

permissions (dict) – a permissions object with the same structure as the one returned by the get_permissions() call

list_datasets(as_type='listitems')

List the datasets in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns

The list of the datasets. If “as_type” is “listitems”, each one as a dataset.DSSDatasetListItem. If “as_type” is “objects”, each one as a dataset.DSSDataset

Return type

list
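
For example (a minimal sketch; attribute and method names follow the dataset classes referenced above):

# Lightweight list items
for item in project.list_datasets():
    print(item.name)

# Full object handles, e.g. to read each dataset's settings
for dataset in project.list_datasets(as_type="objects"):
    settings = dataset.get_settings()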

get_dataset(dataset_name)

Get a handle to interact with a specific dataset

Parameters

dataset_name (string) – the name of the desired dataset

Returns

A dataikuapi.dss.dataset.DSSDataset dataset handle

create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)

Create a new dataset in the project, and return a handle to interact with it.

The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use get_dataset() to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset() call.

Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset

Parameters
  • dataset_name (string) – the name for the new dataset

  • type (string) – the type of the dataset

  • params (dict) – the parameters for the type, as a JSON object (defaults to {})

  • formatType (string) – an optional format to create the dataset with (only for file-oriented datasets)

  • formatParams (dict) – the parameters to the format, as a JSON object (only for file-oriented datasets, default to {})

Returns:

A dataikuapi.dss.dataset.DSSDataset dataset handle
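
For example, a sketch of creating an external filesystem CSV dataset; the exact params and formatParams structures below are assumptions and should be verified against an existing dataset of the same type, as explained above:

params = {
    'connection': 'filesystem_root',   # assumed connection name
    'path': 'data/input.csv'
}
format_params = {
    'separator': ',',
    'parseHeaderRow': True
}
dataset = project.create_dataset('my_csv_dataset', 'Filesystem',
                                 params=params,
                                 formatType='csv',
                                 formatParams=format_params)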

create_upload_dataset(dataset_name, connection=None)
create_filesystem_dataset(dataset_name, connection, path_in_connection)
create_s3_dataset(dataset_name, connection, path_in_connection, bucket=None)

Creates a new external S3 dataset in the project and returns a DSSDataset to interact with it.

The created dataset does not have its format and schema initialized; it is recommended to use autodetect_settings() on the returned object

Parameters

dataset_name – Name of the dataset to create. Must not already exist

Return type

dataikuapi.dss.dataset.DSSDataset
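
For example, a sketch assuming an S3 connection named 'my_s3_connection' exists:

dataset = project.create_s3_dataset('my_s3_dataset', 'my_s3_connection',
                                    'path/inside/connection')
settings = dataset.autodetect_settings()  # detect format and schema
settings.save()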

create_fslike_dataset(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)
create_sql_table_dataset(dataset_name, type, connection, table, schema)
new_managed_dataset_creation_helper(dataset_name)

Creates a helper class to create a managed dataset in the project

Parameters

dataset_name (string) – Name of the new dataset - must be unique in the project

Returns

A dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper object to create the managed dataset
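
For example, a sketch assuming the returned helper exposes with_store_into() and create():

builder = project.new_managed_dataset_creation_helper('my_managed_dataset')
builder.with_store_into('filesystem_managed')  # assumed connection name
dataset = builder.create()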

list_streaming_endpoints(as_type='listitems')

List the streaming endpoints in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns

The list of the streaming endpoints. If “as_type” is “listitems”, each one as a streaming_endpoint.DSSStreamingEndpointListItem. If “as_type” is “objects”, each one as a streaming_endpoint.DSSStreamingEndpoint

Return type

list

get_streaming_endpoint(streaming_endpoint_name)

Get a handle to interact with a specific streaming endpoint

Parameters

streaming_endpoint_name (string) – the name of the desired streaming endpoint

Returns

A dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint streaming endpoint handle

create_streaming_endpoint(streaming_endpoint_name, type, params=None)

Create a new streaming endpoint in the project, and return a handle to interact with it.

The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use get_streaming_endpoint() to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the create_streaming_endpoint() call.

Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint

Parameters
  • streaming_endpoint_name (string) – the name for the new streaming endpoint

  • type (string) – the type of the streaming endpoint

  • params (dict) – the parameters for the type, as a JSON object (defaults to {})

Returns:

A dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint streaming endpoint handle

create_kafka_streaming_endpoint(streaming_endpoint_name, connection=None, topic=None)
create_httpsse_streaming_endpoint(streaming_endpoint_name, url=None)
new_managed_streaming_endpoint_creation_helper(streaming_endpoint_name, streaming_endpoint_type=None)

Creates a helper class to create a managed streaming endpoint in the project

Parameters
  • streaming_endpoint_name (string) – Name of the new streaming endpoint - must be unique in the project

  • streaming_endpoint_type (string) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)

Returns

A dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper object to create the streaming endpoint

create_prediction_ml_task(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)

Creates a new prediction task in a new visual analysis lab for a dataset.

Parameters
  • input_dataset (string) – the dataset to use for training/testing the model

  • target_variable (string) – the variable to predict

  • ml_backend_type (string) – ML backend to use, one of PY_MEMORY, MLLIB or H2O

  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE

  • prediction_type (string) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
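
For example, a sketch using a hypothetical dataset 'customers' with a 'churn' target, assuming the returned DSSMLTask exposes train():

mltask = project.create_prediction_ml_task('customers', 'churn',
                                           guess_policy='DEFAULT')

# train() is assumed to block until training completes and to return
# the identifiers of the trained models
model_ids = mltask.train()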

create_clustering_ml_task(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)

Creates a new clustering task in a new visual analysis lab for a dataset.

The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.

You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Parameters
  • ml_backend_type (string) – ML backend to use, one of PY_MEMORY, MLLIB or H2O

  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

list_ml_tasks()

List the ML tasks in this project

Returns:

the list of the ML tasks summaries, each one as a JSON object

get_ml_task(analysis_id, mltask_id)

Get a handle to interact with a specific ML task

Args:

analysis_id: the identifier of the visual analysis containing the desired ML task

mltask_id: the identifier of the desired ML task

Returns:

A dataikuapi.dss.ml.DSSMLTask ML task handle

create_analysis(input_dataset)

Creates a new visual analysis lab for a dataset.

list_analyses()

List the visual analyses in this project

Returns:

the list of the visual analyses summaries, each one as a JSON object

get_analysis(analysis_id)

Get a handle to interact with a specific visual analysis

Args:

analysis_id: the identifier of the desired visual analysis

Returns:

A dataikuapi.dss.analysis.DSSAnalysis visual analysis handle

list_saved_models()

List the saved models in this project

Returns:

the list of the saved models, each one as a JSON object

get_saved_model(sm_id)

Get a handle to interact with a specific saved model

Args:

sm_id: the identifier of the desired saved model

Returns:

A dataikuapi.dss.savedmodel.DSSSavedModel saved model handle

list_managed_folders()

List the managed folders in this project

Returns:

the list of the managed folders, each one as a JSON object

get_managed_folder(odb_id)

Get a handle to interact with a specific managed folder

Args:

odb_id: the identifier of the desired managed folder

Returns:

A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle

create_managed_folder(name, folder_type=None, connection_name='filesystem_folders')

Create a new managed folder in the project, and return a handle to interact with it

Args:

name: the name of the managed folder

Returns:

A dataikuapi.dss.managedfolder.DSSManagedFolder managed folder handle

list_model_evaluation_stores(as_type=None)

List the model evaluation stores in this project.

Returns

The list of the model evaluation stores

Return type

list

get_model_evaluation_store(mes_id)

Get a handle to interact with a specific model evaluation store

Parameters

mes_id (string) – the id of the desired model evaluation store

Returns

A dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore model evaluation store handle

create_model_evaluation_store(name, mes_id=None)

Create a new model evaluation store in the project, and return a handle to interact with it.

Parameters
  • name (string) – the name for the new model evaluation store

  • mes_id (string, optional) – the id for the new model evaluation store

Returns

A dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore model evaluation store handle

list_jobs()

List the jobs in this project

Returns:

a list of the jobs, each one as a JSON object, containing both the definition and the state

get_job(id)

Get a handle to interact with a specific job

Returns:

A dataikuapi.dss.job.DSSJob job handle

start_job(definition)

Create a new job, and return a handle to interact with it

Args:

definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

Returns:

A dataikuapi.dss.job.DSSJob job handle
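
For example, a sketch of a minimal job definition building a single dataset; the exact structure of the outputs list is an assumption and should be checked against a job definition generated by DSS:

definition = {
    'type': 'NON_RECURSIVE_FORCED_BUILD',
    'outputs': [{'id': 'mydataset'}]   # assumed output descriptor
}
job = project.start_job(definition)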

start_job_and_wait(definition, no_fail=False)

Starts a new job and waits for it to complete.

Args:

definition: the definition for the job to create. The definition must contain the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) and a list of outputs to build. Optionally, a refreshHiveMetastore field can specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

new_job(job_type='NON_RECURSIVE_FORCED_BUILD')

Create a job to be run

You need to add outputs to the job (i.e. what you want to build) before running it.

job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
Return type

JobDefinitionBuilder

new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')

Deprecated. Please use new_job()

list_jupyter_notebooks(as_objects=True, active=False)

List the jupyter notebooks of a project.

Parameters
  • as_objects (bool) – if True, return the jupyter notebooks as dataikuapi.dss.notebook.DSSNotebook notebook handles instead of raw JSON

  • active (bool) – if True, only return currently running jupyter notebooks.

Returns

The list of the notebooks - see as_objects for more information

Return type

list

get_jupyter_notebook(notebook_name)

Get a handle to interact with a specific jupyter notebook

Parameters

notebook_name (str) – The name of the jupyter notebook to retrieve

Returns

A handle to interact with this jupyter notebook

Return type

DSSNotebook jupyter notebook handle

create_jupyter_notebook(notebook_name, notebook_content)

Create a new jupyter notebook and get a handle to interact with it

Parameters
  • notebook_name (str) – the name of the notebook to create

  • notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use get_content() on a similar existing DSSNotebook object in order to get a sample definition object

Returns

A handle to interact with the newly created jupyter notebook

Return type

DSSNotebook jupyter notebook handle

list_continuous_activities(as_objects=True)

List the continuous activities in this project

Returns:

a list of the continuous activities, each one as a JSON object, containing both the definition and the state

get_continuous_activity(recipe_id)

Get a handle to interact with a specific continuous activity

Returns:

A dataikuapi.dss.continuousactivity.DSSContinuousActivity continuous activity handle

get_variables()

Gets the variables of this project.

Returns:

a dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles. “local” variables are not part of the bundles for this project

set_variables(obj)

Sets the variables of this project. WARNING: if executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe.

Use the internal API dataiku.get_custom_variables() instead if this behavior is needed

Parameters

obj (dict) – must be a modified version of the object returned by get_variables

update_variables(variables, type='standard')

Updates a set of variables for this project

Parameters
  • variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts

  • type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project
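
For example, a sketch of reading, modifying and updating project variables (the variable name is hypothetical):

# Full read-modify-write cycle
variables = project.get_variables()
variables['standard']['model_version'] = 'v2'
project.set_variables(variables)

# Or update a subset of the standard variables in one call
project.update_variables({'model_version': 'v2'})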

list_api_services()

List the API services in this project

Returns:

the list of API services, each one as a JSON object

create_api_service(service_id)

Create a new API service, and return a handle to interact with it. The newly-created service does not have any endpoint.

Parameters

service_id (str) – the ID of the API service to create

Returns

A DSSAPIService API Service handle

get_api_service(service_id)

Get a handle to interact with a specific API Service from the API Designer

Parameters

service_id (str) – The identifier of the API Designer API Service to retrieve

Returns

A handle to interact with this API Service

Return type

DSSAPIService API Service handle

list_exported_bundles()
export_bundle(bundle_id)
get_exported_bundle_archive_stream(bundle_id)

Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream. Warning: this stream will monopolize the DSSClient until closed.

download_exported_bundle_archive_to_file(bundle_id, path)

Download a bundle archive that can be deployed in a DSS automation Node into the given output file.

Parameters

path (str) – if “-“, will write to /dev/stdout

publish_bundle(bundle_id, published_project_key=None)

Publish a bundle on the Project Deployer.

Parameters
  • bundle_id (string) – The identifier of the bundle

  • published_project_key (string) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current DSSProject is used.

Return type

dict

Returns

a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.
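
For example, a sketch of creating a bundle and publishing it to the Project Deployer (the bundle id is arbitrary):

bundle_id = 'v1'
project.export_bundle(bundle_id)
result = project.publish_bundle(bundle_id)
print(result['publishedOn'], result['publishedBy'])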

list_imported_bundles()
import_bundle_from_archive(archive_path)
import_bundle_from_stream(fp)
activate_bundle(bundle_id, scenarios_to_enable=None)

Activates a bundle in this project.

Parameters
  • bundle_id (str) – The ID of the bundle to activate

  • scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False.

Returns

A report containing any error or warning messages that occurred during bundle activation

Return type

dict

preload_bundle(bundle_id)
list_scenarios(as_type='listitems')

List the scenarios in this project.

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns

The list of the scenarios. If “as_type” is “listitems”, each one as a scenario.DSSScenarioListItem. If “as_type” is “objects”, each one as a scenario.DSSScenario

Return type

list

get_scenario(scenario_id)

Get a handle to interact with a specific scenario

Parameters

scenario_id (str) – the ID of the desired scenario

Returns

a dataikuapi.dss.scenario.DSSScenario scenario handle
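
For example, a sketch assuming a scenario with id 'MYSCENARIO' exists and that the DSSScenario handle exposes run_and_wait():

scenario = project.get_scenario('MYSCENARIO')
run = scenario.run_and_wait()  # assumed to trigger the scenario and block until it finishes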

create_scenario(scenario_name, type, definition=None)

Create a new scenario in the project, and return a handle to interact with it

Parameters
  • scenario_name (str) – The name for the new scenario. This does not need to be unique (although using a unique name is strongly recommended)

  • type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’

  • definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})

Returns

a scenario.DSSScenario handle to interact with the newly-created scenario

list_recipes(as_type='listitems')

List the recipes in this project

Parameters

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns

The list of the recipes. If “as_type” is “listitems”, each one as a recipe.DSSRecipeListItem. If “as_type” is “objects”, each one as a recipe.DSSRecipe

Return type

list

get_recipe(recipe_name)

Gets a dataikuapi.dss.recipe.DSSRecipe handle to interact with a recipe

Parameters

recipe_name (str) – The name of the recipe

Return type

dataikuapi.dss.recipe.DSSRecipe

create_recipe(recipe_proto, creation_settings)

Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.

Some recipe types require additional parameters in creation_settings:

  • ‘grouping’ : a ‘groupKey’ column name

  • ‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string

Args:

recipe_proto: a prototype for the recipe object. Must contain at least ‘type’ and ‘name’

creation_settings: recipe-specific creation settings

Returns:

A dataikuapi.dss.recipe.DSSRecipe recipe handle

new_recipe(type, name=None)

Initializes the creation of a new recipe. Returns a dataikuapi.dss.recipe.DSSRecipeCreator or one of its subclasses to complete the creation of the recipe.

Usage example:

grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")                                    
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()

Parameters
  • type (str) – Type of the recipe

  • name (str) – Optional, base name for the new recipe.

Return type

dataikuapi.dss.recipe.DSSRecipeCreator

get_flow()
sync_datasets_acls()

Resync permissions on HDFS datasets in this project

Returns:

a DSSFuture handle to the task of resynchronizing the permissions

Note: this call requires an API key with admin rights

list_running_notebooks(as_objects=True)

List the currently-running notebooks

Returns:

list of notebooks. Each object contains at least a ‘name’ field

get_tags()

List the tags of this project.

Returns:

a dictionary containing the tags with a color

set_tags(tags=None)

Set the tags of this project.

Parameters

tags (dict) – must be a modified version of the object returned by get_tags (defaults to {})

list_macros(as_objects=False)

List the macros accessible in this project

Parameters

as_objects – if True, return the macros as dataikuapi.dss.macro.DSSMacro macro handles instead of raw JSON

Returns

the list of the macros

get_macro(runnable_type)

Get a handle to interact with a specific macro

Parameters

runnable_type – the identifier of a macro

Returns

A dataikuapi.dss.macro.DSSMacro macro handle

get_wiki()

Get the wiki

Returns

the wiki associated with the project

Return type

dataikuapi.dss.wiki.DSSWiki

get_object_discussions()

Get a handle to manage discussions on the project

Returns

the handle to manage discussions

Return type

dataikuapi.discussion.DSSObjectDiscussions

init_tables_import()

Start an operation to import Hive or SQL tables as datasets into this project

Returns

a TablesImportDefinition to add tables to import

Return type

TablesImportDefinition
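
For example, a sketch of importing a SQL table, assuming the returned TablesImportDefinition exposes add_sql_table() and prepare(), and that the prepared import exposes execute():

importer = project.init_tables_import()
importer.add_sql_table('my_sql_connection', 'public', 'customers')  # assumed signature
prepared = importer.prepare()  # assumed to resolve the candidate tables
prepared.execute()             # assumed to run the actual import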

list_sql_schemas(connection_name)

Lists schemas from which tables can be imported in a SQL connection

Returns

an array of schema names

list_hive_databases()

Lists Hive databases from which tables can be imported

Returns

an array of database names

list_sql_tables(connection_name, schema_name=None)

Lists tables to import in a SQL connection

Returns

an array of tables

list_hive_tables(hive_database)

Lists tables to import in a Hive database

Returns

an array of tables

get_app_manifest()