The main DSSClient class

The REST API Python client, shipped in the dataikuapi Python package, makes it easy to write client programs for the DSS REST API in Python.

The client is the entrypoint for many of the capabilities listed in this chapter.

For more details about the two Dataiku packages, see Python APIs, Using the APIs inside of DSS and Using the APIs outside of DSS.

Creating a client from inside DSS

To work with the API, a connection needs to be established with DSS, by creating a DSSClient object. Once the connection is established, the DSSClient object serves as the entry point to the other calls.

The Python client can be used from inside DSS. In that case:

  • It’s preinstalled, so you don’t need to do anything
  • You don’t need to provide any API key, as the API client will automatically inherit connection credentials from the current context
import dataiku

client = dataiku.api_client()

# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys for which you have access
client.list_project_keys()

Creating a client from outside DSS

When running outside of DSS, you first need to install the client from pip:

pip install dataiku-api-client

This installs the client in the system-wide Python installation, so if you are not using virtualenv, you may need to replace pip with sudo pip.

Note that this always installs the latest version of the API client. You might need to request a specific version that is compatible with your version of DSS.

When connecting from the outside world, you need an API key. See Public API Keys for more information on how to create an API key and the associated privileges.

You also need to connect using the base URL of your DSS instance.

import dataikuapi

host = "http://localhost:11200"
apiKey = "some_key"
client = dataikuapi.DSSClient(host, apiKey)

# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys for which the API key has access
client.list_project_keys()

Disabling SSL certificate check

If your DSS has SSL enabled, the package will verify the certificate. In order for this to work, you may need to add the root authority that signed the DSS SSL certificate to your local trust store. Please refer to your OS or Python manual for instructions.

If this is not possible, you can also disable checking the SSL certificate by setting client._session.verify = False
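
As a minimal sketch of the setting described above, the helper below toggles certificate verification on an existing client. The function name configure_ssl_verification and its ca_bundle parameter are hypothetical. The ca_bundle branch assumes that client._session is a requests Session, whose verify attribute also accepts a path to a CA bundle; treat that as an assumption rather than documented behavior of the client.

```python
def configure_ssl_verification(client, ca_bundle=None):
    """Adjust SSL verification on a DSSClient-like object.

    Relies on the private `_session` attribute mentioned in the text.
    `ca_bundle` is a hypothetical convenience parameter: when given, it is
    stored in `verify` (assumes a requests-style session); otherwise
    verification is disabled entirely.
    """
    if ca_bundle is not None:
        # Keep verification on, but trust the given CA bundle file.
        client._session.verify = ca_bundle
    else:
        # Disable certificate checking, as described in the text.
        client._session.verify = False
    return client
```

Disabling verification removes protection against man-in-the-middle attacks, so adding the signing authority to the trust store remains the preferable option.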

Reference API doc

Also see Reference API documentation of dataikuapi.

class dataikuapi.DSSClient(host, api_key=None, internal_ticket=None, extra_headers=None)

Entry point for the DSS API client

list_futures(as_objects=False, all_users=False)

List the currently-running long tasks (a.k.a. futures)

Parameters:
  • as_objects (boolean) – if True, each returned item will be a dataikuapi.dss.future.DSSFuture
  • all_users (boolean) – if True, returns futures for all users (requires admin privileges). Else, only returns futures for the user associated with the current authentication context (if any)
Returns:

List of futures. If as_objects is True, each future in the list is a dataikuapi.dss.future.DSSFuture. Else, each future in the list is a dict. Each dict contains at least a ‘jobId’ field.

Return type:

list of dataikuapi.dss.future.DSSFuture or list of dict

list_running_scenarios(all_users=False)

List the running scenarios

Parameters:all_users (boolean) – if True, returns scenarios for all users (requires admin privileges). Else, only returns scenarios for the user associated with the current authentication context (if any)
Returns:list of running scenarios, each one as a dict containing at least a “jobId” field for the future hosting the scenario run, and a “payload” field with scenario identifiers
Return type:list of dicts
get_future(job_id)

Get a handle to interact with a specific long task (a.k.a. future). This notably allows aborting this future.

Parameters:job_id (str) – the identifier of the desired future (which can be returned by list_futures() or list_running_scenarios())
Returns:A handle to interact with the future
Return type:dataikuapi.dss.future.DSSFuture
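
The following sketch shows how list_futures() and get_future() might be combined to abort selected long tasks. The helper name abort_futures is hypothetical, and the abort() method on the returned handle is an assumption: the text above only states that the handle allows aborting.

```python
def abort_futures(client, job_ids_to_abort):
    """Abort the running futures whose jobId is in `job_ids_to_abort`.

    Returns the list of job ids for which an abort was requested.
    """
    aborted = []
    # With the default as_objects=False, each item is a dict with a 'jobId'.
    for future_info in client.list_futures():
        job_id = future_info["jobId"]
        if job_id in job_ids_to_abort:
            # abort() is assumed to exist on the DSSFuture handle.
            client.get_future(job_id).abort()
            aborted.append(job_id)
    return aborted
```

The client argument can be any DSSClient, for example the one returned by dataiku.api_client().
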
list_running_notebooks(as_objects=True)

List the currently-running Jupyter notebooks

Parameters:as_objects (boolean) – if True, each returned item will be a dataikuapi.dss.notebook.DSSNotebook
Returns:list of notebooks. If as_objects is True, each entry in the list is a dataikuapi.dss.notebook.DSSNotebook. Else, each item in the list is a dict which contains at least a “name” field.
Return type:list of dataikuapi.dss.notebook.DSSNotebook or list of dict
get_root_project_folder()

Get a handle to interact with the root project folder.

Returns:A dataikuapi.dss.projectfolder.DSSProjectFolder to interact with this project folder
get_project_folder(project_folder_id)

Get a handle to interact with a project folder.

Parameters:project_folder_id (str) – the project folder ID of the desired project folder
Returns:A dataikuapi.dss.projectfolder.DSSProjectFolder to interact with this project folder
list_project_keys()

List the project keys (=project identifiers).

Returns:list of project keys, as strings
Return type:list of strings
list_projects()

List the projects

Returns:a list of projects, each as a dict. Each dict contains at least a ‘projectKey’ field
Return type:list of dicts
get_project(project_key)

Get a handle to interact with a specific project.

Parameters:project_key (str) – the project key of the desired project
Returns:A dataikuapi.dss.project.DSSProject to interact with this project
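
As an illustration of how list_projects() and get_project() fit together, the sketch below builds a mapping from project key to project handle. The helper name project_handles and its wanted_keys parameter are hypothetical.

```python
def project_handles(client, wanted_keys=None):
    """Return {project_key: handle} for the projects visible to the client.

    `wanted_keys` optionally restricts the result to a set of project keys.
    """
    handles = {}
    for summary in client.list_projects():
        # Each summary dict contains at least a 'projectKey' field.
        key = summary["projectKey"]
        if wanted_keys is None or key in wanted_keys:
            handles[key] = client.get_project(key)
    return handles
```
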
get_default_project()

Get a handle to the current default project, if available (i.e. if dataiku.default_project_key() is valid)

create_project(project_key, name, owner, description=None, settings=None, project_folder_id=None)

Creates a new project and returns a project handle to interact with it.

Note: this call requires an API key with admin rights or the rights to create a project

Parameters:
  • project_key (str) – the identifier to use for the project. Must be globally unique
  • name (str) – the display name for the project.
  • owner (str) – the login of the owner of the project.
  • description (str) – a description for the project.
  • settings (dict) – Initial settings for the project (can be modified later). The exact possible settings are not documented.
  • project_folder_id (str) – the project folder ID in which the project will be created (root project folder if not specified)
Returns:

A dataikuapi.dss.project.DSSProject project handle to interact with this project
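
Because project keys must be globally unique, a script may want to reuse a project when it already exists. The sketch below shows one way to do that; the helper name get_or_create_project is hypothetical, and it only detects projects that the current credentials can see.

```python
def get_or_create_project(client, project_key, name, owner, description=None):
    """Return a handle on `project_key`, creating the project if needed.

    Only projects listed by list_project_keys() are detected, so a project
    that exists but is not visible to the caller would not be found here.
    """
    if project_key in client.list_project_keys():
        return client.get_project(project_key)
    # Requires admin rights or the right to create projects.
    return client.create_project(project_key, name, owner,
                                 description=description)
```
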

list_apps()

List the apps

Returns:a list of apps, each as a dict. Each dict contains at least an ‘appId’ field
Return type:list of dicts
get_app(app_id)

Get a handle to interact with a specific app.

Parameters:app_id (str) – the id of the desired app
Returns:A dataikuapi.dss.app.DSSApp to interact with this app
list_plugins()

List the installed plugins

Returns:list of dict. Each dict contains at least an ‘id’ field
install_plugin_from_archive(fp)

Install a plugin from a plugin archive (as a file object)

Parameters:fp (object) – A file-like object pointing to a plugin archive zip
install_plugin_from_store(plugin_id)

Install a plugin from the Dataiku plugin store

Parameters:plugin_id (str) – identifier of the plugin to install
Returns:A DSSFuture representing the install process
install_plugin_from_git(repository_url, checkout='master', subpath=None)

Install a plugin from a Git repository. DSS must be set up to allow access to the repository.

Parameters:
  • repository_url (str) – URL of a Git remote
  • checkout (str) – branch, tag or SHA1 to check out. For example “master”
  • subpath (str) – Optional, path within the repository to use as plugin. Should contain a ‘plugin.json’ file
Returns:

A DSSFuture representing the install process
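
Since the install methods return a DSSFuture rather than the finished result, a caller typically has to wait for the install to complete. The sketch below assumes that the future exposes a wait_for_result() method; that method name is not stated in this page and should be checked against the DSSFuture reference. The helper name install_plugin_and_wait is hypothetical.

```python
def install_plugin_and_wait(client, repository_url, checkout="master",
                            subpath=None):
    """Install a plugin from Git and block until the install has finished."""
    future = client.install_plugin_from_git(repository_url,
                                            checkout=checkout,
                                            subpath=subpath)
    # wait_for_result() is an assumed DSSFuture method (see lead-in).
    return future.wait_for_result()
```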

get_plugin(plugin_id)

Get a handle to interact with a specific plugin

Parameters:plugin_id (str) – the identifier of the desired plugin
Returns:A dataikuapi.dss.plugin.DSSPlugin
sql_query(query, connection=None, database=None, dataset_full_name=None, pre_queries=None, post_queries=None, type='sql', extra_conf=None, script_steps=None, script_input_schema=None, script_output_schema=None, script_report_location=None, read_timestamp_without_timezone_as_string=True, read_date_as_string=False)

Initiate a SQL, Hive or Impala query and get a handle to retrieve the results of the query. Internally, the query is run by DSS. The database to run the query on is specified either by passing a connection name, or by passing a database name, or by passing a dataset full name (whose connection is then used to retrieve the database)

Parameters:
  • query (str) – the query to run
  • connection (str) – the connection on which the query should be run (exclusive of database and dataset_full_name)
  • database (str) – the database on which the query should be run (exclusive of connection and dataset_full_name)
  • dataset_full_name (str) – the dataset on the connection of which the query should be run (exclusive of connection and database)
  • pre_queries (list) – (optional) array of queries to run before the query
  • post_queries (list) – (optional) array of queries to run after the query
  • type (str) – the type of query: either ‘sql’, ‘hive’ or ‘impala’
Returns:

A dataikuapi.dss.sqlquery.DSSSQLQuery query handle
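
The connection, database and dataset_full_name parameters are mutually exclusive. A thin wrapper can enforce this before the query is sent, as in the sketch below. The helper name start_query and its query_type parameter are hypothetical, and the checks are done on the client side only.

```python
def start_query(client, query, connection=None, database=None,
                dataset_full_name=None, query_type="sql"):
    """Validate the target and type, then start the query via sql_query()."""
    targets = {
        "connection": connection,
        "database": database,
        "dataset_full_name": dataset_full_name,
    }
    given = [name for name, value in targets.items() if value is not None]
    if len(given) != 1:
        raise ValueError(
            "exactly one of connection, database or dataset_full_name "
            "must be set, got: %s" % (given or "none"))
    if query_type not in ("sql", "hive", "impala"):
        raise ValueError("unsupported query type: %r" % (query_type,))
    # Returns a DSSSQLQuery handle used to retrieve the results.
    return client.sql_query(query, connection=connection, database=database,
                            dataset_full_name=dataset_full_name,
                            type=query_type)
```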

list_users()

List all users set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:A list of users, as a list of dicts
Return type:list of dicts
get_user(login)

Get a handle to interact with a specific user

Parameters:login (str) – the login of the desired user
Returns:A dataikuapi.dss.admin.DSSUser user handle
create_user(login, password, display_name='', source_type='LOCAL', groups=None, profile='DATA_SCIENTIST')

Create a user, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • login (str) – the login of the new user
  • password (str) – the password of the new user
  • display_name (str) – the displayed name for the new user
  • source_type (str) – the type of new user. Admissible values are ‘LOCAL’ or ‘LDAP’
  • groups (list) – the names of the groups the new user belongs to (defaults to [])
  • profile (str) – The profile for the new user, can be one of READER, DATA_ANALYST or DATA_SCIENTIST
Returns:

A dataikuapi.dss.admin.DSSUser user handle
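
The admissible values for source_type and profile listed above can be checked locally before calling create_user(). The sketch below does this; the helper name create_checked_user and the two constant names are hypothetical, and the value lists are copied from the parameter descriptions above, so they may not cover every DSS version.

```python
VALID_SOURCE_TYPES = ("LOCAL", "LDAP")
VALID_PROFILES = ("READER", "DATA_ANALYST", "DATA_SCIENTIST")


def create_checked_user(client, login, password, display_name="",
                        source_type="LOCAL", groups=None,
                        profile="DATA_SCIENTIST"):
    """Validate source_type and profile, then create the user."""
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError("source_type must be one of %s" % (VALID_SOURCE_TYPES,))
    if profile not in VALID_PROFILES:
        raise ValueError("profile must be one of %s" % (VALID_PROFILES,))
    # Requires an API key with admin rights.
    return client.create_user(login, password, display_name=display_name,
                              source_type=source_type, groups=groups or [],
                              profile=profile)
```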

get_own_user()
list_groups()

List all groups set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:A list of groups, as a list of dicts
Return type:list of dicts
get_group(name)

Get a handle to interact with a specific group

Parameters:name (str) – the name of the desired group
Returns:A dataikuapi.dss.admin.DSSGroup group handle
create_group(name, description=None, source_type='LOCAL')

Create a group, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • name (str) – the name of the new group
  • description (str) – (optional) a description of the new group
  • source_type – the type of the new group. Admissible values are ‘LOCAL’ and ‘LDAP’
Returns:

A dataikuapi.dss.admin.DSSGroup group handle

list_connections()

List all connections set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:All connections, as a dict of connection name to connection definition
Return type:dict
get_connection(name)

Get a handle to interact with a specific connection

Parameters:name (str) – the name of the desired connection
Returns:A dataikuapi.dss.admin.DSSConnection connection handle
create_connection(name, type, params=None, usable_by='ALL', allowed_groups=None)

Create a connection, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • name – the name of the new connection
  • type – the type of the new connection
  • params (dict) – the parameters of the new connection, as a JSON object (defaults to {})
  • usable_by – the type of access control for the connection. Either ‘ALL’ (=no access control) or ‘ALLOWED’ (=access restricted to users of a list of groups)
  • allowed_groups (list) – when using access control (that is, setting usable_by=’ALLOWED’), the list of names of the groups whose users are allowed to use the new connection (defaults to [])
Returns:

A dataikuapi.dss.admin.DSSConnection connection handle
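
The usable_by and allowed_groups parameters work together: allowed_groups only matters when usable_by is ‘ALLOWED’. The sketch below wraps that combination; the helper name create_restricted_connection is hypothetical, and the refusal of an empty group list is a design choice of this example, not a rule stated by the API.

```python
def create_restricted_connection(client, name, connection_type, params,
                                 allowed_groups):
    """Create a connection usable only by members of `allowed_groups`."""
    if not allowed_groups:
        # Choice made for this sketch: avoid creating a connection that
        # no group would be allowed to use.
        raise ValueError("allowed_groups must contain at least one group")
    # Requires an API key with admin rights.
    return client.create_connection(name, connection_type,
                                    params=params or {},
                                    usable_by="ALLOWED",
                                    allowed_groups=list(allowed_groups))
```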

list_code_envs()

List all code envs set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:a list of code envs. Each code env is a dict containing at least “name”, “type” and “language”
get_code_env(env_lang, env_name)

Get a handle to interact with a specific code env

Parameters:
  • env_lang (str) – the language (PYTHON or R) of the desired code env
  • env_name (str) – the name of the desired code env
Returns:A dataikuapi.dss.admin.DSSCodeEnv code env handle
create_code_env(env_lang, env_name, deployment_mode, params=None)

Create a code env, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • env_lang – the language (PYTHON or R) of the new code env
  • env_name – the name of the new code env
  • deployment_mode – the type of the new code env
  • params – the parameters of the new code env, as a JSON object
Returns:

A dataikuapi.dss.admin.DSSCodeEnv code env handle

list_clusters()

List all clusters set up on the DSS instance

Returns:a list of clusters, each with its name, type and state
get_cluster(cluster_id)

Get a handle to interact with a specific cluster

Parameters:cluster_id – the identifier of the desired cluster
Returns:A dataikuapi.dss.admin.DSSCluster cluster handle
create_cluster(cluster_name, cluster_type='manual', params=None)

Create a cluster, and return a handle to interact with it

Parameters:
  • cluster_name – the name of the new cluster
  • cluster_type – the type of the new cluster
  • params – the parameters of the new cluster, as a JSON object
Returns:

A dataikuapi.dss.admin.DSSCluster cluster handle

list_global_api_keys()

List all global API keys set up on the DSS instance

Note: this call requires an API key with admin rights

Returns:All global API keys, as a list of dicts
get_global_api_key(key)

Get a handle to interact with a specific Global API key

Parameters:key (str) – the secret key of the desired API key
Returns:A dataikuapi.dss.admin.DSSGlobalApiKey API key handle
create_global_api_key(label=None, description=None, admin=False)

Create a Global API key, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • label (str) – the label of the new API key
  • description (str) – the description of the new API key
  • admin (boolean) – whether the new API key has admin rights (True or False)
Returns:

A dataikuapi.dss.admin.DSSGlobalApiKey API key handle

list_meanings()

List all user-defined meanings on the DSS instance

Note: this call requires an API key with admin rights

Returns:A list of meanings. Each meaning is a dict
Return type:list of dicts
get_meaning(id)

Get a handle to interact with a specific user-defined meaning

Note: this call requires an API key with admin rights

Parameters:id (str) – the ID of the desired meaning
Returns:A dataikuapi.dss.meaning.DSSMeaning meaning handle
create_meaning(id, label, type, description=None, values=None, mappings=None, pattern=None, normalizationMode=None, detectable=False)

Create a meaning, and return a handle to interact with it

Note: this call requires an API key with admin rights

Parameters:
  • id – the ID of the new meaning
  • label – the label of the new meaning
  • type – the type of the new meaning. Admissible values are ‘DECLARATIVE’, ‘VALUES_LIST’, ‘VALUES_MAPPING’ and ‘PATTERN’
  • description (optional) – the description of the new meaning
  • values (optional) – when type is ‘VALUES_LIST’, the list of values, or a list of {‘value’: ‘the value’, ‘color’: ‘an optional color’}
  • mappings (optional) – when type is ‘VALUES_MAPPING’, the mapping, as a list of objects with this structure: {‘from’: ‘value_1’, ‘to’: ‘value_a’}
  • pattern (optional) – when type is ‘PATTERN’, the pattern
  • normalizationMode (optional) – when type is ‘VALUES_LIST’, ‘VALUES_MAPPING’ or ‘PATTERN’, the normalization mode to use for value matching. One of ‘EXACT’, ‘LOWERCASE’, or ‘NORMALIZED’ (not available for ‘PATTERN’ type). Defaults to ‘EXACT’.
  • detectable (optional) – whether DSS should consider assigning the meaning to columns set to ‘Auto-detect’. Defaults to False.
Returns:

A dataikuapi.dss.meaning.DSSMeaning meaning handle
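
To show how the type-specific parameters combine, the sketch below creates a ‘VALUES_LIST’ meaning. The helper name create_values_list_meaning is hypothetical; the admissible normalization modes are taken from the parameter list above.

```python
def create_values_list_meaning(client, meaning_id, label, values,
                               description=None, normalization="EXACT"):
    """Create a 'VALUES_LIST' meaning from an iterable of values."""
    if normalization not in ("EXACT", "LOWERCASE", "NORMALIZED"):
        raise ValueError("unknown normalization mode: %r" % (normalization,))
    # Requires an API key with admin rights.
    return client.create_meaning(meaning_id, label, "VALUES_LIST",
                                 description=description,
                                 values=list(values),
                                 normalizationMode=normalization)
```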

list_logs()

List all available log files on the DSS instance

This call requires an API key with admin rights

Returns:A list of log file names
get_log(name)

Get the contents of a specific log file

This call requires an API key with admin rights

Parameters:name (str) – the name of the desired log file (obtained with list_logs())
Returns:The full content of the log file, as a string
log_custom_audit(custom_type, custom_params=None)

Log a custom entry to the audit trail

Parameters:
  • custom_type (str) – value for customMsgType in audit trail item
  • custom_params (dict) – value for customMsgParams in audit trail item (defaults to {})
get_variables()

Get the DSS instance’s variables, as a Python dictionary

This call requires an API key with admin rights

Returns:a Python dictionary of the instance-level variables
set_variables(variables)

Updates the DSS instance’s variables

This call requires an API key with admin rights

It is not possible to update a single variable; you must set all of them at once. Thus, you should only use a variables parameter that has been obtained using get_variables().

Parameters:variables (dict) – the new dictionary of all variables of the instance
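
Because all variables must be sent at once, changing a single variable is a read, modify, write sequence. The sketch below illustrates it; the helper name update_instance_variables is hypothetical.

```python
def update_instance_variables(client, **changes):
    """Update some instance-level variables while keeping all the others."""
    # Start from the full current dictionary, as recommended above.
    variables = client.get_variables()
    variables.update(changes)
    # Write the complete dictionary back (requires admin rights).
    client.set_variables(variables)
    return variables
```

For example, update_instance_variables(client, env="staging") would set a variable named env and leave every other variable unchanged. Note that a concurrent change made between the read and the write would be overwritten.
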
get_general_settings()

Gets a handle to interact with the general settings.

This call requires an API key with admin rights

Returns:a dataikuapi.dss.admin.DSSGeneralSettings handle
create_project_from_bundle_local_archive(archive_path, project_folder=None)

Create a project from a bundle archive. Warning: this method can only be used on an automation node.

Parameters:
  • archive_path (string) – Path on the local machine where the archive is
  • project_folder (A dataikuapi.dss.projectfolder.DSSProjectFolder) – the project folder in which the project will be created or None for root project folder
create_project_from_bundle_archive(fp, project_folder=None)

Create a project from a bundle archive (as a file object). Warning: this method can only be used on an automation node.

Parameters:
  • fp (file-like) – A file-like object pointing to a bundle archive zip
  • project_folder (A dataikuapi.dss.projectfolder.DSSProjectFolder) – the project folder in which the project will be created or None for root project folder
prepare_project_import(f)

Prepares import of a project archive. Warning: this method can only be used on a design node.

Parameters:f (file-like) – the input stream, as a file-like object
Returns:a TemporaryImportHandle to interact with the prepared import
get_apideployer()

Gets a handle to work with the API Deployer

Return type:DSSAPIDeployer
catalog_index_connections(connection_names=None, all_connections=False, indexing_mode='FULL')

Triggers an indexing of multiple connections in the data catalog

Parameters:
  • connection_names (list) – list of connections to index, ignored if all_connections=True (defaults to [])
  • all_connections (bool) – index all connections (defaults to False)
get_scoring_libs_stream()

Get the scoring libraries jar required for scoring with model jars that don’t include libraries. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Returns:a jar file, as a stream
Return type:file-like
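
Since leaving the stream open makes the client unusable, it helps to close it in a finally clause. The sketch below saves the jar to disk that way; the helper name download_scoring_libs is hypothetical, and it assumes the returned object supports read() and close(), as the file-like return type suggests.

```python
import shutil


def download_scoring_libs(client, target_path):
    """Save the scoring libraries jar to `target_path` and close the stream."""
    stream = client.get_scoring_libs_stream()
    try:
        with open(target_path, "wb") as out:
            # Copy in chunks so the whole jar is never held in memory.
            shutil.copyfileobj(stream, out)
    finally:
        # Always close the stream, even if writing the file failed.
        stream.close()
    return target_path
```
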
get_auth_info(with_secrets=False)

Returns various information about the user currently authenticated using this instance of the API client.

This method returns a dict that may contain the following keys (may also contain others):

  • authIdentifier: login for a user, id for an API key
  • groups: list of group names (if context is a user)
  • secrets: list of dicts containing user secrets (if context is a user)
Parameters:with_secrets (boolean) – whether to return user secrets
Returns:a dict
Return type:dict
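
Since the returned dict may or may not contain each key (groups, for instance, is only present for users), reading it defensively is safer. The sketch below does so; the helper name describe_identity and the shape of its return value are hypothetical.

```python
def describe_identity(client):
    """Summarize who the client is authenticated as.

    Uses dict.get() because 'groups' is only present when the
    authentication context is a user.
    """
    info = client.get_auth_info()
    return {
        "identifier": info.get("authIdentifier"),
        "groups": list(info.get("groups", [])),
    }
```
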
get_auth_info_from_browser_headers(headers_dict, with_secrets=False)

Returns various information about the DSS user authenticated by the dictionary of HTTP headers provided in headers_dict.

This is generally only used in webapp backends

This method returns a dict that may contain the following keys (may also contain others):

  • authIdentifier: login for a user, id for an API key
  • groups: list of group names (if context is a user)
  • secrets: list of dicts containing user secrets (if context is a user)
Parameters:
  • headers_dict (dict) – Dictionary of HTTP headers
  • with_secrets (boolean) – whether to return user secrets
Returns:a dict
Return type:dict
get_ticket_from_browser_headers(headers_dict)

Returns a ticket for the DSS user authenticated by the dictionary of HTTP headers provided in headers_dict.

This is only used in webapp backends

This method returns a ticket to use as an X-DKU-APITicket header

Parameters:headers_dict (dict) – Dictionary of HTTP headers
Returns:a string
Return type:string
create_personal_api_key(label)

Creates a personal API key corresponding to the user doing the request. This can be called if the DSSClient was initialized with an internal ticket or with a personal API key

Parameters:label (str) – Label for the new API key
Returns:a dict of the new API key, containing at least “secret”, i.e. the actual secret API key
Return type:dict
push_base_images()

Push base images for Kubernetes container-execution and Spark-on-Kubernetes

apply_kubernetes_namespaces_policies()

Apply Kubernetes namespaces policies defined in the general settings

get_licensing_status()

Returns a dictionary with information about licensing status of this DSS instance

Return type:dict
get_object_discussions(project_key, object_type, object_id)

Get a handle to manage discussions on any object

Parameters:
  • project_key (str) – identifier of the project to access
  • object_type (str) – DSS object type
  • object_id (str) – DSS object ID
Returns:

the handle to manage discussions

Return type:

dataikuapi.discussion.DSSObjectDiscussions

class dataikuapi.dssclient.TemporaryImportHandle(client, import_id)
execute(settings=None)

Executes the import with provided settings.

Parameters:settings (dict) –

Dict of import settings (defaults to {}). The following settings are available:

  • targetProjectKey (string): Key to import under. Defaults to the original project key
  • remapping (dict): Dictionary of connection and code env remapping settings.
    See example of remapping dict:
    "remapping" : {
      "connections": [
        { "source": "src_conn1", "target": "target_conn1" },
        { "source": "src_conn2", "target": "target_conn2" }
      ],
      "codeEnvs" : [
        { "source": "src_codeenv1", "target": "target_codeenv1" },
        { "source": "src_codeenv2", "target": "target_codeenv2" }
      ]
    }
    

Warning: you must check the ‘success’ flag
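
The import flow can be sketched end to end as follows: prepare the import from an archive, execute it with settings, then check the ‘success’ flag. The helper name import_project_archive and its parameters are hypothetical, and the sketch assumes that execute() returns a dict carrying the ‘success’ flag, which this page does not spell out.

```python
def import_project_archive(client, archive_path, target_project_key=None,
                           connection_remapping=None):
    """Import a project archive on a design node and check the outcome.

    `connection_remapping` is a {source: target} dict of connection names.
    """
    settings = {}
    if target_project_key is not None:
        settings["targetProjectKey"] = target_project_key
    if connection_remapping:
        settings["remapping"] = {
            "connections": [
                {"source": source, "target": target}
                for source, target in connection_remapping.items()
            ]
        }
    with open(archive_path, "rb") as f:
        handle = client.prepare_project_import(f)
        result = handle.execute(settings)
    # Assumed result shape: a dict with a 'success' flag (see lead-in).
    if not result.get("success"):
        raise RuntimeError("project import failed: %r" % (result,))
    return result
```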