API node user API

Predictions are obtained on the API node by using the User REST API.

The REST API

Request and response formats

For POST and PUT requests, the request body must be JSON, with the Content-Type header set to application/json.

For almost all requests, the response will be JSON.

Whether a request succeeded is indicated by the HTTP status code. A 2xx status code indicates success, whereas a 4xx or 5xx status code indicates failure. When a request fails, the response body is still JSON and contains additional information about the error.
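As a rough illustration of these conventions, the sketch below builds a JSON POST request with the Python standard library and treats a 4xx or 5xx status as a failure whose body is still parsed as JSON. The host, port, URL path, and the "features" payload are placeholders assumed for the example; the methods reference gives the actual routes.

```python
import json
import urllib.error
import urllib.request


def build_json_post(url, payload):
    """Build a POST request whose body is JSON, with the required header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return (succeeded, status_code, parsed_body)."""
    try:
        with urllib.request.urlopen(request) as response:
            return True, response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses land here; the body is still JSON
        # and carries details about the error.
        return False, err.code, json.loads(err.read().decode("utf-8"))


# Hypothetical URL and payload, for illustration only (nothing is sent here).
request = build_json_post(
    "http://localhost:12000/public/api/v1/my_service/my_endpoint/predict",
    {"features": {"age": 42}},
)
```

Calling send(request) against a running API node would return the parsed answer on success, or the parsed error details on failure.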

Authentication

Each service declares whether it uses authentication or not. If the service requires authentication, the valid API keys are defined in DSS.

The API key must be sent using HTTP Basic Authentication:

  • Use the API key as username

  • The password can remain blank

API keys are defined on the DSS side rather than on the API node itself. This ensures that every instance of an API node accepts the same set of client keys.
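For clients that do not use an HTTP library with built-in Basic Authentication support, the header can be assembled by hand. This is a minimal sketch; the key value is a placeholder and the helper name is made up for the example.

```python
import base64


def basic_auth_header(api_key):
    """Return the Authorization header for an API key used as the username."""
    # HTTP Basic credentials are "username:password"; the password is left blank.
    credentials = f"{api_key}:".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("my_api_key")  # placeholder key, not a real one
```

The resulting dict can be merged into the headers of any request sent to a service that requires authentication.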

Methods reference

The reference documentation of the API is available at https://doc.dataiku.com/dss/api/12/apinode-user

API Python client

Dataiku provides a Python client for the API Node user API. The client makes it easy to write client programs for the API in Python.

Installing

  • The API client is preinstalled in the DSS virtualenv

  • From outside of DSS, you can install the Python client by running pip install dataiku-api-client

Reference API doc

class dataikuapi.APINodeClient(uri, service_id, api_key=None, bearer_token=None)

Entry point for the DSS API Node client. This is an API client for the user-facing API of the DSS API Node server.
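A client is typically created once per service and reused for all calls. The sketch below wraps the constructor shown above; the URI, service identifier, and API key in the comment are placeholders, and the make_client helper is hypothetical.

```python
def make_client(uri, service_id, api_key=None):
    """Create an APINodeClient for one service on one API node."""
    # Imported lazily so this module still loads where the
    # dataiku-api-client package is not installed.
    import dataikuapi

    return dataikuapi.APINodeClient(uri, service_id, api_key=api_key)


# Example call (placeholder values):
# client = make_client("http://localhost:12000", "my_service", api_key="my_api_key")
```

The api_key argument can be left as None for services that do not use authentication.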

predict_record(endpoint_id, features, forced_generation=None, dispatch_key=None, context=None, with_explanations=None, explanation_method=None, n_explanations=None, n_explanations_mc_steps=None)

Predicts a single record on a DSS API node endpoint (standard or custom prediction)

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • features – Python dictionary of features of the record

  • forced_generation – See documentation about multi-version prediction

  • dispatch_key – See documentation about multi-version prediction

  • context – Optional, Python dictionary of additional context information. The context information is logged, but not directly used.

  • with_explanations – Optional, whether individual explanations should be computed for each record. The prediction endpoint must be compatible. If None, will use the value configured in the endpoint.

  • explanation_method – Optional, method to compute explanations. Valid values are ‘SHAPLEY’ or ‘ICE’. If None, will use the value configured in the endpoint.

  • n_explanations – Optional, number of explanations to output per prediction. If None, will use the value configured in the endpoint.

  • n_explanations_mc_steps – Optional, precision parameter for the SHAPLEY method, between 25 and 1000. Higher values are more precise but slower. If None, will use the value configured in the endpoint.

Returns:

a Python dict of the API answer. The answer contains a “result” key (itself a dict)
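A call to predict_record might look like the following sketch, which requests Shapley-based explanations and returns the "result" dict. Here client is assumed to be an APINodeClient; the helper name and the default of three explanations are choices made for the example, and the keys inside "result" depend on the endpoint.

```python
def predict_with_explanations(client, endpoint_id, features, n_explanations=3):
    """Score one record and ask for Shapley-based explanations."""
    answer = client.predict_record(
        endpoint_id,
        features,
        with_explanations=True,
        explanation_method="SHAPLEY",
        n_explanations=n_explanations,
    )
    # The answer is a dict; the prediction itself lives under "result".
    return answer["result"]
```

Leaving the explanation arguments at None instead would defer to whatever is configured in the endpoint.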

predict_records(endpoint_id, records, forced_generation=None, dispatch_key=None, with_explanations=None, explanation_method=None, n_explanations=None, n_explanations_mc_steps=None)

Predicts a batch of records on a DSS API node endpoint (standard or custom prediction)

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • records – Python list of records. Each record must be a Python dict. Each record must contain a “features” dict (see predict_record) and optionally a “context” dict.

  • forced_generation – See documentation about multi-version prediction

  • dispatch_key – See documentation about multi-version prediction

  • with_explanations – Optional, whether individual explanations should be computed for each record. The prediction endpoint must be compatible. If None, will use the value configured in the endpoint.

  • explanation_method – Optional, method to compute explanations. Valid values are ‘SHAPLEY’ or ‘ICE’. If None, will use the value configured in the endpoint.

  • n_explanations – Optional, number of explanations to output per prediction. If None, will use the value configured in the endpoint.

  • n_explanations_mc_steps – Optional, precision parameter for the SHAPLEY method, between 25 and 1000. Higher values are more precise but slower. If None, will use the value configured in the endpoint.

Returns:

a Python dict of the API answer. The answer contains a “results” key (which is an array of result objects)
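The record structure described above can be built from plain feature dicts as in this sketch. The helper names are hypothetical, and client is assumed to be an APINodeClient.

```python
def to_records(feature_dicts, context=None):
    """Wrap plain feature dicts in the structure expected by predict_records."""
    records = []
    for features in feature_dicts:
        record = {"features": features}
        if context is not None:
            record["context"] = context  # optional, per the parameter description
        records.append(record)
    return records


def predict_batch(client, endpoint_id, feature_dicts):
    """Score several records in one call and return the list of result objects."""
    answer = client.predict_records(endpoint_id, to_records(feature_dicts))
    return answer["results"]
```

Sending one batch rather than many single-record calls saves one HTTP round trip per record.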

forecast(endpoint_id, records, forced_generation=None, dispatch_key=None)

Forecast using a time series forecasting model on a DSS API node endpoint

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • records (array) –

    List of time series data records to be used as an input for the time series forecasting model. Each record should be a dict where keys are feature names, and values feature values.

    Example:

    records = [
        {'date': '2015-01-04T00:00:00.000Z', 'timeseries_id': 'A', 'target': 10.0},
        {'date': '2015-01-04T00:00:00.000Z', 'timeseries_id': 'B', 'target': 4.5},
        {'date': '2015-01-05T00:00:00.000Z', 'timeseries_id': 'A', 'target': 2.0},
        # ...
        {'date': '2015-03-20T00:00:00.000Z', 'timeseries_id': 'B', 'target': 1.3},
    ]
    

  • forced_generation – See documentation about multi-version prediction

  • dispatch_key – See documentation about multi-version prediction

Returns:

a Python dict of the API answer. The answer contains a “results” key (which is an array of result objects, corresponding to the forecast records). Example:

{'results': [
    {'forecast': 12.57, 'ignored': False,
      'quantiles': [0.0001, 0.5, 0.9999],
      'quantilesValues': [3.0, 16.0, 16.0],
      'time': '2015-03-21T00:00:00.000000Z',
      'timeseriesIdentifier': {'timeseries_id': 'A'}},
    {'forecast': 15.57, 'ignored': False,
      'quantiles': [0.0001, 0.5, 0.9999],
      'quantilesValues': [3.0, 18.0, 19.0],
      'time': '2015-03-21T00:00:00.000000Z',
      'timeseriesIdentifier': {'timeseries_id': 'B'}},
  ...],
...}
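Because the results for all time series come back in a single flat list, it can be convenient to regroup them per series. The sketch below does so using only the keys visible in the example answer above; the helper names are hypothetical, and client is assumed to be an APINodeClient.

```python
from collections import defaultdict


def forecasts_by_series(answer):
    """Group (time, forecast) pairs by time series identifier."""
    grouped = defaultdict(list)
    for result in answer["results"]:
        # The identifier is a dict, so turn it into a hashable, ordered key.
        key = tuple(sorted(result["timeseriesIdentifier"].items()))
        grouped[key].append((result["time"], result["forecast"]))
    return dict(grouped)


def forecast_series(client, endpoint_id, records):
    """Call the forecast endpoint and regroup its results per series."""
    return forecasts_by_series(client.forecast(endpoint_id, records))
```

Each key in the returned dict is a tuple of (column, value) pairs, for example (('timeseries_id', 'A'),).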

predict_effect(endpoint_id, features, forced_generation=None, dispatch_key=None)

Predicts the treatment effect of a single record on a DSS API node endpoint (standard causal prediction)

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • features – Python dictionary of features of the record

  • forced_generation – See documentation about multi-version prediction

  • dispatch_key – See documentation about multi-version prediction

Returns:

a Python dict of the API answer. The answer contains a “result” key (itself a dict)

predict_effects(endpoint_id, records, forced_generation=None, dispatch_key=None)

Predicts the treatment effects on a batch of records on a DSS API node endpoint (standard causal prediction)

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • records – Python list of records. Each record must be a Python dict. Each record must contain a “features” dict (see predict_record) and optionally a “context” dict.

  • forced_generation – See documentation about multi-version prediction

  • dispatch_key – See documentation about multi-version prediction

Returns:

a Python dict of the API answer. The answer contains a “results” key (which is an array of result objects)
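The two causal prediction methods mirror predict_record and predict_records. The following sketch shows both side by side; the helper names are hypothetical, client is assumed to be an APINodeClient, and the content of each result depends on the endpoint.

```python
def predict_effect_one(client, endpoint_id, features):
    """Return the "result" dict for the treatment effect of a single record."""
    return client.predict_effect(endpoint_id, features)["result"]


def predict_effect_batch(client, endpoint_id, feature_dicts):
    """Return the list of result objects for a batch of records."""
    # Each record wraps its features in a "features" dict.
    records = [{"features": features} for features in feature_dicts]
    return client.predict_effects(endpoint_id, records)["results"]
```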

sql_query(endpoint_id, parameters)

Queries a “SQL query” endpoint on a DSS API node

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • parameters – Python dictionary of the named parameters for the SQL query endpoint

Returns:

a Python dict of the API answer. The answer is a dict with a columns field and a rows field (list of rows as list of strings)
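Since rows come back as lists of strings, pairing each value with its column name often makes the answer easier to consume. The sketch below is one way to do that. The exact shape of each entry in columns is not specified here, so the code accepts either a plain string or a dict with a "name" key, which is an assumption; the helper names are hypothetical.

```python
def rows_as_dicts(answer):
    """Turn the columns/rows answer of a SQL query endpoint into a list of dicts."""
    # Assumption: a column entry is either its name or a dict holding a "name" key.
    names = [c["name"] if isinstance(c, dict) else c for c in answer["columns"]]
    return [dict(zip(names, row)) for row in answer["rows"]]


def run_sql_endpoint(client, endpoint_id, **parameters):
    """Query a SQL endpoint, passing keyword arguments as named parameters."""
    return rows_as_dicts(client.sql_query(endpoint_id, parameters))
```

Values stay as strings; any conversion to numbers or dates is left to the caller.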

lookup_record(endpoint_id, record, context=None)

Looks up a single record on a DSS API node endpoint of “dataset lookup” type

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • record – Python dictionary of features of the record

  • context – Optional, Python dictionary of additional context information. The context information is logged, but not directly used.

Returns:

a Python dict of the API answer. The answer contains a “data” key (itself a dict)

lookup_records(endpoint_id, records)

Looks up a batch of records on a DSS API node endpoint of “dataset lookup” type

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • records – Python list of records. Each record must be a Python dict, containing at least one entry called “data”: a dict containing the input columns

Returns:

a Python dict of the API answer. The answer contains a “results” key, which is an array of result objects. Each result contains a “data” dict which is the output
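The single and batch lookup methods take differently shaped inputs: lookup_record receives the input columns directly, while each record passed to lookup_records wraps them in a "data" entry. A minimal sketch of both, with hypothetical helper names and client assumed to be an APINodeClient:

```python
def lookup_one(client, endpoint_id, input_columns):
    """Look up one record and return its "data" dict."""
    return client.lookup_record(endpoint_id, input_columns)["data"]


def lookup_many(client, endpoint_id, list_of_input_columns):
    """Look up several records and return one "data" dict per result."""
    # Batch records wrap the input columns in a "data" entry.
    records = [{"data": columns} for columns in list_of_input_columns]
    answer = client.lookup_records(endpoint_id, records)
    return [result["data"] for result in answer["results"]]
```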

run_function(endpoint_id, **kwargs)

Calls a “Run function” endpoint on a DSS API node

Parameters:
  • endpoint_id (str) – Identifier of the endpoint to query

  • kwargs – Arguments of the function

Returns:

The function result
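When the function arguments are already held in a dict, they can be unpacked into keyword arguments, as in this sketch. The helper name is hypothetical and client is assumed to be an APINodeClient.

```python
def call_function_endpoint(client, endpoint_id, arguments):
    """Call a "Run function" endpoint with arguments held in a dict."""
    # Each dict entry is forwarded as one keyword argument of the function.
    return client.run_function(endpoint_id, **arguments)
```

An argument named endpoint_id cannot be passed this way, because it would collide with the method's own first parameter.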