Custom prediction models

In addition to standard models trained using the DSS Machine Learning component, the API node can also expose custom models written by the user.

To write a “custom prediction” endpoint in an API node service, you write a Python class that implements a predict method.

The custom model can optionally use a DSS managed folder. The code is written in DSS.

Creating the custom endpoint

To create a custom prediction endpoint, start by creating a service (see Exposing a prediction model for more information), then create an endpoint of type “Custom prediction”.

You will need to indicate whether you want to create a Regression (predicting a continuous value) or a Classification (predicting a discrete value) model.

DSS prefills the Code part with sample code depending on the selected model type.

Using a managed folder

A custom model can optionally (and, most of the time, will) use a DSS managed folder. When you package your service, the contents of the folder are bundled with the package, and your custom code receives the path to the managed folder content.

A typical usage is when you have a custom train recipe that dumps the serialized model into a folder. Your custom prediction code then uses this managed folder.
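
For example, a custom Python recipe could serialize a scikit-learn model into the managed folder (the dataset, folder and file names below are purely illustrative):

    import os
    import pickle

    import dataiku
    from sklearn.linear_model import LogisticRegression

    # Train a model on a dataset of the Flow (hypothetical dataset name)
    train_df = dataiku.Dataset("training_data").get_dataframe()
    model = LogisticRegression()
    model.fit(train_df.drop(columns=["target"]), train_df["target"])

    # Dump the serialized model into the managed folder (hypothetical names)
    folder_path = dataiku.Folder("model_folder").get_path()
    with open(os.path.join(folder_path, "model.pkl"), "wb") as f:
        pickle.dump(model, f)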


Structure of the code

To create a custom model, you need to write a single Python class. This class must extend either dataiku.apinode.predict.predictor.ClassificationPredictor or dataiku.apinode.predict.predictor.RegressionPredictor. The name of the class does not matter: DSS will automatically find your class.

The constructor of the class receives the path to the managed folder, if any.
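
For example, a minimal skeleton could look as follows (the class name is arbitrary, and the constructor argument name is illustrative; the sample code prefilled by DSS shows the exact signature):

    from dataiku.apinode.predict.predictor import RegressionPredictor

    class MyPredictor(RegressionPredictor):
        def __init__(self, data_folder=None):
            # data_folder is the local path to the managed folder contents,
            # or None if the endpoint does not use a managed folder
            self.data_folder = data_folder

        def predict(self, features_df):
            # See the Regression and Classification sections below
            raise NotImplementedError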

Regression

A regression predictor must implement a single method: def predict(self, features_df)

This method receives a Pandas DataFrame of the input features. It must return one of the following forms:

  • prediction_series
  • (prediction_series, custom_keys_list)

Answer details:

  • prediction_series (mandatory): a Pandas Series of the predicted values. The output series must have the same number of rows as the input dataframe. If the model does not predict a row, it can leave numpy.nan in the output series.
  • custom_keys_list (optional, may be None): a Python list with one dictionary per input row. Each dict contains the customKeys that will be sent in the output (freely usable).

The predict method must be able to predict multiple rows.
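
For example, a regression predictor wrapping a scikit-learn model serialized in the managed folder (the model type and file name are assumptions) could look like this:

    import os
    import pickle

    import pandas as pd
    from dataiku.apinode.predict.predictor import RegressionPredictor

    class MyRegressionPredictor(RegressionPredictor):
        def __init__(self, data_folder=None):
            # Load the model that the custom training recipe dumped into
            # the managed folder (hypothetical file name)
            with open(os.path.join(data_folder, "model.pkl"), "rb") as f:
                self.model = pickle.load(f)

        def predict(self, features_df):
            # Predict all input rows at once and return a Pandas Series
            # aligned with the input dataframe
            prediction_series = pd.Series(self.model.predict(features_df),
                                          index=features_df.index)
            return prediction_series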

Classification

A classification predictor must implement a single method: def predict(self, features_df)

This method receives a Pandas DataFrame of the input features.

It must return one of the following forms:

  • prediction_series
  • (prediction_series, probas_df)
  • (prediction_series, probas_df, custom_keys_list)

Answer details:

  • prediction_series (mandatory): a Pandas Series of the predicted values. The output series must have the same number of rows as the input dataframe. If the model does not predict a row, it can leave None in the output series.
  • probas_df (optional, may be None): a Pandas DataFrame of the predicted probabilities. It must have one column per class and the same number of rows as the input dataframe. If the model does not predict a row, it must leave numpy.nan in the probabilities dataframe.
  • custom_keys_list (optional, may be None): a Python list with one dictionary per input row. Each dict contains the customKeys that will be sent in the output (freely usable).

The predict method must be able to predict multiple rows.
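
Similarly, a classification predictor returning both the predicted class and the per-class probabilities could be sketched as follows (same assumptions as the regression example above):

    import os
    import pickle

    import pandas as pd
    from dataiku.apinode.predict.predictor import ClassificationPredictor

    class MyClassificationPredictor(ClassificationPredictor):
        def __init__(self, data_folder=None):
            # Load a serialized scikit-learn classifier (hypothetical file name)
            with open(os.path.join(data_folder, "model.pkl"), "rb") as f:
                self.model = pickle.load(f)

        def predict(self, features_df):
            # Predicted class for each input row
            prediction_series = pd.Series(self.model.predict(features_df),
                                          index=features_df.index)
            # One probability column per class, one row per input row
            probas_df = pd.DataFrame(self.model.predict_proba(features_df),
                                     columns=self.model.classes_,
                                     index=features_df.index)
            return (prediction_series, probas_df)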

Testing your code

Developing a custom model implies frequent testing. To ease this process, a “Development server” is integrated into the DSS UI.

To test your code, click on the “Deploy to Dev Server” button. The dev server starts and loads your model. You are redirected to the Test tab, where you can check whether your model loaded successfully.

You can then define Test queries, i.e. JSON objects akin to the ones that you would pass to the API node user API. When you click on the “Play test queries” button, the test queries are sent to the dev server, and the result is printed.
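
For example, for a model that takes two features named age and price (feature names are purely illustrative), a test query could look like:

    {
        "features": {
            "age": 42,
            "price": 139.9
        }
    }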

Using external libraries

If you use external libraries (by installing them in the DSS virtual env - see The Python environment), they are not automatically installed in the API Node virtual env. Installing external packages in the API Node virtual env prior to deploying the package is the responsibility of the API node administrator.

Note that this means that:

  • Two endpoints in the same service cannot use incompatible third-party libraries, or incompatible versions of the same third-party library
  • If you need to have two services with incompatible libraries, you should deploy them on separate API node instances

Note that, while the dataiku.* libraries are accessible, most of the APIs that you use in Python recipes will not work: the code is not running within the DSS Design node, so datasets cannot be read through this API. If you need to enrich your features with data from your datasets, see Enriching queries in real-time. If you need to access a managed folder, see Using a managed folder above.

Using your own libraries

You will sometimes need to write custom library functions (for example, shared between your custom training recipe and your custom model).

You can place your custom Python files in the lib/python folder of the DSS installation. Both recipes and custom models can import modules defined there.
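
For example, a helper module placed in lib/python (module and function names below are hypothetical) can be imported identically from a recipe and from the custom prediction code:

    # lib/python/my_shared_lib.py (hypothetical module)
    def clean_features(features_df):
        # Preprocessing shared between the custom training recipe
        # and the custom prediction code
        return features_df.fillna(0)

In the custom predictor (or in a recipe), from my_shared_lib import clean_features then works as usual.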

When you package a service, the whole content of the lib/python folder is bundled in the package. Note that this means that it is possible to have several generations of the service running at the same time, using different versions of the custom code from lib/python.