Exposing a visual prediction model

The primary function of the DSS API node is to expose as a service a prediction model trained using the DSS visual machine learning component.

The steps to expose a prediction model are:

  • Train the model in Analysis
  • Deploy the model to Flow
  • Create a new API service
  • Create a prediction endpoint using the saved model
  • Create a package of your service
  • Deploy and activate the package on the API node

This section assumes that you have already installed and started a DSS API node instance. If that is not yet the case, please see Installing the API node.

Creating the model

The first step is to create a model and deploy it to the Flow. This is done using the regular Machine Learning component of DSS. Please refer to Tutorial 103 of DSS and to Machine learning for more information.

Creating the prediction endpoint

To create the prediction endpoint, start by creating a service (see Your first API service for more information). Then create an endpoint of type “Prediction”.

You need to select the model that this endpoint will use. This must be a saved model (i.e., a model that has been deployed to the Flow). You also need to give an identifier to your endpoint. The endpoint ID will be part of the URL to which your clients connect.
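
For example, for a service with ID <SERVICE_ID> and an endpoint with ID <ENDPOINT_ID>, a client request could look like the sketch below. The host, port, and feature names are placeholders, and the exact URL pattern and payload format are the ones documented in the API node user API.

# Hypothetical example: send a prediction request to the endpoint
# (feature names and values are purely illustrative)
curl -X POST \
    http://<API_NODE_HOST>:<API_NODE_PORT>/public/api/v1/<SERVICE_ID>/<ENDPOINT_ID>/predict \
    --header "Content-Type: application/json" \
    --data '{"features": {"age": 42, "country": "FR"}}'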

If your model is Java-compatible (see Machine learning training engines), you may select “Java scoring”. The deployed model will then use Java to score new records, which greatly improves the performance and throughput of your endpoint.

Testing your endpoint

To ease the process of testing your model and its enrichments, a “Development server” is integrated into the DSS UI.

To test your endpoint, click the “Deploy to Dev Server” button. The dev server starts and loads your model. You are redirected to the Test tab, where you can check whether your model loaded successfully.

You can then define Test queries, i.e. JSON objects akin to the ones that you would pass to the API node user API. When you click the “Play test queries” button, the test queries are sent to the dev server and the results are displayed.
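
As an illustration, a test query for a model whose features include hypothetical columns age and country could look like the following. The feature names are placeholders; the payload structure is the one expected by the API node user API.

{
    "features" : {
        "age" : 42,
        "country" : "FR"
    }
}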

Server-side tuning

It is possible to tune the behavior of prediction endpoints on the API node side.

For the prediction endpoint, you can tune how many concurrent requests your API node can handle. This depends mainly on your model (its speed and in-memory size) and the available resources on the server running the API node.

You can configure the parallelism parameters for an endpoint by creating a JSON file in the config/services folder in the API node’s data directory.

mkdir -p config/services/<SERVICE_ID>

Then create or edit the config/services/<SERVICE_ID>/<ENDPOINT_ID>.json file.

This file must have the following structure and be valid JSON:

{
    "pool" : {
        "floor" : 1,
        "ceil" : 8,
        "cruise" : 2,
        "queue" : 16,
        "timeout" : 10000
    }
}

This configuration allows you to control the number of allocated pipelines. One allocated pipeline means one model loaded in memory that can handle a prediction request. If you have 2 allocated pipelines, 2 requests can be handled simultaneously; further requests are queued until one of the pipelines is freed (or the request times out). When the queue is full, additional requests are rejected.

These parameters are all positive integers:

  • floor (default: 1): Minimum number of pipelines. These are allocated as soon as the endpoint is loaded.
  • ceil (default: 8): Maximum number of pipelines allocated at any given time; additional requests are queued. ceil must be ≥ floor.
  • cruise (default: 2): The “nominal” number of allocated pipelines. When more requests come in, more pipelines may be allocated, up to ceil. Once all pending requests have been completed, the number of pipelines may go back down to cruise. cruise must satisfy floor ≤ cruise ≤ ceil.
  • queue (default: 16): The number of requests that will be queued when ceil pipelines are already allocated and busy. The queue is fair: the first received request is handled first.
  • timeout (default: 10000): Time, in milliseconds, that a request may spend in the queue waiting for a free pipeline before being rejected.
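
As a purely illustrative sizing example, the following configuration keeps a single pipeline warm, allows bursts of up to 4 concurrent predictions, queues up to 8 further requests, and rejects a request that has waited more than 5 seconds. Actual values should be chosen according to your model and the resources of the server running the API node.

{
    "pool" : {
        "floor" : 1,
        "ceil" : 4,
        "cruise" : 1,
        "queue" : 8,
        "timeout" : 5000
    }
}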

You can also deploy your service on multiple servers; see High availability and scalability.