Exposing a visual prediction model

The primary function of the DSS API Deployer and API Node is to expose as an API a prediction model trained using the DSS visual machine learning component.

The steps to expose a prediction model are:

  • Train the model in Analysis
  • Deploy the model to Flow
  • Create a new API service
  • Create a prediction endpoint using the saved model
  • Either:
    • Create a package of your API, deploy and activate the package on API nodes
    • Publish your service to the API Deployer, and use API Deployer to deploy your API

This section assumes that you already have a working API node and/or API Deployer setup. Please see Installing an API node and Installing the API Deployer if that’s not yet the case.

Creating the model

The first step is to create a model and deploy it to the Flow, using the regular Machine Learning component of DSS. Please refer to Tutorial 103 of DSS and to Machine learning for more information.

Create the API

There are two ways to create your API Service:

Create the API directly from the Flow

  • In the Flow, select your model, and click “Create an API”
  • Give an identifier to your API Service. This identifier will appear in the URL used to query the API
  • Within this API Service, give an identifier to the endpoint. A service can contain multiple endpoints (to manage several models at once, or perform different functions)

The URL to query the API will be like /public/api/v1/<service_id>/<endpoint_id>/predict.

Once you validate, you are taken to the newly created API Service in the API Designer component.

Create the API service then the endpoint

  • Go to the API Designer and create a new service
  • Give an identifier to your API Service. This identifier will appear in the URL used to query the API
  • Create a new endpoint of type “Prediction”. Give an identifier to the endpoint. A service can contain multiple endpoints (to manage several models at once, or perform different functions)
  • Select the model to use for this endpoint. This must be a saved model (ie. a model which has been deployed to the Flow).

The URL to query the API will be like /public/api/v1/<service_id>/<endpoint_id>/predict.

Once you validate, you are taken to the newly created API Service in the API Designer component.
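As an illustration of the URL pattern given above, here is a minimal Python sketch that assembles the prediction URL from a service identifier and an endpoint identifier. The base URL (host and port), the identifiers and the function name predict_url are hypothetical placeholders; only the /public/api/v1/<service_id>/<endpoint_id>/predict path comes from this page.

```python
from urllib.parse import quote


def predict_url(base_url, service_id, endpoint_id):
    """Assemble the prediction URL for a given service and endpoint.

    The path layout follows the pattern documented above; base_url is
    whatever host and port your API node listens on (an assumption here).
    """
    return "{}/public/api/v1/{}/{}/predict".format(
        base_url.rstrip("/"),
        quote(service_id, safe=""),   # escape characters that are unsafe in a path
        quote(endpoint_id, safe=""),
    )


# Hypothetical values, for illustration only.
print(predict_url("http://localhost:12000", "my_service", "my_endpoint"))
```

Building the URL in one place keeps client code consistent when a service contains several endpoints.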

Testing your endpoint

It is good practice to add a few test queries to check that your endpoint is working as expected.

  • Go to test
  • Select add test queries. You can select a “test” dataset to automatically create test queries from the rows of this dataset
  • Click on “Run test queries”
  • You should see the prediction associated with each test query

Test queries are JSON objects akin to the ones that you would pass to the API node user API. When you click on the “Run test queries” button, the test queries are sent to the dev server, and the results are displayed.

Each test query should look like

{
        "features" : {
                "feature1" : "value1",
                "feature2" : 42
        }
}
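The sketch below shows one way to produce queries in this shape from rows of data and to wrap one of them in an HTTP request. It assumes the endpoint accepts the query as a JSON body in a POST request; the URL, the sample rows and the helper names rows_to_test_queries and build_predict_request are hypothetical. The request is only built, not sent, so the code runs without an API node.

```python
import json
import urllib.request


def rows_to_test_queries(rows):
    """Wrap each row (a dict of feature name -> value) in a "features" object."""
    return [{"features": dict(row)} for row in rows]


def build_predict_request(url, query):
    """Build (but do not send) a JSON POST request carrying one query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical rows, mirroring the example query above.
rows = [
    {"feature1": "value1", "feature2": 42},
    {"feature1": "value2", "feature2": 7},
]
queries = rows_to_test_queries(rows)
print(json.dumps(queries[0], indent=4))

# Hypothetical host, port and identifiers.
request = build_predict_request(
    "http://localhost:12000/public/api/v1/my_service/my_endpoint/predict",
    queries[0],
)
# To send it against a running API node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```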

Deploying your service

Please see:

Optimized scoring

If your model is Java-compatible (see Scoring engines), you may select “Java scoring”. The deployed model then uses Java to score new records, which can substantially improve the performance and throughput of your endpoint.

Performance tuning

Whether you use the API Node directly or through the API Deployer, several performance tuning settings can be used to increase the maximum throughput of the API node.

It is possible to tune the behavior of prediction endpoints on the API node side.

For the prediction endpoint, you can tune how many concurrent requests your API node can handle. This depends mainly on your model (its speed and in-memory size) and the available resources on the server(s) running the API node.

This configuration allows you to control the number of allocated pipelines. One allocated pipeline means one model loaded in memory that can handle a prediction request. If you have 2 allocated pipelines, 2 requests can be handled simultaneously; other requests are queued until one of the pipelines is freed (or the request times out). When the queue is full, additional requests are rejected.
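The behavior described above can be pictured with a small Python sketch. This is a simplified snapshot model, not the API node's actual implementation: it ignores timeouts and the dynamic allocation of pipelines, and simply classifies a burst of simultaneous requests as handled, queued or rejected. The function name admit and its parameters are hypothetical.

```python
def admit(concurrent_requests, pipelines, queue_length):
    """Classify a burst of simultaneous requests.

    Each allocated pipeline handles one request; the overflow waits in the
    queue; anything beyond the queue capacity is rejected.
    """
    handled = min(concurrent_requests, pipelines)
    queued = min(concurrent_requests - handled, queue_length)
    rejected = concurrent_requests - handled - queued
    return {"handled": handled, "queued": queued, "rejected": rejected}


# With 2 pipelines and room for 2 queued requests, a burst of 5 requests
# leaves one request rejected.
print(admit(5, pipelines=2, queue_length=2))
```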

Without API Deployer

You can configure the parallelism parameters for the endpoint by creating a JSON file in the config/services folder in the API node’s data directory.

mkdir -p config/services/<SERVICE_ID>

Then create or edit the config/services/<SERVICE_ID>/<ENDPOINT_ID>.json file.

This file must have the following structure and be valid JSON:

{
    "pool" : {
        "floor" : 1,
        "ceil" : 8,
        "cruise": 2,
        "queue" : 16,
        "timeout" : 10000
    }
}

Those parameters are all positive integers:

  • floor (default: 1): Minimum number of pipelines. Those are allocated as soon as the endpoint is loaded.
  • ceil (default: 8): Maximum number of allocated pipelines at any given time. Additional requests will be queued. ceil ≥ floor
  • cruise (default: 2): The “nominal” number of allocated pipelines. When more requests come in, more pipelines may be allocated up to ceil. But when all pending requests have been completed, the number of pipelines may go down to cruise. floor ≤ cruise ≤ ceil
  • queue (default: 16): The number of requests that will be queued when ceil pipelines are already allocated and busy. The queue is fair: first received request will be handled first.
  • timeout (default: 10000): Time, in milliseconds, that a request may spend in queue waiting for a free pipeline before being rejected.

Creating a new pipeline is an expensive operation, so you should set cruise close to the expected maximal nominal query load.
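If you prefer to generate this file with a script, the following Python sketch writes it with the structure and defaults listed above. It assumes the ordering floor ≤ cruise ≤ ceil that the parameter descriptions imply, and it checks that every value is a positive integer. The helper names validate_pool and write_pool_config, the identifiers and the temporary directory standing in for the API node's data directory are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Defaults as documented in the parameter list above.
DEFAULT_POOL = {"floor": 1, "ceil": 8, "cruise": 2, "queue": 16, "timeout": 10000}


def validate_pool(pool):
    """Reject unknown keys, non-positive values and inconsistent bounds."""
    unknown = set(pool) - set(DEFAULT_POOL)
    if unknown:
        raise ValueError(f"unknown pool parameters: {sorted(unknown)}")
    for name, value in pool.items():
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if not pool["floor"] <= pool["cruise"] <= pool["ceil"]:
        raise ValueError("expected floor <= cruise <= ceil")


def write_pool_config(data_dir, service_id, endpoint_id, **overrides):
    """Write config/services/<SERVICE_ID>/<ENDPOINT_ID>.json under data_dir."""
    pool = {**DEFAULT_POOL, **overrides}
    validate_pool(pool)
    service_dir = Path(data_dir) / "config" / "services" / service_id
    service_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    path = service_dir / f"{endpoint_id}.json"
    path.write_text(json.dumps({"pool": pool}, indent=4) + "\n", encoding="utf-8")
    return path


# Demonstration in a temporary directory rather than a real data directory.
with tempfile.TemporaryDirectory() as data_dir:
    written = write_pool_config(data_dir, "my_service", "my_endpoint", cruise=4)
    print(written.read_text(encoding="utf-8"))
```

Validating before writing avoids leaving an inconsistent file in place, for example a cruise value larger than ceil.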

With API Deployer

You can configure the parallelism parameters for the endpoint in the Deployment settings, in the “Endpoints tuning” setting.

  • Go to the Deployment Settings > Endpoints tuning
  • Add a tuning block for your endpoint by entering your endpoint id and clicking Add
  • Configure the parameters

Those parameters are all positive integers:

  • Pooling min pipelines (default: 1): Minimum number of pipelines. Those are allocated as soon as the endpoint is loaded.
  • Pooling max pipelines (default: 8): Maximum number of allocated pipelines at any given time. Additional requests will be queued. max pipelines ≥ min pipelines
  • Pooling cruise pipelines (default: 2): The “nominal” number of allocated pipelines. When more requests come in, more pipelines may be allocated up to max pipelines. But when all pending requests have been completed, the number of pipelines may go down to cruise pipelines. min pipelines ≤ cruise pipelines ≤ max pipelines
  • Pooling queue length (default: 16): The number of requests that will be queued when the maximum number of pipelines is already allocated and busy. The queue is fair: first received request will be handled first.
  • Queue timeout (default: 10000): Time, in milliseconds, that a request may spend in queue waiting for a free pipeline before being rejected.

Creating a new pipeline is an expensive operation, so you should set cruise pipelines close to the expected maximal nominal query load.
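One rough way to translate an expected query load into a number of pipelines is Little's law, a general queueing result: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. This rule of thumb is an assumption brought in for illustration, not a sizing method prescribed by DSS, and the function name suggest_cruise and its parameters are hypothetical. Measure your own model's latency before relying on the result.

```python
import math


def suggest_cruise(requests_per_second, latency_ms, min_pipelines=1, max_pipelines=8):
    """Estimate a cruise value from the load, clamped to the min/max bounds.

    requests_per_second: expected maximal nominal arrival rate.
    latency_ms: average time one prediction takes, in milliseconds.
    """
    in_flight = requests_per_second * latency_ms / 1000  # Little's law
    return max(min_pipelines, min(max_pipelines, math.ceil(in_flight)))


# 40 requests per second at 50 ms each means about 2 requests in flight.
print(suggest_cruise(40, 50))  # prints 2
```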