Exporting models

Depending on your license, DSS provides different capabilities for exporting models:

  • Export to a Jupyter notebook that approximates the model training code used in Visual Machine Learning
  • Score a model in real time through an API
  • Score independently from DSS using Java model export
  • Score independently from DSS using PMML model export

Export to Jupyter notebook

Note

This only applies to models trained using the “In-memory (Python)” engine, both for prediction and clustering. Not all algorithms are supported by this feature.

Once a model has been trained, you can export it as a Jupyter notebook.

DSS will automatically generate a Jupyter (Python) notebook with code to reproduce a model similar to the model that you trained.

To generate a Jupyter notebook:

  • Go to the trained model you wish to export
  • Click the dropdown icon next to the “Deploy” button
  • Select “Export to Jupyter notebook”

Warning

This generated notebook is for educational and explanatory purposes only. In particular, this notebook does not reproduce all preprocessing capabilities of DSS, and is only a best-effort approximation of the model trained in DSS.

To use the exact model trained by DSS, deploy it to the Flow and use the API node or scoring recipes.

Export a Java class/JAR for a model

You may not always want to deploy your model as a separate API; for instance, if you are scoring in real time and want to avoid the overhead of an HTTP call.

Provided you have this feature in your license, you can export a JAR with a Java class that can score a model trained in the DSS Visual ML tool.

Warning

The Java export feature is experimental, with best-effort support.

The Java export feature requires a specific DSS license. Please contact your Dataiku Account Manager or Customer Success Manager.

To be exported to Java, the model must be compatible with Local (Optimized) scoring.

  • Go to the trained model you wish to export (either a model trained in the Lab or a version of a saved model deployed in the Flow)
  • Click the Actions button on the top-right corner
  • Select Download as “fat” JAR (standalone) (aka assembly)
  • Indicate the fully qualified class name you want for your model

Add that JAR to the classpath of your Java application.

If you have several models you wish to use on the same JVM, you can instead export the “thin” JAR for each model, which only contains the class and resources for the model, and not the scoring libraries. In that case, you also need to download the scoring libraries (from the same dropdown menu) and add both JARs to the classpath.

Usage

If you specified the name com.mycompany.myproject.MyModel at export time, you can use it like this once you’ve added the JAR to the classpath:

import com.mycompany.myproject.MyModel;
import com.dataiku.scoring.*;

// ...
MyModel model = new MyModel();
Observation.Builder obsBuilder = model.observationBuilder();
Observation obs = obsBuilder
        .with("myCategoricalFeature", "Some string value")
        .with("myNumericFeature", 42.0d)
        // other .with("featureName", <string or double value>)
        .build();
if (obs.hasError()) {
        System.err.println("Can't build observation: " + obs.getErrorMessage());
        // maybe throw here
}

// For a classification model
Try<ClassificationResult> prediction = model.predict(obs);
if (prediction.isError()) {
        System.err.println("Can't make a prediction: " + prediction.getMessage());
        // maybe throw here
} else {
        ClassificationResult result = prediction.get();
        // predictedClass is one of model.getClassLabels()
        String predictedClass = result.getPrediction();
        // probabilities has the same indices as model.getClassLabels()
        // i.e. 0 to (model.getNumClasses() - 1)
        double[] probabilities = result.getProbabilities();
        // Use result here
}

// For a regression model (use this instead of the classification block above)
Try<RegressionResult> prediction = model.predict(obs);
if (prediction.isError()) {
        System.err.println("Can't make a prediction: " + prediction.getMessage());
        // maybe throw here
} else {
        RegressionResult result = prediction.get();
        double predictedValue = result.getPrediction();
        // Use result here
}
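
The comments in the classification block note that the probabilities array is indexed like model.getClassLabels(). As a minimal sketch of how those two parallel arrays can be paired, the helper below builds a label-to-probability map. ProbabilityLabels and byLabel are hypothetical names, not part of com.dataiku.scoring, and the sketch assumes the class labels are available as a String[] (the actual return type of getClassLabels() is not stated here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the com.dataiku.scoring package.
final class ProbabilityLabels {
    // Pairs each class label with the probability at the same index,
    // preserving the order of the labels.
    static Map<String, Double> byLabel(String[] classLabels, double[] probabilities) {
        if (classLabels.length != probabilities.length) {
            throw new IllegalArgumentException(
                    "labels and probabilities must have the same length");
        }
        Map<String, Double> out = new LinkedHashMap<>();
        for (int i = 0; i < classLabels.length; i++) {
            out.put(classLabels[i], probabilities[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for model.getClassLabels() and result.getProbabilities().
        String[] labels = {"died", "survived"};
        double[] probabilities = {0.69, 0.31};
        System.out.println(byLabel(labels, probabilities));
    }
}
```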

You can find the javadoc for the com.dataiku.scoring package here.

If you want to debug your model, you can run the “fat” jar version with -jar:

java -jar /path/to/dataiku-model-my-model-assembly.jar

… or the “thin” jar version, specifying your model class as the main class to run:

java -cp /path/to/dataiku-model-my-model.jar:/path/to/dataiku-scoring-libs_DSS_VERSION.jar \
    com.mycompany.myproject.MyModel

This command reads JSON objects with feature values from standard input (one per line) and writes predictions as JSON objects to standard output (also one per line). For instance, with a classification model trained on the classic Titanic dataset:

$ echo '{"Sex": "male", "Pclass": 3}' >titanic.txt
$ echo '{"Sex": "female", "Pclass": 1}' >>titanic.txt
$ java -jar dataiku-model-survived-on-titanic-assembly.jar <titanic.txt >out.txt
Nov 26, 2018 3:03:39 PM com.dataiku.scoring.pipelines.Normalization <init>
INFO: Normalize columns
Nov 26, 2018 3:03:39 PM com.dataiku.scoring.builders.Build binaryProbabilisticPipeline
INFO: Loaded model:
Nov 26, 2018 3:03:39 PM com.dataiku.scoring.builders.Build binaryProbabilisticPipeline
INFO: [email protected]
Nov 26, 2018 3:03:39 PM com.dataiku.scoring.builders.Build preprocessingPipeline
INFO: Loaded preprocessing pipeline:
Nov 26, 2018 3:03:39 PM com.dataiku.scoring.builders.Build preprocessingPipeline
INFO: PreprocessingPipeline(
        ImputeWithValue(Pclass -> 2.3099579242636747 ; Parch -> 0.364656381486676 ; SibSp -> 0.5105189340813464 ; Age -> 29.78608695652174 ; Fare -> 32.91587110799433 ; )
        Dummifier(Sex in [female, male, ] ; Embarked in [Q, S, C, ])
        Rescaler(Fare (shift, inv_scale)=(32.91587110799433, 0.01887857758669009) ; Age (shift, inv_scale)=(29.78608695652174, 0.07038582694206309) ; Parch (shift, inv_scale)=(0.364656381486676, 1.2618109015803536) ; Pclass (shift, inv_scale)=(2.3099579242636747, 1.2048162082648861) ; SibSp (shift, inv_scale)=(0.5105189340813464, 0.9172989588087348))
)

$ cat out.txt
{"value":{"probabilities":{"died":0.6874695011372898,"survived":0.3125304988627102},"prediction":"died"},"isError":false}
{"value":{"probabilities":{"died":0.062296226501392105,"survived":0.9377037734986079},"prediction":"survived"},"isError":false}
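
If the input lines are produced from Java rather than with echo, each observation has to be serialized as one JSON object on a single line. The sketch below does this for a flat map of string and numeric features using only the standard library. JsonLineWriter is a hypothetical helper, not part of the exported JAR. Null values are skipped, on the assumption that a feature may simply be left out, as in the Titanic example above, which supplies only two features.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the exported JAR.
final class JsonLineWriter {
    // Serializes a flat feature map as a single-line JSON object.
    // Numbers are written as-is, everything else as a JSON string;
    // null values are skipped so the feature is simply absent.
    static String toJsonLine(Map<String, Object> features) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : features.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue;
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':');
            if (v instanceof Number) {
                double d = ((Number) v).doubleValue();
                if (Double.isNaN(d) || Double.isInfinite(d)) {
                    throw new IllegalArgumentException("Not representable in JSON: " + v);
                }
                sb.append(v);
            } else {
                sb.append(quote(v.toString()));
            }
        }
        return sb.append('}').toString();
    }

    // Wraps a string in double quotes and escapes the characters JSON requires,
    // including line breaks, so that the object always stays on one line.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> passenger = new LinkedHashMap<>();
        passenger.put("Sex", "male");
        passenger.put("Pclass", 3);
        System.out.println(toJsonLine(passenger));
    }
}
```

For anything beyond flat maps of strings and numbers, a JSON library is a safer choice than this hand-written serializer.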

com.dataiku.scoring uses java.util.logging for logging. If you wish to forward it to log4j or logback, you can use an SLF4J bridge.
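
If the goal is only to silence the INFO messages shown in the example output, rather than to forward them, the standard java.util.logging API may be enough. The sketch below raises the level of a logger named "com.dataiku.scoring". That logger name is an assumption based on the class names in the log output, and ScoringLogConfig is a hypothetical class. Setting up the SLF4J bridge itself requires an additional library and is not shown.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical configuration class, not part of com.dataiku.scoring.
final class ScoringLogConfig {
    // java.util.logging holds loggers weakly, so keep a strong reference;
    // otherwise the configured level could be lost to garbage collection.
    // The logger name is an assumption based on the package name.
    static final Logger SCORING_LOGGER = Logger.getLogger("com.dataiku.scoring");

    // Lets only WARNING and SEVERE records through for this logger and
    // for child loggers that do not set a level of their own.
    static void quiet() {
        SCORING_LOGGER.setLevel(Level.WARNING);
    }
}
```

Calling ScoringLogConfig.quiet() once at startup, before the model class is instantiated, should be sufficient if the assumption about the logger name holds.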

Limitations

The Java export feature does not support preparation scripts. In your Lab analysis where you trained your model, if the Script tab has steps then those steps are not included in the exported model. If your model has a preparation script, you will need to prepare the data yourself before scoring with the JAR. The expected input of the model (the features you add in an ObservationBuilder to build an Observation) is the output of your preparation script.
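
As a minimal sketch of what preparing the data yourself can look like, the class below applies two preparation steps to raw string inputs before they are handed to the observation builder. The steps (normalizing the text of "Sex" and parsing "Pclass" as a number) are purely hypothetical stand-ins for whatever your own Script tab contains, and PreparedFeatures is not part of com.dataiku.scoring.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical re-implementation of a preparation script in plain Java.
final class PreparedFeatures {
    // Turns raw string inputs into the values the model expects, i.e. the
    // output of the preparation script. Replace these two example steps
    // with the steps of your own script.
    static Map<String, Object> prepare(Map<String, String> raw) {
        Map<String, Object> prepared = new LinkedHashMap<>();

        // Example step 1: trim and lowercase a categorical column.
        String sex = raw.get("Sex");
        if (sex != null) {
            prepared.put("Sex", sex.trim().toLowerCase(Locale.ROOT));
        }

        // Example step 2: parse a numeric column stored as text.
        String pclass = raw.get("Pclass");
        if (pclass != null && !pclass.trim().isEmpty()) {
            prepared.put("Pclass", Double.parseDouble(pclass.trim()));
        }
        return prepared;
    }
}
```

Each entry of the resulting map is either a String or a Double, which matches the two value types that .with("featureName", ...) accepts in the usage example.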

Export a PMML file for a model

Warning

The PMML export feature is experimental, with best-effort support.

The PMML export feature requires a specific DSS license. Please contact your Dataiku Account Manager or Customer Success Manager.

Provided you have this feature in your license, you can export a PMML file to score with standard PMML tools.

If your model is compatible with PMML export (see Limitations below):

  • Go to the trained model you wish to export (either a model trained in the Lab or a version of a saved model deployed in the Flow)
  • Click the Actions button on the top-right corner
  • Select Download as PMML

Limitations

To be compatible with PMML export, the model's preprocessing must satisfy the following constraints:

  • Numeric features with regular handling
  • Categorical features with impact-coding handling
  • No Vector, Image, Binary or Text features
  • No feature generation (numerical derivatives, combination…)
  • No dimensionality reduction

The following algorithms are compatible with PMML export:

  • Logistic Regression Classifier
  • Linear Regression
  • Decision Tree (Classification & Regression)
  • Random Forest (Classification & Regression)
  • Extra Trees Classification
  • Gradient Boosting Regression

The PMML export feature does not support preparation scripts. In your Lab analysis where you trained your model, if the Script tab has steps then those steps are not included in the exported model. If your model has a preparation script, you will need to prepare the data yourself before scoring. The expected input of the model is the output of your preparation script.

The PMML export feature does not currently support computing probabilities for classification algorithms. It outputs the class with the highest predicted probability, which for binary classification is equivalent to setting the threshold to 0.5.
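
The equivalence between "highest probability" and "threshold 0.5" for two classes can be illustrated with a short sketch. BinaryDecision is a hypothetical class written for this illustration and is unrelated to any PMML tooling. How an exact tie at 0.5 is resolved is an assumption here (the first class wins in both methods).

```java
// Hypothetical illustration, unrelated to any PMML tooling.
final class BinaryDecision {
    // Index of the largest probability; the first index wins on ties.
    static int argmax(double[] probabilities) {
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }

    // Returns 1 (the positive class) when its probability exceeds the
    // threshold, and 0 otherwise.
    static int withThreshold(double positiveProbability, double threshold) {
        return positiveProbability > threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double p = 0.3125; // probability of the positive class
        double[] probabilities = {1.0 - p, p};
        // Both decisions agree when the threshold is 0.5 ...
        System.out.println(argmax(probabilities) == withThreshold(p, 0.5));
        // ... but a custom threshold can change the predicted class.
        System.out.println(withThreshold(p, 0.25));
    }
}
```

Because the two probabilities sum to 1, the positive class has the highest probability exactly when its probability is above 0.5. A different threshold, which requires access to the probabilities, is therefore not something the exported PMML can reproduce.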

The PMML export feature does not support probability calibration.