Scoring engines¶
DSS allows you to select various engines in order to perform scoring of your models. This allows for faster execution in some cases.
Note
Scoring engines are only used to actually predict rows. While they are strongly related to training engines, some models trained with one engine can be scored with another.
Engines for the scoring recipe¶
The following scoring engines are available:
Local (DSS server only) scoring. This engine has two variants:
the Python engine provides wider compatibility but lower performance. Allows to compute Individual prediction explanations.
the Optimized scorer provides better performance and is automatically used whenever possible.
Spark: the scoring is performed in a distributed fashion on a Spark cluster
SQL (Regular): the model is converted to a SQL query and executed within a SQL database.
SQL (Snowflake): the model uses Snowflake extended push-down. This provides much faster execution within Snowflake, and extended compatibility. Please see Snowflake for details
The selected engine can be adjusted in the scoring recipe editor. Only engines that are compatible with the selected model and input dataset will be available.
The default settings the following:
If the model was trained with Spark MLLib or Sparkling Water, it will be scored with the Spark engine
Else it will be scored with the Local engine. The optimized engine will be used if available.
If you do not wish to score your model with the “optimized” engine for some reason, you may select “Force original backend” in the scoring recipe editor to revert to the original backend.
Choosing SQL (regular) engine (if your scored dataset is stored in an SQL database and your model is compatible) will generate a request to score the dataset. Note that this may create very large requests for complex models.
Engines for the API node¶
To score rows using the API node, the “Local” engine is used. This engine has two variants:
the Python engine provides wider compatibility but lower performance.
the Optimized scorer provides better performance and is automatically used whenever possible.
The Optimized engine is enabled if you check the “Use Java” option in the endpoint settings.
In other words, only models for which one of “Local (Python)” or “Local (Optimized)” is available can be scored in the API node (this excludes Sparkling-Water models)
Compatibility matrix¶
The compatibility matrix for all DSS models is the following.
Local (Python) and Local (Optimized) engines are available both in scoring recipes and API node. Spark and SQL engines are only available for the scoring recipes.
Note
For models trained with Python, the Optimized Local and Spark engines are only available if preprocessing is also compatible.
The SQL engine is only available if preprocessing is also compatible.
Algorithms¶
Training engine |
Algorithm |
Local (Optimized) |
Local (Python) |
Spark |
SQL (Snowflake) |
SQL (Regular) |
---|---|---|---|---|---|---|
Python in-memory |
Random forest |
Yes |
Yes |
Yes |
Yes |
Yes (except for multiclass) |
MLLib |
Random forest |
Yes |
No |
Yes |
Yes |
Yes (except for multiclass) |
Python in-memory |
Gradient Boosting |
Yes |
Yes |
Yes |
Yes |
Yes (except for multiclass) |
MLLib |
Gradient Boosting (no multiclass) |
Yes |
No |
Yes |
Yes |
Yes (except for multiclass) |
Python in-memory |
LightGBM |
Yes |
Yes |
Yes |
Yes |
Yes (except for multiclass) |
Python in-memory |
XGBoost |
Yes |
Yes |
Yes |
Yes |
Yes (except for multiclass) |
Python in-memory |
Extra Trees (Scikit) |
Yes |
Yes |
Yes |
Yes |
Yes (except for multiclass) |
Python in-memory |
Decision Trees |
Yes |
Yes |
Yes |
Yes |
Yes (no probas for multiclass) |
MLLib |
Decision Trees |
Yes |
No |
Yes |
Yes |
Yes (no probas for multiclass) |
Python in-memory |
Ordinary Least Squares, Lasso, Ridge |
Yes |
Yes |
Yes |
Yes |
Yes |
Python in-memory |
SGD |
Yes |
Yes |
Yes |
Yes |
Yes |
MLLib |
Linear Regression |
Yes |
No |
Yes |
Yes |
Yes |
Python in-memory |
Logistic Regression |
Yes |
Yes |
Yes |
Yes |
Yes |
MLLib |
Logistic Regression |
Yes |
No |
Yes |
Yes |
Yes |
Python in-memory |
Neural Networks |
Yes |
Yes |
Yes |
Yes |
Yes |
Python in-memory |
Deep Neural Network |
No |
Yes |
No |
No |
No |
Python in-memory |
Naive Bayes |
No |
Yes |
No |
No |
No |
MLLib |
Naive Bayes |
No |
No |
Yes |
No |
No |
Python in-memory |
K-nearest-neighbors |
No |
Yes |
No |
No |
No |
Python in-memory |
SVM |
No |
Yes |
No |
No |
No |
Python in-memory |
Custom models |
No |
Yes |
No |
No |
No |
MLLib |
Custom models |
No |
No |
Yes |
No |
No |
Sparkling-Water |
All models |
No |
No |
Yes |
No |
No |
Ensemble model |
No |
Yes |
No |
No |
No |
Preprocessing¶
Local (Optimized)¶
The following preprocessing options are available for Local(Optimized)
Numerical
No rescaling
Rescaling
Binning
Derivative features
Flag presence
Imputation
Drop row
Datetime cyclical encoding
Categorical
Dummification
Target encoding (impact and GLMM)
Ordinal
Frequency
Flag presence
Hashing (MLLib only)
Impute
Drop row
Text
Count vectorization
TF/IDF vectorization
Hashing (MLLib)
SQL (Snowflake)¶
The following preprocessing options are available for SQL (Snowflake) scoring :
Numerical
No rescaling
Rescaling
Binning
Derivative features
Flag presence
Imputation
Drop row
Datetime cyclical encoding
Categorical
Dummification
Target encoding (impact and GLMM)
Ordinal
Frequency
Flag presence
Hashing (MLLib only)
Impute
Drop row
Text
Count vectorization
TF/IDF vectorization
Hashing (MLLib)
SQL (Regular)¶
The following preprocessing options are available for SQL (regular) scoring :
Numerical
No rescaling
Rescaling
Binning
Flag presence
Imputation
Drop row
Categorical
Dummification
Impact coding
Ordinal
Frequency
Flag presence
Impute
Drop row
Text is not supported
Limitations¶
SQL (regular) engine¶
The following limitations exist with SQL (regular) scoring:
Some algorithms may not generate probabilities with SQL scoring (see table above)
Conditional output columns are not generated with SQL scoring
Preparation scripts are not compatible with SQL scoring
Multiclass logistic regression and neural networks require the SQL dialect to support the GREATEST function.