Introduction
Generalized Linear Models (GLMs) are a generalization of Ordinary Linear Regression. GLMs allow:
The response variable to follow any distribution from the exponential family (for example Poisson, Gamma, or Binomial, not only Gaussian).
The relationship between the linear predictor and the mean of the response variable to be defined by a chosen link function (not only the identity function).
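To make the two points above concrete, here is a minimal sketch of a GLM with a Poisson response and a log link, fitted by iteratively reweighted least squares in plain Python. It is an illustration only, not how glum or the plugin fits models; the function name `fit_poisson_log_link` and the toy data are assumptions made up for this example.

```python
import math


def fit_poisson_log_link(xs, ys, n_iter=25, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) by iteratively reweighted least squares."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        etas = [b0 + b1 * x for x in xs]       # linear predictor
        mus = [math.exp(eta) for eta in etas]  # inverse of the log link
        # Poisson with log link: weight = mu, working response z = eta + (y - mu) / mu
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        # Weighted least squares of z on x, solved in closed form (2x2 system)
        s_w = sum(mus)
        s_wx = sum(w * x for w, x in zip(mus, xs))
        s_wxx = sum(w * x * x for w, x in zip(mus, xs))
        s_wz = sum(w * z for w, z in zip(mus, zs))
        s_wxz = sum(w * x * z for w, x, z in zip(mus, xs, zs))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Toy count data (hypothetical, for illustration only)
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 6, 9]
b0, b1 = fit_poisson_log_link(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
print("intercept:", b0, "slope:", b1)
```

With an identity link and a Gaussian response, the same loop would reduce to ordinary least squares, which is the sense in which GLMs generalize linear regression.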
These models provide flexibility in modeling dependencies between regressors and the response variable. GLMs are widely used in the insurance industry to address specific modeling needs. The plugin's GLM implementation uses the glum package, and regression splines rely on patsy.
Prerequisites and limitations
The GLM plugin is available through the plugin store and requires Dataiku V14+. When downloading the plugin, you will be prompted to create a code environment using Python 3.8, 3.9, or 3.10.
To use the Generalized Linear Model Regression and Classification algorithms in visual ML, you must select a specific code environment in the runtime environment; otherwise, an exception is raised. This code environment must include the required visual ML packages and the glum package.
Note
If you use the integrated visual GLM interface, you do not need to set up a specific code environment: the plugin code environment is enforced automatically.
How to set up
Create a Python 3.8, 3.9, or 3.10 code environment in Administration > Code Envs > New Env.
Go to Packages to install.
Click on Add set of packages.
Add the Visual Machine Learning packages.
Add the glum package: glum==2.6.0
Click on Save and Update.
Go back to the Runtime Environment.
Select the environment that has been created.
Once set up, the plugin components listed on the plugin page can be used in your Dataiku projects.
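As a quick sanity check, the prerequisites on this page can be verified from any Python session running in the new code environment (for instance a notebook that uses it). The sketch below is an assumption-laden helper, not part of the plugin: `check_glm_env` is a hypothetical name, and it only checks the Python version and the presence of glum, since the exact package names in the Visual Machine Learning set are not listed here.

```python
import sys
from importlib import metadata

# Python versions listed as supported on this page
SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10)}


def check_glm_env(python_version=None, required=("glum",)):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    major_minor = tuple((python_version or sys.version_info)[:2])
    if major_minor not in SUPPORTED_PYTHONS:
        problems.append("unsupported Python %d.%d" % major_minor)
    for name in required:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            problems.append("missing package: %s" % name)
    return problems


# Hypothetical usage: run inside the code environment you just created
print(check_glm_env() or "environment looks usable")
```

An empty result only means these two checks passed; it does not confirm that the environment has been selected in the runtime environment settings.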