This tutorial requires you to have completed Tutorial 103 on quick predictive analytics.
In this tutorial, you will learn:
- how to fine-tune your predictive model,
- how to use a predictive model to score another dataset,
- how to hack the code of the bench’s models.
Have you noticed that there are always two results on the bench, Random Forest and Ridge? DSS automatically chooses two algorithms for your bench. Roughly speaking, an algorithm determines how a trained model learns the patterns and interactions in the features to predict the revenue. Each algorithm leads to a slightly different model, which performs better or worse depending on the data it learns from. We must therefore compare the models to see which one performs best. Click on Select all on the left side of the screen.
The right side of the screen shows all kinds of metrics that measure the performance of the models. For each metric, the best value appears in green and the worst in red. You can click on the header of a metric to sort the results by that metric.
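As a rough illustration of what such metrics measure, the sketch below computes three common regression scores (R2, mean absolute error, root mean squared error) in plain Python and ranks two models by R2. The model names and all numbers are made-up toy values, and the exact set of metrics shown by DSS may differ.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MAE and RMSE for two equal-length lists of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }

# Toy data: actual revenue and the predictions of two hypothetical models.
actual = [100.0, 150.0, 200.0, 250.0]
predictions = {
    "model_a": [110.0, 140.0, 210.0, 240.0],
    "model_b": [150.0, 150.0, 150.0, 150.0],
}

scores = {name: regression_metrics(actual, pred) for name, pred in predictions.items()}

# Sort from best to worst R2, similar to clicking a metric header on the bench.
ranking = sorted(scores, key=lambda name: scores[name]["r2"], reverse=True)
for name in ranking:
    print(name, scores[name])
```

For R2, higher is better (1.0 is a perfect fit), while for the two error metrics lower is better, so the sort direction depends on the metric.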
We might want to change the algorithms. Let’s go to the Algorithms tab.
What we see here is the list of learning algorithms available in Dataiku Data Science Studio. Let us try some different algorithm settings.
In the random forest section, set the number of trees to 25 and the depth to 0. Turn on Ordinary Least Squares, Ridge regression and Lasso regression. Save your bench and launch your training session.
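For readers who want to see roughly what this configuration corresponds to in code, here is a minimal sketch using scikit-learn (assumed to be installed). It is not the code DSS runs internally. The data is a synthetic stand-in generated with make_regression, the regularization strengths are arbitrary, and mapping a depth of 0 to "no depth limit" (max_depth=None) is an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customers data (the tutorial's dataset is not used here).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Assumption: a depth of 0 in the form means "no limit", i.e. max_depth=None.
    "random_forest_25": RandomForestRegressor(
        n_estimators=25, max_depth=None, random_state=0
    ),
    "ordinary_least_squares": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # alpha chosen arbitrarily for the sketch
    "lasso": Lasso(alpha=1.0),   # alpha chosen arbitrarily for the sketch
}

# Train every model on the same split and keep its R2 on the held-out rows.
r2_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2_scores[name] = model.score(X_test, y_test)

for name, r2 in sorted(r2_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R2 = {r2:.3f}")
```

Because this toy data is generated from a linear relationship, the linear models are likely to score highest here; on real customer data the ranking can be different.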
Wow! Now the models are performing very well, with scores close to 0.9!
Select them all to see at a glance which one is the best.
The random forest (25 estimators) wins. Select this model, click on the Information tab and change its name to BEST MODEL random forest (25 estimators)
... and click on the star on the left side (next to its name) to add this model to your favorites. This makes it very easy to find your favorite models again later (just go to the Favorites tab), when you have oodles of trained models.
Now that we have a good model, let’s see what we can do with it. Click the Use tab.
We can use a model in three ways, which you are going to learn now.
We are going to compute the revenue prediction of new customers using our model. Let’s start by downloading the new customers dataset. Click on the dropdown menu on the top left and choose Datasets.
Create a new FTP / HTTP / SSH dataset.
and fill HTTP input with
and create the dataset.
Now go back to your model bench by clicking on the dropdown menu on the top left and choosing Models.
Open your model bench
... and find your best results. Go to the Use tab and click on Create a recipe to compute prediction.
Fill in the form to score the dataset new_customers, name the model my_best_model and store the output dataset in new_customers_scored on filesystem_managed.
Congratulations! Your model now exists in the flow.
Right-click on the output dataset, choose the Build item
... and click on Run in the modal window. Your revenue predictions are being computed. On completion, click on Explore output dataset.
Now you can see at a glance how much revenue new customers are likely to generate!
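Conceptually, scoring means applying an already trained model to rows that have the same features but no known revenue yet. The plain-Python sketch below illustrates the idea with a one-feature least-squares line. The column names (pages_visited, customer_id, prediction) and all values are hypothetical and are not taken from the tutorial's datasets.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Training rows: known customers with an observed revenue (toy values).
known_customers = [
    {"pages_visited": 2, "revenue": 25.0},
    {"pages_visited": 4, "revenue": 45.0},
    {"pages_visited": 6, "revenue": 65.0},
]
slope, intercept = fit_line(
    [row["pages_visited"] for row in known_customers],
    [row["revenue"] for row in known_customers],
)

# Scoring: new customers have the same feature but no revenue yet.
new_customers = [
    {"customer_id": "c1", "pages_visited": 3},
    {"customer_id": "c2", "pages_visited": 10},
]

# The scored dataset is the input rows plus one extra prediction column.
new_customers_scored = [
    {**row, "prediction": slope * row["pages_visited"] + intercept}
    for row in new_customers
]
for row in new_customers_scored:
    print(row)
```

The input rows are left untouched; the output is a new dataset with one additional column, which mirrors the new_customers to new_customers_scored step above.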
Data loses value over time because customers change. To keep up with the latest trends in customers’ habits, it is better to periodically retrain the model on recent data. This is the purpose of the retrain recipe: it keeps all the tuning you have done on a model and relearns the prediction from fresh data. Have a look at the onboarding project named DATAIKU TSHIRTS to get an idea of how this can be used. Click on the home button to see all your projects, select DATAIKU TSHIRTS and visit the flow.
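The idea behind retraining can be sketched in a few lines of plain Python: the settings stay fixed and only the training data changes. The example uses a one-feature ridge regression written by hand; the alpha value and the toy numbers are hypothetical, and this is not how the retrain recipe is implemented in DSS.

```python
def fit_ridge_1d(xs, ys, alpha):
    """One-feature ridge regression: the slope is shrunk by the penalty alpha."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

# The tuning chosen once and kept across retrains (hypothetical value).
settings = {"alpha": 2.0}

# Toy data: customers used to spend 10 per visit, recent ones spend 15.
old_visits, old_revenue = [1, 2, 3, 4], [10.0, 20.0, 30.0, 40.0]
recent_visits, recent_revenue = [1, 2, 3, 4], [15.0, 30.0, 45.0, 60.0]

# Same settings, different data: only the learned coefficients change.
original_model = fit_ridge_1d(old_visits, old_revenue, **settings)
retrained_model = fit_ridge_1d(recent_visits, recent_revenue, **settings)

print("original :", original_model)
print("retrained:", retrained_model)
```

The retrained slope is larger than the original one because the recent data reflects higher spending per visit, even though the tuning (alpha) is unchanged.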
Would you like to see what’s under the hood and hack the tuning of your model? Let’s export our training bench into an IPython notebook. Click on Create a Python notebook
... and create a notebook from that model! An IPython notebook has been created.
You can now play with the code and fine-tune everything. Enjoy!