Tutorial 104

This tutorial requires you to have completed Tutorial 103 on quick predictive analytics.

In this tutorial, you will learn:

  • how to fine-tune your predictive model,
  • how to use a predictive model to score another dataset,
  • how to hack the code of the bench’s models.

Fine-tuning a model

Have you noticed that there were always two results on the bench, Random Forest and Ridge? DSS automatically chooses two algorithms for your bench. Roughly speaking, an algorithm determines how a trained model learns the patterns and interactions in the features in order to predict the revenue. Each algorithm leads to a slightly different model, which performs better or worse depending on the data it learns from. We must therefore compare the models to see which one performs best. Click on Select all on the left side of the screen.

../../_images/114.png

The right side of the screen shows several metrics that measure the performance of the models. For each metric, the best value appears in green and the worst in red. You can click on a metric’s header to sort the results.

../../_images/214.png
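To make the idea of comparing algorithms on a metric more concrete, here is a minimal sketch in plain Python. It is not the code DSS runs: the data is made up, the two models are a one-feature ordinary least squares fit and a one-feature ridge fit, and the metric is R2 (one common regression score). The names fit_linear, r2_score and the alpha value are assumptions chosen for illustration.

```python
# Toy data: one feature x and a target y (made-up numbers, not the tutorial's dataset).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [6.0, 7.0, 8.0]
test_y = [12.0, 14.1, 15.9]


def mean(values):
    return sum(values) / len(values)


def fit_linear(xs, ys, alpha=0.0):
    """Fit y = intercept + slope * x by least squares.

    alpha is an L2 penalty on the slope: alpha=0 gives ordinary least
    squares, alpha>0 gives a one-feature ridge regression.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def r2_score(actual, predicted):
    """Coefficient of determination: 1 is perfect, 0 matches predicting the mean."""
    a_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - a_bar) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Train one model per algorithm on the same data.
models = {
    "ols": fit_linear(train_x, train_y, alpha=0.0),
    "ridge": fit_linear(train_x, train_y, alpha=1.0),
}

# Score every model on held-out data and sort from best to worst.
scores = {name: r2_score(test_y, predict(m, test_x)) for name, m in models.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
```

Sorting the scores in descending order plays the same role as clicking on a metric’s header in the bench: the model at the top of the list is the one to look at first.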

We might want to change the algorithms. Let’s go to the Algorithms tab.

../../_images/35.png

What we see here is the list of learning algorithms available in Dataiku Data Science Studio. Let us try some different algorithm settings.

In the Random Forest section, set the number of trees to 25 and the depth to 0. Turn on Ordinary Least Squares, Ridge regression and Lasso regression. Save your bench and launch a training session.

../../_images/43.png

Wow! Now the models are performing very well, with scores close to 0.9!

../../_images/53.png

Select them all to see at a glance which one is the best.

../../_images/63.png

The random forest (25 estimators) wins. Select this model, click on the Information tab and change its name to BEST MODEL random forest (25 estimators)

../../_images/73.png

... and click on the star on the left side (next to its name) to add this model to your favorites. This makes it easy to find your favorite models again from the Favorites tab, once you have oodles of trained models.

../../_images/83.png

Now that we have a good model, let’s see what we can do with it. Click the Use tab.

../../_images/93.png

We can use a model in three ways, which you are going to learn now.

Usage 1: Score a dataset with your model

We are going to compute revenue predictions for new customers using our model. Let’s start by downloading the new customers dataset. Click on the dropdown menu on the top left and choose Datasets.

../../_images/103.png

Create a new FTP / HTTP / SSH dataset

../../_images/115.png

... fill the HTTP input with

http://doc.dataiku.com/tutorials/data/104/new_customers.csv

... and create the dataset.

../../_images/123.png

Now go back to your model bench by clicking on the dropdown menu on the top left and choosing Models.

../../_images/133.png

Open your model bench

../../_images/143.png

... and find your best results. Go to the Use tab and click on Create a recipe to compute prediction.

../../_images/153.png

Fill in the form to score the dataset new_customers, name the model my_best_model and store the output dataset in new_customers_scored on filesystem_managed.

../../_images/162.png

Congratulations! Your model now exists in the flow.

../../_images/173.png

Right-click on the output dataset, choose the Build item

../../_images/183.png

... and click on Run in the modal window. Your revenue predictions are being computed. On completion, click on Explore output dataset.

../../_images/193.png

Now you can see at a glance how much revenue new customers are likely to generate!

../../_images/203.png
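Conceptually, scoring a dataset means applying the trained model to every row and storing the result in a new column. The sketch below illustrates that idea in plain Python and is not what the DSS recipe executes. The sample rows, the column names (customer_id, age, pages_visited, prediction) and the fixed linear formula standing in for the trained model are all hypothetical, since this tutorial does not describe the columns of new_customers.csv.

```python
import csv
import io

# Hypothetical sample standing in for new_customers.csv; the real file's
# columns are not described in this tutorial, so these names are assumptions.
NEW_CUSTOMERS_CSV = """customer_id,age,pages_visited
c1,25,10
c2,40,3
c3,31,7
"""

# Hypothetical stand-in for a trained model: a fixed linear formula.
WEIGHTS = {"age": 0.5, "pages_visited": 2.0}
INTERCEPT = 1.0


def predict_revenue(row):
    """Return the predicted revenue for one customer row (a dict of strings)."""
    return INTERCEPT + sum(w * float(row[name]) for name, w in WEIGHTS.items())


def score_dataset(csv_text):
    """Read the input rows and return them with an extra 'prediction' column."""
    scored = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out = dict(row)
        out["prediction"] = predict_revenue(row)
        scored.append(out)
    return scored


scored_rows = score_dataset(NEW_CUSTOMERS_CSV)
for row in scored_rows:
    print(row["customer_id"], row["prediction"])
```

The input columns are left untouched and only one column is added, which mirrors what you see when exploring new_customers_scored next to new_customers.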

Usage 2: Periodically retrain your model

Data loses value over time because customers change. To keep up with the latest trends in customers’ habits, it is better to periodically retrain the model on recent data. This is the purpose of the retrain recipe: it keeps all the tuning you have done on a model and relearns the prediction from fresh data. Have a look at the onboarding project named DATAIKU TSHIRTS to get an idea of how this can be used. Click on the home button to see all your projects, select DATAIKU TSHIRTS and visit the flow.

../../_images/215.png
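The principle behind retraining can be sketched in a few lines of plain Python: the tuning stays fixed and only the training data changes. This is an illustration under stated assumptions, not the retrain recipe itself. The SETTINGS dictionary, the window size, the one-feature ridge fit and the made-up history are all hypothetical.

```python
# Tuning chosen once; retraining reuses it unchanged (hypothetical values).
SETTINGS = {"alpha": 1.0, "window": 4}


def fit(xs, ys, alpha):
    """One-feature ridge fit: returns (intercept, slope)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return y_bar - slope * x_bar, slope


def retrain(history, settings):
    """Refit on the most recent `window` observations with the same settings."""
    recent = history[-settings["window"]:]
    xs = [x for x, _ in recent]
    ys = [y for _, y in recent]
    return fit(xs, ys, settings["alpha"])


# Each pair is (feature, revenue); later entries are more recent.
history = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
old_model = retrain(history, SETTINGS)

# New data arrives in which customers behave differently.
history += [(1, 5.0), (2, 10.0), (3, 15.0), (4, 20.0)]
new_model = retrain(history, SETTINGS)

print("slope before:", old_model[1], "after:", new_model[1])
```

Because the settings are passed in unchanged, the second model differs from the first only through the data it has seen, which is the behavior described above.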

Usage 3: Hack the code

Do you like to see what’s under the hood and hack the tuning of your model? Let’s export our training bench into an IPython notebook. Click on Create a Python notebook

../../_images/223.png

... and create a notebook from that model! An IPython notebook has been created.

../../_images/233.png

You can now play with the code and fine-tune everything. Enjoy!

On board with us?

Congratulations! You have just completed our tutorials on predictive analysis with Data Science Studio.

There’s a lot more to come with Dataiku. For a deeper dive, you can check out the rest of this documentation.

Follow us on Twitter @dataiku and check out our blog!