OpenWeatherMap¶
You can retrieve weather data from the OpenWeatherMap API and enrich datasets containing geographic coordinates.
This capability is provided by the OpenWeatherMap plugin, which you need to install. Please see Installing plugins.
Setup¶
General setup¶
Create an OpenWeatherMap account on the OpenWeatherMap website.
Log in to the platform and go to the API keys tab.
Use the default key or create a new one. Copy the key.
In Dataiku, go to App > Plugins > Installed > OpenWeatherMap > Settings > OpenWeatherMap API configuration.
Add a new preset, name it, and fill in the details:
OpenWeatherMap API key: The API key you just copied.
System of units: The default system of units you want to use. It can be overridden when running the recipe.
Language: The language of the text describing the weather. It can be overridden when running the recipe.
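To make the role of the three preset fields concrete, here is a minimal sketch of how an API key, a system of units, and a language typically end up as query parameters of an OpenWeatherMap request. The endpoint URL and the helper name build_weather_url are assumptions for illustration; the plugin's actual request logic is not described here and may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; the plugin may call a different one.
BASE_URL = "https://api.openweathermap.org/data/2.5/onecall"


def build_weather_url(lat, lon, api_key, units="metric", lang="en"):
    """Return a request URL carrying the preset values as query parameters.

    Hypothetical helper: it only builds the URL and sends no request.
    """
    params = {
        "lat": lat,
        "lon": lon,
        "appid": api_key,  # the OpenWeatherMap API key
        "units": units,    # "standard", "metric" or "imperial"
        "lang": lang,      # language of the weather description text
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_weather_url(48.8566, 2.3522, api_key="YOUR_API_KEY")
print(url)
```

Passing different units or lang values to the helper mirrors what the Advanced mode options do when they override the preset for a single job.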
Cache setup¶
The OpenWeatherMap plugin uses a local cache to avoid repeating identical queries. You can change the cache preferences by following these steps:
In Dataiku, go to App > Plugins > Installed > OpenWeatherMap > Settings > Parameters.
Choose the cache storage location between the following options:
User $HOME directory: The cache will be stored under the $HOME/.cache/dss/plugins/open_weather_map folder.
Custom: You can choose a custom location. Write the absolute location in the input below, for example /Users/johnsnow/Documents/dss_cache. Be cautious using this with a UIF instance as permission errors could occur.
None: Do not use a cache.
Choose the cache parameters:
Cache size (in megabytes): The maximum size of the cache file. The default is 1 GB.
Cache eviction policy: The way you want the data to be deleted once the maximum size is reached. You have the choice between four modes:
Least Recently Stored: Delete the oldest cache records first.
Least Recently Used: Delete cache records that were not used for the longest time.
Least Frequently Used: Delete cache records that are the least frequently used.
No eviction: Ignore the cache size limit; the cache will grow without bounds.
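To illustrate how an eviction policy behaves, here is a minimal in-memory sketch of the Least Recently Used mode. It is not the plugin's implementation: the class name LRUCache is hypothetical, and the limit is expressed as a number of entries rather than a size in megabytes to keep the example short.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used


cache = LRUCache(max_entries=2)
cache.set("paris", 1)
cache.set("rome", 2)
cache.get("paris")    # "paris" becomes the most recently used entry
cache.set("oslo", 3)  # the cache is full, so "rome" is evicted
```

Under Least Recently Stored, the same sequence would evict "paris" instead, because reads would not refresh an entry's position.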
How to use¶
The plugin is made of two main components:
A connector that allows you to retrieve data directly from the OpenWeatherMap API and put it in a new dataset.
A recipe that allows you to add weather information to your data containing latitude, longitude, and date columns.
OpenWeatherMap connector¶
Go to your Dataiku Flow.
Select OpenWeatherMap in the plugin section of the dataset menu.
Click on OpenWeatherMap weather generating.
Pick a preset of parameters.
Enter the latitude and longitude of the location for which you want weather data.
Choose the desired granularity:
Daily information is available 5 days in the past and 7 days in the future. The output dataset is 12 rows long.
Hourly information is available 5 days in the past and 2 days in the future. The output dataset is 168 rows long.
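The row counts above follow directly from the available time windows. A short sketch of the arithmetic, with a hypothetical helper name (expected_rows) and the window lengths taken from the two statements above:

```python
DAYS_BACK = 5            # history available for both granularities
DAYS_AHEAD_DAILY = 7     # daily forecast horizon
DAYS_AHEAD_HOURLY = 2    # hourly forecast horizon


def expected_rows(granularity):
    """Return the number of rows when both history and forecast are requested."""
    if granularity == "daily":
        return DAYS_BACK + DAYS_AHEAD_DAILY             # one row per day
    if granularity == "hourly":
        return (DAYS_BACK + DAYS_AHEAD_HOURLY) * 24     # one row per hour
    raise ValueError(f"unknown granularity: {granularity!r}")


print(expected_rows("daily"))   # 12
print(expected_rows("hourly"))  # 168
```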
You can configure more settings by checking Advanced mode. The available options are:
Data type: You can choose whether you wish to have historical data, forecast data, or both. The default is both.
System of units: Overrides the settings of the preset for this specific job.
Language: Overrides the settings of the preset for this specific job.
Use cache: Overrides the settings of the preset for this specific job.
Parse output JSON: Check this option if you prefer to get a single column containing the entire response in JSON format.
OpenWeatherMap recipe¶
Go to your Flow.
Select OpenWeatherMap in the plugin section of the recipe menu.
Click on OpenWeatherMap Weather mapping.
Select the input dataset and the output dataset, and then click on CREATE.
Fill in the latitude and longitude columns.
Select whether you need the current weather or the weather at a date provided by a column in date format.
Run the recipe.
In addition to the weather data, a column named error is added to the output dataset.
If something went wrong when retrieving the data for a specific location/date pair, this column describes the problem.
Errors usually occur because the date is outside the available range [today - 5 days; today + 7 days] or because you reached your API call limit.
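Because out-of-range dates are a common source of errors, it can help to check dates before running the recipe. A minimal sketch, assuming the bounds of the range are inclusive; the function name in_available_range is hypothetical and not part of the plugin:

```python
from datetime import date, timedelta


def in_available_range(day, today=None):
    """Return True if day lies within [today - 5 days; today + 7 days]."""
    today = today or date.today()
    return today - timedelta(days=5) <= day <= today + timedelta(days=7)


# Fixed reference date so the example is reproducible.
reference = date(2024, 1, 10)
print(in_available_range(date(2024, 1, 5), today=reference))   # True
print(in_available_range(date(2024, 1, 18), today=reference))  # False
```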
You can configure more settings by checking Advanced mode. The available options are:
System of units: Overrides the settings of the preset for this specific job.
Language: Overrides the settings of the preset for this specific job.
Use cache: Overrides the settings of the preset for this specific job.
Parse output JSON: Check this option if you prefer to get a single column containing the entire response in JSON format. You should also check it if you use the Append instead of overwrite option.
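When the entire response is kept in a single JSON column, individual fields can be extracted downstream. The sketch below uses only the standard library; the sample payload and its key names (current, temp, weather, description) are made up for illustration and are not the plugin's documented output schema.

```python
import json

# Made-up payload for illustration; the real response may be structured differently.
raw = '{"current": {"temp": 12.3, "weather": [{"description": "light rain"}]}}'


def extract_fields(raw_json):
    """Pull a temperature and a text description out of a JSON response string."""
    payload = json.loads(raw_json)
    current = payload.get("current", {})
    weather = current.get("weather") or [{}]  # tolerate a missing or empty list
    return {
        "temp": current.get("temp"),
        "description": weather[0].get("description"),
    }


print(extract_fields(raw))
```

Missing keys yield None rather than raising an exception, which keeps rows with partial responses usable.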