Esri Geo Enrichment

You can geocode and enrich data using Esri ArcGIS Online.

This capability is provided by the Esri geo enrichment plugin, which you need to install. Please see Installing plugins.

It uses the Esri ArcGIS Online API and allows you to:

  • Geocode postal addresses (obtain geo coordinates) based on a full address line or address components

  • Enrich data with a large set of data collections from more than 130 countries, based on XY coordinates or named areas (like postcodes)

This capability requires an ArcGIS Online account. Users can buy credits directly from ArcGIS Online.

How to use

First, open an ArcGIS Online account at https://www.arcgis.com/home/signin.html

Depending on the use case:

  • Geocode your postal addresses by adding an Esri geo enrichment geocoding recipe to your project.

  • Enrich your dataset containing XY coordinates or named statistical areas:

    • In both cases, create a Get catalog content for countries recipe and set the country or country list for the input data. If you want to get the entire set of data collections, create a dataset using the Show enrichment API coverage connector (no API call required).

    • For enrichment of XY coordinates:

      • Set the columns corresponding to the input dataset content.

      • Choose the data collections for enrichment.

      • Check the advanced configuration to save related geometry and set the batch size of XY coordinates per API call.

    • For enrichment based on named areas:

      • Set the columns corresponding to the input dataset content.

      • Choose the data collections for enrichment.

    • In both cases, checking “Add derivative variables” will add all percentages, averages, etc. for the requested data collections. This option may generate a large additional number of columns in the output dataset.
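
The batch size setting for XY coordinates can be pictured with the minimal sketch below. It only illustrates the idea of grouping coordinates into fixed-size batches, one batch per API call, and of leaving out rows with missing coordinates. It does not call the ArcGIS Online API and is not the plugin's actual implementation. The function name batch_points, the sample coordinates and the batch size of 2 are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple

# An XY coordinate pair: x is the longitude, y is the latitude.
Point = Tuple[float, float]


def batch_points(
    points: List[Optional[Point]], batch_size: int
) -> Iterator[List[Point]]:
    """Yield lists of at most batch_size points, skipping missing rows.

    Hypothetical helper: each yielded list stands for the coordinates
    that would be sent in a single API call.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Point] = []
    for point in points:
        if point is None:
            # Rows without coordinates are left out of every batch.
            continue
        batch.append(point)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the last, possibly smaller, batch.
        yield batch


# Sample input (illustrative values only), with one missing row.
points = [(-0.1276, 51.5072), None, (2.3522, 48.8566), (-74.0060, 40.7128)]
batches = list(batch_points(points, batch_size=2))
```

With three usable points and a batch size of 2, this produces two batches, so two calls would be needed instead of three. A larger batch size means fewer calls, at the price of larger requests.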

Additional information

This capability calls the ArcGIS Online API, so an ArcGIS Online account is required. Check the cost of each API call, which depends on the feature used (geocoding, geo enrichment, or retrieving data collections). Note that this capability is developed for data storage use cases.

The API only supports numerical identifiers (object ids).

Countries can be specified either by name or by ISO code (for example, ISO2 US or ISO3 USA). These values can come from the geocoding recipe or from the dataset named Show enrichment API coverage. The country is required for enrichment. For geocoding, providing it is recommended because it improves the precision of the returned results.

Practical recommendations

  • Dataiku doesn’t automatically back up your data. As the data acquired by this capability has a cost, we recommend that you regularly back up the collected data. An option is available in enrichment recipes to export collected data into the tmp folder of your Dataiku data dir.

  • You may want to remove duplicate rows from the input dataset before running enrichment (geocoding or geo enrichment), so that the same data is not sent to the API several times. After enriching the unique input data, you can join the output dataset back to your original data.

  • Missing values in the input dataset are not submitted to the API.

  • When performing enrichment for several countries, please note that data collections differ (in name and content) from country to country. Cross-country enrichment may therefore generate a huge number of columns. You may choose either to “generate the output as key-value”, which can then be processed with a preparation script, or to create one enrichment recipe per country.

  • For enrichment at a specific named statistical level (e.g. postcode), you may try different settings for the data collection level names before enriching a large dataset. For instance, if you want to enrich data containing UK postcodes, you should first create a Get catalog content for countries recipe and check the output dataset to find the required layer_id. At that point, it’s not easy to choose between GB.PostcodeSectors, GB.PostcodeDistricts or GB.PostcodeAreas, as the right choice might depend on your input data. We therefore recommend that you first test a small sample of your input data to identify the corresponding layer. NB: the input postcode must be written in the format expected by each layer. For example, for the layer_id GB.PostcodeSectors, the postcode DL12 8UN should be formatted as DL12 8. Dataiku Visual Prepare can help you with this reformatting.

  • The dataset Show enrichment API coverage is based on the country list available on the Esri API website as of 2016-02-17. If new countries are supported by the API, the plugin may be updated.

  • For both geocoding and enrichment, the capability provides logs for each batch pushed to the API so you can see which data has been successfully processed and which ended in error. You may want to use the log dataset in “Append” mode (in the Inputs/Outputs tabs of the recipe settings).
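
Several of the recommendations above (formatting postcodes for the chosen layer, leaving out missing values, removing duplicates, then joining the results back) can be sketched in a few lines of Python. This is a minimal illustration of the logic only. In a Dataiku project the same steps would more naturally be built with visual recipes. The function names, the sample postcodes and the placeholder enriched dictionary are assumptions made for illustration and do not reflect the real output schema of the enrichment recipe.

```python
from typing import Dict, List, Optional


def to_postcode_sector(postcode: Optional[str]) -> Optional[str]:
    """Reduce a full UK postcode to sector form, e.g. 'DL12 8UN' -> 'DL12 8'.

    Hypothetical helper. It relies on the inward part of a full UK
    postcode always being the last three characters.
    """
    if postcode is None:
        return None
    compact = "".join(postcode.split()).upper()
    if len(compact) < 5:
        # Too short to be a full postcode: treat as missing.
        return None
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"


def unique_keys(values: List[Optional[str]]) -> List[str]:
    """Return distinct non-missing values, preserving first-seen order."""
    return list(dict.fromkeys(v for v in values if v))


def join_back(
    keys: List[Optional[str]], enriched: Dict[str, dict]
) -> List[Optional[dict]]:
    """Attach the enrichment result (or None) to every original row."""
    return [enriched.get(k) if k else None for k in keys]


# Sample input (illustrative): duplicates, mixed formatting, a missing value.
postcodes = ["DL12 8UN", "dl12 8ab", None, "SW1A 1AA", "DL128UN"]
sectors = [to_postcode_sector(p) for p in postcodes]

# Only distinct, non-missing sectors would be submitted for enrichment.
to_submit = unique_keys(sectors)

# Placeholder standing in for the enrichment output, keyed by sector.
enriched = {sector: {"sector": sector} for sector in to_submit}

# One result per original row, in the original order.
rows = join_back(sectors, enriched)
```

Here five input rows lead to only two distinct sectors being submitted, and every original row still receives its result after the join. Rows with a missing postcode simply keep an empty result.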