Geo-join

This processor performs a geographic nearest-neighbour join between two datasets with geo coordinates.

Example use case

You are processing a dataset of geo-tagged events. You have another dataset containing geo-tagged points of interest, and you want, for each event, to retrieve the identifier and details of the nearest point of interest.

Requirements

The dataset being processed must contain a latitude column and a longitude column. The ‘other’ dataset you join with must also contain a latitude column and a longitude column.

For the dataset being processed, the columns may have been generated by a previous step (like the GeoIP resolver).

Parameters

The processor needs the following parameters:

  • Latitude and longitude columns in the current dataset (these may have been generated by a previous step)

  • Name of the dataset to join with. This dataset must be in the same project.

  • Latitude and longitude columns in the joined dataset.

  • Columns from the joined dataset that should be copied to the local dataset, for the nearest row.

Output

The processor outputs the selected columns from the joined dataset. For each row of the current dataset, these columns contain the data from the nearest row in the joined dataset.

In addition, the processor outputs a ‘join_distance’ column containing the distance to the nearest neighbour found, in kilometers.
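
Illustration

The following is a minimal sketch of what a nearest-neighbour geo-join computes, not the processor's actual implementation. It assumes great-circle (haversine) distance with a mean Earth radius of 6371 km and a brute-force search over a non-empty joined dataset; the processor's real distance formula and search strategy are not documented here. The function names (haversine_km, geo_join), the column names other than ‘join_distance’, and the sample rows are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_join(rows, other_rows, lat_col, lon_col,
             other_lat_col, other_lon_col, copy_cols):
    """For each row, copy copy_cols from the nearest row of other_rows.

    Brute force: every row is compared with every row of other_rows,
    which is assumed to be non-empty.
    """
    joined = []
    for row in rows:
        def distance_to(other):
            return haversine_km(row[lat_col], row[lon_col],
                                other[other_lat_col], other[other_lon_col])

        nearest = min(other_rows, key=distance_to)
        out = dict(row)                      # keep the current dataset's columns
        for col in copy_cols:
            out[col] = nearest[col]          # copy the requested joined columns
        out["join_distance"] = distance_to(nearest)  # distance in kilometers
        joined.append(out)
    return joined


# Hypothetical sample data: one event and two points of interest in Paris.
events = [{"event_id": 1, "lat": 48.8584, "lon": 2.2945}]
pois = [
    {"poi_id": "louvre", "poi_lat": 48.8606, "poi_lon": 2.3376},
    {"poi_id": "notre_dame", "poi_lat": 48.8530, "poi_lon": 2.3499},
]

result = geo_join(events, pois, "lat", "lon", "poi_lat", "poi_lon", ["poi_id"])
print(result[0]["poi_id"], round(result[0]["join_distance"], 2))
```

Each output row keeps the columns of the current dataset and gains the copied columns plus ‘join_distance’. A brute-force scan is only reasonable for small joined datasets; a spatial index would be the usual choice for larger ones.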