Graph Clustering Recipe

The Graph clustering recipe computes community assignments for the nodes in the selected node groups, using the selected edge groups as relationships. It writes either a dataset of nodes or an edge dataset enriched with community assignments.

The recipe runs on a graph database. For details, see "Graph database recipe settings" and "Algorithm execution and sampling".

Algorithms

The recipe can run the following clustering algorithms:

  • Fastgreedy

  • Multilevel

  • Infomap

  • Walktrap

Fastgreedy and Multilevel are available only when Directed graph is disabled.
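
The availability rule above can be sketched as a small filter. This is an illustration only, not the recipe's implementation: the names `ALL_ALGORITHMS`, `UNDIRECTED_ONLY`, and `available_algorithms` are hypothetical.

```python
from typing import List

# Hypothetical names for illustration; not part of the recipe's API.
ALL_ALGORITHMS = ["Fastgreedy", "Multilevel", "Infomap", "Walktrap"]

# Algorithms this page lists as available only for undirected graphs.
UNDIRECTED_ONLY = {"Fastgreedy", "Multilevel"}


def available_algorithms(directed: bool) -> List[str]:
    """Return the algorithms selectable for the given Directed graph setting."""
    if directed:
        return [name for name in ALL_ALGORITHMS if name not in UNDIRECTED_ONLY]
    return list(ALL_ALGORITHMS)
```

With Directed graph enabled, only Infomap and Walktrap remain in the list.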

Input / Output

Input
  • Graph folder (Optional): Dataiku Folder that contains your materialized graph database. Leave it empty to run on an unmanaged Neo4j database directly.

Output
  • Output dataset: Dataset containing the computed community assignments.

Settings

Node groups

Choose one or more node groups to include in the computation.

Edge groups

Select the edge groups that define the relationships to consider.

Directed graph

Enable this option to treat relationships as directed. Fastgreedy and Multilevel are hidden when this option is enabled because they only support undirected graphs.

Weight property

Optionally select a numeric edge property to use as the relationship weight for clustering. The selected property must exist on all selected edge groups.
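
The requirement that the weight property exists on every selected edge group can be checked ahead of a run. The sketch below assumes you already have a mapping from edge group names to their property names; that mapping, the group names, and the function `groups_missing_weight` are all hypothetical and not provided by the recipe.

```python
from typing import Dict, List, Set


def groups_missing_weight(
    edge_group_properties: Dict[str, Set[str]],
    selected_groups: List[str],
    weight_property: str,
) -> List[str]:
    """Return the selected edge groups that do not define the weight property."""
    return [
        group
        for group in selected_groups
        if weight_property not in edge_group_properties.get(group, set())
    ]


# Hypothetical schema: edge group name -> property names defined on it.
schema = {
    "FOLLOWS": {"since", "strength"},
    "MESSAGED": {"strength", "count"},
    "BLOCKED": {"since"},
}
```

An empty result means the property is usable as a weight for the selection; any group returned would need to be deselected or given the property first.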

Output type

Choose Dataset of nodes to write one row per node, or Dataset of edges to write the edges enriched with the community assignments of both endpoints.
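
To make the difference between the two output types concrete, here is a minimal sketch of the two row shapes. The column names (`node_id`, `community`, `source_community`, `target_community`) and the one-row-per-edge layout are assumptions made for illustration; the actual output schema of the recipe may differ.

```python
from typing import Dict, List, Tuple


def node_rows(communities: Dict[str, int]) -> List[dict]:
    """One row per node: the node identifier and its community."""
    return [
        {"node_id": node, "community": community}
        for node, community in sorted(communities.items())
    ]


def edge_rows(
    edges: List[Tuple[str, str]], communities: Dict[str, int]
) -> List[dict]:
    """One row per edge, enriched with the community of each endpoint."""
    return [
        {
            "source": source,
            "target": target,
            "source_community": communities[source],
            "target_community": communities[target],
        }
        for source, target in edges
    ]


# Hypothetical clustering result and edge list.
communities = {"a": 0, "b": 0, "c": 1}
edges = [("a", "b"), ("b", "c")]
```

In the edge form, an edge whose two community columns differ (here `b` to `c`) crosses a community boundary, which is often the reason to prefer that output.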

Clustering algorithms

Use Select all to compute all algorithms supported by the current graph settings, or select individual algorithms.

Advanced parameters

  • Batch size: Number of result rows processed and written at a time.
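
The effect of a batch size can be pictured as splitting the result rows into fixed-size chunks before they are written. The generator below is a generic sketch of that pattern, not the recipe's code; `iter_batches` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice consumes up to batch_size items without loading the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A larger batch size means fewer write operations but more rows held in memory at once; the last batch may be smaller than the others.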