Recipes¶
The recipes below can work with a graph database published by Visual Graph to a Dataiku Folder. For Neo4j, Execute Cypher and Compute PageRank can also target an unmanaged database directly by leaving the graph folder empty and selecting Neo4j (unmanaged), the Neo4j connection, and the database name in the recipe settings.
Execute Cypher recipe¶
The Execute Cypher recipe allows you to run a query against a graph database and save the tabular results to a dataset.
You can target a graph published by Visual Graph to a Dataiku Folder or, for Neo4j, an unmanaged database selected directly in the recipe settings.
This is useful for feature engineering, generating analytical reports, or exporting subsets of your graph.
Input / Output¶
- Input
Graph folder (Optional): Dataiku Folder that contains your materialized graph database. Leave it empty to query an unmanaged Neo4j database directly.
- Output
Output dataset: Output dataset to store the results of the executed query.
Settings¶
Database type
If you provided a graph database folder, the database type is detected from that folder. Otherwise, select Neo4j (unmanaged) to query an unmanaged Neo4j database directly.
Neo4j connection
If you selected a Neo4j database type, select the Neo4j connection to use for querying the database.
Database name
If you are querying an unmanaged Neo4j database directly, select the database name to use.
Cypher query
Enter the Cypher query to be executed.
Note
Execute Cypher runs read-only queries against the selected graph database. When you query an unmanaged Neo4j database, a read-only Neo4j preset can help protect that database from accidental changes.
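As an illustration of the kind of read-only query this recipe accepts, the sketch below holds a Cypher query in a Python string and applies a rough keyword check before you paste it into the recipe. The `Customer` and `Product` labels, the `PURCHASED` relationship, and the `looks_read_only` helper are hypothetical. The check is a naive heuristic, not the mechanism the recipe uses: it can flag harmless property names and does not detect procedures that write.

```python
import re

# Hypothetical query: labels, relationship type, and property names are
# placeholders for whatever your own graph defines.
EXAMPLE_QUERY = """
MATCH (c:Customer)-[:PURCHASED]->(p:Product)
RETURN c.id AS customer_id, count(p) AS products_purchased
ORDER BY products_purchased DESC
LIMIT 100
"""

# Cypher clauses that modify the graph or its schema.
WRITE_KEYWORDS = {"CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP"}


def looks_read_only(query: str) -> bool:
    """Rough pre-check: True if no write clause keyword appears in the query."""
    # Drop quoted string literals so a value such as 'CREATE' is not flagged.
    without_strings = re.sub(r"'[^']*'|\"[^\"]*\"", "", query)
    tokens = re.findall(r"[A-Za-z_]+", without_strings.upper())
    return not any(token in WRITE_KEYWORDS for token in tokens)


print(looks_read_only(EXAMPLE_QUERY))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```

Each `RETURN` alias (`customer_id`, `products_purchased`) would be expected to map to a column of the output dataset, since the recipe saves tabular results.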
Compute PageRank recipe¶
The Compute PageRank recipe calculates the PageRank centrality for nodes in your graph. This algorithm is a common technique for identifying the most influential nodes based on the graph’s structure.
You can run it on a graph published by Visual Graph to a Dataiku Folder or, for Neo4j, on an unmanaged database selected directly in the recipe settings.
To use this recipe, select Compute PageRank from the list of recipes under Visual Graph.
Note
If you run Compute PageRank on the built-in graph database, see the built-in graph database limitations.
Warning
When the target is a Neo4j database, this recipe requires the Neo4j Graph Data Science (GDS) library to be installed on the Neo4j server.
Input / Output¶
- Input
Graph folder (Optional): Dataiku Folder that contains your materialized graph database. Leave it empty to run PageRank on an unmanaged Neo4j database directly.
- Output
Output dataset: Output dataset to store the results of the PageRank algorithm.
Settings¶
Database type
If you provided a graph database folder, the database type is detected from that folder. Otherwise, select Neo4j (unmanaged) to run PageRank on an unmanaged Neo4j database directly.
Neo4j connection
If you selected a Neo4j database type, select the Neo4j connection to use for running PageRank.
Database name
If you are running PageRank on an unmanaged Neo4j database directly, select the database name to use.
Warning
When running Compute PageRank against Neo4j, the selected Neo4j connection must allow write access to the target database.
Select node groups to rank
Choose one or more node groups to include in the PageRank calculation. Only nodes from these groups will be ranked.
Select edge groups used to rank nodes
Select the edge groups that define the edges to be considered during the ranking process.
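One way to picture the effect of these two settings is as a subgraph selection. The sketch below is illustrative only: the `project_subgraph` helper and the sample group names are hypothetical, and it assumes that an edge is kept only when its edge group is selected and both of its endpoints belong to selected node groups. The recipe's actual handling of edges that touch unselected nodes may differ.

```python
def project_subgraph(nodes, edges, node_groups, edge_groups):
    """Keep nodes from the selected node groups and edges from the selected edge groups.

    nodes: dict mapping node id -> node group name
    edges: list of (source id, target id, edge group name)
    """
    kept_nodes = {node for node, group in nodes.items() if group in node_groups}
    kept_edges = [
        (src, dst)
        for src, dst, group in edges
        # Assumption: edges with an endpoint outside the selection are dropped.
        if group in edge_groups and src in kept_nodes and dst in kept_nodes
    ]
    return kept_nodes, kept_edges


# Hypothetical graph with two node groups and two edge groups.
nodes = {"alice": "Person", "bob": "Person", "acme": "Company"}
edges = [
    ("alice", "bob", "KNOWS"),
    ("alice", "acme", "WORKS_AT"),
    ("bob", "alice", "KNOWS"),
]

kept_nodes, kept_edges = project_subgraph(nodes, edges, {"Person"}, {"KNOWS"})
print(sorted(kept_nodes), kept_edges)
```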
Algorithm parameters
Damping factor: The probability, at each step, that a random walker continues along an outgoing edge rather than jumping to a random node. A typical value is 0.85.
Max iterations: The maximum number of iterations the algorithm will run.
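Tolerance: The convergence threshold. If every node's score changes by less than this value between two iterations, the algorithm is considered converged and stops before reaching the maximum number of iterations.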
Normalize initial scores to sum to 1: If enabled, the initial scores for all nodes will be normalized to sum to 1.
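To show how these four parameters interact, here is a minimal pure-Python sketch of PageRank by iteration, using the common non-normalized update PR(n) = (1 - d) + d * sum(PR(s) / out_degree(s)) over the nodes s that link to n. It is not the implementation the recipe runs (the computation is delegated to the graph database), and details such as the default values and the handling of nodes without outgoing edges are assumptions made for this example.

```python
def pagerank(nodes, edges, damping_factor=0.85, max_iterations=20,
             tolerance=1e-7, normalize_initial=False):
    """Return a dict node -> score for a directed graph given as (source, target) pairs."""
    if not nodes:
        return {}

    out_degree = {node: 0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)

    # Initial scores: 1.0 per node, or 1/N each so that they sum to 1.
    initial = 1.0 / len(nodes) if normalize_initial else 1.0
    scores = {node: initial for node in nodes}

    for _ in range(max_iterations):
        new_scores = {}
        for node in nodes:
            # Each source shares its score equally among its outgoing edges.
            # Assumption: nodes without outgoing edges pass on nothing.
            received = sum(scores[src] / out_degree[src] for src in incoming[node])
            new_scores[node] = (1 - damping_factor) + damping_factor * received

        max_change = max(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        # Converged: no score moved by at least `tolerance` in this iteration.
        if max_change < tolerance:
            break

    return scores


# Two nodes pointing at a third: "a" ends up with the highest score.
star = pagerank(["a", "b", "c"], [("b", "a"), ("c", "a")])
print(star)
```

A lower tolerance or a higher iteration cap trades run time for precision: the loop ends at whichever limit is reached first.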
Advanced parameters
Batch Size: Controls the number of results loaded into memory at a time. Adjust this value to manage memory usage when processing large graphs.
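The idea behind a batch size can be sketched as follows: results are consumed in fixed-size chunks so that only one chunk is held in memory at a time. The `iter_batches` helper is hypothetical and only illustrates the general pattern, not the recipe's internal code.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Five results read two at a time: two full batches and one partial batch.
batches = list(iter_batches(range(5), 2))
print(batches)
```

A smaller batch size lowers peak memory use at the cost of more round trips; a larger one does the opposite.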
Warning
The built-in graph database, currently powered by Kuzu, supports multiple concurrent readers or a single writer.
Running several Visual Graph recipes at the same time on the same built-in database can lead to lock errors. This can happen, for example, if Compute PageRank runs in parallel with another Visual Graph recipe such as Execute Cypher.
To avoid this, run Visual Graph recipes sequentially on the same built-in graph database.
Collect nodes recipe¶
The Collect nodes and Collect edges recipes prepare your graph data for export to external systems such as Neo4j.
Before using this recipe, you must first design your graph and create a Saved configuration within the Visual Graph Editor webapp.
Input / Output¶
- Input
Saved configurations dataset: Dataset containing the saved graph configuration you wish to use.
Datasets used as sources of the node group: Select all the original source datasets that are referenced for the target node group within your saved configuration.
Warning
You must explicitly provide all required source datasets as inputs to the recipe. Due to Dataiku’s security model, the recipe will fail if any source dataset defined in the configuration is not declared as an input.
- Output
Output dataset: Dataset to store the collected nodes. The output will contain columns for the node identifier and all properties as defined in the saved configuration.
Settings¶
Select a saved configuration
From the dropdown list, choose the specific Saved configuration you want to process.
Select a node group
Select the node group whose members you want to collect into the output dataset.
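Once the nodes are collected, exporting them to an external system is up to you. As one possible approach, the sketch below turns collected rows into a parameterized Cypher statement that could be passed to a Neo4j client. The `build_node_import` helper, the `node_id` column name, and the `Person` label are hypothetical; the actual column names in the output dataset depend on your saved configuration.

```python
def build_node_import(label, id_column, rows):
    """Build a Cypher statement and its parameters to upsert collected nodes.

    rows: list of dicts, one per collected node (identifier plus properties).
    """
    # Escape backticks so the label can be safely quoted in the statement.
    safe_label = label.replace("`", "``")
    query = (
        "UNWIND $rows AS row "
        f"MERGE (n:`{safe_label}` {{id: row.id}}) "
        "SET n += row.props"
    )
    params = {
        "rows": [
            {
                "id": row[id_column],
                # Every column other than the identifier becomes a node property.
                "props": {k: v for k, v in row.items() if k != id_column},
            }
            for row in rows
        ]
    }
    return query, params


# Hypothetical collected rows, as they might be read from the output dataset.
query, params = build_node_import(
    "Person", "node_id", [{"node_id": 1, "name": "Ada"}]
)
print(query)
print(params)
```

Sending the node values as parameters rather than formatting them into the query text avoids quoting problems and lets the database reuse the query plan.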
Collect edges recipe¶
The Collect edges recipe is configured in much the same way as the Collect nodes recipe.