Dataset SQL Query Tool¶

The Dataset SQL Query tool is an Agent tool that translates natural language queries into SQL queries, and provides answers based on the execution of the SQL queries on the underlying datasets.

Configuration ¶

Once you’ve instantiated a Dataset SQL Query tool, you need to configure the following:

Dataset Selection: Select the project that contains the dataset, then select the SQL dataset to query.
LLM Configuration:
- LLM: Select the LLM to use for agent reasoning and SQL generation.
- Embedding LLM (optional): Select the embedding LLM for semantic value matching.
- Agent mode: Enable to use the multi-step agentic workflow. Disable to use the faster linear SQL generation pipeline.
Query Limits:
- Max Rows per Query: Maximum number of rows from the SQL query output that the LLM can access.
- Max rows in Artifacts: Maximum number of rows included in the generated artifacts.
- Max rows in Sources: Maximum number of rows included in the sources.
- Agent recursion limit: Maximum number of LangGraph loop iterations.
Security: Define whether the tool runs as the user calling the tool or as the end user (only relevant when the two are distinct).
Description for LLM (optional): General instructions for using this tool (e.g. a description of the dataset, its attributes, associated metrics…).
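
To make the settings above easier to reason about, here is a minimal sketch that groups them in one structure. The class name, field names, and default values are all hypothetical placeholders chosen for illustration; they are not the tool's actual configuration keys or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSqlQueryToolSettings:
    """Hypothetical container mirroring the settings described above."""

    # Dataset Selection
    project_key: str
    dataset_name: str
    # LLM Configuration
    llm_id: str
    embedding_llm_id: Optional[str] = None  # optional: used for semantic value matching
    agent_mode: bool = False                # False = linear SQL generation pipeline
    # Query Limits (placeholder defaults, not the product's real defaults)
    max_rows_per_query: int = 100
    max_rows_in_artifacts: int = 100
    max_rows_in_sources: int = 100
    agent_recursion_limit: int = 25
    # Security
    run_as_end_user: bool = False
    # Description for LLM
    description_for_llm: str = ""

    def validate(self) -> None:
        """Reject limits that are not positive."""
        for name in (
            "max_rows_per_query",
            "max_rows_in_artifacts",
            "max_rows_in_sources",
            "agent_recursion_limit",
        ):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")


settings = DatasetSqlQueryToolSettings(
    project_key="MY_PROJECT",
    dataset_name="trips",
    llm_id="my-llm",
    description_for_llm="One row per taxi trip, with city and fare.",
)
settings.validate()
```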

Input and output ¶

The tool takes a natural language query as input and returns a synthesized answer, artifacts containing the SQL records retrieved, and the SQL query that was generated and executed.
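
As an illustration of that output, the sketch below models the three parts of a response. The type name `SqlQueryToolResult`, its field names, and the helper `cap_artifacts` are assumptions made for this example and do not reflect the tool's real response schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SqlQueryToolResult:
    """Hypothetical shape of the tool's response."""

    answer: str                      # synthesized natural language answer
    sql_query: str                   # SQL query that was generated and executed
    artifacts: List[Dict[str, Any]] = field(default_factory=list)  # retrieved records


def cap_artifacts(result: SqlQueryToolResult, max_rows: int) -> SqlQueryToolResult:
    """Return a copy whose artifacts hold at most max_rows records."""
    return SqlQueryToolResult(result.answer, result.sql_query, result.artifacts[:max_rows])


example = SqlQueryToolResult(
    answer="Two trips were found for New York City.",
    sql_query="SELECT city, fare FROM trips WHERE city = 'New York City'",
    artifacts=[
        {"city": "New York City", "fare": 12.5},
        {"city": "New York City", "fare": 20.0},
    ],
)
capped = cap_artifacts(example, max_rows=1)
```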

Internal details ¶

The Dataset SQL Query tool takes a user’s natural language question and leverages the configured LLM to translate it into a valid SQL query.

When the tool is first instantiated, it reads the target dataset to generate a “semantic model spec” (a structured representation of the dataset’s schema, columns, and metadata) and detects the SQL dialect of the underlying database.
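
The exact format of the semantic model spec is not described here. As a rough sketch of the idea, the example below reads a table's schema from an in-memory SQLite database and stores it in a dictionary. The function name `build_semantic_model_spec` and the dictionary layout are assumptions, and the dialect is hard-coded to `sqlite` because the example uses a SQLite connection, whereas the real tool detects it.

```python
import sqlite3
from typing import Any, Dict


def build_semantic_model_spec(conn: sqlite3.Connection, table: str) -> Dict[str, Any]:
    """Build a simplified, hypothetical spec from a SQLite table's schema."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    if not rows:
        raise ValueError(f"unknown table: {table!r}")
    return {
        "table": table,
        "dialect": "sqlite",  # hard-coded for this sketch
        "columns": [{"name": row[1], "type": row[2]} for row in rows],
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
spec = build_semantic_model_spec(conn, "trips")
```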

When the agent mode is disabled, the tool processes user queries in a straight-through pipeline:

Security Context Resolution: It determines who is executing the query to ensure the query runs with the appropriate data access permissions.
Semantic Value Matching (Optional): If an embedding LLM was configured, the tool maps terms in the user’s question to the exact categorical values stored in the database (e.g., matching “NYC” in the prompt to “New York City” in the dataset).
SQL Generation: The core LLM uses the natural language question, the semantic model spec, and the identified SQL dialect to generate a SQL query.
Validation & Execution: The generated SQL query is validated and executed.
Payload: It returns a structured response to the caller containing:
1. A synthesized natural language answer explaining the records returned.
2. The actual SQL query that was generated and executed.
3. The resulting data records (artifacts) retrieved from the database.
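
The linear pipeline above can be sketched end to end as follows. This is an illustrative approximation under stated assumptions, not the tool's implementation: `run_linear_pipeline`, `validate_sql`, `fake_matcher`, and `fake_llm` are hypothetical names, the LLM and the embedding model are replaced by hand-written stand-ins, the validation is a deliberately naive single-`SELECT` check, and SQLite stands in for the underlying database.

```python
import sqlite3
from typing import Any, Callable, Dict, Optional


def validate_sql(sql: str) -> str:
    """Naive validation for this sketch: accept a single SELECT statement only."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    return statement


def run_linear_pipeline(
    conn: sqlite3.Connection,
    question: str,
    spec: Dict[str, Any],
    generate_sql: Callable[[str, Dict[str, Any], Dict[str, str]], str],
    caller: str,
    end_user: Optional[str] = None,
    run_as_end_user: bool = False,
    match_values: Optional[Callable[[str], Dict[str, str]]] = None,
    max_rows: int = 100,
) -> Dict[str, Any]:
    # 1. Security context resolution: decide which identity runs the query.
    run_as = end_user if (run_as_end_user and end_user) else caller
    # 2. Semantic value matching (optional): map question terms to stored values.
    matches = match_values(question) if match_values else {}
    # 3. SQL generation: stand-in for the LLM call.
    sql = generate_sql(question, spec, matches)
    # 4. Validation and execution, capped at max_rows.
    statement = validate_sql(sql)
    cursor = conn.execute(statement)
    names = [column[0] for column in cursor.description]
    records = [dict(zip(names, row)) for row in cursor.fetchmany(max_rows)]
    # 5. Payload: answer, executed SQL, and retrieved records.
    return {
        "answer": f"The query returned {len(records)} row(s).",
        "sql_query": statement,
        "artifacts": records,
        "executed_as": run_as,
    }


def fake_matcher(question: str) -> Dict[str, str]:
    """Stand-in for embedding-based matching."""
    return {"NYC": "New York City"} if "NYC" in question else {}


def fake_llm(question: str, spec: Dict[str, Any], matches: Dict[str, str]) -> str:
    """Stand-in for LLM SQL generation."""
    city = matches.get("NYC", "NYC")
    return f"SELECT city, fare FROM {spec['table']} WHERE city = '{city}' ORDER BY fare;"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("New York City", 20.0), ("Boston", 9.0), ("New York City", 12.5)],
)
result = run_linear_pipeline(
    conn, "What are the fares in NYC?", {"table": "trips"}, fake_llm,
    caller="agent_owner", match_values=fake_matcher,
)
```

In this sketch the value-matching step is what turns "NYC" into the stored value "New York City" before the SQL is generated; without it, the generated filter would match no rows.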
