You are viewing the documentation for version 9.0 of DSS, which is outdated. An up-to-date version of this page might be available in the documentation for the latest version.
Distinct: get unique rows
The “distinct” recipe deduplicates the rows of a dataset by keeping only unique rows. Rows are compared on the columns you specify, and you can optionally output the number of duplicates for each unique combination. The recipe works on any DSS dataset, whether or not it is a SQL dataset, and offers visual tools to set up the specifications and aliases.
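To make these semantics concrete, here is a minimal Python sketch of what the recipe computes. It is not how DSS implements it. The function `distinct_rows`, the sample `orders` data and the `count` output column name are hypothetical, and the sketch assumes that the output keeps only the compared columns, plus the optional duplicate count.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Sequence


def distinct_rows(
    rows: Iterable[Mapping[str, object]],
    key_columns: Sequence[str],
    count_column: Optional[str] = None,
) -> List[dict]:
    """Hypothetical helper: keep one row per unique combination of key_columns.

    If count_column is given, each output row also records how many
    input rows shared that combination.
    """
    # Count each combination of the compared columns; Counter keeps first-seen order
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    result = []
    for key, n in counts.items():
        out = dict(zip(key_columns, key))
        if count_column is not None:
            out[count_column] = n  # number of duplicates for this combination
        result.append(out)
    return result


# Hypothetical sample data
orders = [
    {"customer": "alice", "country": "FR", "amount": 10},
    {"customer": "bob", "country": "US", "amount": 5},
    {"customer": "alice", "country": "FR", "amount": 7},
]

print(distinct_rows(orders, ["customer", "country"], count_column="count"))
# [{'customer': 'alice', 'country': 'FR', 'count': 2}, {'customer': 'bob', 'country': 'US', 'count': 1}]
```

Comparing on `customer` and `country` collapses the two `alice`/`FR` rows even though their `amount` differs. Adding `amount` to the compared columns would keep all three rows.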
Depending on the input dataset types, DSS will adjust the engine it uses to execute the recipe, and choose between Hive, Impala, SparkSQL, plain SQL, and internal DSS. The available engines can be seen and selected by clicking on the cog below the “Run” button.
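On the SQL-based engines, the operation corresponds conceptually to a `SELECT DISTINCT` over the compared columns, or to a `GROUP BY` with `COUNT(*)` when duplicate counts are requested. The exact query DSS generates is not described here, so treat the following only as an illustration. It uses Python's built-in `sqlite3` module with a hypothetical `orders` table and a hypothetical `row_count` alias.

```python
import sqlite3

# In-memory table standing in for a SQL input dataset (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "FR", 10), ("bob", "US", 5), ("alice", "FR", 7)],
)

# Unique combinations of the compared columns only
unique_rows = conn.execute(
    "SELECT DISTINCT customer, country FROM orders ORDER BY customer"
).fetchall()

# Unique combinations plus the number of input rows sharing each one
counted_rows = conn.execute(
    "SELECT customer, country, COUNT(*) AS row_count "
    "FROM orders GROUP BY customer, country ORDER BY customer"
).fetchall()

print(unique_rows)   # [('alice', 'FR'), ('bob', 'US')]
print(counted_rows)  # [('alice', 'FR', 2), ('bob', 'US', 1)]
conn.close()
```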