Distinct: get unique rows

The “distinct” recipe allows you to deduplicate rows in a dataset by retrieving unique rows. The rows are compared using the columns you specify. You can also choose to get the number of duplicates for each combination. It can be performed on any dataset in DSS, whether it’s a SQL dataset or not. The recipe offers visual tools to setup the specifications and aliases.

The “distinct” recipe can have pre-filters and post-filters. The filters documentation is available here.

Engines

Depending on the input dataset types, DSS will adjust the engine it uses to execute the recipe, and choose between Hive, Impala, SparkSQL, plain SQL, and internal DSS. The available engines can be seen and selected by clicking on the cog below the “Run” button.