Data preparation
Visual data preparation in DSS lets you create data cleansing, normalization and enrichment scripts in a visual and interactive way.
You can create these scripts directly in a Prepare recipe, or in a Visual Analysis that can be deployed to the Flow as a Prepare recipe.
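As a purely conceptual illustration of what such a script amounts to, the sketch below models a preparation script as an ordered list of steps applied to each row in turn. It is plain Python and does not use any DSS API: the `run_script` helper, the step functions, and the `email` / `email_domain` column names are all hypothetical, chosen only to show one cleansing step, one normalization step, and one enrichment step. In DSS you would build the equivalent steps visually from the processors library.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A row is modeled as a simple mapping of column name -> cell value (assumption).
Row = Dict[str, Optional[str]]
Step = Callable[[Row], Row]


def strip_whitespace(row: Row) -> Row:
    """Cleansing: trim surrounding whitespace from every string cell."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def lowercase_email(row: Row) -> Row:
    """Normalization: lowercase the hypothetical 'email' column."""
    out = dict(row)
    if out.get("email"):
        out["email"] = out["email"].lower()
    return out


def add_email_domain(row: Row) -> Row:
    """Enrichment: derive a new 'email_domain' column from 'email'."""
    out = dict(row)
    email = out.get("email") or ""
    out["email_domain"] = email.split("@", 1)[1] if "@" in email else None
    return out


def run_script(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply every step, in order, to every row."""
    result = []
    for row in rows:
        for step in steps:
            row = step(row)
        result.append(row)
    return result


# The "script" is just the ordered list of steps.
SCRIPT: List[Step] = [strip_whitespace, lowercase_email, add_email_domain]

sample = [
    {"name": "  Ada ", "email": " Ada@Example.COM "},
    {"name": "Bob", "email": None},
]
prepared = run_script(sample, SCRIPT)
```

The order of the steps matters: here the domain is extracted only after the e-mail address has been trimmed and lowercased, in the same way that later steps in a preparation script see the output of earlier ones.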
Note
For a step-by-step introduction to the data preparation component of Data Science Studio, we recommend that you follow our Basic Courses. This section focuses on advanced and reference topics related to the data preparation component.
We also have courses on advanced data preparation.
- How to Copy Prepare Recipe Steps
- Sampling
- Execution engines
- Processors reference
- Extract from array
- Fold an array
- Sort array
- Concatenate JSON arrays
- Discretize (bin) numerical values
- Change coordinates system
- Copy column
- Rename columns
- Concatenate columns
- Delete/Keep columns by name
- Column Pseudonymization
- Count occurrences
- Convert currencies
- Create if, then, else statements
- Extract date elements
- Compute difference between dates
- Format date with custom format
- Parse to standard date format
- Split e-mail addresses
- Enrich from French department
- Enrich from French postcode
- Enrich with build context
- Enrich with record context
- Extract ngrams
- Extract numbers
- Fill column
- Fill empty cells with fixed value
- Filter rows/cells on date
- Filter rows/cells with formula
- Filter invalid rows/cells
- Filter rows/cells on numerical range
- Filter rows/cells on value
- Find and replace
- Flag rows/cells on date range
- Flag rows with formula
- Flag invalid rows
- Flag rows on numerical range
- Flag rows on value
- Fold multiple columns
- Fold multiple columns by pattern
- Fold object keys
- Formula
- Fuzzy join with other dataset (memory-based)
- Generate Big Data
- Compute distance between geopoints
- Extract from geo column
- Geo-join
- Resolve GeoIP
- Create area around a geopoint
- Create GeoPoint from lat/lon
- Extract lat/lon from GeoPoint
- Extract with grok
- Flag holidays
- Split invalid cells into another column
- Join with other dataset (memory-based)
- Extract with JSONPath
- Group long-tail values
- Compute the average of numerical values
- Translate values using meaning
- Normalize measure
- Merge long-tail values
- Move columns
- Negate boolean value
- Force numerical range
- Generate numerical combinations
- Convert number formats
- Nest columns
- Unnest object (flatten JSON)
- Extract with regular expression
- Pivot
- Python function
- Split HTTP Query String
- Remove rows where cell is empty
- Round numbers
- Simplify text
- Split and fold
- Split and unfold
- Split column
- Switch case
- Transform string
- Tokenize text
- Transpose rows to columns
- Triggered unfold
- Unfold
- Unfold an array
- Convert a UNIX timestamp to a date
- Fill empty cells with previous/next value
- Split URL (into protocol, host, port, …)
- Classify User-Agent
- Generate a best-effort visitor id
- Zip JSON arrays
- Filtering and flagging rows
- Managing dates
- Reshaping
- Geographic processors