Data preparation
Data preparation in DSS lets you build data cleansing, normalization, and enrichment scripts visually and interactively.
You can create these scripts directly in a Prepare recipe, or in a Visual Analysis that can be deployed to the Flow as a Prepare recipe.
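As a rough illustration of what such a script amounts to, the sketch below models it as an ordered list of steps applied to every row: one cleansing step, one normalization step, and one enrichment step. This is plain Python, not the DSS API. The step functions, the `email` and `email_domain` column names, and the `run_script` helper are all hypothetical and exist only for this example.

```python
# Illustrative sketch only: plain Python, NOT the DSS API.
# All function and column names below are hypothetical.
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
Step = Callable[[Row], Row]


def strip_whitespace(row: Row) -> Row:
    """Cleansing: trim surrounding whitespace from every string cell."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def lowercase_email(row: Row) -> Row:
    """Normalization: lowercase the (hypothetical) 'email' column."""
    email = row.get("email")
    return {**row, "email": email.lower() if email else email}


def add_email_domain(row: Row) -> Row:
    """Enrichment: derive a new 'email_domain' column from 'email'."""
    email = row.get("email") or ""
    domain = email.split("@", 1)[1] if "@" in email else None
    return {**row, "email_domain": domain}


def run_script(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step, in order, to every row."""
    out = []
    for row in rows:
        for step in steps:
            row = step(row)
        out.append(row)
    return out


# The "script" is simply the ordered list of steps.
SCRIPT = [strip_whitespace, lowercase_email, add_email_domain]

sample = [{"name": "  Ada ", "email": " Ada@Example.COM "}]
prepared = run_script(sample, SCRIPT)
print(prepared)
```

The order of the steps matters here: the domain is extracted only after the address has been trimmed and lowercased, so the derived column is already normalized.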
Note
For a step-by-step introduction to the data preparation component of Data Science Studio, we recommend following our quick start for data preparation.
- How to Copy Prepare Recipe Steps
- Sampling
- Execution engines
- Processors reference
  - Extract from array
  - Fold an array
  - Sort array
  - Concatenate JSON arrays
  - Discretize (bin) numerical values
  - Change coordinates system
  - Copy column
  - Rename columns
  - Concatenate columns
  - Delete/Keep columns by name
  - Column Pseudonymization
  - Count occurrences
  - Convert currencies
  - Create if, then, else statements
  - Extract date elements
  - Compute difference between dates
  - Format date with custom format
  - Parse to standard date format
  - Split e-mail addresses
  - Enrich from French department
  - Enrich from French postcode
  - Enrich with build context
  - Enrich with record context
  - Extract ngrams
  - Extract numbers
  - Fill column
  - Fill empty cells with fixed value
  - Impute with computed value
  - Filter rows/cells on date
  - Filter rows/cells with formula
  - Filter invalid rows/cells
  - Filter rows/cells on numerical range
  - Filter rows/cells on value
  - Find and replace
  - Flag rows/cells on date range
  - Flag rows with formula
  - Flag invalid rows
  - Flag rows on numerical range
  - Flag rows on value
  - Fold multiple columns
  - Fold multiple columns by pattern
  - Fold object keys
  - Formula
  - Fuzzy join with other dataset (memory-based)
  - Generate Big Data
  - Compute distance between geospatial objects
  - Extract from geo column
  - Geo-join
  - Resolve GeoIP
  - Create area around a geopoint
  - Create GeoPoint from lat/lon
  - Extract lat/lon from GeoPoint
  - Extract with grok
  - Flag holidays
  - Split invalid cells into another column
  - Join with other dataset (memory-based)
  - Extract with JSONPath
  - Group long-tail values
  - Compute the average of numerical values
  - Translate values using meaning
  - Normalize measure
  - Merge long-tail values
  - Move columns
  - Negate boolean value
  - Force numerical range
  - Generate numerical combinations
  - Convert number formats
  - Nest columns
  - Unnest object (flatten JSON)
  - Extract with regular expression
  - Pivot
  - Python function
  - Split HTTP Query String
  - Remove rows where cell is empty
  - Round numbers
  - Simplify text
  - Split and fold
  - Split and unfold
  - Split column
  - Switch case
  - Transform string
  - Tokenize text
  - Transpose rows to columns
  - Triggered unfold
  - Unfold
  - Unfold an array
  - Convert a UNIX timestamp to a date
  - Fill empty cells with previous/next value
  - Split URL (into protocol, host, port, …)
  - Classify User-Agent
  - Generate a best-effort visitor id
  - Zip JSON arrays
- Filtering and flagging rows
- Managing dates
- Reshaping
- Geographic processors