Dataiku DSS
You are viewing the documentation for version 11 of DSS.
Merge long-tail values
This processor merges the values of a column whose number of appearances is below a certain threshold into a single value.
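To illustrate the idea, here is a minimal plain-Python sketch of merging long-tail values. It is not the DSS implementation. The function name `merge_long_tail`, the parameter names, the "strictly fewer than `min_count`" comparison and the default replacement label `"Others"` are all assumptions made for this example.

```python
from collections import Counter
from typing import List


def merge_long_tail(values: List[str], min_count: int, replacement: str = "Others") -> List[str]:
    """Replace every value that appears fewer than `min_count` times.

    Hypothetical helper: the threshold semantics (strictly below) and the
    replacement label are assumptions, not documented DSS behavior.
    """
    # Count how many times each distinct value appears in the column.
    counts = Counter(values)
    # Keep frequent values as-is; merge rare ("long-tail") ones into one label.
    return [v if counts[v] >= min_count else replacement for v in values]


if __name__ == "__main__":
    colors = ["red", "red", "blue", "red", "green", "blue", "teal"]
    print(merge_long_tail(colors, min_count=2))
```

With `min_count=2`, `"green"` and `"teal"` each appear only once, so both are merged into `"Others"`, while `"red"` (3 appearances) and `"blue"` (2 appearances) are kept.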