Formula language¶
DSS includes a language to write formulas, much like a spreadsheet.
Formulas can be used:
In data preparation, to create new columns, filter rows or flag rows
More generally, to filter rows in many places of DSS:
In the Filtering recipe, to filter rows
In Machine Learning, to define the extracts to use for train and test set
In the Python and Javascript APIs, to obtain partial extracts from the datasets
In the Public API, to obtain partial extracts from the datasets
In the grouping, window, join and stack recipes, to perform pre and post filtering
Basic usage¶
Formulas define an expression that applies row per row.
Assuming that you have a dataset with columns N1 (numeric), N2 (numeric) and S (string) , here are a few example formulas:
2 + 2
N1 + N2
min(N1, N2)
# Returns the smallest of N1 and N2replace(S, 'old', 'new')
# Returns the value of S with ‘old’ replaced by ‘new’if (N1 > N2, 'big', 'small')
# Returns big if N1 > N2, small otherwise
Note
We also have a How-To that lists common use cases with examples.
Reading column values¶
In almost all formulas, you’ll need to read the values of the columns for the current row.
When the column has a “simple” name (i.e: starting by a letter, contains only letters, numbers and underscores), you just need to use the name of the column in the formula: N1 + 4
For other cases, you can use:
val("column with spaces")
: returns the value of “column with spaces”strval("column with spaces")
: returns the value of “column with spaces” as a stringnumval("column with spaces")
: returns the value of “column with spaces” as a number
Warning
mycolumn
evaluates to the value of the column named mycolumn
Thus, strval(mycolumn)
would not work because it would try to read as a string the value of
the column whose name is the value of the mycolumn column.
Instead, use strval("mycolumn")
.
You may also need to use strval
to avoid DSS automatically typing a numerical variable. See Variables typing and autotyping for more details.
Accessing values from previous rows¶
The val
, strval
, and numval
functions optionally accept an offset argument. This allows you to access values from previous rows in the dataset. See Value access functions for more details.
Note
The offset argument is only available in the Prepare recipe and only with the DSS engine.
Variables typing and autotyping¶
Variables in the formula language can have one of the following types: string, integer, decimal, array, object, boolean, date
Regardless of the schema of the dataset, or the meanings of the columns (in the case of working on dynamically-generated columns), the following rules are applied:
If column values are “standard” decimal, they are automatically parsed to decimal
If column values are “standard” integer, they are automatically parsed to integer
Else, they are kept as string
Columns containing dates, arrays, objects or booleans are not automatically converted to the matching Formula type. However:
all functions that require an array will automatically attempt to convert a string input to array (using the regular DSS JSON syntax)
all functions that require a date will automatically attempt to convert a string input to date (using the ISO-8601 format)
In other words, if the column “begin_date” is a date (and thus contains properly-formatted ISO-8601):
type(begin_date)
returns “string”inc(begin_date, 2, "days")
works as expected because the inc function performed automatic conversion to date
Avoiding auto-typing¶
There are some cases where auto-typing is not a good thing. For example, a value like 012345
would be automatically converted to the number 12345
.
To avoid autotyping to numerical, use the
strval("mycolumn")
function (see above)To manually obtain an array or object, use the
parseJson
functionTo manually obtain a number, use the
numval("mycolumn")
functionTo manually obtain a date, use the
asDate
functionTo manually obtain a boolean, use the
asBool
function
Boolean values¶
The formula language uses “True” and “False”, with quotes, as boolean values for true and false.
Operators¶
The formula language supports the classical arithmetic operators:
Regular math operators:
+
,-
,*
,/
Comparison operators (evaluate to booleans):
>
,>=
,<
,<=
,==
,!=
Arithmetic operators:
//
(integer division) and%
(modulo)Boolean operators:
&&
,||
The
+
operator also performs string concatenation.The comparison operators (and only them) can operate on dates. For dates arithmetic, see the
diff
andinc
functions.
Array and object operations¶
Formula support accessing array elements and object keys using the traditional Python/Javascript syntax:
array[0]
object["key"]
object.key
(only valid if ‘key’ is a valid identifier, i.e. matches[A-Za-z0-9_]*
)
Note: this requires that you actually have an array or object. You might need to use the parseJson
function (see above paragraph about typing)
Object notations¶
For all functions, you can use them the “regular” way: replace(str, 'a', 'b')
or in the “object” way: str.replace('a', 'b')
In object notation, the first argument to the function is replaced by its ‘context’.
For example, the two syntaxes are equivalent:
length(trim(replace(foo, 'a', 'b')))
foo.replace('a', 'b').trim().length()
DSS variables¶
You can retrieve the value of DSS variables via the syntax ${variable_name}
or variables["variable_name"]
. Both syntaxes have slightly different behaviors.
variables[“variable_name”]¶
DSS evaluates variables["variable_name"]
as JSON during the formula evaluation. If there is no variable defined with the supplied name, an empty cell is returned.
For example, given the following global variables
{
"max_height": 500,
"warning_msg": "too tall",
"msgs": {"ok":"OK", "over":"too tall"}
}
The formula if(height < variables["max_height"], "OK", variables["warning_msg"])
is evaluated as if(height < 500.0, "OK", "too tall")
. Note that the max_height is evaluated as 500.0 as JSON consider all numbers to be decimal.
The formula if(height < variables["max_height"], "OK", variables["msgs"]["over"])
is evaluated as if(height < 500.0, "OK", "too tall")
.
${variable_name}¶
During processing DSS replaces the ${variable_name}
placeholder with its verbatim content and then evaluate the formula. If there is no variable defined with the supplied name, an error is returned.
For example, given the following global variables:
{
"max_height": 500,
"warning_msg": "too tall",
"msgs": {"ok":"OK", "over":"too tall"}
}
The formula if(height < ${max_height}, "OK", "${warning_msg}")
is converted into if(height < 500, "OK", "too tall")
before being evaluated. Note that quotes have been added around ${warning_msg} to output an actual string.
The formula if(height < ${max_height}, "OK", ${warning_msg})
is converted into if(height < 500, "OK", too tall)
before being evaluated and will therefore issue a syntax error.
The formula if(height < ${max_height}, "OK", ${msgs}["over"])
is converted into if(height < 500, "OK", {"ok":"OK", "over":"too tall"}["over"])
before being evaluated and will therefore issue a syntax error.
The formula if(height < ${max_height}, "OK", ${msgs.over})
will throw a syntax error as there is no variable named “msgs.over”.
The formula if(height < ${max_height}, "OK", parseJson('${msgs}')["over"])
is converted into if(height < 500, "OK", parseJson('{"ok":"OK", "over":"too tall"}')["over"])
before being evaluated and will be evaluated as if(height < 500.0, "OK", "too tall")
.
Array functions¶
arrayContains(array a, item) boolean
Returns whether the array
a
containsitem
arrayContains([1, 2, 3], 5)
returnsfalse
arrayDedup(array a) array
Returns array a with duplicates removed
arrayDedup([0, 1, 0, 7])
returns[0,1,7]
arrayIndexOf(array a, item) int
Returns the index (starting from 0) of
item
in the arraya
, or -1 if the item is not found in the array.
arrayIndexOf([1 , 0], 2)
returns-1
arrayLen(array a) int
Returns the length of the array
a
arrayLen([1,2,3])
returns3
arrayReverse(array a) array
Reverses array a
arrayReverse([1,2,3])
returns[3,2,1]
arraySort(array a) array
Sorts array
a
arraySort(["b", "1", "a"])
returns["1","a","b"]
get(array a, from index, [to index]) Output depends on arguments
Returns
o[from, to]
. Omitting the optional number to will return what is at the first index specified.
get([1,2,3,4,5], 1, 4)
returns[2,3,4]
get([1,2,3,4,5], 1)
returns2
join(array a, string sep) string
Returns the string obtained by joining the elements of array
a
with the separatorsep
For example, the expression
join(date_elements, '-')
on a columndate_elements
produces:
date_elements
output
[2007, 7, 15]
2007-7-15
[2016, 1, 8]
2016-1-8
objectKeys(object o) array
Returns the keys of an object as an array
objectValues(object o) array
Returns the values of an object as an array
slice(object o, from index, [to index]) Output depends on arguments
If
o
is an array, returns an array containing the items betweenfrom
andto
(excluded). Ifo
is a string, returns the part of the string betweenfrom
andto
(excluded). Omitting the optional number to will return the remainder of the string or array. Note:slice
andsubstring
are identical.
slice([1,2,3,4,5], 1, 4)
returns[2,3,4]
slice('hello', 1)
returns'ello'
substring(object o, from index, [to index]) Output depends on arguments
If
o
is a string, returns the part of the string betweenfrom
andto
(excluded). Ifo
is an array, returns an array containing the items betweenfrom
andto
(excluded). Omitting the optional number to will return the remainder of the string or array. Note:substring
andslice
are identical.
substring('0123456', 2, 5)
returns'234'
substring([1,2,3,4,5,6], 3)
returns[4,5,6]
Boolean functions¶
asBool(o) boolean
Returns
o
converted to a boolean.
asBool(0)
returnsfalse
asBool('yes')
returnstrue
isFalse(boolean b) boolean
Returns whether
b
is false.
isFalse('false')
returnsfalse
(the argument is a string, not a boolean, so this method returnsfalse
)
isFalse(asBool(0))
returnstrue
isTrue(boolean b) boolean
Returns whether
b
is true.
isTrue('true')
returnsfalse
(the argument is a string, not a boolean, so this method returnsfalse
)
isTrue(asBool(1))
returnstrue
and(boolean a, boolean b) boolean
Evaluates logical AND (conjunction) on several statements. All conditions need to be fulfilled for the function to return true. Is equivalent to
a && b
.
and(1==1, 3<4)
returnstrue
not(boolean b) boolean
Evaluates logical NOT (negation) of a statement, returning the opposite of
b
.
not(1!=1)
returnsfalse
or(boolean a, boolean b) boolean
Evaluates logical OR (disjunction) on several statements. At least one condition needs to be fulfilled for the function to return true. Is equivalent to
a || b
.
or(1==1, 3>4)
returnstrue
Date functions¶
asDate(object o, [format1, …]) date
Returns
o
converted to a date, based on the format(s) provided. If your column contains more than one date format, you can give an ordered list of possible formats. If you don’t give a format, ISO-8601 is used by default (see more examples of the syntax for ISO-8601). Here is some example syntax:
‘y’ (Year)
‘M’ (month in year)
‘w’ (week in year)
‘W’ (week in month)
‘d’ (day in month)
‘D’ (day in year)
‘E’ (day name in week)
‘e’ (day number in week. Monday =1)
‘H’ (hour in day, 0-23)
‘m’ (minute)
‘s’ (second)
S (millisecond)
asDate('2020-04-15', 'yyyy-MM-dd')
returns2020-04-15T00:00:00.000Z
datePart(date d, string part, [timezone]) Output depends on arguments
Extracts a date component from
date d
. Returned date components are always in local timezone–unless the optionaltimezone
is included–and in English (see a complete list of supported timezones.). The available components are:
‘years’ (or ‘year’)
‘isoWeekYear’ gives the ISO week-numbering year
‘months’ (or ‘month’)
‘weekOfYear’ gives the week of the year (as per server locale’s convention)
‘isoWeekOfYear’ gives the ISO week number of the year
‘weeks’ (or ‘week’ or ‘w’)
‘isoWeekOfMonth’ gives the ISO week number of the month
‘days’ (or ‘day’ or ‘d’)
‘weekday’ (the name of the day of the week, capitalized)
‘dayofweek’ (day of the week as a number. Monday=1, Tuesday=2, … Sunday=7)
‘hours’ (or ‘hour’ or ‘h’)
‘minutes’ (or ‘minute’ or ‘min’)
‘seconds’ (or ‘second’ or ‘s’)
‘unixTime’ (number of seconds since epoch)
‘millisecond’ (or ‘millisecond’, or ‘ms’)
‘time’ (number of milliseconds since epoch)
datePart('2020-04-15T00:00:00.000Z', 'weekday')
returns"Wednesday"
datePart('2020-01-01T00:00:00.000Z', 'year', '-08:00')
returns2019
diff(date d1, date d2, [string unit]) number
Returns the difference between two dates expressed in given time units. The default unit of time is
days
. The available units are:
‘years’ (or ‘year’)
‘months’ (or ‘month’)
‘weeks’ (or ‘week’ or ‘w’)
‘days’ (or ‘day’ or ‘d’)
‘hours’ (or ‘hour’ or ‘h’)
‘minutes’ (or ‘minute’ or ‘min’)
‘seconds’ (or ‘second’ or ‘s’)
diff('2019-03-15T00:00:00.000Z', '2020-04-15T00:00:00.000Z', 'month')
returns-13
inc(date d, number value, string unit) date
Returns
d
incremented or decremented by the number value in the specified unit of time; a decimal number value will be truncated to an integer. The available units are:
‘years’ (or ‘year’)
‘months’ (or ‘month’)
‘weeks’ (or ‘week’ or ‘w’)
‘days’ (or ‘day’ or ‘d’)
‘hours’ (or ‘hour’ or ‘h’)
‘minutes’ (or ‘minute’ or ‘min’)
‘seconds’ (or ‘second’ or ‘s’)
inc('2020-04-15T00:00:00.000Z', -3, 'week')
returns2020-03-25T00:00:00.000Z
now() date
Returns the current time
trunc(date d, string unit) date
Returns
d
truncated to the unit specified. The available units are:
‘years’ (or ‘year’)
‘months’ (or ‘month’)
‘weeks’ (or ‘week’ or ‘w’)
‘days’ (or ‘day’ or ‘d’)
‘hours’ (or ‘hour’ or ‘h’)
‘minutes’ (or ‘minute’ or ‘min’)
‘seconds’ (or ‘second’ or ‘s’)
trunc('2020-04-03T07:47:45.245Z', 'month')
returns2020-04-01T00:00:00.000Z
Math functions¶
abs(number d) number
Returns the absolute value of a number
abs(-7)
returns7.0
acos(number d) number
Returns the arc cosine of an angle, in the range 0 through PI
acos(-1)
returns3.141592654
asin(number d) number
Returns the arc sine of an angle in the range of -PI/2 through PI/2
asin(1)
returns1.570796327
atan(number d) number
Returns the arc tangent of an angle in the range of -PI/2 through PI/2
atan2(number x, number y) number theta
Converts rectangular coordinates (x, y) to polar (r, theta)
avg(number n1, number n2, …) number
Returns the average of a series of numbers, including numeric strings (e.g., “4.5”), while ignoring empty elements.
avg(1, "2", 3, "")
returns2.0
ceil(number n) number
Returns the ceiling of a number
ceil(4.67)
returns5
ceil(-4.67)
returns-4
combin(number n, number k) number
Returns the number of combinations for n elements divided into groups of k, n!/k!(n-k)!
combin(6, 2)
returns15
cos(number d) number
Returns the trigonometric cosine of an angle
cos(0)
returns1.0
cosh(number d) number
Returns the hyperbolic cosine of a value
cosh(0)
returns1.0
dec2hex(long) string
Returns an hexadecimal representation of the input number
dec2hex(10)
returnsa
dec2hex(256)
returns100
degrees(number d) number
Converts an angle from radians to degrees.
degrees(PI())
returns180.0
even(number n) number
Rounds the number up to the nearest even integer
even(3)
returns4.0
even(2.3)
returns4.0
even(-2.3)
returns-2.0
even(-3)
returns-2.0
exp(number n) number
Returns the exponential of a number
exp(2)
returns7.38905609893065
fact(number i) number
Returns the factorial of a number i
fact(4)
returns24
factn(number i, number d) number
Returns the factorial of a number i, omitting every dth item from the multiplication
factn(7, 3)
returns28
floor(number d) number
Returns the floor of a number
floor(4.7)
returns4
gcd(number d, number e) number
Returns the greatest common denominator of the two numbers
gcd(21, 28)
returns7.0
hash(string) long
Returns a 64 bits numerical hash of the input (not crypto-secure)
hash("goo")
returns2774880816139997119
hex2dec(string) long
Returns a decimal representation of an hexadecimal string
hex2dec("a")
returns10
lcm(number d, number e) number
Returns the least common multiple of the two numbers
lcm(20, 42)
returns420.0
ln(number n) number
Returns the natural log of a number
ln(exp(1))
returns1.0
ln(2.72)
returns1.000631880307906
log(number n) number
Returns the base 10 log of a number
log(100)
returns2.0
max(a, b, …) Output depends on arguments
Returns the greater of two or more numbers, two or more strings, or the more recent of two or more dates
max(-1, 3)
returns3
max('luke', 'leia')
returns'luke'
max('2020-01-01T00:00:00.000Z', '2021-01-01T00:00:00.000Z')
returns2021-01-01T00:00:00.000Z
min(a, b, …) number
Returns the smaller of two or more numbers, two or more strings, or older of two or more dates
min(-1, 3)
returns-1.0
min('luke', 'leia')
returns'leia'
min('2020-01-01T00:00:00.000Z', '2021-01-01T00:00:00.000Z')
returns2020-01-01T00:00:00.000Z
mod(number a, number b) number
Returns a modulus b
mod(5, 3)
returns2
mod(7.8, 3)
returns1
multinomial(number d1, number d2 …) number
Returns the multinomial of a series of numbers,
(sum(d1, d2, d3,...))! / d1! * d2! * d3! ...
multinomial(1, 1, 2, 1)
returns60
odd(number d) number
Rounds the number up to the nearest odd integer
odd(5.3)
returns7.0
odd(-5.3)
returns-5.0
PI() number
Returns the value of PI
pow(number a, number b) number
Returns
a
to the power ofb
pow(2, -1)
returns0.5
quotient(number numerator, number denominator) number
Returns the integer portion of a division
quotient(7, 2)
returns3.0
radians(number d) number
Converts an angle in degrees to radians
radians(180)
returns3.141592653589793
rand([int min], [int max]) double or int
Without arguments, returns a random float between 0 and 1. With
min
andmax
arguments, returns a random integer betweenmin
(inclusive) andmax
(exclusive)
round(number n) number
Returns the rounding of number to the nearest integer
round(3.5)
returns4.0
round(-3.5)
returns-3.0
sin(number d) number
Returns the trigonometric sine of an angle
sin(radians(90))
returns1.0
sinh(number d) number
Returns the hyperbolic sine of an angle
sinh(0)
returns0.0
sqrt(number n) number
Returns the square root of number n.
sqrt(81)
returns9.0
sum(array a) number
Sums the numbers of an array. Skips non-number elements from the array.
sum([1, 2, "string", 3])
returns6.0
tan(number d) number
Returns the trigonometric tangent of an angle
tan(0)
returns0.0
tanh(number d) number
Returns the hyperbolic tangent of a value
toNumber(o) number
Returns o converted to a number
toNumber("5")
returns5
Object functions¶
get(object o, string field, [string defaultValue]) Output depends on arguments
Returns
o[from]
. Ifo[from]
is empty anddefault
has been provided, returnsdefault
.
get(parseJson('{"name":"joe", "age":42}'), "age")
returns42
get(parseJson('{"name":"joe"'), "age", 37)
returns37
hasField(object o, string name) boolean
Returns whether
o
has the fieldname
hasField(parseJson('{""name"": ""joe"", ""age"": 42 }'), ""age"")
returnstrue
htmlAttr(Element e, String s) string
Selects a value from an attribute on an Html Element
htmlText(Element e) string
Selects the text from within an element (including all child elements)
htmlAttr(select(parseHtml('<div><a href="www.dataiku.com"></div>'), '[href]')[0], "href")
returns"www.dataiku.com"
innerHtml(Element e) string
The innerHtml of an HTML element
jsonize(value) JSON literal value
Quotes a value as a JSON literal value
objectDel(object o, key, [key…]) object
Removes one or several keys from an object and returns it. The keys must not be null
objectDel(parseJson('{"firstName": "birdie", "company": "Dataiku" }'), 'firstName')
returns{"company":"Dataiku"}
objectNew(k1, v1, k2, v2, …) object
Creates a new object, optionally pre-filled with key/values. Must get an even number of arguments, as successive key-value pairs. Giving 0 arguments is possible and will return an empty object (you can use objectPut to add to it)
objectNew("firstName", "birdie", "company", "Dataiku")
returns{"firstName": "birdie", "company": "Dataiku" }
objectPut(object o, key, value) object
Adds a key/value pair to an object and returns it. The key must not be null
ownText(Element e) string
Gets the text owned by this HTML element only; does not get the combined text of all children.
parseHtml(string s) HTML object
Parses a string as HTML
parseJson(string s) object or array
Parses a JSON string as an object or array
select(Element e, String s) HTML Elements
Selects an element from an HTML elementn using selector syntax
type(object o) string
Returns the type of
o
type(3.126)
returnsnumber
String functions¶
chomp(string s string tail) string
Removes
tail
from the end ofs
if it’s there, otherwise leaves it alone.
chomp("foobar", "bar")
returns"foo"
coalesce(value1, value2, …) Output depends on arguments
Returns the first non-empty value.
coalesce("", "foo")
returns"foo"
concat((object a1, [object a2, …])) Output depends on arguments
Returns a string of concatenated values. If one of your columns contains leading 0 (zeros), wrap it in a
strval('column_name')
to preserve them.
concat("Birds", " ", "fly")
returns"Birds fly"
concat(1, 2, 3)
returns"123"
contains(string s, string frag) boolean
Returns whether
s
containsfrag
contains("hello world ", "llo")
returnstrue
endsWith(string s, string tail) boolean
Returns whether
s
ends withtail
endsWith("hello world", "rld")
returnstrue
escape(string s, string mode) string
Escapes a
s
using the escaping mode specified. Supported modes: ‘html’, ‘xml’, ‘csv’, ‘url’, javascript’. Note that the escape uses the standard java URLEncoder for urls and StringEscapeUtils for others
format(string format, object… args) string
Formats a string using printf-like formatting using the Java Formatter syntax.
'%4d-%02d'.format(2004,2)
returns'2004-02'
fromBase64(string s, [string charset]) string
Returns the string whose Base64 representation is given. By default, the string is read using the UTF-8 charset.
fromBase64('SA==')
returns"H"
get(string s, from index, [to index]) Output depends on arguments
Returns
s.substring(from, to)
. Omitting the optional number to will return what is at the first index specified.
get('Oh no, kittens!', 0, 5)
returns'Oh no'
get('Oh no, kittens!', 1)
returns'h'
indexOf(string s, string sub) number
Returns the index of the first occurrence of
sub
ins
. Index begins at 0. Returns -1 if there is no such occurrence.
indexOf("hello world", "world")
returns6
lastIndexOf(string s, string sub) number
Returns the index of the last occurrence of
sub
ins
. Index begins at 0. Returns -1 if there is no such occurrence.
lastIndexOf("hello world", "o")
returns7
length(array or string o) number
Returns the length of o
length("hello world")
returns11
length([4,5,6])
returns3
match(string a, string or regexp) array of strings
Returns an array of the matching groups found in
s
. Groups are designated by () within the specified string or regular expression.
match('hello world', 'he(.*)wo(rl)d')
returns["llo ","rl"]
md5(string s) string
Returns the MD5 hash of a string
md5('hi')
returns"49f68a5c8493ec2c0bf489821c21fc3b"
partition(string s, string or regex frag, [boolean omitFragment]) array
Returns an array of strings
[a,frag,b]
wherea
is the part before the first occurrence offrag
ins
andb
is the part after the occurrence. IfomitFragment
is true,frag
is not returned in the array.
partition(""hello"", ""he"")
returns["""",""he"",""llo""]
partition(""hello"", ""he"", asBool(1))
returns["""",""llo""]
partition(""hello"", /.l/)"
returns[""h"",""el"",""lo""]
replace(string s, string or regex f, string replacement) string
Replaces all occurrences of substring / regex
f
found in strings
with thereplacement
string.
replace('hello world', 'hel', 'a')
returns"alo world"
replace(""Oh my!"", /\w/, 'x')
returns"xx xx!"
replaceChars(string s, string f, string r) string
Returns the string obtained by replacing all character in
s
that matchf
with the character inr
at that same position. The function can be used to delete characters by replacing them with nothing.
replaceChars('abcba', 'bc', 'BC')
returnsaBCBa
replaceChars('abcba', 'bc', 'Z')
returnsaZZa
rpartition(string s, string or regex frag, [boolean omitFragment]) array
Returns an array of strings
[a,frag,b]
wherea
is the part before the last occurrence offrag
ins
andb
is the part after the occurrence. IfomitFragment
is true,frag
is not returned in the array.
rpartition("hello world", "o")
returns["hello w","o","rld"]
sha1(string s) string
Returns the SHA-1 hash of a string
sha1('goo')
returns3f95edc0399d06d4b84e7811dd79272c69c8ed3a
sha256(string s) string
Returns the SHA-256 hash of a string
sha512(string s) string
Returns the SHA-512 hash of a string
split(string s, string or regex sep, [boolean preserveAllTokens]) array
Returns the array of strings obtained by splitting
s
with separatorsep
. IfpreserveAllTokens
is true, then empty segments are preserved.
split(""hello"", ""he"")
returns[""llo""]
split(""hello"", ""he"", asBool(1))
returns["""",""llo""]
split(""hello"", /.l/)
returns[""h"",""lo""]
split("Hello world!", /\s+/)
returns["Hello","world!"]
splitByCharType(string s) array
Returns an array of strings obtained by splitting s grouping consecutive chars by their unicode type
splitByCharType("Hello_world 101!?!")
returns["H","ello","_","world"," ","101","!?!"]
splitByLengths(string s, number length1, […]) array
Returns an array of strings obtained by splitting
s
into substrings, each with its own specified length.
splitByLengths("Hello world", 1, 2, 3, 4)
returns["H","el","lo ","worl"]
startsWith(string s, string sub) boolean
Returns whether
s
starts withsub
startsWith("Hello world", "He")
returnstrue
strip(string s) string
Returns copy of the string, with leading and trailing whitespace omitted. This function is the same as trim.
strip(" Hello World ")
returns"Hello World"
toBase64(string s, [string charset]) string
Returns the Base64 representation of a string. By default, the string is written using the UTF-8 charset.
toBase64("H")
returns"SA=="
toLowercase(string s) string
Converts a string to lowercase
toLowercase("HELLO WORLD")
returns"hello world"
toString(o, string format (optional)) string
Returns o converted to a string
toString(5)
returns"5"
toTitlecase(string s) string
Converts a string to titlecase
toTitlecase("hello world")
returns"Hello World"
toUppercase(string s) string
Converts a string to uppercase
toUppercase("hello world")
returns"HELLO WORLD"
trim(string s) string
Returns copy of the string, with leading and trailing whitespace omitted.
trim(" Hello World ")
returns"Hello World"
unescape(string s, string mode) string
Unescapes all escaped parts of the string depending on the given escaping mode. Available modes: ‘html’, ‘xml’, ‘csv’, ‘url’, ‘javascript’
unicode(string s) string
Returns the input string as an array of the unicode codepoints (numbers)
unicode("Hi!")
returns[72,105,33]
unicodeType(string s) string
Returns an array of strings describing each character of the input string in their full unicode notation
unicodeType("y 0H?")
returns["lowercase letter","space separator","decimal digit number","uppercase letter","other punctuation"]
uuid() string
Returns a type 4 (pseudo randomly generated) UUID. The UUID is generated using a cryptographically strong pseudo random number generator.
uuid()
returns7af14645-bcd9-4af2-bca8-390fc67e9a0b
Geometry functions¶
geoBuffer(geometry geom, double distance, [int quadrantSegment]) string
Returns a geometry that represents all points whose distance from this geometry is less than or equal to
distance
. Thedistance
parameter can be either a value or a column name from the dataset. The distance unit depends on the CRS of the given geometries (e.g., degrees for SRID=4326). A negative distance can be used with polygons, which will shrink the polygon rather than expand it. Specify an optionalquadrantSegment
value to set the number of segments to approximate a quarter circle (the default is 8)
geoContains(geometry geomA, geometry geomB) bool
Compute the boolean {geomA contains geomB} for two input geometries A and B.
The implementation of this function depends on the engine it is run on but its definition is standard: “Geometry A contains Geometry B if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.” (https://postgis.net/docs/ST_Contains.html)
A polygon A does not contain a point B if B is exactly on its boundary.
geoContains("POLYGON((0 0,3 0,0 3,0 0))", "POINT(1 1)")
returnstrue
geoContains("POLYGON((0 0,3 0,0 3,0 0))", geometry)
returns the evaluation of the boolean predicate for each input geometry row of dataset column geometry
geoEnvelope(geometry geom) string
Returns the envelope of a geometry, i.e., the minimum bounding box containing this geometry.
geoMakeValid(geometry geom) string
Returns a valid representation of an invalid geometry. Valid geometries remain unchanged.
Note
This function can return slightly different results whether the recipe engine is PostgreSQL or a local stream. If the method is unable to make a geometry valid, it returns an empty cell.
geoSimplify(geometry geom, double toleranceDistance) string
Returns a simpler geometry based on the Douglas-Peucker algorithm, with respect to a non-negative
toleranceDistance
parameter. The simplification of the geometry will result in a geometry with fewer vertices. A vertex is removed only if the distance between this vertex and the edge resulting from removing this vertex is within the specifiedtoleranceDistance
. ThetoleranceDistance
parameter can be either a value or a column name from the dataset. The toleranceDistance unit depends on the CRS of the given geometries (e.g., degrees for SRID=4326).
- geoWithin(geometry geomA, geometry geomB) bool
Returns a result opposite to geoContains (see above). geomA is within geomB if geomB contains geomA.
- geoDistance(geopoint geoPointA, geopoint geoPointB, string unit) number
Compute the distance between two geographical points. The available units are:
KILOMETERS
,MILES
.
Value access functions¶
numval(object o, [number offset]) number
Returns the numerical value of a column. If the value is not numerical, an empty value is returned. Specify an optional Offset argument to return the value of a previous row.
Note
Use numval when your column name contains spaces or periods to ensure DSS parses it properly as in
numval("my.column")
ornumval("my column")
. Remember to always write the column name in “quotes” when using numval.
strval(object o, [string defaultValue], [number offset]) string
Returns the string value of a column. If the value is not a string, an empty value is returned. Specify an optional defaultValue to replace empty cells with defaultValue. Specify an optional Offset argument to return the value of a previous row.
Note
Use strval when your column name contains spaces or periods to ensure DSS parses it properly as in
strval("my.column")
orstrval("my column")
. Remember to always write the column name in “quotes” when using strval.
val(object o, [string defaultValue], [number offset]) Output depends on argument
Returns the value of a column. Specify an optional defaultValue to replace empty cells with defaultValue. The type of the resulting column is auto-detected (use strval or numval to force it). Specify an optional Offset argument to return the value of a previous row.
Note
Use val when your column name contains spaces or periods to ensure DSS parses it properly as in
val("my.column")
orval("my column")
. Remember to always write the column name in “quotes” when using val.
Offset argument¶
Note
The offset argument is only available in the Prepare recipe and only with the DSS engine.
Set offset to
1
to retrieve a value from the previous rowSet offset to
2
to retrieve a value from two rows before the current row…
Set offset to
0
to retrieve a value from the current rowAn error is returned if offset is set to a negative value
Control structures¶
Control structures allow you to perform advanced operations.
Beware: control structures cannot use object notation !
filter(array a, variable v, expression e) array
Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array if the result is truish.
filter(['aa', 'bb', 'cc', 'ab', 'bc'], item, item.startsWith('a'))
returns["aa","ab"]
Note
filter(myarray, v, v < 2)
is equivalent to this Python syntax:[v for v in myarray if v < 2]
and returns the array with only the elements below 2
forEach(array a, variable v, expression e) array
Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array.
forEach(['aa', 'bb', 'cc', 'ab', 'bc'], item, item.startsWith('a'))
returns[true,false,false,true,false]
Note
forEach(myarray, v, v + 2)
is equivalent to this Python syntax:[v+2 for v in myarray]
forEachIndex(array a, variable i, variable v, expression e) array
Evaluates expression a to an array. Then for each array element, binds its value to variable name v and its index to variable name i, evaluates expression e, and pushes the result onto the result array.
forEachIndex(['aa', 'bb', 'cc', 'ab', 'bc'], index, item, index < 2 || item.startsWith('a'))
returns[true,true,false,true,false]
Note
forEachIndex(myarray, i, v, v + i)
is equivalent to this Javascript syntax:myarray.map(function(v, i) { return v+i ;})
forRange(from, to, step, variable v, expression e) array
Iterates, starting at
from
, incrementing bystep
each time while less thanto
(or while more thanto
ifstep < 0
). At each iteration, binds the variablev
to the iteration value, evaluates expressione
, and pushes the result onto the result array.
forRange(0, 21, 3, v, v)
returns[0,3,6,9,12,15,18]
forRange(0, 21, 3, v, v*5)
returns[0,15,30,45,60,75,90]
Note
forRange(0, 100, 3, v, v * 2)
is equivalent to this Python syntax:[v * 2 for v in xrange(0, 100, 3)]
and returns [0, 6, 12, 18 …. 198]
if(boolean, expression_true, expression_false) Output depends on expression
Evaluates to
expression_true
if the condition istrue
, toexpression_false
otherwise
if(3>2, "I heart #s", "my brain hurts")
returns"I heart #s"
objectFilter(expression a, variable k, variable v, expression test) object
Evaluates expression a to an object. Then for each element (k, v) of this object, binds its key to variable name k, its value to variable name v, evaluates expression test which should return a boolean. If the boolean is true, pushes (k, v) onto the result object.
objectFilter('{"fName":"Joe", "lName":"Smith", "age":42 }', k, v, k.contains("Name"))
returns{"fName":"Joe","lName":"Smith"}
switch(expression_to_match, match_1, return_1, match_2, return_2, …, [return_default]) Output depends on expression
Compares
expression_to_match
tomatch_1
, then tomatch_2
, etc. When match is found, returns the correspondingreturn_i
. If no match is found, returnsreturn_default
, or nothing if there is noreturn_default
.
switch("Paris", "Paris", 1, "New York", 2, 0)
returns1
switch("Berlin", "Paris", "1", "New York", "2", "Other")
returns"Other"
switch(col1, col2+col3, "sum", "Other")
returns"sum" if value of column col1 equals value of col2+col3, "Other" otherwise
switch("true", col1 >= 10, "good", col1 >= 5 "okay", "bad")
returns"good", "okay", or "bad" depending on col1 value
with(expression o, variable v, expression e) Output depends on expression
Evaluates expression o and binds its value to variable name v. Then evaluates expression e and returns that result. The
with
control allows you to “split” a very big expression into more manageable chunks. It also makes reusing the result of a complex computation easier to read and faster to process.
with("european union".split(" "), a, a.length())
returns2
with("european union".split(" "), a, forEach(a, v, v.length()))
returns[8,5]
Tests¶
Beware: tests cannot use object notation
isBlank(expression o) boolean
Returns whether
o
is null or an empty string. IsBlank() returns true when value is null or empty (“”). It does not consider a string made of blank spaces as blank; it considers an expression with an error as blank.
isBlank("")
returnstrue
isBlank(" ")
returnsfalse
isBlank(abs("a"))
returnstrue
isError(expression o) boolean
Returns whether
o
is an error
isError(abs("a"))
returnstrue
isNonBlank(expression o) boolean
Returns whether
o
is non-null or a not-empty string. IsNonBlank() returns false when value is null or empty (“”). It considers a string made of blank spaces as NonBlank; it considers an expression with an error as blank.
isNonBlank("")
returnsfalse
isNonBlank(" ")
returnstrue
isNonBlank(abs("a"))
returnsfalse
isNotNull(expression o) boolean
Returns whether
o
is not null. IsNotNull() returns false when value is null or empty (“”). It does not consider a string made of blank spaces as not null, and considers an expression with an error as not null.
isNotNull("")
returnsfalse
isNotNull(" ")
returnsfalse
isNotNull(abs(a"))
returnstrue
isNull(expression o) boolean
Returns whether
o
is null or an empty string. IsNull() returns true when value is null or empty (“”). It considers a string made of blank spaces as null, and does not consider an expression with an error as null.
isNull("")
returnstrue
isNull(" ")
returnstrue
isNull(abs("a"))
returnsfalse
isNumeric(expression o) boolean
Returns whether
o
can represent a number