# Formula language¶

DSS includes a language to write formulas, much like a spreadsheet.

Formulas can be used:

• In data preparation, to create new columns, filter rows or flag rows

• More generally, to filter rows in many places of DSS:

• In the Filtering recipe, to filter rows

• In Machine Learning, to define the extracts to use for train and test set

• In the Python and Javascript APIs, to obtain partial extracts from the datasets

• In the Public API, to obtain partial extracts from the datasets

• In the grouping, window, join and stack recipes, to perform pre and post filtering

## Basic usage¶

Formulas define an expression that applies row per row.

Assuming that you have a dataset with columns N1 (numeric), N2 (numeric) and S (string) , here are a few example formulas:

• `2 + 2`

• `N1 + N2`

• `min(N1, N2)` # Returns the smallest of N1 and N2

• `replace(S, 'old', 'new')` # Returns the value of S with ‘old’ replaced by ‘new’

• `if (N1 > N2, 'big', 'small')` # Returns big if N1 > N2, small otherwise

Note

We also have a How-To that lists common use cases with examples.

In almost all formulas, you’ll need to read the values of the columns for the current row.

When the column has a “simple” name (i.e: starting by a letter, contains only letters, numbers and underscores), you just need to use the name of the column in the formula: `N1 + 4`

For other cases, you can use:

• `val("column with spaces")` : returns the value of “column with spaces”

• `strval("column with spaces")` : returns the value of “column with spaces” as a string

• `numval("column with spaces")` : returns the value of “column with spaces” as a number

Warning

`mycolumn` evaluates to the value of the column named mycolumn

Thus, `strval(mycolumn)` would not work because it would try to read as a string the value of the column whose name is the value of the mycolumn column.

Instead, use `strval("mycolumn")`.

You may also need to use `strval` to avoid DSS automatically typing a numerical variable. See Variables typing and autotyping for more details.

### Accessing values from previous rows¶

The `val`, `strval`, and `numval` functions optionally accept an offset argument. This allows you to access values from previous rows in the dataset. See Value access functions for more details.

Note

The offset argument is only available in the Prepare recipe and only with the DSS engine.

## Variables typing and autotyping¶

Variables in the formula language can have one of the following types: string, integer, decimal, array, object, boolean, date

Regardless of the schema of the dataset, or the meanings of the columns (in the case of working on dynamically-generated columns), the following rules are applied:

• If column values are “standard” decimal, they are automatically parsed to decimal

• If column values are “standard” integer, they are automatically parsed to integer

• Else, they are kept as string

Columns containing dates, arrays, objects or booleans are not automatically converted to the matching Formula type. However:

• all functions that require an array will automatically attempt to convert a string input to array (using the regular DSS JSON syntax)

• all functions that require a date will automatically attempt to convert a string input to date (using the ISO-8601 format)

In other words, if the column “begin_date” is a date (and thus contains properly-formatted ISO-8601):

• `type(begin_date)` returns “string”

• `inc(begin_date, 2, "days")` works as expected because the inc function performed automatic conversion to date

### Avoiding auto-typing¶

There are some cases where auto-typing is not a good thing. For example, a value like `012345` would be automatically converted to the number `12345`.

• To avoid autotyping to numerical, use the `strval("mycolumn")` function (see above)

• To manually obtain an array or object, use the `parseJson` function

• To manually obtain a number, use the `numval("mycolumn")` function

• To manually obtain a date, use the `asDate` function

• To manually obtain a boolean, use the `asBool` function

## Boolean values¶

The formula language uses “True” and “False”, with quotes, as boolean values for true and false.

## Operators¶

The formula language supports the classical arithmetic operators:

• Regular math operators: `+`, `-`, `*`, `/`

• Comparison operators (evaluate to booleans): `>`, `>=`, `<`, `<=`, `==`, `!=`

• Arithmetic operators: `//` (integer division) and `%` (modulo)

• Boolean operators: `&&`, `||`

• The `+` operator also performs string concatenation.

• The comparison operators (and only them) can operate on dates. For dates arithmetic, see the `diff` and `inc` functions.

## Array and object operations¶

Formula support accessing array elements and object keys using the traditional Python/Javascript syntax:

• `array[0]`

• `object["key"]`

• `object.key` (only valid if ‘key’ is a valid identifier, i.e. matches `[A-Za-z0-9_]*`)

Note: this requires that you actually have an array or object. You might need to use the `parseJson` function (see above paragraph about typing)

## Object notations¶

For all functions, you can use them the “regular” way: `replace(str, 'a', 'b')` or in the “object” way: `str.replace('a', 'b')`

In object notation, the first argument to the function is replaced by its ‘context’.

For example, the two syntaxes are equivalent:

```length(trim(replace(foo, 'a', 'b')))

foo.replace('a', 'b').trim().length()
```

## DSS variables¶

You can retrieve the value of DSS variables via the syntax `\${variable_name}` or `variables["variable_name"]`. Both syntaxes have slightly different behaviors.

### variables[“variable_name”]¶

DSS evaluates `variables["variable_name"]` as JSON during the formula evaluation. If there is no variable defined with the supplied name, an empty cell is returned.

For example, given the following global variables

```{
"max_height": 500,
"warning_msg": "too tall",
"msgs": {"ok":"OK", "over":"too tall"}
}
```

The formula `if(height < variables["max_height"], "OK", variables["warning_msg"])` is evaluated as `if(height < 500.0, "OK", "too tall")`. Note that the max_height is evaluated as 500.0 as JSON consider all numbers to be decimal.

The formula `if(height < variables["max_height"], "OK", variables["msgs"]["over"])` is evaluated as `if(height < 500.0, "OK", "too tall")`.

### \${variable_name}¶

During processing DSS replaces the `\${variable_name}` placeholder with its verbatim content and then evaluate the formula. If there is no variable defined with the supplied name, an error is returned.

For example, given the following global variables:

```{
"max_height": 500,
"warning_msg": "too tall",
"msgs": {"ok":"OK", "over":"too tall"}
}
```

The formula `if(height < \${max_height}, "OK", "\${warning_msg}")` is converted into `if(height < 500, "OK", "too tall")` before being evaluated. Note that quotes have been added around \${warning_msg} to output an actual string.

The formula `if(height < \${max_height}, "OK", \${warning_msg})` is converted into `if(height < 500, "OK", too tall)` before being evaluated and will therefore issue a syntax error.

The formula `if(height < \${max_height}, "OK", \${msgs}["over"])` is converted into `if(height < 500, "OK", {"ok":"OK", "over":"too tall"}["over"])` before being evaluated and will therefore issue a syntax error.

The formula `if(height < \${max_height}, "OK", \${msgs.over})` will throw a syntax error as there is no variable named “msgs.over”.

The formula `if(height < \${max_height}, "OK", parseJson('\${msgs}')["over"])` is converted into `if(height < 500, "OK", parseJson('{"ok":"OK", "over":"too tall"}')["over"])` before being evaluated and will be evaluated as `if(height < 500.0, "OK", "too tall")`.

## Array functions¶

arrayContains(array a, item) boolean

Returns whether the array `a` contains `item`

`arrayContains([1, 2, 3], 5)` returns `false`

arrayDedup(array a) array

Returns array a with duplicates removed

`arrayDedup([0, 1, 0, 7])` returns `[0,1,7]`

arrayIndexOf(array a, item) int

Returns the index (starting from 0) of `item` in the array `a`, or -1 if the item is not found in the array.

`arrayIndexOf([1 , 0],  2)` returns `-1`

arrayLen(array a) int

Returns the length of the array `a`

`arrayLen([1,2,3])` returns `3`

arrayReverse(array a) array

Reverses array a

`arrayReverse([1,2,3])` returns `[3,2,1]`

arraySort(array a) array

Sorts array `a`

`arraySort(["b", "1", "a"])` returns `["1","a","b"]`

get(array a, from index, [to index]) Output depends on arguments

Returns `o[from, to]`. Omitting the optional number to will return what is at the first index specified.

`get([1,2,3,4,5], 1, 4)` returns `[2,3,4]`

`get([1,2,3,4,5], 1)` returns `2`

join(array a, string sep) string

Returns the string obtained by joining the elements of array `a` with the separator `sep`

For example, the expression `join(date_elements, '-')` on a column `date_elements` produces:

date_elements

output

[2007, 7, 15]

2007-7-15

[2016, 1, 8]

2016-1-8

objectKeys(object o) array

Returns the keys of an object as an array

objectValues(object o) array

Returns the values of an object as an array

slice(object o, from index, [to index]) Output depends on arguments

If `o` is an array, returns an array containing the items between `from` and `to` (excluded). If `o` is a string, returns the part of the string between `from` and `to` (excluded). Omitting the optional number to will return the remainder of the string or array. Note: `slice` and `substring` are identical.

`slice([1,2,3,4,5], 1, 4)` returns `[2,3,4]`

`slice('hello', 1)` returns `'ello'`

substring(object o, from index, [to index]) Output depends on arguments

If `o` is a string, returns the part of the string between `from` and `to` (excluded). If `o` is an array, returns an array containing the items between `from` and `to` (excluded). Omitting the optional number to will return the remainder of the string or array. Note: `substring` and `slice` are identical.

`substring('0123456', 2, 5)` returns `'234'`

`substring([1,2,3,4,5,6], 3)` returns `[4,5,6]`

## Boolean functions¶

asBool(o) boolean

Returns `o` converted to a boolean.

`asBool(0)` returns `false`

`asBool('yes')` returns `true`

isFalse(boolean b) boolean

Returns whether `b` is false.

`isFalse('false')` returns `false` (the argument is a string, not a boolean, so this method returns `false`)

`isFalse(asBool(0))` returns `true`

isTrue(boolean b) boolean

Returns whether `b` is true.

`isTrue('true')` returns `false` (the argument is a string, not a boolean, so this method returns `false`)

`isTrue(asBool(1))` returns `true`

and(boolean a, boolean b) boolean

Evaluates logical AND (conjunction) on several statements. All conditions need to be fulfilled for the function to return true. Is equivalent to `a && b`.

`and(1==1, 3<4)` returns `true`

not(boolean b) boolean

Evaluates logical NOT (negation) of a statement, returning the opposite of `b`.

`not(1!=1)` returns `false`

or(boolean a, boolean b) boolean

Evaluates logical OR (disjunction) on several statements. At least one condition needs to be fulfilled for the function to return true. Is equivalent to `a || b`.

`or(1==1, 3>4)` returns `true`

## Date functions¶

asDate(object o, [format1, …]) date

Returns `o` converted to a date, based on the format(s) provided. If your column contains more than one date format, you can give an ordered list of possible formats. If you don’t give a format, ISO-8601 is used by default (see more examples of the syntax for ISO-8601). Here is some example syntax:

• ‘y’ (Year)

• ‘M’ (month in year)

• ‘w’ (week in year)

• ‘W’ (week in month)

• ‘d’ (day in month)

• ‘D’ (day in year)

• ‘E’ (day name in week)

• ‘u’ (day number in week. Monday =1)

• ‘H’ (hour in day, 0-23)

• ‘m’ (minute)

• ‘s’ (second)

• S (millisecond)

`asDate('2020-04-15', 'yyyy-MM-dd')` returns `2020-04-15T00:00:00.000Z`

datePart(date d, string part, [timezone]) Output depends on arguments

Extracts a date component from `date d`. Returned date components are always in local timezone–unless the optional `timezone` is included–and in English (see a complete list of supported timezones.). The available components are:

• ‘years’ (or ‘year’)

• ‘isoWeekYear’ gives the ISO week-numbering year

• ‘months’ (or ‘month’)

• ‘weekOfYear’ gives the week of the year (as per server locale’s convention)

• ‘isoWeekOfYear’ gives the ISO week number of the year

• ‘weeks’ (or ‘week’ or ‘w’)

• ‘isoWeekOfMonth’ gives the ISO week number of the month

• ‘days’ (or ‘day’ or ‘d’)

• ‘weekday’ (the name of the day of the week, capitalized)

• ‘dayofweek’ (day of the week as a number. Monday=1, Tuesday=2, … Sunday=7)

• ‘hours’ (or ‘hour’ or ‘h’)

• ‘minutes’ (or ‘minute’ or ‘min’)

• ‘seconds’ (or ‘second’ or ‘s’)

• ‘unixTime’ (number of seconds since epoch)

• ‘millisecond’ (or ‘millisecond’, or ‘ms’)

• ‘time’ (number of milliseconds since epoch)

`datePart('2020-04-15T00:00:00.000Z', 'weekday')` returns `"Wednesday"`

`datePart('2020-01-01T00:00:00.000Z', 'year', '-08:00')` returns `2019`

diff(date d1, date d2, [string unit]) number

Returns the difference between two dates expressed in given time units. The default unit of time is `days`. The available units are:

• ‘years’ (or ‘year’)

• ‘months’ (or ‘month’)

• ‘weeks’ (or ‘week’ or ‘w’)

• ‘days’ (or ‘day’ or ‘d’)

• ‘hours’ (or ‘hour’ or ‘h’)

• ‘minutes’ (or ‘minute’ or ‘min’)

• ‘seconds’ (or ‘second’ or ‘s’)

`diff('2019-03-15T00:00:00.000Z', '2020-04-15T00:00:00.000Z', 'month')` returns `-13`

inc(date d, number value, string unit) date

Returns `d` incremented or decremented by the number value in the specified unit of time; a decimal number value will be truncated to an integer. The available units are:

• ‘years’ (or ‘year’)

• ‘months’ (or ‘month’)

• ‘weeks’ (or ‘week’ or ‘w’)

• ‘days’ (or ‘day’ or ‘d’)

• ‘hours’ (or ‘hour’ or ‘h’)

• ‘minutes’ (or ‘minute’ or ‘min’)

• ‘seconds’ (or ‘second’ or ‘s’)

`inc('2020-04-15T00:00:00.000Z', -3, 'week')` returns `2020-03-25T00:00:00.000Z`

now() date

Returns the current time

trunc(date d, string unit) date

Returns `d` truncated to the unit specified. The available units are:

• ‘years’ (or ‘year’)

• ‘months’ (or ‘month’)

• ‘weeks’ (or ‘week’ or ‘w’)

• ‘days’ (or ‘day’ or ‘d’)

• ‘hours’ (or ‘hour’ or ‘h’)

• ‘minutes’ (or ‘minute’ or ‘min’)

• ‘seconds’ (or ‘second’ or ‘s’)

`trunc('2020-04-03T07:47:45.245Z', 'month')` returns `2020-04-01T00:00:00.000Z`

## Math functions¶

abs(number d) number

Returns the absolute value of a number

`abs(-7)` returns `7.0`

acos(number d) number

Returns the arc cosine of an angle, in the range 0 through PI

`acos(-1)` returns `3.141592654`

asin(number d) number

Returns the arc sine of an angle in the range of -PI/2 through PI/2

`asin(1)` returns `1.570796327`

atan(number d) number

Returns the arc tangent of an angle in the range of -PI/2 through PI/2

atan2(number x, number y) number theta

Converts rectangular coordinates (x, y) to polar (r, theta)

ceil(number n) number

Returns the ceiling of a number

`ceil(4.67)` returns `5`

`ceil(-4.67)` returns `-4`

combin(number n, number k) number

Returns the number of combinations for n elements divided into groups of k, n!/k!(n-k)!

`combin(6, 2)` returns `15`

cos(number d) number

Returns the trigonometric cosine of an angle

`cos(0)` returns `1.0`

cosh(number d) number

Returns the hyperbolic cosine of a value

`cosh(0)` returns `1.0`

dec2hex(long) string

Returns an hexadecimal representation of the input number

`dec2hex(10)` returns `a` `dec2hex(256)` returns `100`

degrees(number d) number

Converts an angle from radians to degrees.

`degrees(PI())` returns `180.0`

even(number n) number

Rounds the number up to the nearest even integer

`even(3)` returns `4.0`

`even(2.3)` returns `4.0`

`even(-2.3)` returns `-2.0`

`even(-3)` returns `-2.0`

exp(number n) number

Returns the exponential of a number

`exp(2)` returns `7.38905609893065`

fact(number i) number

Returns the factorial of a number i

`fact(4)` returns `24`

factn(number i, number d) number

Returns the factorial of a number i, omitting every dth item from the multiplication

`factn(7, 3)` returns `28`

floor(number d) number

Returns the floor of a number

`floor(4.7)` returns `4`

gcd(number d, number e) number

Returns the greatest common denominator of the two numbers

`gcd(21, 28)` returns `7.0`

hash(string) long

Returns a 64 bits numerical hash of the input (not crypto-secure)

`hash("goo")` returns `2774880816139997119`

hex2dec(string) long

Returns a decimal representation of an hexadecimal string

`hex2dec("a")` returns `10`

lcm(number d, number e) number

Returns the least common multiple of the two numbers

`lcm(20, 42)` returns `420.0`

ln(number n) number

Returns the natural log of a number

`ln(exp(1))` returns `1.0` `ln(2.72)` returns `1.000631880307906`

log(number n) number

Returns the base 10 log of a number

`log(100)` returns `2.0`

max(a, b, …) Output depends on arguments

Returns the greater of two or more numbers, two or more strings, or the more recent of two or more dates

`max(-1, 3)` returns `3`

`max('luke', 'leia')` returns `'luke'`

`max('2020-01-01T00:00:00.000Z', '2021-01-01T00:00:00.000Z')` returns `2021-01-01T00:00:00.000Z`

min(a, b, …) number

Returns the smaller of two or more numbers, two or more strings, or older of two or more dates

`min(-1, 3)` returns `-1.0`

`min('luke', 'leia')` returns `'leia'`

`min('2020-01-01T00:00:00.000Z', '2021-01-01T00:00:00.000Z')` returns `2020-01-01T00:00:00.000Z`

mod(number a, number b) number

Returns a modulus b

`mod(5, 3)` returns `2`

`mod(7.8, 3)` returns `1`

multinomial(number d1, number d2 …) number

Returns the multinomial of a series of numbers, `(sum(d1, d2, d3,...))! / d1! * d2! * d3! ...`

`multinomial(1, 1, 2, 1)` returns `60`

odd(number d) number

Rounds the number up to the nearest odd integer

`odd(5.3)` returns `7.0`

`odd(-5.3)` returns `-5.0`

PI() number

Returns the value of PI

pow(number a, number b) number

Returns `a` to the power of `b`

`pow(2, -1)` returns `0.5`

quotient(number numerator, number denominator) number

Returns the integer portion of a division

`quotient(7, 2)` returns `3.0`

Converts an angle in degrees to radians

`radians(180)` returns `3.141592653589793`

rand([int min], [int max]) double or int

Without arguments, returns a random float between 0 and 1. With `min` and `max` arguments, returns a random integer between `min` (inclusive) and `max` (exclusive)

round(number n) number

Returns the rounding of number to the nearest integer

`round(3.5)` returns `4.0`

`round(-3.5)` returns `-3.0`

sin(number d) number

Returns the trigonometric sine of an angle

`sin(radians(90))` returns `1.0`

sinh(number d) number

Returns the hyperbolic sine of an angle

`sinh(0)` returns `0.0`

sqrt(number n) number

Returns the square root of number n.

`sqrt(81)` returns `9.0`

sum(array a) number

Sums the numbers of an array. Skips non-number elements from the array.

`sum([1, 2, "string", 3])` returns `6.0`

tan(number d) number

Returns the trigonometric tangent of an angle

`tan(0)` returns `0.0`

tanh(number d) number

Returns the hyperbolic tangent of a value

toNumber(o) number

Returns o converted to a number

`toNumber("5")` returns `5`

## Object functions¶

get(object o, string field, [string defaultValue]) Output depends on arguments

Returns `o[from]`. If `o[from]` is empty and `default` has been provided, returns `default`.

`get(parseJson('{"name":"joe", "age":42}'), "age")` returns `42`

`get(parseJson('{"name":"joe"'), "age", 37)` returns `37`

hasField(object o, string name) boolean

Returns whether `o` has the field `name`

`hasField(parseJson('{""name"": ""joe"", ""age"": 42 }'), ""age"")` returns `true`

htmlAttr(Element e, String s) string

Selects a value from an attribute on an Html Element

htmlText(Element e) string

Selects the text from within an element (including all child elements)

`htmlAttr(select(parseHtml('<div><a href="www.dataiku.com"></div>'), '[href]')[0], "href")` returns `"www.dataiku.com"`

innerHtml(Element e) string

The innerHtml of an HTML element

jsonize(value) JSON literal value

Quotes a value as a JSON literal value

objectDel(object o, key, [key…]) object

Removes one or several keys from an object and returns it. The keys must not be null

`objectDel(parseJson('{"firstName": "birdie", "company": "Dataiku" }'), 'firstName')` returns `{"company":"Dataiku"}`

objectNew(k1, v1, k2, v2, …) object

Creates a new object, optionally pre-filled with key/values. Must get an even number of arguments, as successive key-value pairs. Giving 0 arguments is possible and will return an empty object (you can use objectPut to add to it)

`objectNew("firstName", "birdie", "company", "Dataiku")` returns `{"firstName": "birdie", "company": "Dataiku" }`

objectPut(object o, key, value) object

Adds a key/value pair to an object and returns it. The key must not be null

ownText(Element e) string

Gets the text owned by this HTML element only; does not get the combined text of all children.

parseHtml(string s) HTML object

Parses a string as HTML

parseJson(string s) object or array

Parses a JSON string as an object or array

select(Element e, String s) HTML Elements

Selects an element from an HTML elementn using selector syntax

type(object o) string

Returns the type of `o`

`type(3.126)` returns `number`

## String functions¶

chomp(string s string tail) string

Removes `tail` from the end of `s` if it’s there, otherwise leaves it alone.

`chomp("foobar", "bar")` returns `"foo"`

coalesce(value1, value2, …) Output depends on arguments

Returns the first non-empty value.

`coalesce("", "foo")` returns `"foo"`

concat((object a1, [object a2, …])) Output depends on arguments

Returns a string of concatenated values. If one of your columns contains leading 0 (zeros), wrap it in a `strval('column_name')` to preserve them.

`concat("Birds", " ", "fly")` returns `"Birds fly"`

`concat(1, 2, 3)` returns `"123"`

contains(string s, string frag) boolean

Returns whether `s` contains `frag`

`contains("hello world ", "llo")` returns `true`

endsWith(string s, string tail) boolean

Returns whether `s` ends with `tail`

`endsWith("hello world", "rld")` returns `true`

escape(string s, string mode) string

Escapes a `s` using the escaping mode specified. Supported modes: ‘html’, ‘xml’, ‘csv’, ‘url’, javascript’. Note that the escape uses the standard java URLEncoder for urls and StringEscapeUtils for others

format(string format, object… args) string

Formats a string using printf-like formatting using the Java Formatter syntax.

`'%4d-%02d'.format(2004,2)` returns `'2004-02'`

fromBase64(string s, [string charset]) string

Returns the string whose Base64 representation is given. By default, the string is read using the UTF-8 charset.

`fromBase64('SA==')` returns `"H"`

get(string s, from index, [to index]) Output depends on arguments

Returns `s.substring(from, to)`. Omitting the optional number to will return what is at the first index specified.

`get('Oh no, kittens!', 0, 5)` returns `'Oh no'`

`get('Oh no, kittens!', 1)` returns `'h'`

indexOf(string s, string sub) number

Returns the index of the first occurrence of `sub` in `s`. Index begins at 0. Returns -1 if there is no such occurrence.

`indexOf("hello world", "world")` returns `6`

lastIndexOf(string s, string sub) number

Returns the index of the last occurrence of `sub` in `s`. Index begins at 0. Returns -1 if there is no such occurrence.

`lastIndexOf("hello world", "o")` returns `7`

length(array or string o) number

Returns the length of o

`length("hello world")` returns `11`

`length([4,5,6])` returns `3`

match(string a, string or regexp) array of strings

Returns an array of the matching groups found in `s`. Groups are designated by () within the specified string or regular expression.

`match('hello world', 'he(.*)wo(rl)d')` returns `["llo ","rl"]`

md5(string s) string

Returns the MD5 hash of a string

`md5('hi')` returns `"49f68a5c8493ec2c0bf489821c21fc3b"`

partition(string s, string or regex frag, [boolean omitFragment]) array

Returns an array of strings `[a,frag,b]` where `a` is the part before the first occurrence of `frag` in `s` and `b` is the part after the occurrence. If `omitFragment` is true, `frag` is not returned in the array.

`partition(""hello"", ""he"")` returns `["""",""he"",""llo""]`

`partition(""hello"", ""he"", asBool(1))` returns `["""",""llo""]`

`partition(""hello"", /.l/)"` returns `[""h"",""el"",""lo""]`

replace(string s, string or regex f, string replacement) string

Replaces all occurrences of substring / regex `f` found in string `s` with the `replacement` string.

`replace('hello world', 'hel', 'a')` returns `"alo world"`

`replace(""Oh my!"", /\w/, 'x')` returns `"xx xx!"`

replaceChars(string s, string f, string r) string

Returns the string obtained by replacing all character in `s` that match `f` with the character in `r` at that same position. The function can be used to delete characters by replacing them with nothing.

`replaceChars('abcba', 'bc', 'BC')` returns `aBCBa` `replaceChars('abcba', 'bc', 'Z')` returns `aZZa`

rpartition(string s, string or regex frag, [boolean omitFragment]) array

Returns an array of strings `[a,frag,b]` where `a` is the part before the last occurrence of `frag` in `s` and `b` is the part after the occurrence. If `omitFragment` is true, `frag` is not returned in the array.

`rpartition("hello world", "o")` returns `["hello w","o","rld"]`

sha1(string s) string

Returns the SHA-1 hash of a string

`sha1('goo')` returns `3f95edc0399d06d4b84e7811dd79272c69c8ed3a`

sha256(string s) string

Returns the SHA-256 hash of a string

sha512(string s) string

Returns the SHA-512 hash of a string

split(string s, string or regex sep, [boolean preserveAllTokens]) array

Returns the array of strings obtained by splitting `s` with separator `sep`. If `preserveAllTokens` is true, then empty segments are preserved.

`split(""hello"", ""he"")` returns `[""llo""]`

`split(""hello"", ""he"", asBool(1))` returns `["""",""llo""]`

`split(""hello"", /.l/)` returns `[""h"",""lo""]`

`split("Hello world!", /\s+/)` returns `["Hello","world!"]`

splitByCharType(string s) array

Returns an array of strings obtained by splitting s grouping consecutive chars by their unicode type

`splitByCharType("Hello_world 101!?!")` returns `["H","ello","_","world"," ","101","!?!"]`

splitByLengths(string s, number length1, […]) array

Returns an array of strings obtained by splitting `s` into substrings, each with its own specified length.

`splitByLengths("Hello world", 1, 2, 3, 4)` returns `["H","el","lo ","worl"]`

startsWith(string s, string sub) boolean

Returns whether `s` starts with `sub`

`startsWith("Hello world", "He")` returns `true`

strip(string s) string

Returns copy of the string, with leading and trailing whitespace omitted. This function is the same as trim.

`strip(" Hello World ")` returns `"Hello World"`

toBase64(string s, [string charset]) string

Returns the Base64 representation of a string. By default, the string is written using the UTF-8 charset.

`toBase64("H")` returns `"SA=="`

toLowercase(string s) string

Converts a string to lowercase

`toLowercase("HELLO WORLD")` returns `"hello world"`

toString(o, string format (optional)) string

Returns o converted to a string

`toString(5)` returns `"5"`

toTitlecase(string s) string

Converts a string to titlecase

`toTitlecase("hello world")` returns `"Hello World"`

toUppercase(string s) string

Converts a string to uppercase

`toUppercase("hello world")` returns `"HELLO WORLD"`

trim(string s) string

Returns copy of the string, with leading and trailing whitespace omitted.

`trim(" Hello World ")` returns `"Hello World"`

unescape(string s, string mode) string

Unescapes all escaped parts of the string depending on the given escaping mode. Available modes: ‘html’, ‘xml’, ‘csv’, ‘url’, ‘javascript’

unicode(string s) string

Returns the input string as an array of the unicode codepoints (numbers)

`unicode("Hi!")` returns `[72,105,33]`

unicodeType(string s) string

Returns an array of strings describing each character of the input string in their full unicode notation

`unicodeType("y 0H?")` returns `["lowercase letter","space separator","decimal digit number","uppercase letter","other punctuation"]`

uuid() string

Returns a type 4 (pseudo randomly generated) UUID. The UUID is generated using a cryptographically strong pseudo random number generator.

`uuid()` returns `7af14645-bcd9-4af2-bca8-390fc67e9a0b`

## Geometry functions¶

geoBuffer(geometry geom, double distance, [int quadrantSegment]) string

Returns a geometry that represents all points whose distance from this geometry is less than or equal to `distance`. The `distance` parameter can be either a value or a column name from the dataset. The distance unit depends on the CRS of the given geometries (e.g., degrees for SRID=4326). A negative distance can be used with polygons, which will shrink the polygon rather than expand it. Specify an optional `quadrantSegment` value to set the number of segments to approximate a quarter circle (the default is 8)

geoContains(geometry geomA, geometry geomB) bool

Compute the boolean {geomA contains geomB} for two input geometries A and B.

The implementation of this function depends on the engine it is run on but its definition is standard: “Geometry A contains Geometry B if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.” (https://postgis.net/docs/ST_Contains.html)

A polygon A does not contain a point B if B is exactly on its boundary.

`geoContains("POLYGON((0 0,3 0,0 3,0 0))", "POINT(1 1)")` returns `true`

`geoContains("POLYGON((0 0,3 0,0 3,0 0))", geometry)` returns the evaluation of the boolean predicate for each input geometry row of dataset column geometry

geoEnvelope(geometry geom) string

Returns the envelope of a geometry, i.e., the minimum bounding box containing this geometry.

geoMakeValid(geometry geom) string

Returns a valid representation of an invalid geometry. Valid geometries remain unchanged.

Note

This function can return slightly different results whether the recipe engine is PostgreSQL or a local stream. If the method is unable to make a geometry valid, it returns an empty cell.

geoSimplify(geometry geom, double toleranceDistance) string

Returns a simpler geometry based on the Douglas-Peucker algorithm, with respect to a non-negative `toleranceDistance` parameter. The simplification of the geometry will result in a geometry with fewer vertices. A vertex is removed only if the distance between this vertex and the edge resulting from removing this vertex is within the specified `toleranceDistance`. The `toleranceDistance` parameter can be either a value or a column name from the dataset. The toleranceDistance unit depends on the CRS of the given geometries (e.g., degrees for SRID=4326).

geoWithin(geometry geomA, geometry geomB) bool

Returns a result opposite to geoContains (see above). geomA is within geomB if geomB contains geomA.

geoDistance(geopoint geoPointA, geopoint geoPointB, string unit) number

Compute the distance between two geographical points. The available units are: `KILOMETERS`, `MILES`.

## Value access functions¶

numval(object o, [number offset]) number

Returns the numerical value of a column. If the value is not numerical, an empty value is returned. Specify an optional Offset argument to return the value of a previous row.

Note

Use numval when your column name contains spaces or periods to ensure DSS parses it properly as in `numval("my.column")` or `numval("my column")`. Remember to always write the column name in “quotes” when using numval.

strval(object o, [string defaultValue], [number offset]) string

Returns the string value of a column. If the value is not a string, an empty value is returned. Specify an optional defaultValue to replace empty cells with defaultValue. Specify an optional Offset argument to return the value of a previous row.

Note

Use strval when your column name contains spaces or periods to ensure DSS parses it properly as in `strval("my.column")` or `strval("my column")`. Remember to always write the column name in “quotes” when using strval.

val(object o, [string defaultValue], [number offset]) Output depends on argument

Returns the value of a column. Specify an optional defaultValue to replace empty cells with defaultValue. The type of the resulting column is auto-detected (use strval or numval to force it). Specify an optional Offset argument to return the value of a previous row.

Note

Use val when your column name contains spaces or periods to ensure DSS parses it properly as in `val("my.column")` or `val("my column")`. Remember to always write the column name in “quotes” when using val.

### Offset argument¶

Note

The offset argument is only available in the Prepare recipe and only with the DSS engine.

• Set offset to `1` to retrieve a value from the previous row

• Set offset to `2` to retrieve a value from two rows before the current row

• Set offset to `0` to retrieve a value from the current row

• An error is returned if offset is set to a negative value

## Control structures¶

Control structures allow you to perform advanced operations.

Beware: control structures cannot use object notation !

filter(array a, variable v, expression e) array

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array if the result is truish.

`filter(['aa', 'bb', 'cc', 'ab', 'bc'], item, item.startsWith('a'))` returns `["aa","ab"]`

Note

`filter(myarray, v, v < 2)` is equivalent to this Python syntax: `[v for v in myarray if v < 2]` and returns the array with only the elements below 2

forEach(array a, variable v, expression e) array

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array.

`forEach(['aa', 'bb', 'cc', 'ab', 'bc'], item, item.startsWith('a'))` returns `[true,false,false,true,false]`

Note

`forEach(myarray, v, v + 2)` is equivalent to this Python syntax: `[v+2 for v in myarray]`

forEachIndex(array a, variable i, variable v, expression e) array

Evaluates expression a to an array. Then for each array element, binds its value to variable name v and its index to variable name i, evaluates expression e, and pushes the result onto the result array.

`forEachIndex(['aa', 'bb', 'cc', 'ab', 'bc'], index, item, index < 2 || item.startsWith('a'))` returns `[true,true,false,true,false]`

Note

`forEachIndex(myarray, i, v, v + i)` is equivalent to this Javascript syntax: `myarray.map(function(v, i) { return v+i ;})`

forRange(from, to, step, variable v, expression e) array

Iterates, starting at `from`, incrementing by `step` each time while less than `to` (or while more than `to` if `step < 0`). At each iteration, binds the variable `v` to the iteration value, evaluates expression `e`, and pushes the result onto the result array.

`forRange(0, 21, 3, v, v)` returns `[0,3,6,9,12,15,18]` `forRange(0, 21, 3, v, v*5)` returns `[0,15,30,45,60,75,90]`

Note

`forRange(0, 100, 3, v, v * 2)` is equivalent to this Python syntax: `[v * 2 for v in xrange(0, 100, 3)]` and returns [0, 6, 12, 18 …. 198]

if(boolean, expression_true, expression_false) Output depends on expression

Evaluates to `expression_true` if the condition is `true`, to `expression_false` otherwise

`if(3>2, "I heart #s", "my brain hurts")` returns `"I heart #s"`

objectFilter(expression a, variable k, variable v, expression test) object

Evaluates expression a to an object. Then for each element (k, v) of this object, binds its key to variable name k, its value to variable name v, evaluates expression test which should return a boolean. If the boolean is true, pushes (k, v) onto the result object.

`objectFilter('{"fName":"Joe", "lName":"Smith", "age":42 }', k, v, k.contains("Name"))` returns `{"fName":"Joe","lName":"Smith"}`

switch(expression_to_match, match_1, return_1, match_2, return_2, …, [return_default]) Output depends on expression

Compares `expression_to_match` to `match_1`, then to `match_2`, etc. When match is found, returns the corresponding `return_i`. If no match is found, returns `return_default`, or nothing if there is no `return_default`.

`switch("Paris", "Paris", 1, "New York", 2, 0)` returns `1`

`switch("Berlin", "Paris", "1", "New York", "2", "Other")` returns `"Other"`

`switch(col1, col2+col3, "sum",  "Other")` returns `"sum" if value of column col1 equals value of col2+col3, "Other" otherwise`

`switch("true", col1 >= 10, "good", col1 >= 5 "okay", "bad")` returns `"good", "okay", or "bad" depending on col1 value`

with(expression o, variable v, expression e) Output depends on expression

Evaluates expression o and binds its value to variable name v. Then evaluates expression e and returns that result. The `with` control allows you to “split” a very big expression into more manageable chunks. It also makes reusing the result of a complex computation easier to read and faster to process.

`with("european union".split(" "), a, a.length())` returns `2`

`with("european union".split(" "), a, forEach(a, v, v.length()))` returns `[8,5]`

## Tests¶

Beware: tests cannot use object notation

isBlank(expression o) boolean

Returns whether `o` is null or an empty string. IsBlank() returns true when value is null or empty (“”). It does not consider a string made of blank spaces as blank; it considers an expression with an error as blank.

`isBlank("")` returns `true`

`isBlank(" ")` returns `false`

`isBlank(abs("a"))` returns `true`

isError(expression o) boolean

Returns whether `o` is an error

`isError(abs("a"))` returns `true`

isNonBlank(expression o) boolean

Returns whether `o` is non-null or a not-empty string. IsNonBlank() returns false when value is null or empty (“”). It considers a string made of blank spaces as NonBlank; it considers an expression with an error as blank.

`isNonBlank("")` returns `false`

`isNonBlank(" ")` returns `true`

`isNonBlank(abs("a"))` returns `false`

isNotNull(expression o) boolean

Returns whether `o` is not null. IsNotNull() returns false when value is null or empty (“”). It does not consider a string made of blank spaces as not null, and considers an expression with an error as not null.

`isNotNull("")` returns `false`

`isNotNull(" ")` returns `false`

`isNotNull(abs(a"))` returns `true`

isNull(expression o) boolean

Returns whether `o` is null or an empty string. IsNull() returns true when value is null or empty (“”). It considers a string made of blank spaces as null, and does not consider an expression with an error as null.

`isNull("")` returns `true`

`isNull(" ")` returns `true`

`isNull(abs("a"))` returns `false`

isNumeric(expression o) boolean

Returns whether `o` can represent a number