# Formula language¶

DSS includes a language to write formulas, much like a spreadsheet.

Formulas can be used:

• In data preparation, to create new columns, filter rows or flag rows
• More generally, to filter rows in many places of DSS:
• In the Filtering recipe, to filter rows
• In Machine Learning, to define the extracts to use for train and test set
• In the Python and Javascript APIs, to obtain partial extracts from the datasets
• In the Public API, to obtain partial extracts from the datasets
• In the grouping, window, join and stack recipes, to perform pre and post filtering

## Basic usage¶

Note

We also have a Howto that lists common use cases with examples

Formulas define an expression, that applies row per row.

Assuming that you have a dataset with columns N1 (numeric), N2 (numeric) and S (string) , here are a few example formulas:

• 2 + 2
• N1 + N2
• min(N1, N2) # Returns the smallest of N1 and N2
• replace(S, 'old', 'new') # Returns the value of S with ‘old’ replaced by ‘new’
• if (N1 > N2, 'big', 'small') # Returns big is N1 > N2, small otherwise

In almost all formulas, you’ll need to read the values of the columns for the current row.

When the column has a “simple” name (i.e: starting by a letter, contains only letters, numbers and underscores), you just need to use the name of the column in the fomula: N1 + 4

For other cases, you can use:

• strval("column with spaces") : returns the value of “column with spaces” as a string
• numval("column with spaces") : returns the value of “column with spaces” as a number

Warning

mycolumn evaluates to the value of the column named mycolumn

Thus, strval(mycolumn) would not work because it would try to read as a string the value of the column whose name is the value of the mycolumn column.

Instead, use strval("mycolumn").

You may also need to use strval to avoid DSS automatically typing a numerical variable. See below for more details.

## Variables typing and autotyping¶

Variables in the formula language can have one of the following types: string, integer, decimal, array, object, boolean, date

Regardles of the schema of the dataset, or the meanings of the columns (in the case of working on dynamically-generated columns), the following rules are applied:

• If column values are “standard” decimal, they are automatically parsed to decimal
• If column values are “standard” integer, they are automatically parsed to integer
• Else, they are kept as string

Columns containing dates, arrays, objects or booleans are not automatically converted to the matching Formula type. However:

• all functions that require an array will automatically attempt to convert a string input to array (using the regular DSS JSON syntax)
• all functions that require a date will automatically attempt to convert a string input to date (using the ISO-8601 format)

In other words, if the column “begin_date” is a date (and thus contains properly-formatted ISO-8601):

• type(begin_date) returns “string”
• inc(begin_date, 2, "days") works as expected because the inc function performed automatic conversion to date

### Avoiding auto-typing¶

There are some cases where auto-typing is not a good thing. For example, a value like 012345 would be automatically converted to the number 12345.

• To avoid autotyping to numerical, use the strval("mycolumn") function (see above)
• To manually obtain an array or object, use the parseJson function
• To manually obtain a date, use the asDate function

## Boolean values¶

The formula language uses “True” and “False”, with quotes, as boolean values for true and false.

## Operators¶

The formula language supports the classical arithmetic operators:

• Regular math operators: +, -, *, /
• Comparison operators (evaluate to booleans): >, >=, <, <=, ==, !=
• Arithmetic operators: // (integer division) and % (modulo)
• Boolean operators: &&, ||
• The + operator also performs string concatenation.
• The comparison operators (and only them) can operate on dates. For dates arithmetic, see the diff and inc functions.

## Array and object operations¶

Formula support accessing array elements and object keys using the traditional Python/Javascript syntax:

• array[0]
• object["key"]
• object.key (only valid if ‘key’ is a valid identifier, i.e. matches [A-Za-z0-9_]*)

Note: this requires that you actually have an array or object. You might need to use the parseJson function (see above paragraph about typing)

## Object notations¶

For all functions, you can use them the “regular” way: replace(str, 'a', 'b') or in the “object” way: str.replace('a', 'b')

In object notation, the first argument to the function is replaced by its ‘context’.

For example, the two syntaxes are equivalent:

length(trim(replace(foo, 'a', 'b')))

foo.replace('a', 'b').trim().length()


## Array functions¶

### arrayContains(array a, item) boolean¶

Returns whether the array a contains item

### arrayDedup(array a) array¶

Returns array a with duplicates removed

### arrayIndexOf(array a, item) int¶

the index (starting from 0) of item in the array a. Return -1 if item is not in a

### arrayLen(array a) int¶

Returns the length of the array a

Reverses array a

Sorts array a

### get(o, number or string from, optional number to) Depends on actual arguments¶

If o has fields, returns the field named ‘from’ of o. If o is an array, returns o[from, to]. if o is a string, returns o.substring(from, to)

### join(array a, string sep) string¶

Returns the string obtained by joining the elements of array a with the separator sep

For example, the expression join(date_elements, '-') on a column date_elements produces:

date_elements output
[2007, 7, 15] 2007-7-15
[2016, 1, 8] 2016-1-8

### objectKeys(object o) array¶

Returns the keys of an object as an array

### objectValues(object o) array¶

Returns the keys of an object as an array

### slice(o, number from, optional number to) Depends on actual arguments¶

If o is an array, returns o[from, to]. If o is a string, returns o.substring(from, to)

### substring(o, number from, optional number to) Depends on actual arguments¶

If o is an array, returns o[from, to]. If o is a string, returns o.substring(from, to)

## Boolean functions¶

### and(boolean a, boolean b) boolean¶

Returns a AND b

### not(boolean b) boolean¶

Returns the opposite of b

### or(boolean a, boolean b) boolean¶

Returns a OR b

## Date functions¶

### asDate(o, format1, format2, … (all optional)) date¶

Returns o converted to a date. If you don’t give a format, ISO-8601 is used. You can give an ordered list of possible formats using this syntax: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

### datePart(date d, string part) number¶

Returns part of a date. The available parts are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’), ‘weekday’ (the name of the day of the week, capitalized), ‘time’ (number of milliseconds since epoch)

Returned parts are always in local timezone and english

### diff(o1, o2, optional string unit) number¶

Returns the difference between two dates in given time units. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

### inc(date d, number value, string unit) date¶

Returns a date incremented or decremented by the given amount in the given unit of time. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

### now() date¶

Returns the current time

### trunc(date d, string unit) date¶

Truncates a date to a given unit. The available units are: ‘hours’ (or ‘hour’ or ‘h’), ‘minutes’ (or ‘minute’ or ‘min’), ‘seconds’ (or ‘second’ or ‘s’), ‘years’ (or ‘year’), ‘months’ (or ‘month’), ‘days’ (or ‘day’ or ‘d’), ‘weeks’ (or ‘week’ or ‘w’)

## Math functions¶

### abs(number d) number¶

Returns the absolute value of a number

### acos(number d) number¶

Returns the arc cosine of an angle, in the range 0 through PI

### asin(number d) number¶

Returns the arc sine of an angle in the range of -PI/2 through PI/2

### atan(number d) number¶

Returns the arc tangent of an angle in the range of -PI/2 through PI/2

### atan2(number x, number y) number theta¶

Converts rectangular coordinates (x, y) to polar (r, theta)

### ceil(number d) number¶

Returns the ceiling of a number

### combin(number n, number k) number¶

Returns the number of combinations for n elements as divided into k

### cos(number d) number¶

Returns the trigonometric cosine of an angle

### cosh(number d) number¶

Returns the hyperbolic cosine of a value

### dec2hex(long) string¶

Returns an hexadecimal representation of the input number

### degrees(number d) number¶

Converts an angle from radians to degrees.

### even(number d) number¶

Rounds the number up to the nearest even integer

### exp(number n) number¶

Returns the exponential of a number

### fact(number i) number¶

Returns the factorial of a number

### factn(number i) number¶

Returns the factorial of a number

### floor(number d) number¶

Returns the floor of a number

### gcd(number d, number e) number¶

Returns the greatest common denominator of the two numbers

### hash(string) long¶

Returns a 64 bits numerical hash of the input (not crypto-secure)

### hex2dec(string) long¶

Returns a decimal representation of an hexadecimal string

### lcm(number d, number e) number¶

Returns the greatest common denominator of the two numbers

### ln(number n) number¶

Returns the natural log of a number

### log(number n) number¶

Returns the base 10 log of a number

### max(number a, number b) number¶

Returns the greater of two numbers

### min(number a, number b) number¶

Returns the smaller of two numbers

### mod(number a, number b) number¶

Returns a modulus b

### multinomial(number d1, number d2 …) number¶

Calculates the multinomial of a series of numbers

### odd(number d) number¶

Rounds the number up to the nearest odd integer

### pow(number a, number b) number¶

Returns a to the power of b

### quotient(number numerator, number denominator) number¶

Returns the integer portion of a division

Converts an angle in degrees to radians

### rand(optional int min, optional int max) double or int¶

Without arguments, returns a random float between 0 and 1. With min and max arguments, returns a random integer between min (inclusive) and max (exclusive)

### round(number n) number¶

Returns the rounding of number to the nearest integer

### sin(number d) number¶

Returns the trigonometric sine of an angle

### sinh(number d) number¶

Returns the hyperbolic sine of an angle

### sum(array a) number¶

Sums the numbers of an array. Skips non-number elements from the array.

### tan(number d) number¶

Returns the trigonometric tangent of an angle

### tanh(number d) number¶

Returns the hyperbolic tangent of a value

### toNumber(o) number¶

Returns o converted to a number

## Object functions¶

### hasField(o, string name) boolean¶

Returns whether o has field name

### htmlAttr(Element e, String s) string¶

Selects a value from an attribute on an Html Element

### htmlText(Element e) string¶

Selects the text from within an element (including all child elements)

### innerHtml(Element e) string¶

The innerHtml of an HTML element

### jsonize(value) JSON literal value¶

Quotes a value as a JSON literal value

### objectDel(key, [key…]) object¶

Removes one or several keys from an object and returns it. The keys must not be null

### objectNew(k1, v1, k2, v2, …) object¶

Creates a new object, optionally pre-filled with key/values. Must get an even number of arguments, as successive key-value pairs. Giving 0 arguments is possible and will return an empty object (you can use objectPut to add to it)

### objectPut(object o, key, value) object¶

Adds a key/value pair to an object and returns it. key must not be null

### ownText(Element e) string¶

Gets the text owned by this HTML element only; does not get the combined text of all children.

### parseHtml(string s) HTML object¶

Parses a string as HTML

### parseJson(string s) object or array¶

Parses a JSON string as an object or array

### select(Element e, String s) HTML Elements¶

Selects an element from an HTML elementn using selector syntax

### type(object o) string¶

Returns the type of o

## String functions¶

### coalesce(value1, value2, …) anything¶

Returns the first non-empty value.

### chomp(string str, string tail) string¶

Removes tail from the end of str if it’s there, otherwise leave it alone.

### contains(string s, string frag) boolean¶

Returns whether s contains frag

### endsWith(string s, string sub) boolean¶

Returns whether s ends with sub

### escape(string s, string mode) string¶

Escapes a string depending on the given escaping mode. Supported modes: ‘html’,’xml’,’csv’,’url’,’javascript’

### format(string format, object… args) string¶

Formats a string using printf-like formatting. For example '%4d-%02d'.format(2004,2) gives '2004-02'.

This uses the Java Formatter syntax

### fromBase64(string s, optional string charset) string¶

Returns the string whose Base64 representation is given. By default, the string is read using the UTF-8 charset.

### indexOf(string s, string sub) number¶

Returns the index of the first ocurrence of sub in s. Returns -1 if there is no such occurrence.

### lastIndexOf(string s, string sub) number¶

Returns the index of the last ocurrence of sub in s. Returns -1 if there is no such occurrence.

### length(array or string o) number¶

Returns the length of o

### match(string, string or regexp) array of strings¶

Returns an array of the groups matching the given regular expression

For example, match('hello world', 'he(.*).*(rl)d') returns ['ll', 'rl']

### md5(string s) string¶

Returns the MD5 hash of a string

### partition(string s, string or regex frag, optional boolean omitFragment) array¶

Returns an array of strings [a,frag,b] where a is the part before the first occurrence of frag in s and b is the part after the occurrence. If omitFragment is true, frag is not returned in the array.

### replace(string s, string or regex f, string replacement) string¶

Replaces all occurrences of a substring or regex in a string

### replaceChars(string s, string f, string r) string¶

Returns the string obtained by replacing all chars in f with the char in s at that same position

For example, replaceChars('abcba', 'bc', 'yz') returns ayzya The function can be used to delete characters, by replacing them by nothing, like in: replaceChars('abcba', 'bc', 'y') returns ayya

### rpartition(string s, string or regex frag, optional boolean omitFragment) array¶

Returns an array of strings [a,frag,b] where a is the part before the last occurrence of frag in s and b is the part after the occurrence. If omitFragment is true, frag is not returned in the array.

### sha1(string s) string¶

Returns the SHA-1 hash of a string

### split(string s, string or regex sep, optional boolean preserveAllTokens) array¶

Returns the array of strings obtained by splitting s with separator sep. If preserveAllTokens is true, then empty segments are preserved.

### splitByCharType(string s) array¶

Returns an array of strings obtained by splitting s grouping consecutive chars by their unicode type

### splitByLengths(string s, number length1, …) array¶

Returns the array of strings obtained by splitting a string into substrings with the given lengths

For example, splitByLengths('abcdefgh', 2, 3, 1) returns ['ab','cde','f'] The function generates one element in the array for each length passed as parameter. If the input string is not long enough, the last elements are empty.

### startsWith(string s, string sub) boolean¶

Returns whether s starts with sub

### strip(string s) string¶

Returns copy of the string, with leading and trailing whitespace omitted.

### toBase64(string s, optional string charset) string¶

Returns the Base64 representation of a string. By default, the string is written using the UTF-8 charset.

### toLowercase(string s) string¶

Converts a string to lowercase

### toString(o, string format (optional)) string¶

Returns o converted to a string

### toTitlecase(string s) string¶

Converts a string to titlecase

### toUppercase(string s) string¶

Converts a string to uppercase

### trim(string s) string¶

Returns copy of the string, with leading and trailing whitespace omitted.

### unescape(string s, string mode) string¶

Unescapes all escaped parts of the string depending on the given escaping mode. Available modes: ‘html’,’xml’,’csv’,’url’,’javascript’

### unicode(string s) string¶

Returns the input string as an array of the unicode codepoints (numbers)

### unicodeType(string s) string¶

Returns an array of strings describing each character of the input string in their full unicode notation

## Value access functions¶

### numval(object o) number¶

Returns the numerical value of a given cell or expression. If the value of the cell or expression is not numerical, an empty value is returned.

For example, with columns type (with values “high” and “low”), cost_high and cost_low, the expression numval('cost_' + type) produces:

cost_low cost_high type output
200 500 low 200.0
100 400 high 400.0
xx 400 low

Warning

mycolumn evaluates to the value of the column named mycolumn

Thus, numval(mycolumn) would not work because it would try to read as a numerical the value of the column whose name is the value of the mycolumn column.

Instead, use numval("mycolumn").

### strval(object o, optional string defaultValue) string¶

Returns the numerical value of a given cell or expression, or the default value if the value of the cell or expression is empty.

For example, with columns type (with values “high” and “low”), code_high and code_low, the expression strval('code_' + type, 'N/A') produces:

code_low code_high type output
200 500 low 200
100 4x0 high 4x0
400 low N/A

Warning

mycolumn evaluates to the value of the column named mycolumn

Thus, strval(mycolumn) would not work because it would try to read as a string the value of the column whose name is the value of the mycolumn column.

Instead, use strval("mycolumn").

## Control structures¶

Control structures allow you to perform advanced operations.

Beware: control structures cannot use object notation !

### filter(array a, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array if the result is truish.

Note

filter(myarray, v, v < 2) is equivalent to this Python syntax: [v for v in myarray if v < 2] and returns the array with only the elements below 2

### forEeach(array a, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v, evaluates expression e, and pushes the result onto the result array.

Note

forEach(myarray, v, v + 2) is equivalent to this Python syntax: [v+2 for v in myarray]

### forEachIndex(array a, variable i, variable v, expression e) (returns: array)¶

Evaluates expression a to an array. Then for each array element, binds its value to variable name v and its index to variable name i, evaluates expression e, and pushes the result onto the result array.

Note

forEachIndex(myarray, i, v, v + i) is equivalent to this Javascript syntax: myarray.map(function(v, i) { return v+i ;})

### forRange(from, to, step, variable v, expression e) (returns: array)¶

Iterates, tarting at from, incrementing by step each time while less than to. At each iteration, binds the variable v to the iteration value, evaluates expression e, and pushes the result onto the result array.

Note

forRange(0, 100, 3, v, v * 2) is equivalent to this Python syntax: [v * 2 for v in xrange(0, 100, 3)] and returns [0, 6, 12, 18 …. 198]

### if(boolean, expression_true, expression_false) (returns: anything)¶

Evaluates to expression_true if the condition is true, to expression_false otherwise

### objectFilter(expression a, variable k, variable v, expression test) object¶

Evaluates expression a to an object. Then for each element (k, v) of this object, binds its key to variable name k, its value to variable name v, evaluates expression test which should return a boolean. If the boolean is true, pushes (k, v) onto the result object.

### with(expression o, variable v, expression e) (returns: any)¶

Evaluates expression o and binds its value to variable name v. Then evaluates expression e and returns that result.

The with control allows you to “split” a very big expression into more manageable chunks. It also makes reusing the result of a complex computation easier to read and faster to process.

## Tests¶

Beware: tests cannot use object notation

### isBlank(expression o) boolean¶

Returns whether o is null or an empty string

### isError(expression o) boolean¶

Returns whether o is an error

### isNonBlank(expression o) boolean¶

Returns whether o is not null and not an empty string

### isNotNull(expression o) boolean¶

Returns whether o is not null

### isNull(expression o) boolean¶

Returns whether o is null

### isNumeric(expression o) boolean¶

Returns whether o can represent a number