Custom Preprocessing

DSS allows you to define custom Python preprocessing, in order to plug in user-written code that processes a feature. To enable it, select “Custom preprocessing” in the feature handling options. You then implement a class with two methods:

def fit(self, series):
def transform(self, series):

Here, series is a pandas Series object representing the feature column. The fit method does not need to return anything; if fitting is necessary, it must store the fitted state on the object itself (in place). The transform method must return a pandas DataFrame, a 2-D numpy array, or a scipy.sparse.csr_matrix containing the preprocessed result. Note that a single processor may output several numerical features, corresponding to several columns of the output. If a numpy array or scipy.sparse.csr_matrix is returned, the processor should also have a “names” attribute containing the list of output feature names.
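As an illustration of this contract, here is a minimal sketch of a processor that returns a pandas DataFrame. The class name StandardizeProcessor, the output column names, and the choice of standardization plus a missing-value flag are hypothetical; the point is that fit stores its statistics on the instance and returns nothing, while transform emits two columns, i.e. two numerical features from one processor. Because a DataFrame carries its own column names, no “names” attribute is set here.

```python
import pandas as pd


class StandardizeProcessor:
    """Hypothetical example: standardizes a numeric feature and flags missing values."""

    def fit(self, series: pd.Series) -> None:
        # fit returns nothing; the fitted statistics are stored on the instance.
        values = series.astype(float)
        self.mean_ = float(values.mean())  # pandas skips NaN values here
        std = float(values.std())
        # Fall back to 1.0 for a constant or single-value column (std 0 or NaN).
        self.std_ = std if std > 0 else 1.0

    def transform(self, series: pd.Series) -> pd.DataFrame:
        values = series.astype(float)
        # Two output columns, i.e. two numerical features from a single processor.
        return pd.DataFrame(
            {
                "standardized": ((values - self.mean_) / self.std_).fillna(0.0),
                "is_missing": values.isna().astype(float),
            },
            index=series.index,
        )
```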

To use your processor in the visual ML UI, import it and instantiate it in the code editor, assigning the instance to the “processor” variable, as follows:

from mymodule import MyProcessor
processor = MyProcessor()

As with any Python code component, the class must be defined in a file stored in the lib/python folder of the data directory.
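For instance, the file imported by the snippet above might look like the following sketch. Everything in it is illustrative: the file name mymodule.py matches the import line, but the one-hot encoding logic and the max_categories parameter are hypothetical. Since transform returns a scipy.sparse.csr_matrix, the class sets a “names” attribute; it is assigned in fit, on the assumption that DSS reads it after fitting.

```python
# mymodule.py: hypothetical module stored in the lib/python folder
import numpy as np
import pandas as pd
from scipy import sparse


class MyProcessor:
    """Hypothetical example: one-hot encodes the most frequent categories."""

    def __init__(self, max_categories=10):
        self.max_categories = max_categories

    def fit(self, series: pd.Series) -> None:
        # Keep the most frequent non-null values as the known categories.
        counts = series.dropna().astype(str).value_counts()
        self.categories_ = list(counts.index[: self.max_categories])
        # Output feature names, needed because transform returns a sparse matrix.
        self.names = ["is_" + c for c in self.categories_]

    def transform(self, series: pd.Series) -> sparse.csr_matrix:
        position = {c: j for j, c in enumerate(self.categories_)}
        rows, cols = [], []
        for i, value in enumerate(series.astype(str)):
            j = position.get(value)
            if j is not None:  # unseen categories yield an all-zero row
                rows.append(i)
                cols.append(j)
        data = np.ones(len(rows), dtype=np.float64)
        return sparse.csr_matrix(
            (data, (np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64))),
            shape=(len(series), len(self.categories_)),
        )
```

With such a file in place, the code editor lines above import and instantiate the class; constructor arguments (here the hypothetical max_categories) can be passed at that point, e.g. MyProcessor(max_categories=20).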