API for plugin formats

class dataiku.customformat.Formatter(config, plugin_config)

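Base class for a custom file format provided by a plugin. It creates the objects that write data in the format and read data from it.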

get_output_formatter(stream, schema)

Return an OutputFormatter for this format

Parameters
  • stream – the stream to write the formatted data to

  • schema – the schema of the rows that will be formatted (never None)

get_format_extractor(stream, schema=None)

Return a FormatExtractor for this format

Parameters
  • stream – the stream to read the formatted data from

  • schema – the schema of the rows that will be extracted. None when the extractor is used to detect the format.
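A minimal sketch of a Formatter subclass, assuming a made-up pipe-separated format. The class names PipeFormatter, PipeOutputFormatter and PipeFormatExtractor, the schema shape {"columns": [...]}, and the use of a text-mode stream are all assumptions for illustration. The three base classes are local stand-ins so the snippet runs outside DSS; in a plugin you would import them from dataiku.customformat instead.

```python
import io

# Local stand-ins for the dataiku.customformat base classes so this sketch
# runs outside DSS. In a real plugin, import them instead:
#   from dataiku.customformat import Formatter, OutputFormatter, FormatExtractor
class Formatter:
    def __init__(self, config, plugin_config):
        self.config = config
        self.plugin_config = plugin_config


class OutputFormatter:
    def __init__(self, stream):
        self.stream = stream


class FormatExtractor:
    def __init__(self, stream):
        self.stream = stream


class PipeOutputFormatter(OutputFormatter):
    """Hypothetical format: one pipe-separated line per row, no header/footer."""

    def __init__(self, stream, schema):
        OutputFormatter.__init__(self, stream)
        self.schema = schema

    def write_header(self):
        pass  # this format has no header

    def write_row(self, row):
        self.stream.write("|".join(row) + "\n")

    def write_footer(self):
        pass  # this format has no footer


class PipeFormatExtractor(FormatExtractor):
    def __init__(self, stream, schema=None):
        FormatExtractor.__init__(self, stream)
        self.schema = schema  # None when used for format detection

    def read_schema(self):
        return None  # assumption: None means the schema is not known upfront

    def read_row(self):
        line = self.stream.readline()
        if not line:
            return None  # end of stream: reading is finished
        values = line.rstrip("\n").split("|")
        if self.schema is not None:
            names = [col["name"] for col in self.schema["columns"]]
        else:
            names = ["col_%d" % i for i in range(len(values))]  # hypothetical names
        return dict(zip(names, values))


class PipeFormatter(Formatter):
    def get_output_formatter(self, stream, schema):
        return PipeOutputFormatter(stream, schema)

    def get_format_extractor(self, stream, schema=None):
        return PipeFormatExtractor(stream, schema)


# Usage: write one row through the formatter obtained from PipeFormatter.
schema = {"columns": [{"name": "id", "type": "string"}, {"name": "city", "type": "string"}]}
buf = io.StringIO()
out = PipeFormatter({}, {}).get_output_formatter(buf, schema)
out.write_header()
out.write_row(["1", "Paris"])
out.write_footer()
```

The Formatter itself holds no format logic here; it only forwards the stream and schema to the writer or reader, which keeps the two directions independent.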

class dataiku.customformat.OutputFormatter(stream)

Writes a sequence of rows to a stream in this format. The methods are called in this order:

  • write_header()

  • write_row(row_1) …

  • write_row(row_N)

  • write_footer()
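The call order above can be illustrated with a hypothetical driver function, drive(), and a RecordingFormatter that logs each call. Neither is part of the dataiku API (DSS does the driving itself), and OutputFormatter below is a local stand-in. Calling write_header() and write_footer() even when there are no rows is an assumption of this sketch.

```python
class OutputFormatter:
    """Local stand-in for dataiku.customformat.OutputFormatter."""

    def __init__(self, stream):
        self.stream = stream


class RecordingFormatter(OutputFormatter):
    """Records the order in which the formatter's methods are called."""

    def __init__(self, stream):
        OutputFormatter.__init__(self, stream)
        self.calls = []

    def write_header(self):
        self.calls.append("header")

    def write_row(self, row):
        self.calls.append(("row", row))

    def write_footer(self):
        self.calls.append("footer")


def drive(formatter, rows):
    """Hypothetical driver mimicking the documented call order."""
    formatter.write_header()
    for row in rows:
        formatter.write_row(row)
    formatter.write_footer()


rec = RecordingFormatter(stream=None)
drive(rec, [["a"], ["b"]])
```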

write_header()

Write the header of the format (if any)

write_row(row)

Write a row in the format

Parameters

row – array of strings, with one value per column in the schema

write_footer()

Write the footer of the format (if any)
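A format whose output must be wrapped, such as a single JSON array, shows why write_header() and write_footer() exist. This is a minimal sketch under assumptions: JsonArrayOutputFormatter is a hypothetical name, OutputFormatter is a local stand-in, the schema is assumed to look like {"columns": [{"name": ..., "type": ...}, ...]}, and the stream is assumed to accept str (a real DSS stream may be binary, in which case encode before writing).

```python
import io
import json


class OutputFormatter:
    """Local stand-in for dataiku.customformat.OutputFormatter."""

    def __init__(self, stream):
        self.stream = stream


class JsonArrayOutputFormatter(OutputFormatter):
    """Hypothetical format: the whole dataset as one JSON array of objects."""

    def __init__(self, stream, schema):
        OutputFormatter.__init__(self, stream)
        # Assumed schema shape: {"columns": [{"name": ..., "type": ...}, ...]}
        self.names = [col["name"] for col in schema["columns"]]
        self.first_row = True

    def write_header(self):
        self.stream.write("[")  # opening bracket, written once

    def write_row(self, row):
        # row holds one string per schema column, in schema order
        if not self.first_row:
            self.stream.write(",")
        self.first_row = False
        self.stream.write(json.dumps(dict(zip(self.names, row))))

    def write_footer(self):
        self.stream.write("]")  # closing bracket, written once


schema = {"columns": [{"name": "id", "type": "string"}, {"name": "city", "type": "string"}]}
buf = io.StringIO()
fmt = JsonArrayOutputFormatter(buf, schema)
fmt.write_header()
fmt.write_row(["1", "Paris"])
fmt.write_row(["2", "Lyon"])
fmt.write_footer()
```

Keeping the separator logic in write_row() means an empty dataset still produces valid JSON ("[]").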

class dataiku.customformat.FormatExtractor(stream)

Reads a stream in this format and produces a stream of rows

read_schema()

Get the schema of the data in the stream, if the schema can be known upfront.

Returns

the list of columns as [{'name': 'col1', 'type': 'col1type'}, …]
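When the format carries its own header, read_schema() can build the column list shown above before any row is read. A minimal sketch, assuming a hypothetical format whose first line is "name:type,name:type,..." followed by comma-separated data lines; TypedHeaderExtractor and the "string" default type are assumptions, and FormatExtractor is a local stand-in.

```python
import io


class FormatExtractor:
    """Local stand-in for dataiku.customformat.FormatExtractor."""

    def __init__(self, stream):
        self.stream = stream


class TypedHeaderExtractor(FormatExtractor):
    """Hypothetical format: a 'name:type,...' header line, then CSV-like rows."""

    def __init__(self, stream, schema=None):
        FormatExtractor.__init__(self, stream)
        self.schema = schema
        self.columns = None  # parsed lazily from the header line

    def read_schema(self):
        if self.columns is None:
            header = self.stream.readline().rstrip("\n")
            self.columns = []
            for field in header.split(","):
                name, _, col_type = field.partition(":")
                # Assumed default when a column has no explicit type
                self.columns.append({"name": name, "type": col_type or "string"})
        return self.columns

    def read_row(self):
        names = [col["name"] for col in self.read_schema()]  # consumes header once
        line = self.stream.readline()
        if not line:
            return None  # reading is finished
        return dict(zip(names, line.rstrip("\n").split(",")))


ex = TypedHeaderExtractor(io.StringIO("id:bigint,city\n1,Paris\n"))
columns = ex.read_schema()
```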

read_row()

Read one row from the formatted stream

Returns

a dict mapping each column name to its value, or None when reading is finished
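Because read_row() signals the end of the data by returning None, a consumer can drain an extractor with a simple loop. A minimal sketch, assuming a hypothetical JSON-lines format (one JSON object per line, blank lines skipped); JsonLinesExtractor and read_all() are illustrative names, not part of the dataiku API, and FormatExtractor is a local stand-in.

```python
import io
import json


class FormatExtractor:
    """Local stand-in for dataiku.customformat.FormatExtractor."""

    def __init__(self, stream):
        self.stream = stream


class JsonLinesExtractor(FormatExtractor):
    def __init__(self, stream, schema=None):
        FormatExtractor.__init__(self, stream)
        self.schema = schema

    def read_schema(self):
        return None  # assumption: columns are only discovered while reading

    def read_row(self):
        while True:
            line = self.stream.readline()
            if not line:
                return None  # end of stream: reading is finished
            line = line.strip()
            if line:  # skip blank lines
                return json.loads(line)


def read_all(extractor):
    """Hypothetical helper: call read_row() until it returns None."""
    rows = []
    while True:
        row = extractor.read_row()
        if row is None:
            break
        rows.append(row)
    return rows


rows = read_all(JsonLinesExtractor(io.StringIO('{"a": 1}\n\n{"a": 2}\n')))
```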