API for custom formats

class dataiku.customformat.Formatter(config, plugin_config)

Custom formatter

get_format_extractor(stream, schema=None)

Return a FormatExtractor for this format

Parameters:
  • stream – the stream to read the formatted data from
  • schema – the schema of the rows that will be extracted. None when the extractor is used to detect the format.

get_output_formatter(stream, schema)

Return an OutputFormatter for this format

Parameters:
  • stream – the stream to write the formatted data to
  • schema – the schema of the rows that will be formatted (never None)
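As an illustration of how the two factory methods fit together, here is a minimal sketch of a Formatter subclass. The names PipeFormatter, PipeExtractor, PipeWriter and the "separator" setting are hypothetical, and the fallback base classes only mirror the constructor signatures listed on this page so that the sketch can run outside DSS. It also assumes that config behaves like a dict.

```python
import io

try:
    from dataiku.customformat import Formatter, FormatExtractor, OutputFormatter
except ImportError:
    # Stand-ins (assumption): they only mirror the documented constructors
    # so that this sketch can run without the dataiku package.
    class Formatter(object):
        def __init__(self, config, plugin_config):
            self.config = config
            self.plugin_config = plugin_config

    class FormatExtractor(object):
        def __init__(self, stream):
            self.stream = stream

    class OutputFormatter(object):
        def __init__(self, stream):
            self.stream = stream


class PipeExtractor(FormatExtractor):
    """Hypothetical extractor; row parsing is left out of this sketch."""

    def __init__(self, stream, schema, separator):
        FormatExtractor.__init__(self, stream)
        self.schema = schema  # None while the format is being detected
        self.separator = separator


class PipeWriter(OutputFormatter):
    """Hypothetical output formatter; row writing is left out of this sketch."""

    def __init__(self, stream, schema, separator):
        OutputFormatter.__init__(self, stream)
        self.schema = schema  # never None on the output side
        self.separator = separator


class PipeFormatter(Formatter):
    def __init__(self, config, plugin_config):
        Formatter.__init__(self, config, plugin_config)
        # "separator" is a hypothetical setting name, not a DSS built-in.
        self.separator = config.get("separator", "|")

    def get_format_extractor(self, stream, schema=None):
        return PipeExtractor(stream, schema, self.separator)

    def get_output_formatter(self, stream, schema):
        return PipeWriter(stream, schema, self.separator)


formatter = PipeFormatter({"separator": ";"}, {})
extractor = formatter.get_format_extractor(io.BytesIO(b""))
writer = formatter.get_output_formatter(io.BytesIO(), {"columns": []})
```

Passing the settings into the extractor and the output formatter at construction time keeps both objects independent of the Formatter once they have been created.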

class dataiku.customformat.OutputFormatter(stream)

Writes a stream of rows to an output stream in this format. The methods are called in the following order:

  • write_header()
  • write_row(row_1) ...
  • write_row(row_N)
  • write_footer()

write_footer()

Write the footer of the format (if any)

write_header()

Write the header of the format (if any)

write_row(row)

Write a row in the format

Parameters:
  • row – array of strings, with one value per column in the schema
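The call sequence described for OutputFormatter can be sketched as follows. PipeOutputFormatter is a hypothetical class that writes one pipe-delimited line per row. The sketch assumes that the stream is a binary file-like object, that the schema is a dict with a "columns" list of {'name': ..., 'type': ...} entries, and that a None value should be written as an empty string. None of these points is stated on this page, so check them against your DSS version.

```python
import io

try:
    from dataiku.customformat import OutputFormatter
except ImportError:
    # Stand-in (assumption) so that the sketch runs without the dataiku package.
    class OutputFormatter(object):
        def __init__(self, stream):
            self.stream = stream


class PipeOutputFormatter(OutputFormatter):
    """Hypothetical formatter: a header line, then one line per row."""

    def __init__(self, stream, schema):
        OutputFormatter.__init__(self, stream)
        self.stream = stream
        self.schema = schema

    def _write_line(self, values):
        line = "|".join(values) + "\n"
        self.stream.write(line.encode("utf-8"))

    def write_header(self):
        # Assumed schema shape: {"columns": [{"name": ..., "type": ...}, ...]}
        self._write_line([col["name"] for col in self.schema["columns"]])

    def write_row(self, row):
        # row holds one string per column; None is written as an empty field.
        self._write_line(["" if value is None else value for value in row])

    def write_footer(self):
        pass  # this format has no footer


schema = {"columns": [{"name": "id", "type": "string"},
                      {"name": "city", "type": "string"}]}
out = io.BytesIO()
fmt = PipeOutputFormatter(out, schema)

# Same order as the documented calls: header, rows, footer.
fmt.write_header()
for row in [["1", "Paris"], ["2", None]]:
    fmt.write_row(row)
fmt.write_footer()

result = out.getvalue()
```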

class dataiku.customformat.FormatExtractor(stream)

Reads a formatted stream and turns it into a stream of rows

read_row()

Read one row from the formatted stream

Returns: a dict of the data mapping each column name to its value, or None if reading is finished

read_schema()

Get the schema of the data in the stream, if the schema can be known upfront.

Returns: the list of columns as [{'name': 'col1', 'type': 'col1type'}, ...]
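A minimal sketch of a FormatExtractor that implements read_schema() and read_row() for a pipe-delimited format. PipeFormatExtractor is a hypothetical name. The sketch assumes a binary stream whose first line holds the column names, and it reports every column with the type "string". Both are illustrative choices rather than documented behaviour.

```python
import io

try:
    from dataiku.customformat import FormatExtractor
except ImportError:
    # Stand-in (assumption) so that the sketch runs without the dataiku package.
    class FormatExtractor(object):
        def __init__(self, stream):
            self.stream = stream


class PipeFormatExtractor(FormatExtractor):
    """Hypothetical extractor for lines such as b"id|city"."""

    def __init__(self, stream, schema=None):
        FormatExtractor.__init__(self, stream)
        self.stream = stream
        # Assumption: the first line of the stream is the header.
        self.columns = self._next_fields() or []

    def _next_fields(self):
        line = self.stream.readline()
        if not line:
            return None  # end of stream
        return line.decode("utf-8").rstrip("\r\n").split("|")

    def read_schema(self):
        # The schema is known upfront because the header was read already.
        return [{"name": name, "type": "string"} for name in self.columns]

    def read_row(self):
        fields = self._next_fields()
        if fields is None:
            return None  # tells the caller that reading is finished
        return dict(zip(self.columns, fields))


data = io.BytesIO(b"id|city\n1|Paris\n2|Lyon\n")
extractor = PipeFormatExtractor(data)
columns = extractor.read_schema()

rows = []
while True:
    row = extractor.read_row()
    if row is None:
        break
    rows.append(row)
```

The loop at the end plays the role of the caller: it keeps requesting rows until read_row() returns None.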