Extract with regular expression

This processor extracts parts of a column's values using a regular expression. The chunks to extract are delimited by the capture groups of the regular expression.

Unnamed captures

With simple (unnamed) captures, each match is put in a numbered column whose name is the output column prefix followed by the group number. Unnamed capture groups use the (pattern) syntax.

Example:

  • Cell value: id-37-X234
  • Pattern: id-([0-9]*)-([0-9A-Z]*)
  • Output column prefix: extracted_
  • Result: extracted_1=37, extracted_2=X234
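
The numbering rule above can be sketched in Python. This only illustrates the naming scheme and is not the processor's own implementation: the helper name extract_unnamed is hypothetical, and Python's re module stands in for whatever regex engine the processor uses.

```python
import re

def extract_unnamed(value, pattern, prefix):
    """Map each unnamed capture of the first match to prefix + group number.

    Hypothetical helper: returns an empty dict when the pattern does not match.
    """
    match = re.search(pattern, value)  # unanchored search
    if match is None:
        return {}
    # Groups are numbered from 1, in the order their opening parenthesis appears.
    return {f"{prefix}{i}": text for i, text in enumerate(match.groups(), start=1)}

result = extract_unnamed("id-37-X234", r"id-([0-9]*)-([0-9A-Z]*)", "extracted_")
print(result)  # {'extracted_1': '37', 'extracted_2': 'X234'}
```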

Named captures

With named captures, each match is put in a column whose name is the output column prefix followed by the group name. Named capture groups use the (?<groupname>pattern) syntax.

Example:

  • Cell value: id-37-X234
  • Pattern: id-(?<numidentifier>[0-9]*)-(?<identifier2>[0-9A-Z]*)
  • Output column prefix: extracted_
  • Result: extracted_numidentifier=37, extracted_identifier2=X234
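
The same naming rule for named groups, as a rough Python sketch. Python's re module writes named groups as (?P<name>pattern) rather than (?<name>pattern), so the hypothetical helper to_python_pattern rewrites the syntax first. That rewrite is a simplification made for this example (it ignores escaped parentheses) and is not something the processor is known to do.

```python
import re

def to_python_pattern(pattern):
    """Rewrite (?<name>...) as (?P<name>...) for Python's re module.

    Simplified assumption: lookbehinds, (?<= and (?<!, are left untouched,
    but an escaped literal such as \\(?<x> is not handled.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

def extract_named(value, pattern, prefix):
    """Map each named capture of the first match to prefix + group name."""
    match = re.search(to_python_pattern(pattern), value)
    if match is None:
        return {}
    return {prefix + name: text for name, text in match.groupdict().items()}

named_result = extract_named(
    "id-37-X234",
    r"id-(?<numidentifier>[0-9]*)-(?<identifier2>[0-9A-Z]*)",
    "extracted_",
)
print(named_result)
# {'extracted_numidentifier': '37', 'extracted_identifier2': 'X234'}
```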

Found column

If you enable this option, a column named ‘prefix found’ will contain a boolean indicating whether the pattern matched.
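
As a minimal sketch of how such a flag could sit next to the extracted values, the hypothetical helper below takes the name of the found column as a parameter rather than assuming how the processor builds it. The name extracted_found used in the example is illustrative only.

```python
import re

def extract_with_found(value, pattern, prefix, found_column):
    """Return the numbered captures plus a boolean 'found' flag.

    found_column is supplied by the caller; the actual column name
    produced by the processor is not assumed here.
    """
    match = re.search(pattern, value)
    row = {found_column: match is not None}
    if match is not None:
        for i, text in enumerate(match.groups(), start=1):
            row[f"{prefix}{i}"] = text
    return row

hit = extract_with_found("id-37-X234", r"id-([0-9]*)-([0-9A-Z]*)", "extracted_", "extracted_found")
miss = extract_with_found("no match here", r"id-([0-9]*)-([0-9A-Z]*)", "extracted_", "extracted_found")
print(hit)   # {'extracted_found': True, 'extracted_1': '37', 'extracted_2': 'X234'}
print(miss)  # {'extracted_found': False}
```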

Notes

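  • Regular expressions are not anchored, so the pattern can match anywhere in the cell value: ([0-9]+) will capture 232 in val-232. Add ^ and $ to the pattern if it must match the whole value.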
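
The difference between an unanchored and an anchored pattern can be seen with Python's re.search, used here as a stand-in for the processor's engine. The helper first_capture is hypothetical. The sketch uses + rather than * for the unanchored case because, in Python at least, ([0-9]*) is satisfied by the empty string at the very start of val-232.

```python
import re

def first_capture(pattern, value):
    """Return the first capture group of an unanchored search, or None."""
    match = re.search(pattern, value)
    return match.group(1) if match else None

unanchored = first_capture(r"([0-9]+)", "val-232")   # matches in the middle of the value
anchored = first_capture(r"^([0-9]+)$", "val-232")   # must span the whole value
empty = first_capture(r"([0-9]*)", "val-232")        # zero digits match at position 0

print(repr(unanchored))  # '232'
print(repr(anchored))    # None
print(repr(empty))       # ''
```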