Triggered unfold

This processor reassembles several rows into a single row each time a specific trigger value is encountered.

It is useful for analysis of “interaction sessions” (a series of events with a specific event marking the beginning of a new interaction session). For example, while analyzing the logs of a web game, the “start game” event would be the beginning event.

Warning

Limitations

Triggered unfold offers a basic session analysis that is very simple to use, but it comes with several limitations.

Triggered unfold assumes that the input data is sorted by time. It only works on datasets that are not split into several parts (for example, a single file or a SQL table).

Non-terminated sessions are kept in memory. It is recommended that you do not use Triggered unfold if you have more than a few thousand sessions.

For more advanced session analysis, or if you have split data or a large number of sessions, you should use specific recipes (for example, using SQL or Pig).
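Because the processor relies on row order, it can help to check that the data really is sorted by time before applying it. The following is a minimal sketch in plain Python, run outside the processor. The `events` rows, the ISO 8601 timestamp format, and the `is_sorted_by_time` helper are all assumptions made for illustration, not part of the product.

```python
from datetime import datetime

# Hypothetical event rows, deliberately out of order.
events = [
    {"user_id": "user_id1", "event_type": "event_type2", "timestamp": "2024-01-01T10:05:00"},
    {"user_id": "user_id1", "event_type": "login_event", "timestamp": "2024-01-01T10:00:00"},
    {"user_id": "user_id2", "event_type": "login_event", "timestamp": "2024-01-01T10:02:00"},
]


def is_sorted_by_time(rows, time_col="timestamp"):
    """Return True if rows are in non-decreasing timestamp order."""
    times = [datetime.fromisoformat(r[time_col]) for r in rows]
    return all(a <= b for a, b in zip(times, times[1:]))


# Sort a copy of the rows by parsed timestamp (assumes ISO 8601 strings).
sorted_events = sorted(events, key=lambda r: datetime.fromisoformat(r["timestamp"]))
```

In this sketch `is_sorted_by_time(events)` is false for the raw rows and true for `sorted_events`. With unsorted input, the `event_type2` row would be seen before the login that opens its session.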

For example, let’s imagine this dataset:

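| user_id  | event_type  | timestamp |
|----------|-------------|-----------|
| user_id1 | login_event | t1        |
| user_id2 | login_event | t2        |
| user_id1 | event_type2 | t3        |
| user_id2 | event_type2 | t4        |
| user_id1 | login_event | t5        |
| user_id2 | event_type3 | t6        |
| user_id2 | login_event | t7        |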

We know that “login_event” marks the beginning of a new session / new interaction, and we want to track the timestamps of other event types in each session.

We apply a “Triggered unfold” with the following parameters:

  • Column acting as event key: user_id

  • Fold column: event_type

  • Trigger value: login_event

  • Column with data: timestamp

We generate the following result:

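| user_id  | login_event | event_type2 | event_type3 | login_event_prev |
|----------|-------------|-------------|-------------|------------------|
| user_id1 | t1          | t3          |             |                  |
| user_id2 | t2          | t4          | t6          |                  |
| user_id1 | t5          |             |             | t1               |
| user_id2 | t7          |             |             | t2               |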

We get:

  • One column for each type of event

  • One line for each occurrence of “login_event” in the stream

  • The user_id associated with each login_event is kept, and the timestamp of each event is stored in the column for its event type

  • The “_prev” column tracks the data associated with the previous occurrence of “login_event” for each user_id.
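The behavior described above can be approximated in a few lines of plain Python. This is only a sketch of the logic as this example presents it, not the processor's actual implementation. The `triggered_unfold` function and its parameter names are hypothetical. The sketch also makes two assumptions: events seen before the first trigger for a key are ignored, and only the first occurrence of a given event type in a session is kept.

```python
def triggered_unfold(rows, key_col, fold_col, trigger, data_col):
    """Build one output row per occurrence of the trigger value.

    Rows are assumed to be sorted by time already.
    """
    open_sessions = {}  # key -> output row of the session currently open
    last_trigger = {}   # key -> data value of the previous trigger for that key
    output = []
    for row in rows:
        key, event, value = row[key_col], row[fold_col], row[data_col]
        if event == trigger:
            # The trigger starts a new session, which becomes a new output row.
            session = {key_col: key, trigger: value}
            if key in last_trigger:
                session[trigger + "_prev"] = last_trigger[key]
            last_trigger[key] = value
            open_sessions[key] = session
            output.append(session)
        elif key in open_sessions:
            # Assumption: keep the first value seen for each event type.
            open_sessions[key].setdefault(event, value)
        # Assumption: events arriving before any trigger for the key are dropped.
    return output


# The example dataset from this page.
rows = [
    {"user_id": "user_id1", "event_type": "login_event", "timestamp": "t1"},
    {"user_id": "user_id2", "event_type": "login_event", "timestamp": "t2"},
    {"user_id": "user_id1", "event_type": "event_type2", "timestamp": "t3"},
    {"user_id": "user_id2", "event_type": "event_type2", "timestamp": "t4"},
    {"user_id": "user_id1", "event_type": "login_event", "timestamp": "t5"},
    {"user_id": "user_id2", "event_type": "event_type3", "timestamp": "t6"},
    {"user_id": "user_id2", "event_type": "login_event", "timestamp": "t7"},
]

result = triggered_unfold(rows, "user_id", "event_type", "login_event", "timestamp")
```

Note that `open_sessions` and `last_trigger` hold one entry per key, which reflects why a large number of sessions is costly in memory.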

For more details on reshaping, please see Reshaping.