Beamline Framework
Beamline is a Java framework designed to facilitate the prototyping and development of streaming process mining algorithms.
The framework is built on top of Apache Flink, whose distributed and stateful components make it suitable for efficient, scalable computation. Beamline provides algorithms as well as data structures, sources, and sinks that facilitate the development of process mining applications. Although it redefines the concept of event, Beamline tries to maintain compatibility with OpenXES and the IEEE XES standard.
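To make the idea of a stream event that stays compatible with XES more concrete, the following Java sketch shows one way such an event might expose its attributes. It is not Beamline's actual event class: the record `StreamEvent` and its methods are hypothetical names chosen for illustration. Only the attribute keys `concept:name` and `time:timestamp` come from the XES standard. The point it illustrates is that a stream has no enclosing trace, so each event must carry its own case identifier. It assumes Java 16+ for records.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stream event for illustration (NOT Beamline's BEvent class).
 * Requires Java 16+ for records.
 */
record StreamEvent(String caseId, String activity, Instant timestamp) {

    // Attribute keys defined by the XES standard's concept and time extensions.
    static final String CONCEPT_NAME = "concept:name";
    static final String TIMESTAMP = "time:timestamp";

    /**
     * In an XES log the case identifier is the concept:name of the enclosing
     * trace. A stream has no enclosing trace, so each event carries it itself.
     */
    Map<String, Object> traceAttributes() {
        return Map.of(CONCEPT_NAME, caseId);
    }

    /** Event-level attributes, keyed as they would be in an XES file. */
    Map<String, Object> eventAttributes() {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put(CONCEPT_NAME, activity);
        attributes.put(TIMESTAMP, timestamp);
        return attributes;
    }

    public static void main(String[] args) {
        StreamEvent event = new StreamEvent("case-1", "Register", Instant.parse("2023-01-01T10:00:00Z"));
        System.out.println(event.traceAttributes()); // {concept:name=case-1}
        System.out.println(event.eventAttributes());
    }
}
```

Keeping the XES keys means that, in principle, a sequence of such events can be grouped back into traces of an XES log without losing the case, activity, and timestamp information.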
Streaming process mining
Process mining is a well-established discipline that aims to bridge data science and process science, with the ultimate goal of improving processes and their executions.
Classical process mining techniques take as input so-called event log files: static files containing executions to be analyzed. These event log files are typically structured as XML files according to the IEEE XES standard. These files contain events referring to a fixed period of time and, therefore, the results of the process mining analyses refer to the same time frame.
In streaming process mining, the input is not a static file but an event stream. As in event stream processing, the goal is to analyze each event as soon as it arrives and to update the analysis results immediately, without waiting for the data to be complete.
The picture below refers to the control-flow discovery case, but the same principle applies to conformance checking and enhancement algorithms.
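The contrast between analyzing a static event log and analyzing an event stream can be sketched on the simplest control-flow artifact, the directly-follows relation. The Java example below is a hypothetical illustration, not code from Beamline: `DirectlyFollows.fromLog` computes the relation once over a complete log, whereas `OnlineMiner` updates it event by event and can return its current model at any moment. On a finite stream, both produce the same counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Beamline code): directly-follows counts computed
 * once over a complete log vs. incrementally over an event stream.
 */
class DirectlyFollows {

    /** Batch: the whole log is available; each trace is a list of activities. */
    static Map<String, Integer> fromLog(List<List<String>> log) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> trace : log) {
            for (int i = 1; i < trace.size(); i++) {
                counts.merge(trace.get(i - 1) + " -> " + trace.get(i), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Streaming: events arrive one at a time as (caseId, activity) pairs. */
    static class OnlineMiner {
        // Last activity seen for each case: the only per-case state needed here.
        private final Map<String, String> lastActivity = new HashMap<>();
        private final Map<String, Integer> counts = new HashMap<>();

        void observe(String caseId, String activity) {
            String previous = lastActivity.put(caseId, activity);
            if (previous != null) {
                counts.merge(previous + " -> " + activity, 1, Integer::sum);
            }
        }

        /** Current model, available at any time without waiting for the stream to end. */
        Map<String, Integer> snapshot() {
            return Map.copyOf(counts);
        }
    }

    public static void main(String[] args) {
        // Two cases whose events are interleaved in the stream.
        String[][] stream = {
            {"c1", "A"}, {"c2", "A"}, {"c1", "B"}, {"c2", "C"}, {"c1", "C"}
        };
        OnlineMiner miner = new OnlineMiner();
        for (String[] event : stream) {
            miner.observe(event[0], event[1]);
        }
        Map<String, Integer> batch = fromLog(List.of(List.of("A", "B", "C"), List.of("A", "C")));
        System.out.println(miner.snapshot().equals(batch)); // true
    }
}
```

The per-case map in `OnlineMiner` grows without bound as new cases appear; streaming discovery algorithms typically bound this memory, for instance with approximate frequency-counting schemes, at the cost of exact results.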
Beamline
Beamline is a Java framework meant to simplify research on, and development of, streaming process mining algorithms by providing a set of tools that relieve researchers of the burden of setting up streams and running experiments.
On the name Beamline
The term Beamline is borrowed from high-energy physics, where it indicates the physical structure used to define experiments, i.e., the path along which the accelerated particles travel. In the streaming process mining case, Beamline is used to set up experiments where process mining events are processed and consumed.
Beamline comprises utility classes as well as several already-implemented algorithms that can serve as baselines when comparing new techniques with the state of the art.
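An experiment of this kind can be sketched as a source that replays events, one or more miners that consume them, and a sink that collects the models they produce. The harness below is a hypothetical plain-Java illustration and does not use Beamline's or Flink's API: `StreamingMiner`, `ActivitySetMiner`, `CaseCountMiner`, and `Experiment.run` are names invented for this sketch, and the two miners are deliberately trivial stand-ins. Feeding the same stream to several miners and sampling their models every `every` events is one simple way to compare a new technique against a baseline.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical experiment harness for illustration: NOT Beamline's or Flink's API.
class Experiment {

    /** A streaming algorithm: consumes events one at a time, exposes its current model. */
    interface StreamingMiner {
        String name();
        void consume(String caseId, String activity);
        Object currentModel();
    }

    /** Toy baseline: the set of activities observed so far. */
    static class ActivitySetMiner implements StreamingMiner {
        private final Set<String> activities = new HashSet<>();
        public String name() { return "activity-set"; }
        public void consume(String caseId, String activity) { activities.add(activity); }
        public Object currentModel() { return Set.copyOf(activities); }
    }

    /** Toy alternative: the number of distinct cases observed so far. */
    static class CaseCountMiner implements StreamingMiner {
        private final Set<String> cases = new HashSet<>();
        public String name() { return "case-count"; }
        public void consume(String caseId, String activity) { cases.add(caseId); }
        public Object currentModel() { return cases.size(); }
    }

    /**
     * Source: replays the same (caseId, activity) stream to every miner.
     * Sink: every `every` events, stores a snapshot of each miner's model.
     */
    static Map<String, List<Object>> run(List<String[]> stream, List<StreamingMiner> miners, int every) {
        Map<String, List<Object>> sink = new LinkedHashMap<>();
        for (StreamingMiner miner : miners) {
            sink.put(miner.name(), new ArrayList<>());
        }
        int processed = 0;
        for (String[] event : stream) {
            for (StreamingMiner miner : miners) {
                miner.consume(event[0], event[1]);
            }
            processed++;
            if (processed % every == 0) {
                for (StreamingMiner miner : miners) {
                    sink.get(miner.name()).add(miner.currentModel());
                }
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String[]> stream = List.of(
                new String[] {"c1", "A"}, new String[] {"c2", "A"},
                new String[] {"c1", "B"}, new String[] {"c2", "C"});
        List<StreamingMiner> miners = List.of(new ActivitySetMiner(), new CaseCountMiner());
        run(stream, miners, 2).forEach((name, models) -> System.out.println(name + ": " + models));
    }
}
```

In Beamline itself, these stages would run as Flink sources, operators, and sinks rather than nested loops; the sketch only conveys the source, miner, sink shape of an experiment.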
Citation
Please cite this work as:
- Andrea Burattin. "Beamline: A comprehensive toolkit for research and development of streaming process mining". In Software Impacts, vol. 17 (2023).
BibTeX for citation
@article{BURATTIN2023100551,
title = {Beamline: A comprehensive toolkit for research and development of streaming process mining},
journal = {Software Impacts},
volume = {17},
pages = {100551},
year = {2023},
issn = {2665-9638},
doi = {10.1016/j.simpa.2023.100551},
url = {https://www.sciencedirect.com/science/article/pii/S266596382300088X},
author = {Andrea Burattin},
keywords = {Process mining, Streaming process mining, Apache Flink, Reactive programming},
abstract = {Beamline is a software library to support the research and development of streaming process mining algorithms. Specifically, it comprises a Java library, built on top of Apache Flink, which fosters high performance and deployment. The second component is a Python library (called pyBeamline, built using ReactiveX) which allows the quick prototyping and development of new streaming process mining algorithms. The two libraries share the same underlying data structures (BEvent) as well as the same fundamental principles, thus making the prototypes (built by researchers using pyBeamline) quickly transferrable to full-fledged and highly scalable applications (using Java Beamline).}
}