Computing and Information Systems - Theses

    Accurate and Efficient Discovery of Process Models from Event Logs
    Augusto, Adriano (2019)
    Every day, organizations deliver services and products to their customers by enacting their business processes, the quality and efficiency of which directly influence the customer experience. In competitive business environments, achieving a great customer experience is fundamental to being a successful company. For this reason, companies rely on programmatic business process management to discover, analyse, improve, automate, and monitor their business processes. One of the core activities of business process management is process discovery. The goal of process discovery is to generate a graphical representation of a business process, namely a business process model, which is then used for analysis and optimization purposes. Traditionally, process discovery has been a time-consuming activity performed by interviewing relevant process stakeholders and employees, by observing process participants in action, or by analysing process reference documentation. However, with the diffusion of information systems and specialised software in organizational settings, a new form of process discovery is slowly emerging, which goes by the name of automated process discovery. Automated process discovery allows business analysts to exploit process execution data (recorded in so-called event logs) to automatically generate process models. Discovering high-quality process models is extremely important to reduce the time spent enhancing them and to avoid mistakes during process analysis. The quality of a discovered process model depends on both the input data and the automated process discovery approach (APDA) that is applied. In this thesis, we provide a systematic literature review (SLR) and benchmark of the state-of-the-art APDAs. Our SLR analyses 34 APDAs, while our benchmark evaluates six representative APDAs on more than 20 real-life datasets and seven quality measures.
Our SLR and benchmark highlight that existing APDAs are affected by one (or more) of the following three limitations: (i) they achieve limited accuracy; (ii) they are too computationally inefficient to be used in practice; (iii) they discover syntactically incorrect process models. To address these limitations, we propose a novel APDA, namely Split Miner, which we assessed through our benchmark. The results of our evaluation show that Split Miner outperforms the state-of-the-art APDAs over multiple quality dimensions. Most of the APDAs we assessed in our benchmark, including Split Miner, require a number of input parameters. The quality of the discovered models depends on how these parameters are tuned. We have found that automated hyper-parameter optimization leads to considerable improvements in the quality of the models produced by an APDA (including Split Miner). The quality improvement APDAs achieve via hyper-parameter optimization comes, however, at the cost of longer execution times and higher computational requirements, due to the inefficiency of existing accuracy measures for APDAs (in particular precision) and the lack of efficient solution-space exploration techniques available for APDAs. This thesis tackles the problem of APDA optimization in two parts. First, we propose a set of accuracy measures based on Markovian abstractions, and show that our Markovian accuracy measures are faster than existing accuracy measures and fulfil a set of desirable properties that state-of-the-art measures do not. Next, we propose an optimization framework powered by single-solution-based metaheuristics, which employ our Markovian accuracy measures to efficiently explore the solution space of APDAs based on directly-follows graphs (DFGs), in order to discover a process model with the highest accuracy.
The evaluation of our optimization framework highlights its effectiveness in optimizing DFG-based APDAs, showing that it allows APDAs to explore their solution space beyond the boundaries of hyper-parameter optimization and, in most cases, to do so faster, ultimately discovering more accurate process models in less time than hyper-parameter optimization. In order to foster reproducibility and reuse, all the artifacts designed and developed for this thesis are publicly available as open-source Java command-line applications. Split Miner, our core contribution to the field of automated process discovery, has also been integrated into Apromore, an open-source business process analytics platform used by academics and practitioners worldwide.
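
To make the notion of a directly-follows graph (DFG) mentioned in the abstract more concrete, the following is a minimal Python sketch, not taken from the thesis or from Split Miner. It assumes an event log is simply a list of traces, each trace being a list of activity names, and it counts how often one activity is immediately followed by another. The function name `directly_follows_graph` and the toy log are hypothetical and serve only as illustration.

```python
from collections import Counter


def directly_follows_graph(event_log):
    """Count directly-follows pairs (a, b) across all traces.

    `event_log` is assumed to be an iterable of traces, where each
    trace is a list of activity names in execution order.
    """
    dfg = Counter()
    for trace in event_log:
        # Pair every activity with its immediate successor in the trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical toy log with three traces (illustrative only).
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]

toy_dfg = directly_follows_graph(toy_log)
for (a, b), count in sorted(toy_dfg.items()):
    print(f"{a} -> {b}: {count}")
```

A DFG-based APDA would typically filter and post-process such a graph before producing a process model; those steps are specific to each approach and are not shown here.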