Process Mining
Over the last decade, process mining emerged as a new research field that focuses on the analysis of pro-
cesses using event data. Classical data mining techniques such as classification, clustering, regression,
association rule learning, and sequence/episode mining do not focus on business process models and are of-
ten only used to analyze a specific step in the overall process. Process mining focuses on end-to-end processes
and is possible because of the growing availability of event data and new process discovery and conformance
checking techniques.
Process models are used for analysis (e.g., simulation and verification) and enactment by BPM/WFM
systems. Previously, process models were typically made by hand without using event data. However,
activities executed by people, machines, and software leave trails in so-called event logs. Process mining
techniques use such logs to discover, analyze, and improve business processes.
Recently, the Task Force on Process Mining released the Process Mining Manifesto. This manifesto is
supported by 53 organizations and 77 process mining experts contributed to it. The active involvement
of end-users, tool vendors, consultants, analysts, and researchers illustrates the growing significance of
process mining as a bridge between data mining and business process modeling. The practical relevance
of process mining and the interesting scientific challenges make process mining one of the “hot” topics in
Business Process Management (BPM). This article introduces process mining as a new research field and
summarizes the guiding principles and challenges described in the manifesto.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications—Data
Mining
General Terms: Management, Measurement, Performance
Additional Key Words and Phrases: Process mining, business intelligence, business process management,
data mining
ACM Reference Format:
van der Aalst, W. 2012. Process mining: Overview and opportunities. ACM Trans. Manage. Inf. Syst. 3, 2,
Article 7 (July 2012), 17 pages.
DOI = 10.1145/2229156.2229157 http://doi.acm.org/10.1145/2229156.2229157
1. INTRODUCTION
Process mining aims to discover, monitor, and improve real processes by extracting
knowledge from event logs readily available in today’s information systems [van der
Aalst 2011]. Over the last decade there has been a spectacular growth of event data
and process mining techniques have matured significantly. As a result, management
trends related to process improvement and compliance can now benefit from process
mining.
The starting point for process mining is an event log. Each event in such a log refers to
an activity (i.e., a well-defined step in some process) and is related to a particular case
Author’s address: W. van der Aalst, Department of Mathematics and Computer Science, Eindhoven Univer-
sity of Technology, PO Box 513, 5600 MB, Eindhoven, Netherlands; email: [email protected].
© 2012 ACM 2158-656X/2012/07-ART7 $10.00
DOI 10.1145/2229156.2229157 http://doi.acm.org/10.1145/2229156.2229157
ACM Transactions on Management Information Systems, Vol. 3, No. 2, Article 7, Publication date: July 2012.
7:2 W. van der Aalst
Fig. 1. The three basic types of process mining: (a) discovery, (b) conformance, and (c) enhancement.
(i.e., a process instance). The events belonging to a case are ordered and can be seen
as one “run” of the process. Event logs may store additional information about events.
In fact, whenever possible, process mining techniques use extra information such as
the resource (i.e., person or device) executing or initiating the activity, the timestamp
of the event, or data elements recorded with the event (e.g., the size of an order).
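To make this structure concrete, the following minimal Python sketch groups events by case and derives the ordered trace of each case. The field names (case_id, activity, resource, timestamp) and the sample events are illustrative assumptions, not a log format prescribed by this article.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # process instance the event belongs to
    activity: str        # well-defined step in the process
    resource: str        # person or device executing the activity
    timestamp: datetime  # moment at which the event occurred


def traces_by_case(events):
    """Group events per case and order each case by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(evs, key=lambda e: e.timestamp)]
        for case_id, evs in cases.items()
    }


# Hypothetical sample data, invented for illustration only.
sample_events = [
    Event("1", "register request", "Pete", datetime(2012, 1, 30, 11, 2)),
    Event("2", "register request", "Mike", datetime(2012, 1, 30, 11, 32)),
    Event("1", "decide", "Sara", datetime(2012, 1, 31, 9, 15)),
    Event("1", "check ticket", "Mike", datetime(2012, 1, 30, 15, 6)),
]

print(traces_by_case(sample_events))
```

Each resulting list is one "run" of the process; additional attributes such as the size of an order could be added as further fields.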
Event logs can be used to conduct three types of process mining, as shown in
Figure 1 [van der Aalst 2011]. The first type of process mining is discovery. A discov-
ery technique takes an event log and produces a model without using any a priori
information. Process discovery is the most prominent process mining technique. For
many organizations it is surprising to see that existing techniques are indeed able
to discover real processes merely based on example behaviors stored in event logs.
The second type of process mining is conformance. Here, an existing process model is
compared with an event log of the same process. Conformance checking can be used
to check if reality, as recorded in the log, conforms to the model and vice versa. The
third type of process mining is enhancement. Here, the idea is to extend or improve an
existing process model thereby using information about the actual process recorded
in some event log. Whereas conformance checking measures the alignment between
model and reality, this third type of process mining aims at changing or extending the
a priori model. For instance, by using timestamps in the event log one can extend the
model to show bottlenecks, service levels, and throughput times.
Unlike traditional Business Process Management (BPM) techniques that use hand-
made models [Weske 2007], process mining is based on facts. Based on observed be-
havior recorded in event logs, intelligent techniques are used to extract knowledge.
Therefore, we claim that process mining enables evidence-based BPM. Unlike existing
analysis approaches, process mining is process-centric (and not data-centric), truly in-
telligent (learning from historic data), and fact-based (based on event data rather than
opinions).
Process mining is related to data mining. Whereas classical data mining techniques
are mostly data-centric [Hand et al. 2001], process mining is process-centric. Main-
stream business process modeling techniques use notations such as the Business
Process Modeling Notation (BPMN), UML activity diagrams, Event-driven Process
Chains (EPC), and various types of Petri nets [Desel and Reisig 1998; van der Aalst
and Stahl 2011; Weske 2007]. These notations can be used to model processes
with concurrency, choice, iteration, etc.
Process Mining: Overview and Opportunities 7:3
This article not only introduces process mining as a new research field, but also
familiarizes the reader with the Process Mining Manifesto [TFPM 2011] released by
the Task Force on Process Mining in October 2011. The growing interest in log-based
process analysis motivated the establishment of a Task Force on Process Mining in
2009. This manifesto aims to promote the topic of process mining. Moreover, by defin-
ing a set of guiding principles and listing important challenges, this manifesto hopes
to serve as a guide for software developers, scientists, consultants, business managers,
and end-users. The goal is to increase the maturity of process mining as a new tool to
improve the (re)design, control, and support of operational business processes.
The remainder of this article is organized as follows. Section 2 introduces the notion
of an event log, used as input for process mining. Section 3 shows how process models
can be discovered from scratch using only raw event data. Section 4 discusses the
second type of process mining: conformance checking. Section 5 elaborates on the
third type of process mining: enhancement. The guiding principles and challenges
listed in the manifesto are summarized in Section 6. Section 7 discusses tool support
and shows some real-life examples. Section 8 concludes the article.
3. DISCOVERY
This section introduces the notion of process discovery, that is, automatically constructing
models based on observed events.
Fig. 2. One event log and two potential process models (M1 and M2) aiming to describe the observed
behavior.
The state of a Petri net, also referred to as marking, is defined by the distribution of
tokens over places. A transition is enabled if each of its input places contains a token.
For example, in M1, transition a is enabled in the initial marking of M1, because the
only input place of a contains a token (black dot).
An enabled transition may fire thereby consuming a token from each of its input
places and producing a token for each of its output places. Firing a in the initial mark-
ing corresponds to removing one token from start and producing two tokens (one for
p1 and one for p2). After firing a, three transitions are enabled: b, c, and d. There is a
non-deterministic choice between b and c. Firing b will disable c because the token is
removed from the shared input place (and vice versa). Transition d is concurrent with
b and c, that is, it can fire without disabling another transition. Transition e becomes
enabled after d and either b or c have occurred. Note that transition e in M1 is only enabled
if both input places ( p3 and p4) contain a token. After executing e, three transitions
become enabled: f , g, and h. These transitions are competing for the same token thus
modeling a choice. When g or h is fired, the process ends with a token in place end. If
f is fired, the process returns to the state just after executing a.
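The firing rule described above can be sketched in a few lines of Python. The net below is reconstructed from the prose only, since the figure is not reproduced here: it assumes that b and c share input place p1, that d consumes from p2, and that the place between e and f, g, h is called p5 (a hypothetical name).

```python
from collections import Counter

# Transition -> (input places, output places), reconstructed from the prose.
# Assumptions: b and c share input place p1, d consumes from p2, and p5 is a
# hypothetical name for the place between e and the transitions f, g, h.
M1 = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}


def enabled(net, marking, transition):
    """A transition is enabled if each of its input places holds a token."""
    inputs, _ = net[transition]
    return all(marking[p] >= n for p, n in Counter(inputs).items())


def fire(net, marking, transition):
    """Consume one token per input place and produce one per output place."""
    inputs, outputs = net[transition]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking  # unary plus drops places with zero tokens


def replays(net, trace, initial=("start",), final=("end",)):
    """Check whether a trace leads from the initial to the final marking."""
    marking = Counter(initial)
    for transition in trace:
        if not enabled(net, marking, transition):
            return False
        marking = fire(net, marking, transition)
    return marking == Counter(final)


print(replays(M1, "acdeh"), replays(M1, "abeg"))
```

Under these assumptions, a trace such as abeg is rejected because e needs tokens in both p3 and p4, whereas d has not yet fired.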
It is easy to check that all traces in the event log can be reproduced by M1. This
does not hold for the second process model in Figure 2. M2 is able to reproduce traces
such as acdeh (455 instances), abdeg (191 instances), and acdefbdeh (33 instances).
Note that M2 has two transitions corresponding to activity f. To refer to them they
are named f1 and f2. M2 also allows for behavior very different from what can be
observed in the log, for instance, abeg and abdddddf1 bddddeh are possible according
to the model but do not appear in the log. There are also traces in the log that can-
not be replayed by M2 , for instance, adceh (177 instances), adceg (82 instances), and
adcefcdeh (9 instances) are not possible according to M2 .
The two process models in Figure 2 are visualized in terms of Petri nets. In fact,
both models are so-called WF-nets [van der Aalst et al. 2011a]. A WF-net is a Petri net
with one source place and one sink place such that all places and transitions are on
a path from source to sink. Both models in Figure 2 have a source place named start
and a sink place end and all nodes are on a path from start to end.
In general, the notation used to visualize the result may be very different from
the representation used during the actual discovery process. All mainstream BPM
notations (Petri nets, EPCs, BPMN, YAWL, UML activity diagrams, etc.) can be used
to show discovered processes such as M1 [van der Aalst 2011; Weske 2007].
dependency, the corresponding Petri net should have a place connecting a to b. We use
the notation $a > b$ if and only if there is a trace $\sigma = \langle t_1, t_2, t_3, \ldots, t_n \rangle$ in the log and an
$i \in \{1, \ldots, n-1\}$ such that $t_i = a$ and $t_{i+1} = b$; $a \rightarrow b$ if and only if $a > b$ and $b \not> a$;
$a \# b$ if and only if $a \not> b$ and $b \not> a$; and $a \parallel b$ if and only if $a > b$ and $b > a$. These
four ordering relations are used to create places connecting the different transitions in
the Petri net. The α-algorithm is simple and efficient, but has problems dealing with
complicated routing constructs and noise (like most of the other approaches described
in literature).
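The ordering relations can be derived from a log with a short Python sketch. This covers only the relation-building step, not the place construction of the α-algorithm. The traces are taken from the examples mentioned in the text; the function name footprint and the extra symbol "<-" (marking the reverse of a causal dependency) are assumptions made for this illustration.

```python
from itertools import product


def footprint(log):
    """Return the ordering relation for every ordered pair of activities."""
    activities = sorted({a for trace in log for a in trace})
    # (x, y) is in `follows` if x is directly followed by y in some trace.
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    relation = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            relation[a, b] = "->"   # causal dependency from a to b
        elif ba and not ab:
            relation[a, b] = "<-"   # reverse direction of a causal dependency
        elif ab and ba:
            relation[a, b] = "||"   # a and b occur in both orders
        else:
            relation[a, b] = "#"    # a and b never follow each other directly
    return relation


log = ["acdeh", "abdeg", "adceh", "acdefbdeh"]
fp = footprint(log)
print(fp["a", "b"], fp["c", "d"], fp["b", "c"])
```

For this small log, c and d are observed in both orders and are therefore classified as parallel, while b and c never follow each other directly.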
Region-based approaches are able to express more complex control-flow structures
without underfitting. State-based regions were introduced in 1989 [Ehrenfeucht and
Rozenberg 1989] and generalized in various ways [Cortadella et al. 1998]. In van der
Aalst et al. [2010], van Dongen et al. [2007], Sole and Carmona [2010] it is shown how
these state-based regions can be applied to process mining. In parallel, several authors
applied language-based regions to process mining [Bergenthum et al. 2007; Werf et al.
2010]. The basic idea of these approaches is to discover places. Note that the addition
of places limits the behavior of the Petri net. The idea is to add places that do not
exclude any of the behavior seen in the event log.
For practical applications of process discovery it is essential that noise and in-
completeness are handled well. Surprisingly, only few discovery algorithms focus on
addressing these issues. Notable exceptions are heuristic mining [Weijters and van
der Aalst 2003], fuzzy mining [Günther and van der Aalst 2007], and genetic process
mining [Medeiros et al. 2007].
ProM’s heuristic miner uses the algorithm described in Weijters and van der Aalst
[2003] (see also Section 6.2 in van der Aalst [2011]). The algorithm first builds a
dependency graph based on the frequencies of activities and the number of times one
activity is followed by another activity. Based on predefined thresholds, dependencies
are added to the dependency graph (or not). The dependency graph reveals the
“backbone” of the process model. This backbone is used to discover the detailed split
and join behavior of nodes. If an activity has multiple input arcs, then the heuristic
miner analyzes the log to see whether the join is an AND-join, an XOR-join or an
OR-join. In case of an OR-join, the detailed synchronization behavior is learned. If
an activity has multiple output arcs, then the “split behavior” is learned in a similar
fashion.
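The first step, building a dependency graph from frequencies and thresholds, can be sketched as follows. The sketch assumes the dependency measure commonly associated with heuristic mining, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1); the threshold value of 0.9 is an arbitrary illustrative choice, and the log contains only the trace frequencies quoted in this article, so it is not the complete example log. ProM's actual implementation uses additional parameters that are not modeled here.

```python
from collections import Counter


def directly_follows_counts(log):
    """Count how often x is directly followed by y, weighted by trace frequency."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[x, y] += freq
    return counts


def dependency(counts, a, b):
    """Dependency measure in (-1, 1); values near 1 suggest that a causes b."""
    ab, ba = counts[a, b], counts[b, a]
    if a == b:
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(log, threshold=0.9):
    """Keep only the arcs whose dependency measure reaches the threshold."""
    counts = directly_follows_counts(log)
    activities = {x for trace in log for x in trace}
    return {
        (a, b)
        for a in activities
        for b in activities
        if counts[a, b] > 0 and dependency(counts, a, b) >= threshold
    }


# Trace frequencies as quoted in the text (partial log).
log = {"acdeh": 455, "abdeg": 191, "adceh": 177,
       "adceg": 82, "acdefbdeh": 33, "adcefcdeh": 9}
counts = directly_follows_counts(log)
graph = dependency_graph(log)
print(sorted(graph))
```

Because c and d follow each other frequently in both directions, their measure stays far below the threshold and no arc between them is added, which is the intended treatment of concurrency.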
See Chapter 6 of van der Aalst [2011] for a more elaborate introduction to the vari-
ous process discovery approaches described in literature.
4. CONFORMANCE
In recent years, powerful process mining techniques have been developed that can
automatically construct a suitable process model given an event log. Whereas process
discovery constructs a model without any a priori information (other than the event
log), conformance checking uses a model and an event log as input. The model may
have been made by hand or discovered through process discovery. For conformance
checking, the modeled behavior and the observed behavior (i.e., event log) are
compared.
skipped although the model does not allow for this”). Conformance checking can be
used
— to check the quality of documented processes (assess whether they describe reality
accurately),
— to identify deviating cases and understand what they have in common,
— to identify process fragments where most deviations occur,
— for auditing purposes,
— to judge the quality of a discovered process model,
— to guide evolutionary process discovery algorithms (e.g., genetic algorithms need
to continuously evaluate the quality of newly created models using conformance
checking), and
— as a starting point for model enhancement.
This list shows that conformance checking can be used for a variety of reasons ranging
from evaluating a process discovery algorithm to auditing and compliance monitoring.
Note that auditors need to validate information about organizations by determining
whether they execute business processes within certain boundaries set by managers,
governments, and other stakeholders. Clearly, event logs provide valuable input for
this.
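As a deliberately naive illustration of identifying deviating cases, the sketch below checks every trace variant against a model and reports the share of fitting cases. The model is encoded as a hand-written regular expression that approximates the behavior described for M1 in Section 3; this encoding, the function name, and the frequencies of the two deviating variants are assumptions made for the example. It gives only a yes/no verdict per case, far coarser than the diagnostics of a real conformance checker.

```python
import re
from collections import Counter

# Hand-written approximation of the behavior described for M1 (an assumption):
# a, then (b or c) and d in any order, then e, optionally repeated via f,
# and finally g or h.
MODEL = re.compile(r"a(?:[bc]d|d[bc])e(?:f(?:[bc]d|d[bc])e)*[gh]")


def check_conformance(log):
    """Return the fraction of fitting cases and the deviating variants."""
    deviating = Counter(
        {trace: n for trace, n in log.items() if not MODEL.fullmatch(trace)}
    )
    total = sum(log.values())
    fitting = total - sum(deviating.values())
    return fitting / total, deviating


# Frequencies of "abeg" and "acdh" are invented for illustration.
log = {"acdeh": 455, "abdeg": 191, "adceh": 177,
       "adcefcdeh": 9, "abeg": 3, "acdh": 2}
share, deviating = check_conformance(log)
print(f"{share:.4f}", dict(deviating))
```

The deviating variants returned by the function are the natural starting point for asking what those cases have in common.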
5. ENHANCEMENT
It is also possible to extend or improve an existing process model using the event log. A
non-fitting process model can be corrected using the diagnostics provided by the align-
ment of model and log. Moreover, event logs may contain information about resources,
timestamps, and case data. For example, an event referring to activity “register re-
quest” and case “992564” may also have attributes describing the person that regis-
tered the request (e.g., “John”), the time of the event (e.g., “30-01-2012:14.55”), the age
of the customer (e.g., “45”), and the claimed amount (e.g., “650 euro”). After aligning
model and log it is possible to replay the event log on the model. While replaying one
can analyze these additional attributes.
For example, it is possible to analyze waiting times in-between activities. Simply
measure the time difference between causally related events and compute basic statis-
tics such as averages, variances, and confidence intervals. This way it is possible to
identify the main bottlenecks [van der Aalst 2011].
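A minimal sketch of such a measurement is given below. It assumes that the pair of causally related activities is supplied by the analyst, that events within a case are already ordered by time, and that a normal approximation (z = 1.96) is acceptable for the confidence interval; the sample cases are invented.

```python
from datetime import datetime
from math import sqrt
from statistics import mean, variance


def waiting_times(cases, first, second):
    """Hours between the first occurrence of `first` and the next `second`."""
    result = []
    for events in cases.values():
        start = None
        for activity, ts in events:
            if activity == first and start is None:
                start = ts
            elif activity == second and start is not None:
                result.append((ts - start).total_seconds() / 3600)
                break
    return result


def summarize(values, z=1.96):
    """Mean, sample variance, and an approximate 95% confidence interval."""
    m, v = mean(values), variance(values)
    half_width = z * sqrt(v / len(values))
    return {"mean": m, "variance": v, "ci95": (m - half_width, m + half_width)}


# Hypothetical cases: lists of (activity, timestamp) pairs ordered by time.
cases = {
    "1": [("a", datetime(2012, 1, 30, 9, 0)), ("e", datetime(2012, 1, 30, 13, 0))],
    "2": [("a", datetime(2012, 1, 30, 10, 0)), ("e", datetime(2012, 1, 30, 16, 0))],
    "3": [("a", datetime(2012, 1, 31, 9, 0)), ("e", datetime(2012, 1, 31, 17, 0))],
}
waits = waiting_times(cases, "a", "e")
stats = summarize(waits)
print(waits, stats)
```

Ranking activity pairs by their mean waiting time is one simple way to point at candidate bottlenecks.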
Information about resources can be used to discover roles, that is, groups of people
frequently executing related activities. Here, standard clustering techniques can be
used. It is also possible to construct social networks based on the flow of work and
analyze resource performance (e.g., the relation between workload and service times).
See Song and van der Aalst [2008] for an overview of various process mining tech-
niques analyzing the organizational perspective based on event logs.
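One simple reading of "flow of work" is to count how often work passes directly from one resource to the next within a case. The sketch below builds such a weighted network; the function name, the data layout, and the sample cases are assumptions for illustration and do not reproduce the metrics of the cited overview.

```python
from collections import Counter


def handover_of_work(cases):
    """Count how often work passes directly from one resource to another."""
    handovers = Counter()
    for events in cases.values():
        resources = [resource for _, resource in events]
        for src, dst in zip(resources, resources[1:]):
            if src != dst:  # ignore consecutive steps by the same resource
                handovers[src, dst] += 1
    return handovers


# Hypothetical cases: lists of (activity, resource) pairs in execution order.
cases = {
    "1": [("register request", "Pete"), ("check ticket", "Mike"), ("decide", "Sara")],
    "2": [("register request", "Pete"), ("examine", "Sue"), ("decide", "Sara")],
    "3": [("register request", "Pete"), ("check ticket", "Mike"), ("decide", "Sara")],
}
network = handover_of_work(cases)
print(network.most_common())
```

The arc weights can then serve as input for standard social network analysis or for clustering resources into roles.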
Standard classification techniques can be used to analyze the decision points in the
process model [Rozinat and van der Aalst 2006]. For example, activity e (“decide”) has
three possible outcomes (“pay,” “reject,” and “redo”). Using the data known about the
case prior to the decision, we can construct a decision tree explaining the observed
behavior.
Process mining is not restricted to offline analysis and can also be used for predic-
tions and recommendations at runtime. For example, the completion time of a partially
handled customer order can be predicted using a discovered process model with timing
information [van der Aalst et al. 2011b].
6. PROCESS MINING MANIFESTO
The IEEE Task Force on Process Mining recently released a manifesto describing guid-
ing principles and challenges [TFPM 2011]. The manifesto aims to increase the visi-
bility of process mining as a new tool to improve the (re)design, control, and support of
operational business processes. It is intended to guide software developers, scientists,
consultants, and end-users. Before summarizing the manifesto, we briefly introduce
the task force.
relation between events in the log and elements in the model serves as a starting
point for different types of analysis.
6.3 Challenges
Process mining is an important tool for modern organizations that need to manage
nontrivial operational processes. On the one hand, there is an incredible growth of
event data. On the other hand, processes and information need to be aligned perfectly
in order to meet requirements related to compliance, efficiency, and customer service.
Despite the applicability of process mining there are still important challenges that
need to be addressed; these illustrate that process mining is an emerging discipline.
Table II lists the eleven challenges described in the manifesto [TFPM 2011].
As an example consider Challenge C4: “Dealing with Concept Drift.” The term con-
cept drift refers to the situation in which the process is changing while being analyzed
[Bose et al. 2011]. For instance, in the beginning of the event log two activities may be
concurrent whereas later in the log these activities become sequential. Processes may
change due to periodic/seasonal changes (e.g., “in December there is more demand” or
“on Friday afternoon there are fewer employees available”) or due to changing condi-
tions (e.g., “the market is getting more competitive”). Such changes impact processes
and it is vital to detect and analyze them [Bose et al. 2011].
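The example of two activities that are first concurrent and later sequential can be made visible with a very simple comparison of directly-follows pairs in an early and a late part of the log. This is only a naive sketch under the assumption that traces are ordered by case start time and that a fixed split point is adequate; it is not the detection technique of Bose et al. [2011].

```python
def directly_follows(traces):
    """Set of pairs (x, y) such that x is directly followed by y in some trace."""
    return {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}


def drift_report(traces, split=0.5):
    """Compare directly-follows pairs before and after a split point.

    `traces` is assumed to be ordered by case start time.
    """
    cut = int(len(traces) * split)
    early = directly_follows(traces[:cut])
    late = directly_follows(traces[cut:])
    return {"disappeared": early - late, "appeared": late - early}


# Hypothetical log: b and c occur in both orders early on, later only as b, c.
traces = ["abcd", "acbd", "abcd", "acbd", "abcd", "abcd", "abcd", "abcd"]
report = drift_report(traces)
print(report)
```

Pairs that disappear in the second half hint that b and c are no longer interchangeable in order, which is the kind of change that should trigger a closer analysis.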
Table II. Some of the Most Important Process Mining Challenges Identified in the Manifesto
C1 Finding, Merging, and Cleaning Event Data
   When extracting event data suitable for process mining, several challenges need to be
   addressed: data may be distributed over a variety of sources, event data may be incomplete,
   an event log may contain outliers, logs may contain events at different levels of
   granularity, etc.
C2 Dealing with Complex Event Logs Having Diverse Characteristics
   Event logs may have very different characteristics. Some event logs may be extremely large,
   making them difficult to handle, whereas other event logs are so small that not enough data
   is available to draw reliable conclusions.
C3 Creating Representative Benchmarks
   Good benchmarks consisting of example data sets and representative quality criteria are
   needed to compare and improve the various tools and algorithms.
C4 Dealing with Concept Drift
   The process may be changing while being analyzed. Understanding such concept drifts is of
   prime importance for the management of processes.
C5 Improving the Representational Bias Used for Process Discovery
   A more careful and refined selection of the representational bias is needed to ensure
   high-quality process mining results.
C6 Balancing Between Quality Criteria such as Fitness, Simplicity, Precision, and
   Generalization
   There are four competing quality dimensions: (a) fitness, (b) simplicity, (c) precision, and
   (d) generalization. The challenge is to find models that score well in all four dimensions.
C7 Cross-Organizational Mining
   There are various use cases where event logs of multiple organizations are available for
   analysis. Some organizations work together to handle process instances (e.g., supply chain
   partners) or organizations are executing essentially the same process while sharing
   experiences, knowledge, or a common infrastructure. However, traditional process mining
   techniques typically consider one event log in one organization.
C8 Providing Operational Support
   Process mining is not restricted to off-line analysis and can also be used for online
   operational support. Three operational support activities can be identified: detect,
   predict, and recommend.
C9 Combining Process Mining with Other Types of Analysis
   The challenge is to combine automated process mining techniques with other analysis
   approaches (optimization techniques, data mining, simulation, visual analytics, etc.) to
   extract more insights from event data.
C10 Improving Usability for Nonexperts
   The challenge is to hide the sophisticated process mining algorithms behind user-friendly
   interfaces that automatically set parameters and suggest suitable types of analysis.
C11 Improving Understandability for Nonexperts
   The user may have problems understanding the output or may be tempted to infer incorrect
   conclusions. To avoid such problems, the results should be presented using a suitable
   representation and the trustworthiness of the results should always be clearly indicated.
None of the commercial software products provides comprehensive support for con-
formance checking, that is, the focus is on process discovery and performance measure-
ment. However, ProM supports the different types of conformance checking described
in Section 4.3.
Some of these products embed process mining functionality in a larger system, for
instance, Pallas Athena embeds process mining in their BPM suite BPM|one. Other
products aim at simplifying process mining using an intuitive user interface.
Fig. 3. Spaghetti process describing the diagnosis and treatment of 2765 patients in a Dutch hospital. The
process model was constructed based on an event log containing 114,592 events. There are 619 different
activities (taking event types into account) executed by 266 different individuals (doctors, nurses, etc.).
Fig. 4. WF-net discovered based on an event log of a Dutch municipality. The log contains events related to
745 objections against the so-called WOZ valuation. These 745 objections generated 9583 events. There are
13 activities. For 12 of these activities both start and complete events are recorded. Hence, the WF-net has
25 transitions.
Fig. 5. Fragment of the WF-net annotated with diagnostics generated by ProM’s conformance checker. The
WF-net and event log fit well (fitness is 0.98876214). Nevertheless, several low-frequent deviations are
discovered. For example, activity “OZ12 Hertaxeren” (re-evaluation of WOZ value) is started 23 times
without being enabled according to the model.
process. Nevertheless, it was interesting for the municipality to see the deviations
highlighted in the model. Figure 5 shows a fragment of the diagnostics provided by
ProM’s conformance checker.
The municipality’s log contains timestamps. Therefore, it is possible to replay the
event log while taking the timestamps into account. ProM can visualize the phases of
the process that take most time. For example, the place in-between “OZ16 Uitspraak
start” (start of announcement of final judgment) and “OZ16 Uitspraak complete” (end
of announcement of final judgment) was visited 436 times. The average time spent in
this place is 7.84 days. This indicates that activity “OZ16 Uitspraak” (final judgment)
takes about a week. It is also possible to simply select two activities and measure the
time that passes in-between these activities. On average 202.73 days pass in-between
the completion of activity “OZ02 Voorbereiden” (preparation) and the completion of
“OZ16 Uitspraak” (final judgment). Such examples illustrate that process mining,
unlike classical Business Intelligence (BI) tools, helps organizations to “look inside”
their processes. This is in stark contrast with contemporary BI tools that typically
focus on reporting and fancy-looking dashboards.
8. CONCLUSION
This article introduced process mining as a new technology enabling evidence-based
process analysis. We introduced the three basic types of process mining (discovery,
conformance, and enhancement) using a small example and used some larger
examples to illustrate the applicability in real-life settings. Nevertheless, there
are still many open scientific challenges and most end-user organizations are not
yet aware of the potential of process mining. This triggered the development of
the Process Mining Manifesto by an international task force involving 77 process
mining experts representing 53 organizations. This manifesto can be obtained from
http://www.win.tue.nl/ieeetfpm/. The reader interested in process mining is
also referred to the recent book on process mining [van der Aalst 2011]. Also visit
www.processmining.org for sample logs, videos, slides, articles, and software.
ACKNOWLEDGMENTS
The author would like to thank all who contributed to the Process Mining Manifesto: Arya Adriansyah,
Ana Karla Alves de Medeiros, Franco Arcieri, Thomas Baier, Tobias Blickle, Jagadeesh Chandra Bose,
Peter van den Brand, Ronald Brandtjen, Joos Buijs, Andrea Burattin, Josep Carmona, Malu Castellanos,
Jan Claes, Jonathan Cook, Nicola Costantini, Francisco Curbera, Ernesto Damiani, Massimiliano de Leoni,
Pavlos Delias, Boudewijn van Dongen, Marlon Dumas, Schahram Dustdar, Dirk Fahland, Diogo R. Ferreira,
Walid Gaaloul, Frank van Geffen, Sukriti Goel, Christian Günther, Antonella Guzzo, Paul Harmon, Arthur
ter Hofstede, John Hoogland, Jon Espen Ingvaldsen, Koki Kato, Rudolf Kuhn, Akhil Kumar, Marcello
La Rosa, Fabrizio Maggi, Donato Malerba, Ronny Mans, Alberto Manuel, Martin McCreesh, Paola Mello,
Jan Mendling, Marco Montali, Hamid Motahari Nezhad, Michael zur Muehlen, Jorge Munoz-Gama, Luigi
Pontieri, Joel Ribeiro, Anne Rozinat, Hugo Seguel Pérez, Ricardo Seguel Pérez, Marcos Sepúlveda, Jim
Sinur, Pnina Soffer, Minseok Song, Alessandro Sperduti, Giovanni Stilo, Casper Stoel, Keith Swenson,
Maurizio Talamo, Wei Tan, Chris Turner, Jan Vanthienen, George Varvaressos, Eric Verbeek, Marc
Verdonk, Roberto Vigo, Jianmin Wang, Barbara Weber, Matthias Weidlich, Ton Weijters, Lijie Wen, Michael
Westergaard, and Moe Wynn.
REFERENCES
A DRIANSYAH , A., VAN D ONGEN, B., AND VAN DER A ALST, W. 2011. Conformance checking using cost-based
fitness analysis. In Proceedings of the IEEE International Enterprise Computing Conference (EDOC’11).
C. Chi and P. Johnson Eds., IEEE Computer Society, 55–64.
A GRAWAL , R., G UNOPULOS, D., AND L EYMANN, F. 1998. Mining process models from workflow logs. In
Proceedings of the 6th International Conference on Extending Database Technology. Lecture Notes in
Computer Science, vol. 1377, Springer-Verlag, Berlin, 469–483.
B ERGENTHUM , R., D ESEL , J., L ORENZ , R., AND M AUSER , S. 2007. Process mining based on regions of
languages. In Proceedings of the International Conference on Business Process Management (BPM’07).
G. Alonso, P. Dadam, and M. Rosemann Eds., Lecture Notes in Computer Science, vol. 4714, Springer,
375–383.
B OSE , R. P. J. C., VAN DER A ALST, W., Z LIOBAITE , I., AND P ECHENIZKIY, M. 2011. Handling concept drift
in process mining. In Proceedings of the International Conference on Advanced Information Systems
Engineering (CAISE’11). H. Mouratidis and C. Rolland Eds., Lecture Notes in Computer Science, vol.
6741, Springer, 391–405.
C OOK , J. AND W OLF, A. 1998. Discovering models of software processes from event-based data. ACM Trans.
Softw. Engin. Method. 7, 3, 215–249.
C ORTADELLA , J., K ISHINEVSKY, M., L AVAGNO, L., AND YAKOVLEV, A. 1998. Deriving Petri nets from finite
transition systems. IEEE Trans. Comput. 47, 8, 859–882.
D ATTA , A. 1998. Automating the discovery of as-is business process models: Probabilistic and algorithmic
approaches. Inf. Syst. Resear. 9, 3, 275–301.
D ESEL , J. AND R EISIG, W. 1998. Place/Transition Nets. In Lectures on Petri Nets I: Basic Models, W. Reisig
and G. Rozenberg Eds., Lecture Notes in Computer Science, vol. 1491, Springer-Verlag, Berlin, 122–173.
E HRENFEUCHT, A. AND R OZENBERG, G. 1989. Partial (set) 2-structures: Parts 1 Part 2. Acta Informatica
27, 4, 315–368.
G RECO, G., G UZZO, A., P ONTIERI , L., AND S ACC À , D. 2006. Discovering expressive process models by
clustering log traces. IEEE Trans. Knowl. Data Engin. 18, 8, 1010–1027.
G ÜNTHER , C. AND VAN DER A ALST, W. 2007. Fuzzy mining: Adaptive process simplification based on multi-
perspective metrics. In Proceedings of the International Conference on Business Process Management
ACM Transactions on Management Information Systems, Vol. 3, No. 2, Article 7, Publication date: July 2012.
7:16 W. van der Aalst
(BPM’07). G. Alonso, P. Dadam, and M. Rosemann Eds., Lecture Notes in Computer Science, vol. 4714,
Springer-Verlag, Berlin, 328–343.
HAND, D., MANNILA, H., AND SMYTH, P. 2001. Principles of Data Mining. MIT Press, Cambridge, MA.
HERBST, J. 2000. A machine learning approach to workflow management. In Proceedings of the 11th
European Conference on Machine Learning. Lecture Notes in Computer Science, vol. 1810, Springer,
183–194.
MANYIKA, J., CHUI, M., BROWN, B., BUGHIN, J., DOBBS, R., ROXBURGH, C., AND BYERS, A. 2011. Big
Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.
MEDEIROS, A., WEIJTERS, A., AND VAN DER AALST, W. 2007. Genetic process mining: An experimental
evaluation. Data Mining Knowl. Discov. 14, 2, 245–304.
MUNOZ-GAMA, J. AND CARMONA, J. 2011. Enhancing precision in process conformance: Stability, confidence
and severity. In Proceedings of the IEEE Symposium on Computational Intelligence and Data
Mining (CIDM’11). N. Chawla, I. King, and A. Sperduti Eds., IEEE.
ROZINAT, A. AND VAN DER AALST, W. 2006. Decision mining in ProM. In Proceedings of the International
Conference on Business Process Management (BPM’06). S. Dustdar, J. Fiadeiro, and A. Sheth Eds.,
Lecture Notes in Computer Science, vol. 4102, Springer, 420–425.
ROZINAT, A. AND VAN DER AALST, W. 2008. Conformance checking of processes based on monitoring real
behavior. Inf. Syst. 33, 1, 64–95.
SOLE, M. AND CARMONA, J. 2010. Process mining from a basis of regions. In Applications and Theory of
Petri Nets 2010. J. Lilius and W. Penczek Eds., Lecture Notes in Computer Science, vol. 6128, Springer,
226–245.
SONG, M. AND VAN DER AALST, W. 2008. Towards comprehensive support for organizational mining. Dec.
Support Syst. 46, 1, 300–317.
TFPM – IEEE TASK FORCE ON PROCESS MINING. 2011. Process mining manifesto. In Proceedings of the
BPM Workshops. Lecture Notes in Business Information Processing Series, vol. 99, Springer.
VAN DER AALST, W. 2011. Process Mining: Discovery, Conformance and Enhancement of Business Processes.
Springer.
VAN DER AALST, W. AND STAHL, C. 2011. Modeling Business Processes: A Petri Net Oriented Approach. MIT
Press, Cambridge, MA.
VAN DER AALST, W., VAN DONGEN, B., HERBST, J., MARUSTER, L., SCHIMM, G., AND WEIJTERS, A. 2003.
Workflow mining: A survey of issues and approaches. Data Knowl. Engin. 47, 2, 237–267.
VAN DER AALST, W., WEIJTERS, A., AND MARUSTER, L. 2004. Workflow mining: Discovering process
models from event logs. IEEE Trans. Knowl. Data Engin. 16, 9, 1128–1142.
VAN DER AALST, W., REIJERS, H., WEIJTERS, A., VAN DONGEN, B., MEDEIROS, A., SONG, M., AND
VERBEEK, H. 2007. Business process mining: An industrial application. Inf. Syst. 32, 5, 713–732.
VAN DER AALST, W., RUBIN, V., VERBEEK, H., VAN DONGEN, B., KINDLER, E., AND GÜNTHER, C. 2010.
Process mining: A two-step approach to balance between underfitting and overfitting. Softw. Syst.
Model. 9, 1, 87–111.
VAN DER AALST, W., VAN HEE, K., HOFSTEDE, A., SIDOROVA, N., VERBEEK, H., VOORHOEVE, M., AND
WYNN, M. 2011a. Soundness of workflow nets: Classification, decidability, and analysis. Formal Asp.
Comput. 23, 3, 333–363.
VAN DER AALST, W., SCHONENBERG, M., AND SONG, M. 2011b. Time prediction based on process mining.
Inf. Syst. 36, 2, 450–475.
VAN DER AALST, W., ADRIANSYAH, A., AND VAN DONGEN, B. 2012. Replaying history on process models
for conformance checking and performance analysis. WIREs Data Mining Knowl. Discov. 2, 2, 182–192.
VAN DONGEN, B. AND VAN DER AALST, W. 2004. Multi-phase process mining: Building instance graphs. In
Proceedings of the International Conference on Conceptual Modeling (ER’04). P. Atzeni, W. Chu, H. Lu,
S. Zhou, and T. Ling Eds., Lecture Notes in Computer Science, vol. 3288, Springer, 362–376.
VAN DONGEN, B. AND VAN DER AALST, W. 2005. Multi-phase mining: Aggregating instances graphs into
EPCs and Petri nets. In Proceedings of the 2nd International Workshop on Applications of Petri Nets to
Coordination, Workflow and Business Process Management. D. Marinescu Ed., 35–58.
VAN DONGEN, B., BUSI, N., PINNA, G., AND VAN DER AALST, W. 2007. An iterative algorithm for applying
the theory of regions in process mining. In Proceedings of the Workshop on Formal Approaches to
Business Processes and Web Services (FABPWS’07). W. Reisig, K. Hee, and K. Wolf Eds., Publishing House
of University of Podlasie, Siedlce, Poland, 36–55.
VERBEEK, H., BUIJS, J., VAN DONGEN, B., AND VAN DER AALST, W. 2010. ProM 6: The process mining
toolkit. In Proceedings of BPM Demonstration Track 2010. M. L. Rosa Ed., CEUR Workshop Proceedings
Series, vol. 615, 34–39.
WEIJTERS, A. AND VAN DER AALST, W. 2003. Rediscovering workflow models from event-based data using
Little Thumb. Integr. Comput.-Aid. Engin. 10, 2, 151–162.
WERF, J., DONGEN, B. VAN, HURKENS, C., AND SEREBRENIK, A. 2010. Process discovery using integer
linear programming. Fundamenta Informaticae 94, 387–412.
WESKE, M. 2007. Business Process Management: Concepts, Languages, Architectures. Springer.