
1 Introduction

The servitization of business describes a shift towards offering products as services [44]. This shift makes companies more dependent on user satisfaction; e.g., it has become much easier to change service providers. Investment in user satisfaction pays off [17], which raises the following question: How can we formally model and analyze the way users experience their interaction with a service?

User journeys model services from the users’ perspective [41]. They describe how users employ a service to achieve a goal. User journeys may include many paths, capturing different sequences of actions between a service and its users. These models enable the analysis of user experience along different (intended or unintended) paths through a service. Although most user journeys today are created manually by domain experts and the associated user experience is captured through interviews [22, 41], the method has been successful at providing feedback to improve services. However, tool support for the modeling and analysis of user journeys is sparse [23], which makes the method difficult to apply in complex domains and to services with numerous and diverse users.

A recent line of work aims to automatically mine user journeys and analyze them using formal methods [26, 28, 30, 31]. This significantly reduces the manual effort needed to create models and enables a different scale of complexity in the analyzed services and number of users. Starting from event logs, which are widely available for software services, process mining [1] and automata learning [18] can automatically generate behavioral models of user journeys from these logs, such as finite state automata. These can then be analyzed by model checking [4].

Fig. 1. Steps to create Sankey diagrams from the event logs of the case studies.

This paper goes beyond previous work by modeling user journeys as stochastic games [11]. We exploit the underlying distribution of events in the event log, which was ignored in previous work. Stochastic games allow complex user behavior to be captured, yet the resulting games can still be model checked. Figure 1 summarizes the steps applied to event logs to analyze user experience. These steps elegantly combine and extend several known techniques. Step 1 generates stochastic automata from event logs by means of automata learning. Step 2 converts the learned automata into stochastic weighted games. The resulting games are analyzed using probabilistic model checking to derive optimal strategies. Step 3 ranks critical actions after which users tend to abandon their journey and visualizes the outcome of these novel analyses via a property-preserving visualization technique, to improve the interpretability of the stochastic game results.

We apply these steps to two case studies: an industrial case study [30, 31] and a benchmark [15] from the literature. The case studies are complementary in complexity and differ in the number of users. In both cases, we identified potential service improvements and automatically uncovered caveats. The case studies suggest that our method is able to address two pressing industrial challenges: (1) the automated construction of stochastic user journey models for complex services from event logs, and (2) the identification of service bottlenecks by automated analysis of models that reflect user experience. In short, the contributions of this paper are: (1) a formalization of user journeys as stochastic weighted games exploiting the underlying distribution of events in the logs; (2) a tool chain combining automata learning and model-checking techniques to automatically analyze stochastic user journey games; (3) a method for property-preserving model reduction to visualize the stochastic game results; and (4) the automated stochastic modeling and analysis of two case studies to showcase the usefulness and applicability of the proposed combination of techniques and their extensions.

2 Preliminaries

In the following, we write \(\mathcal {D}(X)\) for the set of probability distributions over a set X, where a distribution \(\mu :X \rightarrow [0,1]\) is such that \(\sum _{x \in X} \mu (x) = 1\).

Event Logs. An event log records so-called touchpoints (or events) between users and a service provider. A trace \(\tau = (a_0, \dots , a_n) \in \mathscr {A} ^*\) is a finite, ordered sequence over an alphabet \(\mathscr {A} \) of events. An event log L is a multi-set of such traces [1]. A multi-actor event log \(\mathcal {L}=\langle L,\varPi ,{{\,\mathrm{\alpha }\,}}\rangle \) assigns an initiating actor to each event in an event log L [26]; \(\varPi \) is the set of actors, and the actor-mapping function \({{\,\mathrm{\alpha }\,}}:\mathscr {A} \rightarrow \varPi \) assigns events \(a \in \mathscr {A} \) to an actor \(\pi \in \varPi \).

Automata Learning. To learn stochastic automata from event logs, we use the passive automata learning algorithm IOAlergia [36]. IOAlergia learns stochastic automata for reactive systems defined by MDPs [36], based on Alergia [10]. State merging exploits the underlying probabilities of events in the log. An MDP is a tuple \(\langle \varGamma , A_{\textrm{in}}, A_{\textrm{out}}, \delta , s_0, \lambda \rangle \) with finite sets of states \(\varGamma \), input actions \(A_{\textrm{in}}\) and output actions \(A_{\textrm{out}}\), a stochastic transition function \(\delta :\varGamma \times A_{\textrm{in}} \rightarrow \mathcal {D}(\varGamma )\), an initial state \(s_0\in \varGamma \), and a labeling function \(\lambda :\varGamma \rightarrow A_{\textrm{out}}\). We let \(E_\delta \subseteq \varGamma \times A_{\textrm{in}} \times \varGamma \) denote the finite set of transitions such that \(\delta (s,a)(s') > 0\) for all triples \((s,a,s') \in E_\delta \). We assume MDPs to be deterministic; i.e., \(s' = s''\) holds for all transitions \(\delta (s,a)(s'), \delta (s,a)(s'')\) such that \(\delta (s,a)(s') > 0\), \(\delta (s,a)(s'') > 0\) and \(\lambda (s') = \lambda (s'')\).

Let an input/output log \(L_{\textrm{io}}\) consist of traces \(\tau _{\textrm{io}} = (\lambda (s_0), (i_0, o_0), \ldots , (i_n, o_n))\) in which input and output actions alternate, starting with an initial output \(\lambda (s_0)\), which is only observed in the initial state. Given \(L_{\textrm{io}}\), IOAlergia creates an input/output frequency prefix tree acceptor (IOFPTA), where states are labeled with output actions and transitions with input actions and frequencies. In the IOFPTA, every path in the tree represents a prefix of a trace in \(\tau _{\textrm{io}} \in L_{\textrm{io}}\), and the frequency denotes the number of traces sharing this path. After creating the IOFPTA, IOAlergia merges states. Two states are merged if they (1) have the same output label, (2) are locally compatible, and (3) all their successor states with the same output labels are compatible. Local compatibility is based on the Hoeffding bound [25]: two states \(s, s'\) are compatible if, for all inputs \(i \in A_{\textrm{in}}\),

$$\begin{aligned} \left| \frac{f(s,i,o)}{n(s,i)} - \frac{f(s',i,o)}{n(s',i)} \right| \le \sqrt{\frac{1}{2} \log \frac{2}{\epsilon }} \left( \frac{1}{\sqrt{n(s,i)}} + \frac{1}{\sqrt{n(s',i)}}\right) , \end{aligned}$$

where \(f(s,i,o)\) is the frequency of transitions from s with input i to a successor labeled o, and \(n(s,i)\) is the total frequency of input i in state s. The parameter \(\epsilon \in (0,2]\) steers the algorithm’s eagerness for state merging; e.g., \(\epsilon = 2\) leads to no state merges. Therefore, the MDP might contain several states representing the same event. When no more states can be merged, the transition frequencies are normalized to obtain an MDP.
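As an illustration of this compatibility test, the following sketch checks local compatibility of two IOFPTA states from their transition frequencies; it is not part of the published tool chain, and the frequency-table representation is an assumption for illustration.

```python
import math

def locally_compatible(freq_s, freq_t, eps=0.1):
    """Hoeffding-based local compatibility of two IOFPTA states.

    freq_s, freq_t: dicts mapping (input, output) pairs to observed frequencies;
    eps: the IOAlergia parameter steering the eagerness of state merging.
    """
    inputs = {i for (i, _) in freq_s} | {i for (i, _) in freq_t}
    for i in inputs:
        n_s = sum(f for (j, _), f in freq_s.items() if j == i)
        n_t = sum(f for (j, _), f in freq_t.items() if j == i)
        if n_s == 0 or n_t == 0:  # one state has no observations for this input
            continue
        bound = math.sqrt(0.5 * math.log(2 / eps)) * (1 / math.sqrt(n_s) + 1 / math.sqrt(n_t))
        outputs = {o for (j, o) in set(freq_s) | set(freq_t) if j == i}
        for o in outputs:
            diff = abs(freq_s.get((i, o), 0) / n_s - freq_t.get((i, o), 0) / n_t)
            if diff > bound:
                return False
    return True
```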

User Journey Games. A user journey game [30, 31] is a weighted two-player game \(\langle \varGamma , A_C, A_U, E, s_0, T, T_s, w \rangle \), where \(\varGamma \) is a finite set of states, \(A_C\) and \(A_U\) are disjoint sets of actions, \(E \subseteq \varGamma \times (A_C \cup A_U) \times \varGamma \) is a transition relation, \(s_0 \in \varGamma \) an initial state, \(T\subseteq \varGamma \) a set of final states, \(T_s\subseteq T\) successful final states, and \(w: E \rightarrow \mathbb {R}\) a weight function. Actions are separated into two disjoint sets: controllable actions \(A_C\) are taken by the service provider and uncontrollable actions \(A_U\) by the user. User journey games are deterministic if \(s' = s''\) for \((s, a, s'), (s, a, s'') \in E\). Uncontrollable actions have higher precedence than controllable actions: hence, the user chooses actions first but might do nothing.

A stochastic multi-player game (SMG) [11] is a tuple \(\langle \varPi , \varGamma , A, (\varGamma _i)_{i \in \varPi }, s_0, \delta \rangle \), where \(\varPi \) is a set of players, \(\varGamma \) a set of states, A a finite set of actions, \((\varGamma _i)_{i \in \varPi }\) a partition of states among players, \(s_0 \in \varGamma \) an initial state, and \(\delta : \varGamma \times A \rightarrow \mathcal {D}(\varGamma )\) a stochastic transition function. SMGs partition the states among the players; players can take enabled actions if the current state is in their partition. An action \(a \in A\) is enabled in a state s if there is a transition to some state with non-zero probability, i.e., \(\exists s' \in \varGamma : \delta (s,a)(s') > 0\). The set of transitions \(E_\delta \) defined by \(\delta \) includes all triples \((s, a, s') \in \varGamma \times A \times \varGamma \) with \(\delta (s,a)(s') > 0\). Games can include a reward structure \(r: E_\delta \rightarrow \mathbb {Q}_{\ge 0}\) mapping transitions to non-negative rewards (modeling weighted transitions). Rewards accumulate during the game.

Analyzing Stochastic Multiplayer Games. We are interested in analyzing a player’s strategy, which determines the player’s actions in each state. For simplicity, we focus on memory-less strategies, where the choice of action is determined by the current state. A strategy [11] for player \(i \in \varPi \) in an SMG is a partial function \(\varGamma _i \rightarrow \mathcal {D}(A)\) that maps states to distributions over actions.

PRISM-games [11, 32] extends the probabilistic model checker PRISM [34] to games. While PRISM can resolve non-determinism to establish strategies for a single player, PRISM-games can resolve non-determinism for multiple, possibly competing players. The logic Probabilistic Alternating-time Temporal Logic with Rewards (rPATL) allows reasoning about SMGs by expressing temporal properties [11]. The syntax of rPATL is given by:

$$\begin{aligned} \phi \,{:}{:}{=}\, \top \,\mid \, p \,\mid \, \lnot \phi \,\mid \, \phi \wedge \phi \,\mid \, \langle \langle \varXi \rangle \rangle {\textbf {P}}_{\bowtie q}[\psi ] \,\mid \, \langle \langle \varXi \rangle \rangle {\textbf {R}}^{r}_{\bowtie \chi }[{\textbf {F}}^{*}\phi ] \qquad \qquad \psi \,{:}{:}{=}\, {\textbf {X}}\,\phi \,\mid \, \phi \,{\textbf {U}}^{\le k}\,\phi \,\mid \, \phi \,{\textbf {U}}\,\phi \end{aligned}$$

rPATL is a CTL-style branching-time temporal logic that extends state properties \(\phi \) to path formulas \(\psi \) with probabilistic and reward constraints. Here, p is an atomic proposition. The coalition operator \(\langle \langle \varXi \rangle \rangle \) denotes the subset \(\varXi \subseteq \varPi \) of players that collaborate in a query; these players share a common goal against the remaining adversarial players. The probabilistic operator \({\textbf {P}}_{\bowtie q}\), where \(\bowtie \, \in \{<, \le , \ge , >\}\) is a comparison operator and \(q \in \mathbb {Q} \cap [0,1]\) is a probability bound, indicates a probabilistic query under bound \(\bowtie q\). The expected cumulative reward operator \({\textbf {R}}^{r}_{\bowtie \chi }\) evaluates the reward structure r for eventually reaching \(\phi \) under bound \(\bowtie \chi \), where \(\chi \in \mathbb {Q}_{\ge 0}\) is a reward bound and r is a reward structure. The quantitative operators \({\textbf {P}}_{\min =?}\) and \({\textbf {P}}_{\max =?}\) (and, analogously, \({\textbf {R}}^{r}_{\min =?}\) and \({\textbf {R}}^{r}_{\max =?}\)) return the smallest, respectively largest, value that the given coalition of players \(\varXi \) can enforce. The superscript \(*\) of the eventually operator F expresses the cost assigned to paths on which \(\phi \) is not reached: it may be infinite (\(\infty \)), zero (0), or accumulated along the path (c). Further temporal logic operators can be constructed from the next operator \({\textbf {X}}\), the until operator \({\textbf {U}}\), and the bounded until operator \({\textbf {U}}^{\le k}\); for example, the globally operator \({\textbf {G}} \phi \) is defined via \({\textbf {U}}\) as \(\lnot ( \top \, {\textbf {U}} \lnot \phi )\) [11].
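For example (illustrating the syntax only, with \(\phi \) an arbitrary state property), the left formula states that the coalition \(\varXi \) has a strategy to reach \(\phi \) with probability at least 0.9, whatever the remaining players do, while the right formula asks for the largest expected reward r, accumulated until \(\phi \) is reached, that \(\varXi \) can enforce:

$$\begin{aligned} \langle \langle \varXi \rangle \rangle {\textbf {P}}_{\ge 0.9}\,[\,{\textbf {F}}\,\phi \,] \qquad \qquad \langle \langle \varXi \rangle \rangle {\textbf {R}}^{r}_{\max =?}\,[\,{\textbf {F}}^{c}\,\phi \,] \end{aligned}$$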

3 Case Study Overview

We conduct two complementary case studies: an industrial application (GrepS) and a research benchmark (BPIC’17). We explain the steps of our method on GrepS. BPIC’17 includes thousands of journeys and demonstrates scalability.

GrepS. The company GrepS offers programming skill evaluations for Java [6]. The customers of GrepS are organizations that use the service in the hiring process to identify proficient applicants. Users of the service, the assessed trainees, usually complete the assessment within 1–2 weeks. The service comprises three phases: (1) sign up, (2) solve all programming tasks, and (3) review and share the skill report with the customer. In a successful journey, the user completes all tasks and shares the results with the organization. Otherwise, the journey is unsuccessful. The event log contains anonymized user logs as tabular data [29]. To construct multi-actor event logs, the actor-mapping function \({{\,\mathrm{\alpha }\,}}\) was defined by combining domain knowledge with input from a GrepS developer.

BPIC’17. The BPI Challenge 2017 captures a loan application process from a bank. Users can cancel, submit or complete applications, and accept phone calls from the bank. The process can have three different outcomes: (1) an offer can be accepted by the user, (2) the application can be declined by the bank, or (3) the application can be canceled by the user. We exclude declined applications as they occur due to external factors, e.g., indebtedness. Thus, user journeys are successful if the user accepts one of the provided loan offers; cancellations are unsuccessful. The event log contains anonymized user logs as tabular data [15]. To construct multi-actor event logs, the actor-mapping function \({{\,\mathrm{\alpha }\,}}\) was defined by combining domain knowledge with information given in the BPIC’17 forum.

Interestingly, BPIC’17 contains a substantial change in the service provider’s underlying process, a concept drift [2]. To investigate the impact of the concept drift on the user journey, we split the log: The first part (BPIC’17-1) contains traces until the change occurred in July 2016, and the second part (BPIC’17-2) contains the traces after the change.

The BPIC’17 event log is preprocessed to remove inconsistencies [26, 40]. Specifically, we discretize call durations: a trace might contain several events associated with one call, with call durations ranging from seconds to hours. Thus, we aggregate repeated calls and classify them by their duration into “short”, “long”, or “super long”. We exclude calls with an aggregated speaking time of less than 60 seconds. We also distinguish different offers within the same trace. The service provider cancels offers if there is no response after 20 days; we distinguish actively canceled offers from cancellations by the service provider due to this timeout. We also found some redundant events; e.g., the event W_Call after offers was always followed by A_Complete, so we merged these events. To remove outliers, we keep only traces that appear more than once in the log; after preprocessing, both logs still contain more than 5000 journeys.

4 From Logs to Stochastic Games

We explain how stochastic user journey games are constructed from multi-actor event logs \(\mathcal {L} = \langle L, \varPi , {{\,\mathrm{\alpha }\,}}\rangle \), i.e., the first two steps in Fig. 1. Step 1 generates an MDP M from the multi-actor event log \(\mathcal {L}\). Step 2 constructs a weighted stochastic game, extending M with weights and actor information. These stochastic user journey games combine user journey games and SMGs (see Sect. 2).

In a multi-actor event log \(\mathcal {L}\), the set of actors \(\varPi \) is assumed to include the service provider C, who initiates all actions controlled by the offering company, and the user U, who initiates all remaining actions. We assume that users engage in only one action at a time; hence, our focus here will be on turn-based games as models for user journeys, and not on models with parallelism.

Step 1. We first learn an MDP \(M = \langle \varGamma , A_{\textrm{in}}, A_{\textrm{out}}, \delta , s_0, \lambda \rangle \) with IOAlergia. For the construction of M, we make sure that the traces \(\tau \in L\) are in the required format of input/output pairs by extending each trace \(\tau = (a_0, \dots , a_n)\) to an input/output trace \( \tau _{\textrm{IO}} = (\lambda (s_0), (\textsf{env},\lambda (s_0)^{{{\,\mathrm{\alpha }\,}}(a_0)}), ( act (a_0),a_0), \ldots , (\textsf{env}, a_{n-1}^{{{\,\mathrm{\alpha }\,}}(a_n)}), ( act (a_n),a_n), (\textsf{env}, a_{n}^{{{\,\mathrm{\alpha }\,}}( res )}), ( act ( res ), res ))\). Each \(a_i \in \tau \) is encoded by a pair \((\textsf{env}, a_{i-1}^{{{\,\mathrm{\alpha }\,}}(a_i)})\) where \(\textsf{env}\) is a generic input action indicating the next player, followed by an output action \(a_{i-1}^{{{\,\mathrm{\alpha }\,}}(a_i)}\) that indicates the player who initiates event \(a_i\) from \(a_{i-1}\) according to the actor-mapping function \({{\,\mathrm{\alpha }\,}}\). This pair is followed by a pair \(( act (a_i), a_i)\), which uses a function \( act :\mathscr {A} \rightarrow A_{\textrm{in}}\) to map events to input actions, where the output action corresponds to the event itself. A naive mapping could be \( act (a_i) = a_i\), relating each event to a deterministic action. However, it is often useful to introduce a mapping that abstracts slightly from the events to better reflect the problem domain in the actions. Each \(\tau _{\textrm{IO}}\) starts with an initial output \(\lambda (s_0)\) and ends with a final output \( res \), which is \(\textsf{successful}\) if \(\tau \) records a successful user journey and \(\textsf{unsuccessful}\) otherwise. This resulting set of input/output traces is given to IOAlergia (see Sect. 2). By including input/output pairs \((\textsf{env}, a_{i-1}^{{{\,\mathrm{\alpha }\,}}(a_i)})\) in the traces, the learned MDP provides the probability distribution for the actions of the next player.
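The sketch below illustrates this encoding and the call to IOAlergia via AALpy (the library used in the tool chain, see Sect. 7); the helper names, the naive mapping \( act (a) = a\), and the assumption that \({{\,\mathrm{\alpha }\,}}\) is also defined for the synthetic result event are ours, for illustration only.

```python
from aalpy.learning_algs import run_Alergia

def to_io_trace(trace, alpha, successful, initial_output="start"):
    """Encode an event trace as the alternating input/output sequence of Step 1."""
    res = "successful" if successful else "unsuccessful"
    io, prev = [initial_output], initial_output
    for a in list(trace) + [res]:
        # (env, prev^actor): output announcing which player initiates the next event
        io.append(("env", f"{prev}^{alpha(a)}"))
        # (act(a), a): naive mapping act(a) = a, the output is the event itself
        io.append((a, a))
        prev = a
    return io

# Illustrative usage: learn the MDP of Step 1 from a multi-actor event log.
# io_traces = [to_io_trace(t, alpha, is_successful(t)) for t in log]
# mdp = run_Alergia(io_traces, automaton_type="mdp", eps=0.1)
```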

Step 2. The MDP M obtained in Step 1 is extended to a stochastic user journey game by means of a weight function \(w: E_\delta \rightarrow \mathbb {R}\), labeling transitions with weights, and partitioning the states \(\varGamma \) into service provider states \(\varGamma _C\) and user states \(\varGamma _U\). For the automatic construction of the weight function w, we exploit the distinction between successful and unsuccessful user journeys in the event log to compute a numerical value that represents the impact of an action on the outcome of the user journey. The calculation of w is based on previous work [30, 31]. For every transition \(e \in E_\delta \), we let \(w(e) = (1 - H(e,L)) \cdot \textit{majority}(e,L)\), where H is the entropy of successful and unsuccessful journeys. The weight is positive if the majority of traversals are successful journeys, otherwise negative. The weight is maximal, respectively minimal, for transitions occurring exclusively in successful, respectively unsuccessful, journeys. The accumulated weight along a path in a user journey game, called gas, then represents the user’s “motivation” to continue the journey [30, 31].
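A minimal sketch of this weight computation, as we read it from the description above (the entropy/majority decomposition follows [30, 31]; the per-transition counts of successful and unsuccessful traversals are assumed to be available from the log):

```python
import math

def transition_weight(succ, unsucc):
    """w(e) = (1 - H(e, L)) * majority(e, L), computed from the number of
    successful (succ) and unsuccessful (unsucc) journeys traversing e."""
    total = succ + unsucc
    if total == 0:
        return 0.0
    p = succ / total
    entropy = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    majority = 1 if succ >= unsucc else -1
    return (1 - entropy) * majority

# transition_weight(10, 0) == 1.0 (only successful journeys),
# transition_weight(5, 5) == 0.0 (maximal entropy), transition_weight(0, 10) == -1.0
```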

Table 1. Model checking queries for SUJGs.

The controllable and uncontrollable states are identified using the actor-mapping function \({{\,\mathrm{\alpha }\,}}\) to map states to the actors C (service provider) and U (user); e.g., the set of states in \(\varGamma _C\) corresponds to the copies of output actions where C controls the next action: \(a_{i-1}^{{{\,\mathrm{\alpha }\,}}(a_i)}\), where \({{\,\mathrm{\alpha }\,}}(a_i) = C\). Then \(\varGamma _C = \{s \in \varGamma \mid \exists a \in A_{\textrm{out}} : \lambda (s) = a^C \}\), and \(\varGamma _U = \{s \in \varGamma \mid \exists a \in A_{\textrm{out}} : \lambda (s) = a^U \vee \lambda (s) = a \}\).

The weight function w and the state partitioning allow the MDP to be transformed into a weighted, two-player SMG, hereafter called a stochastic user journey game (SUJG), i.e., a tuple \(G = \langle \{C, U\}, \varGamma , A_{\textrm{in}}, (\varGamma _i)_{i \in \{C,U\}}, s_0, \delta , T, T_s, w \rangle \), where the final states are \(T = \{ s \in \varGamma \mid \lambda (s) = \textsf{successful} \vee \lambda (s) = \textsf{unsuccessful} \}\), the successful final states are \(T_s = \{ s \in \varGamma \mid \lambda (s) = \textsf{successful} \}\), and w is the weight function. Note that every user journey game can be transformed into an equivalent SUJG.
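Schematically, the partition of the learned states follows directly from the actor annotation in the output labels of Step 1 (a sketch under that labeling convention; the representation of states and labels is an assumption for illustration):

```python
def partition_states(states, label):
    """Split states into provider-controlled and user-controlled sets, based on
    the actor suffix of their output label (see the encoding in Step 1)."""
    gamma_C = {s for s in states if label(s).endswith("^C")}
    gamma_U = set(states) - gamma_C  # labels with suffix ^U or plain event labels
    return gamma_C, gamma_U
```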

5 Queries for Stochastic User Journey Games

We assume here that users do not interact with a service provider indefinitely but eventually stop. Therefore, we consider SUJGs to be stopping games, in which terminal states with reward zero are reached almost surely [33].

Step 3. We now consider the probabilistic model checking of properties that are crucial for the success of user journeys. The violation of these properties allows us to locate problematic states where the user journey may be improved. The constructed SUJG may contain loops with a positive or negative sum of weights. For this reason, we distinguish queries applicable to games with reward structures and with bounded integer encodings. Table 1 lists properties that we analyzed for the case studies, and that we discuss below. The queries are specified in rPATL, where C denotes the service provider and U denotes the user.

Let us first analyze the probability of completing a user journey successfully; i.e., to what extent can service provider C guarantee the successful outcome of the game? Query Q1 quantifies the service provider’s ability to guide an independent user. Searching for states that return a small probability of reaching any \(s \in T_s\) uncovers states from which the service provider has little or no probability of successfully guiding the user. Thus, the journey is likely to fail. Here, \({{\,\mathrm{\textsf{successful}}\,}}\) is a predicate that only holds in the successful final state \(T_s\), and \({{\,\mathrm{\textsf{unsuccessful}}\,}}\) is a predicate that holds in the final states \(T \setminus T_s\).

Reward Structures decouple accumulated rewards from the state space in PRISM-games and allow them to be computed efficiently. In turn-based SMGs, PRISM-games only supports positive rewards. Thus, we use two reward structures: pos for positive and neg for negative gas (see Sect. 4). The weight of a transition in the SUJG contributes to the corresponding structure, i.e., positive weights add to pos, and negative weights add (as absolute values) to neg. Many services contain transitions with negative weights, e.g., reflecting actions that may be unintuitive for the user. To analyze the effect of these transitions, we consider queries concerning the user experience. Query Q2 determines the lower bound for the negative reward that the user must accumulate to achieve any outcome, by assuming that both actors cooperate. Queries Q3 and Q4 determine the minimum neg and maximum pos reward that the service provider can guarantee, independent of the user, over successful and unsuccessful journeys, respectively. Rewards can also be used to relate gas to the number of steps taken so far: Queries Q5 and Q6 return the minimum negative, respectively maximum positive, reward accumulated (via the cumulative reward operator C) within the first S steps that the service provider can guarantee.
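Concretely, the signed weight of each transition is split into the two structures as sketched below (our own illustration of the encoding; negative weights contribute their absolute value to neg):

```python
def split_rewards(weights):
    """Map each transition's signed weight w(e) to the non-negative reward
    structures pos (positive part) and neg (magnitude of the negative part)."""
    pos = {e: max(w, 0.0) for e, w in weights.items()}
    neg = {e: max(-w, 0.0) for e, w in weights.items()}
    return pos, neg
```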

Bounded Integer Encodings combine positive and negative weights in one variable, enabling queries on their difference. Every transition changes the value of this variable by the corresponding positive or negative weight, reflecting the gas along the paths in the game (see Sect. 4). We also consider a step counter that is updated for each transition. To restrict the size of the search space, we give this variable a bound (i.e., \({{\,\textrm{steps}\,}}{:}{=}\min ({{\,\textrm{steps}\,}}+1, X)\) for some X). We then use cumulative reward structures to calculate the expected values of pos, neg, and steps (Q7), and concentration inequalities such as Markov’s inequality to derive upper and lower bounds that cover at least a given fraction of the distribution. Note that this construction is only needed in the presence of loops and that the expected total rewards, used to bound the model, are finite as we assume stopping games. Query Q8 determines the service provider’s probability of achieving a successful journey with a minimum amount of final gas, a maximum number of steps, and an overall lower bound on the gas along the path. This multi-objective query searches for a successful final state where \({{\,\textrm{gas}\,}}\ge G_0\) and \({{\,\textrm{steps}\,}}\le S\), while ensuring that \({{\,\textrm{gas}\,}}\) never decreases below \(G_1\), for constants \(G_0, S, G_1\).
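For instance, Markov’s inequality, \(P(X \ge a) \le \mathbb {E}[X]/a\) for non-negative X, yields the bound used in Sect. 7: capping a counter at ten times its expected value keeps at least \(90\%\) of the distribution below the bound. A small sketch (names are ours):

```python
def integer_bound(expected_value, coverage=0.9):
    """Bound for a non-negative accumulated quantity (gas, steps) such that, by
    Markov's inequality, at least `coverage` of the distribution lies below it."""
    return int(round(expected_value / (1.0 - coverage)))

# integer_bound(6.4) == 64, i.e., encode the counter as steps := min(steps + 1, 64)
```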

Experiments. PRISM-games supports experiments on queries that instantiate a variable, e.g., the maximum number of steps, with all values in a given integer interval. We use experiments to compare different values of player activity by modifying the probabilities for the service provider or user to take their actions first. Additionally, we vary the allowed number of steps to investigate how the probabilities of a successful outcome change with a limited number of steps.

6 Model Reduction for Visualization

Model checking may reveal weaknesses in the service design and unsatisfiable queries may suggest a need for changes. However, an unsatisfiable query does not by itself identify the actions that negatively affect the largest number of users. To help prioritize options during service redesign, we rank actions based on their expected influence on the user journey outcome, to identify the most critical actions for the largest number of users (cf. Step 3, Fig. 1). We synthesize strategies maximizing the probability of a successful outcome by returning a maximizing strategy for the service provider and a minimizing strategy for the user, based on the queries in Sect. 5. These strategies resolve the players’ choice of action in the SUJG via an induced Markov chain \(M'= \langle \varGamma ', \delta ', s_0 \rangle \); the states \(\varGamma '\) of \(M'\) form a, possibly smaller, subset of the states \(\varGamma \) of the original SUJG, i.e., \(\varGamma ' \subseteq \varGamma \). (The construction of the induced Markov chain \(M'\) from an SMG is detailed in [12].)

We say that users are guidable if the probability that they can successfully complete the journey is greater than zero. Let the function \(\mathscr {R} : \varGamma ' \rightarrow [0,1]\) map states \(s \in \varGamma '\) to the (intermediate) results of the probabilistic query Q1, expressing the probability of reaching the successful outcome from s. The difference in guidable users between two neighboring states s and \(s'\) is the absolute difference between \(\mathscr {R} (s)\) and \(\mathscr {R} (s')\), multiplied by the users traversing between these states. Formally, the difference \({{\,\textrm{diff}\,}}:\varGamma ' \rightarrow \mathbb {R}\) in state \(s \in \varGamma '\) is the absolute difference in guidable users between s and all neighboring states \(s'\):

$$\begin{aligned} {{\,\textrm{diff}\,}}(s) = \sum _{s' \in \varGamma '} |\mathscr {R} (s)-\mathscr {R} (s')|\cdot \#_\mathcal {L}^{\varGamma '}(s, s') \ . \end{aligned}$$
(1)

Here, \(\#_\mathcal {L}^{\varGamma '}(s, s')\) denotes the number of users traversing from s to \(s'\) as recorded in the log \(\mathcal {L}\), where \(s'\in \varGamma '\) and \(\delta '(s,s') > 0\). For non-neighboring states, let \(\#_\mathcal {L}^{\varGamma '}(s, s')=0\). States can then be ranked in descending order by their difference.
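The ranking itself is a direct transcription of Eq. (1); in the sketch below, the representation of the query results \(\mathscr {R} \), of the edges of the induced chain, and of the traversal counts from the log are assumptions about the available data:

```python
def rank_states(R, edges, traversals):
    """Rank states of the induced Markov chain by the difference of Eq. (1).

    R: dict state -> probability of success (results of query Q1);
    edges: collection of pairs (s, t) with delta'(s, t) > 0;
    traversals: dict (s, t) -> number of users moving from s to t in the log.
    """
    diff = {s: 0.0 for s in R}
    for (s, t) in edges:
        diff[s] += abs(R[s] - R[t]) * traversals.get((s, t), 0)
    return sorted(diff.items(), key=lambda kv: kv[1], reverse=True)  # descending order
```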

Visualizations of Results. Real-world processes with complex structures and many users result in models that might be hard for humans to interpret correctly. We discuss a model visualization method based on the model-checking results that allows model reduction while preserving the ranking order.

The state space of \(M'\) can be abstracted into clusters of states with an equal probability of success as defined by \(\mathscr {R} \). Neighboring states with the same results can be merged. States \(\{s' \in \varGamma ' \mid (s, s') \in E_{\delta '} \, \wedge \, \mathscr {R} (s) = \mathscr {R} (s')\}\) can be merged into a state s. We also merge successful final states \(T_s \cap \varGamma '\) and unsuccessful final states \((T\setminus T_s)\cap \varGamma '\). Note that the reduced model preserves all transitions to states that negatively impact the user journey, and that the merge operation is commutative.
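One way to realize this reduction is to repeatedly merge neighboring states with identical success probability until a fixpoint is reached, as sketched below (an illustration of the merge step only; the helper names and the representative-based clustering are ours):

```python
def reduce_chain(R, edges):
    """Map each state to a cluster representative; neighboring states with the
    same probability of success end up in the same cluster."""
    rep = {s: s for s in R}

    def find(s):  # follow representative links to the current cluster head
        while rep[s] != s:
            s = rep[s]
        return s

    changed = True
    while changed:
        changed = False
        for (s, t) in edges:  # edges: list/set of pairs (s, t) of the induced chain
            rs, rt = find(s), find(t)
            if rs != rt and R[s] == R[t]:  # neighbors with equal success probability
                rep[rt] = rs
                changed = True
    return {s: find(s) for s in R}
```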

To visualize fluctuations in guidable users along the user journey, we transform the reduced model into a Sankey diagram [39]. We opted for Sankey diagrams since they seem accessible to a wide range of stakeholders with some prior insight into user behavior [19]. Each bar in the diagram illustrates changes in guidable users, divided into flows of lost and gained guidable users. The largest bars indicate states that are promising candidates for improvement. Note that the bars are not monotonic, as they do not visualize the absolute number of users in a state, but the weighted difference in guidable users.

A heat map visualizes the result mapping \(\mathscr {R} \) in the reduced Markov chain. By clustering similar states, we can keep diagrams fairly small without compromising the analysis. Figure 2a shows an SUJG with three user actions that are necessary to reach a successful outcome. States are annotated with the probability of reaching the successful final state; dotted lines represent uncontrollable user actions, annotated with their probabilities. Figure 2b shows the reduced Markov chain, where two actions divide the states into four clusters with \(35\%, 70\%, 100\%\), and \(0\%\) probability of success, respectively. The insights gained from the induced Markov chain are then visualized as a Sankey diagram in Fig. 2c. The example illustrates flow capacities through the distribution of 100 users.
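The final rendering step can use any Sankey library; below is a minimal sketch with plotly (one possible choice, not necessarily the one used in the tool chain), with illustrative flows chosen to be consistent with the four clusters of the example in Fig. 2:

```python
import plotly.graph_objects as go

# Clusters of the reduced chain (success probabilities) and illustrative flows of
# 100 users consistent with them: 0.5 * 70% = 35%, and 35/50 = 70%.
labels = ["35%", "70%", "100% (successful)", "0% (unsuccessful)"]
source = [0, 0, 1, 1]      # indices into labels
target = [1, 3, 2, 3]
value = [50, 50, 35, 15]   # number of users moving between clusters

fig = go.Figure(go.Sankey(node=dict(label=labels),
                          link=dict(source=source, target=target, value=value)))
fig.show()
```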

Fig. 2. We visualize the model checking results in a Sankey diagram that is generated from the learned SUJGs.

7 Case Study Results

Table 2. Model checking results for GrepS and BPIC’17.

We present results for the GrepS and BPIC’17 case studies from Sect. 3. The steps described in Sects. 4–6 are assembled in a tool chain, implemented in Python 3.10.12, and available online [27]. For automata learning, we use the IOAlergia implementation of AALpy [37] (v. 1.4) and, for model checking, PRISM-games [11, 32] (v. 3.2.1). All experiments ran on a laptop with \(32\,\textrm{GB}\) memory and an i7-1165G7 @ \(2.8\,\textrm{GHz}\) Intel processor within a few hours.

Fig. 3. Simplified model of GrepS’ user journey.

GrepS. Figure 3 shows the generated cyclic game, where touchpoints are represented as states, identified by T and a number. It encodes a heat map, ranging from yellow states to green states; the darker a state’s green, the greater its probability for success (orange is the unsuccessful state). Transitions with negative weights are orange, and those with positive weights green. The figure highlights the three phases of GrepS’ user journey. Phase 1 consists of touchpoints T0–T4, Phase 2 of T5–T20, and Phase 3 of T21–T26. Users receive a new task in T9, T11, T13, T15, and T17. Feedback to users is given after every task. Users share their results with the client company in T26. For readability, we merged the service-provider-controlled and user-controlled states, introduced by the input/output format of the traces (see Step 1 in Sect. 4), with their preceding touchpoint-labeled states (the full model is available at [27]). For GrepS, we assume that users, when it is their turn, can transition according to the recorded events, or do nothing, i.e., transition to a service-provider state, if available.

We investigate the limits for the positive and negative weights that the service provider can guarantee during the journey, with the user and on its own. Table 2 presents results for model checking the queries Q2–Q4 (see Table 1) for both case studies. For GrepS, the user must endure a significant number of negatively weighted transitions, since the maximum accumulated pos (Q4) is smaller than the minimum accumulated neg (Q3). Cooperation (Q2) results in a \(67.37\%\) reduction in accumulated neg.

We analyze the impact of the users’ and service provider’s activity on the user journey by varying the probability in the game’s transitions, to change how eager a player is in taking action. Figure 4a shows the results for these changes: on the horizontal axis, \(q=0\) means that the player takes action according to the frequencies of the original game, \(-1 \le q<0\) means that the service provider gradually increases the probability of taking action (the service provider always takes an action, if available, with \(q=-1\)). Similarly, for \(1\ge q >0\), the user gradually increases the probability of taking action (until always taking an action, if available, with \(q=1\)). The vertical axis shows the probability of a successful journey (Q1); interestingly, GrepS has a linear gain from being more active and a non-linear loss from being more passive. Figure 4b shows the results for queries Q5 and Q6 by comparing the maximal accumulated positive and the minimum accumulated negative weights for the first S steps of the journey, revealing that negative weights surpass positive weights, especially at the beginning of a journey.

To evaluate whether the service provider can guide users to a successful outcome with limited steps and lower bounds for the gas, we consider the model with bounds derived from query Q7 (see Sect. 5). We bound the integer encodings by 10 times their expected value, which includes at least \(90\%\) of the traces. Figure 4c shows the development in guiding the user under Q8. The plot’s labels are pairs \((G_0,G_1)\), where \(G_0\) is the minimum gas in the final state and \(G_1\) the lower bound for gas along the journey. For pairs with the same results, we only plot pairs with the maximum final gas and the maximum gas along the journey. The plot shows that experiencing a journey with high minimal gas and reaching a successful outcome are conflicting goals; maximizing minimal gas clearly affects the probability of success for the user journey. For the best probability of success (\(51\%\)), GrepS needs to guide the users through the negatively weighted transitions, which reach a minimum gas of \(-64\). Actually, the user never fully recovers positive gas in this journey, which ends with a negative gas of \(-4\).

Fig. 4. Experiment results for the GrepS case study.

The analysis has shown that users face negative experiences and that the service provider can offer guidance. We now consider where the journey can be improved to help users reach a successful outcome. Figure 5 shows the derived Sankey diagram with observed users as flow capacities, as described in Sect. 6. The reduced model contains only 6 states, while the mined one has 65 states. Based on the state ranking function (Eq. 1), state T25, where users accept or reject their test results, appears as the most critical state for a successful journey; it determines whether or not the user will reach a successful final state; in fact, \(25\%\) of the users recorded in the log fail their journey immediately after this state. The second most critical state is the first task T9 (where \(37.5\%\) of all users are lost), followed by the other tasks. However, at these points in the journey, several user-controlled actions are required for a successful journey, which makes GrepS dependent on the user’s cooperation in these states.

Fig. 5. Sankey diagram of GrepS’ user journey for guidable users.

Thus, the SUJGs allow us to identify specific states for enhancing the journey: T9 and T25. Our analysis clearly shows that GrepS needs to be active to achieve a successful user journey (Fig. 4). We note that most negatively weighted transitions are user-controlled, suggesting that GrepS can prevent users from “derailing” from a successful journey by being more active within the user journey. If GrepS provides less guidance, users tend to abandon their journeys more easily.

Stakeholder Validation of GrepS Results. We presented the results obtained for GrepS to a company stakeholder to obtain feedback on our results and their presentation format. The stakeholder was not involved in performing the case study; the other authors only had access to the event log from GrepS, provided in 2021. This validation was done after the analysis results were available.

He was familiar with Sankey diagrams and immediately observed that our analysis makes non-trivial insights accessible to key stakeholders, ranging from concrete recommendations to prescriptions on company behavior. From the company’s perspective, prioritizing limited resources to improve the users’ success rate and experience is challenging. Our case study substantiates that automated analyses based on event logs are a viable alternative to current best practices based on heuristics, and promise to reduce assessment efforts.

T25 had already been identified independently by GrepS as a candidate for improvement (Fig. 5), which confirms our analysis. This step is currently supplemented by a manual follow-up step, since completing the user journey successfully is crucial to provide a good user experience. The second suggested task, T9, is not obvious to GrepS and introduces options they have not yet considered, namely to spend resources on guiding the user rather than further optimizing the negatively weighted sign-up phase (see Fig. 4b).

The analysis relating actor eagerness to the probability of success (Fig. 4a) is novel and implies that the return on resources invested in guiding users can be computed. This allows GrepS to evaluate whether to spend more resources on guiding users, given the linear scaling of the success probability, or to cut costs through less guidance, reducing manual work while increasing service adversity.

Figures 4b and 4c can be used to relate user profiles and user journeys. A user’s motivation to complete tests and share results despite negatively weighted actions is initially unknown. If the company had some prior knowledge about the initial motivation of a user or a group of users, it would be possible to model different journeys through the service. In particular, Fig. 4c can support such endeavors, because different bounds can be identified for different planned journeys with corresponding probabilities for success.

Fig. 6. Parametric eagerness for Q1 in BPIC’17.

BPIC’17. Applying Steps 1 and 2 to BPIC’17 yields models with 95 states for BPIC’17-1 and 131 for BPIC’17-2. Step 3 reduces the models to 32 and 47 states, respectively (a reduction of more than \(60\%\)). When filtering on states reachable under the generated strategy, the models shrink to 15 and 19 states, respectively. Figure 7 shows the Sankey diagrams for the two event logs. For readability, we omit the names of states with the least difference in guidable users and use a heat map as in Fig. 3.

The comparison of model checking results between the two models with queries Q2–Q4 (see Table 2) shows some small improvements from BPIC’17-1 to BPIC’17-2. Figure 6 compares different levels of player eagerness for both SUJGs, model checking Q1. It reveals improvements in the service. BPIC’17-2 outperforms BPIC’17-1 starting from \(q=0.06\) when increasing the service provider’s probability to take an action. (Plots showing results for the remaining queries, similar to the queries for the GrepS case study, are available online [27].)

Figure 7 shows the positive impact for BPIC’17-2 after the concept drift. In BPIC’17-1, the number of guidable users remains constant through the user journey, with the most critical state accounting for only \(27\%\) of the total difference in guidable users. In BPIC’17-2, the most critical state accounts for \(50\%\) of the total difference in guidable users. We also observe a change in loan offers: the 2nd and 3rd offers are prominent in the reduced BPIC’17-2 model (while they were merged with other states or omitted in BPIC’17-1), each with decreasing flow capacity. Furthermore, the probability of guiding users from “customer Create Offer 0” decreased; this state is marked yellow in BPIC’17-1 and orange in BPIC’17-2, indicating a decrease in user experience. In both journeys, the second most critical state, a short call due to incomplete files, is user-controlled, but its fraction of the total difference in guidable users decreased from \(26.6\%\) to \(12.5\%\). This can be interpreted as evidence that the service provider improved this call state after the concept drift. However, we observe that BPIC’17-2 still lacks proper guidance regarding the effect of this call, given the direct transition to the unsuccessful final state.

Threats to Validity. For model learning with IOAlergia, we set the parameter \(\epsilon \) (which regulates state merging) according to the size of the underlying event log and the assumed complexity of the service. For GrepS, we set \(\epsilon = 0.1\) due to a small number of possible journeys, while for BPIC’17, we set \(\epsilon = 0.8\) to capture different decisions and possible executions. Insights from GrepS depend strongly on \(\epsilon \), as a larger \(\epsilon \) restricts state merging. For BPIC’17, we observe that the eagerness experiment (Fig. 6) replicates for various \(\epsilon \) values, though with variations for either small or large \(\epsilon \) values. Further investigations are needed to draw rigorous conclusions about this relation. The model-checking analysis in Step 3, which generates Sankey diagrams, does not require a minimal flow of users. Strategies might exploit rarely observed behavior, as they do not consider a minimum bound for the coverage of users. Table 1 presents queries that target Pareto optimization problems to optimize multiple conflicting objectives, e.g., limited steps and minimal gas in positive states. We explored solutions to these problems with PRISM-games experiments, but one could also search for all solutions. The efficiency of our technique depends on automata learning and model checking; all presented results are reproducible within \(\sim 9\) h.

Fig. 7. Sankey diagrams generated from the reduced BPIC’17 models.

8 Related Work

Related work primarily focuses on designing domain-specific modeling languages that allow modeling from the user’s perspective. The methods developed [5, 9, 14, 20, 22, 23, 35, 38, 41] concentrate on manually constructing user journeys based on expert knowledge [9], user questionnaires [21, 41], or given event logs [5]. The analysis of the resulting models is typically also performed manually. However, Lammel et al.  [35] propose an ontology-based technique that allows the automatic generation of visualizations to provide further insights.

Process discovery [1] is a technique to automatically generate models from event logs and has been applied to generate different types of user journey models such as customer journey maps (CJM) [7, 8, 24] or transition systems [26, 28, 30, 31, 42]. CJMs represent grouped traces in the event log, unlike our work, which mines a general model. Existing approaches [26, 28, 31, 43] that use process discovery techniques to mine transition systems ignore the underlying distribution of events. By capturing the probabilities in the model, we can perform a finer analysis and visualization, and provide guidelines to the service provider in case of changing behavior. In our previous work [31], we also generated weighted deterministic user journey games and applied model checking to find bottlenecks in the service. By applying automata learning instead of process discovery techniques, we enhance this approach to generate probabilistic games.

Automata learning techniques [3, 13, 16, 45] have been used to mine process models, e.g., transition systems or Petri nets, from given event logs. However, our proposed approach incorporates the users’ perspective. While existing techniques may also consider the underlying probability distribution of the event log when constructing the model, they neglect it in the later analysis. Wieman et al. [45] derive improvements for industrial case studies manually from the learned model.

9 Conclusion

This paper presents two complementary case studies for the automated modeling and analysis of user journeys from event logs. Our analysis tool chain combines automata learning and model-checking techniques, based on a formalization of user journeys as stochastic weighted games that exploits the underlying distribution of events in the log. Model-checking results are used in property-preserving model reduction, which allows us to automatically identify and rank actions that are critical to the outcome of the user journey and visualize their effect. To the best of our knowledge, this is the first work using stochastic games in an automated method to analyze and improve user journeys.

The investigated case studies demonstrate the applicability of our approach to real-world services, varying in size and complexity. The results of the case studies lead us to three main observations: (1) model visualization creates compact Sankey diagrams for complex services that facilitate the interpretation of formal analyses; (2) the model reduction preserves changes in the underlying journeys, e.g., the concept drift for BPIC’17; and (3) the state ranking method effectively identifies candidate states for service redesign, based on user experience. Compared to previous work, our exploitation of the underlying probabilistic distribution of events enabled a more targeted analysis of the user journeys. For future work, automatically capturing the actor information in the event logs would make our approach less dependent on domain knowledge.