1 Introduction

Robotic Process Automation (RPA) is an emerging automation technology in the Business Process Management (BPM) domain that creates software (SW) robots to partially or fully automate rule-based and repetitive tasks (or simply routines) performed by human users in their applications’ user interfaces (UIs) [1]. Despite the growing attention around RPA, when considering state-of-the-art RPA technology, it becomes apparent that the current generation of RPA tools is driven by predefined rules and manual configurations made by expert users rather than automated techniques [7, 8, 10].

The traditional life-cycle of an RPA project can be summarized as follows [14]: (1) determine which process steps are good candidates to be automated in the form of routines; (2) model the selected routines through flowchart diagrams, which involve the specification of the actions, routing constructs, data flow, etc. that define the behaviour of a SW robot; (3) develop each modeled routine by generating the SW code required to concretely enact the associated SW robot on a target computer system; (4) deploy the SW robots in their environment to perform their actions; (5) monitor the performance of SW robots to detect bottlenecks and exceptions; and (6) maintain routines over time, updating the SW robots when needed. The majority of these steps, particularly the early ones, require the support of skilled human experts, who need to understand the anatomy of the candidate routines through interviews, walk-throughs, and detailed observation of workers conducting their daily work, cf. step (1), and to manually define the flowchart diagrams representing the structure of such routines, cf. step (2). These diagrams drive the development of the executable scripts (also called RPA scripts), allowing for the concrete enactment of SW robots at run-time, cf. steps (3) and (4). The problem is that this high degree of human involvement contradicts the underlying objective of RPA, i.e., an increased level of automation.

In this paper, we discuss how process mining can be leveraged to address this problem, enabling new levels of automation and support for RPA. Building on the RPM (Robotic Process Mining) framework [16], we show that SW robots can be generated in a semi-automated way directly from the UI logs that record the interactions between workers and SW applications during one or more routine executions, thus eliminating the manual and time-consuming steps (1) and (2) required for modeling the details of the routine structure.

Specifically, in Sect. 2, we first present a reference data model that enables a standardized specification of UI logs. Then, in Sect. 3, we show how researchers and practitioners can enact the RPM framework through the SmartRPA approach [4, 6] and its implemented tool [5], which makes it possible to interpret UI logs that keep track of many routine executions and to generate SW robots that emulate the most suitable routine variant for any specific intermediate user input required during the routine execution. Finally, in Sect. 4, we conclude the paper by outlining future work.

2 Specifying and Collecting UI Logs

UI logs are the main source of data for RPA. They are a particular kind of event log that records low-level manual activities during the execution of a task in an information system. Examples of events in a UI log include clicking a button, entering a string into a text field, ticking a checkbox, or selecting a value from a dropdown. The specific scope of a UI log, including the definition of relevant activities and attributes to cover, depends on the context in which the log is collected and the purpose for which it is used. Hence, the first challenges when collecting UI logs are often (1) to determine what kind of data is available and (2) to design the data collection process so that the logs are comprehensive enough to cover the desired automation use cases.

Fig. 1. Reference data model for UI logs [2]

To specify a UI log for RPA, one needs to determine which attributes can and should be recorded and how they relate to each other. UI logs should be as standardized as possible to allow for interoperability between different tools, but they also need to be adapted to the individual scenario. To achieve this, one can refer to the reference data model for process-related UI logs, shown in Fig. 1. This reference model defines the core attributes of UI logs but remains flexible with regard to the scope, level of abstraction, and case notion [2]. It defines the activity of a UI log as a combination of an action (e.g., click or input) and a target object in the user interface. It further specifies the possible instances of target objects and their hierarchical relation, as well as task and user components that provide additional (business) context.
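To make the structure described above more concrete, the following is a minimal sketch of how an event following such a model might be represented. It is not the reference model of [2] itself: the class names, field names, and example values are assumptions chosen for illustration, and only the ideas of an activity as action plus target object, a hierarchy of target objects, and optional task and user context are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetObject:
    """A UI element. `parent` models the hierarchical relation (assumed field name)."""
    kind: str                                  # e.g. "window", "text field", "button"
    name: str
    parent: Optional["TargetObject"] = None

    def path(self) -> str:
        """Return the element's position in the UI hierarchy, outermost first."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return f"{prefix}{self.kind}:{self.name}"


@dataclass(frozen=True)
class UIEvent:
    """One row of a UI log (all attribute names are illustrative assumptions)."""
    timestamp: str
    action: str                                # e.g. "click" or "input"
    target: TargetObject
    user: Optional[str] = None                 # user component (context)
    task: Optional[str] = None                 # task component (context)
    value: Optional[str] = None                # e.g. the string that was typed

    @property
    def activity(self) -> str:
        """The activity is the combination of the action and the target object."""
        return f"{self.action} {self.target.kind}:{self.target.name}"


# Hypothetical example: typing an amount into a field of an invoice form.
window = TargetObject("window", "Invoice form")
amount = TargetObject("text field", "Amount", parent=window)
event = UIEvent("2023-01-01T10:00:00", "input", amount,
                user="u1", task="enter invoice", value="100")
```

Keeping the task and user attributes optional reflects the flexibility with regard to scope and case notion mentioned above: a log can omit them when the recording context does not provide them.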

After specifying the UI log structure, the actual data needs to be recorded. Generally, there are three ways to achieve this: application-independent logging with screen capture and OCR technology [14, 18], application-specific logging with plug-ins [4, 17], and application-internal logging within an application’s source code. Not all options are feasible in each application context, and each has its own advantages and drawbacks. For example, application-internal logging will typically produce the highest data quality, but it is only possible if we have access to the application’s source code. Application-independent and application-specific logging have to externally reconstruct the events that happen within the application, but can be applied to any tool independent of its origin.

3 SmartRPA: From UI Logs to SW Robots

The approach underlying SmartRPA takes inspiration from the RPM framework presented by Leno et al. in [16]. RPM aims to support analysts in producing executable specifications of routines, in the form of SW robots, by interpreting the routine executions stored in a UI log. Specifically, RPM envisions a pipeline of two main stages: (i) interpreting UI logs that record executions of one or more routines, by identifying the candidate routines to be automated with RPA tools (i.e., the segmentation issue [9]); and (ii) synthesizing executable RPA scripts to enact SW robots. SmartRPA incorporates these stages within a larger approach, as shown in Fig. 2.

Fig. 2. Overview of the SmartRPA approach

Starting from an unsegmented UI log previously recorded with an RPA tool, the first stage of the SmartRPA approach is to inject into the UI log the end-delimiters of the routines under examination. An end-delimiter is a dummy action added to the UI log immediately after the user action that is known to complete a routine execution. The knowledge of such end-delimiters is crucial to make the approach work, as discussed in [3].
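The injection of end-delimiters can be illustrated with a short sketch. It assumes that each event is a dictionary with an `activity` and a `timestamp` key and that the set of activities known to complete a routine is supplied by the analyst. The marker name `END_OF_ROUTINE` and the function name are hypothetical and are not taken from the SmartRPA tool.

```python
from typing import Dict, List, Set

Event = Dict[str, str]

END_DELIMITER = "END_OF_ROUTINE"  # hypothetical name for the dummy action


def inject_end_delimiters(log: List[Event], closing_activities: Set[str]) -> List[Event]:
    """Return a copy of the log with a dummy action after each closing activity."""
    delimited: List[Event] = []
    for event in log:
        delimited.append(event)
        if event["activity"] in closing_activities:
            # The dummy action reuses the timestamp of the action it follows.
            delimited.append({"activity": END_DELIMITER,
                              "timestamp": event["timestamp"]})
    return delimited


# Hypothetical unsegmented log containing two executions of the same routine.
raw_log = [
    {"activity": "open form", "timestamp": "10:00:00"},
    {"activity": "click Submit", "timestamp": "10:00:05"},
    {"activity": "open form", "timestamp": "10:01:00"},
    {"activity": "click Submit", "timestamp": "10:01:07"},
]
delimited_log = inject_end_delimiters(raw_log, {"click Submit"})
```

The original list is left unchanged, so the unsegmented log remains available for comparison.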

To tackle the segmentation issue, we rely on three main steps: (i) a frequent-pattern identification technique [11] that automatically derives the routine segments from a UI log (i.e., routine segments describe the different behaviours of the routine(s) under analysis, in terms of repeated patterns of performed user actions); (ii) a human-in-the-loop interaction that uses declarative constraints [13] to filter out the segments that are not allowed by any real-world routine execution (i.e., that were wrongly discovered from the UI log); and (iii) a routine trace detection component that leverages trace alignment in process mining [12] to cluster all user actions belonging to a specific routine segment into well-bounded routine traces (i.e., a routine trace represents an execution instance of a routine within a UI log). Such traces are finally stored in a dedicated routine-based log, which captures exactly the user actions that happened during the different executions of the routine.
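The techniques cited above (frequent-pattern identification, declarative constraints, and trace alignment) are not reproduced here. As a deliberately simplified stand-in, the sketch below assumes non-interleaved executions and shows only the shape of the output: it splits a delimited sequence of activities into routine traces, discards traces rejected by a user-supplied predicate (a crude substitute for the declarative constraints of step (ii)), and counts the remaining variants. All names and the example data are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

END_DELIMITER = "END_OF_ROUTINE"  # hypothetical name for the dummy action

Trace = Tuple[str, ...]


def split_into_traces(activities: List[str]) -> List[Trace]:
    """Cut the activity sequence at each end-delimiter.

    Actions after the last delimiter belong to an incomplete execution
    and are dropped.
    """
    traces: List[Trace] = []
    current: List[str] = []
    for activity in activities:
        if activity == END_DELIMITER:
            if current:
                traces.append(tuple(current))
            current = []
        else:
            current.append(activity)
    return traces


def routine_variants(traces: List[Trace],
                     is_allowed: Callable[[Trace], bool]) -> Counter:
    """Count how often each allowed variant occurs in the routine-based log."""
    return Counter(trace for trace in traces if is_allowed(trace))


activities = [
    "open form", "type name", "submit", END_DELIMITER,
    "open form", "type name", "tick box", "submit", END_DELIMITER,
    "open form", "type name", "submit", END_DELIMITER,
    "type name", "submit", END_DELIMITER,      # does not start with "open form"
    "open form",                               # incomplete execution
]
traces = split_into_traces(activities)
# Analyst-supplied rule: every execution must start by opening the form.
variants = routine_variants(traces, lambda t: t[0] == "open form")
```

In this example, four traces are detected, one is filtered out by the rule, and two distinct variants remain. Interleaved executions, which the approach described above is designed to handle, would not be separated correctly by this simple splitting.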

Commercial RPA tools can eventually employ routine-based logs to synthesize executable scripts in the form of SW robots that will emulate the routine behaviour on the UI without the manual modeling of the routines. To this end, the SmartRPA tool is able to automatically synthesize executable scripts for enacting SW robots at run-time. Notably, the SW robots generated by SmartRPA are designed to handle the intermediate user inputs that are required during the routine execution, which makes it possible to emulate the most suitable routine variant for any specific combination of user inputs observed in the UI log. This makes the synthesis of SW robots performed by SmartRPA reactive to any user decision found during a routine execution, thus allowing the potential run-time generation of as many SW robots as there are routine variants to be emulated [6].
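The idea of a robot that reacts to intermediate user inputs can be sketched as a lookup from observed input combinations to routine variants. This is an illustration under strong assumptions, not the synthesis procedure of SmartRPA: the log format, the function names, and the example routine are all hypothetical, and actions are simply passed to a callback instead of being executed on a UI.

```python
from typing import Callable, Dict, List, Tuple

Inputs = Dict[str, str]
InputKey = Tuple[Tuple[str, str], ...]

# Hypothetical routine-based log: each trace pairs the intermediate user
# inputs observed during an execution with the actions that were performed.
ROUTINE_LOG: List[Tuple[Inputs, List[str]]] = [
    ({"customer_type": "student"}, ["open form", "tick discount box", "submit"]),
    ({"customer_type": "regular"}, ["open form", "submit"]),
    ({"customer_type": "student"}, ["open form", "tick discount box", "submit"]),
]


def _key(inputs: Inputs) -> InputKey:
    """Turn an input combination into an order-independent, hashable key."""
    return tuple(sorted(inputs.items()))


def build_variant_table(log: List[Tuple[Inputs, List[str]]]) -> Dict[InputKey, List[str]]:
    """Map each observed input combination to the first variant seen for it."""
    table: Dict[InputKey, List[str]] = {}
    for inputs, actions in log:
        table.setdefault(_key(inputs), actions)
    return table


def run_robot(table: Dict[InputKey, List[str]], inputs: Inputs,
              perform: Callable[[str], None] = print) -> None:
    """Replay the variant that matches the inputs given at run-time."""
    key = _key(inputs)
    if key not in table:
        raise LookupError(f"no routine variant observed for inputs {inputs}")
    for action in table[key]:
        perform(action)


table = build_variant_table(ROUTINE_LOG)
executed: List[str] = []
run_robot(table, {"customer_type": "regular"}, executed.append)
```

Raising an error for input combinations that never occur in the log reflects the restriction stated above: only variants observed in the UI log can be emulated.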

4 Concluding Remarks

The goal of RPA is to automate routines and high-volume tasks, but it currently requires substantial manual intervention by expert users. In this paper, we offer a twofold contribution towards an intelligent and fully automated generation of SW robots from the users’ observed behavior as recorded in UI logs. First, we introduce a reference data model for a standardized specification of UI logs, which supports interoperability among different RPM-based tools. Second, we present a pipeline of processing steps, implemented through the SmartRPA approach, to develop executable RPA scripts solely by interpreting the UI logs at hand.

The reference model provides a common, application-independent conceptual framework for user interactions. However, it still has to prove its utility in practice. We therefore want to encourage readers to adopt the model for capturing UI logs in their projects. Compared with the approaches in the literature for automated RPA script generation from UI logs [15, 18], which can automate only the most frequent routine variant among those discovered in the UI log, SmartRPA provides a reactive approach that emulates the most suitable routine variant for any specific intermediate user input that is required during the routine execution. As a consequence, the SW robots generated by SmartRPA are flexible and adaptable to several real-world situations.

The main weakness of SmartRPA relates to the quality of information recorded in real-world UI logs. Since a UI log is fine-grained, routines executed with many different strategies may affect the identification of the routine segments. In addition, SmartRPA is based on a semi-supervised assumption, since the end-delimiters required to untangle the segmentation issue must be known a priori. On the positive side, the employed segmentation technique is able to outperform existing approaches in the literature in terms of supported segmentation variants, in particular when there are many interleaved routine executions recorded in the UI log [3]. For this reason, we consider this contribution an important step towards the development of an unsupervised approach that employs machine learning techniques to automatically identify the end-delimiters.