1 Introduction

Runtime verification (RV) is a computing analysis paradigm based on observing executions of a system to check its expected behavior. The typical aspects of an RV application are the generation of a monitor from a specification and then the use of the monitor to analyze the dynamics of the system under study. RV has been used as a practical application of formal verification, and as a less ad-hoc approach complementing conventional testing and debugging. Compared to static formal verification, RV gains applicability by sacrificing completeness as not all traces are observed and typically only a prefix of a potentially infinite computation is processed. See [185, 225] for surveys on RV, and the recent book [47].

Most of the practical motivations and applications of RV have been related to the analysis of software. However, there is great potential for applying RV beyond software reliability if one generalizes it to new domains beyond computer programs (like hardware, devices, cloud computing and even human-centric systems). Novel applications of RV in these areas can have an enormous impact, both by enabling new solutions or designs and by potentially increasing reliability in a cost-effective manner. Many system failures throughout history have exposed the limitations of existing engineering methodologies and encouraged the study and development of novel formal methods. Ideally, one would like to validate a computational system prior to its execution. However, current static validation methods, such as model checking, suffer from practical limitations preventing their wide use in real large-scale applications. For instance, those techniques are often bound to the design stage of a system and suffer from the state-explosion problem (the infeasibility of exhaustively exploring all system states statically), or cannot handle many interesting behavioral properties. Thus, as of today, many verification tasks can only realistically be undertaken by complementary dynamic analysis methods. RV is the discipline of formal dynamic analysis that studies how to detect and ensure, at execution time, that a system meets a desirable behavior.

Research on runtime verification has flourished in the last decade, and a large part of the (European) community in the area has recently been gathered via an EU COST Action initiative in order to explore, among other things, potential areas of application of RV, including finance, medical devices, legaltech, security and privacy, and embedded, cloud and distributed systems.

In this survey paper, we concentrate on describing different challenging and exciting application domains for RV other than programming languages. In particular, we consider runtime verification in the following application domains:

  • Distributed systems: where the timing of observations may vary widely in a non-synchronised manner (Sect. 2).

  • Hybrid and embedded systems: where continuous and discrete behavior coexist and the resources of the monitor are constrained (Sect. 3).

  • Hardware: where the timing must be precise and the monitor must operate non-disruptively (Sect. 4).

  • Security and privacy: where a suitable combination of static and dynamic analysis is needed (Sect. 5).

  • Transactional information systems: where the behavior of modern information systems is monitored, and the monitors must strike a compromise between expressivity and non-intrusiveness (Sect. 6).

  • Contracts and policies: where the connection between the legal world and the technical world is paramount (Sect. 7).

  • Huge, unreliable or approximated domains: where we consider systems that are not reliable, or where aggregation or sampling is necessary due to large amounts of data (Sect. 8).

In all these cases, we first provide an overview of the domain and describe sufficient background to present the context and scope. Then, we introduce the subareas of interest addressed in the section, and identify challenges and opportunities from the RV point of view. Sometimes the characteristics and applications are not specific to RV; in these cases we prefer to describe them in their generality, first motivating their importance and later speculating on how RV can have an impact in these applications and what the challenges to monitoring are. Finally, we do not aim for completeness in the identification of the challenges, and admittedly only identify a subset of the potential challenges to be addressed by the RV research community in the coming years. We consider the challenges listed to be among the most important.

2 Distributed and decentralized runtime verification

Distributed systems are generally defined as computational artifacts or components that run on execution units placed at different physical locations, and that exchange information to achieve a common goal. A localized unit of computation in such a setup is generally assigned its own process of control (possibly composed of multiple threads), but does not execute in isolation. Instead, the process interacts and exchanges information with other such remote units using the communication infrastructure imposed by the distributed architecture, such as a computer network [32, 117, 173].

Distributed systems are notoriously difficult to design, implement and reason about. Below, we list some of these difficulties.

  • Multiple stakeholders impose their own requirements on the system and the components, which results in disparate specifications expressed in widely different formats and logics that often concern themselves with different layers of abstraction.

  • Implementing distributed systems often involves collaboration across multiple development teams and the use of various technologies.

  • The components of the distributed system may be more or less accessible to analysis, as they often evolve independently, may involve legacy systems, binaries, or even use remote proprietary services.

  • The sheer size of distributed systems, their numerous possible execution interleavings, and the unpredictability due to the inherently dynamic nature of the underlying architecture make them hard to test and verify using traditional pre-deployment methods. Moreover, distributed computation is often characterized by a high degree of dynamicity, where not all of the components that comprise the system are known at deployment (for example, with the dynamic discovery of web services); this dynamicity further complicates system analysis.

Runtime Verification (RV) is very promising to address these difficulties because it offers mechanisms for correctness analysis after a system is deployed, and can thus be used in a multi-pronged approach towards assessing system correctness. It is well known that, even after extensive static analysis, latent bugs often reveal themselves once the system is deployed. Better detection of such errors at runtime using dynamic techniques, particularly if the monitor can provide the runtime data that leads to the error, can help system engineers take remedial action when necessary. Dynamic analysis can also provide invaluable information for diagnosing and correcting the source of the error. Finally, runtime monitors can use runtime and diagnosis information to trigger reaction mechanisms that correct or mitigate the errors.

We discuss here challenges from the domain of Distributed and Decentralized Runtime Verification (DDRV), a broad area of research that studies runtime verification in connection with distributed or decentralized systems, or when the runtime verification process itself is decentralized. That is, this body of work includes the monitoring of distributed systems as well as the use of distributed systems for monitoring.

Solutions to some of these problems exist (see for instance [63, 85, 94, 106, 148, 150, 163, 216, 217]). We refer to [162] for a recent survey on this topic.

2.1 Context and areas of interest

In order to provide context to later describe the challenges for RV, we begin by describing some important characteristics of DDRV and then list some intended applications.

2.1.1 Characteristics

There are a number of characteristics that set DDRV apart from non-distributed RV. These characteristics also justify the claim that traditional RV solutions and approaches do not necessarily (or readily) apply to DDRV. This, in turn, motivates the need for new mechanisms, theories, and techniques. Some of these characteristics were identified in [79, 158, 203], and recently revisited in [162].

  • Heterogeneity and Dynamicity. One of the reasons that makes distributed systems hard to design, implement and understand is that there are typically many participants involved. Each participant imposes its own requirements, resulting in a variety of specifications expressed in different formats. In turn, the implementation often involves the collaboration of multiple development teams using a variety of technologies. Additionally, the size and dynamic characteristics of the execution platform of distributed systems allow many possible interleavings of the behaviors of the participating components, which leads to an inherent unpredictability of the executions. Note that the existence of a set of interleavings, and the necessity to explore or reason about alternative paths in this set, is due to distributed systems being concurrent systems with asynchronous communication. The inherent dynamicity of distributed systems makes this set larger and more complex, and its exploration harder. Consequently, testing and verification with traditional pre-deployment methods are typically ineffective.

  • Distributed Clocks and Latency. Distributed systems can be classified according to the nature of their clocks into (1) synchronous systems, where the computation proceeds in rounds, (2) timed asynchronous systems, where messages can take arbitrarily long but there is a synchronized global clock, and (3) asynchronous distributed systems. In an asynchronous distributed system, nodes are loosely coupled, each having its own computational clock, due to the impracticality of keeping individual clocks synchronized with one another. As a result of this asynchrony, the order of computational events occurring at distinct execution units may not be easy (or even possible) to discern.

  • Partial Failure. A requirement of any long-running distributed system is that, when execution units (occasionally) fail, the overall computation is able to withstand the failure. However, the independence of failures between the different components of a distributed system and the inability to accurately detect remote failures make designing fault-tolerant systems challenging. When designing a solution based on RV, the independence of failures between components is an important characteristic that must be handled by the monitoring infrastructure (for example, through re-synchronization between surviving components, rebooting monitors, etc.).

  • Non-Determinism. Asynchrony implies fluctuations in latency, which creates unpredictability in the global execution of a distributed system. In addition, resource availability (e.g., free memory) at individual execution units is hard to anticipate and guarantee. These sources of unpredictable asynchrony often induce non-deterministic behavior in distributed computations [155, 156].

  • Multiple Administrative Domains and Multiple Accessibility. In a distributed system, computation often crosses administrative boundaries that restrict unfettered computation due to security and trust issues (e.g., untrusted code spawned or downloaded from a different administrative domain may be executed in a sandbox). Administrative boundaries also limit the migration and sharing of data across these boundaries for reasons of confidentiality. Also, different components may feature different accessibility when it comes to analysis, maintenance, monitorability, instrumentation, and enforcement. The connected technologies may range from proprietary to public domain, from available source code to binaries only, and from well-documented developments to sparsely documented (legacy) systems.

  • Mixed Criticality. The components and features of a distributed system may not be equally critical for the overall goal of the system. The consequences of malfunctioning of certain components are more severe than those of others. For instance, the failure of one client is less critical than the failure of a server to which many clients connect. Also, some components could be critical for preventing data or financial loss, or the like, whereas others may only affect performance or customer satisfaction.

  • Evolving Requirements. The execution of a distributed system is typically characterized by a series of long-running reactive computational entities (e.g., a web server that should ideally never stop handling client requests). Such components are often recomposed into different configurations (for example, service-oriented architectures) where their intended users change. In such settings, it is reasonable to expect the correctness specifications and demands to change over the execution of the system, and to be composed of smaller specifications obtained from different users and views.
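
The Distributed Clocks and Latency characteristic can be made concrete with logical clocks. The following Python sketch uses vector clocks (a standard technique due to Fidge and Mattern, not prescribed by the text; the `Process` class and helper names are illustrative) so that a monitor can decide from timestamps alone whether two events are causally ordered or concurrent:

```python
from typing import List

# A vector clock has one counter per process; entry j counts the events of
# process j that are known to have happened so far.
VectorClock = List[int]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


class Process:
    """A hypothetical execution unit that stamps each of its events."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.clock: VectorClock = [0] * n

    def local_event(self) -> VectorClock:
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> VectorClock:
        # Sending is an event; its timestamp is piggybacked on the message.
        return self.local_event()

    def receive(self, msg_clock: VectorClock) -> VectorClock:
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(x, y) for x, y in zip(self.clock, msg_clock)]
        return self.local_event()


p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.local_event()   # [1, 0]
e2 = p1.local_event()   # [0, 1]
m = p0.send()           # [2, 0]
e3 = p1.receive(m)      # [2, 2]
```

Here `e1` and `e2` are concurrent, so a monitor cannot assume any order between them, while `e1` causally precedes `e3` through the message.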

2.1.2 Applications

We briefly mention some of the existing or envisioned application areas of DDRV, namely concurrent software, new programming paradigms such as reversible computing [161], the verification of distributed algorithms or distributed databases, privacy and security (intrusion detection systems, auditing of policies on system logs [60, 172], decentralized access control [307]), blockchain technology [247], monitoring software-defined networks with software-defined monitoring, robotics (e.g., distributed swarms of autonomous robots), and home automation.

2.1.3 Enforcing interleavings

Sometimes the system that one analyzes dynamically (using runtime verification) is distributed in nature. For example, multithreaded programs can suffer from concurrency errors, particularly when executing on modern hardware platforms, as multicore and multiprocessor architectures are very close to distributed systems. This makes the testing of concurrent programs notoriously difficult, because it is very hard to explore the interleavings that lead to errors [31]. The work in [233] proposes to use enforcement, exploiting user-specified properties to generate local monitors that can influence the executions. The goal is to improve testing by forcing promising schedules that can lead to violations, even though violations of the specified property can also be prevented by blocking individual threads whose execution may lead to a violation. The process for generating monitors described in [233] involves decomposing the property into local decentralized monitors, one for each thread.
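
A minimal sketch of the enforcement idea, under assumptions of our own: the schedule to force is given explicitly as a list of step labels, and a single shared coordinator stands in for the decentralized per-thread monitors that [233] derives from a property. The hypothetical `ScheduleEnforcer` blocks a thread until its step is next, which here forces the classic lost-update interleaving on a shared counter:

```python
import threading
from typing import Callable, Dict, List, Tuple


class ScheduleEnforcer:
    """Hypothetical monitor: a thread may perform a labelled step only
    when that label is the next one in the target schedule."""

    def __init__(self, schedule: List[str]) -> None:
        self.schedule = schedule
        self.position = 0
        self.cond = threading.Condition()
        self.trace: List[str] = []

    def step(self, label: str, action: Callable[[], None]) -> None:
        with self.cond:
            # Block while running this step now would deviate from the schedule.
            # Labels absent from the schedule run once the schedule is exhausted.
            while (self.position < len(self.schedule)
                   and self.schedule[self.position] != label):
                self.cond.wait()
            action()
            self.trace.append(label)
            self.position += 1
            self.cond.notify_all()


def worker(name: str, enforcer: ScheduleEnforcer, state: Dict[str, int]) -> None:
    seen: Dict[str, int] = {}

    def read() -> None:
        seen["value"] = state["counter"]

    def write() -> None:
        state["counter"] = seen["value"] + 1  # non-atomic read-modify-write

    enforcer.step(name + ".read", read)
    enforcer.step(name + ".write", write)


def run(schedule: List[str]) -> Tuple[int, List[str]]:
    """Run two workers A and B under the given schedule; return counter and trace."""
    state = {"counter": 0}
    enforcer = ScheduleEnforcer(schedule)
    threads = [threading.Thread(target=worker, args=(n, enforcer, state))
               for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], enforcer.trace
```

With the schedule A.read, B.read, A.write, B.write, `run` returns a counter of 1 instead of 2, exposing the race deterministically; schedules that do not interleave the two read-modify-write pairs yield 2.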

2.1.4 Observing distributed computations

Checking general predicates in a distributed system is hard, since one has to consider all possible interleavings (whose number may be exponential). Techniques like computation slices [12, 31, 100, 241] have been invented as a datatype for the efficient distributed detection of predicates. Slices make it possible to circumvent an explicit exploration of a large set of interleaving paths by an implicit exploration of a smaller representation. Slices are a concise approximation of the computation that is precise enough to detect the predicate: slices guarantee that if a predicate is present in a slice of a computation, then the predicate occurred in some state of the computation.

Predicate detection can involve a long runtime and large memory overhead [100], except for properties with specific structure (that is, for some fragments of the language of predicates). Current efficient solutions only deal with sub-classes of safety properties like linear, relational, regular and co-regular, and stable properties. Even though most techniques for predicate detection [12, 110, 241] send all local events to a central process that inspects their interleavings, some approaches (like [100]) consider purely distributed detection.
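
To see why slicing matters, consider the naive alternative, sketched below in Python: enumerate every cut of a small computation, keep the consistent ones (checked with vector clocks), and evaluate the predicate on each, which detects “possibly φ” by brute force. The encoding of events and local states is an assumption for illustration; the number of cuts grows exponentially with the number of processes, which is exactly what slicing avoids.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Cut = Tuple[int, ...]  # cut[i] = number of events of process i included in the cut


def is_consistent(cut: Cut, events: Sequence[Sequence[List[int]]]) -> bool:
    """events[i][k] is the vector clock of the (k+1)-th event of process i.
    A cut is consistent iff it contains every event its frontier events depend on."""
    for i, k in enumerate(cut):
        if k > 0:
            frontier = events[i][k - 1]
            if any(frontier[j] > cut[j] for j in range(len(cut))):
                return False
    return True


def possibly(predicate: Callable[[Tuple], bool], states: Sequence[Sequence],
             events: Sequence[Sequence[List[int]]]) -> Optional[Cut]:
    """Return a consistent cut whose global state satisfies the predicate, or None.
    states[i][k] is the local state of process i after k events."""
    for cut in product(*(range(len(evs) + 1) for evs in events)):
        if is_consistent(cut, events):
            global_state = tuple(states[i][k] for i, k in enumerate(cut))
            if predicate(global_state):
                return cut
    return None


# P0 sets x := 1 and sends a message, then sets x := 2; P1 receives it and sets y := 1.
events = [[[1, 0], [2, 0]], [[1, 1]]]
states = [[0, 1, 2], [0, 1]]
```

The global state (x, y) = (0, 1) only arises in the inconsistent cut (0, 1), in which P1 has received a message that P0 has not yet sent, so `possibly(lambda s: s == (0, 1), states, events)` returns `None`.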

2.1.5 Monitor decomposition and coordination

Most approaches to monitoring distributed systems consider that the system is a black box that emits events of interest, while others use manual instrumentation and monitor placement. Some exceptions, for example [7, 8, 148, 158, 163, 249], investigate how to exploit the hierarchical description of the system to generate monitors that are then composed back with the original system. The modified system shares the original decomposition (while still implementing its functionality) and includes the embedded monitors, but this approach requires access to the system description and is specific to a given development language. Although the work in [148] does not specifically target distributed systems, the compiler can generate a distributed system, in which case the monitor will be distributed as well. A similar approach is presented in [30, 93, 94, 163], where a framework for monitoring asynchronous component-based systems is presented based on actors, i.e., self-contained software entities that are easily distributed.

2.1.6 Monitoring efficiency

Most RV works assume a single monitor that receives all events and calculates the verdicts. Even though a single monitor can be implemented for decentralized and distributed systems by sending all information to a central monitor, distribution itself can be exploited to coordinate the monitoring task more efficiently. Many research efforts study how to obtain more efficient solutions by exploiting the locality of observations to perform as much of the monitoring task locally as possible. For example, the approaches in [30, 31, 93, 94, 148] exploit the hierarchical structure of the system to generate local monitors, and [5, 95, 163] exploit the structure and semantics of the specification. In [7], the authors show how decentralized monitor specifications can be consolidated into regular descriptions that guarantee bounded state space. Lowering overheads is also pursued in [106] by offloading part of the monitoring computation to the computing resources of another machine.

When atomic observations of the monitored system occur locally, monitors can be organized hierarchically according to the structure of the original specification [65, 66, 105, 158]. Substantial savings in communication overheads are obtained because often a verdict is already reached in a sub-formula. All these results are limited to LTL and regular languages in [145]. Decentralized monitoring assumes that the computation proceeds in rounds, so distributed observations are synchronized and messages eventually arrive. The assumption of bounded message delivery is relaxed in [105].
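
The communication savings of hierarchical organization can be illustrated with a deliberately simplified Python sketch: a single round, a propositional formula, and a fixed placement of sub-monitors (the temporal operators handled in [65, 66, 105, 158] are omitted, and all names are hypothetical). Each sub-monitor combines its children locally and forwards only its verdict:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Atom:
    name: str
    site: str  # component at which the proposition is observed


@dataclass
class Op:
    kind: str  # "and" or "or"
    site: str  # component hosting this sub-monitor
    children: List["Node"]


Node = Union[Atom, Op]


def evaluate(node: Node, observation: Dict[str, bool],
             messages: List[Tuple[str, str]]) -> bool:
    """Evaluate one round bottom-up; every value crossing sites costs one message."""
    if isinstance(node, Atom):
        return observation[node.name]
    verdicts = []
    for child in node.children:
        verdict = evaluate(child, observation, messages)
        if child.site != node.site:
            messages.append((child.site, node.site))
        verdicts.append(verdict)
    return all(verdicts) if node.kind == "and" else any(verdicts)


def centralized_messages(node: Node, central_site: str) -> int:
    """Messages needed if every remote atom is shipped to one central monitor."""
    if isinstance(node, Atom):
        return 0 if node.site == central_site else 1
    return sum(centralized_messages(child, central_site) for child in node.children)


# (a1 and a2) or (b1 and b2): a-atoms observed at A, b-atoms at B, root monitor at C.
formula = Op("or", "C", [
    Op("and", "A", [Atom("a1", "A"), Atom("a2", "A")]),
    Op("and", "B", [Atom("b1", "B"), Atom("b2", "B")]),
])
```

For this formula, the hierarchical evaluation sends two messages (one verdict from A and one from B), whereas shipping every remote atom to C takes four.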

2.1.7 Fault tolerance

One of the main and most difficult characteristics of distributed systems is that failures can happen independently (see [160]). Most of the RV efforts that consider distributed systems assume that there are no errors, that is, nodes do not crash and messages are not corrupted, lost, duplicated or reordered. Even worse, failure dependencies between components can be intricate and the resulting patterns of behaviors can be difficult to predict and explain. At the same time, one of the common techniques for fault tolerance is the replication of components so this is a promising approach for monitoring too [159]. For example, [154] studies the problem of distributed monitoring with crash failures, where events can be observed from more than one monitor, and where the distributed monitoring algorithm tries to reach a verdict among the surviving monitors.
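
The replication idea can be sketched in Python under simplifying assumptions that are ours, not those of [154]: each event is observed by several monitor replicas, the set of crashed monitors is known, and the survivors simply pool their observations. A verdict is returned only if every event was seen by at least one survivor; otherwise the result is inconclusive (`None`). The property `some_grant` is a hypothetical placeholder.

```python
from typing import Callable, Dict, List, Optional, Set

View = Dict[int, str]  # event index -> event, as observed by one monitor


def verdict_among_survivors(views: Dict[int, View], crashed: Set[int], length: int,
                            prop: Callable[[List[str]], bool]) -> Optional[bool]:
    """Pool the observations of non-crashed monitors and evaluate `prop` on the
    reconstructed trace; return None if some event was seen only by crashed monitors."""
    pooled: View = {}
    for monitor, view in views.items():
        if monitor not in crashed:
            pooled.update(view)
    if any(i not in pooled for i in range(length)):
        return None
    return prop([pooled[i] for i in range(length)])


def some_grant(trace: List[str]) -> bool:
    return "grant" in trace


# Three replicas; each of the three events is observed by exactly two of them.
views = {
    0: {0: "req", 1: "grant"},
    1: {1: "grant", 2: "req"},
    2: {0: "req", 2: "req"},
}
```

With replica 0 crashed, replicas 1 and 2 still cover all three events; if replicas 0 and 1 both crash, the `grant` event is lost and no verdict can be given.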

Another source of failure is network errors, studied in [4, 54, 63], which target the incomplete knowledge caused by network failures and message corruption, and attempt to handle the resulting disagreements. Node crashes are handled as well, because a node crash can be simulated by message loss, namely by ignoring all messages from the crashed node.

2.2 Challenges

The characteristics outlined bring added challenges to obtain effective DDRV setups.

C 2.1

Distributed Specifications. It is a well-established fact that certain specifications cannot be adequately verified at runtime [4, 5, 7, 99, 147, 157, 271]. The partial ordering of certain distributed events, due to distributed clocks, hinders the monitoring of temporal specifications that require a specific relative ordering of these events [31]. As such, the lack of a global system view means that even fewer specifications can be monitored at runtime. Even though some work exists proposing specific languages tailored to distributed systems [298], the quest for expressive and tractable languages remains an important and challenging goal.

C 2.2

Monitor Decomposition, Placement, and Control. The runtime analysis carried out by monitors needs to be distributed and managed across multiple execution nodes. As argued originally in [158], and later investigated empirically in works such as [31, 66], the decomposition and placement of monitoring analysis is an important engineering decision that substantially affects the overheads incurred, such as the number and size of messages, the communication delay, and the spread of computation across monitors [137]. Such placement also affects the administrative domains under which event data is analyzed, and may compromise confidentiality restrictions and lead to security violations due to the communication needed by monitors to reach a verdict (for instance, if monitors communicate partial observations or partial evaluations of the monitored properties).

C 2.3

Restricted Observability. The flip side of security and confidentiality constraints in distributed systems translates into additional observability constraints that further limit what specifications can be monitored in practice. Distributed monitors may need to contend with traces whose event data may be obfuscated or removed in order to preserve confidentiality which, in turn, affects the nature of the verdicts that may be given [4, 180].

C 2.4

Fault Tolerance. DDRV has to contend with the eventuality of failure in a distributed system [63]. Using techniques involving replication and dynamic reconfiguration of monitors, DDRV can be made tolerant to partial failure. More interestingly, fault-tolerant monitoring algorithms could provide reliability to the monitors. A theory that determines which guarantees can be obtained for which combinations of specifications and monitoring algorithms should be investigated.

C 2.5

Deterministic Analysis. Since monitoring needs to be carried out over a distributed architecture, it will inherently induce non-deterministic computation. In spite of this, the monitoring analysis and the verdicts reported need to feature aspects such as strong eventual consistency [137] or observational verdict determinism [155, 156], and conceal any internal non-determinism. In practice, this may be hard to attain (e.g., standard determinization techniques on monitors incur a triple exponential blowup [6]); non-deterministic monitor behavior could also compromise the correctness of the RV setup and the validity of the verdicts reported [155].

C 2.6

Limits of Monitorability. Distributed systems impose further limitations on the class of properties that can be detected (see [5, 8, 67, 121, 146, 147, 157, 271, 314] for notions of monitorability for non-distributed systems and [31, 137] for decentralized systems [138]). Associated with the challenge of exploring new specification languages for monitoring distributed systems, there is the need to discern the limitations of what can be detected dynamically.

3 Hybrid systems

Hybrid systems (HS) [189] are a powerful formal framework to model and reason about systems exhibiting a sequence of piecewise continuous behaviors interleaved with discrete jumps. In particular, hybrid automata (HA) extend finite state machines with continuous dynamics (generally represented as ordinary differential equations) in each state (also called mode). HS are suitable modelling techniques to analyze safety requirements of Cyber-Physical Systems (CPS). CPS consist of computational and physical components that are tightly integrated. Examples include engineered (e.g., self-driving cars), physical and biological systems [51] that are monitored and/or controlled by a computational embedded core through sensors and actuators. The behavior of CPS is characterized by the real-time progression of physical quantities interleaved with transitions of discrete software and hardware states. HA are typically employed to model the behavior of CPS and to evaluate at design time the correctness of the system, as well as its efficiency and robustness with respect to the desired safety requirements.

An HA is called safe whenever, given a set of initial states, no trajectory originating from these initial conditions can reach a set of bad states. Proving a safety requirement thus amounts to solving a reachability problem, which is generally undecidable for hybrid systems [27, 189]. However, this did not stop researchers from developing, in the last two decades, efficient semi-decidable reachability analysis techniques for particular classes of hybrid systems [14, 29, 102, 119, 120, 164,165,166, 181, 213].

Despite all this progress, the complexity of performing a precise reachability analysis of HS still limits it in practice to small problem instances (e.g., [26,27,28, 189]). Furthermore, the models of the physical systems may be inaccurate or only partially available. The same may happen when a CPS implementation employs third-party software components for which neither the source code nor the model is available.

A more practical solution, close to testing, is to monitor and predict CPS behaviors at simulation time or at runtime [46]. The monitoring technology includes techniques to specify what we want to detect and measure, and how to instrument the system. Monitoring can be applied to:

  • Real systems during their execution, where the behavioral observations are constructed from sensor readings.

  • System models during their design, where the behaviors observed correspond to simulation traces.

In the following, we provide an overview of the main specification-based monitoring techniques available for CPS and HS. We also show the main applications of the monitoring techniques in system design and finally we discuss the main open challenges in this research field.

3.1 Context and areas of interest

To provide some context, we first describe specification languages for hybrid systems, then discuss specific issues of monitoring continuous and hybrid systems, and then briefly present the state of the art with respect to tools for monitoring these systems. Finally, we list applications of RV to hybrid systems.

3.1.1 Specification languages

One of the main specification languages used in the research community for the formal specification of continuous and hybrid systems is Signal Temporal Logic (STL) [234, 235]. STL extends Metric Interval Temporal Logic (MITL) [15], a dense-time specification formalism, with predicates over real-valued variables. Despite its simplicity, this mild addition to MITL has an important consequence: the alphabet of the logic has an order and admits a natural notion of a distance metric. Given a numerical predicate over a real-valued variable and a variable valuation, we can thus answer the question of how far the valuation is from satisfying or violating the predicate. This rich feedback is in contrast to the classical yes/no answer that we typically get from reasoning about Boolean formulas. The quantitative property of numerical predicates can be extended to the temporal case, giving rise to the quantitative semantics for STL [134, 144].
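
As an illustration of the quantitative semantics, the Python sketch below computes the standard min/max (space) robustness of a few STL operators. It simplifies in ways that are assumptions, not part of STL: time is discrete (sample indices), the tuple encoding of formulas is hypothetical, and only bounded `always` and `eventually` are covered.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical encoding of discrete-time STL formulas as nested tuples:
#   ("gt", var, c)             the predicate var > c
#   ("not", phi)
#   ("and", phi, psi)
#   ("always", a, b, phi)      G_[a,b] phi, with a and b counted in samples
#   ("eventually", a, b, phi)  F_[a,b] phi
Trace = Dict[str, Sequence[float]]


def robustness(phi: Tuple, trace: Trace, t: int = 0) -> float:
    """Space robustness of phi at sample t: positive means satisfied, negative violated.
    Assumes the trace is long enough to cover every time window."""
    kind = phi[0]
    if kind == "gt":
        _, var, c = phi
        return trace[var][t] - c  # signed distance to the threshold
    if kind == "not":
        return -robustness(phi[1], trace, t)
    if kind == "and":
        return min(robustness(phi[1], trace, t), robustness(phi[2], trace, t))
    if kind in ("always", "eventually"):
        _, a, b, sub = phi
        values = [robustness(sub, trace, t + k) for k in range(a, b + 1)]
        return min(values) if kind == "always" else max(values)
    raise ValueError(f"unknown operator {kind!r}")


trace = {"x": [3.0, 1.5, 2.0, 4.0]}
```

For this trace, G_[0,3](x > 1) has robustness 0.5 (satisfied, with the closest approach at x = 1.5), while G_[0,3](x > 2) has robustness -0.5 (violated by 0.5).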

STL can easily be used to specify real-time constraints and complex temporal relations between events occurring in continuous signals. These events can be trivial threshold crossings, but also more intricate patterns, identified by specific shapes and durations. It is typically hard, however, to give elegant and precise descriptions of such patterns in STL. We can also observe that these same patterns can be naturally specified with regular expressions, as time-constrained sequences (concatenations) of simple behavior descriptions.

Timed Regular Expressions (TRE) [25], a dense-time extension of regular expressions, seem well suited to our need to describe continuous signal patterns. While admitting natural specification of patterns, regular expressions are ill-suited for specifying properties that need universal quantification over time. For instance, it is very difficult to express the classical requirement “every request is eventually followed by a grant” with conventional regular expressions (without negation and intersection operators). It follows that TRE complements STL rather than replacing it.

CPS consist of software and physical components that are generally spatially distributed (e.g., smart grids, robotics teams) and networked at every scale. In such scenarios, temporal logics may not be sufficient, because requirements concern not only time but also topological and spatial aspects. In the past five years, there has been a great effort to extend STL for expressing spatio-temporal requirements. Examples include Spatial-Temporal Logic (SpaTeL) [43, 183], the Signal Spatio-Temporal Logic (SSTL) [251] and the Spatio-Temporal Reach and Escape Logic (STREL) [44].

3.1.2 Monitoring continuous and hybrid systems

We first discuss some issues that are specific to the analysis of continuous and hybrid behaviors. We also provide an overview of different methods for monitoring STL with qualitative and quantitative semantics and matching TRE patterns.

Handling Numerical Predicates In order to implement monitoring and measuring procedures for STL and TRE, we need to address the problem of the computer representation of continuous and hybrid behaviors. Both STL and TRE have a dense-time interpretation of continuous behaviors, which are assumed to be ideal mathematical objects. This is in contrast with the actual behaviors obtained from simulators or measurement devices, which are represented as a finite collection of value-timestamp pairs (w(t), t), where w(t) is the observed behavior. The values of w at two consecutive sample points t and \(t'\) do not precisely determine the values of w inside the interval \((t,t')\). To handle this issue pragmatically, interpolation can be used to “fill in” the missing values between consecutive samples. Commonly used interpolations to interpret sampled data are step and linear interpolation. Monitoring procedures are sensitive to the interpolation used.
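
The sensitivity of monitoring to the interpolation can be shown with a small Python sketch (the helper names are hypothetical and timestamps are assumed strictly increasing). Evaluating the predicate w > 1 at t = 0.75 between the samples (0, 0) and (1, 2) gives opposite answers under step and linear interpolation:

```python
from bisect import bisect_right
from typing import List, Tuple

Samples = List[Tuple[float, float]]  # (timestamp, value) pairs, strictly increasing timestamps


def _index(samples: Samples, t: float) -> int:
    """Index of the last sample taken at or before time t."""
    i = bisect_right([ts for ts, _ in samples], t) - 1
    if i < 0:
        raise ValueError("t precedes the first sample")
    return i


def step_value(samples: Samples, t: float) -> float:
    """Step (sample-and-hold) interpolation."""
    return samples[_index(samples, t)][1]


def linear_value(samples: Samples, t: float) -> float:
    """Linear interpolation between the samples enclosing t; holds the last value after it."""
    i = _index(samples, t)
    if i == len(samples) - 1:
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i], samples[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


samples = [(0.0, 0.0), (1.0, 2.0)]
```

Step interpolation holds the value 0 until t = 1, so the predicate is false at t = 0.75, whereas linear interpolation yields 1.5 and the predicate holds.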

Monitoring STL with Qualitative and Quantitative Semantics An offline monitoring procedure for STL properties with qualitative semantics is proposed in [235]. The procedure is recursive on the structure (parse-tree) of the formula, propagating the truth values upwards from input behaviors via super-formulas up to the main formula. In the same paper, the procedure is extended to an incremental version that computes the truth value of the sub-formulas along the observation of new sampling points.
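
The recursive, bottom-up structure of this offline procedure can be sketched in discrete time. This is a simplification, not the dense-time algorithm of [235]: formulas are encoded as hypothetical nested tuples, and each sub-formula is mapped to a Boolean signal computed from the signals of its children.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical discrete-time formula encoding (nested tuples):
#   ("gt", var, c), ("not", phi), ("and", phi, psi), ("eventually", a, b, phi)
Trace = Dict[str, Sequence[float]]


def sat_signal(phi: Tuple, trace: Trace) -> List[bool]:
    """Boolean signal of phi: entry t tells whether phi holds at sample t.
    Child signals are computed first and combined, mirroring the parse tree."""
    kind = phi[0]
    if kind == "gt":
        _, var, c = phi
        return [v > c for v in trace[var]]
    if kind == "not":
        return [not b for b in sat_signal(phi[1], trace)]
    if kind == "and":
        left, right = sat_signal(phi[1], trace), sat_signal(phi[2], trace)
        return [p and q for p, q in zip(left, right)]
    if kind == "eventually":
        _, a, b, sub = phi
        s = sat_signal(sub, trace)
        n = len(s)
        # Finite-trace convention (an assumption): samples past the end count as false.
        return [any(s[t + k] for k in range(a, b + 1) if t + k < n) for t in range(n)]
    raise ValueError(f"unknown operator {kind!r}")


trace = {"x": [0, 0, 5, 0], "y": [1, 1, 1, 0]}
phi = ("and", ("gt", "y", 0.5), ("eventually", 0, 2, ("gt", "x", 2)))
```

Further operators can be derived under the same finite-trace convention, e.g. G_[a,b] φ as ("not", ("eventually", a, b, ("not", φ))).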

There are several algorithms available in the literature for computing the robustness degree of STL formulas [130, 132, 134, 144, 200, 201, 285]. The algorithm for computing the space robustness of a continuous behavior with respect to an STL specification was originally proposed in [144]. In [132], the authors develop a more efficient algorithm for measuring space robustness by using an optimal streaming algorithm to compute the min and the max of a numeric sequence over a sliding window and by rewriting the timed until operator as a conjunction of simpler timed and untimed operators. The procedure that combines monitoring of both space and time robustness is presented in [134].
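
The sliding-window extremum computation can be sketched with a monotone double-ended queue, a standard linear-time technique (Lemire's streaming min/max filter); the sketch assumes this is representative of the algorithm used in [132] rather than reproducing it. Combined with the predicate robustness x(t) - c, it yields the robustness of G_[0,w](x > c) at every sample with a full window (end-of-trace handling and the rewriting of timed until are not shown).

```python
from collections import deque
from typing import Deque, List, Sequence


def sliding_min(values: Sequence[float], width: int) -> List[float]:
    """Minimum of every full window values[t:t + width], in O(n) total time."""
    if width < 1:
        raise ValueError("width must be positive")
    result: List[float] = []
    candidates: Deque[int] = deque()  # indices whose values increase from front to back
    for i, v in enumerate(values):
        # Drop candidates that can never again be a window minimum.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front index once it leaves the window ending at i.
        if candidates[0] <= i - width:
            candidates.popleft()
        if i >= width - 1:
            result.append(values[candidates[0]])
    return result


def always_robustness(signal: Sequence[float], c: float, w: int) -> List[float]:
    """Robustness of G_[0,w](x > c) at each sample t with t + w < len(signal)."""
    return [m - c for m in sliding_min(signal, w + 1)]
```

Each index enters and leaves the deque at most once, which is what makes the pass linear regardless of the window width; a sliding maximum, needed for `eventually`, is obtained symmetrically by flipping the comparison.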

Finally, two approaches have been proposed to monitor the space robustness of a signal with respect to an STL specification. The first approach, proposed in [130], considers STL formulas with bounded future and unbounded past operators. The unbounded past operators are evaluated efficiently by exploiting the fact that the unbounded history can be stored as a summary in a variable that is updated each time a new value of the signal becomes available. For the bounded future operators, the algorithm computes the number of look-ahead steps needed to evaluate them, then uses a model to predict the future behavior of the system and estimate its robustness. The second approach [126] instead computes an interval of robustness for STL formulas with bounded future operators.
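The summary-variable idea for unbounded past operators can be sketched in a few lines: the robustness of "historically (x > c)" is a running minimum and that of "once (x > c)" a running maximum, so each new sample is processed in constant time and memory. The class name and the restriction to a single atomic predicate are assumptions made to keep the sketch small.

```python
import math

class PastRobustnessMonitor:
    """Online space robustness of H(x > c) and O(x > c), where H is
    'historically' (always in the past) and O is 'once' (sometime in the past).

    The unbounded history is summarized by two variables, so each new
    sample is processed in O(1) time and memory.
    """

    def __init__(self, c):
        self.c = c
        self.hist = math.inf    # min over all samples so far of (x - c)
        self.once = -math.inf   # max over all samples so far of (x - c)

    def update(self, x):
        r = x - self.c          # robustness of the atomic predicate x > c
        self.hist = min(self.hist, r)
        self.once = max(self.once, r)
        return self.hist, self.once
```

With c = 1.0, feeding the samples 3.0, 0.5 and 4.0 yields (2.0, 2.0), then (-0.5, 2.0), then (-0.5, 3.0): the single violating sample permanently drives H negative, while O keeps the best value seen so far.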

Matching TRE Patterns An offline procedure for computing the set of all matches of a timed regular expression in a continuous or hybrid behavior was proposed in [308]. The procedure relies on the observation that any match set can always be represented as a finite union of two-dimensional zones, a special class of convex polytopes definable as the intersection of inequalities of the form \((x < a)\), \((x > a)\) and \((x - y <a)\). This algorithm has been recently extended to enable online matching of TRE patterns [309].
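A two-dimensional zone over match start x and match end y can be stored as bounds on x, y and the duration y − x; intersecting two zones simply tightens every bound, which is one reason match sets stay representable as finite unions of zones. The sketch below is an illustration only, not the algorithm of [308]: it assumes all constraints are strict, and the class name `Zone` is hypothetical.

```python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass(frozen=True)
class Zone:
    """Open 2D zone: xl < x < xu, yl < y < yu, dl < y - x < du.

    Here x is the start and y the end of a match; all constraints are strict,
    which keeps the emptiness test below exact.
    """
    xl: float = -INF
    xu: float = INF
    yl: float = -INF
    yu: float = INF
    dl: float = -INF
    du: float = INF

    def intersect(self, other):
        # The intersection of two zones is again a zone: tighten every bound.
        return Zone(max(self.xl, other.xl), min(self.xu, other.xu),
                    max(self.yl, other.yl), min(self.yu, other.yu),
                    max(self.dl, other.dl), min(self.du, other.du))

    def is_empty(self):
        if self.xl >= self.xu or self.yl >= self.yu:
            return True
        # For nonempty x and y ranges, y - x ranges over (yl - xu, yu - xl).
        return max(self.dl, self.yl - self.xu) >= min(self.du, self.yu - self.xl)
```

For example, matches starting in (0, 2), ending in (1, 5) and lasting between 3 and 4 time units form a nonempty zone, but additionally requiring the match to end before time 2 makes it empty.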

3.1.3 Tools

The following tools are publicly available and support both the qualitative and the quantitative semantics for monitoring CPSs.

  1. AMT 2.0 [254]: available at http://www-verimag.imag.fr/DIST-TOOLS/TEMPO/AMT/content.html

  2. Breach [131]: available at https://github.com/decyphir/breach

  3. S-Taliro [17]: available at https://sites.google.com/a/asu.edu/s-taliro/

  4. U-Check [82]: available at https://github.com/dmilios/U-check

The AMT 2.0 tool [254] provides a framework for the qualitative and quantitative analysis of xSTL, an extended Signal Temporal Logic that integrates TRE with STL requirements over analog system output signals. AMT is a standalone executable with a graphical interface that lets the user specify the xSTL properties, the signals, and whether the analysis is offline or incremental. The new version of the tool can also compute quantitative measurements over segments of the signals that match properties specified using TRE [152]. AMT 2.0 also offers a trace-diagnostics mechanism [151] that can be used to explain property violations.

Breach [131] and S-Taliro [17] are add-on MATLAB toolboxes developed for black-box, testing-based verification [143] of Simulink/Stateflow models. These tools have also been used for other applications, including parameter mining [328, 332], falsification [2] and synthesis [277].

Finally, U-Check [82] is a stand-alone program written in Java, which deals with statistical model checking of STL formulas and parameter synthesis for stochastic models described as Continuous-Time Markov Chains.

3.1.4 Applications

Specification-based monitoring of cyber-physical systems (CPS) [253] has been a particularly fertile field for research on runtime verification, leading to several theoretical and practical applications such as quantitative semantics, simulation-guided falsification, real-time online monitoring, system design and control. Here is an overview of the most relevant applications in the CPS scenario:

  • Real-time Monitoring of CPS. The complexity of the new generation of digital system-on-chip (SoC) and analog/mixed-signal (AMS) systems requires new, efficient techniques to verify and validate their behavior at both the physical and the software level. Simulating such systems is now too time-consuming to be economically feasible. An alternative approach is to monitor the system under test (SUT) online by processing the signals and software traces that are observable after instrumentation [253]. This approach leverages dedicated hardware accelerators, such as Field Programmable Gate Arrays (FPGA), and appropriate synthesis tools [199, 200, 297] that can translate temporal logic specifications into hardware monitors. This is discussed in more detail in the next section, dedicated to hardware supported runtime verification.

  • Falsification-based Testing. Specification-based monitoring is also very useful at design time. Engineers generally use the MathWorks™ Simulink or OpenModelica toolsets to model CPS functionalities. These models are complex hybrid systems that are very challenging to verify and test. Falsification-based testing [2, 3, 17, 18, 142, 252, 331] aims at automatically generating counter-examples that violate the desired requirements in a CPS model. This approach employs a formal specification language such as STL to specify the desired requirements, and a monitor (the oracle) that checks each simulation trace against the requirement and indicates how far the trace is from violating it. For this reason, in the last decade there has been a great effort to develop quantitative semantics for STL [11, 134, 258, 284, 285], in which the binary satisfaction relation is replaced with a quantitative robustness degree function. A positive (respectively, negative) robustness value indicates that the formula is satisfied (respectively, violated). This quantitative interpretation can be exploited in combination with several heuristics (e.g., ant colony, gradient ascent, statistical emulation) to optimize the CPS design in order to satisfy or falsify a given formal requirement [2, 3, 17, 18, 45, 133, 142, 252, 331].

  • From Monitoring to Control Synthesis. The use of formal logic-based languages has also enabled control engineers to build tools that automatically synthesize controllers from a given specification [70]. Temporal logics such as Metric Temporal Logic (MTL) [214] and Signal Temporal Logic (STL) [234] have been employed to specify time-dependent tasks and constraints in many control system applications [17, 277, 321]. In the context of Model Predictive Control (MPC) [71, 212, 258, 276], monitoring temporal logic constraints over the simulated traces of a plant model can be used to iteratively find the input that optimizes the robustness of the specification over a finite horizon.
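The falsification loop described above can be sketched with the simplest possible heuristic, plain random search, in place of the more sophisticated optimizers used by the cited tools. Everything here is a hypothetical toy: the plant is a first-order lag integrated with Euler steps, the requirement is G(y < 1.0), and the input is a single constant value.

```python
import random

def simulate(u, steps=50, dt=0.1):
    """Hypothetical plant: first-order lag y' = -y + u, Euler-integrated."""
    y, ys = 0.0, []
    for _ in range(steps):
        y += dt * (-y + u)
        ys.append(y)
    return ys

def robustness(ys, limit=1.0):
    """Space robustness of G(y < limit): positive iff the trace satisfies it."""
    return min(limit - y for y in ys)

def falsify(trials=200, seed=0):
    """Random search for an input that minimizes robustness (a crude heuristic)."""
    rng = random.Random(seed)
    best_u, best_rho = None, float("inf")
    for _ in range(trials):
        u = rng.uniform(0.0, 2.0)
        rho = robustness(simulate(u))
        if rho < best_rho:
            best_u, best_rho = u, rho
        if best_rho < 0:        # negative robustness: counterexample found
            break
    return best_u, best_rho
```

The robustness value doubles as the optimization objective: even when no sampled input violates the requirement, the search keeps the input that came closest to violation. Replacing `rng.uniform` with a gradient-based or population-based update yields the heuristics mentioned in the text.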

3.2 Challenges

Although specification-based monitoring of CPS is a well-established research area, there are still many open challenges that need to be addressed. We now discuss some of the most important remaining challenges.

C 3.1

Autonomous CPS. Recent advances in machine learning (ML) have led to fascinating new artificial intelligence (AI) applications, such as autonomous CPS that can perceive, learn, decide and execute tasks independently, or with minimal human intervention, in unpredictable environments. The lack of predictability that results from using learning-enabled components calls for novel approaches to providing assurance. The main challenge is to develop new methods that go beyond the current state of the art in RV to guarantee the trustworthiness of autonomous CPS by providing dynamic safety and security assurance mechanisms.

C 3.2

From design-time to runtime. Specification languages for CPS typically assume a perfect mathematical world in which time is continuous and the state variables are all observable with infinite precision. This level of abstraction is suitable for reasoning about CPS at design time, when the system is modeled with differential equations and can be simulated with arbitrary precision and perfect observability. However, the passage from a CPS model to its implementation introduces a number of effects that runtime monitors applied during system operation need to take into account. For instance, CPS can only be observed at sampled points in time, some state variables may not be observable, and sensors may introduce noise and inaccuracies into measurements, including sampling noise. As a consequence, there is an urgent need to address these issues in the context of runtime verification of CPS.

C 3.3

Limited resources. CPS introduce specific constraints on available resources that runtime verification solutions need to take into account. CPS are reactive systems operating at a certain frequency, hence the monitor needs to operate at least at the same speed as the system. In contrast to classical software, instrumenting some components of a CPS can be hard or impossible, so runtime monitors may need to rely on partially observable streams of data. CPS are often safety-critical and have hard timing constraints; as a consequence, runtime monitors must not alter the timing-related behavior of the observed system. Developing monitoring solutions that take the specific limitations of CPS into consideration remains an important open challenge.

C 3.4

From real-time to spatial and spectral specifications. Most of the existing work on runtime monitoring of CPS focuses on real-time temporal properties. However, CPS often consist of networked, spatially distributed entities, where timing constraints are combined with spatial relations between the components. In addition, many basic properties of continuous CPS entities are naturally definable in the spectral (for instance, frequency) domain [98, 135]. There is a need to study specification formalisms that gracefully integrate these important CPS aspects.

C 3.5

Fault-localisation and explanation. Detecting a fault while monitoring a CPS at design or deployment time is only the first step: the error must then be understood and corrected. Complementing runtime verification methods with (semi-)automated fault localisation [48] and explanation could significantly reduce debugging effort and help engineers build safe and secure systems.

4 Hardware

Hardware supported runtime verification (HRV) studies how to use hardware to build dynamic solutions for reliability assessment. The goal is to alleviate the extensive analysis required for complex designs by shifting from offline analysis of limited data sets to online, simultaneous, non-intrusive analysis. The use of hardware brings great potential for runtime observation and can even allow the continuous assessment of the behavior exhibited by the system. Observation and simultaneous correctness checking of system internals can reach a level of detail that is orders of magnitude better than what today's tools and systems provide. Note that “hardware” in HRV refers to the use of hardware as an element of the RV solution, even though the system under study can also be analyzed at a low level that includes hardware characteristics.

Online hardware-based runtime verification approaches may take advantage of multiple technologies, for example hardware description languages and reconfigurable hardware. The combination of these technologies provides the means for observability, non-intrusiveness, feasibility, expressiveness, flexibility, adaptability and responsiveness of hardware-based monitors that observe a target system and react to erroneous behavior. In addition, HRV can be used for other analyses, such as performance monitoring.

Several proposed solutions approach runtime verification (RV) differently, diverging in the methodologies used, their goals, and the limitations induced by the target system. Whether the monitor executes on external hardware or on-system, what the monitor watches (that is, the meaningful events it cares about: the events of interest), how it is connected to the system, and what is instrumented all depend on both the characteristics of the system being monitored and the goals of the monitoring process.

4.1 Context and areas of interest

To set the context of HRV before describing its challenges, we discuss the following aspects separately: the pursuit of non-intrusive monitoring, observability, the feasibility and limitations of hardware-based monitoring, the landscape of design approaches, and flexibility. We finally list some existing use cases.

4.1.1 Non-intrusiveness

Ideally, observing and monitoring components should not interfere with the normal behavior of the system being observed, thus avoiding what is called “the observer effect” or “the probe effect” [168], in which the observing methodology alters the system behavior by affecting some of its functional or non-functional (e.g., timeliness) properties. Hardware-based approaches are inherently non-intrusive, while software-based solutions normally exhibit some degree of intrusiveness, even if minimal. Therefore, it is widely acknowledged that software-based approaches must be used with care.

For example, the delays implicitly associated with the insertion of software-based probes may adversely affect the timing and synchronisation characteristics of concurrent programs. Moreover, and perhaps less intuitively, removing such probes from real-time embedded software, which in principle leads to shorter program/task execution times, may render a given task set unschedulable due to changes in the corresponding cache-miss profile [232, 248, 320]. Non-intrusiveness, i.e., the absence of interference, may then be referred to as an RV constraint. RV constraints are not only relevant but in fact fundamental for highly critical systems [268].

A comprehensive overview of various hardware (including on-chip), software and hybrid (i.e., a combination of hardware and software) methodologies for system observation, monitoring and verification of software execution at runtime is provided in [316].

System observing solutions can be designed to connect directly to some form of system bus, enabling them to gather information about events of interest, such as the data transfers and signalling taking place inside the computing platform (namely instruction fetches, memory read/write cycles and interrupt requests), without requiring changes to the target system's architecture. Examples of such hardware-based observation approaches are proposed in [207, 265, 270, 280].

As emphasized in [316], observing mechanisms should: (1) be minimally intrusive, or preferably completely non-intrusive, so as to respect the RV constraint; and (2) provide enough information about the target system for the objectives of runtime verification to be met.

4.1.2 Observability

Another important aspect raised in [316] is the occasionally limited observability of program execution with respect to its internal state and data. In general, software-based monitoring may have access to extensive information about the operation of a complex system, in contrast to the limited information available to hardware probes [316].

Thus, a first challenge is that hardware-based probes must be able to observe enough information about the internal operation of the system to fulfil the purpose of the monitoring [316]. Gaining access to certain states or information is often problematic, since most systems do not expose details of system operation and software execution. Observability is therefore sometimes limited to the data made available or accessible to observing components. Low observability of the target system's operation affects not only traditional hardware monitors but may also jeopardize hybrid monitoring and render these observing and monitoring techniques ineffective.

4.1.3 Feasibility

General-purpose Commercial Off-The-Shelf (COTS) platforms offer limited observing and monitoring capabilities. For example, on platforms based on Intel x86 architectures, observability is restricted to the Intel Control Flow Integrity [196] and Intel Processor Trace [282] facilities. Enhancing system observability through physical probing either requires considerable engineering effort [209] or is restricted to specific behaviors, such as input/output operations [265].

The trend to integrate the processing entities together with other functional modules of a computing platform in an Application Specific Integrated Circuit (ASIC), often known as System on a Chip (SoC), can dramatically affect the overall system observability, depending on whether or not special-purpose observers are also integrated.

The shortcomings and limitations of debug and trace resources for runtime system observation are analyzed in [222], which concludes that the deep integration of software and hardware components within SoC-based devices hinders the use of conventional analysis methods to observe and monitor the internal state of those components. The situation is further exacerbated whenever physical access to the trace interfaces is unavailable, infeasible or cost-prohibitive.

With the increased popularity of SoC-based platforms, one of the first on-chip approaches to SoC observability was introduced in [301], where the authors presented MAMon, a hardware-based probe-unit integrated within the SoC and connected via a parallel-port link to a host-based monitoring tool environment that performs both logic-level (e.g., interrupt request assertion detection) and system-level (e.g., system call invocation) monitoring. This approach can either be passive (by listening to logic- or system-level events) or activated by (minimally intrusive) code instrumentation.

Many SoC designs integrate modules made from Intellectual Property (IP) cores. An IP core design is pre-verified against its functional specification, for example through assertion-based verification methods. In hardware-based designs, assertions are typically written in verification languages such as the Property Specification Language (PSL) [195] and SystemVerilog Assertions (SVA) [194]. Pre-verifying IP core designs helps reduce the effort spent on debugging and testing during the system integration cycle.

The work [306] presents an in-circuit RV solution that targets the monitoring of the hardware itself rather than software. Runtime verification is done by means of in-circuit temporal logic-based monitors. Design specifications are separated into compile-time and runtime properties, where runtime properties cannot be verified at compile time since they depend on runtime data. Compile-time properties are checked by symbolic simulation. Runtime properties are verified by hardware monitors that run at the same speed as the circuits they monitor.

System-wide observation of IP core functionality requires the specification of a set of events to be observed and a set of observation probes. The IP core designer is the best source of knowledge for determining which event probes can provide the highest level of observability for each core. Such an approach is followed in [221] for the specification of a low-level hardware observability interface, where a separate, dedicated hardware observability bus is used to access the hardware observation interface.

The approach described in [221] was further extended in [222] to include system-level observations, achieved through the use of processor trace interfaces. The solution discussed in [222] introduces a System-level Observation Framework (SOF) that monitors hardware and software events by inserting additional logic within hardware cores and by listening to processor trace ports. The proposed SOF provides visibility for monitoring complex execution behavior of software applications without affecting the system execution. Engineering and evaluation of such approaches have relied on FPGA-based prototyping [221, 222].

Support for this kind of observation can also be found in modern multicore processor architectures implemented as single-chip solutions that natively integrate special-purpose on-chip observation resources, such as ARM CoreSight [22, 255].

4.1.4 Design approaches

Nowadays there are two approaches to embedded multicore processor observation. Software instrumentation is easy to use, but very limited for debugging and testing (especially for integration tests and higher levels). A more sophisticated approach, and a key element in multicore observation, is embedded-trace-based emulation. A special hardware unit observes the processor's internal states, compresses this information and outputs it via a dedicated trace port. An external trace device records the trace data stream and, after the observation period, forwards the data to, e.g., a personal computer for offline decompression and processing. Unfortunately, this approach still suffers from serious limitations in trace data recording and offline processing:

  • Trace trigger conditions are limited and fixed to the sparse functionality implemented in the “embedded trace” unit.

  • Because of the high trace data bandwidth, it is impracticable with today's storage systems to save all the data obtained during an arbitrarily long observation.

  • There is a discrepancy between trace data output bandwidth and trace data processing bandwidth, which is usually several orders of magnitude lower. This results in very short observation periods and long trace data processing times, which renders the debugging process inefficient.

Hardware supporting online runtime verification could overcome these limitations. Trace data need not be stored before being pre-processed and verified, because both steps are done online. Debugging and runtime verification are accomplished without any noticeable interference with the original system execution. Verification is based on a given specification of the system's correct behavior; when a misbehavior is detected, further complex processing steps are triggered. This challenging solution enables autonomous observation of arbitrary duration and extracts the highest possible observability from “embedded trace” implementations.

Other solutions place the observation hardware inside the processing units, which may in some situations require modifying them. Some simple modifications may enable lower-level and finer-grained monitoring, for example by allowing the precise instant of an instruction's execution to be observed. Where to connect runtime verification hardware depends on the sort of verification one aims to perform and at what cost, and is itself a design challenge.

A Non-Intrusive Runtime Verification (NIRV) observer architecture for real-time SoC-based embedded systems is presented in [270]. The observer (also called Observer Entity, OE) synchronously monitors the SoC bus, comparing the values being exchanged on the bus with a set of configured observation points, the events of interest. Upon detection of an event of interest, the OE time-stamps the event and sends it to an external monitor. This approach is extended in [178] to enforce system safety and security through a more precise observation of program execution, achieved by (non-intrusively) observing the buses between the processor and the L1 cache sub-system.
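The matching step performed by an Observer Entity can be modeled in software as follows. This is only a behavioral sketch: a real OE is bus-synchronous hardware, and the names `BusTransfer`, `ObservationPoint` and `ObserverEntity`, the address-range form of the observation points, and the use of the bus cycle as timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, NamedTuple

class BusTransfer(NamedTuple):
    cycle: int      # bus clock cycle, used here as the event timestamp
    kind: str       # e.g. "fetch", "read", "write"
    address: int
    data: int

@dataclass
class ObservationPoint:
    """An event of interest: an access kind within an address range."""
    kind: str
    lo: int
    hi: int         # inclusive upper bound

    def matches(self, t: BusTransfer) -> bool:
        return t.kind == self.kind and self.lo <= t.address <= self.hi

class ObserverEntity:
    """Software model of a bus observer: compares every transfer with the
    configured observation points and forwards time-stamped matches."""

    def __init__(self, points: List[ObservationPoint],
                 send: Callable[[int, BusTransfer], None]):
        self.points = points  # a list the host could reconfigure at runtime
        self.send = send      # stands in for the link to the external monitor

    def on_transfer(self, t: BusTransfer) -> None:
        # Passive: the observer only reads the transfer, never alters it.
        if any(p.matches(t) for p in self.points):
            self.send(t.cycle, t)
```

With a single observation point for writes to addresses 0x1000 to 0x10FF, a stream containing an instruction fetch, a write to 0x1004 and a write to 0x2000 forwards only the write to 0x1004, together with its cycle number.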

These RV approaches can target a wide spectrum of both functional and non-functional properties, from timeliness to safety and security, helping to prevent misbehavior overall. Effective system observability is a prerequisite for effective overall system monitoring. Hardware-based observation is advantageous given its non-intrusiveness, but software-based observation is more flexible, namely in capturing context-related data.

4.1.5 Flexibility: (self-)adaptability and reconfiguration

Requirements for (self-)adaptability to different operational conditions call for flexible observers (and monitors), i.e., ones with a ready capability to adapt to new, different, or changing needs. Flexibility implies that observing resources should be reconfigurable in terms of the types and nature of event triggers. This configurability may be defined via configuration files, supported online by self-learning modules, or a combination of both. Reconfigurable hardware implementations usually provide enough flexibility to allow changes to the monitored specification without re-synthesising the hardware infrastructure. This is a fundamental characteristic, since logic synthesis is a very time-consuming task and therefore unfit to be performed online. Observer and monitor reconfigurability can be obtained in the following ways:

  • Using reconfiguration registers that can be changed online [270], a flexible mechanism that supports simple to moderate adaptability. Examples include redefining the address scope of a function's stack frame upon its call, or defining a function's calling addresses upon dynamic linking with shared object libraries.

  • Selecting an active monitor or a monitor specification from a predefined set of mutually exclusive monitors [286]. This corresponds to a mode change in the operation of the system, and mode changes need to preserve stable overall system operation [266].

  • Using a single reconfigurable monitor [275], which allows the monitor to be updated through the partial reconfiguration capabilities of modern FPGAs.

The approach in [275] implements intrusion detection in embedded systems by detecting behavioral differences between the correct system and the malware. The system is implemented using FPGA logic to enable the detection process to be regularly updated and adapt to new malware and changing system behavior. The idea is to protect against the execution of code that is different from the correct code the system designer intends to execute. The technique uses hardware support to enable attack detection in real time, using finite state machines.
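One simple way to phrase "execution that differs from the correct code" as a finite state machine is a whitelist of legal transitions between basic blocks: any transition the intended program cannot take raises an alarm. The sketch below assumes such a transition table is available (for instance, derived from the program's control-flow graph); it illustrates the idea and is not the design of [275]. The class name is hypothetical.

```python
class ControlFlowFSM:
    """Flags execution that departs from the designer's intended code.

    States are basic-block identifiers; `allowed` maps each state to the set
    of legal successor blocks, assumed here to come from the program's
    control-flow graph.
    """

    def __init__(self, start, allowed):
        self.state = start
        self.allowed = allowed
        self.alarm = False      # latched: stays True once a deviation is seen

    def step(self, block):
        if block not in self.allowed.get(self.state, ()):
            self.alarm = True   # a transition the correct program cannot take
        self.state = block
        return not self.alarm   # True while no deviation has been detected
```

Because the table is plain data, updating it corresponds to the reconfiguration step the text mentions: in hardware, a new table could be loaded when the correct system behavior changes, without redesigning the checker.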

System adaptation triggered by non-intrusive RV techniques is addressed in [286] for complex systems such as Time- and Space-Partitioned (TSP) systems, where each partition hosts a (real-time) operating system and the corresponding applications. Special-purpose hardware resources provide support for partition scheduling, which is verified at runtime by (minimally intrusive) RV software, and for process deadline violation monitoring, which is fully non-intrusive as long as deadlines are met. Process-level exception handlers, defined by the application programmer, establish the actions to be executed by software components when a process deadline violation is detected. The monitoring component that analyzes the observed events (the trace data) may belong to the RV hardware itself, checking the system behavior as it observes it.
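The division of labor between deadline-monitoring hardware and programmer-defined handlers can be modeled as below. The `tick` method stands in for the per-cycle check a hardware unit would perform, and the handler is invoked only when a deadline is missed, so software is involved only on a violation. The class and method names and the integer time base are assumptions of this sketch, not the interface of [286].

```python
from typing import Callable, Dict

class DeadlineMonitor:
    """Sketch of process deadline-violation monitoring.

    Hardware would perform the tick-by-tick checking; `tick` models it here.
    The handler (a process-level exception handler) runs only on a violation.
    """

    def __init__(self, handler: Callable[[str, int], None]):
        self.deadlines: Dict[str, int] = {}   # process -> absolute deadline
        self.handler = handler

    def release(self, proc: str, now: int, relative_deadline: int) -> None:
        self.deadlines[proc] = now + relative_deadline

    def complete(self, proc: str) -> None:
        self.deadlines.pop(proc, None)        # finished in time: nothing to report

    def tick(self, now: int) -> None:
        for proc, deadline in list(self.deadlines.items()):
            if now > deadline:
                del self.deadlines[proc]      # report each miss only once
                self.handler(proc, now)
```

If process P1 is released at time 0 with a relative deadline of 5 and never completes, while P2 (deadline 3) completes at time 2, ticking the monitor from 0 to 7 reports only P1, at time 6.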

4.1.6 Use case examples

Given the numerous possibilities for implementing RV in hardware, many contributions have sought to improve hardware-based RV monitors. Some solutions address monitoring and verification in a single instance [280]: the verification procedure is mapped onto soft-microcontroller units embedded within the design and uses formal languages such as past-time Linear Temporal Logic (ptLTL). An embedded CPU is responsible for checking ptLTL clauses in a software-oriented fashion.
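Checking ptLTL in a software-oriented fashion is attractive on small embedded CPUs because each sub-formula needs only one bit of state from the previous event. The sketch below follows the well-known dynamic-programming scheme of Havelund and Rosu; it is not the implementation of [280]. The tuple encoding of formulas and the operator names (`prev`, `once`, `hist`, `since`) are assumptions of this sketch.

```python
def subformulas(phi):
    """Post-order list of sub-formulas (children before parents)."""
    out = []
    def visit(f):
        if f[0] != "ap":                # atomic propositions have no children
            for child in f[1:]:
                visit(child)
        out.append(f)
    visit(phi)
    return out

class PtLTLMonitor:
    """Online ptLTL monitor: one bit of state per sub-formula, updated per event."""

    def __init__(self, phi):
        self.phi = phi
        self.order = subformulas(phi)
        self.prev = None                # values at the previous event (None at start)

    def step(self, props):
        """Consume one event (a set of true propositions); return the verdict."""
        now, first = {}, self.prev is None
        for f in self.order:
            op = f[0]
            if op == "ap":
                v = f[1] in props
            elif op == "not":
                v = not now[f[1]]
            elif op == "and":
                v = now[f[1]] and now[f[2]]
            elif op == "prev":          # previously: value at the last event
                v = now[f[1]] if first else self.prev[f[1]]
            elif op == "once":          # sometime in the past (including now)
                v = now[f[1]] or (not first and self.prev[f])
            elif op == "hist":          # always in the past (including now)
                v = now[f[1]] and (first or self.prev[f])
            elif op == "since":         # a S b: b held, and a has held ever since
                v = now[f[2]] or (not first and now[f[1]] and self.prev[f])
            else:
                raise ValueError(f"unknown operator {op!r}")
            now[f] = v
        self.prev = now
        return now[self.phi]
```

For example, "every grant is preceded by a request" can be written as `("hist", ("not", ("and", ("ap", "grant"), ("not", ("once", ("ap", "req"))))))`. The event sequence `{"req"}`, `{"grant"}` keeps the verdict true, whereas a grant as the very first event makes it false.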

A System Health Management technique introduced in [281] enables real-time assessment of the system status with respect to temporal-logic-based specifications and also supports statistical reasoning to estimate system health at runtime. By intercepting sensor values through read-only observation of the system bus, and by deploying their platform (rt-R2U2) on an FPGA already built into the standard UAS (Unmanned Aerial Systems) design, the authors avoided both intrusiveness and the system integration problems of software instrumentation or added hardware.

A runtime verification architecture for monitoring safety-critical embedded systems, which uses an external bus monitor connected to the target system, is presented in [206]. This architecture was designed for distributed systems with broadcast buses and black-box components, a common architecture in modern ground vehicles. The approach uses a passive external monitor, which fits well with the constraints imposed by safety-critical embedded systems. Isolating the monitor from the target system helps ensure that system functionality and performance are not compromised by the inclusion of the monitor.

The use of a hardware-based NIRV approach for mission-level adaptation in unmanned space and aerial vehicles is addressed in [287], with the goal of contributing to mission/vehicle survivability. For each phase of a flight, different schedules are defined for three modes: normal, survival and recovery. The available processor time is allocated to the different vehicle functions according to their relevance within each mode: normal implies the execution of the activities defined for the mission; survival means the processor time is mostly assigned to fundamental avionic functions; recovery also foresees the execution of fault detection, isolation and recovery functions.

Gouveia and Rufino [178] attack the problem of fine-grained memory protection in cyber-physical systems using a hardware-based observation and monitoring entity. To ensure the security of the observer itself, the monitor is designed as a black box, which can be viewed in terms of its inputs and outputs but not its internal functioning, thus preventing malicious entities from hijacking its behavior.

No previous study concerning hardware-based observability has tackled the problem of applying the concepts and techniques to the non-intrusive observation and monitoring of programs in interpreted languages, such as Python and Java bytecode, running on the corresponding virtual machines.

4.2 Challenges

C 4.1

Observability. There are no general results defining which hardware entities of a system (system bus, processor internal buses, IP core internals) should be instrumented to guarantee the required observability, nor how to probe such entities. In general, observation at different levels of abstraction should be supported, from logic-level events (e.g., interrupt request assertion) up to the system level (e.g., system call invocation) and the application level (e.g., a value assigned to a given variable).

C 4.2

Effectiveness. Ensuring that hardware-based probing provides effective system observability, meaning that all the events of interest are captured, while keeping the complexity of hardware instrumentation within SWaP (Size, Weight and Power) constraints. This is especially important for the observation and monitoring of hardware components, where the RV resources should have a much lower complexity than the observed infrastructure, but such results could also apply to the monitoring of software components.

C 4.3

Feasibility and flexibility. Handling the potentially high volumes of trace data produced by extensive system observation is a challenge. It includes confining the observed events of interest, and using advanced compression, pre-processing and runtime verification techniques to reduce the gap between trace data output and trace data processing capabilities. A related challenge is mapping formal specifications of system properties into actual observing and monitoring actions, using a minimal set of highly effective hardware/software probing components and monitors. Where applicable, support should also be provided for flexible observation and monitoring, opening room for the integration of RV techniques in (self-)adaptable and reconfigurable systems.

C 4.4

Hybrid approaches for observability. Combining software-based instrumentation with hardware-based observability in a highly effective hybrid approach, to: (1) Capture program execution flows and timing, without the need for special-purpose software hooks; (2) Observe fine-grained data, such as read/write accesses to global and local variables; (3) Monitor bulk data (e.g. arrays) through the observation of read/write accesses to individual members.

C 4.5

Advanced system architectures. Extending hardware-based observability to advanced system architectures, such as processor and memory virtualisation, including time- and space-partitioning, and also to the execution of interpreted languages, including bytecode running on virtual machines such as the JVM.

5 Security and privacy

In recent years there has been a huge explosion in the availability of large volumes of data. Large integrated datasets can potentially provide a much deeper understanding of both nature and society and open up many new avenues of research. These datasets are critical for addressing key societal problems, from offering personalized services, improving public health and managing natural resources intelligently to designing better cities and coping with climate change. More and more applications are deployed on our smart devices and used through our browsers in order to offer better services. However, this comes at a price: on the one hand, most services are offered in exchange for personal data; on the other hand, the complexity of the interactions between such applications and services makes it difficult to understand and track what these applications have access to, and what they do with the users’ data. Privacy and security are thus at stake.

Cybersecurity is not just a buzzword, as stated in the recent article “All IT Jobs Are Cybersecurity Jobs Now” [239], which says that “The rise of cyberthreats means that the people once assigned to setting up computers and email servers must now treat security as top priority”. Another example is the attack reported as “The largest ransom-ware infection in history” [304]. Referring to that event, the Europol chief stated in a recent BBC interview that “Cybersecurity should be a top line executive priority and you need to do something to protect yourself” [69].

Besides the above examples, which are well known given their massive impact on the media and society, security and privacy issues are present in our daily lives in different forms, including botnets, distributed denial-of-service (DDoS) attacks, hacking, malware, pharming, phishing, ransomware, spam, and numerous attacks leaking private information [299]. Global protection starts with the protection of each individual computer or device connected to the Internet. However, nowadays only partial solutions can be achieved statically. Runtime monitoring, verification and enforcement are thus crucial to help in the fight against security and privacy threats.

Remark. Given the breadth of the Security and Privacy domain, we do not present an exhaustive analysis of its different application areas. We deliberately focus our attention on a small subset of the whole research area, mainly privacy concerns arising from the EU General Data Protection Regulation (GDPR), information flow, malware detection, browser extensions, and privacy and security policies. Even within those specific areas, we present only a subset of the challenges emerging from them.

5.1 Context and areas of interest

We now present the context and state of the art of monitoring in the following security-related sub-areas: GDPR, information flow, malware detection, browser extensions, and privacy and security policies.

5.1.1 GDPR (general data protection regulation)

The European General Data Protection Regulation [118] (GDPR), which was adopted on 27 April 2016 and entered into application on 25 May 2018, subjects companies, governmental organizations and any other data collector to stringent obligations when it comes to user privacy in their digital products and services. Consequently, new systems need to be designed with privacy in mind (privacy-by-design [97]) and existing systems have to provide evidence of their compliance with the new GDPR rules. This is mandatory, and sanctions for data breaches are tough and costly.

As an example, Article 5 of GDPR, related to the so-called data minimization principle, states: “Personal data must be adequate, relevant, and limited to the minimum necessary in relation to the purposes for which they are processed”. While determining what is “adequate” and “relevant” might seem difficult given the inherent imprecision of the terms, identifying what is “minimum necessary in relation to the purpose” is easier to define and reason about formally.

Independently of whether we are considering privacy by design or providing evidence of privacy compliance for already deployed systems, there are some issues to consider. Not all the obligations stated in the regulation can easily be translated into technical solutions, so there is a need to identify which rules are enforceable by technical means. For those rules or principles identified as enforceable by software, it is hard for engineers to assess and provide evidence of whether a technical design complies with the law, due to the gap between a legal document written in natural language and a technical solution in the form of a software system.

Consider again the data minimization principle. One way to understand minimization concerns how the data is used, that is, identifying the purpose for which the collected input data is used in the program. In this case we would need to look inside the program and track the usage of the data by means of static analysis techniques such as tainting, def-use analysis, information flow analysis, etc. This, in turn, requires a precise definition of what “purpose” means and a way to check that the intended purpose matches the actual actions the program takes to process the data at runtime. Another aspect of minimization concerns when and how the data is collected, so as to limit collection to what is actually needed to fulfil the purpose of the program. In this case we could consider that the purpose is given by the specification of the program, which is the approach followed by Antignac et al. [19]. These results indicate that it may be possible to enforce data minimization at runtime, at least for some of its aspects. Other privacy principles, however, are more difficult to tackle.

5.1.2 Information flow

In computer systems, it is often necessary to prevent some objects from accessing specific data. These permissions are usually defined through security policies and enforced using access control mechanisms. However, such mechanisms are typically insufficient in practice. For instance, an application could require access both to private data (such as the user’s contact list) and to the Internet; once the operating system’s access control policy has granted these permissions, one would still like to ensure that no data from the contact list (assumed to be confidential) leaks to the Internet (a public channel). Enforcing such fine-grained security policies requires information flow control mechanisms. These mechanisms allow untrusted applications to access confidential data as long as they do not leak these data to public channels. Denning’s seminal work [123, 124] in this field proposed static verification techniques to ensure that a program does not leak any confidential data. The underlying property is usually called non-interference, and was first formalized by Goguen and Meseguer [176]. More generally, non-interference states that no private data leaks to a public channel, either directly or indirectly. An indirect insecure flow may appear, for instance, when one of two different public values is emitted on a public channel depending on a private condition. In this case, an observer can infer part of the private information just by observing public data. From the eighties to the early 2000s, much effort was put into verifying non-interference properties statically [290, 315].
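The distinction between direct and indirect flows can be made concrete with a brute-force check. The Python sketch below treats a program as a function of a secret and a public input, and declares it non-interferent over small finite input domains if, for every public input, all secrets produce the same public output. The example programs and domains are hypothetical, exhaustive enumeration only works for toy cases, and this is not one of the static techniques cited above. Note that the check inherently relates several executions, which is why non-interference is not a property of a single trace.

```python
def explicit_leak(secret: int, public: int) -> int:
    # Direct flow: the secret is copied to the public output.
    return secret

def implicit_leak(secret: int, public: int) -> int:
    # Indirect flow: no copy, but the public output depends on a secret condition.
    if secret > 0:
        return 1
    return 0

def secure(secret: int, public: int) -> int:
    # The output depends only on the public input.
    return public + 1

def non_interferent(program, secrets, publics) -> bool:
    """For every public input, all secrets must yield the same public output."""
    for public in publics:
        outputs = {program(s, public) for s in secrets}
        if len(outputs) > 1:
            return False
    return True

SECRETS = [-2, 0, 3]
PUBLICS = [0, 1]
for prog in (explicit_leak, implicit_leak, secure):
    print(prog.__name__, non_interferent(prog, SECRETS, PUBLICS))
# explicit_leak False, implicit_leak False, secure True
```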

In 2004 Vachharajani et al. [310] departed from static approaches and proposed Rifle, a runtime information flow security system. After that, dynamic information flow approaches have been proposed for different settings (e.g., JavaScript [34], or applied to databases [333]). The main advantage of dynamic information flow is its ability to deal with dynamic languages and dynamic security policies. It is also usually more permissive than static approaches with respect to non-interference: dynamic approaches may accept secure flows that would be rejected statically. However, purely dynamic approaches have a major drawback: they cannot take into account the branches not covered by the examined executions, and so they may miss (indirect) insecure flows. In particular, Russo and Sabelfeld [289] demonstrated that purely dynamic approaches cannot be sound with respect to flow-sensitive non-interference, as defined by Hunt and Sands [193]. However, flow-sensitivity is a very useful feature in practice, since it is more permissive than flow-insensitivity: it allows a memory location to hold values of different security levels over time.
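The following Python sketch, a deliberately simplified and hypothetical monitor, illustrates why purely dynamic tracking can miss indirect flows. It runs the program `l := False; if h then l := True` with a secret `h`, and taints assignments with the label of the branch condition (the pc label). When `h` is true, the assignment is observed and `l` becomes secret; when `h` is false, nothing is assigned and `l` stays public, yet its value reveals that `h` was false.

```python
from dataclasses import dataclass

LOW, HIGH = 0, 1  # two-point security lattice: LOW (public) < HIGH (secret)

@dataclass
class Labeled:
    value: bool
    label: int

def run_monitored(h: Labeled) -> Labeled:
    """Purely dynamic monitoring of the program:
        l := False
        if h then l := True
    Assignments under a branch are tainted with the branch condition's label
    (the pc label), but only in branches that are actually executed."""
    l = Labeled(False, LOW)
    pc = h.label
    if h.value:
        l = Labeled(True, max(LOW, pc))  # join of LOW and the pc label
    # Untaken branch: no assignment is observed, so l keeps its LOW label.
    return l

for secret in (True, False):
    out = run_monitored(Labeled(secret, HIGH))
    print(f"h={secret}: l={out.value}, label={'HIGH' if out.label == HIGH else 'LOW'}")
# h=True: l=True, label=HIGH
# h=False: l=False, label=LOW
```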

In 2006 Le Guernic et al. [220] proposed a hybrid approach that combines the soundness of a static approach with the permissiveness of a dynamic approach. In recent years, hybrid information flow has received a lot of attention, for instance for languages such as C [41], Haskell [84], and JavaScript [187, 294]. To deal with the unsoundness of dynamic approaches, it is also possible to consider multiple executions [127] or multiple facets [35], the latter consisting in mapping a variable to several values (or facets), each of them corresponding to a particular security level.

Different variants of non-interference and ways of verifying them are described by Hedin and Sabelfeld [188] and by Bielova and Rezk [78].

5.1.3 Malware detection and analysis

Malware refers to malicious software specifically designed to disrupt, damage, or gain unauthorized access to a computer system. Malware usually exploits specific system vulnerabilities, such as a programming bug in software (e.g., a browser application plugin) or a bug in the underlying platform or OS. Malware infiltration effects range from simple disruption of the proper behavior of the system to destruction or theft of private and sensitive data. The huge number of devices interconnected through the Internet has turned malware infection into a very serious threat, even more so with the current trend of digitizing almost all human activities, notably economic transactions.

Malware detection is concerned with identifying software that is potentially malicious, ideally before the malware acts destructively. Malware analysis is about identifying the true intent and capabilities of malware by looking at some aspects of the code (statically) or by running it (dynamically).

Static analysis examines malware with or without viewing the actual code. The technical indicators gathered with basic static analysis can include the file name, hashes, file type, file size, and recognition by antivirus tools. When the source code can be inspected, static malware analyzers try to detect whether the code has been intentionally obfuscated, or try to identify concrete, well-known malicious lines of code. Dynamic analysis, on the other hand, runs the malware in a controlled environment to observe its behavior, in order to understand its functionality and identify indicators of potential danger. These indicators include domain names, IP addresses, file path locations, and whether there are additional files located on the system. See [197, 292, 299, 334] for surveys on malware detection techniques.
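As an illustration of basic static indicators, the Python sketch below collects a sample's file name, size and SHA-256 hash without executing it, and looks the hash up in a blocklist. The blocklist and file contents are hypothetical placeholders; file-type detection and antivirus scanning are omitted, and exact-hash matching is easily evaded by modifying a single byte of the sample.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical blocklist of SHA-256 digests of known-malicious files.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"example malicious payload").hexdigest()}

def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so that large samples need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def basic_static_indicators(path: Path) -> dict:
    """Collect simple indicators without executing the sample."""
    sha = sha256_of_file(path)
    return {
        "file_name": path.name,
        "file_size": path.stat().st_size,
        "sha256": sha,
        "known_bad": sha in KNOWN_BAD_SHA256,
    }

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "invoice.exe"
    sample.write_bytes(b"example malicious payload")
    print(basic_static_indicators(sample))
```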

5.1.4 Browser extensions

Browser extensions are small applications executed in a browser context in order to provide additional capabilities and enrich the user experience while surfing the web. The acceptance of extensions in current browsers is unquestionable. For instance, as of 2018, Chrome’s official extension repository had more than 140,000 applications, with some of these extensions having more than 10 million users. When an extension is installed, the browser often pops up a message showing the permissions that this new extension requests and, upon user approval, the extension is then installed and integrated within the browser. Extensions run through the JavaScript event listener system. An extension can subscribe to a set of events associated with the browser (e.g., when a new tab is opened or a new bookmark is added) or the content (e.g., when a user clicks on an HTML element or when the page is loaded). When a JavaScript event is triggered, the event is captured by the browser engine and all extensions subscribed to this event are executed.

Research on understanding browser extensions, detecting possible privacy and security threats, and mitigating them is in its infancy. The potential danger of extensions has been highlighted in [192], where extensions were identified as “the most dangerous code to user privacy” in today’s browsers. Some recent works have focused on tracking the provenance of web content at the level of DOM (Document Object Model) elements [24].

Another relevant issue is the order in which extensions are executed. When installed, extensions are pushed to an internal stack within the browser, which implies that the last installed extension is the last one that will be executed.

Recent work [267] demonstrates empirically that this order could be exploited by an unprivileged malicious extension (i.e., one with no more permissions than those already assigned when accessing web content) to get access to any private information that other extensions have previously introduced. To the best of our knowledge, there is still no solution to this problem.
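The order dependence can be illustrated with a toy model in Python. Here the page content is a dictionary, extensions are plain functions run in installation order, and both extensions (`password_manager`, `snooper`) are hypothetical; real browser event dispatch is considerably more involved. The snooper needs no extra permissions: when it runs after the password manager, it simply reads the private value that the earlier extension inserted.

```python
from functools import partial
from typing import Callable, Dict, List

Page = Dict[str, str]              # toy stand-in for the page content (DOM)
Extension = Callable[[Page], None]

def password_manager(page: Page) -> None:
    """Benign extension: inserts a private value into the page content."""
    page["password_field"] = "s3cret"

def snooper(page: Page, stolen: List[str]) -> None:
    """Unprivileged extension: only reads the content it is handed."""
    if "password_field" in page:
        stolen.append(page["password_field"])

def dispatch(page: Page, extensions: List[Extension]) -> None:
    """Run all extensions subscribed to an event on the same page, in order."""
    for ext in extensions:
        ext(page)

def simulate(snooper_runs_last: bool) -> List[str]:
    stolen: List[str] = []
    spy = partial(snooper, stolen=stolen)
    order = [password_manager, spy] if snooper_runs_last else [spy, password_manager]
    dispatch({"url": "https://example.com/login"}, order)
    return stolen

print(simulate(True))   # ['s3cret']
print(simulate(False))  # []
```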

Finally, there is the problem of collusion attacks, which occur when two or more extensions collaborate to extract more information from the user based on the individual permissions of each extension. Even though in isolation they cannot do any harm, they can exercise additional power by collaborating and combining their privileges. With few exceptions [291], this is an unexplored area.
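The Python sketch below shows the permission-combination reasoning behind collusion. The permission names and extensions are hypothetical (they are not real browser permission strings), and the check only flags pairs that could leak by pooling their privileges; whether they actually collude depends on their runtime behavior and communication, which a permission-level check cannot see.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical permission names, grouped by what they enable.
SENSITIVE_READ = {"read_history", "read_page_content"}
EXFILTRATION = {"network_access"}

def can_leak(perms: FrozenSet[str]) -> bool:
    """A permission set can leak if it can both read sensitive data and send it out."""
    return bool(perms & SENSITIVE_READ) and bool(perms & EXFILTRATION)

def colluding_pairs(exts: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Pairs that cannot leak alone but could leak by combining privileges."""
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(exts.items()), 2):
        if not can_leak(pa) and not can_leak(pb) and can_leak(pa | pb):
            pairs.append((a, b))
    return pairs

installed = {
    "history_viewer": frozenset({"read_history"}),
    "sync_helper": frozenset({"network_access"}),
    "theme": frozenset(),
}
print(colluding_pairs(installed))  # [('history_viewer', 'sync_helper')]
```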

Given that extensions may subscribe to events after they have been installed (i.e., at runtime), there is no way to statically detect potential attacks. One of the few works providing a runtime solution to information flow in browsers (Chromium in particular) is [68].

Overall, there are still concerns regarding the effect of browser extensions on security and privacy. Given the limitations of what can be obtained by static analysis, solutions to mitigate these issues must rely on runtime monitoring techniques.

5.1.5 Privacy and security policies

One way to mitigate security and privacy threats is to have suitable and powerful policies that are enforced statically or at runtime. This, however, is not easy for several reasons. First, precisely defining a policy language requires introducing its syntax (what the policies can talk about), characterizing its scope (its limitations, i.e., what cannot be expressed or captured by the language), and defining an enforcement mechanism (how to implement a mechanism ensuring that the policies are respected). Requiring a result that is both sound and complete is, in general, too restrictive. Second, static policies can be enforced only in very specific cases and must be defined by designers and programmers at a very early stage of the software development process. In some cases, static enforcement may be done when the code is downloaded, but this requires isolating the code to perform the analysis, which is not always possible. Last, security and privacy policies could be enforced at runtime, by mitigating an attack right after it is detected. This is not possible in general, as we cannot foresee all possible future threats, and by the time an attack is detected it is often too late.
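Runtime enforcement is often formulated with security automata, in the spirit of Schneider's work: a monitor tracks a finite state and suppresses the first action that would violate the policy. The Python sketch below, with hypothetical action names, enforces the policy “no network send after a file read” by truncating the execution at the first violation. It is a minimal illustration of the mechanism, not a policy language.

```python
from typing import Iterable, List

class NoSendAfterReadAutomaton:
    """Security automaton for the policy 'no network send after a file read'.
    States: 'clean' (no read yet) and 'tainted' (a read has occurred)."""

    def __init__(self) -> None:
        self.state = "clean"

    def step(self, action: str) -> bool:
        """Return True if the action is allowed, updating the state."""
        if self.state == "tainted" and action == "send":
            return False  # this action would violate the policy
        if action == "read":
            self.state = "tainted"
        return True

def enforce(trace: Iterable[str]) -> List[str]:
    """Truncation-style enforcement: stop the execution at the first violation."""
    monitor = NoSendAfterReadAutomaton()
    allowed: List[str] = []
    for action in trace:
        if not monitor.step(action):
            break
        allowed.append(action)
    return allowed

print(enforce(["send", "read", "compute", "send", "read"]))
# ['send', 'read', 'compute']
```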

5.2 Challenges

C 5.1

Monitoring GDPR. One of the main challenges is to identify which privacy principles might be verified or enforced by using monitors. As the regulation is quite extensive, we advocate starting with the principle of data minimization as an example of the kind of challenges the community might face.

C 5.2

Monitoring Data Minimization. When considering how the data is used, one challenge is that runtime verification cannot be done in a black-box manner, and getting access to proprietary code can be an issue. Concerning when and how the data is collected, runtime verification could be done in a black-box manner, but data minimization is not monitorable in general [269]. The more general notion of distributed data minimization is not monitorable either, so new techniques using grey-box runtime verification might be needed [80].

C 5.3

Hybrid Information Flow. As mentioned earlier, it is not possible to have a sound yet permissive dynamic information flow analysis [289]. Therefore, an important challenge for information flow monitoring is the design of a hybrid (static/dynamic) mechanism that is efficient yet permissive, and that can deal with real programs and security policies.

C 5.4

Monitoring Declassification and Quantitative Information Flow. Non-interference is often too strong a property. For instance, a password checker usually leaks one bit of information: whether the password is correct. Declassification and quantitative information flow aim to solve this issue, but verifying these properties is very hard. In spite of some initial work on hybrid approaches [74], monitoring these properties remains an unresolved challenge.
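To make the password example quantitative, the Python sketch below measures leakage as the Shannon entropy of the observable output, which equals the information leaked by a deterministic program. This is only one of several leakage measures used in quantitative information flow (min-entropy is another), and the uniformly distributed 4-digit PIN domain is a hypothetical example. A yes/no answer can leak at most one bit; a single guess against a uniformly chosen PIN leaks far less, whereas releasing the PIN itself leaks log2(10000) bits.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence

def shannon_leakage(secrets: Sequence[str], observe: Callable[[str], Hashable]) -> float:
    """Bits leaked by a deterministic program whose secret is uniformly drawn
    from `secrets`: the Shannon entropy of its observable output."""
    n = len(secrets)
    counts = Counter(observe(s) for s in secrets)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def password_checker(guess: str) -> Callable[[str], bool]:
    """The attacker's guess is public; the stored password is the secret."""
    return lambda secret: secret == guess

pins = [f"{i:04d}" for i in range(10_000)]
print(shannon_leakage(pins, password_checker("1234")))    # a small fraction of a bit
print(shannon_leakage(pins, lambda secret: secret))       # about 13.29 bits = log2(10000)
print(shannon_leakage(["a", "b"], password_checker("a")))  # 1.0
```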

C 5.5

Generic Language for Information Flow. There are many variants and flavors of important properties like non-interference, but there is currently no widely accepted language that encompasses all these security policies, which are now recognized to be hyperproperties [103]. The challenge is the design and adoption of a formalism for the hyperproperties of interest in information flow security, and the thorough study of its monitoring algorithms and limitations.

C 5.6

Browser extensions. One challenge on the enforcement side is how to ensure that malicious extensions do not expose private information from a user’s homepage. Such leakage might be carried out by an external entity, or by another extension that aggregates this information with the information it has already collected, possibly as part of a collusion attack. A related issue has to do with implementation: a robust runtime enforcement mechanism might need to modify the core of the browser (e.g., Chromium), which is quite invasive and requires a high level of expertise.

C 5.7

Privacy and security policies. One challenge is how to define security and privacy policy languages for writing policies about concrete known threats. This challenge also involves using runtime monitoring techniques to detect potential and actual threats, log that information and pass it to an offline analyzer that identifies patterns in order to generalize existing policies or create new ones. A related challenge is how to learn policies at runtime. This could be done by learning them from attacker models (e.g., as in [1]), and improving their precision using feedback from the runtime monitors.

6 Reliable transactional systems

Human society is increasingly dependent on computing systems, even in areas like entertainment (e.g., Netflix), social (e.g., Facebook) and economic interactions (e.g., Amazon). The ubiquity of computer systems, and the large scale at which they operate, make hardware and software failures both common and inevitable. At first glance it might seem that most systems should not experience failures as frequently, because they do not serve a world-scale user base. But with the advent of Infrastructure as a Service (IaaS) products (e.g., Amazon EC2), small and medium-sized companies are deploying their systems over IaaS offerings [23], which are supported by fault-prone large-scale clusters [175]. This setting calls for exploiting the features of modern hardware to provide fault tolerance while keeping software systems efficient, correct, and easy to develop and use, hence building computer systems with improved reliability and resilience and lower energy consumption.

Database systems have successfully exploited parallelism for decades, both at software and hardware levels. Databases can improve their performance by issuing many queries simultaneously and by running those queries on multiple computing systems in parallel, while preserving the same programming model as if the queries were executed one at a time in a single computing system. Transactions are at the core of most database systems. A transaction is an abstraction that specifies a program semantics where computations behave as if they are executing one at a time with exclusive access to the database. Transactional systems implement a serializable model. This means that even if the system allows multiple transactions to execute concurrently, the final result of their execution must be indistinguishable from executing one after the other (in some total order). Consequently, a transaction is a sequence of actions that appear to execute instantaneously as a single, indivisible operation. The transactional system manages concurrency between transactions automatically, and is free to execute transactions concurrently as long as the result is equivalent to some serial execution of the transactions.
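Serializability of an observed execution can be checked after the fact, which is one way a monitor could validate a transactional system. The Python sketch below uses conflict-serializability, a standard sufficient (but stricter) condition: it builds a precedence graph between transactions that access the same item with at least one write, and the history is conflict-serializable exactly when this graph is acyclic, in which case a topological sort gives an equivalent serial order. The history format is a hypothetical simplification, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple

Op = Tuple[str, str, str]  # (transaction id, "r" or "w", data item)

def conflict_serial_order(history: List[Op]) -> Optional[List[str]]:
    """Return a serial order of transactions that is conflict-equivalent to the
    history, or None if the history is not conflict-serializable."""
    # Precedence graph: transaction -> set of transactions that must come before it.
    preds: Dict[str, Set[str]] = {t: set() for t, _, _ in history}
    for i, (ti, ai, xi) in enumerate(history):
        for tj, aj, xj in history[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and "w" in (ai, aj):
                preds[tj].add(ti)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None  # a cycle means no equivalent serial order exists

serializable = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "y"), ("T2", "r", "y")]
cyclic = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "r", "y")]
print(conflict_serial_order(serializable))  # ['T1', 'T2']
print(conflict_serial_order(cyclic))        # None
```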

State machine replication (SMR) [218, 295] is the standard way to build such fault-tolerant systems. An SMR system maintains multiple replicas that each keep a copy of the system’s data, and coordinates the execution of operations on each of those replicas. Since every replica executes every operation submitted to the system, the system can continue operating as long as a majority of the replicas are correct. When requests to execute operations arrive, an “agree-execute” protocol keeps replicas synchronized: they first agree on an order in which to execute the incoming operations, and then execute the operations sequentially in the agreed order, driving all replicas to the same final state. However, to take advantage of contemporary hardware, one should use all the available processor cores to execute multiple operations at the same time. This concurrent execution of operations is at odds with the “agree-execute” protocol, because concurrent execution is inherently non-deterministic: replicas may arrive at different final states and the system could become inconsistent.

The efficiency and performance of SMR can be improved by exploiting multi-core processors while still preserving determinism and correctness. This, however, requires that operations be expressible as serializable transactions, and that the concurrency control protocol ensure that the concurrent execution of transactions respects the order the replicas have agreed upon.

In a typical SMR setting, a set of clients concurrently submit requests to the system. The system, made of replicas, runs an agreement protocol, e.g., Paxos [219], that totally orders the incoming requests. Each replica executes the requests sequentially in the agreed order, driving all the (correct) replicas to the same final state. Essentially, state machine replication can be divided into two phases. First, the agreement phase, where replicas agree on an order for all requests. This is followed by the execution phase, where replicas execute the requested operations in the agreed order. When using SMR there is a clear tension between the fact that replicas have multi-core processors and the requirement that replicas execute the operations in a specific order.
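The two phases can be sketched in Python as follows, abstracting the agreement phase away: we simply assume that a protocol such as Paxos has produced `agreed_log`. The key-value operations are hypothetical. Replicas that apply the same log sequentially reach the same state, while a replica that executes two non-commuting operations in a different order diverges, which is exactly the risk posed by unconstrained concurrent execution.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, key, argument)

class Replica:
    """A deterministic state machine over a key-value store."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, cmd: Command) -> None:
        op, key, arg = cmd
        if op == "set":
            self.state[key] = arg
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + arg
        elif op == "mul":
            self.state[key] = self.state.get(key, 0) * arg
        else:
            raise ValueError(f"unknown operation {op!r}")

def execute(log: List[Command]) -> Dict[str, int]:
    """Execution phase: apply the agreed log sequentially on a fresh replica."""
    replica = Replica()
    for cmd in log:
        replica.apply(cmd)
    return replica.state

# Agreement phase abstracted away: assume this total order was agreed upon.
agreed_log = [("set", "x", 1), ("add", "x", 2), ("mul", "x", 10)]
print([execute(agreed_log) for _ in range(3)])  # [{'x': 30}, {'x': 30}, {'x': 30}]

# A replica that runs 'add' and 'mul' in the opposite order diverges.
print(execute([("set", "x", 1), ("mul", "x", 10), ("add", "x", 2)]))  # {'x': 12}
```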

Recovery and reparations in transactional systems [108] are multi-layered: when recovering within a transaction that may still succeed, reparations may be expressed in a try-catch fashion. However, if the action is considered to have failed, then any previously completed parts of the transaction need to be rolled back. This is done to preserve the atomicity of the transaction, i.e., either the transaction entirely succeeds or it entirely fails. A problem arises when a transaction cannot be isolated, so that its actions affect other parts of the system before the transaction is committed. This usually happens due to the long-lived nature of the transaction, which makes it infeasible to lock the relevant resources for a long duration.
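A common way to structure such reparations for long-lived transactions is the saga pattern, where every step is paired with a compensating action that semantically undoes it. The Python sketch below, with hypothetical booking steps, runs the steps in order and, if one fails, runs the compensations of the already completed steps in reverse order. It illustrates the general idea only and does not reproduce any particular approach surveyed in [108].

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)

def run_with_compensations(steps: List[Step]) -> bool:
    """Run the steps in order; if one fails, compensate the completed steps
    in reverse order. Returns True if every step succeeded."""
    completed: List[Step] = []
    for step in steps:
        action = step[1]
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True

log: List[str] = []

def charge_card() -> None:
    raise RuntimeError("payment declined")

trip: List[Step] = [
    ("flight", lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    ("hotel", lambda: log.append("book hotel"), lambda: log.append("cancel hotel")),
    ("payment", charge_card, lambda: log.append("refund")),
]
print(run_with_compensations(trip))  # False
print(log)  # ['book flight', 'book hotel', 'cancel hotel', 'cancel flight']
```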

6.1 Context and areas of interest

Transactional systems cover a broad area. Before presenting the challenges for RV, we describe here some of their important aspects, in particular dependable storage, coordination services, network services and main memory contention management.

6.1.1 Dependable storage systems

Major database vendors, such as IBM and Oracle, offer business solutions for high-performance dependable storage systems. Innovative approaches to such systems are based on state machine replication, in KV-stores [73, 81, 300], file systems [96, 226], or transactional storage [140, 170]. These systems are frequently used to build business-critical (and sometimes even life-critical) systems and must be constantly monitored to assess the correct behavior of the storage system. Monitoring these systems, especially those involving SMR, is challenging, as it combines the challenges of monitoring distributed systems with those of monitoring transactional systems, both in terms of the architecture of the monitoring system itself and of the information to be collected and reasoned upon [75, 202].

6.1.2 Coordination services

Concurrent operations on distributed applications frequently need to be coordinated to ensure system correctness; otherwise the operations may be executed out of order or, in the case of SMR, the nodes may diverge and render the system inconsistent. These services are often provided by a small database, which stores configuration data to implement resource locking, leader election, message ordering, etc. Such coordination systems have recently been used in more complex solutions, for example in: i) Google’s Chubby distributed lock service [87], which is used by Bigtable (now in production in Google Analytics and other products); ii) the Ceph storage system [318], where the coordination system is part of the monitor processes that agree on which OSDs are up and in the cluster; iii) the Clustrix distributed SQL database, which leverages a coordination system for distributed transaction resolution. A monitor for such systems must incorporate the complexities both of the coordination/decision rules and of the control system itself.

6.1.3 Network services

Software-Defined Networks (SDNs) are a step towards the separation of the network control and data planes, aiming at improving the manageability, programmability and extensibility of computer networks. In SDNs, the controller should be neither a bottleneck nor a single point of failure. State machine replication is a natural answer to such fault-tolerance requirements. For example, the Ananta distributed load balancer [264] uses Paxos to maintain high availability in its manager component and serves thousands of data flows per day in the Windows Azure cloud. Such network services are transparently used by applications running in the cloud, and are yet another example of an SMR system, with the same monitoring requirements.

6.1.4 Main memory contention management

The transactional model used by database systems can also be used to manage contention on shared data residing in main memory. This was first observed by Lomet in 1977 [227], proposed as a hardware solution by Herlihy and Moss in 1993 [191], and by Herlihy et al. in 2003 [190] as the first practical software-only solution. Some programming languages include memory transactions in their core, such as Clojure, or provide them as a library, such as Java, Haskell, OCaml and Python. In the case of C and C++, there is ongoing work to include them in the language standards.

6.2 Challenges

C 6.1

Low-overhead monitoring. A step towards reconciling SMR with current processor architectures, i.e., multicore processors, is to devise new concurrency control protocols that exploit pre-ordered transactions to ensure the correctness of an SMR system in which individual replicas execute local operations concurrently [311]. The correctness of such new concurrency protocols must be assessed by intensive testing and monitoring of the system behavior. Any deviations from the specification must be fully diagnosed and corrected. Understanding what is happening at the level of the concurrency protocol itself (including the algorithm’s internal state and the ordering of concurrent events) plays an important role in this process and must be supported by lightweight (non-intrusive) monitoring techniques, so that errors are not masked when monitoring is active.

C 6.2

Reduction of the conflicting window. When using the typical API to declare transactions (e.g., begin, read, write, and commit), the system is blind to the application’s semantics, i.e., to how the values read are used by the application. Since transactional speculation is only effective when it succeeds, there is also a need to reduce the number of conflicting transactions by introducing variations in the typical API to declare transactions. This allows clients to express the intended semantics of the program more clearly while executing over an abstract replica state, resulting in fewer conflicts and thus more successful speculative executions. How can one reduce both the interactions with the remote database nodes (replicas) and the “conflicting window” of transactions? Some work has been done on delaying read accesses to the database using futures [40], as well as double barriers and epochs [278]. Such concepts are still not mainstream in the monitoring and logging of transactional systems. Another alternative would be to increase the expressiveness of the transactional API to better express the application semantics and hence improve transactional performance in SMR.

C 6.3

Expressiveness of logs. The performance of concurrency control protocols depends on whether concurrent transactions conflict with each other. The decision of whether two transactions conflict depends on how aware the concurrency control protocol is of the transactions’ semantics. How can existing applications be automatically translated to the new transactional SMR infrastructure, and how can one ensure that the new application (using the new transactional API) is functionally equivalent to the original? Any change to the protocol creates a new transactional infrastructure, and any change to the API creates a new application. In both cases, the new system must be backwards compatible with the original system. Such backward compatibility must be assessed by observing the dynamic behavior of both systems and reasoning over the collected information to detect any deviations of the new system from the expected behavior. In addition to the huge logs, this challenge raises another question about the expressiveness of the logs: what information is recorded, and how does it express the semantics of the intended transactional operations?

C 6.4

Unification of multiple huge system logs. Observing long-lived distributed computations, such as transactional systems replicated using SMR, may be a key requirement to automatically decompose transactions [330] and/or to ensure that the workload is safe [329]. In these cases, if the workload changes or new operations are created, the whole system must be monitored, re-analyzed and re-deployed. In such a distributed setting, many huge logs may be collected (one per processor or one per replica) that must be dealt with (see Sect. 8) and possibly unified into a single log, raising issues regarding resource usage and the consistency of the multiple observations.
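A minimal way to unify per-replica logs is a streaming k-way merge, which never needs to hold all logs in memory. The Python sketch below assumes (hypothetically) that each replica's log is already sorted and stamped with Lamport timestamps, so that sorting by timestamp yields a total order consistent with the happened-before relation; ties are broken by replica id, an arbitrary convention. A per-log check fails fast on out-of-order entries, a simple instance of detecting inconsistent observations.

```python
import heapq
from typing import Iterable, Iterator, Optional, Tuple

Entry = Tuple[int, str, str]  # (Lamport timestamp, replica id, event)

def checked(log: Iterable[Entry]) -> Iterator[Entry]:
    """Yield the entries of one per-replica log, failing fast if its
    timestamps go backwards (an inconsistent observation)."""
    last: Optional[int] = None
    for entry in log:
        if last is not None and entry[0] < last:
            raise ValueError(f"log not sorted by timestamp at {entry}")
        last = entry[0]
        yield entry

def unify(logs: Iterable[Iterable[Entry]]) -> Iterator[Entry]:
    """Lazily merge sorted per-replica logs into a single log ordered by
    (timestamp, replica id)."""
    return heapq.merge(*(checked(log) for log in logs))

replica1 = [(1, "r1", "begin T1"), (4, "r1", "commit T1")]
replica2 = [(2, "r2", "begin T2"), (3, "r2", "commit T2")]
for entry in unify([replica1, replica2]):
    print(entry)
```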

C 6.5

Expressing reparations in transactional systems. In non-transactional applications, monitors typically need their own reparation code that executes in case the monitor flags a problem. In the case of transactional application monitoring, reparations are readily available and the monitor simply needs to trigger them. While this is more of an opportunity, the challenge lies in how to improve upon current practices and express the behavior of reparations formally and succinctly in a specification language, similarly to the way monitors are defined. There have been several works in this regard [107, 109], for example through the use of compensating automata. However, future work could focus on further simplifying the specification language and perhaps on providing a library of ready-made constructs which developers can use directly.

C 6.6

Management of historic data to be used in the reparations. From a more pragmatic point of view, compensations and rollbacks present the challenge of managing the historic data values used in the reparation code. In this respect, runtime monitors can be useful because, like software monitors in general, they are typically stateful: reparations can be parametrized through the monitor’s state, avoiding complex wiring to pass the data around. To the best of our knowledge, this approach has not been implemented.
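The idea of parametrizing reparations by the monitor’s state might look as follows. This is a minimal, hypothetical sketch and not an existing implementation: the booking scenario, the budget property and the name `CompensatingMonitor` are invented for illustration. The monitor records the data each observed action will need if it has to be compensated, so no extra wiring is required when the property is violated.

```python
class CompensatingMonitor:
    """Hypothetical monitor for 'total bookings must not exceed the budget'.

    Its state doubles as the historic data needed by the reparation code.
    """

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0
        self.history = []  # (item, amount) pairs observed so far

    def on_book(self, item, amount):
        """Observe a booking event; return compensating actions on violation."""
        self.spent += amount
        self.history.append((item, amount))
        if self.spent > self.budget:
            return self.compensate()
        return []

    def compensate(self):
        # Undo bookings in reverse order, using values kept in the monitor state.
        actions = [("cancel", item, amount) for item, amount in reversed(self.history)]
        self.history.clear()
        self.spent = 0
        return actions
```

For instance, with a budget of 100, booking a flight for 60 returns no actions, while a subsequent hotel booking for 50 returns the cancellations of the hotel and then of the flight, each carrying the amount recorded in the monitor state.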

C 6.7

Monitoring transactional memory. The time scale of transactional memory is orders of magnitude smaller than that of transactional databases. In transactional memory, each access to a shared memory location must be handled by the transactional monitor and considered for the success or failure of the memory transaction. Any additional probing or logging introduced by a monitoring system may influence scheduling and have a strong impact on a malfunctioning transactional memory application by changing the serialization order of the transactions, possibly masking previously observed errors. Researchers have partially addressed this challenge in the past [128, 129, 230, 256], aiming at both correctness and performance.

7 Contracts and policies

The term contract is overloaded in computer science, so it may be understood in different ways depending on the community:

  (i) Conventional contracts are legally binding documents establishing the rights and obligations of different signatories, as in traditional judicial and commercial activities.

  (ii) Normative documents are a generalization of the notion of legal contracts. Their main feature is the inclusion of normative notions such as obligations, permissions, and prohibitions, either directly or by representing them indirectly. These include legal documents, regulations, terms of service, contractual agreements and workflow descriptions.

  (iii) Electronic contracts are machine-oriented, and may be written directly in a formal specification language or translated from a conventional contract. In this context, the signatories of a contract may be objects, agents, web services, etc.

  (iv) Behavioral interfaces are considered to be contracts between different components, specifying the history of interactions between different agents (participants, objects, principals, entities, etc.). Rights and obligations are thus determined by “legal” (sets of) traces, i.e., those which are permissible.

  (v) The term “contract” is sometimes used for specifying the interaction between communicating entities (agents, objects, etc.). It is then common to talk about a contractual protocol.

  (vi) Programming by contract, or design by contract, is an influential methodology first popularized in the context of the programming language Eiffel [238]. “Contract” here means a relation between pre- and post-conditions of routines, method calls, etc. This notion of contract is also used in approaches such as the KeY program verification tool [211].

  (vii) In the context of web services, “contracts” may be understood as service-level agreements, usually written in an XML-based language such as IBM’s Web Service Level Agreement (WSLA [325]).

  (viii) More recently, the term “contract” has been used in the context of blockchain and other distributed ledger technologies for programs that ensure certain properties concerning transactions. These programs are called smart contracts [305], as popularized by the Ethereum platform [88].

In this section we focus on the use of the term in the computational domain but with a richer interpretation than just a specification or property. In particular, we consider two types of contracts: (ii) normative documents (including conventional contracts and their electronic versions as described above), and (viii) smart contracts. In both cases, we refer to “full contracts” [257], that is agreements between different entities regulating not only the normal interactive behaviors, but also exceptional ones. A common aspect of such contracts is that they should express not only the sequence and causality of events, but also what obligations, permissions and prohibitions the participating entities have (basic modalities studied in deontic logic [324]), as well as the associated penalties in case of violations.

An example of a full contract in the form of a normative document is a stringent rental agreement containing, among others, the following clauses: “1. The tenant must pay 200 EUR, in advance, on the 5th of each calendar month. 2. In case of not complying with clause 1, the tenant will have till the 15th of the month to pay the above mentioned sum plus an additional fee of 5% of the amount. 3. In case of not complying with clause 2, the tenant will have to leave the premises before the end of the month and the deposit will be retained by the landlord.” Note that the contract includes clauses which may be violated, but also reparatory clauses to cover such cases. Although violating clause 1 and paying late is a behavior covered by the contract, it is clearly less desirable (in terms of compliance) than satisfying clause 1. In the case of a smart contract, the corresponding program should implement all of the above, including the exceptional behavior (i.e., not only the primary obligations, but also the enforcement of the penalties associated with their non-compliance). A contract lacking clauses that stipulate the penalties and deadlines associated with non-compliance with its written obligations would not be considered a “full contract.”
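A minimal sketch of how this contract, with its contrary-to-duty clauses, could be checked over a trace of payments for one month. It assumes that “on the 5th” means “no later than day 5”, that partial payments add up, and that the amount required by clause 2 is compared against everything paid by day 15; the function name and the verdict strings are hypothetical.

```python
def rent_verdict(payments, rent=200, late_fee_percent=5):
    """Evaluate one month of the rental contract on a trace of payments.

    payments: iterable of (day_of_month, amount_in_eur) pairs.
    Returns a verdict describing which clause governs the month.
    """
    payments = list(payments)

    def paid_by(day):
        return sum(amount for d, amount in payments if d <= day)

    # Clause 1: primary obligation, pay the rent by the 5th.
    if paid_by(5) >= rent:
        return "clause 1 satisfied"
    # Clause 2: contrary-to-duty obligation triggered by violating clause 1.
    late_amount = rent + rent * late_fee_percent // 100  # 200 + 10 = 210 EUR
    if paid_by(15) >= late_amount:
        return "clause 1 violated, repaired by clause 2"
    # Clause 3: sanction triggered by violating clause 2.
    return "clause 2 violated, clause 3 applies"
```

The three verdicts mirror the graded notion of compliance discussed above: `[(3, 200)]` satisfies clause 1, `[(10, 210)]` is a repaired violation, and `[(10, 200)]` triggers clause 3.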

The specification of such contracts requires a formal language rich enough to capture these deontic notions, temporal and dynamic aspects, real-time issues such as deadlines, the handling of actions (events) and exception mechanisms. The main aim is not only to specify such contracts, but to analyze them using techniques like model checking and runtime verification. Clearly, the use of contracts is only meaningful if there is a mechanism to validate their fulfillment.

A related concept is that of policies. At a certain level of abstraction, policies can be seen as contracts in the sense that they prescribe behavior. Since the term policy is also very generic with a broad scope, we concentrate on privacy policies (or privacy settings) and more specifically in the context of Online Social Networks (OSN) like Facebook and Twitter.

As mentioned before, deontic logic is a natural formalism to represent normative documents, as they mostly talk about obligations, permissions and prohibitions, and to capture what happens in case of violations. In the case of privacy policies, one may be interested in prescribing who should know what about whom and under which circumstances. It therefore makes sense to use epistemic logic [141] to reason about privacy policies. That said, note that when describing such policies we informally use deontic modalities: who should (not) access certain information, and who is allowed to perform certain actions (e.g., to make a friend request). Those (deontic) normative concepts are, however, not needed as primitives in this context. A detailed explanation of why this is the case is beyond the scope of the paper (see, for instance, the formalization of privacy policies for OSNs presented in [260, 261, 262]).

What is important here is that, from a runtime verification perspective, monitoring privacy policies for OSNs and monitoring normative documents are similar, mostly with respect to their challenges, as explained at the end of this section.

7.1 Context and areas of interest

We now provide more detailed context on the following aspects of contracts: contracts as normative documents, so-called smart contracts, and policies for online social networks.

7.1.1 Contracts: normative documents

The complete specification of full contracts (normative texts which include tolerated exceptions, and which enable reasoning about the contracts themselves) can be achieved using a combination of temporal and deontic concepts [257]. Formalizing such contracts requires operators and combinators for choice, obligations over sequences, contrary-to-duty obligations, and a representation of how internal and external decisions may be incorporated in an action- or state-based language for specifying contracts. There have been several interpretations and approaches for the development of such a logic [257], including modal extensions of logics and automata, addressing the issue of how contracts can be formalized and reasoned about. See, for example, [37, 89, 169, 228, 242, 272, 273, 326], to mention just a few.

Why is there a need for a logic or some other formal language? One of the aims of formalizing contracts is not simply to use them as specifications, but also to be able to prove properties about the contracts themselves, to perform queries on them (like what each party is agreeing to), and ultimately to ensure at runtime that the contract is satisfied (or, alternatively, to detect violations). An alternative approach is to use machine learning (or other artificial intelligence techniques). For instance, one may avoid the use of formal methods by using natural language processing (NLP) combined with machine learning to directly perform queries on the textual representation. While this is feasible in certain cases, it is well known that the state of the art in NLP is still far from delivering fully automatic and sufficiently reliable techniques. Moreover, performing semantic queries or running simulations still requires a formal representation. This is an important and interesting research area in itself, but here we are concerned not with the problem of obtaining such normative documents but with the specific issue of monitoring their satisfaction or violation.

In terms of monitoring of contracts, most current work starts from some form of formal semantics. There are various outstanding questions concerning which subsets of deontic logics are tractably and practically monitorable. For example, are more standard logics, like classical or temporal logics, enough? How important is it to have a full, complex semantics (e.g., based on Kripke semantics) for the logic? For a full representation and analysis of contracts, Kripke semantics might be necessary, but for monitoring purposes a much simpler approach based on trace semantics seems to be sufficient.

Concerning monitoring, an ideal goal is to automatically extract a monitor from the document’s formal representation, but this is, in general, not feasible. We therefore assume that monitors are obtained from a given contract manually or semi-automatically. This is still not an easy task, as there is no standard, easy and direct way to extract a model from a document in natural language.

The use of controlled natural languages (CNL) [215] has been proposed in different works to help bridge the gap between the natural language description of the original document and its representation in a formal language [89, 91, 327]. In a legal specification setting, there is initial work in this direction, but we are still far from reaching this goal [86, 90, 91].

7.1.2 Smart contracts

If the computer science community borrowed the notion of contracts by remarking on the similarity between specifications and legal agreements, the legal community saw an opportunity in viewing computer code as a form of executable enforcement or enactment of agreements or legislation. The notion that executable code regulates the behavior of different parties very much in the same manner that legal code does was proposed by Lessig [223]. The dual view, that the use of executable smart contracts can enforce compliance as an integral part of the behavior, was argued earlier by Szabo [305].

The introduction of blockchain [247] and other distributed ledger technologies, which enable the automated management of digital assets, has changed the way in which computer systems can regulate the interaction between real-world parties. In particular, these technologies have enabled the deployment of Szabo’s notion of smart contracts in a distributed setting, without the participation of trusted central authorities or resource managers. For instance, the Ethereum [322] blockchain supports smart contracts which can be expressed using a Turing-complete programming model, to be executed on the Ethereum Virtual Machine (EVM) and typically programmed using one of a number of languages supporting a higher level of abstraction.

Smart contracts are executable specifications of the way the contract will update the state of the underlying system. Although specifications can be executable or not (see [167] and [186]), it is generally accepted that executable specifications must elucidate how to achieve the desired state of affairs, while non-executable specifications simply characterize properties that the desired state should satisfy. The former is substantially more complex, which is why the fields of validation and verification arose to explore ways in which executable specifications (code) can be verified against non-executable ones (properties).

This gives rise to a challenge: verifying that smart contracts indeed perform as they should. Although one can argue that verifying such executable code is no different from verifying standard programs, there are a number of issues which are particular to smart contracts, and little work has so far addressed their special idiosyncrasies. Static analysis techniques for the verification of smart contracts have been proposed in [76], via a translation from smart contracts into another language (F* in this case) for verification. See [10] for a discussion of some challenges concerning the verification of smart contracts using deductive verification techniques. From a runtime perspective, there has been some work on using blockchain technology to regulate distributed systems (see [171, 179, 274, 317]), but the focus of this work is not on the verification of the smart contracts themselves. Initial attempts to address runtime verification of smart contracts and to build tools automating it have started to appear [104, 139], but many challenges remain to be addressed [36].

One particular aspect that presents specific challenges is that smart contracts are typically concerned mainly with the movement of digital assets, with built-in notions of failing transactions and computation roll-back to handle failure. Although this has been investigated in the domain of financial system verification [109, 263], there is a major difference. Before the rise of cryptocurrencies, all such systems were deployed on a central trusted system, typically residing within the infrastructure of the payment institution. In contrast, in the context of distributed ledgers, storage and computation are, by their very nature, distributed, and runtime verification in particular requires instrumentation and deployment to take this into account.

There is another major difference with regular financial transaction software deployed on, or interacting with, payment institutions. Given the critical nature of such systems, payment applications have been built using a strict validation process ensuring compliance with legislation and adherence to specifications. However, what has been hailed as the democratization of currency systems brought with it the popularization of payment application development, with many smart contracts being developed without the necessary care and responsibility. This has resulted in a number of huge financial losses due to bugs [33]. Lightweight runtime validation of such systems, whether built into the execution of the smart contracts or inherent in the blockchain or alternative distributed ledger technology, is essential to ensure user safety.

Turing-complete environments for smart contracts suffer from the possibility of non-termination or excessively long computations. Rather than limiting the power of the programming language, the solution adopted in systems such as Ethereum was to introduce the notion of gas: a resource required to enable computation, which has to be paid for using other digital assets, typically the underlying cryptocurrency. Although efficiency of computation has always been an important issue in computing, it has typically been detached from the functional correctness issues addressed by formal methods. With the notion of gas, the direct correlation between execution steps and financial cost is a new challenge for runtime verification. As a direct corollary, additional computation to check for correctness directly induces additional cost. Gas also affects computation itself: once gas runs out, the computation is reverted, a behavior which has been exploited in a number of smart contract attacks. In addition, the use of gas throughout the computation may justify quantitative dynamic analysis that measures the extent of satisfaction or violation using a distance metric, in order to detect failures due to lack of gas.
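The interplay between gas and monitoring can be illustrated with a toy gas-metered interpreter. This is a hypothetical sketch, not the EVM and not its actual gas schedule: each step costs one unit of gas, an inlined runtime check (here, “the balance never becomes negative”) costs `check_cost` extra units, and running out of gas or violating the check reverts all state changes.

```python
def execute(deltas, gas_limit, check_cost=0):
    """Toy gas-metered execution of a sequence of balance updates.

    Returns (final_state, gas_used, status). On out-of-gas or a monitor
    violation, the state is reverted to the initial snapshot.
    """
    state = {"balance": 0}
    snapshot = dict(state)  # copy used for rollback
    used = 0
    for delta in deltas:
        used += 1 + check_cost  # one unit per step plus the cost of the check
        if used > gas_limit:
            # All gas is consumed and every effect is reverted.
            return snapshot, gas_limit, "reverted: out of gas"
        state["balance"] += delta
        if check_cost and state["balance"] < 0:
            return snapshot, used, "reverted: monitor violation"
    return state, used, "ok"
```

With `gas_limit=3`, `execute([5, 3], 3)` succeeds using 2 units, whereas the same program instrumented with `check_cost=1` runs out of gas: the monitor alone turns a correct execution into a reverted one.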

Finally, the multitude of contracts and interaction platforms provided by the underlying distributed technology is likely to increase the importance of contract comparison and negotiation. We envision a scenario in which one may negotiate for increased dependability (e.g., by monitoring additional logic) against a stake paid by the developer or provider of the contract. At a more complex level, one can have a system where different or additional functionalities are negotiated when setting up a smart contract. In both cases, the process is a form of meta-contract which regulates how the parties may interact to negotiate and agree upon the contract to be set up.

See [10] and references therein for a discussion on the verification of smart contracts, as well as papers in [283] for recent advances and a discussion on open issues in the area.

7.1.3 Privacy policies for OSNs

Policies may be understood, at a certain level of abstraction, as contracts: they prescribe what actions are allowed or not. The term policy is generic and may be applied to many different cases or applications. We focus here on privacy policies, and in particular on privacy policies for Online Social Networks (OSNs). OSNs provide an opportunity for interaction between people in different ways depending on the kind of relationship that links them. One of the aims of OSNs is to be flexible in the way one shares information, being as permissive as possible in how people communicate and disseminate information. While preserving the spirit of OSNs, users would like to be sure that their privacy is not compromised. One way to do so is by providing users with means to define privacy policies and provide them with guarantees that their requested policy will be respected.

For defining policies, one might use simple checkbox privacy settings (as is the case in most OSNs today), or allow users to define richer policies using expressive formal languages or logics. Giving users the means to specify privacy policies is not enough, as these policies must be enforced at runtime. Enforcement of checkbox privacy settings is rather well understood, at least for most of the kinds of policies currently implemented in existing OSNs. However, if one wants to allow richer policy languages, the challenge goes beyond identifying an appropriately expressive language to the problem of automatically extracting a runtime monitor to act as an enforcement mechanism. This is currently beyond the state of the art and no concrete solutions exist.

Furthermore, the state of the art today is focused on static policies. For instance, in Facebook users can state policies like “Only my friends can see a post on my timeline” or “Whenever I am tagged, the picture should not be shown on my timeline unless I approve it”. However, no current OSN provides the possibility of defining and enforcing evolving (dynamic) privacy policies. Policies may evolve due to explicit changes made by the users (e.g., a user may change the audience of an intended post to make it more restrictive), or because the privacy policy is dynamic per se. Consider, for instance: “Co-workers cannot see my posts while I am not at work, and only family can see my location while I am at home”, “Only up to 3 posts disclosing my location are allowed per day on my timeline”, “My boss cannot know my location between 20:00-23:59 every day”, and “Only my friends can know my location from Friday at 20:00 till Monday at 08:00”. No current OSN addresses the specification and enforcement of such policies. Formal languages are needed to express such time- and event-dependent recurrent policies, and suitable enforcement mechanisms need to be defined. This could be done by defining real-time extensions of epistemic logic, or by combining existing static privacy policy languages with automata, as done for instance in [259, 260, 261].
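As an illustration, two of the dynamic policies above (“Only up to 3 posts disclosing my location are allowed per day on my timeline” and “My boss cannot know my location between 20:00-23:59 every day”) could be enforced by a small stateful monitor. The sketch is hypothetical: it assumes the OSN calls `allow_post` before publishing and `may_see_location` before revealing a location, and it is not derived from any of the cited policy languages.

```python
from datetime import datetime


class LocationPolicyMonitor:
    """Hypothetical enforcement monitor for two time-dependent policies."""

    MAX_LOCATION_POSTS_PER_DAY = 3

    def __init__(self, boss):
        self.boss = boss
        self.count = {}  # date -> number of location-disclosing posts accepted

    def allow_post(self, when: datetime, discloses_location: bool) -> bool:
        """Policy 1: at most 3 location-disclosing posts per calendar day."""
        if not discloses_location:
            return True
        day = when.date()
        if self.count.get(day, 0) >= self.MAX_LOCATION_POSTS_PER_DAY:
            return False  # block the post; the counter is not incremented
        self.count[day] = self.count.get(day, 0) + 1
        return True

    def may_see_location(self, viewer: str, when: datetime) -> bool:
        """Policy 2: the boss cannot see the location from 20:00 to 23:59."""
        return not (viewer == self.boss and when.hour >= 20)
```

The per-day counter is the part that makes the policy dynamic: the same post is accepted or rejected depending on the history observed so far, and the restriction resets when the date changes.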

7.2 Challenges

C 7.1

Formalizing natural language contracts. A major challenge is the identification of techniques to extract a formal model from a normative document in an automatic manner. In particular, the challenge is to adapt NLP techniques and use machine learning techniques to (semi-)automatically translate natural language text into a suitable CNL.

C 7.2

Formal reasoning about legal documents. A challenge in the formalization of legal documents is the choice of the right formal language adequate for the type of analysis required, as there is a trade-off between expressiveness and tractability. In particular, the notion of permission (and rights) poses challenges in monitoring, since one party’s permission to perform an action typically entails an obligation on the other party to allow the action, and this obligation may not be observable unless the right is exercised.

C 7.3

Operationalization of legal documents. Most legal texts are written in a declarative style, and typically need to be operationalized for automated analysis. Furthermore, parts of these texts may refer to events or attributes which are not observable and thus not monitorable. Most runtime monitoring and verification approaches for legal texts interpret the term runtime to refer to the time during which the legal text is in force. Another possible interpretation is that of monitoring the process of drafting a contract or legislation, or the negotiation of a contract. A monitoring regime could be useful in this setting.

C 7.4

Smart contract monitoring and verification. How to adapt dynamic verification to smart contract monitoring is unclear, particularly because once a problem arises, it is not always possible to take reparatory action to recover. An open question is how enforcement, verification and reparation can be combined in a single formalism and framework.

C 7.5

Monitoring gas in smart contracts. Ledger systems such as Ethereum use the notion of ‘gas’ to account for computation, but it is unclear how dynamic analysis can be used effectively to track such a non-functional property. Furthermore, the overhead that runtime verification introduces, measured in gas, poses new challenges for monitoring.

C 7.6

Compliance between legal and smart contracts. The relation between the underlying legal document and smart contracts is still to be addressed. The challenge here is how to monitor compliance between both versions of the contract, and relate violations in the execution of the smart contract with the corresponding clause in the real legal contract.

C 7.7

Policy monitoring and verification. The challenges we identified for contracts also apply to policies. In particular, there might be a need to combine the enforcement mechanism with machine learning techniques and with natural language processing. For instance, a post might contain a sentence like “I am here with John drinking a glass of wine”, where “here” clearly refers to a place which might be inferred from the location associated with the post. This kind of inference is difficult to automate.

C 7.8

Policy monitoring in OSNs. For Online Social Networks (OSNs), the use of epistemic logic to reason about whether and how the explicit (and derived) knowledge of users adheres to policies has been explored. However, operationalizing such policies and extracting monitors from them have proved to be particularly difficult.

C 7.9

Policy monitoring and verification. The evolution of policies due to specific events or timeouts also poses a number of challenges. Some initial work has recently been done on the specification side, with a proof-of-concept implementation. The work in [260, 261] presents an approach based on extending a privacy policy language with real-time constructs, while [259] proposes combining a static privacy policy language with automata. However, a general working solution to this challenge is still missing.

8 Huge data and approximate monitoring

This section describes runtime verification challenges related to the analysis of very large logs or streams of events from the system under observation. The general goal when dealing with huge data streams is to develop algorithms that offer scalability, specification language expressiveness, accuracy, and utility. Below we discuss the advances made along each of these dimensions and some of the remaining challenges.

8.1 Context and areas of interest

Before we present the challenges for RV in the area of huge data and approximate monitoring, we first provide some context and state-of-the-art related to the following areas: scalability, expressiveness, accuracy and utility.

8.1.1 Scalability

In runtime verification, the focus to date has mainly been on efficiency, expressiveness, and correctness, and less on scalability to Big Data in realistic scenarios. A few exceptions, which mostly address offline monitoring, are summarized below.

Barre et al. [42] and Hallé and Soucy-Boivin [184] use Hadoop’s MapReduce framework to scale up the monitoring of propositional LTL properties using parallelization. In their experiments, they used event logs with more than nine million entries. In these approaches, formulas are processed bottom up using multiple MapReduce iterations. While the evaluation in the map phase is completely parallelized for different time points from the event log, the results of the map phase for a subformula for the whole log are collected and processed by a single reducer. In a single iteration there are as many reducers as there are independent subformulas with the same height. The reducers, therefore, become bottlenecks that limit the scalability.
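The bottom-up scheme can be sketched on a single machine, with a thread pool standing in for the mappers. This is a simplified illustration under our own assumptions, not the Hadoop code of the cited works: atomic propositions are evaluated independently at every position (the map phase), whereas a temporal operator such as until, over a finite trace, needs one backward pass over the whole log (the reduce phase), which is why a single reducer per subformula becomes the bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor


def map_atomic(log, prop):
    """Map phase: evaluate an atomic proposition at each position independently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda event: prop in event, log))


def reduce_until(left, right):
    """Reduce phase for (left U right) over a finite trace.

    Position i satisfies the formula iff right holds at some j >= i and left
    holds at every position from i to j - 1. The backward pass below needs
    the verdicts of both subformulas for the whole log.
    """
    result = [False] * len(left)
    later = False  # verdict at position i + 1 (False beyond the end of the log)
    for i in range(len(left) - 1, -1, -1):
        later = right[i] or (left[i] and later)
        result[i] = later
    return result
```

Here a log is a list of sets of propositions; for the log `[{"req"}, {"req"}, {"ack"}, set()]`, the formula `req U ack` holds at the first three positions but not at the last one.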

Bianculli et al. [77] extend this approach to the offline monitoring of large traces, for properties expressed in MTL with aggregation operators. Similarly to the aforementioned approaches, the memory consumption of the reducers limits the scalability of this approach. More specifically, reducers (which implement the semantics of temporal and aggregate operators) need to keep track of the positions relevant to the time window specified in the formula: the more time points there are, the denser the time window becomes, with a consequent increase in memory usage. Bersani et al. [72] worked around this problem by considering an alternative semantics for MTL, called the lazy semantics. This semantics evaluates temporal formulas and Boolean combinations of temporal-only formulas at any arbitrary time instant. It is more expressive than the point-based semantics and supports the sound rewriting of any MTL formula into an equivalent one with smaller, bounded time intervals. The lazy semantics has the drawback that basic logical properties no longer hold. This disallows formula simplifications and complicates the formalization of properties given in natural language, since familiar concepts have a different meaning. Unlike the previous approaches, Bersani et al. implemented the monitor on top of the Apache Spark framework [337], which is optimized for iterative distributed computations.
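The memory issue can be seen in a sketch of a counting window, in the spirit of an MTL aggregation operator such as “at most `limit` events in the last `window` time units”. This is a simplified single-machine illustration with hypothetical names: the monitor must retain every timestamp inside the window, so its memory grows with the density of the trace rather than with the length of the interval.

```python
from collections import deque


def count_window_monitor(timestamps, window, limit):
    """Check 'at most `limit` events in the window (t - window, t]' at each event.

    Returns (verdicts, peak_memory), where peak_memory is the largest number
    of timestamps the monitor had to retain at once.
    """
    buf = deque()
    verdicts, peak = [], 0
    for t in timestamps:
        buf.append(t)
        # Evict timestamps that have fallen out of the window.
        while buf and buf[0] <= t - window:
            buf.popleft()
        peak = max(peak, len(buf))
        verdicts.append(len(buf) <= limit)
    return verdicts, peak
```

With the same window of 10 time units, a trace with one event per time unit needs to retain 10 timestamps at peak, while a trace with two events per time unit needs 20.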

Parametric trace slicing [101, 279] is a technique for monitoring a parametric LTL property by grounding it to several plain LTL properties. In this approach logged events are grouped into slices based on the values of the parameters. A slice is created for each parameter value or for each combination of values depending on the number of parameters. The individual slices are then processed by a propositional LTL monitor unaware of the parameters. The initial main goal of this approach was not scalability, but rather monitoring the more expressive parametric LTL specification language. However, the approach is also relevant for scalability since it easily lends itself to parallelization.
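A minimal sketch of parametric trace slicing for a single parameter, using a hypothetical property “a file must not be used after it is closed”. Events are `(name, parameter)` pairs; the propositional monitor is unaware of parameters, and because the slices are independent they could be checked in parallel. With several parameters, one would slice on tuples of values instead.

```python
from collections import defaultdict


def slice_trace(trace):
    """Group (name, parameter) events into one slice per parameter value."""
    slices = defaultdict(list)
    for name, param in trace:
        slices[param].append(name)
    return dict(slices)


def no_use_after_close(events):
    """Propositional monitor, unaware of parameters: no 'use' after 'close'."""
    closed = False
    for event in events:
        if event == "close":
            closed = True
        elif event == "use" and closed:
            return False
    return True


def monitor_parametric(trace):
    """Run the propositional monitor on every slice (slices are independent)."""
    return {param: no_use_after_close(events)
            for param, events in slice_trace(trace).items()}
```

For the trace `open(f1) open(f2) close(f1) use(f2) use(f1)`, the slice for `f1` violates the property while the slice for `f2` satisfies it.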

Another line of work [53, 58] similarly splits the logged events into slices, but avoids grounding first-order properties altogether. This is enabled by using a more powerful monitor, MonPoly [55, 59, 61, 62], to process the slices. Overall, the approach allows for scalable offline monitoring of properties expressed in Metric First-Order Temporal Logic (MFOTL). The core idea in this work is to split the log into multiple slices and check the same formula on each slice independently. This allows the solution to scale, since each slice can be handled by a single computer. The key component is a log-splitting framework used to distribute the log to different parallel monitors based on data and time. The framework takes as input the formula and a splitting strategy, and splits the log while ensuring soundness and completeness. The approach was implemented in Google’s MapReduce framework, where the log-splitting framework is executed in the map phase. The approach is, however, limited to offline monitoring since it uses MapReduce. Parallelization is not limited as in the previous approaches, but it is potentially wasted, since to ensure correctness the log-splitting framework may completely duplicate the original log into some of the individual slices. Another limitation is that the slicing framework relies on a domain expert to supply a splitting strategy manually. For example, if a monitored property involves events parametrized with “servers” and “clients”, one could split the log along the different “servers”, along the different “clients”, or along both.
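The duplication problem shows up even in a simplified data-slicing strategy (hypothetical, and not the log-splitting framework of [53, 58] or an interface of MonPoly): events are routed by the value of their `server` field, but events that do not mention a server, such as a global reset, may be relevant to every slice and must therefore be copied into all of them to keep the slicing sound.

```python
import zlib


def split_by_server(log, n_slices):
    """Route each event (a dict) to a slice chosen by its 'server' field.

    Events without a server are copied to every slice, since the monitored
    property may depend on them in every slice.
    """
    slices = [[] for _ in range(n_slices)]
    for event in log:
        server = event.get("server")
        if server is None:
            for part in slices:
                part.append(event)  # duplication preserves soundness
        else:
            # crc32 gives a hash that is stable across runs, unlike hash().
            slices[zlib.crc32(server.encode()) % n_slices].append(event)
    return slices
```

If most events lack the splitting parameter, every slice receives almost the entire log, and the parallelism gains little.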

Loreti et al. [229] discuss two MapReduce architectures to address scalability in the context of compliance monitoring of business processes, using the SCIFF framework [13]. This framework provides a logic-based proof procedure for checking declarative constraints on sequences of events, in terms of expectations and happened events. The two MapReduce architectures proposed in this work were adapted from similar ideas in process mining [312] and distinguish between vertical and horizontal distribution. In vertical distribution, all nodes receive the complete specification and a subset of the complete log. During the map phase, the log is split across the various nodes such that all the events of a trace are sent to the same node. In the reduce phase, each node checks the conformance of each log fragment to the specification. In horizontal distribution, both the specification and the logs are partitioned across the nodes. Each node checks a partial specification on a fragment of the log that contains only the events used in the partial specification. The results of all the nodes are then merged together with a logical AND. The main limitation of the approach is the expressiveness of the SCIFF logic programming framework, which cannot handle parametric specifications.
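Horizontal distribution can be sketched as follows, with simple hypothetical “response” constraints (every occurrence of one event is eventually followed by another) standing in for SCIFF specifications. Each constraint plays the role of a node, receives only the projection of the log onto the event types it mentions, and the partial verdicts are combined with a logical AND. The projection is sound here only because each constraint depends solely on its own event types.

```python
def response(trigger, reaction):
    """Constraint 'every trigger is eventually followed by a reaction'.

    Returns (event types the constraint mentions, checker on a trace).
    """
    def check(trace):
        pending = False
        for event in trace:
            if event == trigger:
                pending = True
            elif event == reaction:
                pending = False
        return not pending
    return {trigger, reaction}, check


def horizontal_check(log, constraints):
    """Each constraint acts as a node that sees only its projected fragment."""
    verdicts = []
    for event_types, check in constraints:
        fragment = [e for e in log if e in event_types]  # projection of the log
        verdicts.append(check(fragment))
    return all(verdicts)  # merge the partial results with a logical AND
```

For the log `order pay ship order pay`, the fragment for the order/pay constraint is satisfied, but the last `pay` is never followed by `ship`, so the merged verdict is a violation.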

Yu et al. [336] propose an approach for parallel runtime verification of programs written in the Modeling, Simulation and Verification Language (MSVL), with properties expressed in Propositional Projection Temporal Logic (PPTL). The approach divides each program trace into several segments, which are verified in parallel by threads running on under-utilized CPU cores. The verification results of all segments are then merged and further analyzed to produce a verdict.
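The merge step can be made concrete with a generic technique for splitting a trace across workers when the property is given as a deterministic automaton: each worker runs its segment from every possible start state, and the resulting state maps are composed in trace order. This is a swapped-in illustration, not the PPTL-based algorithm of Yu et al. [336], and the lock/unlock automaton is a hypothetical example property. Threads are used only to keep the sketch short; CPython threads do not run CPU-bound code in parallel, so `ProcessPoolExecutor` would be the natural replacement for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List, Sequence

# Hypothetical property: "lock" and "unlock" strictly alternate, starting with "lock".
STATES = ("unlocked", "locked", "error")
StateMap = Dict[str, str]


def step(state: str, event: str) -> str:
    if state == "error":
        return "error"  # violations are permanent
    if event == "lock":
        return "locked" if state == "unlocked" else "error"
    if event == "unlock":
        return "unlocked" if state == "locked" else "error"
    return state  # other events are irrelevant to the property


def summarize(segment: Sequence[str]) -> StateMap:
    """Run one segment from every possible start state (the work of one worker)."""
    summary = {}
    for start in STATES:
        state = start
        for event in segment:
            state = step(state, event)
        summary[start] = state
    return summary


def compose(first: StateMap, second: StateMap) -> StateMap:
    """State map of two consecutive segments."""
    return {s: second[first[s]] for s in STATES}


def check_parallel(trace: List[str], workers: int, initial: str = "unlocked") -> bool:
    size = max(1, -(-len(trace) // workers))  # ceiling division
    segments = [trace[i:i + size] for i in range(0, len(trace), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize, segments))
    identity = {s: s for s in STATES}
    # Merge: compose the per-segment summaries in trace order.
    return reduce(compose, summaries, identity)[initial] != "error"


def check_sequential(trace: List[str], initial: str = "unlocked") -> bool:
    return reduce(step, trace, initial) != "error"
```

The cost of this approach is that each worker simulates the automaton once per state, which is cheap for small automata but grows with the number of states.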

8.1.2 Expressiveness

Most of the works on runtime verification borrow logics from static verification approaches and focus on designing algorithms that either (1) generate a monitor that can analyze a trace online, or (2) process dumps of traces offline. Alternatively, one could use a general-purpose programming language or a domain-specific language to write the queries that process the input traces online or offline. In both cases, we would like to monitor Big Data with a highly expressive specification language. More expressive logics naturally require more computation resources for monitoring. Thus, a worthwhile research question is: What are the limits of specification language expressiveness for achieving scalable monitoring of Big Data? Below we discuss some directions for what expressive specification languages could look like.

Complex Event Processing (CEP) and Data Stream Management Systems (DSMS), for example, can serve as specialized languages for building stream processors (see [237] for a recent survey). The query languages of DSMS are mostly extensions of SQL (e.g., with window operators [21]), and are thus typically much weaker than logics such as MFOTL, due to the absence of proper negation and more limited capabilities for expressing temporal relationships. Moreover, DSMS tend to focus on efficient query execution at the expense of a clean semantics of the property specifications. The reference model of DSMS was defined in the seminal work on the Continuous Query Language (CQL) [21]. In CQL, the processing of streams is split into three steps. (i) Stream-to-relation operators (that is, windows) select a portion of each stream, thus implicitly creating a static database table. (ii) The actual computation takes place on these tables, using relation-to-relation (mostly SQL) operators. (iii) Finally, relation-to-stream operators generate new streams from tables, after data manipulation. Several variants and extensions have been proposed, but they all rely on the same general processing abstractions defined above.
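The three CQL steps can be illustrated with a naive sketch that re-evaluates a query at every time instant. It assumes a hypothetical stream of `(timestamp, source_ip)` tuples and uses set semantics for brevity, whereas CQL itself works with bags and is meant to be evaluated incrementally. The window plays the role of a time-based range window, the relational step corresponds to a `GROUP BY ... HAVING COUNT(*) >= threshold` query, and the last step is an Istream-style operator that emits only tuples that are new compared to the previous instant.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Record = Tuple[int, str]  # (timestamp, source_ip), hypothetical schema


def range_window(stream: Iterable[Record], now: int, width: int) -> List[Record]:
    """(i) Stream-to-relation: the tuples with timestamps in (now - width, now]."""
    return [r for r in stream if now - width < r[0] <= now]


def heavy_sources(relation: List[Record], threshold: int) -> Set[str]:
    """(ii) Relation-to-relation: sources with at least `threshold` tuples."""
    counts = Counter(ip for _, ip in relation)
    return {ip for ip, c in counts.items() if c >= threshold}


def istream(previous: Set[str], current: Set[str]) -> Set[str]:
    """(iii) Relation-to-stream: tuples inserted since the previous instant."""
    return current - previous


def run(stream: List[Record], width: int, threshold: int,
        instants: Iterable[int]) -> List[Tuple[int, Set[str]]]:
    output: List[Tuple[int, Set[str]]] = []
    previous: Set[str] = set()
    for now in instants:
        current = heavy_sources(range_window(stream, now, width), threshold)
        new = istream(previous, current)
        if new:
            output.append((now, new))
        previous = current
    return output


stream = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (10, "b"), (11, "b"), (12, "b")]
print(run(stream, width=5, threshold=3, instants=range(1, 13)))
# [(4, {'a'}), (12, {'b'})]
```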

CEP [231, 237] systems are closely related to DSMS. CEP systems analyze timestamped data streams by recognizing composite events consisting of multiple atomic events from the original stream that adhere to certain patterns. The user of a CEP system controls the analysis by specifying such patterns of interest. The predominant specification languages for patterns are descendants of SQL [182]. An alternative is given by rule-based languages, such as Etalis [16], which resembles Prolog. Although CEP systems make temporal relationships between events easier to specify than DSMS do, they are still significantly less expressive than MFOTL due to their restricted support for parametrization of events and lack of quantification over parameters. Interestingly, some CEP systems use interval timestamps. In this model, each data element is associated with two points in time that define the first and the last moment at which the data element is valid [296, 319].
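A typical CEP pattern is a sequence of two event types, correlated on one attribute, within a time window. The sketch below assumes a hypothetical `(timestamp, type, user)` event format with events arriving in timestamp order. It detects `SEQ(fail, ok)` on the same user within a window and reports each composite event with an interval timestamp (start, end), as in the interval-based model mentioned above. Correlation is limited to equality on a single attribute, which mirrors the restricted parametrization discussed in the text; one `fail` event may complete several later matches as long as it stays inside the window.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, event type, user), hypothetical
Match = Tuple[str, int, int]  # (user, interval start, interval end)


def seq_within(events: Iterable[Event], first: str, second: str, window: int) -> List[Match]:
    """Detect SEQ(first, second) on the same user, at most `window` time units apart."""
    partial: Dict[str, List[int]] = {}  # user -> timestamps of `first` events that may still start a match
    matches: List[Match] = []
    for ts, etype, user in events:
        if etype == first:
            partial.setdefault(user, []).append(ts)
        elif etype == second:
            # Drop partial matches that fell out of the window, complete the rest.
            alive = [start for start in partial.get(user, []) if ts - start <= window]
            partial[user] = alive
            matches.extend((user, start, ts) for start in alive)
    return matches


events = [(1, "fail", "alice"), (2, "fail", "bob"), (5, "ok", "alice"),
          (20, "ok", "bob"), (21, "fail", "alice"), (25, "ok", "alice")]
print(seq_within(events, "fail", "ok", window=10))
# [('alice', 1, 5), ('alice', 21, 25)]
```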

For logical specification languages such as LTL a recent trend has been to incorporate regular-expression-like constructs in the logic. This gave rise to the industrially standardized Property Specification Language (PSL) [136], the development of Regular Linear Temporal Logic (RLTL) [224, 293] and its more recent incarnation in the form of (Parametric) Linear Dynamic Logic ((P)LDL) [149, 174] and its metric counterpart (MDL) [56]. Due to the extension with regular expressions, those languages are more expressive than LTL in that they capture all \(\omega \)-regular languages. Vardi [313] observed that these extensions were essential for the practical usage of PSL in many industrial application settings. First-order extensions of languages like PSL, RLTL, (P)LDL, and MDL, which should be more expressive than MFOTL, have not yet been considered for monitoring.
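A standard example (due to Wolper) of an \(\omega \)-regular property that LTL cannot express is "p holds at every even position". With regular expressions inside the modalities, as in LDL, it becomes a one-liner. The formula below uses LDL's box modality; a propositional formula used as a regular expression consumes exactly one position, and \(\mathit{tt}\) denotes true.

```latex
% [r]\varphi holds at position i iff \varphi holds at every position j
% such that the segment from i to j matches the regular expression r.
% tt;tt consumes two positions, so (tt;tt)^* reaches exactly i, i+2, i+4, ...
\[
  \mathit{Even}(p) \;=\; \big[(\mathit{tt}\,;\,\mathit{tt})^{*}\big]\, p
\]
```

Note that the formula constrains only the even positions; LTL can express "p exactly at the even positions" but not this weaker requirement.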

However, to keep things manageable for Big Data, it may be necessary to restrict or even remove features from our property specification languages. Negation is a candidate for restriction, while the first-order aspect of MFOTL is a candidate for removal (or for replacement with freeze quantifiers). Many works [53, 60, 61, 62, 64] had to define (efficiently) monitorable fragments using similar restrictions. A syntactic restriction (e.g., of the allowed occurrences of negation) is preferable to a modification of the semantics, as seen in the example of negation in many DSMS. The user of a specification language with a syntactic restriction can at least rely on the familiar semantics. Moreover, properties outside of the monitorable fragment can often be automatically rewritten into equivalent formulas within the fragment.

8.1.3 Accuracy

Compromising on soundness is not a common approach in runtime verification. However, when faced with very large logs (or streams) of data and hard real-time constraints on providing verdicts, it can become a very useful compromise. In some cases, sound algorithms cannot be used in practice. For example, a sound algorithm that determines the number of distinct elements in a data stream must use space linear in the cardinality it estimates, which is often impractical. Determining cardinality is a large component of many practical monitoring tasks such as detecting worm propagations, denial of service (DoS) attacks, or link-based spam. Ideally, tradeoffs between monitoring efficiency and accuracy of the provided verdicts should be formulated as an additional input to the monitor. We call such an extension approximate monitoring.

Approximate monitoring deals with the issue of providing approximate (or inaccurate) results to the standard monitoring problem, with bounds on the “distance” between the actual (correct) results and the provided ones. The definition of such a distance depends on the particular output that a monitor provides. For instance, in the case of a simple stream of violations, distance can be defined as the percentage of unreported violations, or the percentage of spuriously reported violations. For other monitoring outputs that contain richer verdicts, distance can be defined to further include the accuracy of the additional information in the verdicts.
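For the simple case of a violation stream, the two distances mentioned above can be computed directly. The sketch assumes that violations are identified by their positions in the trace, and it measures unreported violations relative to the actual violations and spurious ones relative to the reported violations; both normalizations are choices made for illustration.

```python
from typing import Iterable, Tuple


def violation_distance(exact: Iterable[int], approx: Iterable[int]) -> Tuple[float, float]:
    """Return (% of actual violations not reported, % of reported violations that are spurious)."""
    actual, reported = set(exact), set(approx)
    missed = 100.0 * len(actual - reported) / len(actual) if actual else 0.0
    spurious = 100.0 * len(reported - actual) / len(reported) if reported else 0.0
    return missed, spurious


missed, spurious = violation_distance(exact=[3, 7, 9, 12], approx=[3, 9, 15])
print(f"missed: {missed:.1f}%, spurious: {spurious:.1f}%")
```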

One should make a clear distinction between approximate monitoring and monitoring probabilistic properties. The latter deals with monitoring specification languages that can express probabilistic and statistical properties of data streams. However, it still provides correct verdicts given the semantics of the specification language. A related facet is the monitoring of uncertain data, which deals with the problems of data collection and data reliability, and which often carries over to monitoring by invalidating certain assumptions on the data stream. There are many sources of uncertainty in the monitored data: timestamps can be imprecise due to clock skew, and logs may be incomplete due to outages, or may even disagree when coming from different sources. Uncertainty can also come from the monitored systems themselves, which can exhibit stochastic and faulty behavior. Another related field is state inference of the monitored system using probabilistic approaches, where a belief state is maintained and updated during monitoring. Although these approaches provide probabilistic guarantees as part of the resulting belief state, they address a specific monitoring task, namely state inference.

Existing work on approximate monitoring stems from the fields of databases [38], streaming algorithms [250], and property testing [177]. All approaches can be classified based on two criteria: the specific queries they approximate and the resources they optimize. Commonly approximated queries in the literature are cardinality estimation [153], top-k items [39], frequent items (heavy hitters) [210, 236, 335], quantiles [114, 335], frequency moments [112, 115], entropy [20], other non-linear functions over (possibly distributed) streams, and distance queries [9]. Orthogonally, the approaches either optimize memory consumption, communication cost, execution time, or the monitor’s overhead.

Optimizing memory consumption led Morris to develop his well-known approximate counting algorithm [243]. The HyperLogLog algorithm [153] tackles the cardinality estimation problem mentioned in the example above. Counting the most frequent items in a stream is a very common query, and there has been ample work on devising good approximation algorithms for it. One of the oldest streaming algorithms for detecting frequent items is the MJRTY algorithm [83], together with its generalizations [122, 208, 240].
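The MJRTY idea and one of its best-known generalizations fit in a few lines. The sketch below shows MJRTY (the Boyer-Moore majority vote), which keeps one candidate and one counter, and the Misra-Gries summary, which keeps k - 1 counters; with k = 2 the latter behaves like MJRTY's single counter. Misra-Gries guarantees that every item occurring more than n/k times in a stream of length n is kept, and that each stored count underestimates the true count by at most n/k.

```python
from typing import Dict, Hashable, Iterable, Optional


def mjrty(stream: Iterable[Hashable]) -> Optional[Hashable]:
    """Boyer-Moore majority vote: one candidate, one counter.

    If some item occurs in more than half of the stream, it is returned;
    otherwise the result is arbitrary and needs a second pass to verify.
    """
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries summary with at most k - 1 counters."""
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all (conceptually including x itself).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = list("aababcaaad")  # 'a' occurs 6 times out of 10
print(mjrty(stream))           # a
print(misra_gries(stream, 3))  # {'a': 4}
```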

Optimizing communication cost is a common problem in the field of streaming databases. Consider k data streams and a monitor consisting of \(k+1\) distributed components: one for every stream and an additional central coordinator. Components are only allowed to send messages to the central coordinator. The goal is to track a (reasonably accurate) value of a function defined over the data in all k streams at the central coordinator, while minimizing the number of messages sent. This problem is a good abstraction of many network monitoring tasks where the goal is to detect global properties of routed data. The communication cost is the primary measure of complexity of a tracking algorithm. Initial work dealt with optimizing the top-k items query [39]; it was then extended to non-temporal functions [115, 323]. Temporal queries are facilitated by introducing various types of windows, and the approximation is achieved by maintaining a uniform sample of events per window at the coordinator [111, 116].
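A simple deterministic scheme for the distributed counting instance of this problem (tracking the total number of events across the k streams) illustrates the trade-off: a site reports its local count only when it has grown by more than a factor (1 + ε) since its last report. The coordinator's sum is then never larger than the true total and at least the total divided by (1 + ε), while the number of messages grows logarithmically rather than linearly in the number of events. This is an illustrative sketch with hypothetical class names, not the sampling-based algorithms of [111, 116].

```python
from typing import List, Optional


class Site:
    """Local component attached to one stream."""

    def __init__(self, eps: float):
        self.eps = eps
        self.count = 0     # true local count
        self.reported = 0  # last value sent to the coordinator

    def observe(self) -> Optional[int]:
        """Count one event; return a message only if the count grew by more than (1 + eps)."""
        self.count += 1
        if self.count > (1 + self.eps) * self.reported:
            self.reported = self.count
            return self.count
        return None


class Coordinator:
    def __init__(self, k: int):
        self.last: List[int] = [0] * k
        self.messages = 0

    def receive(self, site_id: int, value: int) -> None:
        self.last[site_id] = value
        self.messages += 1

    def estimate(self) -> int:
        # Each site's reported value r_i satisfies r_i <= n_i <= (1 + eps) * r_i.
        return sum(self.last)


def simulate(arrivals: List[int], k: int, eps: float) -> Coordinator:
    """Feed events (given by the id of the site they arrive at) through the protocol."""
    sites = [Site(eps) for _ in range(k)]
    coordinator = Coordinator(k)
    for total, site_id in enumerate(arrivals, start=1):
        message = sites[site_id].observe()
        if message is not None:
            coordinator.receive(site_id, message)
        est = coordinator.estimate()
        assert est <= total <= (1 + eps) * est  # accuracy invariant after every event
    return coordinator


coordinator = simulate([i % 4 for i in range(10_000)], k=4, eps=0.25)
print(coordinator.estimate(), coordinator.messages)
```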

Optimizing execution time using approximation methods involves ignoring parts of the input, predicated on strong statistical guarantees on the accuracy of the output. This is enabled by sampling techniques [113] that have been shown to work for specific queries. These techniques are often referred to as Approximate Query Processing (AQP), and they are implemented by many existing systems [9, 244, 245, 302]. When sampling, a random sample is a “representative” subset of the data, obtained via some stochastic mechanism. Samples are quicker to obtain and smaller than the data itself, and are hence used to answer queries more efficiently. A histogram summarizes the data by grouping its values into subsets (or “buckets”) and then computing a small set of summary statistics for each bucket. These statistics allow one to approximately reconstruct the data in each bucket. Wavelet techniques view a dataset as a vector of M elements, i.e., a function defined on the set \(\{0,1,2,\dots ,M - 1\}\). Such a function can be seen as a weighted sum of some carefully chosen wavelet “basis functions”. Coefficients that are close to zero in magnitude can then be ignored, with the remaining small set of coefficients serving as the data summary. Sketches are particularly well-suited to streaming data. Linear sketches view a numerical dataset as a matrix and multiply the data by some fixed matrix. Such sketches are massively parallelizable and have been used successfully to estimate answers to set cardinality, union, and sum queries, as well as top-k or min-k queries.
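As one concrete linear sketch, the Count-Min sketch (chosen here for illustration; the text does not single it out) keeps a small table of counters with one hash function per row. Because an update only adds to counters, the sketch of a stream is a linear function of the stream's frequency vector, so sketches built independently on different partitions can simply be added; this is what makes the family easy to parallelize. Estimates never underestimate, and with width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉ the overestimate is at most ε times the stream length with probability at least 1 - δ. The salted BLAKE2 hashing is an implementation choice of this sketch.

```python
import hashlib
from typing import List


class CountMinSketch:
    """A depth x width table of counters with one (salted) hash function per row."""

    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, item: str) -> int:
        # Row-specific salt gives each row its own hash function; deterministic across runs.
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item: str) -> int:
        # Every row overestimates (collisions only add), so take the minimum.
        return min(self.table[row][self._bucket(row, item)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        """Sketches of two sub-streams add up to the sketch of their union."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same shape")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [[a + b for a, b in zip(r1, r2)]
                        for r1, r2 in zip(self.table, other.table)]
        return merged


items = ["a"] * 50 + ["b"] * 20 + [f"x{i}" for i in range(100)]
left, right = CountMinSketch(64, 4), CountMinSketch(64, 4)
for i, item in enumerate(items):
    (left if i % 2 == 0 else right).add(item)  # two partitions, e.g. two workers
combined = left.merge(right)
print(combined.estimate("a"), combined.estimate("b"))
```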

Optimizing monitoring overhead is a problem often encountered in runtime verification. When optimizing overhead, one must consider the monitored system in addition to the event stream. In this setting, computing resources (time, memory, and network) are shared by the monitored system and the monitor. Overhead can be seen as the percentage of the resources used by the monitor. Bartocci et al. [50, 205, 303] use dynamic knowledge about the monitored system to control the amount of resources allocated to monitoring. More precisely, they enable and disable monitoring of certain events as needed. This can be seen as sampling; however, the stochastic mechanism is informed by a probabilistic model of the monitored system. Depending on how likely it is that an event will participate in a violation of a given temporal property, the system decides whether to include it in the monitored stream. The aforementioned approaches all differ in the probabilistic formalism used to model the monitored system [49].

8.1.4 Utility

Another important dimension is the usefulness (or utility) of the monitoring output. The expected output of the monitoring problem is often underspecified, and different approaches usually employ different assumptions derived from the implementation details of their monitoring algorithms. Yet the time and space complexity of the monitoring problem depends heavily on its precise output specification.

For instance, some monitoring algorithms output a single Boolean verdict stating that, overall, the trace satisfies or violates the monitored property. Other monitoring algorithms solve a strictly harder problem: they output a stream of Boolean verdicts attesting to the satisfaction of the monitored property for every prefix of the trace (or stream). While the complexity of the former variant has been studied for various specification languages [85, 150, 216], the latter has mostly been ignored.

An interesting distinction to make is between outputting a stream composed only of violations, versus giving a (more general) stream of verdicts that includes satisfactions of the monitored property as well.
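The difference between these output specifications can be seen on a deliberately simple past-time property, φ = "every `ack` is preceded by some `send`", whose verdict at each position is known as soon as the position is read. The sketch below (with hypothetical event names) derives a single Boolean verdict for the whole trace, a verdict for every prefix, and a violations-only stream from the same pointwise evaluation.

```python
from typing import List


def pointwise(trace: List[str]) -> List[bool]:
    """Verdict of phi at each position: an 'ack' needs an earlier 'send'."""
    seen_send, verdicts = False, []
    for event in trace:
        verdicts.append(event != "ack" or seen_send)
        seen_send = seen_send or event == "send"
    return verdicts


def single_verdict(trace: List[str]) -> bool:
    """One Boolean for the whole trace: does phi hold at every position?"""
    return all(pointwise(trace))


def prefix_verdicts(trace: List[str]) -> List[bool]:
    """One Boolean per prefix: has phi held at every position so far?"""
    out, ok = [], True
    for verdict in pointwise(trace):
        ok = ok and verdict
        out.append(ok)
    return out


def violations(trace: List[str]) -> List[int]:
    """Only the positions at which phi is violated."""
    return [i for i, verdict in enumerate(pointwise(trace)) if not verdict]


trace = ["data", "ack", "send", "ack"]
print(single_verdict(trace))   # False
print(prefix_verdicts(trace))  # [True, False, False, False]
print(violations(trace))       # [1]
```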

Traditional monitoring algorithms for temporal logics with future operators scale poorly when subjected to high-velocity event streams. One reason is that the monitor is constrained to produce outputs strictly in the order defined by the incoming events. It can be shown that this ordering constraint, although it makes the output more usable, makes the monitoring problem more complex. An interesting special case of monitors producing out-of-order output are monitors that output violations as soon as possible, i.e., as soon as they have enough information from the input to pinpoint some violation. Monitors that produce ordered output violate this seemingly natural monitoring requirement.
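The cost of ordered output can be observed on a bounded-response property: at position i, if the event is `a`, then a `b` must follow within `horizon` positions. Positions holding any other event are satisfied immediately, but an ordered monitor must hold their verdicts back until every earlier `a` has been decided. The sketch below assumes hypothetical event names and a finite trace whose last open obligations are simply left undecided; for each emitted verdict it records the position it refers to and the position after which it was emitted, in both modes.

```python
from typing import Dict, List, Tuple

Emission = Tuple[int, bool, int]  # (position, verdict, emitted after reading this position)


def monitor(trace: List[str], horizon: int, ordered: bool) -> List[Emission]:
    """Check 'a at i implies b at some j with i < j <= i + horizon' at every position."""
    pending: List[int] = []        # positions holding an 'a' that are still undecided
    decided: Dict[int, bool] = {}  # ordered mode: decided but not yet emitted
    emitted: List[Emission] = []
    next_out = 0                   # ordered mode: next position allowed to be emitted
    for j, event in enumerate(trace):
        resolved: List[Tuple[int, bool]] = []
        if event == "b":
            # Every pending obligation is still within its horizon, so this 'b' satisfies it.
            resolved += [(i, True) for i in pending]
            pending = []
        if event == "a":
            pending.append(j)
        else:
            resolved.append((j, True))  # nothing to check at this position
        # Obligations whose last chance was position j are violated.
        resolved += [(i, False) for i in pending if j - i >= horizon]
        pending = [i for i in pending if j - i < horizon]
        resolved.sort()
        if ordered:
            decided.update(resolved)
            while next_out in decided:
                emitted.append((next_out, decided.pop(next_out), j))
                next_out += 1
        else:
            emitted += [(i, v, j) for i, v in resolved]
    return emitted


trace = ["a", "c", "c", "c", "b"]
print(monitor(trace, horizon=3, ordered=False))
# [(1, True, 1), (2, True, 2), (0, False, 3), (3, True, 3), (4, True, 4)]
print(monitor(trace, horizon=3, ordered=True))
# [(0, False, 3), (1, True, 3), (2, True, 3), (3, True, 3), (4, True, 4)]
```

Both modes produce the same verdicts; the ordered monitor simply buffers the verdicts for positions 1 and 2 until position 0 is decided.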

Orthogonally, in contrast to reporting all violations of a property, there are many valid use cases where monitors report only some (most relevant) violations. Examples include reporting only the first, or the last (most recent) violation. However, the impact of these choices on the monitoring complexity is unclear.

It is also possible to design algorithms that produce non-Boolean verdicts, for example using Stream Runtime Verification [121], which allows one to compute streams over arbitrary domains. Other systems use verdicts that target specific (potentially relaxed) output requirements and may or may not contain enough information to reconstruct the standard Boolean verdict output. For example, Basin et al. [57] proposed so-called equivalence verdicts, which state that the monitor does not know the Boolean verdict at a particular point in the event stream, but knows that the verdict will be equal to another, also presently unknown, verdict at a different point. The equivalence verdicts carry enough information to reconstruct a stream of Boolean verdicts. To do so, one must reorder the verdicts reported in the output stream and propagate Boolean verdicts to the equivalent ones.
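The reconstruction step can be sketched with a union-find structure over trace positions. The output format used here, `("verdict", i, b)` for a Boolean verdict at position i and `("equiv", i, j)` for "the verdict at i equals the verdict at j", is hypothetical and not the concrete representation of [57]. The point is that equivalence classes absorb Boolean verdicts in whatever order they arrive, and any position whose class never receives a Boolean stays undecided (`None`).

```python
from typing import Dict, List, Optional, Tuple, Union

Output = Union[Tuple[str, int, bool], Tuple[str, int, int]]


def reconstruct(outputs: List[Output], length: int) -> List[Optional[bool]]:
    parent = list(range(length))  # union-find forest over positions

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    value: Dict[int, bool] = {}  # class representative -> known Boolean verdict
    for out in outputs:
        if out[0] == "equiv":
            _, i, j = out
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                if ri in value:  # carry a known verdict over to the new representative
                    known = value.pop(ri)
                    value.setdefault(rj, known)
        else:
            _, i, verdict = out
            value[find(i)] = verdict
    # Reorder: one verdict per position, in trace order.
    return [value.get(find(i)) for i in range(length)]


outputs = [("equiv", 0, 2), ("equiv", 1, 2), ("verdict", 3, False), ("verdict", 2, True)]
print(reconstruct(outputs, 5))  # [True, True, True, False, None]
```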

All output variations mentioned so far sacrifice utility for the sake of scalability. However, even given a stream of verdicts, it can be quite nontrivial to understand why a complex property is satisfied (or violated) at some point in the trace. One can increase the utility of monitors by replacing the stream of Boolean verdicts with a stream of proof objects that encode explanations of why the property has been satisfied or violated. The proof objects can take the form of minimal-size proof trees [52], or of a compressed summary trace capturing the essential parts of the original trace that contribute to a violation.
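A proof object can be illustrated on a tiny fragment: atoms, negation, conjunction, disjunction, and "always"/"eventually" over a finite trace. The sketch below is not the algorithm of [52]. It returns a verdict together with a proof tree, and wherever a single subproof suffices (a false conjunct, a true disjunct, a violating position for "always", a witness for "eventually") it keeps the smallest one, so the tree is minimal with respect to this simple proof system. The formula encoding and proof labels are hypothetical.

```python
from typing import List, Set, Tuple

Formula = tuple  # ("atom", name) | ("not", f) | ("and"/"or", f, g) | ("always"/"eventually", f)
Proof = tuple    # (label, position, children)


def size(proof: Proof) -> int:
    return 1 + sum(size(child) for child in proof[2])


def explain(f: Formula, trace: List[Set[str]], i: int = 0) -> Tuple[bool, Proof]:
    """Verdict of f at position i of a finite trace, plus a proof tree for it."""
    op = f[0]
    if op == "atom":
        holds = f[1] in trace[i]
        return holds, (f[1] if holds else "not " + f[1], i, ())
    if op == "not":
        holds, proof = explain(f[1], trace, i)
        return not holds, ("not", i, (proof,))
    if op in ("and", "or"):
        results = [explain(f[1], trace, i), explain(f[2], trace, i)]
    elif op in ("always", "eventually"):
        results = [explain(f[1], trace, j) for j in range(i, len(trace))]
    else:
        raise ValueError(f"unknown operator: {op!r}")
    universal = op in ("and", "always")
    verdicts = [v for v, _ in results]
    holds = all(verdicts) if universal else any(verdicts)
    if holds == universal:
        # A true 'and'/'always' or a false 'or'/'eventually' needs every subproof.
        return holds, (op, i, tuple(p for _, p in results))
    # Otherwise a single child decides the verdict: keep the smallest such proof.
    decisive = [p for v, p in results if v == holds]
    return holds, (op, i, (min(decisive, key=size),))


trace = [{"p"}, {"p"}, set(), {"p", "q"}]
print(explain(("always", ("atom", "p")), trace))
# (False, ('always', 0, (('not p', 2, ()),)))
```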

8.2 Challenges

C 8.1

Combining Horizontal and Vertical Parallelization. The different approaches to parallelizing monitoring algorithms have different advantages and limitations. Horizontal parallelization, as in Barre et al. [42] and Hallé and Soucy-Boivin [184], does not depend on the actual events but is limited by the formula’s structure. Vertical parallelization, as in Basin et al. [53] or parametric trace slicing [101, 279], offers an a priori unbounded amount of parallelization but may lead to data duplication for certain formulas. A combination of the approaches may achieve the best of both worlds and is worth investigating.

C 8.2

Scalable Monitoring in the Online Setting. Most of the described approaches rely on MapReduce as a technical solution for distributed, fault-tolerant computation. However, its batch-processing nature restricts monitoring to the offline setting, in which the complete log of events is given as input to the monitor at once. More recently, systems research has moved towards a proper streaming paradigm, as witnessed by widely adopted streaming frameworks such as Apache Flink [92] or Timely Dataflow [246]. These frameworks can be used to achieve scalability in the online setting, in which individual events steadily arrive at the monitor. The challenge is to adapt the offline approaches (both horizontal and vertical) to the online setting.

C 8.3

Adaptive Scalability. A related challenge that arises only in the online setting is adaptivity. To retain scalability, a parallel monitor, and in particular its log-slicing component, may need to adapt to changes in the behavior of the monitored system, for example, an increase in the event rate or a change in the occurrence distribution of some system events. Detecting such changes and reacting adequately to them are both challenging. In particular, reacting will most likely require reshuffling the parallel monitors’ states in a way that maintains a consistent global state, that is, without compromising the soundness of monitoring.

C 8.4

Automatically Synthesizing Splitting Strategies. Log-slicing techniques, such as that of Basin et al. [53], rely on a domain expert to supply a splitting strategy. An open challenge is how to synthesize such a splitting strategy automatically, based on the monitored property and some formalized domain knowledge, for example, statistics on the types of events in the log. The holy grail would be an algorithm that picks the optimal splitting strategy, i.e., one that minimizes the amount of data duplicated between the slices and creates balanced slices that require equal computational effort to monitor.

C 8.5

Expressive Specification Languages. Richer specification languages allow one to capture more sophisticated properties. For example, hyperproperties allow one to express relational properties (essentially, properties that relate different traces). These traces can come from a single large trace that is processed offline. For example, a specification can relate two traces extracted from the large trace, such as requests coming from different users, or requests performed at different points in time. Such a richer language would allow one to express properties like differential SLAs, which are beyond the expressiveness of the specification formalisms currently used. Another family of specification languages that can express rich properties is stream runtime verification languages. Currently, these languages only have online and offline evaluation algorithms for small traces, in the sense of traces that can be stored on a single computer. A challenge is then to come up with parallel algorithms for large traces.

C 8.6

Richer Verdicts and Concise Model Witnesses. Classical specification formalisms in runtime verification, borrowed from behavioral languages used in static verification, produce Boolean outcomes for a given trace, indicating whether the observed trace is a model of the specification. One challenge is to compute richer outcomes of the monitoring process. Examples include quantitative verdicts, such as how robustly the specification was satisfied, or statistics computed from the input trace, such as the average number of retransmissions or the worst-case response time. A related challenge is computing witnesses of the satisfaction or violation of the property for offline traces. The main goal is for the monitoring algorithm to compute the verdict and, as a by-product, a compressed summary trace in which irrelevant information has been omitted and consolidated. Algorithms will have to be created to check (1) that the summary trace is indeed a summary of the input trace, and (2) that the summary trace has the claimed verdict against the specification. If successful, this process will make it possible to check quickly and independently that the runtime verification process was performed correctly.

C 8.7

Approximate Monitoring. The monitoring setting should provide a systematic and explicit way to specify tradeoffs between the resources the monitoring algorithms may use (e.g., maximum memory consumption or running time) and the accuracy of the verdicts they provide. Existing work provides such tradeoffs for a few fixed monitored properties (usually involving aggregations); however, support for complete language fragments is an open problem.

C 8.8

Impact of Utility on Monitoring Complexity. The existing work on the complexity of monitoring [85, 150, 216] (called path checking in this context) only considers the problem of providing a single Boolean verdict in an offline manner. Tight complexity bounds for the online monitoring problem or other variants of the problem with different output utility (e.g., a verdict stream) have not yet been established. The impact of the different kinds of verdicts on the complexity of the resulting monitoring problem needs to be better understood.

9 Conclusion

Runtime verification techniques have traditionally been applied to software in order to monitor programs. One of the missions of the EU COST Action IC1402 (Runtime Verification beyond Monitoring) was to identify application domains where runtime verification and monitoring could be applied, and to describe the challenges that these domains would entail. This paper has explored seven selected areas of application, namely distributed systems, hybrid systems, hardware-based monitoring, security and privacy, transactional systems, contracts and policies, and monitoring large and unreliable traces. For each of these seven domains, we surveyed the state of the art, focusing on monitoring techniques in these areas, and presented some of the most important challenges (47 in total) to be addressed by the runtime verification research community in the coming years.