Safe and secure system architectures for cyber-physical systems

Cyber-physical systems are at the core of our current civilization. Countless examples dominate our daily life and work, such as driverless cars that will soon master our roads, implanted medical devices that improve many lives, and industrial control systems that govern production and infrastructure. Because cyber-physical systems act on the real world, their failures can harm people, property, and the environment; safety and security are therefore essential properties of these indispensable systems. The long history of systems engineering has demonstrated that system quality properties, such as safety and security, strongly depend on the underlying system architecture: satisfactory quality properties can only be ensured if the fundamental system architecture is sound! The development of dependable cyber-physical architectures in recent years suggests that two complementary architectures are required: a design-time architecture and a run-time architecture. The design-time architecture defines and specifies all parts and relationships, assuring the required system quality properties. However, in today's complex systems, ensuring all quality properties under all operating conditions at design time is not possible. Therefore, an additional line of defense against safety accidents and security incidents is indispensable: it must be provided by the run-time architecture. The run-time architecture primarily consists of a protective shell that monitors the run-time system during operation. It detects anomalies in system behavior, interface functioning, or data, often using artificial-intelligence algorithms, and takes autonomous mitigation measures, thus attempting to prevent imminent safety accidents or security incidents before they occur. The core of this paper is the protective shell as a run-time protection mechanism for cyber-physical systems. The paper has the form of an introductory tutorial and includes focused references.


Introduction
Cyber-physical systems are computer-controlled, networked systems that interact with the physical environment, often in a control loop, some of them autonomously [1][2][3][4][5][6]. Typical examples include an autonomous car, an airplane autopilot, a heart pacemaker, or cooperating robots on a manufacturing line. Because of their impact on the real world, cyber-physical systems must be built so that they cannot harm or damage people, property, or the environment: their behavior must be safe and secure. Engineering safe and secure cyber-physical systems has become a specific, exciting, and essential engineering discipline.
Frank J. Furrer, frank.j.furrer@bluewin.ch, Faculty of Computer Science, Technical University of Dresden, Dresden, Germany

A long time ago, computers were just processing data, such as keeping accounts or managing inventory. Then they slowly started interacting with the physical world, for example, as embedded computers controlling a combustion engine or as supervisory control and data acquisition (SCADA) systems governing industrial plants. Today, computers controlling all sorts of cyber-physical systems are pervasive: we find them everywhere. They control everything from small devices, like a heart pacemaker, to large applications, such as an autonomous container ship.
A cyber-physical system receives information about the environment from sensors (temperature, wheel rotation rate, camera, radar, gyroscope, etc.) and acts on the physical environment through actuators (motors, pumps, valves, etc.). The system comprises a number of interacting control algorithms, many of them closed-loop feedback algorithms. Some of these algorithms are based on self-learning (machine learning), for example, an autonomous vehicle's video-processing software.
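The sensor-to-actuator feedback loop described above can be sketched in a few lines of code. Everything below (the plant model, the gains, the temperature scenario) is a purely illustrative assumption, not from the paper; the point is the pattern: sense, decide, act.

```python
# Illustrative closed-loop control sketch (all values hypothetical):
# a sensor reads a temperature, a simple proportional controller
# computes a heater command, and the actuator drives a toy plant model.

def plant_step(temperature: float, heater_power: float) -> float:
    """Toy plant: the heater adds heat; the room drifts toward 20 deg ambient."""
    heat_loss = 0.1 * (temperature - 20.0)
    return temperature + heater_power - heat_loss

def proportional_controller(setpoint: float, measured: float,
                            gain: float = 0.5) -> float:
    """P-controller: actuator command proportional to the control error."""
    error = setpoint - measured
    return max(0.0, gain * error)  # the heater cannot cool

def run_loop(setpoint: float, initial: float, steps: int) -> float:
    """Run the closed loop for a number of sampling steps."""
    temperature = initial
    for _ in range(steps):
        power = proportional_controller(setpoint, temperature)  # sense + decide
        temperature = plant_step(temperature, power)            # act on the world
    return temperature
```

With these toy parameters the loop settles near, but below, the setpoint (a pure P-controller leaves a steady-state error): exactly the kind of quantitative behavior that a run-time monitor can later check against a specification.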

Software
Cyber-physical systems are controlled by software; that is, most of their functionality is implemented in software. Control by software carries some risks: a failure, fault, error, or successful cyber-attack, either in the software or in the execution platform, can have grave consequences, such as safety accidents, security incidents, crashes, or casualties. In today's environment, malicious interactions, such as hacking, malware, or infiltration, can also inhibit correct operation and lead to dangerous consequences. Therefore, the quality properties of the cyber-physical system, especially safety and security, must be assured during all phases of system development, operation, and evolution [7][8][9][10][11][12].

Architecture
At the heart of a cyber-physical system is its architecture [13][14][15][16]: "Fundamental concepts or properties of an entity in its environment (= Context of surrounding things, conditions, or influences upon an entity) and governing principles for the realization and evolution of this entity and its related life cycle processes" [17]. A long, and sometimes painful, history of systems has proven that an adequate, sound architecture is indispensable [18]. The architecture provides the foundation for the efficient development and evolution of the cyber-physical system and, to a large extent, also enables its quality properties!

Drift into failure
Fortunately, most modern systems engineering processes are strongly safety- and security-aware [22][23][24][25][26]. In the majority of cases, these processes produce dependable and trustworthy systems. The organizations that build cyber-physical systems are almost always careful and diligent. Nonetheless, the press regularly reports security incidents and safety accidents. Why the discrepancy?
There are many reasons. First, the enormous complexity of today's (and even more so tomorrow's!) cyber-physical systems makes it impossible to avoid all vulnerabilities. Second, the operating environment of these systems becomes more hostile every year (higher probability of failures, greater sophistication of malicious activities). Third, market pressure demands low development and production costs. Fourth, the high rate of change often entices developers to "cut corners," that is, to reduce or skip necessary quality-assurance measures, such as modeling, reviews, verification, validation, and thorough testing. The result is an accumulation of technical debt [18,26] and architecture erosion [18,27]. This slow, hardly noticeable effect is called drift into failure [28] and constitutes a grave risk for evolving cyber-physical systems.

Last defense
As numerous examples show beyond doubt, it is not possible to eliminate all vulnerabilities from a complex cyber-physical system during development, extension, or deployment time. Unfortunately, a likelihood always exists that the system will experience a security incident or cause a safety accident during operation.
Are there mechanisms other than a very diligent development process to reduce the impact and damage of a security incident or a safety accident? Fortunately, the answer is yes: run-time monitoring [29][30][31][32][33][34]. In run-time monitoring, the system's behavior is observed and automatically checked for compliance against the desired behavior. The desired behavior is defined in policies, specifications, rules, or models. The run-time monitor attempts to identify anomalies, that is, any deviation from the desired behavior. Preferably, the run-time monitor works in real time: in this case, it can detect, inhibit, or mitigate anomalous behavior before a safety accident or a security incident occurs. The run-time monitor therefore acts as a last line of defense (Fig. 1): the systems engineering process attempts to eliminate the vulnerabilities in the system. However, a (hopefully small) number of vulnerabilities remain in the run-time system! A malicious threat or an unforeseen failure in the run-time system can thus provoke a security incident or a safety accident. If the run-time monitor works correctly and in real time, it may prevent, or at least substantially reduce the negative impact of, the security incident or the safety accident. The functionality of the run-time monitor thus forms the last line of defense of the cyber-physical system!
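As a minimal illustration of such a last line of defense, the sketch below (the names, envelope, and fallback value are assumptions for illustration, not the paper's design) intercepts each actuator command and substitutes a safe fallback when the command would leave its safety envelope:

```python
# Run-time monitor as last line of defense (illustrative sketch):
# every actuator command is checked against a safety envelope before
# it reaches the physical world; non-compliant commands are replaced.

from dataclasses import dataclass

@dataclass
class SafetyEnvelope:
    min_value: float
    max_value: float

def monitored_actuate(command: float, envelope: SafetyEnvelope,
                      safe_fallback: float) -> tuple[float, bool]:
    """Return the command actually sent and whether the monitor intervened."""
    if envelope.min_value <= command <= envelope.max_value:
        return command, False      # compliant: pass through unchanged
    return safe_fallback, True     # anomaly: intervene before damage occurs

# Example: a faulty controller requests 150% throttle; the monitor
# blocks it and commands the (assumed) safe value of zero instead.
sent, intervened = monitored_actuate(1.5, SafetyEnvelope(0.0, 1.0),
                                     safe_fallback=0.0)
```

The essential design point is that the monitor sits between the control software and the actuators, so even a compromised or faulty controller cannot directly drive the physical world outside the envelope.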

Run-time monitoring principle
"Run-time monitoring as a last line of defense" of a cyber-physical system is used increasingly in various industries (e.g., [31,35]). The principle of run-time monitoring is explained in Fig. 2: the real behavior is continuously compared with the desired behavior. The desired behavior may be defined by:
- the functional specifications, expressed in a formal, machine-readable language (e.g., [34,40,41]);
- a set of policies, expressed in a formal, machine-readable language [42];
- a set of rules, expressed in a formal, machine-readable language [32];
- structural and behavioral models, expressed in a formal, machine-readable language [43,44].
In addition, the comparison makes use of information such as operational data, log files, and the context (environment, partner systems, public information).
If a deviation of the real behavior from the desired behavior is detected, the run-time monitor takes corrective action, whenever possible in real time. Many types of corrective action are possible, all aiming to avoid or reduce the negative impact of a safety accident or security incident [12].
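A toy sketch of this principle, with hypothetical rule names, state fields, and actions (none of them taken from the paper): the desired behavior is a set of machine-readable predicates over the observed system state, and each detected violation maps to a corrective action.

```python
# Desired behavior as machine-readable rules (illustrative sketch).
# Each rule is a predicate over the observed state; violated rules
# are anomalies, and each anomaly triggers a corrective action.

RULES = {
    "speed_within_limit":   lambda s: s["speed"] <= s["speed_limit"],
    "brake_temp_plausible": lambda s: 0.0 <= s["brake_temp"] < 800.0,
}

def check_rules(state: dict) -> list[str]:
    """Return the names of all violated rules (detected anomalies)."""
    return [name for name, rule in RULES.items() if not rule(state)]

def corrective_action(violations: list[str]) -> str:
    """Map detected anomalies to a mitigation, preferably in real time."""
    if not violations:
        return "continue"
    if "speed_within_limit" in violations:
        return "reduce_speed"
    return "enter_safe_state"    # default: degrade to a safe state
```

Real rule languages are of course far more expressive (temporal logics, policy languages), but the comparison-against-specification structure is the same.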
Using run-time monitoring (often called "active run-time monitoring" because of its real-time intervention capabilities) requires two types of system architecture:
1. the design-time architecture;
2. the run-time architecture.

Design-time architecture
The design-time architecture aims to avoid as many vulnerabilities in the system as possible. This is achieved by a diligent, safety- and security-aware systems engineering process and a subsequent vulnerability-elimination process (Fig. 3; e.g., [10]).

Run-time architecture
As soon as the design-time architecture of the cyber-physical system is judged to be sufficiently safe and secure, the system is deployed, that is, transferred to its operational environment and handed over to the users (Fig. 3). Unfortunately, the run-time system may still contain vulnerabilities, which constitute a considerable risk for its usage. Therefore, an additional architectural element protects the run-time system: (active) run-time monitoring (Fig. 4). The run-time monitoring embraces the run-time system and attempts to protect it from the impact and consequences of threats and failures, whenever possible in real time. This additional layer of protection can be seen as a protective shell that enfolds the running system. The idea of a protective shell as a separate architectural element and engineering artifact was presumably introduced by Lance Eliot under the name "AI Guardian Angel Bots" for systems controlled by machine learning [36]. Here, the less exotic name protective shell is preferred [12].

Protective shell
The engineering design and the capabilities of a protective shell strongly depend on the run-time system to be protected. A generic architecture of a system with a protective shell is shown in Fig. 5. At the core of Fig. 5 is the operational cyber-physical run-time system, including its interfaces to the real world and its network connections. Enfolding the run-time system is the protective shell. The protective shell has more information at its disposal than the run-time system, drawn from additional sources, possibly even from additional hardware. Examples of additional information sources include (Fig. 5):
- operational data, log files, functional specifications, behavior models, policies, and specific rule sets;
- context information (from the environment, from other systems, from public sources, etc.);
- access to the sensors (inputs) and actuators (outputs), possibly even using additional sensors or measuring instruments;
- network usage, monitoring, and logging.
In addition to traditional techniques, such as range and rate checks on sensor values and discrepancy and plausibility checks on actuator values, the protective shell often uses artificial intelligence and machine learning to detect anomalies [37-39, 45, 46, 50]. Any detected anomaly in behavior is immediately analyzed and assessed, and corrective actions are taken. Corrective action may include stopping the system or leading it into a safe state or a safe degraded mode of operation.
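The traditional checks mentioned above can be illustrated as simple predicates. The sensor names, limits, and tolerances below are hypothetical, chosen only to show the three check types:

```python
# Traditional protective-shell checks (illustrative sketch):
# range, rate, and plausibility checks on sensor readings.

def range_check(value: float, lo: float, hi: float) -> bool:
    """Is the reading inside its physically possible range?"""
    return lo <= value <= hi

def rate_check(previous: float, current: float, max_delta: float) -> bool:
    """Did the reading change faster than physics allows between samples?"""
    return abs(current - previous) <= max_delta

def plausibility_check(wheel_speed: float, gps_speed: float,
                       tolerance: float) -> bool:
    """Do two independent measurements of the same quantity agree?"""
    return abs(wheel_speed - gps_speed) <= tolerance
```

These checks are cheap, deterministic, and easy to certify, which is why a protective shell typically layers them underneath any learned anomaly detectors.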

Emergent behavior
Most cyber-physical systems today do not consist of one single, homogeneous system but are assembled from various constituent systems, thus forming a system-of-systems (Fig. 6; [51,52]). A number of self-contained systems with specific functionality are interconnected to realize higher-level objectives. By combining the functionality of the constituent systems, superior functionality can be achieved that none of the constituent systems could provide alone. An example is the various driver-assistance systems in modern cars, such as lane keeping, distance control, electronic stability control, traffic-sign recognition, emergency braking, obstacle detection, automatic speed limiting, and airbags. Individually, they offer assistance in specific potential accident situations. However, if the functionality of these systems is combined, a much safer car results. The emergent functionality from combining obstacle recognition with automatic emergency braking and electronic stability control will prevent significantly more accidents than each of the individual systems possibly could. This desired, valuable emergent functionality is the reason why the system-of-systems is designed and built! Unfortunately, assembling a system-of-systems from its constituent systems can also generate unexpected, undesired, potentially damaging behavior. The constituent systems' interconnection may generate unexpected failure modes, unanticipated system weaknesses, or new attack surfaces.

Autonomy and machine learning
Modern cyber-physical systems exhibit a strong tendency towards autonomous behavior (e.g., [55]): such systems can change their behavior by learning from experience or in response to unanticipated situations during operation. They are characterized by computers (i.e., software) making decisions that affect the physical world, as in autonomous vehicles. In many applications, these decisions are based on machine-learning algorithms [56][57][58], such as recognizing obstacles, their trajectories, and their speeds from video, radar, or lidar images. Often, machine-learning algorithms are not based on deterministic calculations but, for example, on statistical evaluation of training data. This can introduce a high degree of uncertainty and unpredictability into the autonomous system [36,56,59], which, in turn, introduces the risk of safety accidents or security incidents. Again, anomaly detection at run-time is the last line of defense, because predicting, assessing, and mitigating all safety and security risks during the development and deployment process is hardly feasible in the context of autonomy and machine learning (Fig. 7).
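As one minimal example of run-time anomaly detection on a sensor stream, a statistical detector can flag readings that deviate strongly from recent history. This is a sketch with an assumed window size and z-score threshold, not the paper's method; production systems use far richer learned models.

```python
# Minimal statistical anomaly detector (illustrative sketch):
# flag a reading whose z-score against the recent window of
# readings exceeds a fixed threshold.

from collections import deque
from statistics import mean, stdev

class ZScoreDetector:
    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # sliding window of readings
        self.threshold = threshold

    def is_anomaly(self, value: float) -> bool:
        """Check the new reading against the window, then absorb it."""
        anomalous = False
        if len(self.history) >= 3:  # need a few samples for statistics
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.history.append(value)
        return anomalous
```

Note that this sketch also absorbs flagged values into its window; whether (and how) to do that, like the window size and threshold, is a design choice a real protective shell would treat much more carefully.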

Conclusions
A protective shell is a technique that can significantly enhance the safety and security of cyber-physical systems at run-time. It is a current, active research area, and some industries producing mission-critical cyber-physical systems are already implementing it.
However, implementing a protective shell poses several challenges:
- Using a protective shell requires a very high degree of formalization for reliable anomaly detection [47].
- Designing a protective shell to protect against damaging run-time behavior is a highly challenging engineering task.
- The protective shell consumes additional run-time resources (power, CPU, memory).
- Designing and implementing a protective shell needs highly educated engineers [48].
- The protective shell's code and data increase the system's complexity, which may generate additional failure modes and possibly also enlarge the attack surface [49].
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.