1 Introduction

A growing trend toward higher product variety leads to increasingly complex production systems [1]. Furthermore, global competition, sustainable product design, and digitalization intensify the competitive and time-to-market pressure on technological developments and at the same time increase the complexity of processes and products. This is caused by changing consumer behavior and technological change. Today, changes in supply and demand often have to be answered much faster and with greater versatility. These changes create a need for adaptation of products, systems, and companies with their processes. However, the increasing complexity is not only evident in the market and its participants, but also extends to other dimensions such as infrastructures, networks, and supply. A greater need for coordination must also be mastered. Managing the increased complexity thus poses challenges for the design of systems at the various levels.

One way to master the increasing uncertainty is the paradigm of resilience. In this paradigm, the considered technical system, production system, or supply-chain system is able to master, learn from, and adapt to disruptions within its lifetime that were not considered explicitly within the design process.

This approach requires a change in the engineer’s mindset, as engineers are trained to design systems and products in a deterministic process, where the definition of requirements happens at the beginning and covers only specific disruptions. This deterministic view leads to a reductive design approach, which means reducing or omitting the existing uncertainty that arises during the usage phase or within production. Traditionally, if the uncertainty in the production or usage period cannot be neglected, the system only has to respond to changing conditions in a robust way. “A robust system proves to be insensitive or only insignificantly sensitive to deviations in system properties or varying usage” [2, Glossary]. Such deviations are often compensated by imposing a safety factor, and thus oversizing, which allows the system to withstand the changing properties without any impact on the system’s functionality. However, a more sophisticated approach, referred to as Robust Design, has also been developed [3], [2, Section 3.3], [2, Section 3.5].

In contrast, the resilience paradigm augments this traditional point of view, cf. [2, Section 3.5], by accepting the fact that most systems face unforeseeable disruptions within their lifetime.

Fig. 1. Exemplary progression of the functional performance f of a system showing resilient behavior. A minimum performance is defined by \(f_{\text {min}}\). At the time \(t_{\text {pre}}\) the (severe) disruption starts, while at the time \(t_{\text {post}}\) the new performance level is reached again. The system is able to master the disruption based on an adaptation and conceivably a learning procedure. Furthermore, the system does not fall below the minimum performance \(f_{\text {min}}\). This example is adapted from the classic resilience triangle approach shown by [4] and [5].

The principal progression of a system’s functional performance over time for resilient behavior is shown in Fig. 1. The performance decreases after the onset of the (severe) disruption, but is kept above the required minimum performance \(f_{\text {min}}\). After a period of time, which often depends on the abating of the disruption, the system’s functional performance recovers at least to a certain extent.
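The quantities in Fig. 1 can be read off a sampled performance trace. The following is a minimal sketch, not one of the metrics from [5]: the function name, the parameter `f_recovered` (the level that counts as "recovered"), and the sample data are assumptions made for illustration. The onset is approximated by the first sample below `f_recovered`, which may lag the true \(t_{\text {pre}}\).

```python
def resilience_summary(times, performance, f_min, f_recovered):
    """Summarize a sampled functional-performance trace f(t).

    f_min is the required minimum performance; f_recovered is the level that
    counts as "recovered" (a hypothetical parameter chosen by the user).
    Both inputs are expected to be plain lists of equal length.
    """
    if not times or len(times) != len(performance):
        raise ValueError("times and performance must be non-empty and equally long")

    samples = list(zip(times, performance))
    # First sample at which the performance has dropped below the recovered level.
    t_drop = next((t for t, f in samples if f < f_recovered), None)
    # First later sample at which the performance is back at that level.
    t_post = None
    if t_drop is not None:
        t_post = next((t for t, f in samples if t > t_drop and f >= f_recovered), None)

    return {
        "lowest_performance": min(performance),
        "stays_above_minimum": min(performance) >= f_min,
        "t_drop": t_drop,
        "t_post": t_post,
        "recovery_time": None if t_post is None else t_post - t_drop,
    }


# Illustrative trace (made-up numbers): drop at t = 2, recovery at t = 5.
summary = resilience_summary(
    times=[0, 1, 2, 3, 4, 5],
    performance=[1.0, 1.0, 0.6, 0.5, 0.8, 1.0],
    f_min=0.4,
    f_recovered=0.9,
)
```

In this sketch a trace counts as resilient in the sense of Fig. 1 only if `stays_above_minimum` is true and a `t_post` exists.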

To derive a comprehensive understanding of the abilities the resilience paradigm can provide for a technical system, we provide a brief definition. We define a technical system in compliance with standard definitions of mechanical and mechatronic systems as shown by [6, Chapter 1]. Here, the term system describes the “totality of all elements considered” [2, Glossary]. It is delimited from the environment by its system boundary and usually consists of multiple subsystems. “Setting a system boundary defines the (...) product” [2, Glossary], which is developed. A technical system fulfills one or more predefined functions.

In the following, we only consider mechanical and mechatronic systems, which usually consist of a mechanical structure and a predefined number of actuators and sensors that are required to fulfill the predefined function. Here, we refer to single components like pumps or pistons as well as to more complex systems, e.g. load-carrying systems [2, Section 3.6]. Examples are transmission systems, industry-scale fluid distribution systems, chemical plants, or brake systems in vehicles. These technical systems distinguish themselves from socio-economic systems by being complicated rather than complex systems [7]. This reduced complexity leads to challenges in the adaptation of the resilience paradigm, since it yields less flexibility to adapt to disruptions.

In the following, we present results obtained in an interdisciplinary group from the engineering, mathematics, and psychology domains. The group developed methodologies and reference systems to apply the paradigm of resilience in the mechanical engineering domain. Subsequently, we outline a concept of resilience in load-carrying systems and derive key functionalities that each resilient system might fulfill according to our current point of view. Furthermore, we point out the challenges and potentials for a wide adoption of resilience in the engineering domain.

2 Overview of Resilience Concepts

Resilience is a paradigm widely used in different disciplines, cf. [8]. It is derived from the Latin word resilire, which can be translated as “bounce back” [9, p. 184]. This translation of the origin describes only a very small part of resilience concepts and is misleading, since systems should in general not only return to the state before the occurrence of disruptions, but also learn from the endured experiences.

An extended view that can be seen as a major step within resilience research is given by the significant contribution of Holling in 1973 [10]. Holling promoted a new understanding of resilience, which led to significant contributions in the domains of ecology, socio-ecology, and socio-technical system design.

The previously mentioned systems can be summarized under the term complex adaptive systems [11]. These systems consist of multiple agents that can act on disruptions based on their intrinsic motivation. Therefore, the system can be seen as an adaptive system. Its behavior is often non-linear, affected by agents with different goals and abilities, and often leads to unexpected outcomes. As each agent acts individually, the complexity of systems like socio-technical, ecological [10, 12], or sociological systems is far more pronounced than in mere technical systems. Nevertheless, in the practice of technical systems the borders between complex and complicated systems are fuzzy [13]. In the field of reliability research, for instance, the focus is on so-called high-reliability systems, such as nuclear power plants. These systems are extensively known but still classified as rather complex because unpredictable interdependencies can occur. Here, researchers try to design resilience as a safety paradigm. These systems are also understood as socio-technical, i.e. both the technical components and the human beings are understood as acting and reacting parts of the system. Other systems, for instance a star-shaped robot developed by Bongard and Lipson [14], can be described as complicated systems, which means that extensive influences have an impact on the system, but its behavior is theoretically ascertainable and predictable.

All systems have in common that resilience must be measured with the help of specific metrics to distinguish a more resilient system from a reference system. Therefore, research in engineering has so far mostly focused on the definition of meaningful resilience metrics. This has led to a high number of metrics proposed in the literature, as shown for instance by [15, 16]. Most of the metrics related to technical systems were developed and used for network-like structures that can be represented by a mathematical graph. Examples are water or electricity supply systems. In the graph representation, network properties like k-shortest paths [17] are considered as metrics to measure the resilience in case of rare events like component failures. In this approach, systems are mostly considered as quasi-static, and they should fulfill a predefined minimum functionality even in the event of arbitrary system failures. To derive a resilient design, the underlying graph representations are improved algorithmically or in multiple iterations, for instance by using a simulation-based approach [18].
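The idea of checking a minimum functionality under arbitrary component failures can be illustrated on a small graph. The following is a minimal sketch under stated assumptions: it uses plain reachability by breadth-first search as the functionality criterion instead of the k-shortest-paths property from [17], it fails intermediate nodes only, and the example network and all names are hypothetical. Every node must appear as a key of the adjacency dictionary.

```python
from collections import deque
from itertools import combinations


def reachable(adjacency, source, failed):
    """Return the set of nodes reachable from `source` when `failed` nodes are removed."""
    if source in failed:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def survives_failures(adjacency, source, consumers, k):
    """True if all consumers stay reachable for every set of up to k failed intermediate nodes."""
    consumers = set(consumers)
    candidates = [n for n in adjacency if n != source and n not in consumers]
    for size in range(k + 1):
        for failed in combinations(candidates, size):
            if not consumers <= reachable(adjacency, source, set(failed)):
                return False
    return True


# Hypothetical supply network: source S feeds consumer C via two parallel nodes A and B.
network = {"S": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
single_failure_ok = survives_failures(network, "S", {"C"}, k=1)
double_failure_ok = survives_failures(network, "S", {"C"}, k=2)
```

The exhaustive enumeration grows combinatorially with k, which is one reason the literature resorts to algorithmic or simulation-based improvement for larger networks.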

Furthermore, Thoma et al. [7] criticize that much of the work in the area of technical systems has so far been too conceptual. They see engineering research as having an obligation to go even further into the design of systems at all levels and to generate more concrete designs and solutions.

3 Our Approach—Definition, Resilience Functions and Metrics

In 2017, a group of roughly ten mechanical engineers and mathematicians, supported from 2019 onward by one psychologist, started to work within the Collaborative Research Center (CRC) 805 on resilience of technical systems. After looking into other scientific domains and their approaches, it became clear to us that there was a discrepancy between the complex socio-technical systems the resilience community worked on and the rather complicated systems that typical (mechanical) engineers face in their daily work. Thus, we derived a definition of resilience specifically for technical systems, [5], [2, Section 6.3]:

A resilient technical system guarantees a predetermined minimum of functional performance even in the event of disturbances and failures of system components, and a subsequent possibility of recovering.

Resilience, from our point of view, is considered as complementary to robustness approaches, which are conventionally used for designing load-carrying systems in mechanical engineering.

Especially for complicated systems, such as technical systems, resilience must already be considered within the design phase. Additionally, a resilient design of technical subsystems in combination with a resilience-considering design strategy can result in a composition of more resilient systems. Depending on the system boundary, even complex systems can then be considered within our approach.

Furthermore, relying on the work of Hollnagel [19, 20], we define resilience functions that a technical system needs to have: monitoring, responding, learning, anticipating.

In addition, we have derived a set of resilience metrics specifically for technical systems [2, 5], which allow quantifying resilience. We used those metrics to quantify the resilience of a by-wire car brake system [21], a water supply system [22], a dynamic vibration absorber [2, Section 6.3.6], a pumping system [23], a joint break [5], and a truss topology design [5].

4 Design of Resilient Technical Systems

Knowing what a resilient technical system is understood to be and how it can be evaluated, the question “How to design a resilient system” remains. In this section, we present practical implications and examples of more resilient technical system designs.

4.1 Practical Implications

Resilient technical systems cannot be viewed in isolation from conventional approaches to system design, such as the Robust Design approach. Some functions that conventionally designed systems provide, and the models they are described with, also contribute to the description and development of resilient technical systems.

Besides this, common terms in the resilience community, for instance “stress” and “shock”, cf. [24], can be transferred to the mechanical engineering domain, where they are known as disturbances and component failures.

The application of the resilience paradigm results in an integration of the product design and the product usage phase [2, Section 7.2.3].

Furthermore, resilient technical systems can handle disturbances and/or failures by applying at least the first two of the four resilience functions already introduced: monitoring, responding, learning, and anticipating, cf. [2, Section 6.3.2]. For instance, a system measures its current state and changes accordingly if it detects a deviation from the “normal” state. This is also known from fault detection and diagnosis, cf. [25].

If the system fails completely, a human intervention is usually intended, which enables the system to achieve the final desired state. Both a change of the system itself, seen as its response to the monitored data, and the intervention of a human operator require the system’s ability to (self-)adapt [26].

More resilient technical system designs also integrate a learning procedure to enable the system to learn from the endured disturbances and/or failures and from the success of measures and strategies to handle the disruption. Learning can be understood as a reduction of model and data uncertainty through permanent model identification and adaptation during the life of a product. A further property also found in the resilience community is the possibility to anticipate. Anticipation is a predictive process (and system) change with the aim of reducing uncertainty. Thus, more sophisticated control strategies, such as those known from adaptive control [27], are suitable for resilient technical systems.
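Learning as a reduction of model uncertainty through permanent model identification can be illustrated with a recursive estimator. The following is a minimal sketch and not the method used in the cited work: it applies textbook recursive least squares to an assumed scalar model y ≈ theta · x, and the function name, the initial values, and the data are hypothetical. The variance-like quantity `p` shrinks with every measurement, which mirrors the reduction of uncertainty during the product life.

```python
def rls_update(theta, p, x, y, forgetting=1.0):
    """One recursive least-squares step for the scalar model y ~ theta * x.

    theta: current parameter estimate
    p:     current (scaled) estimate variance, large when little is known
    forgetting: factor in (0, 1]; values below 1 discount old data
    """
    gain = p * x / (forgetting + x * p * x)
    theta_new = theta + gain * (y - theta * x)   # correct by the prediction error
    p_new = (p - gain * x * p) / forgetting      # uncertainty shrinks with each sample
    return theta_new, p_new


# Illustrative data generated from theta = 2 (made-up, noise-free).
theta, p = 0.0, 100.0
history = []
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    theta, p = rls_update(theta, p, x, y)
    history.append((theta, p))
```

A forgetting factor below 1 keeps `p` from collapsing to zero, so the estimate can still follow parameter changes caused by wear or a disruption; this is a common design choice in adaptive control.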

For the systematic design of systems, the general product development process according to VDI 2221 [28] can be applied to both mere robust design and more resilient design. In particular, both design methods necessitate the definition of requirements at the beginning, and the design is supposed to be suitable for disruptions due to uncertainty, whereby resilience allows mastering uncertainty to a greater extent than robustness.

Resilience design, however, requires an extension of the conventional design methods and models, as a central aspect of resilient behavior is the purposeful adaptivity of the system and a superior structure that specifies the resilience strategy for potential disruptions [26]. The models and methods for robust design are not necessarily able to describe a system’s adaptivity. Thus, we developed additional models and extensions of known models.

The resilience application model is applicable for analyzing and comparing systems according to their resilience level and properties, but also for the synthesis of resilient properties in systems, cf. [29]. It comprises the resilience characteristics, the behavior, the considered disruption, and potential correlating signals for the description of the system and influencing factors.

A central model in conventional systematic design processes is the functional structure model, cf. [30, p. 242 ff.]. The model describes systems in a determined and inflexible way in its original form. We extended the model with representations for disrupted sub-functions, redundancy, adaptivity within the system, and a superordinate resilience function structure to make it applicable for the development of resilient systems, cf. [21].

4.2 Example Systems

So far, we have presented a methodological approach to resilience of technical systems. In the following, we present three selected examples from research within the CRC 805 to outline a path towards the resilient design of technical systems.

By-Wire Car Brake System. In by-wire car brake systems, resilient approaches are already realized. This system includes a car’s braking mechanism from the brake pedal’s signal to the deceleration of the wheels and also comprises assistant systems like the anti-lock braking system. The brake system can be disturbed by a decrease of the board net voltage, which serves as the energy source for several subsystems of the car including the brake system. This scenario can, e.g., occur when the battery temperature is low and another subsystem that requires high currents, like the engine starter, is running. The resilient functionality addresses this disruption by shutting down less important subsystems, like the assistant systems, when the voltage level decreases, in order to keep up a minimum functionality and maintain the ability to brake, cf. [21]. As braking is highly safety-relevant for cars, only braking can be defined as the minimum functionality of the brake system. To be able to respond to a voltage decrease, monitoring of the voltage itself is required. For a more sophisticated resilience functionality, further parameters influencing the board net voltage, like the battery temperature, need to be detected. The monitored data could then be interpreted by the computer system, enable an anticipation of the upcoming voltage decrease, and allow the response to be initiated before a possible disruption occurs [21]. Monitoring all parameters of interest requires multiple sensors. Another subsystem of cars that supports the resilience approach, e.g. for the brake system, is the automated start-stop. Making the monitored data of the automated start-stop available to, e.g., the brake system could enable more sophisticated resilient properties with little additional effort for implementing the monitoring.
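The monitoring, responding, and anticipating steps described for the brake system can be sketched as a simple load-shedding rule. This is a minimal sketch under loudly stated assumptions and not the implementation from [21]: the subsystem names, all voltage thresholds, the temperature limit, and the expected voltage drop are hypothetical values chosen for illustration only.

```python
BRAKING = "braking"  # minimum functionality, never shed


def respond_to_voltage(voltage, shed_thresholds):
    """Return the set of subsystems kept active at the given board net voltage.

    shed_thresholds maps each less important subsystem to the voltage (in V)
    below which it is shut down. All thresholds are hypothetical.
    """
    active = {BRAKING}
    for name, threshold in shed_thresholds.items():
        if voltage >= threshold:
            active.add(name)
    return active


def anticipated_voltage(voltage, battery_temp_c, starter_requested,
                        cold_limit_c=0.0, expected_drop=2.0):
    """Crude prediction: expect a voltage drop when the starter runs on a cold battery."""
    if starter_requested and battery_temp_c < cold_limit_c:
        return voltage - expected_drop
    return voltage


# Hypothetical shutdown thresholds for two less important subsystems.
thresholds = {"anti_lock": 10.5, "comfort_functions": 11.5}

now = respond_to_voltage(12.0, thresholds)
predicted = anticipated_voltage(12.0, battery_temp_c=-10.0, starter_requested=True)
ahead_of_time = respond_to_voltage(predicted, thresholds)
```

Evaluating the same response rule on the predicted instead of the measured voltage is what turns pure responding into anticipating: the subsystems are shed before the voltage actually drops.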

Water Supply System. An optimization-based approach to design a resilient water supply system for high-rise buildings is given in [22]. To supply all levels in a high-rise building with fresh water, pumping systems are usually required. In the given example, the authors developed an algorithmic approach that considers up to three arbitrary pump failures and still derives energy- and investment-efficient designs of decentralized water supply systems that can fulfill a predefined minimum functionality, as shown in Fig. 1. They used a Mixed-Integer Nonlinear Program and derived system designs that are more energy- and cost-efficient than classically designed systems with a comparable given resilience property. Furthermore, the given approach computes a control strategy in case pump failures occur.
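The requirement that a minimum functionality holds under up to three arbitrary pump failures can be checked for a given design in a few lines. The following is a minimal sketch and explicitly not the Mixed-Integer Nonlinear Program from [22]: it assumes pumps working in parallel whose flow capacities simply add up and ignores pressure levels and hydraulics; the function names and the capacities are hypothetical.

```python
def worst_case_capacity(pump_capacities, max_failures):
    """Total capacity left if the `max_failures` largest pumps fail.

    With additive capacities, losing the largest pumps is the worst case,
    so no enumeration of failure combinations is needed.
    """
    survivors = max(len(pump_capacities) - max_failures, 0)
    return sum(sorted(pump_capacities)[:survivors])


def meets_minimum(pump_capacities, max_failures, min_flow):
    """True if the minimum flow is still delivered in the worst failure case."""
    return worst_case_capacity(pump_capacities, max_failures) >= min_flow


# Hypothetical design with five pumps (capacities in arbitrary flow units).
design = [4.0, 4.0, 3.0, 3.0, 2.0]
left_after_three = worst_case_capacity(design, max_failures=3)
```

A design optimization would vary `design` and keep only candidates for which `meets_minimum` holds while minimizing energy and investment cost.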

Pumping System. A more resilient pumping system was derived in [23] and [2, Section 6.3.8]. It uses the previously mentioned four functions of resilient systems as a starting point. For each function, one or more algorithmic approaches were developed. A subset has also been practically evaluated at the developed pumping system test rig to ensure the transfer and applicability to real systems. A specific focus was set on a system design that is complicated, yet at the same time able to improve its functional performance if previously unseen disturbance patterns occur. The underlying algorithms are based on model identification, time series analysis, and forecasting methods, which are commonly used within machine learning. These approaches can enable a more resilient system behavior, since they increase the flexibility of the system and allow it to learn from endured experiences.

5 Challenges and Potentials

In addition to the presented understanding of resilient systems and first design approaches, we also present challenges and potentials of this new paradigm.

5.1 Challenges

The realization of resilience in mechanical engineering poses a number of challenges due to the intrinsic properties of technical systems, their development, and usage.

Scope. Engineers tend to have a deterministic view. To understand a given problem set, engineers first define the system boundaries. Disruptions lying within the defined boundaries are considered while developing a solution; others are neglected. The concept of arbitrary disruptions is hard to grasp for engineers. If arbitrary disruptions are taken into account, two things can happen: i) the development is slowed down because of too many “but ifs”, ii) the system design becomes “over-engineered” and thus cost-inefficient.

Adaptivity. The engineering approach to deal with complex systems is to break them down into subsystems, making each of the subsystems less complex. The flexibility and adaptivity of these subsystems are low. Without these properties, however, the recovery of the functional performance (Fig. 1) after a disruption is hard to achieve. This applies especially to purely mechanical systems. If something breaks, it usually does not regain its initial performance level.

Methodology. Engineering science has produced a large number of methods and methodologies for product development and system design. Resilience, being a paradigm, tends to be disregarded because there is high uncertainty about how to achieve it within technical systems. So far, systems have been analyzed and synthesis approaches have been deduced on an abstract level. The system analysis showed that resilience approaches already exist in current systems, especially mechatronic systems, like the mentioned by-wire car brake system. This provides example-based guidelines for the realization of resilience [21, 29]. Yet, the systematic approaches need to be completed into a comprehensive resilience design methodology and evaluated by application to actual developments. Furthermore, the resilience design methodology requires further empirical testing.

Robustness. The distinction between robustness and resilience remains a challenge for engineers, especially when discussing specific systems. Robust Design is well known in the engineering domain, and includes many aspects of the resilience paradigm, cf. [31,32,33].

Stakeholders. The typical context in which technical systems are developed is a customer relationship. The customer defines requirements, and the supplier defines a specification of what they are able to deliver. Ideally, after negotiating, both stakeholders know what they can expect and what they have to deliver. After delivery, the specifications are either met or not fulfilled. The introduction of arbitrary disruptions into this requirements-specification domain is challenging, because it implies uncertainty for both stakeholders. Furthermore, the state of the art for production processes is to define performance measurements, which are fixed. This goes back to Henry Ford and the so-called “Austauschbau” [2, Chapter 2]. These fixed performance measurements lead to a conflict with self-adaptive systems. During further research it is important to meet those challenges to successfully establish the resilience paradigm in the engineering domain, cf. [2, Chapter 3], [2, Section 5.1.1] and [2, Chapter 7].

5.2 Potentials

The resilience paradigm offers potentials to master uncertainty for technical system designs in a rapidly changing environment. Hence, the interest in this field is growing. For instance, cities strengthen the resilience of their infrastructure [34]. In 2020, the Covid-19 pandemic exposed the vulnerability of global production and supply chains. These developments will affect technical systems as well. To increase attention to the topic in engineering use cases, the following potentials are emerging:

Flexibility. With a focus on resilience, more flexibility [2, Section 3.5] can be created for processes and products. This results from the fact that systems are no longer designed deterministically, but that changes can always be made.

New Mechanisms of Action. New systems make it possible to explore and test mechanisms that were not previously considered in conventional design.

Learning from Errors. By integrating learning as a property of the technical system, it is possible to better analyze errors and malfunctions and learn from them. This can lead to a successive improvement of the systems. In particular, highly safety-relevant systems can be addressed, because resilience enables a reduction of the risk of failure and thus an increase in the safety level.

6 Conclusion

The resilience paradigm differs from existing approaches to master uncertainty in the engineering domain. Typically, engineers try to identify uncertainty and design a system to be as robust as necessary. Today, one cannot say whether a resilient design might even result in increased performance at similar effort. Nevertheless, addressing the paradigm of resilience is an important task for engineers. They are in a position to develop technical systems for the future, a future in which there is a high demand for resilient systems due to crises such as climate change or Covid-19. However, it is important to understand the deeper implications of the resilience paradigm. This includes that there is not one but a variety of possibilities to make a system resilient and that resilient systems do not have to absorb every potential disruption; it is even more important to strengthen the system to master likely ones. Especially in specific domains, such as critical infrastructure, a resilient technical system design can be beneficial. In other technical domains, a resilient design will not be required. Therefore, the context of the technical system is important and must always be considered. Furthermore, resilience should be understood as a process and not only as an output. Keeping resilience in mind as an objective during product development can lead to solutions that have not been considered in advance. In order to approach the concept of resilience, it is therefore indispensable to have an interdisciplinary exchange.