1 Introduction

In complex systems, small local deviations can escalate into unforeseeable critical disturbances (Ribeiro et al. 2011). With increasing complexity, high-impact events happen more frequently while predictions become more difficult, if not impossible (Taleb et al. 2009). In our globalized, interconnected world, most companies did not anticipate the financial crisis of 2007–2008 or the COVID-19 pandemic, for example. Complex systems are also common in manufacturing. Manufacturing systems and processes rely on the complex interplay of multiple factors related to material properties, process parameters, machines, and the human workforce (Herrera Vidal and Coronado Hernández 2021) as well as wear and environmental conditions (Bergs et al. 2020). Even for processes such as fine blanking, which has been in practice for nearly 100 years, the theoretical understanding of the interdependencies between tool, material, and process is still limited (Aravind et al. 2021). At the same time, most manufacturing companies operate in interconnected, global manufacturing networks (Lanza et al. 2019). Hence, companies are subject to numerous interdependencies with partners and to the uncertainties of external events, which renders them susceptible to unpredictable disruptions (Peukert et al. 2020).

While complexity and uncertainties pose major challenges for manufacturing systems (Lanza et al. 2019), complex biological systems thrive in volatile environments and benefit from external stressors. Biological evolution is an illustrative example: it benefits from volatility in the form of mutations and from stressors through natural selection. Bacteria, for example, develop resistance to human-made drugs within only a few months and have even adapted to benefit from industrial by-products in wastewater from polymer manufacturing (Danchin et al. 2011; Negoro et al. 1983). Biological systems have mechanisms that go beyond concepts such as resilience or robustness. A resilient system returns to its initial state as quickly as possible after the occurrence of stressors (Equihua et al. 2020). Robust systems preserve their initial state despite being exposed to stressors and volatility. In contrast, biological systems even benefit from stressors and volatility, without having to rely on predictions. As unpredictable events are inevitable in the complex technological systems present in manufacturing, it seems logical to adopt this trait from biology and to consider unpredictable events as a potential source of improvement rather than something strictly negative. Coming from research in risk management and applied probability, Taleb coined the term “antifragility” to describe this phenomenon (see Fig. 1) (Taleb 2013). Adopting the concept of antifragility promises a potential solution to address the complexity and uncertainties present in manufacturing.

Fig. 1 Definition of fragility, resilience, robustness and antifragility (block diagram pairing each of the four terms with its description)

The idea of applying mechanisms found in biology to complex technical systems is not new. However, most of the existing manufacturing literature focuses on survivability and fault prevention. Antifragility goes further by emphasizing how to gain from stressors and how to “love” errors. Hence, it provides a strategy to thrive in a complex, unpredictable environment (Taleb 2013). With the intent to find inspiration for the development of antifragile manufacturing systems, this article surveys existing examples, which are or may be considered antifragile, from multiple research fields. More concretely, the survey is focused on publications from biology, biotechnology, risk management as well as software engineering representing another domain with complex, technological systems. Besides, examples indicating antifragile-like traits from the field of manufacturing itself are presented. Furthermore, a framework for antifragile manufacturing is derived.

We favor the concept of antifragility over resilience because antifragility exceeds the resilient ability to respond to stress by resisting damage and recovering, i.e., reverting to the previous state. An antifragile system, in contrast, learns from stressors and improves. Antifragile systems are designed to facilitate the use of unexpected events as a source of information for a targeted transformation. Thus, we want to contribute our ideas of antifragility to an advanced Aachen Model for Transformation Research as an original contribution to transformation.

The remainder of this article is organized as follows: Sect. 2 introduces concepts that manufacturing already adapted from biological systems to overcome challenges associated with complexity and uncertainty. Section 3 highlights existing examples of antifragility from the aforementioned domains of biology, biotechnology, software engineering, risk management, and manufacturing. In Sect. 4, the framework to establish antifragility in manufacturing is proposed. Section 5 addresses challenges of antifragile manufacturing for future research, followed by a conclusion in Sect. 6.

2 Biologically Inspired Approaches Addressing Uncertainty and Unforeseeable Disturbances in Manufacturing

Resilience is frequently discussed in different academic fields as a desirable property for complex systems to deal with unforeseeable shocks and disturbances. Asokan et al. (2017) suggest that resilience is enabled through flexibility. In biological systems, flexibility is facilitated by many-to-one and one-to-many mappings between components and functions. Asokan et al. note that multi-functionality and overlapping functions of components likewise facilitate the flexibility and resilience of manufacturing systems. However, in engineering, resilience typically aims at returning to a previous state, whereas biological evolution emphasizes a more transformative form of resilience (Asokan et al. 2017).

Self-organization is another concept that is common in biological systems (Camazine et al. 2001) and that many researchers discussed as a solution to complexity and uncertainty in manufacturing. In a self-organizing system, the global behavior solely emerges from interactions of lower-level components, which act based on local information and a set of rules. Zhang et al. (2017) introduce two concepts, which are prevalent in the literature dealing with self-organization in manufacturing: holons and multi-agent systems. The philosopher Arthur Koestler coined the term “holon” to describe the organization of biological and social systems. A holon represents a basic unit of a system, which is in itself an autonomous whole, but also part of something, for instance another holon (Babiceanu and Chen 2006). Multi-agent systems aim to implement a distributed intelligence that is composed of multiple autonomous (software) agents. Zhang et al. propose to use agents as representations of physical entities, such as products, workers or machines, and subsequently aggregate different agents to functional modules, for instance for job scheduling or material transportation, based on the philosophy of a holonic organization (Zhang et al. 2017).
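The part-whole idea behind holons, and the aggregation of agents into functional modules, can be illustrated with a small data structure. The following Python sketch is not taken from Zhang et al. (2017): the class name Holon, its fields and the example skills are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Holon:
    """Hypothetical holon: an autonomous unit that may also contain other holons."""
    name: str
    skills: Set[str] = field(default_factory=set)       # what this unit can do itself
    parts: List["Holon"] = field(default_factory=list)  # lower-level holons it aggregates

    def capabilities(self) -> Set[str]:
        """Own skills plus everything offered by contained holons."""
        caps = set(self.skills)
        for part in self.parts:
            caps |= part.capabilities()
        return caps

    def find_provider(self, skill: str) -> Optional["Holon"]:
        """Return the first holon in this sub-hierarchy that offers the skill."""
        if skill in self.skills:
            return self
        for part in self.parts:
            provider = part.find_provider(skill)
            if provider is not None:
                return provider
        return None


# Agents representing physical entities (illustrative names only).
machine = Holon("milling_machine", {"milling"})
worker = Holon("operator", {"inspection"})
agv = Holon("agv", {"transport"})

# Functional modules aggregate agents; the shop floor aggregates modules.
production = Holon("production_module", parts=[machine, worker])
logistics = Holon("logistics_module", parts=[agv])
shop_floor = Holon("shop_floor", {"scheduling"}, parts=[production, logistics])

print(sorted(shop_floor.capabilities()))
print(shop_floor.find_provider("transport").name)
```

Each holon answers requests for itself first and only then delegates to its parts, which mirrors the notion of a unit that is both a whole and a part. A real multi-agent system would add communication and negotiation between agents, which this sketch deliberately leaves out.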

Bionic manufacturing systems (BMS) represent an alternative concept which is similar to holonic manufacturing systems (Tang et al. 2020), sharing the idea of decentralized, autonomous units and a focus on adaptivity. BMS have a hierarchical structure inspired by life forms being ordered in a hierarchy of cells, organs, lives and populations. Single autonomous production units correspond to cells. Based on the “DNA” of tasks, cells are combined through self-organization to provide the required manufacturing functions. In order to prevent conflicts between cells, the system is extended by coordinating units (analogous to enzymes in biological systems) (Tharumarajah 1996).

Lee et al. (2011) introduce the idea of engineering immune systems. Inspired by biological immune systems and the human nervous system, the concept envisions endowing manufacturing systems with self-maintenance capabilities allowing to survive in complex and uncertain environments. Lee et al. propose multi-agent systems to model the required functionalities of the engineering immune system, which comprise health assessment and prognosis, (maintenance) task planning and task execution. Darmoul et al. (2013) also present a framework inspired by biological immune systems to deal with disruptions in manufacturing. In analogy to biological immune systems, their blueprint for an artificial immune system is composed of artificial counterparts of cells, tissue, immune cells, pathogens, antigen presenting cells, B-Cells, Th-cells and memory cells (again implemented through multi-agent systems). These different components mirror mechanisms of biological immune systems to detect and classify abnormalities, assess consequences, derive and coordinate counter-measures and memorize successful reaction strategies in a decentralized way. Tang et al. (2020) propose a control model, which is inspired by the interplay of nervous system, endocrine system and immune system in nature. A main difference to related approaches is that it acknowledges the central nervous system as a centralized control unit. Hence, the model from Tang et al. includes a centralized shop floor controller mimicking the nervous system, which supervises and imposes constraints to distributed, cooperative and autonomous units.

Barbosa et al. (2011) hypothesize that mechanisms found in biological systems provide further inspiration to optimize multi-agent systems in manufacturing. More precisely, they discuss swarm optimization algorithms as well as mimicking pheromone-based communication of social insects (called stigmergy). In another paper Leitão and Barbosa (2010) survey bioinspired methods, such as self-organization as well as optimization algorithms, inspired by swarm intelligence and evolutionary theory, and their applications to engineering problems in complex, adaptive manufacturing systems.

Neves and Barata discuss evolvable production systems (EPS) as an approach to cope with unpredictable events, particularly in the context of assembly companies (Neves and Barata 2009). EPS are related to holonic manufacturing systems and bionic manufacturing systems and also based on hierarchically organized, autonomous modules (Ribeiro et al. 2011). However, according to Neves and Barata, EPS have a more dynamic notion (Neves and Barata 2009). Evolvable production systems are capable of adaption as a short-term response to opportunities or disturbances, but also of evolution in the long term. While a system adapts for instance by changing its behavior (e.g., through self-organization capabilities), evolution comprises a gradual introduction of new features (Ribeiro et al. 2011). The evolvability of complex parts (e.g., a production line or cell) is enabled through high flexibility on low-complexity levels (e.g., single devices within the system) (Neves and Barata 2009). Sufficient descriptions of modules in terms of required space, mechanical aspects, electrical specifications, control aspects, communication interfaces, etc., are prerequisites to replace, re-configure or expand modules (Hofmann 2010). From a technical perspective, ontologies (Parreiras 2012) and software agents provide tools to facilitate module interoperability (Neves and Barata 2009) and hence system reconfigurability.

In conclusion, the existing literature offers many examples of biologically inspired approaches to deal with unforeseeable disturbances in complex manufacturing environments. Typically, publications focus on recovery from shocks. Evolvable production systems are a notable exception as they also lay a foundation to gradually improve rather than just recover. As elaborated in the following section, the philosophy of antifragility goes beyond these concepts. Antifragile systems do not only benefit from volatility and disturbances, “the antifragile loves randomness and uncertainty, which also means—crucially—a love of errors” (Taleb 2013). This love of errors, i.e., the emphasis on exploiting errors and stressors constitutes the difference to existing approaches, which emphasize avoidance or compensation of errors.

3 Antifragility

As mentioned before, antifragile systems go beyond resilience and robustness, in that they benefit from volatility and stressors. In manufacturing, volatility may for instance arise in the form of fluctuations in material properties that occur despite unchanged material specifications (Harsch et al. 2018). A stressor is a source of harm, e.g., a disturbance in the supply chain in consequence of the Suez Canal blockage in 2021 (Yee and Glanz 2021). Mathematically, antifragility can be described by the probability distribution of positive (“gains”) and negative effects (“losses”) on the system resulting from volatility or undesirable events. In a robust system, the effects of stressors are very likely to be small. Even in the presence of unlikely, unforeseeable events, a robust system remains mainly unchanged. This results in a narrow probability distribution of effects on the system (see Fig. 2c). Similarly, a resilient system behavior leads to a narrow probability distribution as resilient systems return to their original state after being exposed to stressors. A system that returns to its original state neither improves (gains) nor deteriorates (loses). In fragile systems, the consequences of most events that occur have small effects, however some rare events can lead to extreme negative effects, possibly resulting in an irreversible loss of function of the system. Therefore, the distribution of effects on the system has a “heavy left tail”, i.e., there is a certain (low) probability of events leading to a high loss for the system. While fragile systems may by chance improve from unforeseen events, they are always characterized by this “heavy left tail” (see Fig. 2a, b). In contrast, negative effects in antifragile systems are limited (thin left tail of the distribution), while positive effects can potentially be large (fat right tail) (Equihua et al. 2020; Taleb 2013).

Fig. 2 Exemplary temporal sequence, probability distribution of positive and negative effects and asymmetries in payoff functions in fragile, robust and antifragile systems, respectively [based on (Taleb and Douady 2012; Aven 2015)]. Panels a and b: fragile (heavy left tail, concave payoff); panel c: robust or resilient (narrow distribution); panel d: antifragile (thin left tail, fat right tail, convex payoff)

The bottom row of Fig. 2 depicts an alternative way of describing antifragility mathematically. In fragile systems, varying the size of a stressor or event (denoted as x in the bottom row of Fig. 2) may lead to high, potentially fatal losses. Mathematically this corresponds to the “payoff” of the system responding concavely in the loss domain because of variations of x (see Fig. 2a, b). Here, the term “payoff” denotes the gains or losses a system experiences, when x is varied. In case of a convex payoff function (see Fig. 2d), losses are limited and higher advantages are to be expected, if x is volatile. Accordingly, antifragility is defined as a convex response to volatility or disturbance variables (for a defined range of variation). Robust systems (see Fig. 2c) remain almost unchanged to variations in x.
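The effect of convexity on the expected payoff can be made tangible with a small simulation. The sketch below is an illustration under stated assumptions, not a model from the cited literature: the two payoff functions, the cap values of 1 and −1, and the normally distributed stressor x with mean zero are all hypothetical choices. Because the stressor averages to zero, any difference in the mean payoff stems only from the curvature of the payoff function (Jensen's inequality).

```python
import random
import statistics


def fragile_payoff(x):
    """Concave payoff (assumed shape): gains are capped at 1, losses are unbounded."""
    return min(x, 1.0)


def antifragile_payoff(x):
    """Convex payoff (assumed shape): losses are capped at -1, gains are unbounded."""
    return max(x, -1.0)


def mean_payoff(payoff, volatility, samples=100_000, seed=0):
    """Average payoff when the stressor x is drawn from a normal distribution
    with mean 0 and standard deviation `volatility`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return statistics.fmean(
        payoff(rng.gauss(0.0, volatility)) for _ in range(samples)
    )


for volatility in (0.5, 1.0, 2.0):
    fragile = mean_payoff(fragile_payoff, volatility)
    antifragile = mean_payoff(antifragile_payoff, volatility)
    print(f"volatility={volatility}: fragile={fragile:.3f}, antifragile={antifragile:.3f}")
```

With these assumptions, the mean payoff of the convex function rises as the volatility grows, while the mean payoff of the concave function falls, although the average stressor is the same in every run.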

Very unlikely or unexpected critical events cannot be predicted reliably. However, the previously described distinction between fragility and antifragility, based on concavity and convexity respectively, makes it possible to detect the fragility or antifragility of a system heuristically (Taleb and Douady 2012). If a system reacts convexly to volatility, predictions become obsolete, since downsides are limited while there is a chance that upsides will significantly outweigh the downsides.
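One simple way to probe the curvature of a payoff function is a second difference around an operating point. The following Python sketch is a simplified local probe in the spirit of such a heuristic and not the exact formulation of Taleb and Douady (2012); the function names, the tolerance and the example payoff functions are assumptions made for illustration.

```python
def second_difference(payoff, x, delta):
    """f(x + delta) + f(x - delta) - 2 f(x).

    Positive values indicate a locally convex response,
    negative values a locally concave response.
    """
    return payoff(x + delta) + payoff(x - delta) - 2.0 * payoff(x)


def classify_response(payoff, x, delta, tolerance=1e-9):
    """Label the response to a perturbation of size delta around x."""
    curvature = second_difference(payoff, x, delta)
    if curvature > tolerance:
        return "antifragile-like (convex)"
    if curvature < -tolerance:
        return "fragile-like (concave)"
    return "neutral (linear)"


# Hypothetical payoff functions used only to exercise the probe.
print(classify_response(lambda x: x ** 2, x=1.0, delta=0.5))
print(classify_response(lambda x: -(x ** 2), x=1.0, delta=0.5))
print(classify_response(lambda x: 3.0 * x + 1.0, x=1.0, delta=0.5))
```

The probe only needs three evaluations of the payoff and no forecast of when or how often a perturbation occurs. Its result is valid only for the chosen operating point and perturbation size, so in practice it would have to be repeated over the range of variation of interest.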

To detect whether the payoff of a system is convex, a more specific and system-dependent definition of “gains” and “losses” as well as the stressor is required. For example, in the financial markets, “gains” might be actual monetary returns while the market volatility represents the stressor. In that example, a small stock investment could be considered antifragile, since the loss is limited to the initial investment, while the stock price could surge significantly higher than the potential losses. In the case of a manufacturing process, “gains” could, e.g., be expressed in terms of reduced costs, reduced makespan or improved quality, while varying material properties represent a stressor.

The mathematical perspectives on fragility, robustness, resilience and antifragility illustrated in Fig. 2 are idealized and will typically not hold outside of a certain scope. For instance, the muscular system of a human might benefit from external stress (e.g., through exercising) and hence could be considered antifragile. However, if the stress exceeds a certain limit, it will result in injuries (i.e., fragility) rather than muscle growth.

In the following subsections examples of antifragile behavior are surveyed. The examples are divided into four different domains. First, examples from biology are presented (Sect. 3.1). Following, already existing examples of antifragility in human-made systems from the domains of software engineering (Sect. 3.2) and risk management (Sect. 3.3) are discussed. Finally, examples from manufacturing, which demonstrate antifragile behavior, are described in Sect. 3.4.

3.1 Antifragility in Biological Systems

Many examples of (potentially) antifragile systems are present in nature. Biological evolution is a result of novelty creation in response to various stress factors. Living beings and ecosystems generate diversity through relatively low-impact incidents, which yields tremendous benefits in case of extreme events by improving the chance of survival (Negoro et al. 1983). Organisms can adapt to environmental changes by modifying metabolic pathways and by down- or up-regulating particular functions. In general, living organisms are characterized by an excess of functional diversity and genetic variation. Polyextremophiles are especially successful in surviving a broad range of environmental conditions. The term denotes organisms that thrive in the face of more than one extreme (e.g., extreme temperature and radiation). They are characterized by a high degree of phenotypic plasticity. One example of such polyextremotolerant organisms is the black fungus Aureobasidium pullulans. Black fungi evolved numerous protective pathways (i.e., production of melanin or polyphosphates) that help them endure environmental events that are mostly unbearable for other species. Capable of surviving catastrophic events, they take advantage of the resources freed after the elimination of organisms that could not survive. Even under very stable and uniform laboratory conditions, the morphology of black fungi differs significantly. In addition, they can form facultative associations with other species to benefit from oligotrophic conditions. These features of black fungi, as well as the given examples of tinkering, lead Grube et al. to call them antifragile (Grube et al. 2013).

Here it is important to mention the term evolvability, i.e., the ability of biological systems to produce phenotypic diversity that is both heritable and adaptive (Kirschner and Gerhart 1998; Payne and Wagner 2019). Evolvability and its delicate interplay with robustness are important prerequisites of evolution and natural antifragility (Kim et al. 2020). In recent years, much progress has been achieved in understanding the molecular basis of evolvability, such as mechanisms of phenotypic diversity generation, robustness in genetic systems and adaptive landscape topography. Phenotypic variation is caused by stochastic gene expression, errors in protein synthesis, protein promiscuity and epigenetic modifications. The recent review by Payne and Wagner elaborates on this topic in detail (Payne and Wagner 2019). The development of antibiotic resistance in bacteria is an example of antifragility provided by stochastic gene expression and complex regulatory mechanisms involved in stress-response reactions and smart genetic information management (Levin-Reisman et al. 2017; Lewis and Shan 2017; Wencewicz 2019). The same principles apply when bacteria evolve to catabolize xenobiotic substrates. In this case, the presence of unknown substances activates a stress response that results in the release of reactive oxygen species, provoking mutations that can potentially lead to the acquisition of novel functions (Akkaya et al. 2018; Händel et al. 2015; Lorenzo 2014). Evolutionary principles are also used in directed evolution, a method for protein engineering (Bornscheuer et al. 2019).

Antifragile dynamics are present at all levels of the organization of life. On the molecular level, the antifragility of biological systems is provided by the redundancy of the genetic code and a highly resilient natural protein sequence space, which allow evolvability and sufficient functional stability to ensure the heredity of beneficial changes. On the cell level, complex metabolic pathways regulate the response of individual cells to changes in the environment by tuning the expression of enzymes, defining cell differentiation routes and influencing the cell cycle. Cells that evolved in environments with more perturbations demonstrate antifragile dynamics. Here, CD4+ T-cells may serve as a good example of highly dynamic antifragile systems: they react to the presence of pathogens in the body and signal other immune cells to initiate an immune response in accordance with the nature of the infectious agents. A sub-population of CD4+ T-cells differentiates into memory cells that contribute significantly to the antifragile behavior of the immune system. Muscle tissue is a very illustrative example of antifragility on the tissue level. In response to the stress caused by excessive exercise, muscles grow and become more enduring. Muscle development is regulated by a complex cell signaling network that reacts on the molecular level to the type and intensity of training. For instance, resistance exercise results in the growth of muscle mass, whereas endurance exercise promotes an increase in capillary density, mitochondrial protein, oxidation enzymes, and more metabolically efficient forms of actin and myosin (Keller et al. 2011; Nader 2006). Similarly, antifragility is observed on the organ and system level. Staying with physical training as the stress event, the heart and the cardiovascular system serve as further examples. Aerobic exercise leads to an enlargement of heart dimensions and to increases in blood volume, the number of microcirculatory vessels and oxygen delivery to muscles. Such changes improve the health of the individual organism in general (Hellsten and Nyberg 2011). Furthermore, antifragility can easily be observed on the population level, where it is mostly ensured by diversity. Next are the ecosystem and biosphere levels, the most complex and diverse systems, which benefit from environmental variability and go beyond robustness and resilience (Equihua et al. 2020). In this hierarchical organization of life, antifragility on higher levels of organization is ensured by a shared set of underlying processes and phenomena on the lower levels.

3.2 Antifragility in Software Engineering

In software engineering, it is well known that bugs lead to errors. To avoid them, debugging is a steady task for software engineers. However, because software systems have become too complex to fix all bugs in a satisfactory way, so-called “failure self-injections” are used to test the “error-recovery capabilities” of software systems (Monperrus 2017). In this sense, software engineers operate in an antifragile rather than a resilient manner, since the steady exposure to errors is intended to improve the performance of Internet-based and distributed systems (Monperrus 2017; Basiri et al. 2016). Hence, programmers emphasize the design and execution of such stress tests over the building of the software. A famous example is Netflix’s “Simian Army”, a group of self-injected failures that harm the software in order to train its ability to keep running despite the ongoing occurrence of errors (Tseitlin 2013). Since these errors are neither known nor foreseeable in their impact, software designers refer to this method as “chaos engineering” (Basiri et al. 2016). In the context of monitoring such systems, this double non-knowledge is also called “unknown unknowns” (Fighel 2017; Kim 2012). Especially with the aid of machine learning, engineers hope to gain new insights into the behavior of complex systems. De Florio (2014) argues in this context that the combination of elasticity in testing options and resilience in counterbalancing shocks defines antifragility in machine learning. Baruwal Chhetri et al. (2019) hypothesize that reinforcement learning offers a potential solution to reduce unknown unknowns through the exploration of (yet unseen) variations of known events. Additionally, there is a psychological aspect of antifragility for software engineers. According to Russo and Ciancarini (2016) in their “antifragile software manifesto”, programmers should no longer look for bugs but develop an “error-loving” attitude, since errors are the “primary source” in antifragility.
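The principle of failure self-injection can be sketched in a few lines. The example below is a toy illustration and does not reproduce Netflix's tooling or any chaos engineering framework: the wrapper with_failure_injection, the exception InjectedFailure, the retry helper and the price lookup are hypothetical names, and the failure rate of 30% is an arbitrary assumption.

```python
import random


class InjectedFailure(Exception):
    """Raised by the fault injector instead of calling the real function."""


def with_failure_injection(func, failure_rate, rng):
    """Wrap func so that a fraction of calls fails on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure("injected fault")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, *args, attempts=3):
    """Error-recovery mechanism under test: retry a failing call."""
    last_error = None
    for _ in range(attempts):
        try:
            return func(*args)
        except InjectedFailure as err:
            last_error = err
    raise last_error


def lookup_price(item):
    """Stand-in for a real service call (hypothetical data)."""
    return {"bolt": 0.10, "nut": 0.05}[item]


def survival_rate(attempts, calls=1000, failure_rate=0.3, seed=42):
    """Share of requests that succeed despite the injected failures."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    flaky_lookup = with_failure_injection(lookup_price, failure_rate, rng)
    succeeded = 0
    for _ in range(calls):
        try:
            call_with_retry(flaky_lookup, "bolt", attempts=attempts)
            succeeded += 1
        except InjectedFailure:
            pass
    return succeeded / calls


print("without retry:", survival_rate(attempts=1))
print("with retry:   ", survival_rate(attempts=3))
```

Running the experiment with and without the retry logic shows how deliberately injected faults expose the quality of the recovery mechanism before an unplanned failure does. The learning step, i.e., improving the system based on what the injected failures reveal, remains the task of the engineers.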

3.3 Antifragility in Risk Management

Contrary to risk assessment in economics, antifragile systems do not imply a concept of risk that depends on psychological notions such as subjective preferences or risk aversion (Rothschild and Stiglitz 1971). Instead, in a mathematical sense, risk can be applied to all systems as a “heuristic” to detect the fragility of systems described by nonlinear functions (Taleb and Douady 2012). This “detection heuristic”, as Taleb and Douady phrased it, should help to map so-called threshold values in the function beyond which the loss stays low while the profit can grow exponentially (Taleb and Douady 2012; Derbyshire and Wright 2014) (see also the convex function in Fig. 2). Consequently, this notion of risk refers neither to a known value of a probability distribution nor to a moderate risk value based on scenario techniques with their emphasis on causation (Derbyshire and Wright 2014; Knight 1921). Instead, risk is defined as a steady exposure to unknown events until the system withstands the external stressors and starts to improve its performance. Aven (2015) argues that there are no antifragile systems, only antifragile heels in a system’s behavior. Risk assessment should therefore focus more on the description of key concepts like resilience and antifragility than on probability numbers of rare events (ibid.). However, Johnson and Gheorghe (2013) used antifragility as an empirical category next to robustness and fragility to measure the performance of a simulated American smart grid power system. In their simulation, they examine ten categories of antifragility on a scale of −10 to +10. In their analysis, risk is proportional to efficiency: the more redundant procedures the system develops, the less efficient it is, but also the less fragile in the long run. Additionally, more efficiency requires more resources, which increases the risk probability (Johnson and Gheorghe 2013). Regarding emergency scenarios in urban planning, Hespanhol likewise describes risk as a factor that can be minimized over time by “cultivating redundancy of resources” (Hespanhol 2017). Through digital technologies such as smartphone apps and public campaigns for the preparedness of citizens, unforeseeable shocks lose their impact and become iterative risks of small shocks to the community. In summary, antifragile risk management refers neither to psychological categories nor to methods of probability calculation. It comprises a strategy to expose a system to harmful events while developing redundancy in the way the system uses its resources to overcome the shocks and improve in the long run.

3.4 Antifragility in Manufacturing

Within the field of manufacturing, antifragility is still largely unexplored. One potential explanation is that it is more difficult to cap downsides in manufacturing than, for example, in software engineering (cf. Sect. 3.2), where it is possible to roll back to a backup of the software if a critical failure occurs. This allows software engineers to embrace stressors in order to test and advance their systems while keeping risks limited. If a machine on a shop floor is damaged or scrap is produced, a manufacturing company cannot load a backup to return to a previous state. Thus, the hurdles to investigating the concept of antifragility in manufacturing seem to be higher. Moreover, the manufacturing domain has put a strong emphasis on efficiency and reducing volatility (see, e.g., lean manufacturing). In contrast, antifragility embraces volatility and accepts short-term inefficiency, for example in the form of redundancies that absorb shocks before the system learns from them to benefit in the long run.

So far, only a few production-related publications explicitly address the subject of antifragility. For example, Mothes (2015) acknowledges potential benefits of antifragility for manufacturing companies. He proposes modular production systems enabling flexible reactions to market fluctuations. However, as Mothes notes himself, antifragility is more than just flexibility. Derbyshire and Wright (2014) argue that excess stock inventory is antifragile to market volatility. If a crisis causes a shortage of an important material, its price will surge. Thus, manufacturers that held a buffer for that material will gain from the stressor (i.e., the crisis).

The literature additionally provides examples, which are at least related to antifragility in manufacturing. Although evolvable production systems (see Sect. 2) do not emphasize stressors or randomness as being desirable, the concept of (biological) evolution is inherently antifragile (see Sect. 3.1). Strain hardening describes the increasing strength and hardness of polymers and metals caused by distortion of the material or more precisely of its crystalline structure (Gooch and Gooch 2007; Manutchehr-Danai and Manutchehr-Danai 2009) and can be seen as antifragile reaction to distortion. Moreover, the semiconductor industry found ways to benefit from uncontrollable random variations in manufacturing. For instance, so-called physical unclonable functions (PUF) exploit (physical) variations in semiconductors to generate cryptographic keys (Shen et al. 2016; Yanambaka et al. 2018). The security of PUF gains from the randomness of the manufacturing process. Raghunathan et al. (2013) propose an algorithm, which exploits core-to-core variations in multi-core chips. Their algorithm cherry picks the ideal subset of cores from a chip based on the characteristics of a given application. The semiconductor industry has created options from variations.

The so-called pulsed laser-assisted wire-based laser metal deposition (LMD-w) is a manufacturing process showing potentially antifragile behavior. LMD-w is a process by which a wire-based feedstock material is cladded on a substrate or semi-finished product by applying laser energy (DVS 2011; Ngo et al. 2018). The main applications are part functionalization, hybrid workpieces and repair. In contrast to powder use, wire-based processes offer a material efficiency of almost 100% (Bambach et al. 2018). The wire is easy to handle and causes less harmful effects on human health (Kaierle et al. 2012). Its production is less cost-intensive than powder fabrication (Abioye et al. 2013). Despite these advantages of wire use, most of today’s established industrial LMD processes are powder-based. This is due to the comparatively low stability of LMD-w processes, where the stable process window is small (Abioye et al. 2013; Gipperich et al. 2021). Even weak variations of the process conditions can lead to process interruptions and significant defects. The instability of LMD-w processes is mainly caused by the complex melt pool dynamics of laser-based processes (Arrizubieta et al. 2017). The complexity of forces and interactions is even increased in the case of LMD-w, where the solid wire is connected to the liquid melt pool.

One process variant to increase LMD-w stability consists in adding a pulsed wave (pw) laser to the continuous wave (cw) process laser beam. The pw power is low (2–5% of cw power), but the process dynamics are strongly influenced by the second laser (Gipperich et al. 2020). In this pulsed laser-assisted LMD-w, the pw laser can be identified as a stressor. During the process, it evaporates part of the melt pool material, which results in the formation of a vapor cloud. The vapor interacts both with the laser beams (changing the effective absorption coefficient) and with the melt pool. As the evaporation is accompanied by a material expansion, a force acting on the melt pool is created (Bergs et al. 2019). As a consequence, the melt pool shape and thereby the resulting welding bead cross section are altered compared with a conventional LMD-w process. Moreover, the repeated evaporation by single pw laser pulses imposes small periodic oscillations on the melt pool. The irregular melt pool movements with higher amplitude that occur in conventional LMD-w are suppressed. Consequently, a reduction of the bead’s surface roughness is observed in some ranges of the pw parameters (Gipperich et al. 2022). Recent studies show that the combination of modified absorption behavior and the change of melt pool shape and dynamics contributes to an overall process stabilization and an improvement of the part quality (Gipperich et al. 2020, 2021, 2022). As the process benefits from the pw laser as a stressor, these observations can be identified as antifragile-like behavior. Future work has to investigate how the concept of antifragility can be used to further increase the understanding and stability of pulsed laser-assisted LMD-w processes.

4 Framework for Antifragility in Manufacturing

The examples discussed in Sect. 3 show that absorbing negative effects will not be sufficient to achieve antifragile manufacturing. Antifragility also requires exploiting stressors, e.g., through learning, selective pressure or optionality. Therefore, manufacturers have to implement mechanisms that generate upsides from shocks and volatility. In this regard, properties such as resilience and robustness are a necessary component of antifragile manufacturing systems, as they enable the systems to cap downsides and survive shocks and volatility while exploring the upsides associated with antifragility in the long run. Finally, antifragility requires the gains to outweigh the potential losses (while preventing fatal losses). Therefore, means are necessary to detect and monitor fragility and antifragility, respectively, in order to balance robustness and resilience against the “love” of errors. In summary, three main components jointly contribute to achieving antifragility: (1) monitoring and detecting (anti)fragility, (2) increasing robustness and/or resilience, and (3) exploiting stressors. The antifragility examples presented in Sect. 3 rely on principles that provide potential building blocks to implement these three main components. In the following, key principles from the different domains discussed in Sect. 3 are briefly highlighted before they are mapped onto the three aforementioned main components and elaborated in more detail in Sects. 4.1–4.4.

Publications from the field of software engineering stress a mindset of “loving errors” and even propose self-injection of errors to learn and improve software systems. The learning process is based on trial and error and is exploratory in nature, aiming to uncover “unknown unknowns”. Resilience is a necessary prerequisite of antifragile software systems, as it allows them to absorb shocks and enables learning in the first place.

In manufacturing, flexibility and optionality already contribute to a more antifragile behavior. Having multiple options and flexibility, manufacturers increase their chances to successfully react to disturbances as well as opportunities. Another principle is redundancy, which enables a system to absorb shocks, but also provides opportunities in volatile environments (as in the case of buffer inventory). Evolvable production systems envision evolvability in manufacturing through adaptivity of low-level system components and flexibility to replace components and reorganize.

In the field of biotechnology and synthetic biology, evolutionary approaches are often applied to generate required properties on the organism or molecular level. Directed evolution (Nobel Prize in Chemistry 2018), which mimics the process of natural selection, has become a widely accepted and broadly applied method for protein engineering (Bornscheuer et al. 2019). A directed protein evolution experiment comprises two main steps: generating diverse mutant libraries and screening for improved protein variants. Thus, in a directed evolution campaign, protein properties are improved by generating genetic diversity through mutagenesis, followed by selection of better variants under “shock” conditions (e.g., high temperatures, the presence of unusual solvents or extreme salt concentrations) (Bornscheuer et al. 2019). Consequently, the performance of the proteins is improved as a result of selective pressure in an accelerated laboratory evolution format (Markel et al. 2020). The quality of a mutant library is decisive for the success of a directed evolution experiment, and many methods have been developed for generating diversity at the gene level. These random mutagenesis methods (e.g., error-prone PCR (epPCR), SeSaM) differ significantly in their mutational spectra and mutation frequencies and are affected differently by the redundancy of the genetic code. From a theoretical perspective, the experimental finding of improved variants is highly surprising when the astronomical size of the protein sequence space (20^400 ≈ 10^520 for a peptide with 400 amino acids) is taken into account. On the molecular level, protein sequence space can be considered antifragile, as it offers a high degree of sequence diversity and “evolvability” that helps to withstand “shocking events” and unusual environments. Recently, several approaches toward continuous directed evolution were reported (Badran and Liu 2015; d’Oelsnitz and Ellington 2018; Hubbard et al. 2015; Morrison et al. 2020; Wang et al. 2018). These techniques allow performing many rounds of protein evolution without human intervention. Adaptive laboratory evolution is another example of mimicking natural evolution in an artificial laboratory environment (Dragosits and Mattanovich 2013; Lee and Kim 2020). In this case, novel industrial strains of microorganisms are evolved toward improved metabolic pathways for their implementation in microbial production processes.
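The two-step cycle of directed evolution described above (library generation followed by screening) can be illustrated with a minimal sketch. The fitness function, the target sequence and all parameter values below are hypothetical stand-ins for a real screening assay; the uniform per-position mutation model is a simplification and does not reproduce the mutational spectra of epPCR or SeSaM.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 canonical amino acids


def mutate(sequence, rate, rng):
    """Random mutagenesis: resample each position with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else residue
        for residue in sequence
    )


def directed_evolution(parent, fitness, rounds=20, library_size=200, rate=0.05, seed=0):
    """Alternate diversity generation and screening; return best variant and history."""
    rng = random.Random(seed)
    best, history = parent, [fitness(parent)]
    for _ in range(rounds):
        # Step 1: generate a diverse mutant library from the current parent.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # Step 2: screen the library; the parent stays in the pool, so the
        # best observed fitness can never decrease from round to round.
        best = max(library + [best], key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical screening assay: number of positions matching an arbitrary target.
TARGET = "MKTAYIAKQR"


def toy_fitness(sequence):
    return sum(a == b for a, b in zip(sequence, TARGET))


best, history = directed_evolution("A" * len(TARGET), toy_fitness)
print(best, history[0], history[-1])
```

Keeping the parent in the screening pool caps the downside of each round, while the random library supplies the upside; this mirrors the combination of robustness and exploitation of variation discussed in this section.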

From the perspective of risk assessment, an antifragile manufacturing system would first avoid strong causal relationships between its structures and sub-structures. Although such causal connections facilitate predictions about the system’s behavior, they do not indicate big failures. Second, the question is how to interconnect the subsystems as a nonlinear system while maintaining stability (Hole 2016). One way of assessing risk in nonlinear systems is to generate standardized options with a modular design (Mothes 2015). Such nonlinear systems are highly recursive (Taleb 2007), which is why the design of standardized options is also a redundant process. However, a certain number of subsystems has to stay exposed to stressors with unforeseeable consequences (just as the welding process benefits from the stressor of the laser as a subsystem, see Sect. 3.4). In the risk assessment of financial economics, such a distribution of risk between a majority of standardized low-risk operations and a minority of high-risk operations is called a “barbell strategy”, which should keep the risk of high losses very low and create options for high gains (Derbyshire and Wright 2014). Another risk assessment method for dealing successfully with nonlinear systems is the monitoring of so-called “unknown unknowns”. While many problems in manufacturing systems can be predicted, “unknown unknowns” are neither identifiable nor predictable in their consequences (Kim 2012). A manufacturing system that is exposed to such randomness tends to prepare for events rather than to predict them (Taleb 2007). Thus, as part of a framework for antifragile systems, the techniques of risk assessment emphasize the creation of standardized, redundant operations between loosely connected subsystems. Based on this, the system can be exposed to stressors with unknown risks and consequences to improve the system’s performance without exposing it to risks of high damage. Material as well as personnel resources should therefore be invested in the reliability of infrastructures built from redundant operations rather than in the search for possible causal relationships in past harmful events. The openness toward unknown risks also emphasizes the virtue of resisting the urge to think in temporal orders of cause and effect. In manufacturing, this might be a counterintuitive way of thinking.

Figure 3 depicts a framework for antifragile systems comprising building blocks from different domains. The framework intends to offer guidance for practitioners to implement antifragility in manufacturing as well as for future research. A brief explanation of the single building blocks follows in the subsections below.

Fig. 3
A process flow diagram. Monitor and detect fragility leads to increased robustness or resilience, followed by learn and transform. It presents the corresponding building blocks from different disciplines such as biology, risk management, software engineering, and manufacturing for each process.

Framework to design antifragile systems and guide future research for antifragile manufacturing systems

4.1 Building Blocks from Biology

(B1) Overlapping functions and multi-functionality: Different components fulfill the same functionality in biological systems. Besides, single components fulfill multiple functions. These two features, also called functional redundancy and functional plasticity (Asokan et al. 2017), increase the robustness to a failure of system components.

(B2) Redundancy on multiple layers: Biological systems exhibit redundancy on multiple layers, starting at the molecular level in the organization of the genetic code and continuing up to the ecosystem level, where the richness of the biota and species redundancy contribute to the reliable functioning of the ecosystem (Naeem 1998).

(B3) Diversity: Diversity increases nature’s chances to have a suitable solution to unforeseeable shocks. This is especially true in the context of (directed) evolution.

(B4) Phenotypic plasticity: Phenotypic plasticity manifests itself as changes in an organism’s characteristics in response to environmental signals. In a heterogeneous environment, plasticity is highly favored, as the organism can switch to the optimum phenotype upon changes in internal or external conditions. Organisms have a hierarchical molecular organization and regulation, starting from the genome and moving on to the transcriptome and proteome, that ultimately defines any characteristic or action. Internal or external signals can interfere at any level by regulating transcription, translation or enzyme activity, causing phenotypic heterogeneity. The mechanisms of phenotypic heterogeneity include stochastic gene expression, protein synthesis errors, protein promiscuity and epistatic modifications (Schlichting and Smith 2002).

(B5) Genetic information management: Replication of genetic information is not a perfect, error-free process. Genetic mutations resulting from errors in DNA replication can increase the genetic diversity without affecting the phenotype. This enhances an organism’s evolvability, i.e., its ability to produce heritable variation. Evolvability is balanced with the robustness required to preserve functionality. Genetic mutations that improve protein stability enhance robustness by widening the range of possible follow-up mutations that do not cause a loss of functionality. On the other hand, robustness assists evolvability by providing a certain degree of acceptable diversity in the genetic pool. This diversity can further enhance evolvability through, for example, epistatic interactions or recombinations (Kirschner and Gerhart 1998).

(B6) Selective pressure: Through selective pressure the best solutions present in a biological system prevail (“survival of the fittest”). Thereby, the populations gain from unforeseen shocks.

4.2 Building Blocks from Risk Management

(R1) Taleb’s convexity heuristic: Taleb’s convexity heuristic (cf. Sect. 3) allows fragility and antifragility to be detected. The shape of the nonlinearity of a system’s output makes it possible to prioritize efforts to achieve antifragility (see Fig. 4).

Fig. 4
Left. Gain domain versus loss domain depicts 3 types of system behavior with concave and convex patterns. Right. A schematic presents 3 steps, namely increase robustness and resilience, increase learning capabilities or system transformability, and explore unknown scenarios and exploit volatility.

Three types of system behavior [based on (Taleb and Douady 2012)] and their implications
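One simple way to probe the shape of a response, in the spirit of the convexity heuristic (R1), is to compare the mean response under a symmetric perturbation with the unperturbed response. The sketch below assumes that a scalar gain function can be evaluated at chosen operating points; the function names, the tolerance and the mapping of a near-zero gap to “robust” are illustrative assumptions, not part of the cited heuristic.

```python
def convexity_gap(response, x, delta):
    """Mean response under a symmetric +/- delta perturbation minus the unperturbed response."""
    return 0.5 * (response(x + delta) + response(x - delta)) - response(x)


def classify(response, x, delta, tol=1e-9):
    """Map the sign of the gap to three behavior types (simplifying assumption)."""
    gap = convexity_gap(response, x, delta)
    if gap > tol:
        return "convex: gains from variation (antifragile)"
    if gap < -tol:
        return "concave: harmed by variation (fragile)"
    return "linear: indifferent to variation (robust)"


# Toy responses, chosen only for illustration.
for name, response in [
    ("convex", lambda x: x ** 2),
    ("concave", lambda x: -(x ** 2)),
    ("linear", lambda x: 3 * x + 1),
]:
    print(name, "->", classify(response, x=1.0, delta=0.5))
```

A positive gap means that, on average, variation around the operating point pays off; a negative gap means it hurts. With measured data, noise would have to be accounted for before the sign of the gap can be trusted.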

(R2) Barbell strategy: This strategy is characterized by combining low and high risk, while avoiding medium risks. If a majority of a company’s operations is associated with no or low risks, it becomes possible to explore high risks with potentially high gains. Since the majority of risks is small, overall losses should be limited (Derbyshire and Wright 2014).

(R3) Redundancy of resources: Redundant resources allow a system to withstand adverse events and are a prerequisite for achieving antifragility (Hespanhol 2017).

(R4) Loosely connected subsystems: Loosely coupled subsystems mitigate the risk of failure propagation. A link between two subsystems is weak if the damage caused by misbehavior of one subsystem to a second dependent subsystem is low. Moreover, connections should be sparse and break quickly, if a subsystem misbehaves. Hole proposes to implement “circuit breakers” between modules of a system, which ensure correct behavior, rather than direct links (Hole 2016).

(R5) Standardized options: Standardized options provide a form of redundancy in terms of scaling processes or systems (Manutchehr-Danai and Manutchehr-Danai 2009).
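The “circuit breaker” idea mentioned under (R4) can be sketched as a thin wrapper around calls to a dependent subsystem. This is a generic illustration of the pattern under simple assumptions (a fixed failure threshold, a manual reset), not the design specified by Hole (2016); the station and buffer functions are hypothetical.

```python
class CircuitBreaker:
    """Wraps calls to a subsystem and cuts the link after repeated failures."""

    def __init__(self, call, fallback, max_failures=3):
        self.call = call
        self.fallback = fallback
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def request(self, *args):
        if self.is_open:
            # Link is broken: the misbehaving subsystem is no longer contacted.
            return self.fallback(*args)
        try:
            result = self.call(*args)
        except Exception:
            self.failures += 1
            return self.fallback(*args)
        self.failures = 0  # a healthy response resets the counter
        return result

    def reset(self):
        """Close the breaker again, e.g., after the subsystem was repaired."""
        self.failures = 0


calls = []  # records how often the faulty subsystem was actually contacted


def faulty_station(part_id):
    calls.append(part_id)
    raise RuntimeError("station misbehaves")


def buffer_stock(part_id):
    return f"{part_id} served from buffer"


breaker = CircuitBreaker(faulty_station, buffer_stock, max_failures=3)
results = [breaker.request(f"part-{i}") for i in range(5)]
print(len(calls), breaker.is_open, results[-1])
```

After the threshold is reached, the dependent subsystem keeps working from the fallback and the faulty one is left alone, which limits failure propagation across the link.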

4.3 Building Blocks from Software Engineering

(S1) System-level monitoring: It is not feasible to monitor if single components of complex systems meet their specifications. Therefore, engineers at Netflix monitor system-level variables, which reflect if the system meets its ultimate goal (i.e., providing streams to customers). Moreover, they define variables that reflect real-world events such as server crashes and monitor whether system behavior variables are affected by changes of these event variables (Basiri et al. 2016).

(S2) Fallback solutions: A prerequisite for Netflix’ Chaos Engineering approach (see Sect. 3.2) are fallback solutions, ensuring graceful degradation if a service fails (completely). For instance, if the bookmark service (allowing customers to resume watching from the previous location) fails, the video will start at the beginning rather than throwing an error. If servers fail, customers are rerouted to other servers (Basiri et al. 2016).

(S3) Limited scope of exposure: Deliberately exposing software systems to stressors makes it possible to test the error-recovery capabilities of complex software. Limiting these experiments to subsets of the software’s users is crucial to mitigate risk (Basiri et al. 2016).

(S4) Error self-injection: Self-injection of errors enables programmers to test and improve the error-recovery capabilities of software systems. Since it is infeasible to simulate the real world, injecting errors into live software systems provides insights that are more realistic.

(S5) Mindset of loving errors: Software engineering literature related to antifragility promotes a mindset, in which errors are a beneficial source of information. In contrast, manufacturing literature generally focuses on avoiding errors.

(S6) Machine Learning: Baruwal Chhetri et al. (2019) and de Florio (2014) propose machine learning to endow software systems with learning capabilities in order to become antifragile. On the other hand, Taleb (2013) points out that predictive models cause fragility, as they do not work well in case of low probability events. However, reinforcement learning algorithms explore yet unknown variations of their action space and benefit from trial and error. Hence, they have the potential to contribute to antifragile systems.
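Building blocks (S1) to (S4) can be combined in one small experiment: a fallback guards a service, failures are injected for a limited share of users, and a system-level variable is compared between the exposed and the control group. The following sketch is loosely modeled on the bookmark example from (S2); all function names, the exposure rate and the metric are assumptions made for illustration and do not reflect Netflix’s actual tooling.

```python
import random


def bookmark_service(user_id, bookmarks, inject_failure):
    """Hypothetical service returning the position at which to resume a video."""
    if inject_failure:
        raise RuntimeError("injected bookmark failure")
    return bookmarks.get(user_id, 0)


def start_position(user_id, bookmarks, inject_failure=False):
    """Fallback (S2): start at the beginning if the bookmark service fails."""
    try:
        return bookmark_service(user_id, bookmarks, inject_failure)
    except RuntimeError:
        return 0


def run_experiment(user_ids, bookmarks, exposure=0.1, seed=0):
    """Inject failures (S4) for a small share of users (S3) and compare a
    system-level variable (S1): the share of streams that start at all."""
    rng = random.Random(seed)
    started = {"control": [], "exposed": []}
    for user_id in user_ids:
        exposed = rng.random() < exposure
        try:
            start_position(user_id, bookmarks, inject_failure=exposed)
            stream_started = True
        except Exception:
            stream_started = False
        started["exposed" if exposed else "control"].append(stream_started)
    return {
        group: (sum(flags) / len(flags) if flags else None)
        for group, flags in started.items()
    }


bookmarks = {user_id: user_id * 10 for user_id in range(1000)}
rates = run_experiment(range(1000), bookmarks)
print(rates)
```

If the fallback works, the stream start rate of the exposed group should not differ from that of the control group, even though individual users lose their bookmark. A drop in the exposed group would reveal a weakness while affecting only a small share of users.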

4.4 Building Blocks from Manufacturing

(M1) Redundancy: Redundancy, for instance in the form of buffer inventory, allows manufacturing systems to absorb negative consequences of shock events.

(M2) Flexibility: A flexible manufacturing system allows companies to adapt, for instance to market volatility or changing customer requirements. Hence, it increases resilience, but also allows companies to seize opportunities (e.g., from a sudden increase in demand). Evolvable production systems, for instance, flexibly adapt their behavior through self-organization of autonomous, interoperable modules. Interoperability and self-organization also allow modules to be flexibly added, replaced or combined to introduce new features.

(M3) Inherent antifragility: Identifying and incorporating inherently antifragile phenomena, such as strain hardening or the pulsed laser-assisted wire-based laser metal deposition (see Sect. 3.4) may contribute to building more antifragile manufacturing systems overall.

(M4) Optionality: Examples from the semiconductor industry (presented in Sect. 3.4) illustrate that it is possible to utilize random deviations in produced components. If manufacturers were able to develop optional use cases for parts with quality deviations, they would become less susceptible to randomness or might even benefit from it.

5 Challenges of Antifragile Manufacturing and Future Research

The framework proposed in the previous section intends to provide a starting point for advancing antifragility in manufacturing. For example, a manufacturer could utilize the barbell strategy by defining a tolerable amount of extra scrap to explore new process setups that potentially lead to improvements, e.g., in sustainability or costs, while at worst producing the predefined tolerable amount of scrap. Self-injecting volatility or stressors with a limited scope, as already done in software engineering, could provide manufacturers with opportunities to improve their systems. For example, deliberately using raw material of poorer quality for a limited number of workpieces could be a way to learn in which process steps problems occur in case of material quality variations and how they can be mitigated or compensated for in subsequent process steps. Creating optionality, e.g., by learning how to achieve the desired product properties from different raw materials, provides another potential opportunity to become antifragile. When there is a shortage of the preferred raw material, a manufacturer that is still able to manufacture its products from another material might be able to charge higher prices.
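The scrap-budget variant of the barbell strategy can be made concrete with a small scheduling sketch. It assumes that each trial of a new process setup yields one part that is either usable or scrap, and that a cost per part can be observed; the setup names and trial outcomes are invented for illustration only.

```python
def explore_with_scrap_budget(n_parts, scrap_budget, baseline, baseline_cost,
                              candidates, evaluate):
    """Produce with the proven setup by default and try untested setups only
    while the exploration scrap budget lasts.

    `evaluate(setup)` returns (part_ok, cost_per_part) for one trial part.
    """
    current, current_cost = baseline, baseline_cost
    untested = list(candidates)
    scrap_used, produced = 0, 0
    for _ in range(n_parts):
        if untested and scrap_used < scrap_budget:
            setup = untested.pop(0)        # high-risk end of the barbell
            part_ok, cost = evaluate(setup)
            if not part_ok:
                scrap_used += 1            # downside is capped by the budget
                continue
            if cost < current_cost:        # upside: adopt the better setup
                current, current_cost = setup, cost
        # A usable trial part or a part from the proven setup (low-risk end).
        produced += 1
    return current, scrap_used, produced


# Hypothetical trial outcomes: setup -> (part_ok, cost_per_part).
TOY_RESULTS = {
    "faster feed": (False, 0.0),
    "lower power": (True, 9.0),
    "new nozzle": (False, 0.0),
    "thinner wire": (True, 8.0),
}

setup, scrap, good_parts = explore_with_scrap_budget(
    n_parts=100,
    scrap_budget=2,
    baseline="baseline",
    baseline_cost=10.0,
    candidates=list(TOY_RESULTS),
    evaluate=TOY_RESULTS.__getitem__,
)
print(setup, scrap, good_parts)
```

In this toy run, exploration stops once two trial parts have been scrapped, so the fourth candidate is never tested. The loss is bounded by the budget, while any cheaper setup found along the way is kept for the remaining production.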

However, challenges remain and must be overcome to realize antifragile manufacturing. For example, manufacturing companies typically strive to increase their efficiency. However, optionality and redundancy, which are crucial for antifragility, cause inefficiency. Hence, there is a trade-off between antifragility and efficiency (Derbyshire and Wright 2014; Blečić and Cecchini 2019). To become antifragile (long-term benefits) and also remain competitive (short term), manufacturing companies need tools that take the “price” of antifragility into consideration as well.

The mathematical definition of antifragility via the concept of convexity (see Sect. 3) is intuitive. However, its application in practice comes with challenges. Manufacturers have to determine suitable variables to monitor whether the response of a system to variations is convex. To that end, measurable dependent variables representing gains/losses as well as independent variables that represent the impact of unpredictable events are required. As Blečić and Cecchini (2019) put it, the question to be answered is “antifragility of what to what?”.

The illustration of convexity (and thus antifragility) in Sect. 3 was two-dimensional. In reality, the behavior of complex technical systems depends on the interplay of multiple variables. Hence, real-world problems are high-dimensional. The effect of single variables, e.g., process parameters, depends on the values of other variables, for instance material properties. Whether or not variations of a variable will cause a convex, hence antifragile, system response may depend on other variables (which are possibly unknown or not measured). In the presence of interaction effects between variables, reliably assessing the antifragility of a system based on a convexity metric becomes challenging, especially when it is infeasible to measure all (relevant) variables. Potentially, reinforcement learning-based solutions, which balance robustness and exploration, may contribute to achieving convex behavior in high-dimensional manufacturing (sub-)systems. This also raises the question of how and at which scale data has to be sampled.

In conclusion, the following questions for future research arise: How can manufacturing companies balance short-term inefficiencies and long-term gains in the design of antifragile technical systems? What makes a variable suitable to monitor the antifragility of technical systems in manufacturing? How can manufacturing companies ensure the convexity of a system response in the presence of interaction effects and high dimensionality?

6 Conclusion

Complex systems are common in manufacturing. Small, local deviations can propagate to unpredictable, critical disturbances in such systems. The existing literature addresses this challenge by developing solutions to avoid negative effects of volatility and shock events through concepts such as resilience or robustness. Taleb coined the term “antifragility” to describe systems that gain from stressors and therefore go beyond robustness and resilience. He recognizes that antifragile behavior is common in biological systems. Even though antifragility seems to be superior to resilience and robustness, the concept has received little attention in the manufacturing literature so far. Therefore, this article surveyed existing examples of antifragility from the domains of biology, risk management, software engineering and manufacturing itself. Moreover, a framework to design antifragile systems was proposed, intended to serve as guidance for practitioners as well as a starting point for future research on the topic. The framework comprises three main components: (1) monitoring and detecting (anti)fragility, (2) increasing robustness and/or resilience and (3) exploiting stressors. Potential building blocks to implement these three components were derived from the antifragility examples of the four previously surveyed domains. While the domain of software engineering illustrates that antifragility offers advantages in technical, human-made systems, challenges of antifragile manufacturing remain. In particular, future research has to address challenges associated with the trade-off between long-term antifragility and short-term efficiency as well as with reliably monitoring (anti)fragility and the “costs” of antifragility.