A novelty-based multi-objective evolutionary algorithm for identifying functional dependencies in complex technical infrastructures from alarm data

In this work, a Multi-Objective Evolutionary Algorithm (MOEA) is developed to identify Functional Dependencies (FDEPs) in Complex Technical Infrastructures (CTIs) from alarm data. The objectives of the search are the maximization of a measure of novelty, which drives the exploration of the solution space and prevents the search from getting trapped in local optima, and of a measure of dependency among alarms, which drives the uncovering of functional dependencies. The main contribution of the work is the direct identification of patterns of dependent alarms; this avoids the preliminary step of mining association rules, as typically done by state-of-the-art methods, which fail to identify rare functional dependencies because of the need to set a balanced minimum occurrence threshold. The proposed framework for FDEP identification is applied to a synthetic alarm database generated by a simulated CTI model and to a real large-scale database of alarms collected at the CTI of CERN (European Organization for Nuclear Research). The results show that the framework enables a thorough exploration of the solution space and also captures rare functional dependencies.


Introduction
The identification of Functional Dependencies (FDEPs) in Complex Technical Infrastructures (CTIs) has gained interest in recent years (Billinton and Allan 1992;Zio 2016;Serio et al. 2018;Rebello et al. 2018;Hickford et al. 2018;Antonello et al. 2019;Cantelmi et al. 2021). Given the complexity and evolutionary behaviour of CTIs, the identification of FDEPs by classical methods of system decomposition and logic analysis is practically unattainable (Zio 2016;Rebello et al. 2018).
In small- and medium-scale systems, functionally dependent components or dependent abnormal behaviours are typically identified by analysing the system structure and the functional logic, considering design information and theoretical operation scenarios (Zio 2007).
For CTIs, some works have recently emphasized the importance of identifying functionally dependent components or abnormal behaviours, which are typically unknown. General guidelines and conceptual definitions have been provided in Zio (2016). In this context, data-driven methods for the identification of FDEPs in CTIs using alarm data have been developed (Serio et al. 2018;Antonello et al. 2019;Antonello et al. 2021a). They are based on the application of the Association Rule Mining (ARM) algorithm (Agrawal and Imieliński 1993;Srikant and Agrawal 1996;Witten and Frank 2016) for scanning the alarm databases and identifying associations among patterns of alarms in the form of "if (antecedent) then (consequent)" rules; from these, the FDEPs are derived. Specifically, Apriori-based algorithms employ a level-wise iterative search mechanism, which scans the database to identify "frequent" patterns and drives the search for other "frequent" patterns which contain the alarms of the patterns previously identified (Srikant and Agrawal 1996). A pattern is considered only if its frequency of occurrence is larger than a predefined threshold, called minimum support. Once a group of functionally dependent components has been identified, the causal chains of malfunctions can be obtained by resorting to the knowledge of operators and experts of the CTI or by applying algorithms developed ad hoc for this purpose. For example, a modified version of the quicksort algorithm has been developed in Antonello et al. (2020a) for the identification of the causal sequence of malfunctions from the probabilistic analysis of the temporal sequences of the alarms.
A main challenge in the application of the Apriori-based algorithms to alarm databases is the difficulty of identifying rare FDEPs, which are typically unknown and can be actually the most relevant for CTI vulnerability (Wang et al. 2000;Kim and Yun 2016;Zio 2016;Antonello et al. 2021a). Their identification typically requires the use of a small value for the minimum support threshold, which renders the search computationally unaffordable (Lin and Tseng 2006;Wulandari et al. 2019) and leads to the generation of a very large set of rules, which are not strongly supported and hard to analyse for discovering vulnerabilities in the CTI (Marin et al. 2008;Zhang et al. 2013). As a consequence, a relatively large value of minimum support threshold is employed at the risk of (i) not identifying rare patterns of alarms and (ii) extracting somewhat trivial association rules, which are already known to the CTI operators (Antonello et al. 2021a).
Multi-Objective Evolutionary Algorithms (MOEAs) have been recently proposed to directly identify association rules, including rare ones, eliminating the intermediate step of frequent-pattern mining and the related setting of a minimum support threshold (Yan et al. 2009;Mukhopadhyay et al. 2014;Badhon et al. 2019). MOEAs are meta-heuristic approaches inspired by the laws of biological evolution, based on operations such as selection, recombination and mutation. The application of MOEAs to rule mining requires evolving a population of candidate association rules according to properly defined rule metrics (Mukhopadhyay et al. 2014;Badhon et al. 2019). A limitation of the use of MOEAs for rule mining is their tendency to converge toward one or a limited set of optimal solutions, even though ARM applications usually require the identification of all the relevant rules (Martín et al. 2016).
Specific to the context of FDEP identification in CTIs, an analyst is interested in identifying all the FDEPs influencing the CTI vulnerability (Antonello et al. 2021a). Thus, multiple solutions should be identified during the search and maintained in the population for effectively exploring the solution space and preventing premature convergence to local optima. In Antonello et al. (2020c), a MOEA has been developed for the identification of FDEPs in CTIs, employing the novelty search proposed in Lehman and Stanley (2011) to drive the exploration of the solution space without being trapped in local optima. The novelty search drives the selective pressure to favour diversification in the population by dynamically rewarding the chromosomes based on their novelty with respect to other chromosomes, instead of rewarding them considering static fitness functions (Gomes et al. 2017). While the approach allows the identification of rare FDEPs in CTIs (Antonello et al. 2021d), the following issues still need to be resolved: (a) The MOEA tends to identify several rules including "spurious" alarms that have occurred by chance with other alarms, even if they do not belong to real FDEPs. Notice that the identification of patterns with spurious alarms can lead to possible errors when modelling FDEPs and cascading failures (Antonello et al. 2021c). (b) The MOEA identifies several association rules derived from the same FDEP but differing in the combination of alarms in the antecedent and consequent parts. Considering a FDEP involving $R$ alarms, the number of association rules which can be generated is $3^R - 2^{R+1} + 1$ (Del Jesus et al. 2011). Thus, when large alarm databases are considered, thousands (or tens of thousands) of association rules are typically generated and then have to be post-processed to identify the relevant FDEPs, leading to a very large computational burden.
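To give a feeling for how quickly the number of rules in issue (b) grows, the following minimal sketch compares the closed-form count with a brute-force enumeration. It assumes that a rule is any pair of non-empty, disjoint antecedent and consequent sets drawn from the $R$ alarms; the function names are illustrative and are not taken from the cited works.

```python
from itertools import combinations


def n_rules_closed_form(r: int) -> int:
    """Number of association rules over r alarms: 3^r - 2^(r+1) + 1."""
    return 3**r - 2 ** (r + 1) + 1


def n_rules_enumerated(r: int) -> int:
    """Brute-force count of (antecedent, consequent) pairs.

    Assumption: antecedent and consequent are non-empty and disjoint
    subsets of the r alarms; they need not cover all r alarms.
    """
    items = list(range(r))
    count = 0
    for a_size in range(1, r):
        for antecedent in combinations(items, a_size):
            rest = [i for i in items if i not in antecedent]
            for c_size in range(1, len(rest) + 1):
                for _consequent in combinations(rest, c_size):
                    count += 1
    return count


if __name__ == "__main__":
    for r in (2, 3, 5, 10):
        print(r, n_rules_closed_form(r))
```

Under this assumption, a single FDEP of 10 alarms already corresponds to 57,002 distinct rules, which illustrates why post-processing the rules becomes expensive.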
The present work extends the MOEA proposed in Antonello et al. (2020b) to address the above-mentioned issues. To this aim, the recently proposed metric of dependency (Antonello et al. 2021c), which has been shown to discriminate rules with spurious alarms from rules describing actual FDEPs (Antonello et al. 2021c), is used as fitness function within the MOEA search.
The main contributions of the proposed method are:
• It allows discovering patterns of dependent alarms, without the preliminary step of identifying association rules;
• It allows discovering rare FDEPs and is robust with respect to spurious alarms occurring by chance at the same time as other real alarms;
• It incorporates for the first time in the MOEA the metric of dependency proposed in Antonello et al. (2020c); and
• It significantly reduces the computational burden required for the identification of rare FDEPs with respect to the Apriori-based algorithms.
The effectiveness of the proposed method is shown by means of its application to (i) an artificial case study, which mimics the complexity of a real CTI, and (ii) a real large-scale database of alarms generated by different supervision systems of the CTI of CERN, where a 27-km-perimeter ring particle accelerator composed of thousands of components is located.
The remainder of the paper is organized as follows: Sect. 2 describes the problem setting and the considered alarm database representation. In Sect. 3, the proposed MOEA is described. Section 4 introduces the case studies and discusses the obtained results. Finally, Sect. 5 draws some conclusions.

Alarm data representation
We consider a CTI formed by a large number of components, $N_c \gg 1$, and a database containing a large number of alarm messages, $N_{al} \gg 1$, generated by the CTI during a long period of time $[t_0, t_f]$. The number of types of alarms associated with the generic $j$-th component, $c_j$, is $M_{al}^{j}$, and the total number of types of alarm messages is $M_{al} = \sum_{j=1}^{N_c} M_{al}^{j}$. The label $a_j^{k_j}$ refers to the $k_j$-th type of alarm message associated with the $j$-th component, and $A = \{a_j^{k_j},\ j = 1, \ldots, N_c,\ k_j = 1, \ldots, M_{al}^{j}\}$ is the set of all the types of alarm messages. Alarm messages are generated when the monitored signals exceed pre-set thresholds and are stored in the alarm message database (Fig. 1(a)). The overall time interval $[t_0, t_f]$ is subdivided into $Z$ consecutive small time intervals of the same length $\Delta t = (t_f - t_0)/Z$ (Fig. 1(b)). A Boolean variable, $s_j^{k_j}(z)$, is associated with the occurrence of an alarm of type $a_j^{k_j}$ generated by component $c_j$ in the $z$-th time interval:

$$s_j^{k_j}(z) = \begin{cases} 1 & \text{if an alarm of type } a_j^{k_j} \text{ occurs in the } z\text{-th time interval} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

and the state of the CTI during the generic $z$-th time interval is represented by the Boolean vector:

$$\vec{T}(z) = \left[ s_1^{1}(z), \ldots, s_1^{M_{al}^{1}}(z), \ldots, s_{N_c}^{1}(z), \ldots, s_{N_c}^{M_{al}^{N_c}}(z) \right] \tag{2}$$

Finally, the database of alarms $(t_i, m_i)$, $i = 1, \ldots, N_{al}$, is transformed into the Boolean matrix (Fig. 1(b)):

$$T = \begin{bmatrix} \vec{T}(1) \\ \vdots \\ \vec{T}(Z) \end{bmatrix} \tag{3}$$

whose generic $z$-th row represents the state of the CTI during the $z$-th time interval. Therefore, $T$ provides a dynamic representation of the CTI state evolution in the time interval $[t_0, t_f]$.
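The transformation of the alarm log into the Boolean matrix can be sketched as follows. This is only an illustrative implementation under simple assumptions: timestamps are plain numbers, alarm types are strings, and an alarm falling exactly at $t_f$ is assigned to the last interval. The function name `build_boolean_matrix` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_boolean_matrix(
    alarms: Sequence[Tuple[float, str]],
    alarm_types: Sequence[str],
    t0: float,
    tf: float,
    n_intervals: int,
) -> List[List[int]]:
    """Return a Z x M_al matrix with T[z][k] = 1 if alarm type k occurs in interval z."""
    dt = (tf - t0) / n_intervals
    column = {a: k for k, a in enumerate(alarm_types)}
    matrix = [[0] * len(alarm_types) for _ in range(n_intervals)]
    for t, message in alarms:
        if not (t0 <= t <= tf) or message not in column:
            continue  # ignore alarms outside the observation window or of unknown type
        # Assumption: an alarm exactly at tf belongs to the last interval.
        z = min(int((t - t0) / dt), n_intervals - 1)
        matrix[z][column[message]] = 1
    return matrix


if __name__ == "__main__":
    log = [(0.5, "a"), (0.7, "b"), (2.1, "a"), (4.0, "b")]
    for row in build_boolean_matrix(log, ["a", "b"], 0.0, 4.0, 4):
        print(row)
```

Repeated occurrences of the same alarm type within one interval collapse to a single 1, consistently with the Boolean definition of the state variables.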

Functional dependency (FDEP)
Two components are considered functionally dependent if the operation of one is influenced by the operation of the other (Etesami and Kiyavash 2017). In particular, considering alarm messages, which are typically triggered when components have abnormal behaviours or malfunctions, we assume that there is a FDEP between two components of a CTI, $c_1$ and $c_2$, if a malfunction of component $c_1$, revealed by an alarm, $a_1^{k_1}$, causes a malfunction of component $c_2$, revealed by another alarm, $a_2^{k_2}$, or vice versa. This definition of FDEP assumes that the CTI monitoring system can detect all possible component malfunctions by measuring the proper physical quantities. In practice, some physical quantities related to rare or unknown component malfunctions are not monitored. As a consequence, FDEPs containing malfunctions not properly monitored cannot be identified by the analysis of the alarm messages.

A metric for the identification of FDEPs from alarm databases
Considering a generic pattern of $R \le M_{al}$ alarms, $X = \{a_{j_1}^{k_1}, \ldots, a_{j_R}^{k_R}\} \subseteq A$, the dependency among its alarms can be assessed using the metric $I_{FDEP}(X)$ (Antonello et al. 2021c), which depends on a parameter $\alpha$ defined in the interval $[0, 1]$. The metric is based on the definition of functional dependency and involves the probability of occurrence of a pattern of $R$ functionally dependent alarms, $X_{FDEP} = \{a_{j_1}^{k_1}, \ldots, a_{j_R}^{k_R}\}$, and the largest probability of occurrence among the probabilities of occurrence of the alarms of the pattern. The form of the metric is motivated by the need to eliminate spurious alarms from the patterns. It derives from the following considerations: (a) given a pattern of functionally dependent alarms, $X_{FDEP} \subseteq A$, the probability of occurrence of any alarm, $P(a_{j_r}^{k_r})$, $r = 1, \ldots, R$, can be decomposed into the sum of two contributions:

$$P(a_{j_r}^{k_r}) = P_{X_{FDEP}}(a_{j_r}^{k_r}) + P_{Ind}(a_{j_r}^{k_r}) \tag{7}$$

where $P_{X_{FDEP}}(a_{j_r}^{k_r})$ and $P_{Ind}(a_{j_r}^{k_r})$ are the probabilities that $a_{j_r}^{k_r}$ occurs due to the FDEP and due to an event independent of the FDEP, respectively (Mosleh 1991; Zio 2009; O'Connor and Mosleh 2016); (b) the probability of occurrence of a generic alarm $a_{j_r}^{k_r}$ due to a rare FDEP, $P_{X_{FDEP}}(a_{j_r}^{k_r})$, is expected to be close to the probability of occurrence of the whole pattern involved in the FDEP, $P(X_{FDEP})$: $P_{X_{FDEP}}(a_{j_r}^{k_r}) \cong P(X_{FDEP})$; (c) the probability of occurrence of the pattern, $P(X_{FDEP})$, is expected to be larger than the probability of co-occurrence (by chance) of any alarm $a_{j_r}^{k_r}$ of the pattern, $r = 1, \ldots, R$, and a spurious (independent) alarm $a_{j_s}^{k_s}$: $P(X_{FDEP}) > P(a_{j_r}^{k_r}) \cdot P(a_{j_s}^{k_s})$.
Therefore, the necessary condition for a generic pattern of alarms, $X = \{a_{j_1}^{k_1}, \ldots, a_{j_R}^{k_R}\} \subseteq A$, to be functionally dependent is that the probability of occurrence of the pattern, $P(X)$, is greater than a fraction $\alpha \in [0, 1]$ of the probability of occurrence, $P(a_{j_r}^{k_r})$, of each alarm, $a_{j_r}^{k_r}$, of the pattern (Antonello et al. 2021c):

$$P(X) > \alpha \cdot P(a_{j_r}^{k_r}), \quad r = 1, \ldots, R \tag{8}$$

The setting of the parameter $\alpha$ should consider the trade-off between the desiderata of identifying rare FDEPs, which are among the most interesting for CTI vulnerability analysis (Wang et al. 2000;Lee et al. 2005;Antonello et al. 2021c), and excluding spurious alarms. Considering Eq. 10, the use of a large $\alpha$ value would drive the search to discover frequent FDEPs, with the associated risk of not identifying rare FDEPs. Conversely, some spurious patterns can be identified as actual FDEPs using small values of $\alpha$. In this work, the parameter $\alpha$ is set equal to 0.03, which has allowed the identification of rare FDEPs in two different CTIs (Antonello et al. 2020c). Also, the analysis reported in the same work shows that no spurious alarms are identified using values of $\alpha$ in the range [0.01, 0.08].
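The necessary condition stated above can be checked directly on a Boolean alarm matrix. The sketch below is a hedged illustration, not the authors' implementation: it assumes that probabilities are estimated as relative frequencies (number of time intervals in which the alarms occur divided by the total number of intervals), and the helper names `support` and `satisfies_necessary_condition` are hypothetical.

```python
from typing import Iterable, List


def support(matrix: List[List[int]], pattern: Iterable[int]) -> int:
    """Number of rows (time intervals) in which all alarms of the pattern occur."""
    columns = list(pattern)
    return sum(1 for row in matrix if all(row[k] == 1 for k in columns))


def satisfies_necessary_condition(
    matrix: List[List[int]], pattern: Iterable[int], alpha: float = 0.03
) -> bool:
    """Check P(X) > alpha * P(a) for every alarm a of the pattern.

    Assumption: P(.) is estimated as support / number of intervals.
    """
    columns = list(pattern)
    n_rows = len(matrix)
    p_pattern = support(matrix, columns) / n_rows
    return all(p_pattern > alpha * support(matrix, [k]) / n_rows for k in columns)


if __name__ == "__main__":
    # Alarms 0 and 1 always occur together (5 times); alarm 2 is frequent (96 times)
    # and co-occurs with them only once, i.e. it behaves like a spurious alarm.
    T = [[1, 1, 1]] + [[1, 1, 0]] * 4 + [[0, 0, 1]] * 95
    print(satisfies_necessary_condition(T, [0, 1]))  # rare but dependent pair
    print(satisfies_necessary_condition(T, [0, 2]))  # pair with a spurious alarm
```

In this toy matrix the rare pair passes the check with $\alpha = 0.03$, whereas the pair containing the frequent independent alarm is rejected; lowering $\alpha$ to 0.01 would let the spurious pair through, which mirrors the trade-off discussed above.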
Notice that the probability of occurrence of a generic pattern $X = \{a_{j_1}^{k_1}, \ldots, a_{j_R}^{k_R}\}$ can be estimated from the alarm database using its support:

$$S(X) = n(X), \tag{9}$$

where $n(X)$ is the number of vectors $\vec{T}(z)$ of $T$ characterized by the occurrence of at least all the alarms of the pattern $X$ (i.e. $\forall x_j \in X$, $s_j(z) = 1$). Therefore, Eq. 4 becomes

Work objective
The objective of this work is the development of a method for the identification of FDEPs which satisfy the definition given in Sect. 2.3 using the alarm data introduced in Sect. 2.1.

Method
The problem described in Sect. 2 is addressed by developing a MOEA. Section 3.1 describes the encoding-decoding procedure adopted for representing FDEPs by means of chromosomes, Sect. 3.2 introduces the novelty search-based MOEA and Sect. 3.3 illustrates the search objectives, initial population and genetic operators. Figure 2 gives an example of chromosome decoding.
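As an illustration of the encoding-decoding idea, the sketch below assumes the simplest binary representation compatible with chromosomes of $M_{al}$ bits: the $k$-th one-bit gene is 1 if the $k$-th alarm type belongs to the pattern and 0 otherwise. This mapping and the function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Set


def decode(chromosome: Sequence[int], alarm_types: Sequence[str]) -> Set[str]:
    """Return the pattern of alarms encoded by a binary chromosome."""
    return {alarm for bit, alarm in zip(chromosome, alarm_types) if bit == 1}


def encode(pattern: Set[str], alarm_types: Sequence[str]) -> List[int]:
    """Return the binary chromosome (one bit per alarm type) encoding a pattern."""
    return [1 if alarm in pattern else 0 for alarm in alarm_types]


if __name__ == "__main__":
    # Hypothetical alarm labels, used only for this example.
    types = ["a_1^1", "a_1^2", "a_2^1", "a_3^1"]
    chromosome = [1, 0, 1, 0]
    print(sorted(decode(chromosome, types)))
    print(encode({"a_1^1", "a_2^1"}, types))
```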

Novelty Search
The key idea of novelty search is to reward the divergence of a chromosome from those already in the population, instead of only considering the performance as evaluated by the chromosomes' fitness functions (Lehman and Stanley 2011).
In practice, the uniqueness of a chromosome with respect to the rest of the population is evaluated by introducing a metric of novelty, which measures the sparseness of the search space in correspondence of the chromosome as its average distance to the other chromosomes of the population. For a generic chromosome, $ind_i$, the metric of novelty (Eq. 11) is therefore the average of $dist(ind_i, ind_j)$ over the other chromosomes $ind_j$ of the population, where $dist$ is a domain-dependent measure of the distance among chromosomes. The Jaccard distance $dist_J$, which has been shown to be effective in evaluating sparseness in pattern mining and Association Rule Mining applications (Tummala et al. 2018), is used here to evaluate the distance between a pair of chromosomes $ind_i$ and $ind_j$, encoding the patterns $X_i$ and $X_j$:

$$dist_J(ind_i, ind_j) = 1 - \frac{|X_i \cap X_j|}{|X_i \cup X_j|} \tag{12}$$

where $|\cdot|$ refers to the cardinality of the pattern of alarms, i.e. the number of alarms contained in the pattern. Notice that, if the patterns $X_i$ and $X_j$ are identical, i.e. involve the same alarms, $dist_J(ind_i, ind_j)$ is equal to zero, whereas, if the two patterns do not share any common alarm, $dist_J(ind_i, ind_j)$ is equal to 1. This metric favours the identification of novel patterns and allows comprehensively exploring the solution space by putting a constant evolutionary pressure on the search, but, at the same time, preserves unique and novel chromosomes (Lehman and Stanley 2011;Gomes et al. 2017).
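A minimal sketch of the novelty computation described above is given below. It uses the standard Jaccard distance between sets of alarms and, as an assumption, averages the distance over all the other chromosomes of the population (the exact normalization used by the authors may differ); patterns are represented as Python sets for readability.

```python
from typing import List, Set


def jaccard_distance(x_i: Set[str], x_j: Set[str]) -> float:
    """1 - |intersection| / |union|; 0 for identical patterns, 1 for disjoint ones."""
    union = x_i | x_j
    if not union:
        return 0.0  # two empty patterns are treated as identical
    return 1.0 - len(x_i & x_j) / len(union)


def novelty(index: int, population: List[Set[str]]) -> float:
    """Average Jaccard distance of one pattern to all the other patterns."""
    others = [p for j, p in enumerate(population) if j != index]
    if not others:
        return 0.0
    return sum(jaccard_distance(population[index], p) for p in others) / len(others)


if __name__ == "__main__":
    population = [{"a", "b"}, {"a", "b"}, {"c", "d"}]
    for i in range(len(population)):
        print(i, novelty(i, population))
```

In this toy population the two identical patterns receive a lower novelty than the isolated one, so a selection scheme that maximizes novelty would tend to preserve the pattern lying in the sparser region of the search space.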

MOEA algorithm
We use a Genetic Algorithm (GA) due to its straightforward principles, its simplicity of implementation and the fact that it has already been successfully applied to ARM and to other pattern mining problems (Anand et al. 2009). In particular, we adopt the NSGA-II algorithm (Deb 2000), which is considered the most effective optimization algorithm for multi-objective rule mining. In this work, the objectives of the MOEA search are the maximization of (1) the novelty measure defined by Eq. (11), (2) the measure of dependency $I_{FDEP}$ and (3) the support metric. Given the inclination of $I_{FDEP}$ to favour rare patterns of alarms, as shown in Antonello et al. (2021c), the use of the support as third fitness function is needed for identifying frequent FDEPs as well. The combined use of these three objectives allows identifying patterns of dependent alarms while deeply exploring the solution space and avoiding premature convergence to local optima. When one-bit genes are employed, the initialization of the population using sub-optimal chromosomes characterized by a limited number of one-bit genes equal to 1 can facilitate the GA convergence, as shown in the context of feature selection problems (Baraldi et al. 2016) and association rule identification (Del Jesus et al. 2011;Antonello et al. 2020b). In this work, the initial population of chromosomes is created considering all the possible patterns, $X'$, made of 2 alarms ($|X'| = 2$) which verify Eq. (8) and whose metric of dependency, $I_{FDEP}(X')$, is therefore larger than 0. Then, according to Antonello et al. (2020), we select the best $N_{pop}$ chromosomes following the NSGA-II algorithm to set the initial population. This is consistent with the objective of the search, which is the identification of FDEPs, given that by selecting only the patterns which satisfy Eq. (8), we a priori discard the patterns made of 2 alarms which do not belong to

Case studies
The proposed method is applied to a synthetic database of alarms generated by simulating the behaviour of a CTI and to a database of real alarms generated by the technical infrastructure of CERN during one year of operation.

Synthetic alarm database
We consider the alarm database of Antonello et al. (2021b). It contains the alarm messages generated by the simulation of a CTI formed by $N_c = 300$ components, each of which can be in five mutually exclusive and exhaustive states $D \in \{1, 2, 3, 4, 5\}$ corresponding to healthy ($D = 1$), partially degraded ($D = 2$), degraded ($D = 3$), very degraded ($D = 4$) and failed ($D = 5$) conditions. A generic component $c_j$, $j = 1, \ldots, N_c$, performs transitions among the states at exponentially distributed random times. Table 1 reports the constant transition rates among the different states. The alarm $a_j^1$, $j = 1, \ldots, N_c$, is triggered when component $c_j$ performs the transition from $D = 2$ to $D = 3$ and the alarm $a_j^2$ when it performs the transition from $D = 3$ to $D = 4$, whereas all the other transitions do not generate alarms.
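The alarm generation mechanism of a single, independent component can be sketched as below. This is a strongly simplified illustration: the transition rates of Table 1 are not reproduced here, so the rates in the example are purely hypothetical, and only transitions from a state to the next more degraded one are modelled (no repairs and no functional dependencies).

```python
import random
from typing import Dict, List, Tuple


def simulate_component(
    rates: Dict[int, float], t_end: float, rng: random.Random
) -> List[Tuple[float, str]]:
    """Simulate one component and return its alarms as (time, label) pairs.

    rates[d] is the (hypothetical) constant rate of the transition d -> d + 1.
    Alarm "a1" is raised on the transition 2 -> 3 and "a2" on 3 -> 4.
    """
    t, state = 0.0, 1
    alarms: List[Tuple[float, str]] = []
    while state < 5:
        t += rng.expovariate(rates[state])  # exponentially distributed sojourn time
        if t > t_end:
            break
        state += 1
        if state == 3:
            alarms.append((t, "a1"))
        elif state == 4:
            alarms.append((t, "a2"))
    return alarms


if __name__ == "__main__":
    hypothetical_rates = {1: 1e-3, 2: 5e-3, 3: 5e-3, 4: 1e-2}  # not the values of Table 1
    print(simulate_component(hypothetical_rates, t_end=8760.0, rng=random.Random(0)))
```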
We further assume the existence of seven different FDEPs (Table 2) among CTI components. They originate from the transition from state 2 ('partially degraded') to state 3 ('degraded') of a component, which can cause the transition of an ordered sequence of components from state 4 to state 5. The probability of propagation of the functional dependencies from a component to the successive one of the chain is reported in Table 2. Notice that functional dependencies No 6 and No 7 are the rarest, since their initiation depends on the failure of components characterized by low probabilities of failure.
The proposed MOEA algorithm is applied using an initial population of $N_{pop} = 500$ binary chromosomes of $M_{al}$ bits. The mutation probability is set to $1/M_{al}$ and the crossover probability to 0.8, according to Anand et al. (2009). The algorithm has been run for 2000 generations, obtaining a final set of 500 patterns in a computational time of 657 s on an Intel core (TM) i7-4790 CPU@ 3.6 GHz, 16 GB RAM. The final population includes several patterns involving the alarms of each of the FDEPs of Table 2. Table 3 gives, for each of the FDEPs of Table 2, the pattern in the final population with the largest $I_{FDEP}$ value. Table 3 reports some examples of patterns of the final population containing groups of alarms which belong to the FDEPs of Table 2. Notice that many patterns only partially describe the FDEPs, i.e. they do not contain all the involved alarms. For example, pattern nos. 2, 3, 4 and 5 contain only 8 of the 10 alarms of FDEP 1. On the other hand, it is interesting to observe that the pattern with the largest $I_{FDEP}$ value always contains all the alarms of the corresponding FDEP. This highlights the capability of the metric of dependency to identify the pattern formed by all the alarms of the FDEP, which simplifies the post-processing of the results. Also, the analysis of all the patterns of the final population has shown that none of them contains spurious alarms. The post-processing procedure for the identification of the FDEPs from the patterns of the final population requires (1) sorting them with respect to $I_{FDEP}$ and (2) eliminating the patterns containing subsets of alarms already contained in patterns with larger $I_{FDEP}$. Table 4 reports the results of the proposed MOEA on the same database considering different combinations of fitness functions. As expected, when novelty search is not used, the final population converges to a population of patterns describing only a few functional dependencies.
Also, the use of the metric of dependency I FDEP guarantees that the identified patterns do not contain spurious alarms.
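The two-step post-processing procedure described above (sorting by $I_{FDEP}$ and removing redundant patterns) can be sketched as follows. The sketch reads step (2) as discarding any pattern whose alarms are all contained in a retained pattern with a larger metric value; this reading, the function name and the scores in the example are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

ScoredPattern = Tuple[FrozenSet[str], float]


def postprocess(scored_patterns: List[ScoredPattern]) -> List[ScoredPattern]:
    """Keep only the patterns that are not subsets of a higher-ranked kept pattern."""
    # Step 1: sort the patterns by decreasing value of the dependency metric.
    ranked = sorted(scored_patterns, key=lambda item: item[1], reverse=True)
    kept: List[ScoredPattern] = []
    for pattern, score in ranked:
        # Step 2: skip patterns fully contained in an already retained pattern.
        if any(pattern <= other for other, _ in kept):
            continue
        kept.append((pattern, score))
    return kept


if __name__ == "__main__":
    # Hypothetical patterns and metric values.
    final_population = [
        (frozenset({"a", "b"}), 0.5),
        (frozenset({"a", "b", "c"}), 0.9),
        (frozenset({"d", "e"}), 0.7),
    ]
    for pattern, score in postprocess(final_population):
        print(sorted(pattern), score)
```

This relies on the observation reported above that the pattern with the largest $I_{FDEP}$ value contains all the alarms of the corresponding FDEP, so that partial descriptions of the same FDEP are removed as subsets.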
The results obtained by the proposed novelty-based MOEA have been compared with the results of the Apriori-based ARM algorithm proposed in Antonello et al. (2021) and of the MOEA for ARM identification proposed in Antonello et al. (2020). The Apriori-based ARM algorithm performs an exhaustive search among all the possible combinations of alarms but requires to set small values of the minimum support and minimum confidence thresholds (here chosen equal to 5 and 0.6, respectively) to identify all the rare FDEPs; otherwise, with larger values of these thresholds, it would not find them (Antonello et al. 2021b). The MOEA for ARM identification evolves a population of 500 association rules encoded in binary chromosomes of 2 × M al bits and employs the novelty measure (Eq. 9) and the metrics of Interestingness and Length (Pachón Álvarez and Mata Vázquez 2012; Dhaenens and Jourdan 2016) as search objectives. Table 5 reports the obtained results. The Apriori-based ARM algorithm requires a computational effort more than 500 times larger than the approach proposed in this work and produces 2000 different association rules, which must be post-processed to discriminate the rules containing spurious alarms and to identify the rare FDEPs of particular interest for vulnerability analysis. The MOEA for ARM identification finds 500 association rules, which contain all the 7 FDEPs in a computational time slightly larger than the proposed approach. A limitation of this approach is that 15% of the generated association rules contain spurious alarms, which requires the identified rules to be analysed one by one by plant experts in order to distinguish the actual FDEPs.
To conclude, the comparison has shown that the proposed MOEA is able to i) correctly identify the functional dependencies with a reduction of the computational effort with respect to the other two approaches considered; ii) discover rare FDEPs without requiring the setting of a very low value for minimum support; and iii) be robust against spurious alarms.

CERN complex technical infrastructure
The CTI of CERN is composed of several systems working together to support the operation of the LHC, which is the largest existing particle accelerator in the world (Nielsen and Serio 2016). It consists of a 27-km ring of superconducting magnets and infrastructures, extending over the Swiss and French borders and located about 100 m underground.
A database of alarms generated during the period $[t_0, t_f]$ = [01 January 2016; 31 December 2016] by three CTI systems is considered. Considering the expected time of propagation of a FDEP, which is influenced by the physical characteristics of the systems and processes involved, the time interval for the analysis is set equal to $\Delta t$ = 30 min to ensure the identification of all FDEPs while minimizing computational resources and spurious alarms (Antonello et al. 2021). Therefore, the one-year period [01 January 2016; 31 December 2016] is divided into $Z$ = 17,500 time intervals. Setting the mutation probability equal to $1/M_{al}$ and the crossover probability equal to 0.8 (Anand et al. 2009), an initial population of $N_{pop}$ = 500 individuals, encoded into chromosomes of $M_{al}$ bits, is evolved for 2000 generations, obtaining a final set of 500 patterns of alarms in 641 s on an Intel core (TM) i7-4790 CPU@ 3.6 GHz, 16 GB RAM. The patterns describe FDEPs involving components of different systems, whose failures can cause a local malfunction to propagate across the CTI systems and sub-systems, and originate unexpected cascades of failures over vast geographic areas (Thacker et al. 2017;Antonello et al. 2021b). Table 6 reports a selection of five discovered patterns which have been considered by CERN experts as the most representative examples of novel and unknown chains of events. The first pattern describes the correlation among malfunctions involving a breaker of the electric system powering the cryogenic system (EKD202_SLASH_8U) and malfunctions of two pumps of the cooling and ventilation system (SU_8_UPKA802_AL6, SU_8_UPEA802_AL6). The second pattern describes the associated occurrence of a malfunction in a breaker of the electrical system (EKD204_SLASH_8U) and three different malfunctions in the helium compressors of the cryogenic system (QSCB_8_CSY_C2).
Pattern 3 and Pattern 4 describe the propagation of a malfunction triggered by problems in the cryogenic electric system distribution switchboard ('EKD104_SLASH_8HM_I1314', 'EKD107_SLASH_8HM_I1314'), which propagate and lead to malfunctions of the cryogenic system helium refrigerator, dryer and compressors ('QSAB_8_QSA_TS3.IST', 'QSCB_8_CSY_C1_SI3.IST', 'QSRB_8_CV003_FS1.IST'). Pattern 5 describes the propagation of a malfunction of the electric system ('EKD210_SLASH_8U_S3S16') to the cryogenic system helium compressors ('QSCA_8_CSC1_SI3.IST', 'QSCA_8_CSC1_SI6.IST'). Patterns 2, 3 and 4 can be considered rare, since the corresponding alarms occur in the same time interval only three times in a period of one year, during which a total of 112,591 alarms have been generated. Notice that patterns with support smaller than 3 are not identified by the algorithm due to the need to avoid spurious FDEPs, which satisfy Eq. 10 by chance (Antonello et al. 2021c).
An independent expert analysis has confirmed that the involved components are, indeed, part of chains of malfunctions that occurred in 2016. According to the CTI experts, the identification of rare functional dependencies is useful for (i) updating the maintenance plan of the components involved in the functional dependencies, for example, by increasing the frequency of inspection for those components that cause the chains of events with the objective of reducing the probability of their initiation; (ii) upgrading the most critical components; (iii) introducing barriers to contrast the propagation of the chain of events; and (iv) facilitating root cause analysis.
To further analyse the advantages of the proposed evolutionary approach, a traditional Apriori-based algorithm (Antonello et al. 2019) is applied to the same database. In order to be able to identify rare rules, the value of minimum support threshold is now set equal to 3 and the minimum confidence is set equal to 0.6%. The search has produced 1049 association rules in a computational time of 43,959 s on an Intel core (TM) i7-4790 CPU@ 3.6 GHz, 16 GB RAM. Notice that the proposed MOEA allows reducing the computational effort (641 s) with respect to traditional ARM (43,959 s) and does not require the setting of a minimum support.

Conclusions
Functional Dependencies (FDEPs) in Complex Technical Infrastructures (CTIs) need to be identified to analyse potential vulnerabilities. This work proposes a MOEA based on novelty search and on a recently proposed metric of dependency for the identification of FDEPs from alarm data. A main novelty with respect to the other state-of-the-art approaches is the direct identification of patterns of dependent alarms, without the preliminary step of identifying association rules. This avoids setting minimum support and minimum confidence thresholds and allows rare FDEPs to be disclosed swiftly as well. The novelty metric drives the search to favour diversification in the results, allows the solution space to be explored and prevents the search from being trapped in local optima. Moreover, the use of the metric of dependency allows spurious alarms to be discriminated and, therefore, eliminates the results post-processing step and reduces the computational burden.
An application to a synthetic database of alarms has shown the ability of the proposed MOEA to effectively explore the solution space for identifying all actual (i.e. not spurious) simulated functional dependencies. Comparison with an Apriori-based ARM algorithm and a MOEA for association rules identification shows (i) the ability of the proposed approach to be robust to spurious alarms in the FDEPs, (ii) the low computational effort required by the proposed approach and (iii) the reduction in the number of redundant or not completely identified patterns of FDEPs found by the proposed approach, which would complicate the post-processing of the results.
The application of the proposed algorithm to a large-scale database collected at the CERN CTI has allowed identifying patterns of alarms which describe unknown and rare FDEPs in the CTI; these have then been confirmed by CERN experts as indeed responsible for sequences of malfunctions that occurred in the past.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.