1 Introduction

Railway Interlocking Systems (RIS) implement the controlling logic that regulates train traffic and prevents collisions in stations and railroad crossings. In Italy, RIS were first implemented decades ago with electro-mechanical circuits based on relays. Although this solution is working safely, the adopted technology limits flexibility and makes maintenance difficult. For these reasons, the Italian Railway Network (RFI) is migrating to a new software-based solution.

A tool-supported methodology for the specification, implementation, and verification of interlocking systems has been recently developed in an ongoing project between Fondazione Bruno Kessler (FBK) and RFI [5]. Here, C code is automatically generated starting from standardized requirements written in Controlled Natural Language (CNL) by railway experts. Given the critical nature of the application, the project involves support for formal verification and testing of the produced code against the specifications it was generated from.

A key observation is that the legacy relay-based circuits are considered to be the golden specification of what RIS should implement and guarantee: hence, the formalized requirements that the new software originates from must be checked to be compliant with the old implementation, and the project includes a strategy for validating the migration to the new development.

The comparison between the new software (\(\textrm{SwRIS}\)) and the old circuits (\(\textrm{ReRIS}\)) faces several difficulties. First, \(\textrm{ReRIS}\) were engineered and optimized several years ago, and a formal collection of requirements applicable to a software development process is missing. Moreover, they are available only as hand-written drawings on paper, which restricts their inspection and understanding to a handful of people.

Second, \(\textrm{SwRIS}\) define generic types and classes that can be instantiated to generate the code for a given train station by assigning values to parameters. Such parameters define the topology of the station (e.g., the number and orientation of routes, switches, etc.) and their instantiation is called a configuration. Instead, \(\textrm{ReRIS}\) exist only in their concrete form, for specific train stations.

Third, \(\textrm{SwRIS}\) and \(\textrm{ReRIS}\) differ in the set of variables used and, most importantly, in the computational model. While the software is cycle-based, with a discrete interpretation of time, the circuits depend on their intrinsic continuous aspects (flowing of electrical current, delays in capacitors charging, ...).

Finally, a complete equivalence between \(\textrm{ReRIS}\) and \(\textrm{SwRIS}\) is not expected. On the one hand, the new software is specified to cover more functionalities than before. On the other hand, the safety of \(\textrm{ReRIS}\) relies on several normative rules that the railway operators should respect when interacting with the system: many of these rules are now integrated into the controlling logic of the new software, which prevents human errors by blocking illicit signals. Hence, the two implementations are expected to behave differently under certain stimuli, e.g., a scenario that is possible in \(\textrm{ReRIS}\) cannot be reproduced in \(\textrm{SwRIS}\) if it breaks the normative \(\textrm{ReRIS}\) rules. It is therefore important to document which assumptions, implicit in the \(\textrm{ReRIS}\), have been embedded in the \(\textrm{SwRIS}\).

In this paper, we describe our strategy to address these difficulties when pursuing the goal of testing \(\textrm{SwRIS}\) against test cases extracted from \(\textrm{ReRIS}\).

To make \(\textrm{ReRIS}\) available in a format that is amenable to formal analyses, we leverage the tool Norma [4], developed for RFI to digitalize the drawings of the relay-based circuits and compile them into timed transition systems. We benefit from the Norma functionality that allows the user to analyze a portion of the whole \(\textrm{ReRIS}\), and to inject constraints and assumptions on the open inputs and environment. By combining a part of the \(\textrm{ReRIS}\) with a stub that summarizes in an abstract way the surrounding circuits, we are able to symbolically handle a set of concrete configurations simultaneously.

Facing the comparison of two different computational models, we build on the Abstraction Modulo Stability (AMS) framework [7]. AMS allows the user to isolate the stable states traversed when accomplishing an action, disregarding the internal transient steps needed. Applied in this context, by filtering the stable states, we obtain scenarios that should be shared by the two implementations although they are internally working differently.

Guided by different coverage criteria, we extract a set of stable scenarios, i.e., sequences of stable states, from the \(\textrm{ReRIS}\). In contrast with classic model-based testing, where test cases are extracted from the abstract model of the system’s specifications, we derive test cases from a different concrete implementation used as a reference. With a mapping provided by domain experts, every \(\textrm{ReRIS}\) scenario (expressed in terms of electrical variables) is then translated into a \(\textrm{SwRIS}\) test case (expressed in terms of software variables). Due to the generality of the stub added to the \(\textrm{ReRIS}\) transition system, the obtained tests can be instantiated in multiple configurations. After running the tests on the corresponding configured code, we analyze the failing cases and distinguish between (1) scenarios that have been deliberately excluded in the \(\textrm{SwRIS}\) (and that must be documented), and (2) scenarios witnessing real bugs in the \(\textrm{SwRIS}\).

We show the benefits of integrating such an activity in the ongoing deployment of the new software. We extract scenarios from the relay-based implementation of a railroad switch and run the tests on the new code for a number of real train stations. The analysis of the failed tests reported more than 10 real bugs originating from errors in the specifications the new model-based development starts with: although compliant with its requirements, the new software was unintentionally behaving differently from the legacy reference implementation. Other failed tests produced documentation about expected differences between \(\textrm{SwRIS}\) and \(\textrm{ReRIS}\).

The rest of the paper is structured as follows: Sect. 2 gives a high-level description of the legacy and the new development processes; Sect. 3 focuses on the extraction of scenarios from \(\textrm{ReRIS}\) and their translation into tests for \(\textrm{SwRIS}\); Sect. 4 discusses related work; Sect. 5 shows the results of including this testing strategy in the development of \(\textrm{SwRIS}\); we conclude in Sect. 6.

2 Operational Setting

Let \( p \) represent a configuration for a train station assigning parameters such as the number of railway switches, the number and disposition of routes passing through the station, etc.

The left-hand side of Fig. 1 shows how relay-based RIS were developed decades ago (see label legacy development). Starting from laws and regulations documented in official books, a human modeler drew the schematics of one train station at a time by combining copies of basic “general-purpose” circuits (e.g., the one for a generic railroad switch) and linking their terminals. Only these final results, named \(\textrm{ReRIS} [ p ]\) in Fig. 1, are now available.

On the right-hand side of Fig. 1, the label new development points at the main phases of the new methodology for the definition of software-based RIS.

Fig. 1. Legacy development of relay-based RIS (on the left), tool set supporting the development of a software-based implementation (on the right).

Aida is the part taking care of the generation of the new code. The new process starts with Functional Requirements Specifications (FRS), written by domain experts in Controlled Natural Language. From these, SysML diagrams (and documentation) are automatically produced: such diagrams model in the form of an Extended Finite State Machine (EFSM) how the main railway entities interact with each other from a general (or abstract) perspective, i.e., without referring to a specific station. Generic C code (\(\textrm{SwRIS} \).c) is automatically generated from the diagrams: a configured version of the logic is obtained by plugging the parameters of a specific configuration \( p \), hence obtaining an executable \(\textrm{SwRIS} [p]\).

The new development in Aida is supported by other tools: Carmen, providing both parametric [12] and software model-checking [18], and Tosca, providing testing functionalities. Interestingly, abstract test cases (.atosca files) are written by the user in Controlled Natural Language, or generated covering the EFSM models. These tests are called abstract since they specify general constraints on the declared variables which may identify multiple configurations. Tosca allows for instantiating an abstract test case to concrete and executable versions (.ctosca) for all the known configurations \( p \) that are consistent with the declarations. An execution environment, including a simulator of the physical yard and a simulator of the operator interface, then executes the test running the corresponding configured logic \(\textrm{SwRIS} [p]\).

Finally, the tool Norma [4, 25] completes the picture: it allows for the digitalization of the \(\textrm{ReRIS} [p]\) and their translation into timed transition systems (in the Timed SMV language). This step is fundamental to enable standard analyses, such as simulations and reverse-engineering, on the circuits, which are otherwise available only as handwritten drawings on paper.

3 Bridging the Gap Between \(\textrm{ReRIS}\) and \(\textrm{SwRIS}\)

In this section, we describe the process of extraction of test cases from \(\textrm{ReRIS}\) corresponding to the red part in Fig. 1. In Sect. 3.1 we show how to collect simulations from a generic transition system according to a given coverage criterion. In Sect. 3.2 we describe how the transition system compiled from a subset of circuits can be combined with a handwritten \( Stub \) for the missing parts, and obtain a model that abstracts multiple configured schemas. In Sect. 3.3 we show how to map a simulation into an abstract test case, and in Sect. 3.4 we discuss the outcomes of the analysis of the failing cases.

Background. We assume familiarity with Satisfiability Modulo Theory (SMT) [6] and Linear Temporal Logic (LTL) [11, 24]. We use capital letters \(X, Y\) for sets of variables, \(X', Y'\) for their next versions, and \(x, y\) for their interpretations. We abuse the notation and write \(P=Q\) for \(P\leftrightarrow Q\) when P and Q are Boolean variables. Similarly for \(P = p\), where \(p \in 2^P\). We work with timed transition systems \(\mathcal {S} = \langle X, Y, C, I(X), T(X, Y, X') \rangle \), where X are the Boolean and theory state variables, Y are the Boolean input variables, C are the real-valued clock variables, I(X) and \(T(X, Y, X')\) are the initial and discrete transition formulae. We denote with \(\mathrm {\Pi } (\mathcal {S})\) the set of paths of \(\mathcal {S} \).

We work in the context of Abstraction Modulo Stability (AMS) [7], where \(\sigma \) is a stability criterion defined by the user. While the AMS framework comes with a number of suggestions for the stability definition, in this paper, we fix \(\sigma \) as the non-urgency condition: a \(\sigma \)-stable state is a state where time can elapse, while a transient (or urgent) state is forced to move with a discrete transition. Urgent states are often introduced to model causal relations between components: intuitively, they correspond to states traversed when accomplishing a complex (but instantaneous) action. For example, an (instantaneous) relay is an electrical component that immediately closes a remote electrical switch when traversed with current: it follows that the activation of a relay may trigger the activation of several relays in sequence, until all effects are propagated. By choosing \(\sigma \) as the non-urgency condition, all intermediate steps in this chain of activations are seen as transient states (satisfying \(\lnot \sigma \)), leading to the final \(\sigma \)-stable state. This definition of \(\sigma \) was chosen by domain experts among the ones suggested in [7]. The use of other, more aggressive stability definitions (e.g., considering as transient also the states that are traversed in a “short” time) is future work.
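The chain of relay activations described above can be illustrated with a small sketch. It is a deliberately simplified, purely Boolean model (no clocks, no electrical detail), and the relay names `r1`, `r2`, `r3`, the `button` input, and the `propagate` helper are hypothetical names introduced only for this illustration; it is not the encoding produced by Norma.

```python
def propagate(state, rules, max_steps=100):
    """Update one relay at a time until no relay needs to change.

    Returns every state visited: all states but the last are transient
    (some relay must still move), the last one is stable.
    """
    trace = [dict(state)]
    for _ in range(max_steps):
        current = trace[-1]
        # Relays whose commanded value differs from their present value.
        pending = [(relay, rule(current)) for relay, rule in rules.items()
                   if rule(current) != current[relay]]
        if not pending:
            return trace  # nothing left to propagate: stable state reached
        relay, value = pending[0]
        trace.append({**current, relay: value})
    raise RuntimeError("no stable state reached within max_steps")


# Hypothetical chain: the button energizes r1, r1 energizes r2, r2 energizes r3.
rules = {
    "r1": lambda s: s["button"],
    "r2": lambda s: s["r1"],
    "r3": lambda s: s["r2"],
}
start = {"button": True, "r1": False, "r2": False, "r3": False}
trace = propagate(start, rules)
```

In this toy run the trace contains four states, of which only the last one would be retained as \(\sigma \)-stable.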

3.1 Simulations Extractor

We consider the problem of extracting a set of simulations from a transition system according to a coverage criterion. The coverage criterion defines a set of test targets, i.e., features to be stressed by at least one test in the test suite. When leveraging model checking for test extraction, the target to be covered is expressed as a trap property [17] or a never-claim [15], for which a counter-example is sought. A counter-example for a trap property is a path of the model that shows how the target under consideration is reached, i.e., covered.

Let \(\mathcal {S} \) be a transition system on X variables and Y inputs, and let \(V, W\) be sets of Boolean variables, with \(V \subseteq X\), \(W\subseteq X \mathrel {\cup }Y\). For a coverage criterion \( cov.crit \), let \(\textrm{Targ} ( cov.crit )\) return the corresponding targets, and for a \( t \in \textrm{Targ} ( cov.crit )\), let \(\llbracket t \rrbracket \) return the corresponding LTL trap property. We consider three coverage criteria:

  • \(\textrm{states}(V) \): the targets to cover are the possible assignments to V variables: \(\textrm{Targ} (\textrm{states}(V)) \mathrel {\buildrel \mathrm {.} \over {=}}2^V\), and for each \(v \in \textrm{Targ} (\textrm{states}(V))\), \(\llbracket v \rrbracket \mathrel {\buildrel \mathrm {.} \over {=}}(V = v)\).

  • \(\textrm{trans}(V) \): the targets to cover are the possible transitions on V variables: \(\textrm{Targ} (\textrm{trans}(V)) \mathrel {\buildrel \mathrm {.} \over {=}}2^V \times 2^V\), and for each \((v_1, v_2) \in \textrm{Targ} (\textrm{trans}(V))\), \(\llbracket (v_1, v_2) \rrbracket \mathrel {\buildrel \mathrm {.} \over {=}}(V = v_1 \wedge V' = v_2)\).

  • \(\textrm{trans}_\sigma (V, W) \): the targets to cover are the possible \(\sigma \)-stable transitions on V variables, with W inputs: \(\textrm{Targ} (\textrm{trans}_\sigma (V, W)) \mathrel {\buildrel \mathrm {.} \over {=}}2^V \times 2^W \times 2^V\), and for each \((v_1, w, v_2) \in \textrm{Targ} (\textrm{trans}_\sigma (V, W))\),

    $$ \llbracket (v_1, w, v_2) \rrbracket \mathrel {\buildrel \mathrm {.} \over {=}}(\sigma \wedge V = v_1) \wedge \textrm{G} (W = w) \wedge \textrm{X} \bigl ( (\lnot \sigma ) \textrm{U} (\sigma \wedge V = v_2) \bigr ). $$

    Intuitively, a path covering the trap property \(\llbracket (v_1, w, v_2) \rrbracket \) witnesses that, when only the input w is received, a \(\sigma \)-stable state where \(v_1\) holds moves to a \(\sigma \)-stable state where \(v_2\) holds, possibly passing through a sequence of transient steps.
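The three criteria above can be made concrete by enumerating their targets. The following is a minimal sketch that renders each trap property as a plain string; the textual syntax (`!`, `&`, a trailing `'` for next-state variables, the name `sigma`) and all function names are assumptions made for readability and are not claimed to match the input language of any specific model checker.

```python
from itertools import product


def assignments(variables):
    """Yield every Boolean assignment to `variables` as a dict."""
    names = sorted(variables)
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))


def eq(assignment, primed=False):
    """Render `V = v` as a conjunction of literals (primed for next-state)."""
    tick = "'" if primed else ""
    literals = [(name if value else "!" + name) + tick
                for name, value in sorted(assignment.items())]
    return "(" + " & ".join(literals) + ")" if literals else "TRUE"


def targets_states(V):
    """states(V): one target per assignment to V."""
    return [eq(v) for v in assignments(V)]


def targets_trans(V):
    """trans(V): one target per pair of assignments to V."""
    return [f"({eq(v1)} & {eq(v2, primed=True)})"
            for v1 in assignments(V) for v2 in assignments(V)]


def targets_trans_sigma(V, W, sigma="sigma"):
    """trans_sigma(V, W): stable-to-stable transitions under a fixed input w."""
    return [f"(({sigma} & {eq(v1)}) & G {eq(w)} & "
            f"X ((!{sigma}) U ({sigma} & {eq(v2)})))"
            for v1 in assignments(V)
            for w in assignments(W)
            for v2 in assignments(V)]
```

The number of candidate targets grows as \(2^{|V|}\), \(2^{2|V|}\) and \(2^{2|V|+|W|}\) respectively, which is one reason to keep V and W small and hand-picked.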

The test suite extracted from \(\mathcal {S} \) according to \( cov.crit \) is a finite set \(\textrm{TS} (\mathcal {S}, cov.crit ) \subseteq \mathrm {\Pi } (\mathcal {S})\) such that:

$$ \forall t \in \textrm{Targ} ( cov.crit ) \mathrel {.}\exists \pi \in \textrm{TS} (\mathcal {S}, cov.crit ) \mathrel {.}\pi \,\models \, \textrm{F} (\llbracket t \rrbracket ). $$

Namely, the test suite includes the simulations \(\pi \) for the system that show how the candidate targets can be reached. In practice, the path \(\pi \) is obtained as a counter-example of the model-checking query \(\mathcal {S} \,\models \, \lnot \textrm{F} (\llbracket t \rrbracket )\). For the candidate traps that are not covered in \(\textrm{TS} (\mathcal {S}, cov.crit )\), there is a proof of the fact that \(\mathcal {S} \,\models \, \lnot \textrm{F} (\llbracket t \rrbracket )\), i.e., that they are unreachable.
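The extraction loop can be sketched as follows. To keep the example self-contained, an explicit-state breadth-first search stands in for the symbolic model checker, and only state targets (queries of the form \(\textrm{F} (V = v)\)) are handled; the function names and the toy system are assumptions made for illustration.

```python
from collections import deque


def find_witness(initial_states, successors, is_target):
    """Return a shortest path to a target state, or None if none is reachable.

    Plays the role of the counter-example to the query "the target is
    never reached"; on a finite state space, None amounts to a proof
    of unreachability.
    """
    queue = deque((s, (s,)) for s in initial_states)
    seen = set(initial_states)
    while queue:
        state, path = queue.popleft()
        if is_target(state):
            return list(path)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + (nxt,)))
    return None


def extract_test_suite(initial_states, successors, targets):
    """Collect one witness per reachable target and list the unreachable ones."""
    suite, unreachable = {}, []
    for name, is_target in targets.items():
        path = find_witness(initial_states, successors, is_target)
        if path is None:
            unreachable.append(name)
        else:
            suite[name] = path
    return suite, unreachable


# Toy system: a counter cycling through 0, 1, 2; state 3 is never reached.
successors = lambda s: [(s + 1) % 3]
targets = {f"s{n}": (lambda s, n=n: s == n) for n in range(4)}
suite, unreachable = extract_test_suite([0], successors, targets)
```

Here `suite` holds one path per reachable target, while `unreachable` lists the targets for which no witness exists.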

The understandability and readability of the generated tests are important for the engagement of domain experts in the process. Notably, the coverage criterion of \(\textrm{trans}_\sigma (V, W) \) allows for extracting test cases that are similar to the ones that an engineer would manually write by looking at the circuits. By using as V the variables representing the status of some relays and as W the variables representing the switches connected to them, we obtain scenarios showing that under certain inputs the relays will change configuration, and that under others the relays will stay still. Considering stable transitions, instead of one-step ones, is crucial to visualize the stable effects of a change in the inputs, and to disregard the internal transient steps needed by the relays to propagate their signals. Our method is designed to support this engineers’ standard practice of test case extraction in an automated and complete way.

3.2 Working with a Partial \(\textrm{ReRIS}\) Model

For a parametrization \( p \), let \(\mathcal {S} _ p \) be the entire corresponding \(\textrm{ReRIS}\), which consists of many tables and schematics that may be very large or not yet fully translated into transition systems by Norma. Let \(\mathcal {R} \) be the subset of circuits that are available in the form of a transition system. We consider the composition of \(\mathcal {R} \) with a system \( Stub \), introduced to mock the behaviors of the non-available circuits with respect to \(\mathcal {R} \)’s inputs. Writing an adequate \( Stub \) is a hard task since it should summarize many circuits and let \(\mathcal {R} \) receive only the correct input sequences. In our process, a domain expert is directly involved in this phase, supported by the verification of a set of properties that serve as sanity checks on the produced model. The \( Stub \) is also validated by checking the \(\textrm{TS} \) extracted from the model. If the expert notices a missing (resp., spurious) target, they can fix the \( Stub \) implementation by enlarging (resp., refining) its constraints.

The definition of the \( Stub \) attempts to generalize the concrete configuration \( p \) of the schematics where \(\mathcal {R} \) was taken from. It symbolically includes all the parametrizations that might share the \(\mathcal {R} \) part. We denote with \({ p ^{\sharp }} \) such an abstract parametrization, and with \(\mathcal {S} _{ p ^{\sharp }} \) the composition of \(\mathcal {R} \) and \( Stub \).

Example 1

Consider as the original \(\mathcal {S} _ p \) the interlocking logic of Trento supporting 20 routes (named A, B, C...) with 10 railroad switches (named r1, r2, r3...). Assume that the available part \(\mathcal {R} \) is the one representing the switch r7, which is shared by 2 concurrent routes A and B, running left-to-right and right-to-left respectively. The safety logic ensures that the switch is booked by one route at a time. Instead of considering a stub only for such A and B, we use a wider \( Stub \) that mimics how any pair of routes interacts with a similar switch, by treating their running direction symbolically.

3.3 Mapping a \(\textrm{ReRIS}\) Simulation into an Abstract \(\textrm{SwRIS}\) Test

Since \(\mathcal {S} _{ p ^{\sharp }} \) is built on a generic \( Stub \), its simulations induce abstract test cases that can be concretized into multiple instantiations. In the middle of Fig. 2, we show the skeleton of a sample test.atosca, with placeholders (in \(\texttt {<..>}\)) to be filled with data read from the simulation \(\pi \). While defining \( Stub \), the domain expert collects the constraints defining the compatible instantiations. The abstract test starts with the declaration of the constraints on the used variables (see the “let” namespace), read from the abstract configuration \({ p ^{\sharp }} \). Based on these constraints, the expert also provides a dictionary where \(\textrm{ReRIS}\) expressions are mapped to \(\textrm{SwRIS}\) expressions. In Fig. 2 we denote such a rewriting with the function \(\textrm{map} ()\). This is a key element that allows for translating a \(\pi \in \textrm{TS} \), i.e., a sequence of assignments to \(\textrm{ReRIS}\) variables, into a sequence of test steps. Specifically, each test step corresponds to a unique state of the simulation \(\pi \) and, except for the initialization one, it corresponds to a command (or stimulus), read from the \(\pi \) inputs (see keyword “do”). Each test step \(\texttt{assume}s\) the stub behavior, and \(\texttt{assert}s\) the system state, both read from the corresponding \(\pi \) state: for a state s, i.e., a complete assignment to the \(\mathcal {S} _{ p ^{\sharp }} \) variables, we denote with \(s|_ Stub \) the assignments to \( Stub \) variables only.
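The role of the dictionary can be sketched as follows. The sketch does not reproduce the .atosca syntax: test steps are plain Python dictionaries, and every variable name and expression in the example (`K_cmd`, `R_booked`, `R_pos`, `switch.position = NORMAL`, ...) is hypothetical. It also assumes that the path passed in already contains only the states that become test steps.

```python
def map_state(assignment, dictionary):
    """Rewrite ReRIS assignments into SwRIS expressions via the expert dictionary.

    A missing entry raises KeyError, so gaps in the dictionary are noticed.
    """
    return [dictionary[(var, value)] for var, value in sorted(assignment.items())]


def to_test_steps(path, dictionary):
    """Turn each state into a step with 'assume', 'assert' and, after the first, 'do'."""
    steps = []
    for index, state in enumerate(path):
        step = {
            "assume": map_state(state["stub"], dictionary),    # stub behavior
            "assert": map_state(state["system"], dictionary),  # expected system state
        }
        if index > 0:  # the initialization step carries no command
            step["do"] = map_state(state["inputs"], dictionary)
        steps.append(step)
    return steps


# Hypothetical dictionary and two-state path, for illustration only.
dictionary = {
    ("K_cmd", True): "switch.command = NORMAL",
    ("R_booked", False): "switch.booked = false",
    ("R_pos", False): "switch.position = REVERSE",
    ("R_pos", True): "switch.position = NORMAL",
}
path = [
    {"inputs": {}, "stub": {"R_booked": False}, "system": {"R_pos": False}},
    {"inputs": {"K_cmd": True}, "stub": {"R_booked": False}, "system": {"R_pos": True}},
]
steps = to_test_steps(path, dictionary)
```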

Fig. 2. Extraction and analysis of concrete test cases from \(\mathcal {S} _{ p ^{\sharp }} \). \(\sigma \)-stable states of \(\mathcal {S} _{ p ^{\sharp }} \) are colored in gray (i.e., \(s_0, s_2, s_5\) satisfy \(\sigma \), while \(s_1, s_3, s_4\) do not).

As discussed in [7], \(\textrm{ReRIS}\) simulations show transient implementation-dependent states that are irrelevant from the perspective of high-level railway functionalities. After an input is received, the system reacts with a chain of internal steps, until stability is reached. We believe that, in order to obtain a test scenario for a different implementation, we have to disregard such internal steps and consider only the stable ones. For this reason, the mapping of the \(\pi \) states into test steps is limited to the \(\sigma \)-stable, i.e., non-urgent, states (colored in gray in Fig. 2): we disregard the discrete steps that in the simulation are executed at the same time (in a super-dense time domain) and retain all the states where time elapses. In our models, we easily classify the states by checking the value of the \(\delta \) variable (introduced by Timed nuXmv for synchronizing the clocks) that represents the time dwelling in each state.
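The filtering step can be sketched in a few lines, assuming each simulation state is available as a record that exposes the dwelling time and that a state counts as stable exactly when that time is strictly positive. The record layout, the key name `delta`, and the numeric values are assumptions made for illustration; only the stable/transient pattern mirrors Fig. 2.

```python
def stable_states(simulation, delta="delta"):
    """Keep only the states in which time elapses (dwelling time > 0)."""
    return [state for state in simulation if state[delta] > 0]


# Hypothetical simulation shaped like Fig. 2: s0, s2, s5 stable; s1, s3, s4 transient.
simulation = [
    {"name": "s0", "delta": 1.5},
    {"name": "s1", "delta": 0},
    {"name": "s2", "delta": 0.2},
    {"name": "s3", "delta": 0},
    {"name": "s4", "delta": 0},
    {"name": "s5", "delta": 3.0},
]
stable = stable_states(simulation)
```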

The obtained sequence of test steps is about stable railway aspects only. Since the \(\textrm{SwRIS}\) execution will go through its own implementation-dependent steps as well, we need to take stability into account in the test too, and allow for internal steps between the stimuli and the assertions. For this, we leverage a Tosca syntactic construct specifying that the assertions should be verified within N cycles (“within 100 cycles”, in Fig. 2).

In the .atosca file we also add traceability information mapping each test step back to the originating state in \(\pi \). Finally, we minimize the resulting set of tests by removing the ones that are syntactically a prefix of another.
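The prefix-based minimization can be sketched as follows, with each test represented as a tuple of steps. Dropping exact duplicates as well is an assumption of this sketch, and the function names are illustrative.

```python
def is_proper_prefix(a, b):
    """True if test `a` is strictly shorter than `b` and `b` starts with `a`."""
    return len(a) < len(b) and b[:len(a)] == a


def minimize(tests):
    """Remove duplicates and every test that is a proper prefix of another one."""
    unique = list(dict.fromkeys(tuple(t) for t in tests))  # keeps first-seen order
    return [t for t in unique
            if not any(is_proper_prefix(t, other) for other in unique)]


minimized = minimize([("a",), ("a", "b"), ("c",), ("a", "b")])
```

The pairwise comparison is quadratic in the number of tests; for large suites, sorting the tests first and comparing only neighbours would be a natural refinement.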

3.4 Test Execution

Each abstract test mapped from a \(\pi \in \textrm{TS} (\mathcal {S} _{ p ^{\sharp }}, cov.crit )\) is then instantiated in a set of concrete tests by looking for the concrete configurations \( p \) that are consistent with the variable declarations. Then, each test.ctosca\([ p ]\) is executed on the corresponding code \(\textrm{SwRIS} [ p ]\). If the test passes, then the scenario of \(\pi \) is reproducible in the \(\textrm{SwRIS} [ p ]\). If the test fails, then \(\textrm{ReRIS}\) and \(\textrm{SwRIS}\) react in different ways to the same stimuli. A domain expert analyzes the failing case and decides whether it witnesses a bug of the \(\textrm{SwRIS}\), or whether the inputs received in the \(\textrm{ReRIS}\) simulation are in fact not allowed by the rules that a human operator should follow when controlling the interlocking. The latter case corresponds to a spurious failed test case (the “expected” case in Fig. 2), which can be fixed by refining the environment \( Env \) around \(\mathcal {S} _{ p ^{\sharp }} \) before restarting the procedure. We highlight the importance of this iteration for reverse engineering and documentation purposes. As a matter of fact, there is no proper report of the rules that are assumed by the \(\textrm{ReRIS}\) and are instead implemented in the \(\textrm{SwRIS}\).
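The instantiation step can be pictured with a small sketch. It is not Tosca's mechanism: the data layout (stations mapping switches to the routes that use them), the `accepts` predicate standing in for the declared constraints, and all names are hypothetical.

```python
def instantiate(abstract_test, configurations):
    """Return one concrete test per (station, switch) consistent with the constraints."""
    concrete = []
    for station, switches in sorted(configurations.items()):
        for switch, routes in sorted(switches.items()):
            if abstract_test["accepts"](routes):
                concrete.append({"name": abstract_test["name"],
                                 "station": station,
                                 "switch": switch})
    return concrete


# Hypothetical constraint: the switch must be shared by at least two routes.
abstract_test = {"name": "t1", "accepts": lambda routes: len(routes) >= 2}
configurations = {
    "S1": {"c1": [("A", "left-to-right"), ("B", "right-to-left")],
           "c2": [("C", "left-to-right")]},
    "S2": {"c1": [("D", "left-to-right"), ("E", "left-to-right")]},
}
concrete_tests = instantiate(abstract_test, configurations)
```

In this toy setting the abstract test yields two concrete tests, one for each switch shared by two routes.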

4 Related Works

Several techniques exist for developing and verifying railway interlocking systems [9, 14, 20, 21, 23]. For relay-based RIS [1], tools for graphical modeling supporting verification exist [19, 22]. In these works, the circuits are described with discrete formulae between the stable states of the components. In [3], CSP modeling is proposed to also represent transient states, which are fundamental for complete verification. Our work differs because our modeling format is electrically accurate and includes transient states by construction; since in this context we do not focus on verifying the circuits, we afterward formally select the stable states to obtain more significant scenarios from a reverse-engineering perspective. The importance of modeling the surrounding environment is also addressed in [2].

Legacy systems migration is considered in [8], where the legacy and the new systems are tested to behave consistently on a common test suite. In [26], the migration of legacy systems to the cloud is validated, similarly to our case, by generating tests from the legacy implementation. The key difference is that in [26], the legacy system is manually abstracted and reverse-engineered into a requirements model, whose paths are then used as test cases. We instead extract simulations from the legacy model and abstract (i.e., remove transient states) in each of them individually, therefore avoiding computing an abstract state machine whose paths may be spurious.

5 Experimental Evaluation

We evaluate the benefits of including our approach in the development process of the new software-based RIS. We started with the (drawing of an) electrical circuit of a railroad switch. A domain expert wrote a generic \( Stub \), able to cover symbolically every pair of routes interacting with the switch, for any running direction. Let \(\mathcal {S} _{ p ^{\sharp }} \) be the resulting transition system compiled by Norma. We extracted 6 sets of simulations of \(\mathcal {S} _{ p ^{\sharp }} \), according to different coverage criteria: the criterion “\(\textrm{states}\) 8” (resp., “\(\textrm{trans}\) 4”) induces the set of simulations covering all the reachable states (resp., transitions) of 8 (resp., 4) relays chosen by the domain expert as significant. The sets induced by the criteria “\(\textrm{trans}_\sigma \)-A”, “\(\textrm{trans}_\sigma \)-B”, “\(\textrm{trans}_\sigma \)-C”, “\(\textrm{trans}_\sigma \)-D” are considered by the domain expert the most significant scenarios, because they show every way in which a specific relay changes (or does not change) position according to the controlling signals. The extraction of the simulations is performed with the model checker nuXmv [10], leveraging the ic3ia algorithm [13]. We mapped each simulation into an abstract test case (891 in total), which we instantiated on 4 different railway stations (S1, S2, S3, S4), and on two different railroad switches each (c1, c2), hence obtaining 7128 concrete tests.
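As a quick check of the reported total, assuming that every abstract test is instantiated once for each of the 4 stations and each of the 2 switches:

$$ 891 \times 4 \times 2 = 7128. $$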

Table 1. Sizes of the test suites before (\(\textrm{TS} \)) and after refining the environment (\(\textrm{TS} _ Env \)); \(\texttt{FAIL}\) percentages on original version of \(\textrm{SwRIS}\) (v0.1) and the one (v0.2) obtained by fixing the signalled bugs; \(\texttt{FAIL}\) cases for each concrete test instantiation.

In the first three columns of Table 1, we show the size of the test suites and the \(\texttt{FAIL}\) percentage obtained on the current version of the code, denoted with v0.1. It is important to observe that this version had already gone through the other tests and verifications foreseen in the new development process. By inspecting some of the failing tests, a railway expert recognized expected failures (as in Fig. 2), where the scenario was breaking the normative rules that a railway operator should follow. We documented such cases and consequently refined the model by adding an environment \( Env \) in the transition system. By restarting the procedure on \(\mathcal {S} _{ p ^{\sharp }} \!+\! Env \), we extracted new test suites (denoted as \(\textrm{TS} _ Env \) in Table 1) for each criterion (totaling 5832 concrete tests), with more significant scenarios.

In the remaining failures, the domain expert found more than 10 actual bug sources in the software logics, originating from errors in the Functional Requirements Specifications (FRS, the entry point of the new development, as in Fig. 1). Based on the produced bug report, the FRS were fixed and a new version of the code (denoted with v0.2) was generated. The “v0.2 \(\texttt{FAIL}\) %” column of Table 1 shows that the \(\texttt{FAIL}\) percentage actually decreases.

The remaining failures are currently under analysis and may lead to further refinements of the environment (and more documentation), or to new bug reports. The last 8 columns of Table 1 show how the remaining \(\texttt{FAIL}\) cases are distributed across the different instantiations of the abstract test cases (four stations, two concrete switches each). We see that a \(\textrm{ReRIS}\) simulation may induce tests failing only in some configurations of the parameters. This highlights the importance of generalizing the \( Stub \) and instantiating the abstract tests in multiple stations.

Among the considered \( cov.crit \), “\(\textrm{states}\) 8” and “\(\textrm{trans}\) 4” induce longer simulations (50 states on average), while the “\(\textrm{trans}_\sigma \)-” criteria induce shorter ones (30 states on average): the latter are more significant and understandable for the domain expert, who chose to prioritize the analysis of these test suites.

We also evaluated the coverage on \(\textrm{SwRIS}\) (in terms of lines of code) achieved by the execution of our tests. As expected, a good coverage level is limited to the code of the railroad switch (our system-under-test). Notably, a domain expert manually analyzed some of the uncovered lines and confirmed that they are related to functionalities added in the new implementation that did not exist in the legacy \(\textrm{ReRIS}\). We plan to automate this process and produce additional documentation on the migration.

6 Conclusions

We described our contribution within an ongoing industrial collaboration between Fondazione Bruno Kessler and the Italian Railway Network (RFI), currently migrating from analog to software-based railway interlocking systems. We applied test case generation via model-checking by using another concrete implementation as a reference model; we avoid building an abstract model for the legacy implementation, rather, we abstract each simulation into a scenario that is significant when comparing two different computational models (analog vs cycle-based), by skipping implementation-dependent transient states.

The approach we proposed in this paper is now integrated into the ongoing process of development and validation of the new code. Although the latter was already subject to substantial scrutiny in terms of other properties, this new methodology targets the comparison with the legacy functionalities. Our approach proved to be effective already from its first application on the railroad switch circuits, as it allowed RFI engineers (i.e., not formal-methods experts) to find more than 10 real bugs in the new software. As an additional feature, our pipeline supports the documentation of the expected differences between the two implementations due to changes in regulations.

We plan to apply the procedure to other circuits and analyze the bug reports on the newly developed versions of the software.