1 Introduction

Computer systems now pervade almost every aspect of our lives. From the life sciences to transportation, they play a central role in safeguarding people and the environment. Nevertheless, deploying digital devices in safety-critical domains remains a very complex task, since it requires rigorous and expensive quality assurance processes prescribed by international standards, such as those issued by the International Electrotechnical Commission (IEC), the International Organization for Standardization (ISO) and the European Committee for Electrotechnical Standardization (CENELEC) (International Electrotechnical Commission, 2010; International Organization for Standardization, 2011; Comité européen de normalisation en électronique et en électrotechnique, 1999).

As far as software is concerned, engineers in each field have developed, over the years, many safety-oriented variants of the waterfall process model (Sommerville, 2015). A notable example is the V-Model proposed by the CENELEC EN50128 standard (Comité européen de normalisation en électronique et en électrotechnique, 2011).

To correctly apply a waterfall-based process, a company needs to: (i) work with stable user requirements; (ii) have in-depth knowledge of the delivered product; and (iii) establish clear interfaces between the various software and hardware subsystems (Sommerville, 2015; Boehm & Turner, 2003; Cockburn, 2006). These assumptions are easy to satisfy when working with known products, but they become hard to meet in an innovation process. Indeed, innovating a safe system is particularly complicated, since new technologies may introduce unforeseen safety risks. Moreover, critical systems may consist of many parts, requiring the coordinated work of many experts.

In such a situation, engineers have to work together to explore the side effects of every new solution before adopting it. Here, an agile methodology could help companies improve team communication and efficiency throughout the research project. However, agility still faces obstacles in the development of secure and safe systems, where the adoption of automatic tools to design, develop, test and continuously integrate components conflicts with the need to cope with strict standards that mainly refer to traditional waterfall models.

Indeed, some pioneering works on the adoption of agile methods in the safety-critical domain concluded, hastily, that the two are incompatible (Cawley et al., 2010; Hajou et al., 2014). However, more recent results have questioned this conclusion, identifying four main challenges that inherently arise when adopting agile methods in the safety-critical context: (i) documentation, since it is not considered essential in agile software development; (ii) requirements, since traditional safety-critical development processes discourage requirement changes (Notander et al., 2013); (iii) project life-cycle, since safety-critical projects are traditionally developed neither iteratively nor incrementally (Ge et al., 2010); and (iv) testing, which, in the safety-critical context, is performed only in the final stages of development (McCaffery et al., 2016).

As for the adoption of agile methodologies in the development of secure systems, the authors of (Casola et al., 2020) introduced a novel methodology that extends the DevOps approach to secure systems. They mainly implemented an automated security-by-design approach that can be easily mapped onto the well-known Scrum framework (Schwaber & Sutherland, 2020).

In this paper, we propose an extension of the Scrum agile methodology, namely Scrum for Safety (S4S), to guide and help research & development (R&D) groups involved in the design of safe solutions for the railway domain. In this domain, software development is not as linear as one might expect; rather, it results from a continuous mediation among multiple stakeholders, heterogeneous complex technologies and mandatory regulations. Therefore, we developed an agile-based process instead of a classical waterfall one.

The remainder of this paper is structured as follows. In Sect. 2 we review the current state of the art. In Sect. 3 we present our novel proposal for the adoption of Scrum in the development of safe software. In Sect. 4 we illustrate the applicability of our methodology on a real case study from the railway domain. Finally, in Sect. 5 we draw some conclusions and outline future work.

2 Related works

To identify related work, we started by analyzing the application of agile methodologies in different security- and safety-critical domains. We found many scientific works focusing on different open issues, as well as a few approaches to adopting the well-known Scrum methodology in safety-critical domains.

2.1 Agile adoption in critical domains

Among the open issues identified in the scientific literature, documentation is considered one of the major barriers hindering the adoption of agile methods in the safety-critical context (McHugh et al., 2012; Misra et al., 2010; Stålhane et al., 2012).

Indeed, regulatory agencies responsible for inspecting software do not accept reduced documentation of software requirements and design (Vogel, 2006), since this makes it hard to determine the quality of systems (Wolff, 2012). However, empirical studies have shown that documentation need not be a problem, since agile processes strive to deliver what the customer requests, which includes evidence proving the safety of critical software (Gary et al., 2011; McHugh et al., 2012). In addition, to keep such evidence to a minimum, the purpose of each document must be considered in order to determine which knowledge needs to be expressed (Grenning, 2001; Misra et al., 2010).

Testing is yet another aspect that seriously limits the adoption of agile methods in the safety-critical context, since incorporating verification techniques is challenging and these activities are labor-intensive (Paige et al., 2008). Indeed, while test-driven development is widely used in the agile community (Nerur et al., 2005), in safety-critical software development testing is done only in the final phases (McCaffery et al., 2016). Moreover, some standards, such as CENELEC EN50128 (Comité européen de normalisation en électronique et en électrotechnique, 2011), mandate that testers be responsible for specifying the tests and that developers and testers be different persons (Jonsson et al., 2012). This contrasts with test-driven development, which requires developers to write the tests themselves. Nevertheless, the literature reports examples of safety-critical software development in which test-first processes have been implemented successfully (Drobka et al., 2004; Górski & Łukasiewicz, 2013). In (VanderLeest & Buter, 2009), for instance, the authors proposed a test-aware development process: test developers are involved in the development of requirements, to ensure that the latter are testable at the needed level. This mitigates the risk of requirement changes caused by untestable requirements.

2.2 Scrum-based methodologies

Recently, the Scrum framework has been profitably used in a variety of contexts, including military (Messina et al., 2016; Benedicenti et al., 2016), railway (Myklebust et al., 2015) and aerospace (Smith et al., 2019). Furthermore, some recent works employed the framework to formalize more articulated methodologies, such as R-Scrum (Fitzgerald et al., 2013) and SafeScrum (Hanssen et al., 2018).

The first is a comprehensive description of how the Irish company QUMAS Inc. adopted the Scrum framework to develop software for the pharmaceutical domain. The work is a revelatory case study for companies interested in adopting an agile quality management process in regulated fields. However, the paper does not address how to fit the proposed techniques into large projects composed of different subsystems, in the presence of other parallel processes dedicated to analyzing and controlling safety (Comité européen de normalisation en électronique et en électrotechnique, 1999). In addition, there is no discussion of how the process documents and artifacts support the final certification process.

The second is the result of a theoretical study in which Scrum was brought into compliance with various standards from the critical systems world, including IEC 61508 (International Electrotechnical Commission, 2010) and CENELEC EN50128 (Comité européen de normalisation en électronique et en électrotechnique, 2011). Although the work is strongly related to industrial standards, its main benefits still need to be corroborated by practical demonstrations. Moreover, given the difficulties of testing independence and documentation management, the authors proposed the adoption of test-driven development and a dedicated team for software documentation, which may be neither efficient nor applicable for vital systems.

With respect to these related works, our purpose is to define a novel agile process in order to:

  1. discuss how agility can help the innovation of safety-critical products, in order to improve the efficiency and safety of research projects;

  2. merge the core concepts of R-Scrum and SafeScrum, including a more consistent proposal for documentation and agile quality management;

  3. expand the current empirical evidence on the feasibility and advantages of using agility in safety-critical domains, with a real-world case study from the railway domain.

3 Scrum for safety

Scrum for Safety (S4S) aims to guide and help research & development groups involved in the exploration of effective, efficient and safe solutions in the railway domain. However, it can be adopted in any domain in which safety must be considered. Indeed, in the R&D context, software development is not a linear, smooth activity but rather the outcome of a continuous mediation among multiple actors, various complex technologies and strict regulations. An agile-based process therefore allows engineers to rapidly explore and validate every option before making any crucial decision.

In this Section, we provide the reader with full details concerning our proposed methodology, including its context, principles, roles, workflow, and metrics.

3.1 Context

The S4S context is not limited to a single software project. In the safety-critical world, a system typically consists of various hardware and software components. In the railway domain, for instance, an entire signalling infrastructure is typically broken into small parts, each of which is then developed following the CENELEC V-Model (Comité européen de normalisation en électronique et en électrotechnique, 1999, 2011, 2003). Therefore, S4S was designed to operate within the context of a global system project, as described in Fig. 1.

As shown, the software life-cycle is embedded into the global system process, with a set of well-established relationships between the two. The first advantage of this separation is that it helps the Scrum team adapt and change its approach with no side effects on other system parts. The second advantage, as defined in SafeScrum (Myklebust et al., 2015), is that it isolates non-agile activities, such as the RAMS (Reliability, Availability, Maintainability and Safety) life-cycle (Comité européen de normalisation en électronique et en électrotechnique, 1999), at the system level.

Fig. 1 S4S integration with the global system life-cycle

Regarding the inputs needed by the S4S agile process, the two most important artifacts are the System Requirements Specification (SRS) and the Hazard Analysis Report. The first contains the definition of the user requirements that have been allocated to the software. The second, instead, includes the specification of the safety countermeasures that developers have to consider during the design and implementation of their functions. These two documents constitute the core set of requirements used to define the Scrum Product Backlog.

The Product Backlog expresses what the developers have to implement in order to fulfil an essential set of system functions. When specifying it, team members can employ dedicated safety stories to keep a clear trace of the life-cycle of safety requirements.
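As an illustration only, the derivation of the Product Backlog from the two input documents can be sketched in Python. The sketch assumes that SRS requirements and Hazard Analysis countermeasures are available as simple identifier-to-text maps; the `BacklogItem` type, the `US-`/`SS-` story prefixes and the function names are hypothetical and are not prescribed by S4S. Each countermeasure becomes a dedicated safety story, so the trace from hazards to stories can be queried directly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BacklogItem:
    story_id: str
    text: str
    source: str        # hypothetical label: "SRS" or "HAZARD"
    source_ref: str    # id of the originating requirement or countermeasure
    is_safety_story: bool = False


def build_product_backlog(srs_requirements: Dict[str, str],
                          hazard_countermeasures: Dict[str, str]) -> List[BacklogItem]:
    """One user story per SRS requirement, one safety story per countermeasure."""
    backlog: List[BacklogItem] = []
    for i, (req_id, text) in enumerate(srs_requirements.items(), start=1):
        backlog.append(BacklogItem(f"US-{i}", text, "SRS", req_id))
    for i, (haz_id, text) in enumerate(hazard_countermeasures.items(), start=1):
        backlog.append(BacklogItem(f"SS-{i}", text, "HAZARD", haz_id,
                                   is_safety_story=True))
    return backlog


def safety_trace(backlog: List[BacklogItem]) -> Dict[str, str]:
    """Map each hazard countermeasure id to the safety story that covers it."""
    return {item.source_ref: item.story_id
            for item in backlog if item.is_safety_story}
```

For example, one SRS requirement and one countermeasure yield one user story and one safety story, and `safety_trace` maps the hazard identifier to the safety story covering it. Keeping safety stories as explicit backlog items, rather than as notes attached to user stories, is one way to make their life-cycle easy to follow.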

As for the outputs of the S4S agile process, developers must regularly return feedback to system engineers whenever errors are discovered in the defined requirements. Feedback is the most valuable output of the agile process, since it represents an actual validation of user requirements.

3.2 Principles

S4S embraces all the core agile principles (Beck et al., 2001) and Scrum values (Schwaber & Sutherland, 2020), yet extends them with some new objectives derived from critical software requirements. This extension results in the following eight principles, which constitute the foundation of the development process:

  1. Cover all the alternatives before making a decision: every architectural or detailed design decision must be preceded by an in-depth evaluation of all possible options. Valuable solutions could be discarded if one restricts attention to a single alternative;

  2. Experiment and fail frequently: the best way to evaluate the effect of a single design choice is to try it, and potentially fail. System modeling and simulation are good tools, but developers also have to implement and test their solutions on target architectures to prove their real effectiveness;

  3. Deliver software continuously to the users: as soon as the research starts to produce partially implemented software architectures, the principal funders have to start reviewing the achieved results. This is one of the core values of agile, in which stakeholders play a central role in the development process (Beck et al., 2001);

  4. Integrate software continuously with other actors: large and complex projects are frequently broken into smaller, more controllable ones. In that case, coordination and periodic integration activities among the various development teams can detect early many incompatibilities among the developed subsystems that would otherwise emerge later;

  5. Continuously Verify & Validate: in order to release software in critical environments, where a single failure could potentially cause loss of human lives, environmental pollution or huge economic losses, each developed subsystem must be meticulously verified and validated. Verification & Validation (V&V) are two core activities of Software Quality Assurance (SQA): they increase our confidence that the developed product satisfies its specification and is adequate for the original research purpose. Furthermore, V&V activities have to be applied continuously, ideally whenever the research work reaches a new development step, to rapidly identify and manage deviations that could affect critical properties, such as security and safety;

  6. Make your work traceable: a trace of all the work done on the software under development must always be present and available to developers. In this way, all the principal design decisions and the software architecture are visible, and they form the basis for subsequent work. Moreover, traceability is the only way to prove to an independent Assessor how the risks related to software functionalities were identified and mitigated. Otherwise, it is impossible to observe and appreciate the fundamental design choices behind the final product;

  7. Let your approach be risk-based: finding, covering and monitoring the risks related to software functionalities is a vital activity for critical software. A risk-based approach is much more effective than no approach at all, since it explicitly identifies and addresses all the software failures that could cause tangible damage to people or to the environment;

  8. Don’t break or lose the quality already achieved: as the work proceeds and SQA techniques improve the software's internal and external quality, preserving that quality becomes essential. In particular, newly requested changes must not conflict with or undermine the risk management activities already performed and the functions already implemented.

3.3 Roles

Table 1 lists the professional roles defined in the CENELEC standards. Notably, the basic Scrum roles of Product Owner, Scrum Master and Development Team (Rubin, 2012) map well onto the roles related to software development, i.e., Requirements Managers, Designers and Implementers, while other, less frequently mentioned roles, such as Managers, correspond in terms of responsibilities to Project Managers.

Despite this evident match, the Scrum framework does not provide any independent professional role related to the Software Quality Assurance (SQA) process (Rubin, 2012). Therefore, given the importance of SQA for critical software, the roles of Verifier, Validator and Assessor have to be introduced in order to check the technical quality of the software and its adequacy for the original problem. In particular, S4S extends the set of Scrum roles with these figures, which are strictly related to critical software development, while adapting their activities to an agile perspective. Thus, as discussed later, V&V and assessment activities can be performed at the end of each iteration, enabling rapid identification of possible compliance and safety issues.

Another fundamental point concerns the independence of Integrators and Testers from software implementation. Although, under the CENELEC standards, tests written by programmers may be accepted by the Verifier if they are adequate and completely specified (Comité européen de normalisation en électronique et en électrotechnique, 2011), verification independence is considered crucial for safety-critical products. Therefore, S4S prescribes that, during each Scrum sprint, the same developer cannot cover both implementation and verification activities. In this way, design considerations do not influence the judgment of Testers or Integrators.
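The independence rule can be checked mechanically on a sprint's task assignments. The following is a minimal Python sketch, assuming that each assignment is recorded as a `(person, activity)` pair; the activity labels and the grouping into implementation-side and verification-side sets are hypothetical choices made for illustration, not names defined by S4S or CENELEC.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical activity labels used only in this sketch.
IMPLEMENTATION_SIDE = {"design", "implementation"}
VERIFICATION_SIDE = {"testing", "integration", "verification"}


def independence_violations(assignments: Iterable[Tuple[str, str]]) -> List[str]:
    """Return, sorted, the people who in one sprint are assigned to both
    implementation-side and verification-side activities."""
    activities: Dict[str, Set[str]] = defaultdict(set)
    for person, activity in assignments:
        activities[person].add(activity)
    return sorted(
        person for person, acts in activities.items()
        if acts & IMPLEMENTATION_SIDE and acts & VERIFICATION_SIDE
    )
```

In a sprint where one team member is assigned both a design task and a testing task, that person is reported, while a strict split between designers and testers yields an empty list. Such a check could run during Sprint Planning, before work starts, rather than as an after-the-fact audit.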

Table 1 CENELEC EN50128:2011 defined roles (from https://link.springer.com/chapter/10.1007%2F978-3-030-85347-1_10)

3.4 Workflow

3.4.1 Safe-sprints

The fundamental concept of the S4S workflow is the Safe-Sprint. As described in Fig. 2, a Safe-Sprint is a time-boxed iteration that produces a new software increment verified and validated against the applicable standards. It defines how the Scrum team checks and monitors software quality. The concept, however, is not new: it was first introduced in R-Scrum (Fitzgerald et al., 2013).

Fig. 2 S4S Workflow

In the R-Scrum agile process, three key factors make a sprint safe, i.e., adequate for safety-critical software. The first, called Sprint hardening, states that the output of every iteration must contain the user documentation and proof-of-conformance required for software assessment. Hardening therefore means releasing a new validated software version at the end of each sprint.

The second factor is Continuous compliance, which states that the product must be continuously subjected to V&V activities. Continuous compliance is essential for the Scrum team to monitor and assure the technical quality of the software during the incremental flow of the agile process. In particular, the regular application of quality assurance techniques helps to discover and fix critical bugs rapidly.

The third factor is Living traceability, i.e., the ability to generate, at any time, a clear trace of how user requirements have been implemented. Living traceability aims to make the entire development process more accessible to external people. Therefore, it plays an important role in certification.

The Safe-Sprint concept subsequently evolved within the SafeScrum project (Hanssen et al., 2018). There, it was extended to complex safety-critical systems made of many parts and to comply with various sector standards, such as IEC 61508 (International Electrotechnical Commission, 2010) and EN 50128 (Comité européen de normalisation en électronique et en électrotechnique, 2011).

With S4S, our objective is to inherit and further extend the Safe-Sprint concept by (i) enforcing independence between developers and testers during the iteration, and (ii) defining a practical approach to documentation management. We consider these aspects two fundamental pillars of critical software that deserve further discussion.

In the next subsection, we provide the reader with an in-depth view of the structure of our Safe-Sprint.

3.4.2 Structure of a safe-sprint

The entire workflow of our Safe-Sprint is described in Fig. 3. As shown, every iteration starts with Sprint Planning. Its purpose remains the same as in Scrum, i.e., to select a group of manageable, highest-priority stories (the Sprint Backlog) starting from a clear and feasible objective (the Sprint Goal) (Rubin, 2012). However, team members must distribute the work following the same philosophy as the CENELEC EN50128 V-Model (Comité européen de normalisation en électronique et en électrotechnique, 2011). No one can cover both design and testing activities; otherwise, verification could fail to achieve its critical objective. Therefore, in S4S, those responsible for testing and integration cannot participate in the software design process. In practice, this rules out tests written by programmers, since it is crucial not to restrict verification to the scenarios anticipated by the software design.

Fig. 3 S4S Safe-Sprint Structure

Sprint Planning is followed by Sprint Implementation, which marks the beginning of the practical research work. During this phase, the developers in charge of design have to experiment with their ideas in order to understand all their possible consequences. The effects of each design choice have to be clear in order to make effective decisions later on.

Another fundamental aspect of Sprint Implementation is traceability. Although the team follows an adaptive and flexible process, it must be able to produce a complete trace of the software development process. This trace is essential to show an independent assessor how the software was built. Thus, during implementation, every produced artifact must be linked to its corresponding Product Backlog story.
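A simple consistency check over such links may help keep the trace complete during implementation. The Python sketch below assumes, hypothetically, that the links are stored as a map from artifact name to the story identifier it is linked to (or `None` when the link is missing); in practice this information would come from whatever tracking or requirements management tool the team uses.

```python
from typing import Dict, List, Optional, Set


def trace_report(stories: Set[str],
                 artifacts: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Report trace gaps between Product Backlog stories and produced artifacts.

    orphan_artifacts: artifacts with no link, or a link to an unknown story.
    stories_without_artifacts: stories that no artifact is linked to yet.
    """
    orphans = sorted(name for name, story in artifacts.items()
                     if story not in stories)
    covered = {story for story in artifacts.values() if story in stories}
    return {
        "orphan_artifacts": orphans,
        "stories_without_artifacts": sorted(stories - covered),
    }
```

Both lists should be empty before the increment is handed over to verification; a non-empty `stories_without_artifacts` list at the end of the sprint simply signals work that is not yet done.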

Sprint Implementation is followed by Acceptance Testing and Software Verification: test developers check whether all the selected user stories are correctly implemented and whether the software behaves as expected. Here, testers may also employ other techniques, such as static analysis and formal methods, if they deem them useful. Moreover, for Safety Integrity Level (SIL) 3 and 4 software systems, combining testing with one of these techniques is strictly required.

At the end of Software Verification, the development of the Sprint Backlog is complete. However, safety-critical software research projects require further fundamental steps.

First, although the stories selected for the Sprint Backlog have been tracked and verified, there is no guarantee that they have not adversely affected the features already available and the hazards already covered. Thus, a subsequent Regression Testing step becomes essential to preserve the technical quality already achieved. Contrary to what one might expect, this activity is straightforward in an iterative development process, since previously planned tests can be reused without writing new code.
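The reuse of previously planned tests can be pictured as a suite that only grows from one Safe-Sprint to the next and is re-run in full. The Python sketch below is a minimal illustration under the hypothetical assumption that each test is a zero-argument callable returning pass or fail; a real project would rely on its own test framework for this.

```python
from typing import Callable, Dict, List


class RegressionSuite:
    """Collects the acceptance tests of every Safe-Sprint so they can be re-run."""

    def __init__(self) -> None:
        # Hypothetical representation: test name -> callable returning True on pass.
        self._tests: Dict[str, Callable[[], bool]] = {}

    def add_sprint_tests(self, tests: Dict[str, Callable[[], bool]]) -> None:
        """Keep the tests written for a new increment instead of discarding them."""
        self._tests.update(tests)

    def run(self) -> List[str]:
        """Re-run every test collected so far; return the names of failing tests."""
        return sorted(name for name, test in self._tests.items() if not test())
```

Because no test is ever removed, a change introduced in a later sprint that breaks behavior verified in an earlier one shows up as a failing name in `run()`, which is exactly the kind of regression this step is meant to catch.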

Second, integrators have to check the software's behavior on the target hardware. Typically, as in the railway domain, the product has to be deployed on custom boards and operating systems. The final target is often an embedded industrial architecture with limited computational resources. Under these conditions, integrators play the crucial role of increasing the team's confidence with respect to technical problems that could arise when the software is released in its real environment. Hence the need for a Hardware Integration phase.

Third, it is important to provide a Traceability phase, during which the whole development team contributes to updating the software specification documents with newly discovered observations. Indeed, the research purpose is not to produce a marketable product, but guidelines to construct it. In this way, documents can be reused multiple times to create an engineered version of the final product and prove its functional safety. By default, S4S adopts the set of documents described in the CENELEC standards for railway signalling. However, the required documents can be adapted depending on the product being developed.

Finally, the last essential phase is Quality Assurance, which implements the Continuous Verification & Validation concept described in the literature. Letting an independent group of Verifiers, Validators and Assessors check the produced increment against the software requirements specification and the applicable standards makes it possible to identify and rapidly correct any critical violation. Ideally, the output of each Sprint could be released to the final user. Reality, however, is quite different: in most cases research groups are small, and SQA experts, if available, can review the work on a timeline of months, not weeks. Therefore, in S4S this phase can also be scheduled after a group of Sprints, possibly exploiting automatic SQA tools and external experts, if available.

After the Quality Assurance phase, the Safe-Sprint ends with the well-known Scrum events of Review and Retrospective.
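The ordering of the Safe-Sprint phases, including the option of running Quality Assurance only after a group of Sprints, can be summarized in a few lines of Python. This is a sketch for illustration: the `qa_every` parameter is a hypothetical way of expressing "every N sprints" and is not a value fixed by S4S.

```python
from typing import List

# Phases in the order described for the S4S Safe-Sprint.
CORE_PHASES = [
    "Sprint Planning",
    "Sprint Implementation",
    "Acceptance Testing and Software Verification",
    "Regression Testing",
    "Hardware Integration",
    "Traceability",
]
CLOSING_PHASES = ["Sprint Review", "Sprint Retrospective"]


def safe_sprint_phases(sprint_number: int, qa_every: int = 1) -> List[str]:
    """Return the ordered phases of Safe-Sprint `sprint_number` (1-based).

    qa_every (hypothetical): Quality Assurance is included only in sprints
    whose number is a multiple of qa_every; qa_every=1 means every sprint.
    """
    if sprint_number < 1 or qa_every < 1:
        raise ValueError("sprint_number and qa_every must be positive")
    phases = list(CORE_PHASES)
    if sprint_number % qa_every == 0:
        phases.append("Quality Assurance")
    return phases + CLOSING_PHASES
```

With `qa_every=3`, for instance, sprints 1 and 2 go straight from Traceability to the Review, while sprint 3 includes Quality Assurance for the increments accumulated so far.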

3.4.3 Documentation management

In this subsection, we explain in more detail how documentation is managed in S4S.

Documentation, by which we mean the formal evidence that a standard requires for software assessment, describes both the development life-cycle and the implemented product. An assessor uses this evidence to determine whether the software released by a company can be considered safe for its intended application (Comité européen de normalisation en électronique et en électrotechnique, 2011).

Documentation management, instead, refers to the process an organization employs to manage that evidence.

One possible approach is to treat documentation as the main driver of the software process. Waterfall-based development processes adopt this approach, linking subsequent phases through requirements or design specifications. In that case, the development process has a linear organization, with all its main steps arranged in a strict sequence (Sommerville, 2015), and documents carry information between adjacent process steps.

However, in an innovation process the evolution is not linear, since requirements have to be progressively refined. As a consequence, a change in one process phase requires reworking all related documents (Sommerville, 2015; Boehm & Turner, 2003).

In S4S, instead, the Scrum team regards documentation only as an output. In this case, the proof-of-conformance evidence requested by an assessor is the result of a complex research activity that has led the team to the requirements and architecture the software must have for its intended application. Product documentation and descriptions of software process activities are refined and updated during each Safe-Sprint, as described in the previous subsection.

Managing documentation as an output has three main advantages. The first is that it reduces rework, since specifications are updated only once engineers have validated their design solutions. No effort is spent documenting untestable or unfeasible requirements.

The second advantage is the reduced complexity of writing documents. In particular, during a single iteration, the development team does not have to specify and trace the entire product, but only its new or modified parts.

Finally, the third advantage is that some documents can be automatically extracted from software models or internal code documentation. For instance, with tools such as Doxygen, one can produce a Software Component Design Specification (SCDS) as requested by CENELEC EN50128.
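The principle behind such extraction can be shown in a few lines. The Python sketch below is not Doxygen and does not produce a CENELEC-compliant SCDS; it only illustrates, using the standard `inspect` module, how a design-specification skeleton can be generated from the signatures and docstrings already present in the code. The output layout and the function name `component_spec` are hypothetical.

```python
import inspect
import types


def component_spec(module: types.ModuleType) -> str:
    """Render a minimal, Markdown-like design-specification section from the
    public functions of `module` and their docstrings."""
    lines = [f"# Component: {module.__name__}", ""]
    if module.__doc__:
        lines += [inspect.cleandoc(module.__doc__), ""]
    # getmembers returns (name, object) pairs sorted by name.
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_") or func.__module__ != module.__name__:
            continue  # skip private helpers and functions imported from elsewhere
        lines.append(f"## {name}{inspect.signature(func)}")
        lines.append(inspect.getdoc(func) or "(undocumented)")
        lines.append("")
    return "\n".join(lines)
```

Calling `component_spec` on a module yields one section per public function, headed by its signature and followed by its docstring, so the specification stays aligned with the code each time it is regenerated during the Traceability phase.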

Nevertheless, some further aspects must be considered.

First, in order not to slow the team down, it is essential to clarify the purpose and the content of each specification. Defining templates for each type of document can help the team focus only on the required information.

Second, for each documentation management tool being used, a set of guidelines and policies for developers has to be defined. Indeed, each tool must be properly configured and used to produce high-quality results.

Therefore, a clear set of templates, guidelines and policies has to be identified in order to produce good documentation.
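As a small illustration of how a template can keep the team focused on the required information, the Python sketch below checks a document draft against the sections its template demands. The template content (section names and document type) is entirely hypothetical; each organization would derive its own templates from the applicable standard.

```python
from typing import Dict, List

# Hypothetical template: the sections every document of a given type must fill.
TEMPLATES: Dict[str, List[str]] = {
    "Software Component Design Specification": [
        "Purpose",
        "Interfaces",
        "Traced stories",
    ],
}


def missing_sections(doc_type: str, sections: Dict[str, str]) -> List[str]:
    """Return, in template order, the sections that are absent or empty in a draft."""
    return [name for name in TEMPLATES[doc_type]
            if not sections.get(name, "").strip()]
```

A draft containing only a purpose statement and a blank interface section would be reported as missing "Interfaces" and "Traced stories", which tells the author exactly what remains to be written before the document is handed to the Verifier.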

3.5 Metrics

In this subsection, we analyze and propose some metrics that readers can use to evaluate and improve an application of S4S.

In particular, we identified three categories of metrics that will help practitioners evaluate process efficiency, safety and transparency.

Indeed, the primary purpose of S4S is to help researchers make safe, transparent and sustainable decisions, which is a fundamental requirement for safety-critical research projects.

In the following paragraphs, we describe each category in depth, together with the importance of the related metrics.

3.5.1 Efficiency metrics

The efficiency of the software development process, i.e., the sustainable use of available resources (including people and time), depends on several factors related to the structure and organization of the process. In the case of S4S, the main efficiency concerns are the cost of the V&V activities requested by regulatory agencies and the impact of the requirements or design changes that may arise during a research project.

V&V cost includes the time, human resources and tools needed to verify and validate the output of each iteration. We consider it because the related tasks constitute the largest part of the Safe-Sprint. Thus, if the V&V cost is too high and not sustainable, developers cannot achieve an economical and efficient implementation of S4S.

The cost of requirements or design changes is, instead, given by rework, i.e., the cost of revising all the documents and code already developed. The importance of this parameter stems from the very purpose of S4S: we introduced the methodology to help researchers with complex problems whose solution is not known a priori. Therefore, changes must be lightweight and efficient.

Given the considerations above, we defined the following two metrics:

  • the percentage of time spent on software V&V, since it determines the final cost of human resources and adopted tools;

  • the quantity of resources to revise in case of changes (to requirements or design), which is proportional to the reworking cost.
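Both metrics can be computed from data most teams already collect. The Python sketch below assumes, hypothetically, that effort is logged in hours per activity and that trace links are available as a graph mapping each item (requirement, document or code unit) to the items derived from it; the set of items reachable from a changed item then approximates the resources to revise. Names such as `SAS` and `SCDS-A` used in the accompanying checks are illustrative labels only.

```python
from collections import deque
from typing import Dict, Iterable, Set


def vv_time_percentage(hours_by_activity: Dict[str, float],
                       vv_activities: Iterable[str]) -> float:
    """Share of the logged effort spent on V&V activities, in percent."""
    total = sum(hours_by_activity.values())
    if total == 0:
        return 0.0
    vv = sum(hours_by_activity.get(a, 0.0) for a in vv_activities)
    return 100.0 * vv / total


def resources_to_revise(changed: str, derived_from: Dict[str, Set[str]]) -> Set[str]:
    """All items reachable from `changed` in a (hypothetical) trace graph that
    maps each item to the items derived from it; found by breadth-first search."""
    to_visit, seen = deque([changed]), set()
    while to_visit:
        item = to_visit.popleft()
        for derived in derived_from.get(item, set()):
            if derived not in seen:
                seen.add(derived)
                to_visit.append(derived)
    return seen
```

The size of the returned set is a rough proxy for rework cost; weighting each item by its estimated revision effort would be a natural refinement.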

3.5.2 Safety metrics

We consider “safe” a software development process that provides instruments for developers to measure and control the impact of their decisions. The importance of process safety comes from the fact that fixing software bugs too late could cause different problems, such as budget overrun, user dissatisfaction, and in the worst case, project failure.

In S4S, we structured the Safe-Sprint to give developers enough room to validate their solutions. In particular, we identified continuous testing, regression testing, and the periodic application of V&V analysis as the primary means to control the technical quality of the software.

From these considerations, we selected the following metrics:

  • the number of issues discovered in each software version, since knowledge of the bugs is essential to determine product quality;

  • the code coverage achieved by testing, since the team must have a complete view of software issues.
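As a rough illustration of how these two safety metrics could be tracked, the sketch below counts discovered issues per version and computes a generic coverage percentage. The `(issue_id, version)` tuples and the notion of "code unit" are hypothetical stand-ins for data exported from an issue tracker and a coverage tool; they are assumptions, not the format used in the case study.

```python
from collections import Counter


def issues_per_version(issues: list[tuple[str, str]]) -> dict[str, int]:
    """Count discovered issues per software version; each issue is (issue_id, version)."""
    return dict(Counter(version for _, version in issues))


def coverage_percentage(covered: int, total: int) -> float:
    """Coverage ratio (statements, branches, ...) as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


def uncovered_units(all_units: set[str], executed: set[str]) -> set[str]:
    """Code units never exercised by the test suites, i.e. the blind spots of testing."""
    return all_units - executed


print(issues_per_version([("#1", "v0.1"), ("#2", "v0.1"), ("#3", "v0.2")]))
print(coverage_percentage(45, 50))  # 90.0
```

Reporting the uncovered units alongside the percentage matters because a high ratio can still hide an untested safety-relevant function.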

3.5.3 Transparency metrics

A development process is transparent if developers can trace and show how requirements were implemented and verified. In critical fields, transparency is a fundamental property since, for certification purposes, products have to be assessed by an independent agency.

From a practical point of view, transparency can be implemented by documenting the process with information and documents that explain how software requirements were refined at each step. Moreover, documents can nowadays be stored and managed using requirements management tools such as IBM Rational DOORSFootnote 3.

In S4S, on the one hand, we worked to make the production of the documentation required by CENELEC certification bodies sustainable. On the other hand, we tried to make the overall process transparent and accessible to external people.

From the above considerations, we identified the following metrics to evaluate transparency:

  • process coverage, since an external certification body has to understand all the mechanisms behind our development process;

  • the adoption of requirements management tools or platforms to view and update development process documentation efficiently;

  • the adherence between the produced documents and the applicable standards, since standards constitute the main base for certification bodies.
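Transparency, as defined above, requires showing how each requirement was implemented and verified. A minimal sketch of such a check, assuming a hypothetical requirement record that lists the code units implementing it and the tests or reports verifying it, could report the gaps an Assessor would ask about:

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str
    implemented_by: set[str] = field(default_factory=set)  # hypothetical: code units or merge requests
    verified_by: set[str] = field(default_factory=set)     # hypothetical: test cases or review reports


def trace_gaps(reqs: list[Requirement]) -> dict[str, list[str]]:
    """Requirements that cannot yet be traced to an implementation or a verification."""
    return {
        "not_implemented": sorted(r.req_id for r in reqs if not r.implemented_by),
        "not_verified": sorted(r.req_id for r in reqs if not r.verified_by),
    }


def fully_traced_percentage(reqs: list[Requirement]) -> float:
    """Share of requirements traced both to code and to verification evidence."""
    if not reqs:
        raise ValueError("no requirements given")
    traced = sum(1 for r in reqs if r.implemented_by and r.verified_by)
    return 100.0 * traced / len(reqs)


reqs = [Requirement("SR-1", {"mw_send.c"}, {"TC-1"}),
        Requirement("SR-2", {"mw_recv.c"}),
        Requirement("SR-3")]
print(trace_gaps(reqs))
```

In practice this information would come from a collaboration platform or a requirements management tool rather than hand-written records.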

4 A case study

To validate the core principles and ideas behind the presented methodology, we had the chance to adopt it within an industrial research project funded by Rete Ferroviaria Italiana SpA (RFI)Footnote 4. The project concerns the implementation of a message-oriented middleware supporting safe and reliable communication among the nodes of a railway signalling system. The software poses several challenges, which can be summarized as follows:

  • supporting all the pre-existing RFI hardware platforms and communication protocols;

  • ensuring that a communication fault cannot lead to a catastrophic failure;

  • fulfilling the real-time communication constraints and performance requirements of the RFI signalling infrastructure;

  • satisfying the current European and national standards for railway signalling systems.

Given its complexity, the project represented an ideal case study to validate the proposed S4S agile process.

In the following sections, we detail the design of the experiment, including the development team and the initialization of S4S. Then, we discuss the analysis conducted in the retrospective meetings, based on the metrics described in Subsec. 3.5.

4.1 Design of experiment for S4S application

4.1.1 Scrum team composition

As described in Subsec. 3.3, S4S includes professional figures related to software development, verification, and validation activities. In particular, the scrum team, which is responsible for building high-quality software, must include people with knowledge of requirements management, software design, software programming, and testing. The verifier, validator, and assessor, by contrast, are external to the scrum team and supervise the correctness of testing activities, the validity of requirements, and the compliance of the project with standards.

In this case study, the scrum team was composed of young university researchers, while RFI SpA supported the V&V activities with a team that had strong experience in software safety validation and assessment.

As for the individual scrum roles, the product owner was an embedded systems engineer who had previously participated in and directed other research projects in critical environments. The tester and integrator roles were covered by a single researcher with experience in functional and white-box testing. The scrum master role was assigned to a software engineer with the necessary knowledge of agile software development and the Scrum framework. Finally, two junior developers were responsible for software design and development. Since the group mixed experienced developers and novices, we decided to assign the latter only to software development tasks. In this way, senior members were able to prioritize and check the features produced by the junior programmers.

4.1.2 S4S initialization

The application of the S4S development process to the research project required some preliminary steps, which had the purpose of identifying:

  • the initial set of software features, in the form of Product Backlog user and safety stories;

  • the estimated implementation and testing time for each backlog item;

  • the verification and validation activities required for checking the output of each Safe-Sprint;

  • the set of technologies and tools to support the implementation of the Safe-Sprint workflow described in Subsec. 3.4.2.

To create the Product Backlog, we studied the documentation provided by RFI and organized preliminary requirements analysis workshops. The output consisted of 126 backlog items, which reflected some technical difficulties of the project due to the presence of: (i) SIL 4 safety functions; (ii) constrained embedded targets, and (iii) many different target platforms.

Regarding the V&V activities to check the quality of the produced code, as strongly recommended by the railway standard EN50128 (Comité européen de normalisation en électronique et en électrotechnique, 2011), we selected:

  • unit and integration testing, to verify the functional behaviour of new features;

  • regression testing, to preserve the behaviour of pre-existing software components;

  • static analysis, to enforce compliance of the software with the RFI coding standard.

Considering the selection of tools to support the Safe-Sprint workflow, we employed a set of fundamental technologies to build a transparent and efficient development process.

The first type of technology is collaboration platforms. Collaboration platforms, such as GitLabFootnote 5, AtlassianFootnote 6 and Microsoft Azure DevOpsFootnote 7, provide a comprehensive environment where agile teams can plan, inspect, and adapt their work, basing their decisions on visible results. For critical software, collaboration platforms can help agile teams build a traceable and open development process, where an Assessor can better understand how the team works.

We decided to manage the entire development process using the GitLab open-source platform. Indeed, this enabled us to:

  1. represent and share the items of our Product Backlog using Issue ListsFootnote 8;

  2. study the weight and priority of each issue to optimize planning;

  3. plan Safe-Sprints using MilestonesFootnote 9;

  4. track the status of Safe-Sprints with Issue boardsFootnote 10;

  5. directly link our codebase and its changes to backlog issues.

Thus, the adoption of GitLab allowed us to track and share all the main steps of the development process.
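Point 2 above, using issue weight and priority to plan a Safe-Sprint, can be illustrated with a simple greedy selection. This is a minimal sketch under assumptions: the priority scale (lower value means more urgent), the capacity expressed in the same units as the weights, and the example issue titles are hypothetical, and the greedy rule is one possible heuristic, not the planning procedure used in the project.

```python
from dataclasses import dataclass


@dataclass
class BacklogIssue:
    title: str
    priority: int  # hypothetical scale: lower value = more urgent
    weight: int    # estimated effort, in the spirit of GitLab issue weights


def plan_safe_sprint(backlog: list[BacklogIssue], capacity: int) -> list[BacklogIssue]:
    """Greedy sketch: take issues in priority order while their weights fit the capacity."""
    selected: list[BacklogIssue] = []
    used = 0
    for issue in sorted(backlog, key=lambda i: (i.priority, i.weight)):
        if used + issue.weight <= capacity:
            selected.append(issue)
            used += issue.weight
    return selected


backlog = [BacklogIssue("safety story: CRC check", 1, 5),
           BacklogIssue("user story: UDP transport", 2, 8),
           BacklogIssue("user story: event logging", 3, 3)]
print([i.title for i in plan_safe_sprint(backlog, capacity=10)])
```

Because an issue that does not fit is skipped rather than ending the loop, a smaller, lower-priority item can still fill the remaining capacity; the product owner would review such choices before the Safe-Sprint starts.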

The second type of technology comprises testing and debugging tools, which are needed to automate test runs during a Safe-Sprint. Indeed, although the continuous execution of test cases can improve developers' confidence in software quality, testing remains a very time-consuming activity (Paige et al., 2008).

For this task, sector standards require the adoption of non-invasive testing and debugging tools, since the tester cannot introduce changes in the software source code (Comité européen de normalisation en électronique et en électrotechnique, 2011). Therefore, we selected the commercial Lauterbach Trace32Footnote 11 product, which enables developers to:

  • debug and trace program execution without any modification of the source code;

  • inspect the current program state at different levels, including internal hardware and operating system (if present) structures;

  • profile task response times for Worst-Case Execution Time (WCET) analysis;

  • analyze software on different industrial hardware platforms and operating systems;

  • demonstrate the capability of the testing tool to the Assessor.

The use of a certified tool helped our group improve verification efficiency while maintaining compliance with railway signalling standards.

Finally, the third type of technology that we consider essential is static code checkers, which can be used to enforce compliance of the software with a coding standard. Enforcing a coding standard is very useful, since it can reduce the number of software bugs by promoting programming best practices.

For our project, RFI gave our group access to the MISRA-C code checker of MATLAB PolyspaceFootnote 12, which completed our set of essential tools.
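To illustrate the response-time profiling mentioned above, the sketch below pairs start and end events of each task and reports the largest response time observed. The event format (task, kind, timestamp in microseconds) is a hypothetical export, not the actual Trace32 output format, and the pairing assumes a task has at most one activation open at a time. Note that the maximum observed value is only a measurement-based estimate; it is not a guaranteed WCET bound.

```python
def response_times(events: list[tuple[str, str, int]]) -> dict[str, list[int]]:
    """Pair 'start'/'end' events per task (chronological order) into response times.

    Assumes non-overlapping activations of the same task (hypothetical trace format).
    """
    open_start: dict[str, int] = {}
    result: dict[str, list[int]] = {}
    for task, kind, timestamp_us in events:
        if kind == "start":
            open_start[task] = timestamp_us
        elif kind == "end" and task in open_start:
            result.setdefault(task, []).append(timestamp_us - open_start.pop(task))
    return result


def observed_worst_case(events: list[tuple[str, str, int]]) -> dict[str, int]:
    """Largest measured response time per task (an estimate, not a safe WCET bound)."""
    return {task: max(times) for task, times in response_times(events).items()}


trace = [("rx", "start", 0), ("rx", "end", 120), ("tx", "start", 130),
         ("tx", "end", 180), ("rx", "start", 200), ("rx", "end", 350)]
print(observed_worst_case(trace))
```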

4.2 Case study results

4.2.1 Analysis of process efficiency

Regarding the percentage of time spent on software V&V, we found that four-week Safe-Sprints were necessary to manage the required verification and validation activities. In particular, we organized a single Safe-Sprint as follows:

  1. in the first week, we defined and implemented the Safe-Sprint backlog;

  2. in the second week, we performed unit, integration, and regression testing, and checked the compliance of the code with the coding standard;

  3. in the third week, we updated the software requirements and architecture with all the captured observations;

  4. during the last week, we released the new version to RFI experts, who reviewed the test plans and analyzed product safety;

  5. finally, at the end of the Safe-Sprint, we reviewed all the work done with RFI project managers and concluded the iteration with a Sprint Retrospective.

With this organization, 75% of the time is spent on V&V, which means that the related cost was at least three times that of development. We consider this result a good starting point, given that:

  • the project concerns safety-critical software with a high level of risk;

  • the time to develop and certify a safety-critical product in RFI is in the order of years.

Furthermore, since we have not yet automated tasks such as test suite generation and integration testing, there is still room for optimization.
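The 75% figure can be reproduced from the weekly schedule if weeks 2 to 4 are counted as V&V-related and week 1 as development; this classification is our reading of the schedule rather than a rule of S4S. Assuming equal effort per week, the V&V-to-development ratio then follows directly, as the minimal sketch below shows.

```python
# Week-by-week plan of one Safe-Sprint, as listed above. Tagging weeks 2-4 as V&V
# is an interpretation of the schedule; equal effort per week is an assumption.
SAFE_SPRINT_WEEKS = {
    1: ("define and implement the Safe-Sprint backlog", "development"),
    2: ("unit/integration/regression testing, coding-standard checks", "vv"),
    3: ("update requirements and architecture", "vv"),
    4: ("release to RFI experts for test-plan review and safety analysis", "vv"),
}


def vv_share(weeks: dict[int, tuple[str, str]]) -> float:
    """Fraction of the Safe-Sprint weeks devoted to V&V."""
    vv_weeks = sum(1 for _, kind in weeks.values() if kind == "vv")
    return vv_weeks / len(weeks)


share = vv_share(SAFE_SPRINT_WEEKS)
ratio = share / (1 - share)  # V&V effort per unit of development effort
print(share, ratio)  # 0.75 3.0
```

Under these assumptions, V&V takes 3 of every 4 weeks, so its cost is three times the development cost; any extra V&V effort spent during week 1 would only raise the ratio, consistent with the "at least three times" estimate.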

Regarding the rework cost, i.e., the number of items that the team has to revise when requirements or design change, the case study exposed us to different situations. In summary, it could happen that:

  • the review of a Safe-Sprint did not meet user expectations;

  • integration with other subsystems showed incompatibilities or usability issues.

In the first case, the team had to revise the implementation and the documents produced in the last Safe-Sprint. Thus, the rework cost depended on the length of a single sprint and on team productivity. For our project, four weeks represented a good trade-off between producing a significant increment and limiting rework.

In the second case, integration activities exhibited an unpredictable rework cost. As stated in the case study introduction, the selected project has to work with pre-existing hardware and software subsystems, as well as with other research projects currently under development. During these integration activities, we encountered several tricky and unpredictable compatibility issues. Indeed, since many teams worked at different levels and in different contexts, they made conflicting decisions.

In terms of rework, resolving integration issues required revising the output of multiple Safe-Sprints. Therefore, when applying S4S, it was essential to plan frequent integration with other subsystems; otherwise, the rework cost would have been hard to limit and predict.

4.2.2 Analysis of process safety

Concerning process safety, the increased awareness of developers can be observed in the data on test suites and discovered issues reported in Figs. 4 and 5. As shown, the team tested the code iteratively to check its technical quality (see Fig. 4), which enabled the discovery of software bugs from the beginning of the process (see Fig. 5).

Fig. 4 User Stories and Test Suites for each Safe-Sprint

Fig. 5 Test Suites and Discovered Issues for each Safe-Sprint

In terms of test coverage, the above results were supported by nearly complete coverage of the software code, as reported for our test suites in Table 2. Therefore, developers had at least some evidence of the behaviour of each software part.

Table 2 Code coverage data extracted with Lauterbach Trace32

Regression testing also helped the group consolidate the work of previous Safe-Sprints throughout the project. In each iteration, the team executed the test suites of the software modules impacted by new modifications, gaining a much clearer view of the effect of individual changes. In our project, we experienced this in Safe-Sprint 8, where a software modification made for a single target platform introduced an incompatibility with other architectures.
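Selecting the test suites of impacted modules can be sketched as a transitive walk over a module dependency map, followed by re-running the selected suites on every target platform, which is the kind of check that exposes a change made for one target breaking another. The sketch below is a minimal illustration; the module names, suite identifiers, target names, and the dependency map are all hypothetical.

```python
from collections import deque


def impacted_modules(changed: set[str], dependents: dict[str, set[str]]) -> set[str]:
    """Changed modules plus every module that (transitively) depends on them.

    dependents[m] is the set of modules that use m (hypothetical dependency map).
    """
    impacted = set(changed)
    queue = deque(changed)
    while queue:
        module = queue.popleft()
        for user in dependents.get(module, set()):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted


def regression_runs(changed: set[str], dependents: dict[str, set[str]],
                    suites_of: dict[str, set[str]], targets: list[str]) -> list[tuple[str, str]]:
    """(suite, target) pairs to execute: impacted suites on every target platform."""
    suites = {suite for module in impacted_modules(changed, dependents)
              for suite in suites_of.get(module, set())}
    return sorted((suite, target) for suite in suites for target in targets)


dependents = {"hal_uart": {"transport"}, "transport": {"session"}, "crc": {"session"}}
suites_of = {"hal_uart": {"TS-HAL"}, "transport": {"TS-TR"},
             "session": {"TS-SES"}, "crc": {"TS-CRC"}}
runs = regression_runs({"hal_uart"}, dependents, suites_of, ["target_a", "target_b"])
print(runs)
```

Here a change to `hal_uart` selects the suites of `hal_uart`, `transport`, and `session` on both targets, while the unaffected `crc` suite is skipped.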

4.2.3 Analysis of process transparency

In terms of process transparency, we analyzed the adherence of process documentation to railway standards, process coverage, and the cost-effectiveness of traceability.

Regarding the adherence between the standards and the development process documentation, we fully adopted the formal document set presented in CENELEC EN50128. Indeed, although these specifications refer to a traditional waterfall process, they can also be used to describe the requirements and design of any software product.

The main advantage of this choice is that the certification body does not need to change its workflow to inspect the quality of the produced software. Thus, as expected, the product is accompanied by requirements, architecture, and design specifications, as well as V&V analysis reports. However, since we used an iterative software lifecycle, we had to supplement these documents with evidence describing the quality of the S4S process as well.

To this end, we considered a complete audit of the executed Safe-Sprints essential to show the Assessor how each function was verified and validated. In particular, we defined the Safe-Sprints audit as composed of:

  1. The initial version of the product backlog;

  2. The composition and competencies of the Scrum Team;

  3. The list of all the adopted technologies with related manuals and licenses;

  4. The following information for each executed Safe-Sprint:

    (a) The Safe-Sprint backlog;

    (b) The changes to the software source code;

    (c) The acceptance tests used to check sprint backlog items;

    (d) The integration tests used to check software behaviour on the specified targets;

    (e) The software modules involved in regression testing;

    (f) The report of each testing activity;

    (g) The achieved compliance with the coding standard;

    (h) The results of the V&V analysis of the Quality Assurance Team;

    (i) The Sprint Review and Retrospective results;

    (j) The modifications made to the product backlog.
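Since the Safe-Sprints audit is a custom output, one way to compose it automatically is to collect, for each Safe-Sprint, a record mirroring items (a) to (j) and to flag the items for which no evidence has been attached yet. The following is a minimal sketch under assumptions: the field names and evidence identifiers are hypothetical and do not describe the tools under development. An empty field is flagged for human review, since some items (e.g., backlog modifications) may legitimately be empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SafeSprintAuditEntry:
    """Evidence for one executed Safe-Sprint; fields mirror items (a)-(j), names are hypothetical."""
    sprint_backlog: list[str] = field(default_factory=list)              # (a)
    source_changes: list[str] = field(default_factory=list)              # (b)
    acceptance_tests: list[str] = field(default_factory=list)            # (c)
    integration_tests: list[str] = field(default_factory=list)           # (d)
    regression_modules: list[str] = field(default_factory=list)          # (e)
    test_reports: list[str] = field(default_factory=list)                # (f)
    coding_standard_compliance: list[str] = field(default_factory=list)  # (g)
    vv_results: list[str] = field(default_factory=list)                  # (h)
    review_and_retrospective: list[str] = field(default_factory=list)    # (i)
    backlog_changes: list[str] = field(default_factory=list)             # (j)


def missing_evidence(entry: SafeSprintAuditEntry) -> list[str]:
    """Names of audit items with no evidence attached, in (a)-(j) order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = SafeSprintAuditEntry(sprint_backlog=["#12", "#15"], source_changes=["!34"])
print(missing_evidence(entry))
```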

The purpose of the audit is to describe in depth what the team did to guarantee, measure, and control the technical quality of the product.

Therefore, with S4S we achieved close adherence to the applicable standards, but we had to introduce additional evidence to show its iterative quality assurance process.

Another remarkable aspect, closely related to the documents and the Safe-Sprints audit, is that they constitute an open and accessible trace of all the activities performed. Thus, since they show in depth how software requirements were refined and validated, we also achieved a high level of process coverage.

As for the tools to manage these documents and audits, we needed instruments that would make the process cost-effective.

In particular, regarding the traceability of the S4S agile process, all the required data were captured by the GitLab collaborative platform. As described in Fig. 3 and in Subsec. 4.1.2, the team had a comprehensive platform to trace the implementation of each backlog story. However, since the Safe-Sprints audit is a custom output, we are currently working on custom tools to extract and compose it automatically.

As for the software requirements, architecture, and design specifications, as well as the other V&V reports required by the standard, we are evaluating the possibility of using IBM Rational DOORS. Indeed, we found it very difficult to work outside a requirements management tool, given the large number and complexity of the software requirements. Therefore, we are currently collaborating with RFI to identify a proper Integrated Development Environment (IDE) and to migrate the documentation of pre-existing products.

5 Conclusion

In this paper, we proposed an extension of the Scrum agile methodology, namely S4S, suitable to guide and support the design and development of software components in safety-critical domains, in particular the railway domain. We discussed S4S in full detail, including its context, principles, roles, and workflow. Furthermore, to evaluate the methodology, we reported a case study on a real, highly complex safety-critical research product with changing requirements, which represents a typical situation for research groups.

The reported case study highlighted that S4S (i) enables iterative and evolutionary development of safety-critical software, even if the architecture and/or requirements need to be refined, (ii) allows documentation to be produced, and kept up to date, as an output of the entire process, and (iii) makes the entire process much safer and more responsive with respect to human errors.

From these conclusions, we can state that the agile mindset remains effective in a critical context if it embraces all its values in terms of quality. Nevertheless, this paper constitutes only a starting point: we intend to apply S4S to other critical research projects, including those involving third-party and/or legacy software components, in order to add new tools and techniques that would increase its efficiency and safety.