
Detecting requirements defects with NLP patterns: an industrial experience in the railway domain


Abstract

In the railway safety-critical domain, requirements documents have to abide by strict quality criteria. Rule-based natural language processing (NLP) techniques have been developed to automatically identify quality defects in natural language requirements. However, the literature lacks empirical studies on the application of these techniques in industrial settings. Our goal is to investigate to what extent NLP can be practically applied to detect defects in the requirements documents of a railway signalling manufacturer. To address this goal, we first identified a set of typical defect classes and, for each class, an engineer of the company implemented a set of defect-detection patterns by means of the GATE tool for text processing. After a preliminary analysis, we applied the patterns to a large set of 1866 requirements previously annotated for defects. The output of the patterns was further inspected by two domain experts to check the false positive cases. Additional discard-patterns were defined to automatically remove these cases. Finally, SREE, a tool that searches for typically ambiguous terms, was applied to the requirements. The experiments show that SREE and our patterns may play complementary roles in the detection of requirements defects. This is one of the first works in which defect-detection NLP techniques are applied to a very large set of industrial requirements annotated by domain experts. We contribute a comparison between the traditional manual techniques used in industry for requirements analysis and analysis performed with NLP. Our experience shows that several discrepancies can be observed between the two approaches. The analysis of the discrepancies offers hints to improve the capabilities of NLP techniques with company-specific solutions, and suggests that company practices also need to be modified to effectively exploit NLP tools.
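The study implements its defect-detection patterns within the GATE text-processing tool; as a rough, language-agnostic illustration of the rule-based approach described in the abstract, the following Python sketch flags a few typical defect classes (vague terms, ambiguous quantifiers, passive voice) with dictionary lookups and a regular expression. The term lists, the function name detect_defects, and the example requirement are hypothetical placeholders, not the patterns or data used in the study.

    import re

    # Illustrative dictionaries (hypothetical entries; the study's patterns and
    # the SREE dictionaries are richer and company-specific).
    VAGUE_TERMS = {"adequate", "appropriate", "as far as possible", "efficient", "flexible"}
    AMBIGUOUS_QUANTIFIERS = {"all", "any", "some", "several"}

    # Very rough passive-voice cue: a form of "to be" followed by a word ending in
    # -ed or -en (a real pipeline would rely on POS tagging instead).
    PASSIVE_RE = re.compile(r"\b(is|are|was|were|be|been|being)\s+\w+(ed|en)\b", re.IGNORECASE)

    def detect_defects(requirement: str) -> list[tuple[str, str]]:
        """Return (defect_class, matched_text) pairs found in a requirement."""
        findings = []
        lowered = requirement.lower()
        for term in VAGUE_TERMS:
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                findings.append(("vague term", term))
        for term in AMBIGUOUS_QUANTIFIERS:
            if re.search(rf"\b{term}\b", lowered):
                findings.append(("ambiguous quantifier", term))
        match = PASSIVE_RE.search(requirement)
        if match:
            findings.append(("passive voice", match.group(0)))
        return findings

    if __name__ == "__main__":
        req = "All error messages shall be displayed in an appropriate format."
        for defect_class, evidence in detect_defects(req):
            print(f"{defect_class}: '{evidence}'")

Run on the invented example above, the sketch reports an ambiguous quantifier ("all"), a passive construction ("be displayed"), and a vague term ("appropriate"), mirroring the kind of candidate defects that the domain experts then inspect for false positives.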


Notes

  1. https://github.com/ISTI-FMT/QUARS_plus_plus.

  2. The standard has since been replaced by ISO/IEC/IEEE 29148:2011 (ISO 2011).

  3. In this context, we also consider a dictionary from SREE-reduced, as defined in Section 3.4, to be a pattern.

  4. The dataset appears balanced since VE1 continued to randomly select new requirements from the original requirements considered, until a balanced number of accepted and rejected requirements was obtained.

  5. According to Landis and Koch (1977), the following qualitative interpretations are associated with the different ranges of Cohen’s Kappa: k < 0, no agreement; 0 ≤ k ≤ 0.20, slight; 0.21 ≤ k ≤ 0.40, fair; 0.41 ≤ k ≤ 0.60, moderate; 0.61 ≤ k ≤ 0.80, substantial; and 0.81 ≤ k ≤ 1, almost perfect agreement (a minimal computational sketch follows these notes).

  6. The results presented in Tables 9 and 8 differ from those presented in our original conference paper. When VE2 replicated the experiments performed by VE1, discrepancies in the results emerged. These were traced back to the use of a support tool, developed by VE1 on top of GATE, to ease the analysis of the requirements. The tool introduced further manipulations, which led to incorrect numerical results. The results presented in this paper are based solely on the analysis of the output of GATE and are, to the best of our knowledge, correct.

  7. The requirement was not rejected since it was clarified by other subsequent requirements. This violates guideline (c), which requires requirements to be stand-alone, but the defect was not considered crucial.

  8. The value of pR that considers the analysis of the false positive cases for the SREE dictionaries cannot be provided, since we analysed only a subset of the defects for the plurals class. However, the average value of pD gives a clear indication of the precision of SREE at the level of defects.

  9. https://gate.ac.uk/commercial.html.
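Note 5 above interprets Cohen’s Kappa as the inter-annotator agreement measure. As a minimal computational sketch of how such a score can be obtained from two annotators’ verdicts (not part of the study’s tooling; the accept/reject labels below are invented), consider:

    from collections import Counter

    def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
        """Cohen's Kappa for two annotators labelling the same items."""
        assert len(labels_a) == len(labels_b)
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected chance agreement: sum over categories of the product of
        # each annotator's marginal proportions.
        count_a, count_b = Counter(labels_a), Counter(labels_b)
        expected = sum(count_a[c] * count_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
        return (observed - expected) / (1 - expected)

    if __name__ == "__main__":
        # Invented accept/reject verdicts for eight requirements.
        ve1 = ["accept", "reject", "accept", "accept", "reject", "reject", "accept", "reject"]
        ve2 = ["accept", "reject", "accept", "reject", "reject", "reject", "accept", "accept"]
        print(f"Cohen's Kappa: {cohens_kappa(ve1, ve2):.2f}")  # 0.50 -> moderate agreement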

References

  • Alvarez SA (2002) An exact analytical relation among recall, precision, and classification accuracy in information retrieval. Tech. Rep BCCS-02-01. Computer Science Department, Boston College

  • Ambriola V, Gervasi V (2006) On the systematic analysis of natural language requirements with Circe. Autom Softw Eng 13(1):107–167


  • Anda B, Sjøberg DI (2002) Towards an inspection technique for use case models. In: Proceedings of the 14th international conference on software engineering and knowledge engineering (SEKE’02). ACM, pp 127–134

  • Arora C, Sabetzadeh M, Briand L, Zimmer F (2015) Automated checking of conformance to requirements templates using natural language processing. IEEE Trans Softw Eng 41(10):944–968


  • Aurum A, Petersson H, Wohlin C (2002) State-of-the-art: software inspections after 25 years. Softw Test Verif Reliab 12(3):133–154


  • Baskerville RL, Wood-Harper AT (1996) A critical perspective on action research as a method for information systems research. J Inf Technol 11(3):235–246


  • Berry DM, Kamsties E (2005) The syntactically dangerous all and plural in specifications. IEEE Softw 22(1):55–57


  • Berry DM, Kamsties E, Krieger MM (2003) From contract drafting to software specification: Linguistic sources of ambiguity. https://cs.uwaterloo.ca/~dberry/handbook/ambiguityHandbook.pdf

  • Berry D, Gacitua R, Sawyer P, Tjong SF (2012) The case for dumb requirements engineering tools. In: Proceedings of the 18th international working conference on requirements engineering: foundation for software quality (REFSQ’12), vol 7195. Springer, LNCS, pp 211–217


  • Berry DM, Cleland-Huang J, Ferrari A, Maalej W, Mylopoulos J, Zowghi D (2017) Panel: context-dependent evaluation of tools for NL RE tasks: recall vs. precision, and beyond. In: Proceedings of the 25th IEEE international requirements engineering conference (RE’17). IEEE, pp 570–573. https://doi.org/10.1109/RE.2017.64

  • Bonin F, Dell’Orletta F, Venturi G, Montemagni S (2010) A contrastive approach to multi-word term extraction from domain corpora. In: Proceedings of the 7th International conference on language resources and evaluation (LREC’10), pp 19–21

  • Casamayor A, Godoy D, Campo M (2012) Functional grouping of natural language requirements for assistance in architectural software design. Knowl-Based Syst 30:78–86


  • CENELEC (2011) EN 50128:2011: railway applications - communication, signalling and processing systems - software for railway control and protection systems. Tech. rep.

  • Chantree F, Nuseibeh B, Roeck AND, Willis A (2006) Identifying nocuous ambiguities in natural language requirements. In: Proceedings of the 14th IEEE international requirements engineering conference (RE’06). IEEE, pp 56–65

  • Cleland-Huang J, Czauderna A, Gibiec M, Emenecker J (2010) A machine learning approach for tracing regulatory codes to product specific requirements. In: ICSE (1). ACM, pp 155–164

  • Collins-Thompson K (2014) Computational assessment of text readability: a survey of current and future research. ITL-Int J Appl Linguist 165(2):97–135


  • Cunningham H (2002) GATE, a general architecture for text engineering. Comput Human 36(2):223–254


  • Cutts M (1996) The plain English guide. Oxford University Press

  • Derczynski L, Maynard D, Rizzo G, van Erp M, Gorrell G, Troncy R, Petrak J, Bontcheva K (2015) Analysis of named entity recognition and linking for tweets. Inf Process Manag 51(2):32–49


  • Fabbrini F, Fusani M, Gnesi S, Lami G (2001) The linguistic approach to the natural language requirements quality: benefit of the use of an automatic tool. In: Proceedings of the 26th Annual NASA Goddard software engineering workshop. IEEE, pp 97–105

  • Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15(3):182–211


  • Falessi D, Cantone G, Canfora G (2013) Empirical principles and an industrial case study in retrieving equivalent requirements via natural language processing techniques. IEEE Trans Softw Eng 39(1):18–44


  • Femmer H, Kučera J, Vetrò A (2014) On the impact of passive voice requirements on domain modelling. In: Proceedings of the 8th ACM / IEEE international symposium on empirical software engineering and measurement (ESEM’14), Art. 21. ACM

  • Femmer H, Fernández DM, Wagner S, Eder S (2017) Rapid quality assurance with requirements smells. J Syst Softw 123:190–213


  • Ferrari A, Gnesi S (2012) Using collective intelligence to detect pragmatic ambiguities. In: Proceedings of the 20th IEEE international requirements engineering conference (RE’12). IEEE, pp 191–200

  • Ferrari A, dell’Orletta F, Spagnolo GO, Gnesi S (2014) Measuring and improving the completeness of natural language requirements. In: Proceedings of the 20th international working conference on requirements engineering: foundation for software quality (REFSQ’14). Springer, pp 23–38

  • Ferrari A, Spoletini P, Gnesi S (2016) Ambiguity and tacit knowledge in requirements elicitation interviews. Requir Eng 21(3):333–355


  • Ferrari A, Dell’Orletta F, Esuli A, Gervasi V, Gnesi S (2017) Natural language requirements processing: a 4D vision. IEEE Software (to appear)

  • Gacitua R, Sawyer P, Gervasi V (2010) On the effectiveness of abstraction identification in requirements engineering. In: Proceedings of the 18th IEEE international requirements engineering conference (RE’10). IEEE, pp 5–14

  • Gervasi V, Zowghi D (2005) Reasoning about inconsistencies in natural language requirements. ACM Trans Softw Eng Methodol 14(3):277–330


  • Ghaisas S, Rose P, Daneva M, Sikkel K, Wieringa RJ (2013) Generalizing by similarity: Lessons learnt from industrial case studies. In: Proceedings of the 1st international workshop on conducting empirical studies in industry. IEEE Press, pp 37–42

  • Gleich B, Creighton O, Kof L (2010) Ambiguity detection: towards a tool explaining ambiguity sources. In: Proceedings of the 16th international working conference on requirements engineering: foundation for software quality (REFSQ’10), vol 6182. Springer, LNCS, pp 218–232


  • Gnesi S, Lami G, Trentanni G (2005) An automatic tool for the analysis of natural language requirements. Int J Comput Syst Sci Eng 20(1):53–62


  • Gorschek T, Garre P, Larsson S, Wohlin C (2006) A model for technology transfer in practice. IEEE Softw 23(6):88–95


  • Goth G (2016) Deep or shallow, NLP is breaking out. Commun ACM 59(3):13–16


  • IEEE (1998) IEEE guide for developing system requirements specifications. IEEE Std 1233, 1998 Edition, pp 1–36, https://doi.org/10.1109/IEEESTD.1998.88826

  • ISO/IEC/IEEE (2011) ISO/IEC/IEEE international standard – systems and software engineering – life cycle processes – requirements engineering. ISO/IEC/IEEE 29148:2011(E), pp 1–94. https://doi.org/10.1109/IEEESTD.2011.6146379

  • Kamsties E (2005) Understanding ambiguity in requirements engineering. In: Engineering and managing software requirements. Springer, Berlin, pp 245–266

  • Kamsties E, Berry DM, Paech B (2001) Detecting ambiguities in requirements documents using inspections. In: Proceedings of the 1st workshop on inspection in software engineering (WISE’01), pp 68–80

  • Kang N, van Mulligen EM, Kors JA (2011) Comparing and combining chunkers of biomedical text. J Biomed Inform 44(2):354–360


  • Kassab M, Neill C, Laplante P (2014) State of practice in requirements engineering: contemporary data. Innov Syst Softw Eng 10(4):235–241


  • Kiyavitskaya N, Zeni N, Mich L, Berry DM (2008) Requirements for tools for ambiguity identification and measurement in natural language requirements specifications. Requir Eng 13(3):207–239


  • Kof L (2008) From textual scenarios to message sequence charts: inclusion of condition generation and actor extraction. In: Proceedings of the 16th IEEE international requirements engineering conference, (RE’08). IEEE, pp 331–332

  • Kof L (2009) Translation of textual specifications to automata by means of discourse context modeling. In: Proceedings of the 15th international working conference on requirements engineering: foundation for software quality (REFSQ’09), vol 5512. Springer, LNCS, pp 197–211


  • Kof L (2010) From requirements documents to system models: a tool for interactive semi-automatic translation. In: Proceedings of the 18th IEEE international requirements engineering conference (RE’10). IEEE, pp 391–392

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  • Lian X, Rahimi M, Cleland-Huang J, Zhang L, Ferrari R, Smith M (2016) Mining requirements knowledge from collections of domain documents. In: Proceedings of the 24th IEEE international requirements engineering conference (RE’16). IEEE, pp 156–165

  • Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? On automatically classifying app reviews. In: Proceedings of the 23rd IEEE international requirements engineering conference, (RE’15). IEEE, pp 116–125

  • Manning CD (2011) Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In: Proceedings of the 12th international conference on intelligent text processing and computational linguistics (CICLing’11), LNCS, vol 6608. Springer, pp 171–189

  • Mavin A, Wilkinson P, Harwood A, Novak M (2009) Easy approach to requirements syntax (EARS). In: Proceedings of the 17th IEEE international requirements engineering conference (RE’09). IEEE, pp 317–322

  • Mavin A, Wilkinson P, Gregory S, Uusitalo E (2016) Listens learned (8 lessons learned applying EARS). In: Proceedings of the 24th IEEE international requirements engineering conference (RE’16). IEEE, pp 276–282

  • Mich L (1996) NL-OOPS: from natural language to object oriented requirements using the natural language processing system LOLITA. Nat Lang Eng 2(2):161–187


  • Mich L, Franch M, Inverardi PN (2004) Market research for requirements analysis using linguistic tools. Requir Eng 9(1):40–56


  • Pohl K, Rupp C (2011) Requirements engineering fundamentals. Rocky Nook, Inc

  • Quirchmayr T, Paech B, Kohl R, Karey H (2017) Semi-automatic software feature-relevant information extraction from natural language user manuals. In: Proceedings of the 23rd international working conference on requirements engineering: foundation for software quality (REFSQ’17). Springer, pp 255–272

  • Robeer M, Lucassen G, van der Werf JME, Dalpiaz F, Brinkkemper S (2016) Automated extraction of conceptual models from user stories via NLP. In: Proceedings of the 24th IEEE international requirements engineering conference (RE’16). IEEE, pp 196–205

  • Rosadini B, Ferrari A, Gori G, Fantechi A, Gnesi S, Trotta I, Bacherini S (2017) Using NLP to detect requirements defects: an industrial experience in the railway domain. In: Proceedings of the 23rd international working conference on requirements engineering: foundation for software quality (REFSQ’17). LNCS, vol 10153, pp 344–360


  • Rosenberg LH, Hammer F, Huffman LL (1998) Requirements, testing and metrics. In: 15th Annual Pacific Northwest software quality conference

  • RTCA Inc, EUROCAE (2012) DO-178C: software considerations in airborne systems and equipment certification. Tech. rep.

  • Runeson P, Host M, Rainer A, Regnell B (2012) Case study research in software engineering: guidelines and examples. Wiley

  • Shull F, Rus I, Basili V (2000) How perspective-based reading can improve requirements inspections. IEEE Comput 33(7):73–79


  • Sultanov H, Hayes JH (2013) Application of reinforcement learning to requirements engineering: requirements tracing. In: Proceedings of the 21st IEEE international requirements engineering conference (RE’13). IEEE, pp 52–61

  • Terzakis J, Gregory S (2016) RAMP: requirements authors mentoring program. In: Proceedings of the 24th IEEE international requirements engineering conference (RE’16). IEEE, pp 323–328

  • Tjong SF, Berry DM (2013) The design of SREE: a prototype potential ambiguity finder for requirements specifications and lessons learned. In: Proceedings of the 19th international working conference on requirements engineering: foundation for software quality (REFSQ’13), vol 7830. Springer, LNCS, pp 80–95


  • Wieringa R, Daneva M (2015) Six strategies for generalizing software engineering theories. Sci Comput Program 101:136–152


  • Wilmink M, Bockisch C (2017) On the ability of lightweight checks to detect ambiguity in requirements documentation. In: Proceedings of the 23rd international working conference on requirements engineering: foundation for software quality (REFSQ’17), vol 10153. Springer International Publishing, LNCS, pp 327–343


  • Wilson WM, Rosenberg LH, Hyatt LE (1997) Automated analysis of requirement specifications. In: Proceedings of the 19th international conference on software engineering. ACM, pp 161–171

  • Yang H, Roeck AND, Gervasi V, Willis A, Nuseibeh B (2011) Analysing anaphoric ambiguity in natural language requirements. Requir Eng 16(3):163–189


  • Yin RK (2013) Case study research: design and methods. Sage Publications

  • Yue T, Briand LC, Labiche Y (2015) aToucan: an automated framework to derive UML analysis models from use case models. ACM Trans Softw Eng Methodol (TOSEM) 24(3):13


  • Zhang H, Yue T, Ali S, Liu C (2016) Towards mutation analysis for use cases. In: Proceedings of the ACM/IEEE 19th international conference on model driven engineering languages and systems. ACM, pp 363–373

  • Zowghi D, Gervasi V, McRae A (2001) Using default reasoning to discover inconsistencies in natural language requirements. In: Proceedings of the 8th Asia-Pacific software engineering conference (APSEC’01), pp 133–140


Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable recommendations, which helped make the paper clearer and more complete. We are also extremely grateful to Daniel M. Berry for providing the SREE dictionaries, and to Daniel Méndez Fernández for his helpful suggestions on reporting case studies in software engineering. This work has been partially funded by the ASTRail project, which received funding from the Shift2Rail Joint Undertaking under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777561. The content of this paper reflects only the authors’ view, and the Shift2Rail Joint Undertaking is not responsible for any use that may be made of the included information.

Author information


Corresponding author

Correspondence to Alessio Ferrari.

Additional information

Communicated by: Anna Perini and Paul Grünbacher


Cite this article

Ferrari, A., Gori, G., Rosadini, B. et al. Detecting requirements defects with NLP patterns: an industrial experience in the railway domain. Empir Software Eng 23, 3684–3733 (2018). https://doi.org/10.1007/s10664-018-9596-7
