Software requirements as an application domain for natural language processing

Abstract

Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications; however, they usually require considerable human effort to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.
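
As a rough illustration of the kind of mapping the abstract describes, the following Python sketch flattens one role-labeled requirement into triple-like statements. It is a toy under stated assumptions, not the parser or ontology used in the paper: the role names (has_actor, acts_on), the ex: prefix, the Triple type and the frame_to_triples helper are all hypothetical, and the roles are labeled by hand where a trained semantic role labeler would predict them.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement (simplified, not validated RDF)."""
    subject: str
    predicate: str
    obj: str


def frame_to_triples(frame_id: str, action: str, roles: Dict[str, str]) -> List[Triple]:
    """Flatten one predicate with its labeled arguments into triples.

    The first two triples describe the action itself; each remaining
    triple links the action to one role filler.
    """
    triples = [
        Triple(frame_id, "rdf:type", "ex:Action"),
        Triple(frame_id, "ex:lemma", action),
    ]
    for role, filler in roles.items():
        triples.append(Triple(frame_id, f"ex:{role}", filler))
    return triples


# Hand-labeled example; the role names are hypothetical placeholders.
requirement = "The user must be able to create an account."
roles = {"has_actor": "user", "acts_on": "account"}

triples = frame_to_triples("ex:req1_create", "create", roles)
for t in triples:
    print(f"{t.subject} {t.predicate} {t.obj} .")
```

Keeping the output as flat triples is one possible design choice: such statements can be stored and queried, which is the property the abstract relies on for validation and reuse.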

Notes

  1. A system also involves non-functional requirements that describe quality criteria. However, this paper focuses on functional requirements, which specify what a system can do.

  2. http://www.scasefp7.eu.

  3. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222.

  4. http://www.w3.org/TR/turtle/.

  5. http://www.w3.org/TR/1999/PR-rdf-schema-19990303/.

  6. http://www.w3.org/TR/2004/REC-owl-guide-20040210/.

  7. Although we don’t rely on OWL inference capabilities explicitly in this paper, they are useful for expressing integrity conditions over the ontology, such as ensuring that certain properties have inverse properties (e.g. owns/owned_by).

  8. https://github.com/edumbill/doap/wiki.

  9. We first introduced this parser in (Roth and Klein 2015). This section provides additional details of the parsing architecture and underlying motivations.

  10. http://code.google.com/p/mate-tools/.

  11. http://github.com/turian/neural-language-model.

  12. http://protege.stanford.edu.

  13. The majority of requirements collected in this way were provided by a software development course organized jointly by several European universities, cf. http://www.fer.unizg.hr/rasip/dsd.

References

  1. Abbott, R. J. (1983). Program design by informal English descriptions. Communications of the ACM, 26(11), 882–894.

  2. Bach, N., & Badaskar, S. (2007). A review of relation extraction. Pittsburgh: Language Technologies Institute, Carnegie Mellon University.

  3. Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.

  4. Berant, J., Chou, A., Frostig, R., & Liang, P. (2013). Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013 (pp. 1533–1544).

  5. Björkelund, A., Hafdell, L., & Nugues, P. (2009). Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, Association for Computational Linguistics, Stroudsburg, PA, CoNLL ’09 (pp. 43–48).

  6. Boehm, B., & Basili, V. R. (2001). Software defect reduction top 10 list. Computer, 34, 135–137.

  7. Bohnet, B. (2010). Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China (pp. 89–97).

  8. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1247–1250). New York: ACM SIGMOD ’08.

  9. Booch, G. (1986). Object-oriented development. IEEE Transactions on Software Engineering, 12(2), 211–221.

  10. Bunescu, R., & Mooney, R. J. (2005a). Subsequence kernels for relation extraction. In Advances in Neural Information Processing Systems (vol. 18, pp. 171–178). Proceedings of the 2005 Conference (NIPS).

  11. Bunescu, R. C., & Mooney, R. J. (2005b). A shortest path dependency kernel for relation extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA (pp. 724–731).

  12. Cai, Q., & Yates, A. (2013). Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria (vol. 1, Long Papers, pp. 423–433).

  13. Clark, S., & Curran, J. R. (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4), 493–552.

  14. Culotta, A., & Sorensen, J. (2004). Dependency tree kernels for relation extraction. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, ACL ’04 (pp. 423–429).

  15. Denger, C., Berry, D. M., & Kamsties, E. (2003). Higher quality requirements specifications through natural language patterns. In Proceedings of the IEEE International Conference on Software: Science, Technology and Engineering (pp. 80–90).

  16. Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.

  17. Fleiss, J. L., Levin, B., & Paik, M. C. (1981). The measurement of interrater agreement. Statistical Methods for Rates and Proportions, 2, 212–236.

  18. Gervasi, V., & Zowghi, D. (2005). Reasoning about inconsistencies in natural language requirements. ACM Transactions on Software Engineering and Methodology, 14(3), 277–330.

  19. Ghosh, S., Elenius, D., Li, W., Lincoln, P., Shankar, N., & Steiner, W. (2014). Automatically extracting requirements specifications from natural language. arXiv preprint: arXiv:1403.3142.

  20. Gildea, D., & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28(3), 245–288.

  21. Gordon, M., & Harel, D. (2009). Generating executable scenarios from natural language. In Computational Linguistics and Intelligent Text Processing (pp. 456–467).

  22. GuoDong, Z., Jian, S., Jie, Z., & Min, Z. (2005). Exploring various knowledge in relation extraction. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, ACL ’05 (pp. 427–434).

  23. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M. A., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., et al. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task (pp. 1–18).

  24. Harmain, H. M., & Gaizauskas, R. (2003). CM-Builder: A natural language-based CASE tool for object-oriented analysis. Automated Software Engineering, 10(2), 157–181.

  25. Kambhatla, N. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, ACL, Stroudsburg, PA, ACLdemo ’04 (pp. 178–181).

  26. Kof, L. (2004). Natural language processing for requirements engineering: Applicability to large requirements documents. In Workshop Proceedings of the 19th International Conference on Automated Software Engineering.

  27. Konrad, S. (2005). Facilitating the construction of specification pattern-based properties. In Proceedings of the 13th IEEE International Conference on Requirements Engineering (pp. 329–338).

  28. Kwiatkowski, T., Zettlemoyer, L., Goldwater, S., & Steedman, M. (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, MA (pp. 1223–1233).

  29. Meyers, A., Reeves, R., & Macleod, C. (2008). NomBank v1.0. Philadelphia: Linguistic Data Consortium.

  30. Mich, L. (1996). NL-OOPS: From natural language to object oriented requirements using the natural language processing system LOLITA. Natural Language Engineering, 2(2), 161–187.

  31. Mich, L., Mariangela, F., & Pierluigi, N. I. (2004). Market research for requirements analysis using linguistic tools. Requirements Engineering, 9(1), 40–56.

  32. Nanduri, S., & Rugaber, S. (1995). Requirements validation via automated natural language parsing. Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, 3, 362–368.

  33. Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.

  34. Post, A., & Hoenicke, H. (2012). Formalization and analysis of real-time requirements: A feasibility study at BOSCH. In Proceedings of the Fourth International Conference on Verified Software: Theories, Tools, and Experiments (pp. 225–240).

  35. Pradhan, S. S., Ward, W. H., Hacioglu, K., Martin, J. H., & Jurafsky, D. (2004). Shallow semantic parsing using support vector machines. In Susan Dumais, D. M., Roukos, S. (eds) HLT-NAACL 2004: Main Proceedings, Association for Computational Linguistics, Boston, MA (pp. 233–240).

  36. Roth, M., & Klein, E. (2015). Parsing software requirements with an ontology-based semantic role labeler. Proceedings of the IWCS Workshop Language and Ontologies, 2015, 15–21.

  37. Roth, M., & Woodsend, K. (2014). Composition of word representations improves semantic role labelling. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar (pp. 407–413).

  38. Roth, M., Diamantopoulos, T., Klein, E., & Symeonidis, A. (2014). Software requirements: A new domain for semantic parsers. In Proceedings of the ACL 2014 Workshop on Semantic Parsing, Baltimore, MD (pp. 50–54).

  39. Saeki, M., Horai, H., & Enomoto, H. (1989). Software development process from natural language specification. In Proceedings of the 11th International Conference on Software Engineering (pp. 64–73).

  40. Steedman, M. (2000). The syntactic process (vol. 35). Cambridge, MA: MIT Press.

  41. Tang, L. R. (2003). Integrating top-down and bottom-up approaches in inductive logic programming: Applications in natural language processing and relational data mining. PhD thesis, Department of Computer Sciences, University of Texas, Austin, Texas.

  42. Tjong, S. F., Hallam, N., & Hartley, M. (2006). Improving the quality of natural language requirements specifications through natural language requirements patterns. In Proceedings of the Sixth IEEE International Conference on Computer and Information Technology, Washington, DC (pp. 199–205).

  43. van Lamsweerde, A. (2009). Requirements engineering: From system goals to UML models to software specifications. New York: Wiley.

  44. Wong, Y. W., & Mooney, R. J. (2006). Learning for semantic parsing with statistical machine translation. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, HLT-NAACL ’06 (pp. 439–446).

  45. Yeh, A. (2000). More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics, Saarbrücken, Germany (pp. 947–953).

  46. Zelenko, D., Aone, C., & Richardella, A. (2003). Kernel methods for relation extraction. Journal of Machine Learning Research, 3, 1083–1106.

  47. Zettlemoyer, L., & Collins, M. (2007). Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic (pp. 678–687).

  48. Zhao, S., & Grishman, R. (2005). Extracting relations with integrated information using kernel methods. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, ACL ’05 (pp. 419–426).

  49. Zolotas, C., Diamantopoulos, T., Chatzidimitriou, K. C., & Symeonidis, A. L. (2016). From requirements to source code: A model-driven engineering approach for RESTful web services. Automated Software Engineering. In press.

Acknowledgements

Parts of this work have been supported by the FP7 Collaborative Project S-CASE (Grant Agreement No 610717), funded by the European Commission.

Author information

Corresponding author

Correspondence to Michael Roth.

Cite this article

Diamantopoulos, T., Roth, M., Symeonidis, A. et al. Software requirements as an application domain for natural language processing. Lang Resources & Evaluation 51, 495–524 (2017). https://doi.org/10.1007/s10579-017-9381-z

Keywords

  • Semantic annotation
  • Software requirements
  • Semantic role labeling