Language Resources and Evaluation, Volume 51, Issue 2, pp 495–524

Software requirements as an application domain for natural language processing

  • Themistoklis Diamantopoulos
  • Michael Roth
  • Andreas Symeonidis
  • Ewan Klein

Abstract

Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be validated to detect problems at an early stage of the development process. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications; however, they usually require considerable human effort to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.
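
As a rough illustration of what mapping a requirement to a structured representation could look like, the sketch below turns role-labeled spans of one requirement sentence into simple triples. It is not the authors' implementation: the role names (Actor, Action, Object), the relation names, and the `RoleSpan` and `to_triples` helpers are hypothetical, and the spans are labeled by hand here, whereas a trained semantic role labeler would predict them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RoleSpan:
    """A labeled span of text taken from a requirement sentence."""
    label: str  # hypothetical role name: "Actor", "Action" or "Object"
    text: str


def to_triples(spans: List[RoleSpan]) -> List[Tuple[str, str, str]]:
    """Turn role-labeled spans into (subject, relation, object) triples.

    Every Action is linked to each Actor that performs it and to each
    Object it acts on. The relation names are illustrative assumptions.
    """
    actors = [s.text for s in spans if s.label == "Actor"]
    actions = [s.text for s in spans if s.label == "Action"]
    objects = [s.text for s in spans if s.label == "Object"]

    triples: List[Tuple[str, str, str]] = []
    for action in actions:
        triples += [(actor, "performs", action) for actor in actors]
        triples += [(action, "acts_on", obj) for obj in objects]
    return triples


# Hand-labeled example; a semantic role labeler would produce these spans.
requirement = "The user must be able to create an account."
labeled = [
    RoleSpan("Actor", "user"),
    RoleSpan("Action", "create"),
    RoleSpan("Object", "account"),
]

for triple in to_triples(labeled):
    print(triple)
```

Triples of this kind could then be stored and queried, which is the sort of use the abstract has in mind for validation and reuse.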

Keywords

Semantic annotation; Software requirements; Semantic role labeling

Notes

Acknowledgements

Parts of this work have been supported by the FP7 Collaborative Project S-CASE (Grant Agreement No 610717), funded by the European Commission.

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Themistoklis Diamantopoulos (1)
  • Michael Roth (2)
  • Andreas Symeonidis (1)
  • Ewan Klein (3)

  1. Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
  3. School of Informatics, University of Edinburgh, Edinburgh, UK