Abstract
Semantic similarity information supports requirements tracing and helps to reveal important requirements quality defects such as redundancies and inconsistencies.
Previous work has applied semantic similarity algorithms to requirements; however, little is known about how machine learning and deep learning models perform in that context.
In this work, we therefore use Amazon Mechanical Turk, a crowd-sourcing marketplace for micro-tasks, to create the largest dataset to date for analyzing the similarity of requirements. Based on this dataset, we investigate and compare different types of algorithms for estimating the semantic similarity of requirements, covering both relatively simple bag-of-words models and machine learning models.
In our experiments, a model that relies on averaging trained word and character embeddings and an approach based on character sequence occurrences and overlaps achieve the best performance on our requirements dataset.
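To make the two simpler families of measures named in the abstract more concrete, the following is a minimal sketch of a bag-of-words cosine similarity and a character n-gram overlap score. It is an illustration only and not the implementation evaluated in the paper: the function names, the trigram size `n = 3`, the use of Jaccard overlap, and the example requirements are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def bag_of_words_cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams of the lowercased, whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the character n-gram sets of two texts."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical requirements, invented for this example.
    r1 = "The system shall log every failed login attempt."
    r2 = "Every failed login attempt must be logged by the system."
    print(bag_of_words_cosine(r1, r2))
    print(char_ngram_jaccard(r1, r2))
```

Character n-grams tolerate inflection differences such as "log" versus "logged" that an exact word match misses, which is one plausible reason a character-based measure can do well on short requirement sentences.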
Notes
1. https://www.mturk.com/ (accessed 06 February 2019).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Femmer, H., Müller, A., Eder, S. (2020). Semantic Similarities in Natural Language Requirements. In: Winkler, D., Biffl, S., Mendez, D., Bergsmann, J. (eds) Software Quality: Quality Intelligence in Software and Systems Engineering. SWQD 2020. Lecture Notes in Business Information Processing, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-030-35510-4_6
DOI: https://doi.org/10.1007/978-3-030-35510-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35509-8
Online ISBN: 978-3-030-35510-4
eBook Packages: Computer Science (R0)