Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements

  • Siba Mishra
  • Arpit Sharma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12045)


[Context and Motivation] A requirements glossary defines the specialized and technical terms used in a requirements document, and it helps improve the quality and understandability of that document. [Question/Problem] Manually extracting glossary terms from a large body of requirements is expensive and time-consuming. This paper proposes a fundamentally new approach for the automated extraction of glossary terms from large-sized requirements documents. [Principal Ideas/Result] First, our technique extracts candidate glossary terms by applying text chunking. Next, we apply a novel word embeddings-based semantic filter to reduce the number of candidate glossary terms. Because word embeddings are very effective at identifying semantically similar terms, this filter ensures that only domain-specific terms are present in the final set of glossary terms. We create a domain-specific reference corpus for home automation by Wikipedia crawling and use it to compute the semantic similarity scores of candidate glossary terms. We apply our technique to a large-sized requirements document, namely the CrowdRE dataset, which contains around 3000 crowd-generated requirements for smart home applications. Semantic filtering reduces the number of glossary terms by 92.7%. To evaluate the quality of the extracted glossary terms, we manually create ground truth data from the CrowdRE dataset and use it to compute precision and recall. We also compute the requirements coverage of the extracted glossary terms. [Contributions] Our detailed experiments show that word embeddings-based semantic filtering can be very useful for extracting glossary terms from a large body of requirements.
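The filtering and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how similarity scores are computed or thresholded, so the cosine similarity measure, the single `DOMAIN_VECTOR`, the `threshold` value, the toy two-dimensional `EMBEDDINGS`, and the function names are all assumptions made for illustration. In practice the vectors would come from a word embedding model trained on the domain-specific corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_filter(candidates, embeddings, domain_vector, threshold):
    """Keep candidates whose embedding is close enough to the domain vector.

    Candidates without an embedding are dropped (an assumption of this sketch).
    """
    kept = []
    for term in candidates:
        vec = embeddings.get(term)
        if vec is not None and cosine(vec, domain_vector) >= threshold:
            kept.append(term)
    return kept


def precision_recall(extracted, ground_truth):
    """Precision and recall of extracted terms against a manual ground truth."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data: 2-d vectors stand in for real word embeddings.
EMBEDDINGS = {
    "thermostat": [0.9, 0.1],
    "smart lock": [0.8, 0.3],
    "morning": [0.1, 0.9],
    "family": [0.2, 0.8],
}
DOMAIN_VECTOR = [1.0, 0.0]  # assumed representation of the home automation domain

kept = semantic_filter(list(EMBEDDINGS), EMBEDDINGS, DOMAIN_VECTOR, threshold=0.7)
precision, recall = precision_recall(
    kept, {"thermostat", "smart lock", "motion sensor"}
)
print(kept, precision, recall)
```

With this toy data, the two domain-like terms pass the filter while the generic terms "morning" and "family" are discarded. Recall falls below 1 because the hypothetical ground truth contains "motion sensor", which was never a candidate.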


Requirements engineering · Natural language processing · Word embeddings · Term extraction · Semantic filter



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, Indian Institute of Science Education and Research, Bhopal, India
