Requirements Engineering, Volume 21, Issue 3, pp 357–381

Detecting, classifying, and tracing non-functional software requirements

  • Anas Mahmoud
  • Grant Williams
RE 2015


In this paper, we describe a novel unsupervised approach for detecting, classifying, and tracing non-functional software requirements (NFRs). The proposed approach exploits the textual semantics of software functional requirements (FRs) to infer potential quality constraints enforced in the system. In particular, we conduct a systematic analysis of a series of word similarity methods and clustering techniques to generate semantically cohesive clusters of FR words. These clusters are classified into various categories of NFRs based on their semantic similarity to basic NFR labels. Discovered NFRs are then traced to their implementation in the solution space based on their textual semantic similarity to source code artifacts. Three software systems are used to conduct the experimental analysis in this paper. The results show that methods that exploit massive sources of textual human knowledge are more accurate in capturing and modeling the notion of similarity between FR words in a software system. Results also show that hierarchical clustering algorithms are more capable of generating thematic word clusters than partitioning clustering techniques. In terms of performance, our analysis indicates that the proposed approach can discover, classify, and trace NFRs with accuracy levels that can be adequate for practical applications.
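The detection and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The similarity function here is a stand-in (Jaccard overlap over hand-made context sets), whereas the paper evaluates several word similarity methods. The names `CONTEXTS`, `toy_sim`, `average_linkage`, and `classify`, the merge threshold, and all of the data are hypothetical and chosen only for illustration. Tracing to source code is left out.

```python
from itertools import combinations

# Hypothetical context sets standing in for a real word-similarity resource.
CONTEXTS = {
    "password": {"login", "secret", "access"},
    "encrypt": {"secret", "access", "key"},
    "authenticate": {"login", "access", "key"},
    "response": {"time", "fast", "delay"},
    "latency": {"time", "delay", "network"},
    "security": {"secret", "access", "key", "login"},
    "performance": {"time", "fast", "delay", "network"},
}


def toy_sim(a, b):
    """Jaccard overlap of the two words' context sets (assumed stand-in measure)."""
    ca, cb = CONTEXTS[a], CONTEXTS[b]
    return len(ca & cb) / len(ca | cb)


def average_linkage(words, sim, threshold):
    """Agglomerative clustering: merge the two most similar clusters until
    the best average pairwise similarity drops below the threshold."""
    clusters = [[w] for w in words]

    def cluster_sim(a, b):
        return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), cluster_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def classify(cluster, labels, sim):
    """Pick the NFR label with the highest mean similarity to the cluster's words."""
    return max(labels, key=lambda lab: sum(sim(lab, w) for w in cluster) / len(cluster))


fr_words = ["password", "encrypt", "authenticate", "response", "latency"]
nfr_labels = ["security", "performance"]

clusters = average_linkage(fr_words, toy_sim, threshold=0.3)
labelled = {classify(c, nfr_labels, toy_sim): sorted(c) for c in clusters}
print(labelled)
```

Passing the similarity function as a parameter keeps the clustering and labelling code unchanged when a different word similarity method is swapped in.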


Keywords: Classification, Non-functional requirements, Information retrieval, Semantics



We would like to thank our study participants, and the Institutional Review Board (IRB) at LSU for approving this research. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number LEQSF(2015-18)-RD-A-07.


Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, USA
