Requirements Engineering

Volume 16, page 251

Relevance-based abstraction identification: technique and evaluation

Best Papers of RE'10: Requirements Engineering in a Multi-faceted World


When first approaching an unfamiliar domain or requirements document, it is often useful to get a quick grasp of the domain's essential concepts and entities. This process is called abstraction identification, where the word abstraction refers to an entity or concept that has particular significance in the domain. Abstraction identification has been proposed, and evaluated, as a useful technique in requirements engineering (RE). In this paper, we propose a new technique for automated abstraction identification called relevance-based abstraction identification (RAI). We evaluate its performance, in multiple configurations and through two refinements, against other tools and techniques proposed in the literature, and find that RAI significantly outperforms previous techniques. We also present an experiment measuring the effectiveness of RAI compared to human judgement, and discuss how RAI could be used to good effect in requirements engineering.
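
The abstract does not say how RAI measures relevance. Purely as an illustration, and not as RAI itself, the sketch below ranks candidate single-word terms with the log-likelihood statistic used for corpus comparison by Rayson and Garside (2000): a term scores highly when it is much more frequent in the domain text than in a general reference corpus. The function names `log_likelihood` and `rank_candidates`, the whitespace tokenization and the toy corpora are hypothetical assumptions; a realistic tool would also need part-of-speech filtering, lemmatization, multi-word terms and a large general-English reference corpus.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 log-likelihood for a term seen `a` times in a domain corpus of `c`
    tokens and `b` times in a reference corpus of `d` tokens."""
    total = c + d
    e1 = c * (a + b) / total  # expected count in the domain corpus
    e2 = d * (a + b) / total  # expected count in the reference corpus
    ll = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_candidates(domain_tokens, reference_tokens, top_n=10):
    """Hypothetical helper: rank domain terms that are over-represented
    relative to the reference corpus, highest log-likelihood first.
    Both token lists are assumed to be non-empty."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for term, a in dom.items():
        b = ref.get(term, 0)
        # Keep only terms that are relatively more frequent in the domain text.
        if a / c > b / d:
            scored.append((term, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy corpora, invented for illustration only.
    domain = "the reader scans the tag and the tag reports the pallet id".split()
    reference = "the cat sat on the mat and the dog sat on the rug".split()
    for term, score in rank_candidates(domain, reference, top_n=3):
        print(f"{term}\t{score:.3f}")
```

In the toy run, "tag" ranks first because it occurs twice in the domain text and never in the reference text, while "the", although the most frequent domain token, scores near zero because it is about as common in the reference text. Filtering on relative frequency before scoring keeps terms that are under-represented in the domain, which G2 alone would also score highly, out of the candidate list.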


Keywords: Abstractions, Natural language, Requirements elicitation, Evaluation of tool



This work was funded by EPSRC grant EP/F069227/1 MaTREx.


Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Ricardo Gacitua (1)
  • Pete Sawyer (1)
  • Vincenzo Gervasi (2)

  1. School of Computing and Communications, Lancaster University, Lancaster, UK
  2. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
