Relevance-based abstraction identification: technique and evaluation

  • Best Papers of RE'10: Requirements Engineering in a Multi-faceted World
Requirements Engineering

Abstract

When first approaching an unfamiliar domain or requirements document, it is often useful to get a quick grasp of the essential concepts and entities in the domain. This process is called abstraction identification, where the word abstraction refers to an entity or concept that has a particular significance in the domain. Abstraction identification has been proposed and evaluated as a useful technique in requirements engineering (RE). In this paper, we propose a new technique for automated abstraction identification called relevance-based abstraction identification (RAI). We evaluate its performance, in multiple configurations and through two refinements, against other tools and techniques proposed in the literature, and find that RAI significantly outperforms previous techniques. We also present an experiment measuring the effectiveness of RAI compared to human judgement, and discuss how RAI could be used to good effect in requirements engineering.

Acknowledgments

This work was funded by EPSRC grant EP/F069227/1 MaTREx.

Author information

Corresponding author

Correspondence to Ricardo Gacitua.

About this article

Cite this article

Gacitua, R., Sawyer, P. & Gervasi, V. Relevance-based abstraction identification: technique and evaluation. Requirements Eng 16, 251–265 (2011). https://doi.org/10.1007/s00766-011-0122-3

  • DOI: https://doi.org/10.1007/s00766-011-0122-3
