Relevance-based abstraction identification: technique and evaluation

Gacitua, Ricardo; Sawyer, Pete; Gervasi, Vincenzo

doi:10.1007/s00766-011-0122-3

Best Papers of RE'10: Requirements Engineering in a Multi-faceted World
Published: 11 June 2011

Requirements Engineering, Volume 16, pages 251–265 (2011)

Ricardo Gacitua¹, Pete Sawyer¹ & Vincenzo Gervasi²

1515 Accesses
17 Citations

Abstract

When first approaching an unfamiliar domain or requirements document, it is often useful to get a quick grasp of the domain's essential concepts and entities. This process is called abstraction identification, where the word abstraction refers to an entity or concept that has particular significance in the domain. Abstraction identification has been proposed and evaluated as a useful technique in requirements engineering (RE). In this paper, we propose a new technique for automated abstraction identification called relevance-based abstraction identification (RAI) and evaluate its performance, in multiple configurations and through two refinements, against other tools and techniques proposed in the literature; we find that RAI significantly outperforms the previous techniques. We also present an experiment measuring the effectiveness of RAI compared to human judgement, and discuss how RAI could be used to good effect in requirements engineering.
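The abstract does not say how RAI measures relevance. As a hedged illustration only, the sketch below assumes a corpus-comparison approach: each term's frequency in the domain document is compared with its frequency in a general reference corpus using the log-likelihood statistic from Rayson and Garside's (2000) frequency profiling, and terms that are over-represented in the domain text are ranked by that score. It also shows one assumed way to score a ranked list against a human-produced list of abstractions (precision and recall over the top k terms). The function names (`log_likelihood`, `rank_by_relevance`, `precision_recall_at_k`) and the toy word counts are hypothetical. This is not the authors' implementation, and it omits steps such as part-of-speech tagging, lemmatisation and multi-word term extraction that a practical tool would likely need.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness of one term, in the frequency-profiling
    form used by Rayson and Garside (2000).

    a: frequency of the term in the domain document
    b: frequency of the term in the reference corpus
    c: total number of tokens in the domain document
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the term were equally common in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # By convention 0 * ln(0) is treated as 0, so zero counts add nothing.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_by_relevance(domain: Counter, reference: Counter) -> list[tuple[str, float]]:
    """Hypothetical ranking step (an assumption, not the published RAI
    pipeline): keep terms that are relatively more frequent in the domain
    document than in the reference corpus, then sort by log-likelihood."""
    c = sum(domain.values())
    d = sum(reference.values())
    scored = []
    for term, a in domain.items():
        b = reference[term]  # a Counter returns 0 for unseen terms
        if a / c > b / d:  # over-represented in the domain text
            scored.append((term, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def precision_recall_at_k(ranked: list[str], gold: set[str], k: int) -> tuple[float, float]:
    """Compare the top-k candidates with a human-produced list of abstractions."""
    top = ranked[:k]
    hits = sum(1 for term in top if term in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy counts invented for illustration; not data from the paper.
    domain = Counter({"the": 40, "tag": 12, "reader": 9, "system": 6, "pallet": 5})
    reference = Counter({"the": 60000, "system": 900, "reader": 50, "tag": 30, "pallet": 2})
    ranked = [term for term, _ in rank_by_relevance(domain, reference)]
    print(ranked)  # ['tag', 'reader', 'pallet', 'system']
    print(precision_recall_at_k(ranked, {"tag", "pallet", "shipment"}, k=3))
```

Filtering on relative frequency before ranking matters because log-likelihood is two-sided: in the toy data the function word "the" is under-represented in the domain text, yet it would still score about 16 and outrank "system" if it were not filtered out.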

Notes

Available on request from the authors.

References

Aguilera C, Berry DM (1990) The use of a repeated phrase finder in requirements extraction. J Syst Softw 13(3):209–230. doi:10.1016/0164-1212(90)90097-6
Ananiadou S (1994) A methodology for automatic term recognition. In: Proceedings of the 15th conference on computational linguistics. Association for Computational Linguistics, Morristown, NJ, USA, pp 1034–1038. doi:10.3115/991250.991317
Berry-Rogghe G (1973) The computation of collocations and their relevance in lexical studies. Edinburgh University Press, Edinburgh
Bourigault D (1992) Surface grammatical analysis for the extraction of terminological noun phrases. In: Proceedings of the 14th conference on computational linguistics. Association for Computational Linguistics, Morristown, NJ, USA, pp 977–981. doi:10.3115/993079.993111
Cleland-Huang J, Berenbach B, Clark S, Settimi R, Romanova E (2007) Best practices for automated traceability. Computer 40(6):27–35. doi:10.1109/MC.2007.195
Natt och Dag J, Gervasi V, Brinkkemper S, Regnell B (2005) A linguistic-engineering approach to large-scale requirements management. IEEE Softw 22(1):32–39. doi:10.1109/MS.2005.1
Dardenne A, van Lamsweerde A, Fickas S (1993) Goal-directed requirements acquisition. In: 6IWSSD: selected papers of the sixth international workshop on software specification and design. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, pp 3–50. doi:10.1016/0167-6423(93)90021-G
Dumais ST, Furnas GW, Landauer TK, Deerwester S, Harshman R (1988) Using latent semantic analysis to improve access to textual information. In: CHI ’88: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 281–285. doi:10.1145/57167.57214
Edmundson HP, Wyllys RE (1961) Automatic abstracting and indexing—survey and recommendations. Commun ACM 4(5):226–234. doi:10.1145/366532.366545
Jones EC, Chung C (2008) RFID in logistics: a practical introduction. CRC Press, Taylor & Francis Group, USA
Francis WN, Kucera H (1982) Frequency analysis of English usage: lexicon and grammar. Houghton Mifflin, Boston
Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multi-word terms: the C-value/NC-value method. Int J Digit Libr 3:115–130
Gacitua R, Sawyer P (2008) Ensemble methods for ontology learning: an empirical experiment to evaluate combinations of concept acquisition techniques. In: ICIS ’08: Proceedings of seventh IEEE/ACIS international conference on computer and information science. IEEE Computer Society, Washington, DC, pp 328–333. doi:10.1109/ICIS.2008.94
Gacitua R, Sawyer P, Gervasi V (2010) On the effectiveness of abstraction identification in requirements engineering. In: Proceedings of the 18th IEEE international requirements engineering conference (RE'10). IEEE Computer Society, Los Alamitos, pp 5–14. doi:10.1109/RE.2010.12
Gacitua R, Sawyer P, Rayson P (2008) A flexible framework to experiment with ontology learning techniques. Knowl Based Syst 21(3):192–199. doi:10.1016/j.knosys.2007.11.009
Gervasi V (2000) Environment support for requirements writing and analysis. Ph.D. thesis, University of Pisa
Goldin L, Berry DM (1997) AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom Softw Eng 4(4):375–412. doi:10.1023/A:1008617922496
Goldin L, Finkelstein A (2006) Abstraction-based requirements management. In: ROA ’06: Proceedings of the 2006 international workshop on role of abstraction in software engineering. ACM, New York, pp 3–10. doi:10.1145/1137620.1137623
Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19. doi:10.1109/TSE.2006.3
Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th conference on computational linguistics. Association for Computational Linguistics, Morristown, pp 539–545. doi:10.3115/992133.992154
Hwang YS, Finch A, Sasaki Y (2007) Improving statistical machine translation using shallow linguistic knowledge. Comput Speech Lang 21(2):350–372. doi:10.1016/j.csl.2006.06.007
Jacobs P (1993) Using statistical methods to improve knowledge-based news categorization. IEEE Expert Int Syst Their Appl 8(2):13–23. doi:10.1109/64.207425
Kageura K, Umino B (1996) Methods of automatic term recognition: a review. Terminology 3(2):259–289
Kof L (2007) Text analysis for requirements engineering: application of computational linguistics. VDM Verlag, Saarbrücken, Germany
Lecoeuche R (2000) Finding comparatively important concepts between texts. In: ASE ’00: Proceedings of the 15th IEEE international conference on automated software engineering. IEEE Computer Society, Washington, DC, p 55
Leech G, Rayson P, Wilson A (2001) Word frequencies in written and spoken English: based on the British National Corpus. Longman, London
Lenat DB (1995) Cyc: a large-scale investment in knowledge infrastructure. Commun ACM 38(11):33–38. doi:10.1145/219717.219745
Liu K (2000) Semiotics in information systems engineering. Cambridge University Press, New York
Maarek YS, Berry DM (1989) The use of lexical affinities in requirements extraction. SIGSOFT Softw Eng Notes 14(3):196–202. doi:10.1145/75200.75229
Maedche A, Staab S (2000) Discovering conceptual relations from text. In: Proceedings of the 14th European conference on artificial intelligence, ECAI’2000. IOS Press, Amsterdam, pp 321–325. http://www.bibsonomy.org/bibtex/235b13d633e8193273c7db845a1881f90/danielt
Maron ME (1961) Automatic indexing: an experimental inquiry. J ACM 8(3):404–417. doi:10.1145/321075.321084
McKeown K, Radev D (2000) Collocations. In: Dale R, Moisl H, Somers H (eds) Handbook of natural language processing. Marcel Dekker, NY
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41. doi:10.1145/219717.219748
Moens MF (2001) Automatic indexing and abstracting of document texts. Comput Linguist 27(1):149–149. doi:10.1162/coli.2000.27.1.149a
Oakes MP (1998) Statistics for corpus linguistics. Edinburgh University Press, Edinburgh
ONIX: Onix text retrieval toolkit (2000) Available from: http://www.lextek.com/manuals/onix/stopwords1.html
Porter MF (1997) An algorithm for suffix stripping. pp 313–316
Rayson P, Emmet L, Garside R, Sawyer P (2001) The REVERE project: experiments with the application of probabilistic NLP to systems engineering. In: NLDB ’00: Proceedings of 5th international conference on applications of natural language to information systems. Springer, London, pp 288–300
Rayson P, Garside R (2000) Comparing corpora using frequency profiling. In: CompareCorpora ’00: Proceedings of the workshop on comparing corpora. Association for Computational Linguistics, Morristown, pp 1–6
Ryu PM (2004) Determining the specificity of terms using compositional and contextual information. In: Proceedings of the ACL 2004 workshop on student research. Association for Computational Linguistics, Morristown, p 1. doi:10.3115/1219079.1219080
Sawyer P, Rayson P, Cosh K (2005) Shallow knowledge as an aid to deep understanding in early phase requirements engineering. IEEE Trans Softw Eng 31(11):969–981. doi:10.1109/TSE.2005.129
Stone A, Sawyer P (2006) Identifying tacit knowledge-based requirements. IEE Proc Softw 153(6):211–218. doi:10.1049/ip-sen:20060034
Šnajder J, Bašić BD, Tadić M (2008) Automatic acquisition of inflectional lexica for morphological normalisation. Inf Process Manage 44(5):1720–1731. doi:10.1016/j.ipm.2008.03.006
Wermter J, Hahn U (2005) Finding new terminology in very large corpora. In: K-CAP ’05: Proceedings of the 3rd international conference on knowledge capture. ACM, New York, pp 137–144. doi:10.1145/1088622.1088648
Wermter J, Hahn U (2006) You can’t beat frequency (unless you use linguistic knowledge): a qualitative evaluation of association measures for collocation and term extraction. In: ACL-44: Proceedings of 21st international conference on computational linguistics. Association for Computational Linguistics, Morristown, pp 785–792. doi:10.3115/1220175.1220274

Acknowledgments

This work was funded by EPSRC grant EP/F069227/1 MaTREx.

Author information

Authors and Affiliations

School of Computing and Communications, Lancaster University, Lancaster, UK
Ricardo Gacitua & Pete Sawyer
Dipartimento di Informatica, Università di Pisa, Pisa, Italy
Vincenzo Gervasi

Corresponding author

Correspondence to Ricardo Gacitua.

About this article

Cite this article

Gacitua, R., Sawyer, P. & Gervasi, V. Relevance-based abstraction identification: technique and evaluation. Requirements Eng 16, 251–265 (2011). https://doi.org/10.1007/s00766-011-0122-3

Received: 12 January 2011
Accepted: 16 May 2011
Published: 11 June 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s00766-011-0122-3
