Automatic extraction of acronym definitions from the Web

Sánchez, David; Isern, David

doi:10.1007/s10489-009-0197-4

Published: 30 September 2009

Volume 34, pages 311–327 (2011)

Applied Intelligence

Abstract

Acronyms are widely used to abbreviate and stress important concepts. Discovering the definitions associated with an acronym is important for supporting language processing and knowledge-related tasks such as information retrieval, ontology mapping and question answering. Acronyms form a dynamic, unbounded and constantly evolving vocabulary. Manual attempts to compile a global-scale dictionary of acronym-definition pairs demand an overwhelming amount of work and yield limited results. To address these shortcomings, this paper presents an automatic and unsupervised methodology to generate acronyms and extract their potential definitions from the Web. The method has been designed to minimise the set of constraints, offering a domain-independent and partially language-independent solution, and to exploit the Web in order to create large and general acronym-definition sets. Results have been manually evaluated against the largest manually built acronym repository, Acronym Finder. The evaluation shows that the proposed approach improves on the coverage of manual attempts while maintaining high precision.

Author information

Authors and Affiliations

Department of Computer Science and Mathematics, Intelligent Technologies for Advanced Knowledge Acquisition (ITAKA) Research Group, University Rovira i Virgili, Tarragona, Catalonia, Spain
David Sánchez & David Isern

Corresponding author

Correspondence to David Sánchez.

About this article

Cite this article

Sánchez, D., Isern, D. Automatic extraction of acronym definitions from the Web. Appl Intell 34, 311–327 (2011). https://doi.org/10.1007/s10489-009-0197-4

Published: 30 September 2009
Issue Date: April 2011
DOI: https://doi.org/10.1007/s10489-009-0197-4
