Soft Computing, Volume 11, Issue 4, pp 369–373

Using SVM to Extract Acronyms from Text

Focus

Abstract

This paper addresses the problem of extracting acronyms and their expansions from text. We propose an approach based on support vector machines (SVM). First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding each acronym. Finally, an SVM model is employed to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms the baseline rule-based method. We also show that the trained SVM model is generic and adapts easily to other domains.
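
The first two steps of the pipeline, plus the feature extraction that would feed the classifier, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the regular expression used as the acronym heuristic, the window of words searched before the acronym (the abstract speaks of surrounding text, so a fuller version would also look after it), and the two toy features. The SVM itself is left out; in practice the feature vectors would be passed to a trained SVM classifier from any standard library.

```python
import re

# Hypothetical heuristic (not the paper's rule set): a token of 2 to 10
# characters that starts with an uppercase letter and continues with
# uppercase letters or digits.
ACRONYM_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z-]*")


def find_acronyms(text):
    """Step 1: return (acronym, start_offset) for each likely acronym."""
    return [(m.group(), m.start()) for m in ACRONYM_RE.finditer(text)]


def expansion_candidates(text, acronym, start, window_factor=2):
    """Step 2: list every contiguous word span in a window before the acronym.

    The window size (window_factor * len(acronym) words) is an assumption.
    """
    words = WORD_RE.findall(text[:start])
    window = words[-window_factor * len(acronym):]
    spans = []
    for i in range(len(window)):
        for j in range(i + 1, len(window) + 1):
            spans.append(" ".join(window[i:j]))
    return spans


def features(acronym, candidate):
    """Step 3 input: a toy feature vector for one (acronym, candidate) pair."""
    words = candidate.split()
    initials = "".join(w[0].upper() for w in words)
    return [
        1.0 if initials == acronym.upper() else 0.0,  # initials spell the acronym
        float(len(words) - len(acronym)),             # word count minus acronym length
    ]


if __name__ == "__main__":
    sentence = "We propose a support vector machines (SVM) based approach."
    for acronym, start in find_acronyms(sentence):
        for candidate in expansion_candidates(sentence, acronym, start):
            print(acronym, "|", candidate, "|", features(acronym, candidate))
```

A classifier trained on labelled pairs would then keep only the candidates it scores as genuine expansions, which is the role the abstract assigns to the SVM model.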

Keywords

Acronym, Expansion, Classification, Support vector machines

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. College of Software, Nankai University, Tianjin, China
