Using SVM to Extract Acronyms from Text
This paper addresses the problem of extracting acronyms and their expansions from text. We propose an approach based on support vector machines (SVM). First, all likely acronyms are identified using heuristic rules. Second, expansion candidates are generated from the text surrounding each acronym. Finally, an SVM model is employed to select the genuine expansions. Analysis shows that the proposed approach offers savings over conventional rule-based approaches. Experimental results show that our approach outperforms a rule-based baseline. We also show that the trained SVM model is generic and can easily be adapted to other domains.
Keywords: Acronym, Expansion, Classification, Support vector machines
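The three-step pipeline in the abstract (heuristic acronym detection, candidate generation from surrounding text, SVM selection) can be sketched in a few dozen lines of standard-library Python. The paper's actual rules, features, and training setup are not given here, so everything below is an assumption for illustration only: the acronym rule in `is_acronym_candidate`, the candidate window of min(|A|+5, 2|A|) words, the three hand-picked features, the tiny hypothetical training set `TRAIN`, and the use of dual coordinate descent to train the linear SVM are all hypothetical choices, not the authors' method.

```python
import re

# Hypothetical tokenizer: words (letters, digits, hyphens) and parentheses.
TOKEN_RE = re.compile(r"[A-Za-z0-9-]+|[()]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def is_acronym_candidate(token):
    # Step 1 (assumed heuristic rule): 2-10 characters, starts with an
    # uppercase letter, and contains at least two uppercase letters.
    return (re.fullmatch(r"[A-Z][A-Za-z0-9-]{1,9}", token) is not None
            and sum(c.isupper() for c in token) >= 2)


def expansion_candidates(tokens, i):
    # Step 2: every contiguous word span ending just before the acronym (or
    # its opening parenthesis), within an assumed window of min(|A|+5, 2|A|).
    acronym = tokens[i]
    end = i - 1 if i > 0 and tokens[i - 1] == "(" else i
    window = min(len(acronym) + 5, 2 * len(acronym))
    words, j = [], end - 1
    while j >= 0 and len(words) < window and tokens[j] not in ("(", ")"):
        words.insert(0, tokens[j])
        j -= 1
    return [words[k:] for k in range(len(words))]


def features(acronym, candidate):
    # Hypothetical features: [bias, fraction of acronym letters matched,
    # first-letter match, fraction of candidate words whose initial matched].
    letters = [c.lower() for c in acronym if c.isalpha()]
    initials = [w[0].lower() for w in candidate]
    matched, pos = 0, 0
    for ch in letters:  # in-order match of acronym letters to word initials
        k = pos
        while k < len(initials) and initials[k] != ch:
            k += 1
        if k < len(initials):
            matched, pos = matched + 1, k + 1
    return [1.0, matched / len(letters),
            1.0 if initials[0] == letters[0] else 0.0,
            matched / len(candidate)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, C=10.0, epochs=500):
    # Step 3 training: hinge-loss linear SVM solved by dual coordinate
    # descent; the bias is folded into feature 0. Returns w = sum(a_i y_i x_i).
    w, alpha = [0.0] * len(X[0]), [0.0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            grad = yi * dot(w, xi) - 1.0
            new_a = min(max(alpha[i] - grad / dot(xi, xi), 0.0), C)
            delta = new_a - alpha[i]
            if delta:
                alpha[i] = new_a
                w = [wj + delta * yi * xj for wj, xj in zip(w, xi)]
    return w


def build_examples(labelled):
    # (sentence, acronym, gold expansion) triples -> feature rows and labels.
    X, y = [], []
    for text, acronym, gold in labelled:
        tokens = tokenize(text)
        for cand in expansion_candidates(tokens, tokens.index(acronym)):
            X.append(features(acronym, cand))
            y.append(1 if " ".join(cand) == gold else -1)
    return X, y


def extract(text, w):
    # Full pipeline: keep the best-scoring candidate per acronym, but only
    # if its SVM decision value is positive.
    tokens, pairs = tokenize(text), []
    for i, tok in enumerate(tokens):
        if not is_acronym_candidate(tok):
            continue
        scored = [(dot(w, features(tok, c)), c)
                  for c in expansion_candidates(tokens, i)]
        if scored:
            score, best = max(scored, key=lambda s: s[0])
            if score > 0:
                pairs.append((tok, " ".join(best)))
    return pairs


# Hypothetical labelled sentences, for illustration only.
TRAIN = [
    ("the paper uses natural language processing (NLP) methods", "NLP", "natural language processing"),
    ("results on the world wide web (WWW) are reported", "WWW", "world wide web"),
    ("a hidden Markov model (HMM) was trained", "HMM", "hidden Markov model"),
    ("we query the National Library of Medicine (NLM) database", "NLM", "National Library of Medicine"),
]
w = train_linear_svm(*build_examples(TRAIN))
print(extract("We propose a support vector machines (SVM) based approach", w))
```

In this sketch the SVM is a binary classifier over (acronym, candidate) pairs, so selecting "the genuine expansion" reduces to taking the highest decision value among an acronym's candidates and rejecting it when that value is not positive. A practical system would likely use richer features and a library SVM implementation rather than this hand-rolled solver.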