Extracting Named Entities Using Support Vector Machines

Wu, Yu-Chieh; Fan, Teng-Kai; Lee, Yue-Shi; Yen, Show-Jane

doi:10.1007/11683568_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Included in the following conference series:

International Workshop on Knowledge Discovery in Life Science Literature

Abstract

Identifying proper names, such as the names of genes, DNA, or proteins, helps researchers mine textual information. Learning to extract proper names from natural language text is a named entity recognition (NER) task. Previous studies have focused on combining large numbers of hand-crafted rules and trigger words to improve system performance. However, these methods require domain experts to build the rules and word sets, which takes considerable human effort. In this paper, we present a robust named entity recognition system based on support vector machines (SVM). By integrating a rich feature set with the proposed mask method, the system performs well on both the MUC-7 and the biology named entity recognition tasks, outperforming well-known machine learning-based methods such as hidden Markov models (HMM) and maximum entropy models (MEM). We compare our method with previous systems evaluated on the same data sets. The experiments show that our system achieves an F(β=1) score of 86.4 when trained on the MUC-7 data set and 81.57 on the biology corpus. In addition, the system is fast enough for real-time applications: the turnaround time on a 63K-word document set is less than 30 seconds.
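
The abstract does not say how the SVM is applied token by token, which tag scheme is used, or what the rich feature set and the mask method contain. The sketch below shows one common framing, under stated assumptions: each entity span is encoded as BIO tags (B- begins an entity, I- continues it, O is outside), a classifier such as a set of one-vs-rest SVMs would predict one tag per token from sparse binary features of a small context window, and predicted tags are decoded back into spans. The helper names (bio_encode, bio_decode, word_shape, token_features), the window size, the feature templates, and the entity labels in the example are all hypothetical; the classifier itself and the authors' mask method are not reproduced.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end_exclusive, entity_type).
Span = Tuple[int, int, str]


def bio_encode(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn non-overlapping entity spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_decode(tags: List[str]) -> List[Span]:
    """Recover spans from BIO tags; a stray I- tag is treated as a new entity."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, kind
    return spans


def word_shape(word: str) -> str:
    """Collapse character classes, e.g. 'IL-2' -> 'A-0', 'Taipei' -> 'Aa'."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            cls = "A"
        elif ch.islower():
            cls = "a"
        elif ch.isdigit():
            cls = "0"
        else:
            cls = ch
        if not shape or shape[-1] != cls:  # merge runs of the same class
            shape.append(cls)
    return "".join(shape)


def token_features(tokens: List[str], i: int, window: int = 2) -> List[str]:
    """Hypothetical sparse binary features for token i: word, shape, affixes,
    and the surrounding words within +/- window positions."""
    word = tokens[i]
    feats = [
        f"w={word.lower()}",
        f"shape={word_shape(word)}",
        f"pre3={word[:3].lower()}",
        f"suf3={word[-3:].lower()}",
    ]
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        ctx = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats.append(f"w[{offset:+d}]={ctx}")
    return feats


# Illustrative sentence and labels, not taken from the paper's corpora.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = bio_encode(len(tokens), [(0, 2, "DNA"), (4, 6, "cell_type")])
print(tags)              # ['B-DNA', 'I-DNA', 'O', 'O', 'B-cell_type', 'I-cell_type']
print(bio_decode(tags))  # [(0, 2, 'DNA'), (4, 6, 'cell_type')]
print(token_features(tokens, 0))
```

Decoding treats an I- tag that does not continue the current entity as the start of a new one, which is a lenient choice; a stricter decoder could discard such tags instead.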

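The F(β=1) figures quoted in the abstract (86.4 on MUC-7, 81.57 on the biology corpus) are balanced F-measures, i.e. the harmonic mean of precision P and recall R. A minimal sketch of the standard formula F_β = (1 + β²)·P·R / (β²·P + R), computed from entity counts, is given below. It assumes an entity is correct only on an exact span-and-type match; the official MUC-7 scorer and the biology-task evaluation may credit matches differently, and the counts in the usage line are made up for illustration, not taken from the paper.

```python
def f_beta(correct: int, predicted: int, gold: int, beta: float = 1.0) -> float:
    """F_beta from entity counts, assuming exact span-and-type matching
    (an assumption; official scorers may count matches differently)."""
    if correct == 0 or predicted == 0 or gold == 0:
        return 0.0
    precision = correct / predicted  # share of predicted entities that are right
    recall = correct / gold          # share of gold entities that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only.
score = f_beta(correct=80, predicted=100, gold=90)
print(f"F(beta=1) = {100 * score:.2f}")  # prints: F(beta=1) = 84.21
```

With β = 1 the formula reduces to 2·correct / (predicted + gold), so the example evaluates to 160 / 190 ≈ 0.8421.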
Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Central University, No.300, Jhong-Da Rd., Jhongli City, Taoyuan County, 32001, Taiwan, R.O.C.
Yu-Chieh Wu & Teng-Kai Fan
Department of Computer Science and Information Engineering, Ming Chuan University, No.5, De-Ming Rd, Gweishan District, Taoyuan County, 333, Taiwan, R.O.C.
Yue-Shi Lee & Show-Jane Yen

Editor information

Editors and Affiliations

Brain Tumor Research Program, Children’s Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Eric G. Bremer
Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Jörg Hakenberg
iXmatch Inc., 5555 West 78th Street Suite E, 55439-2702, Minneapolis, MN, USA
Eui-Hong (Sam) Han
School of Biomedical Sciences, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Daniel Berrar
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

About this paper

Cite this paper

Wu, YC., Fan, TK., Lee, YS., Yen, SJ. (2006). Extracting Named Entities Using Support Vector Machines. In: Bremer, E.G., Hakenberg, J., Han, EH., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_8

DOI: https://doi.org/10.1007/11683568_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)
