Abstract
Identifying proper names, such as gene names, DNAs, or proteins, helps researchers mine information from text. Learning to extract proper names from natural language text is a named entity recognition (NER) task. Previous studies focus on combining large numbers of hand-crafted rules and trigger words to improve system performance. However, these methods require domain experts to build the rules and word sets, which demands considerable human effort. In this paper, we present a robust named entity recognition system based on support vector machines (SVM). By integrating a rich feature set with the proposed mask method, the system performs well on the MUC-7 and biology named entity recognition tasks, outperforming well-known machine learning-based methods such as the hidden Markov model (HMM) and the maximum entropy model (MEM). We compare our method with previous systems evaluated on the same data sets. The experiments show that our system achieves an F(β=1) rate of 86.4 when trained on the MUC-7 data set and 81.57 on the biology corpus. In addition, our named entity system can handle real-time processing applications: the turnaround time on a 63K-word document set is less than 30 seconds.
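For readers unfamiliar with the F(β=1) rate quoted above, the following is a minimal sketch of how such a score is commonly computed from precision and recall. It is not taken from the paper: the function names, the (start, end, type) entity tuples, and the exact-match criterion are all illustrative assumptions, and the evaluation scripts used for the reported results may count matches differently.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def precision_recall(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float]:
    """Exact-match precision and recall over sets of entities (an assumption)."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 weights them equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up example: the second prediction has the right span but the wrong type,
# so under exact matching it counts as an error.
gold = {(0, 2, "PERSON"), (5, 6, "ORG"), (9, 10, "LOC")}
predicted = {(0, 2, "PERSON"), (5, 6, "LOC"), (9, 10, "LOC")}
p, r = precision_recall(gold, predicted)
score = f_beta(p, r, beta=1.0)
```

With β = 1 the score reduces to 2PR / (P + R), so a system must do well on both precision and recall to score highly.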
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, YC., Fan, TK., Lee, YS., Yen, SJ. (2006). Extracting Named Entities Using Support Vector Machines. In: Bremer, E.G., Hakenberg, J., Han, EH., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_8
DOI: https://doi.org/10.1007/11683568_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)