Extracting Named Entities Using Support Vector Machines

  • Conference paper
Knowledge Discovery in Life Science Literature (KDLL 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Abstract

Identifying proper names such as genes, DNA, and proteins helps researchers mine information from text. Learning to extract proper names from natural language text is a named entity recognition (NER) task. Previous studies have focused on combining large numbers of hand-crafted rules and trigger words to improve system performance. However, these methods require domain experts to build the rules and word sets, which takes considerable human effort. In this paper, we present a robust named entity recognition system based on support vector machines (SVM). By integrating a rich feature set with the proposed mask method, the system performs well on the MUC-7 and biology named entity recognition tasks and outperforms well-known machine learning-based methods such as the hidden Markov model (HMM) and the maximum entropy model (MEM). We compare our method with previous systems evaluated on the same data sets. The experiments show that our system achieves an F(β=1) score of 86.4 when trained on the MUC-7 data set and 81.57 on the biology corpus. In addition, our system can handle real-time processing applications: the turnaround time on a 63K-word document set is less than 30 seconds.
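
For readers unfamiliar with the F(β=1) score quoted in the abstract, the following is a minimal sketch of how it is conventionally computed from precision and recall. The function names and the entity counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation script may differ in detail.

```python
def precision_recall(correct, predicted, gold):
    """Return (precision, recall) from entity counts.

    correct   -- predicted entities that exactly match a gold entity
    predicted -- total entities output by the system
    gold      -- total entities in the reference annotation
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    With beta = 1 both are weighted equally, which is the
    F(beta=1) score reported in NER evaluations.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0
    return (1.0 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    p, r = precision_recall(correct=85, predicted=100, gold=95)
    print(f"precision={p:.4f} recall={r:.4f} F1={100 * f_beta(p, r):.2f}")

    # Throughput implied by the abstract: 63,000 words in under
    # 30 seconds is a lower bound of 63000 / 30 words per second.
    print(f"words per second (lower bound): {63_000 / 30:.0f}")
```

Scores such as 86.4 and 81.57 are this quantity expressed as a percentage. The throughput line only restates the abstract's figures: 63,000 words in under 30 seconds corresponds to more than 2,100 words per second.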



© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, YC., Fan, TK., Lee, YS., Yen, SJ. (2006). Extracting Named Entities Using Support Vector Machines. In: Bremer, E.G., Hakenberg, J., Han, EH., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_8

  • DOI: https://doi.org/10.1007/11683568_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32809-4

  • Online ISBN: 978-3-540-32810-0

  • eBook Packages: Computer Science, Computer Science (R0)
