A Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Abstract

Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or rely on lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.’s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for “DNA substructure” to 80.7% for “Bioentity”.
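The abstract names Sahlgren et al.’s permutation-based Random Indexing as the source of its features. As a rough illustration of that idea only (not the authors’ system), the Python sketch below gives each word a sparse random ternary index vector and builds each word’s context vector by summing the index vectors of its neighbours, each permuted according to its position relative to the target word, so that a neighbour one slot to the right contributes differently from the same neighbour one slot to the left. The permutation is assumed here to be a cyclic rotation by the signed offset; the dimensionality, number of non-zero entries, window size, the helper names (`make_index_vector`, `rotate`, `train_permuted_ri`, `cosine`) and the toy sentences are all hypothetical choices for illustration. How the paper turned such vectors into features for tagging GENIA entities is not shown.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random slots set to +1 or -1, the rest 0."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i % 2 == 0 else -1  # half +1, half -1
    return vec


def rotate(vec, shift):
    """Cyclic rotation used as the (assumed) permutation: element i moves to (i + shift) mod dim."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def train_permuted_ri(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """For every word, sum the index vectors of its neighbours within `window`,
    each rotated by its signed offset so that word order is encoded."""
    rng = random.Random(seed)
    index = {}

    def index_vector(word):
        # Index vectors are assigned lazily, the first time a word is seen.
        if word not in index:
            index[word] = make_index_vector(dim, nonzeros, rng)
        return index[word]

    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = context[word]
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(tokens):
                    continue  # skip the word itself and positions outside the sentence
                for k, v in enumerate(rotate(index_vector(tokens[j]), offset)):
                    ctx[k] += v
    return index, dict(context)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    toy = [
        ["IL-2", "gene", "expression"],
        ["IL-4", "gene", "expression"],
        ["the", "patient", "recovered"],
    ]
    _, ctx = train_permuted_ri(toy)
    print(cosine(ctx["IL-2"], ctx["IL-4"]))  # identical ordered contexts
    print(cosine(ctx["IL-2"], ctx["the"]))   # unrelated contexts
```

Because the two protein names in the toy data appear in exactly the same ordered context, their context vectors are identical and their cosine is 1 (up to rounding); the third word’s context is unrelated, so its cosine with them is typically close to 0. Vectors are accumulated incrementally at a fixed dimensionality, so no full word-by-context matrix has to be built or factored, which is one reason Random Indexing is generally regarded as scalable.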

References

  1. Byrne, K.: Nested Named Entity Recognition in Historical Archive Text. In: Proceedings of International Conference on Semantic Computing (2007)

  2. Cohen, T., Widdows, D.: Empirical distributional semantics: methods and biomedical applications. Journal of Biomedical Informatics 42 (2009)

  3. Clark, A.: Inducing Syntactic Categories by Context Distribution Clustering. In: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning (2000)

  4. Trefethen, L.N., Bau, D.: Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia (1997)

  5. Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput. 10 (1998)

  6. Eddy, S.R.: Hidden Markov Models. Curr. Opin. Struct. Biol. 6 (1996)

  7. Finkel, J.R., Manning, C.D.: Joint Parsing and Named Entity Recognition. In: Proceedings of NAACL HLT (2009)

  8. Finkel, J.R., Manning, C.D.: Nested Named Entity Recognition. In: Proceedings of EMNLP (2009)

  9. Fox, C.: A Stop List for General Text. ACM SIGIR Forum 24 (199)

  10. Gu, B.: Recognizing Nested Named Entities in GENIA Corpus. In: Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis (2006)

  11. Harris, Z.S.: The structure of science information. Journal of Biomedical Informatics 35 (2002)

  12. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics 26 (1984)

  13. Jones, M.N., Mewhort, D.J.K.: Representing Word Meaning and Order Information in a Composite Holographic Lexicon. Psychol. Rev. 114 (2007)

  14. Kanerva, P., Kristofersson, J., Holst, A.: Random Indexing of Text Samples for Latent Semantic Analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society (2000)

  15. Kim, J.D., Ohta, T., Tateisi, Y., et al.: GENIA Corpus - a Semantically Annotated Corpus for Bio-Textmining. Bioinformatics 19 (2003)

  16. Kim, J.D., Ohta, T., Tsujii, J.: Corpus Annotation for Mining Biomedical Events from Literature. BMC Bioinformatics 9 (2008)

  17. Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of ICML (2001)

  18. Landauer, T.K., Dumais, S.T.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychol. Rev. 104, 211–240 (1997)

  19. Leaman, R., Gonzalez, G.: BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition. In: Proceedings of PSB (2008)

  20. Lund, K., Burgess, C.: Hyperspace Analog to Language (HAL): A General Model of Semantic Representation. Language and Cognitive Processes (1996)

  21. Màrquez, L., Villarejo, L., Martí, M.A., et al.: Semeval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish. In: Proceedings of the 4th International Workshop on Semantic Evaluations (2007)

  22. McCallum, A., Li, W.: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. In: Proceedings of CoNLL (2003)

  23. McDonald, R., Pereira, F.: Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics (2005)

  24. Rau, L.: Extracting Company Names from Text. In: Proceedings of IEEE Conference on Artificial Intelligence Applications (1991)

  25. Sahlgren, M., Holst, A., Kanerva, P.: Permutations as a Means to Encode Order in Word Space. In: Proceedings of CogSci (2008)

  26. Sahlgren, M.: The Word-Space Model. Doctoral Dissertation in Computational Linguistics, Stockholm University (2006)

  27. Saussure, F., Bally, C., Séchehaye, A., et al.: Cours de linguistique générale. Payot, Paris (1922)

  28. Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24, 97–123 (1998)

  29. Settles, B.: ABNER: An Open Source Tool for Automatically Tagging Genes, Proteins and Other Entity Names in Text. Bioinformatics 21 (2005)

  30. Shen, D., Zhang, J., Zhou, G., et al.: Effective Adaptation of a Hidden Markov Model-Based Named Entity Recognizer for Biomedical Domain. In: Proceedings of ACL (2003)

  31. Song, Y., Kim, E., Lee, G.G., et al.: POSBIOTM-NER in the Shared Task of BioNLP/NLPBA 2004. In: Proceedings of IJNLPBA (2004)

  32. Tsai, R.T., Wu, S.H., Chou, W.C., et al.: Various Criteria in the Evaluation of Biomedical Named Entity Recognition. BMC Bioinformatics 7 (2006)

  33. Widdows, D., Ferraro, K.: Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application. In: Proceedings of LREC (2008)

  34. Widdows, D., Cohen, T.: Semantic Vector Combinations and the Synoptic Gospels. In: Proceedings of the Third Quantum Interaction Symposium (2009)

  35. Zhou, G., Zhang, J., Su, J., et al.: Recognizing Names in Biomedical Texts: A Machine Learning Approach. Bioinformatics 20 (2004)

  36. Zhou, G.D.: Recognizing Names in Biomedical Texts using Mutual Information Independence Model and SVM Plus Sigmoid. Int. J. Med. Inf. 75 (2006)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jonnalagadda, S., Leaman, R., Cohen, T., Gonzalez, G. (2010). A Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_19

  • DOI: https://doi.org/10.1007/978-3-642-12116-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12115-9

  • Online ISBN: 978-3-642-12116-6

  • eBook Packages: Computer Science, Computer Science (R0)