Abstract
Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a huge amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or rely on lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.’s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for “DNA substructure” to 80.7% for “Bioentity”.
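The permutation-based Random Indexing idea referred to above (Sahlgren et al.) can be illustrated with a minimal sketch. It assumes sparse ternary index vectors, a cyclic shift (`np.roll`) as the permutation for each relative position, and a ±2 word window; the dimensionality, number of nonzero entries, window size and the helper names (`make_index_vector`, `train_permuted_ri`, `likely_successors`) are hypothetical choices for illustration, not the settings or code used in this paper.

```python
import numpy as np


def make_index_vector(rng, dim, nonzeros):
    """Sparse ternary random index vector: `nonzeros` entries are +1/-1, the rest 0."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return vec


def train_permuted_ri(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Build context vectors where each neighbour's index vector is permuted
    (here: cyclically shifted) by its position relative to the focus word."""
    rng = np.random.default_rng(seed)
    index, context = {}, {}
    for sent in sentences:
        for w in sent:
            if w not in index:
                index[w] = make_index_vector(rng, dim, nonzeros)
                context[w] = np.zeros(dim)
    for sent in sentences:
        for i, focus in enumerate(sent):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(sent):
                    continue
                # Assumption: the permutation for relative position `offset`
                # is a cyclic shift by `offset`, so preceding and following
                # neighbours contribute different vectors.
                context[focus] += np.roll(index[sent[j]], offset)
    return index, context


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


def likely_successors(word, index, context, top=3):
    """Rank words that tend to appear immediately after `word`.
    A word Y that follows `word` received np.roll(index[word], -1) in its
    context vector, so that shifted index vector is used as the probe."""
    probe = np.roll(index[word], -1)
    scores = {y: cosine(probe, v) for y, v in context.items() if y != word}
    return sorted(scores, key=scores.get, reverse=True)[:top]


if __name__ == "__main__":
    # Toy, made-up sentences for illustration only (not GENIA data).
    corpus = [
        ["the", "protein", "kinase", "binds", "dna"],
        ["a", "protein", "kinase", "activates", "il-2"],
    ]
    idx, ctx = train_permuted_ri(corpus)
    print(likely_successors("protein", idx, ctx))
```

Because each relative position applies a different permutation, the same neighbour leaves a different trace depending on whether it precedes or follows the focus word; this is how word-order information enters an otherwise bag-of-words context vector, while training remains a single pass of vector additions.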
References
Byrne, K.: Nested Named Entity Recognition in Historical Archive Text. In: Proceedings of International Conference on Semantic Computing (2007)
Cohen, T., Widdows, D.: Empirical distributional semantics: methods and biomedical applications. Journal of Biomedical Informatics 42 (2009)
Clark, A.: Inducing Syntactic Categories by Context Distribution Clustering. In: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning (2000)
Trefethen, L.N., Bau III, D.: Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia (1997)
Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput. 10 (1998)
Eddy, S.R.: Hidden Markov Models. Curr. Opin. Struct. Biol. 6 (1996)
Finkel, J.R., Manning, C.D.: Joint Parsing and Named Entity Recognition. In: Proceedings of NAACL HLT (2009)
Finkel, J.R., Manning, C.D.: Nested Named Entity Recognition. In: EMNLP (2009)
Fox, C.: A Stop List for General Text. ACM SIGIR Forum 24 (199)
Gu, B.: Recognizing Nested Named Entities in GENIA Corpus. In: Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis (2006)
Harris, Z.S.: The structure of science information. Journal of Biomedical Informatics 35 (2002)
Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics 26 (1984)
Jones, M.N., Mewhort, D.J.K.: Representing Word Meaning and Order Information in a Composite Holographic Lexicon. Psychol. Rev. 114 (2007)
Kanerva, P., Kristofersson, J., Holst, A.: Random Indexing of Text Samples for Latent Semantic Analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society (2000)
Kim, J.D., Ohta, T., Tateisi, Y., et al.: GENIA Corpus: A Semantically Annotated Corpus for Bio-Textmining. Bioinformatics 19 (2003)
Kim, J.D., Ohta, T., Tsujii, J.: Corpus Annotation for Mining Biomedical Events from Literature. BMC Bioinformatics 9 (2008)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of ICML (2001)
Landauer, T.K., Dumais, S.T.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychol. Rev. 104, 211–240 (1997)
Leaman, R., Gonzalez, G.: BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition. In: Proceedings of PSB (2008)
Lund, K., Burgess, C.: Hyperspace Analog to Language (HAL): A General Model of Semantic Representation. Language and Cognitive Processes (1996)
Màrquez, L., Villarejo, L., Martí, M.A., et al.: SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish. In: Proceedings of the 4th International Workshop on Semantic Evaluations (2007)
McCallum, A., Li, W.: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. In: Proceedings of CoNLL (2003)
McDonald, R., Pereira, F.: Identifying Gene and Protein Mentions in Text Using Conditional Random Fields. BMC Bioinformatics (2005)
Rau, L.: Extracting Company Names from Text. In: Proceedings of IEEE Conference on Artificial Intelligence Applications (1991)
Sahlgren, M., Holst, A., Kanerva, P.: Permutations as a Means to Encode Order in Word Space. In: Proceedings of CogSci (2008)
Sahlgren, M.: The Word-Space Model. Doctoral Dissertation in Computational Linguistics. Stockholm University (2006)
Saussure, F., Bally, C., Séchehaye, A., et al.: Cours de linguistique générale. Payot, Paris (1922)
Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24, 97–123 (1998)
Settles, B.: ABNER: An Open Source Tool for Automatically Tagging Genes, Proteins and Other Entity Names in Text. Bioinformatics 21 (2005)
Shen, D., Zhang, J., Zhou, G., et al.: Effective Adaptation of a Hidden Markov Model-Based Named Entity Recognizer for Biomedical Domain. In: Proceedings of ACL (2003)
Song, Y., Kim, E., Lee, G.G., et al.: POSBIOTM-NER in the Shared Task of BioNLP/NLPBA 2004. In: Proceedings of IJNLPBA (2004)
Tsai, R.T., Wu, S.H., Chou, W.C., et al.: Various Criteria in the Evaluation of Biomedical Named Entity Recognition. BMC Bioinformatics 7 (2006)
Widdows, D., Ferraro, K.: Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application. In: Proceedings of LREC (2008)
Widdows, D., Cohen, T.: Semantic Vector Combinations and the Synoptic Gospels. In: Proceedings of the Third Quantum Interaction Symposium (2009)
Zhou, G., Zhang, J., Su, J., et al.: Recognizing Names in Biomedical Texts: A Machine Learning Approach. Bioinformatics 20 (2004)
Zhou, G.D.: Recognizing Names in Biomedical Texts using Mutual Information Independence Model and SVM Plus Sigmoid. Int. J. Med. Inf. 75 (2006)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jonnalagadda, S., Leaman, R., Cohen, T., Gonzalez, G. (2010). A Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_19
DOI: https://doi.org/10.1007/978-3-642-12116-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)