Is a Common Phrase an Entity Mention or Not? Dual Representations for Domain-Specific Named Entity Recognition

Zhang, Jiangtao; Li, Juanzi; Li, Xiao-Li; Cao, Yixin; Hou, Lei; Wang, Shuai

doi:10.1007/978-3-319-91452-7_53

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Named Entity Recognition (NER) for specific domains is critical for building and managing domain-specific knowledge bases, but conventional NER methods cannot be applied to specific domains effectively. We found that one of the reasons is the problem of common-phrase-like entity mentions, which are prevalent in many domains. That is, many common phrases that occur frequently in general corpora may or may not be named entities in a specific domain. Determining whether a common phrase is an entity mention is therefore a challenge. To address this issue, we present a novel BLSTM-based NER model tailored for specific domains that learns dual representations for each word. It learns not only general-domain knowledge derived from an external large-scale general corpus via a word embedding model, but also domain-specific knowledge by training a stacked deep neural network (SDNN) that integrates the results of a low-cost pre-entity-linking process. Extensive experiments on a real-world dataset of movie comments demonstrate the superiority of our model over existing state-of-the-art methods.
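The idea of a dual representation can be illustrated with a toy sketch. It is not the model from the paper: the SDNN and the BLSTM tagger are omitted, the learned domain-specific representation is replaced by a single dictionary-match flag, and combining the two parts by concatenation is an assumption. The embedding table, the title list, and the function names `pre_link_flags` and `dual_representation` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical general-domain embeddings (stand-in for vectors learned from
# a large general corpus by a word embedding model).
GENERAL_EMB: Dict[str, List[float]] = {
    "i": [0.1, 0.0],
    "watched": [0.3, 0.2],
    "friends": [0.5, 0.4],
    "with": [0.0, 0.1],
    "my": [0.2, 0.2],
}
UNK: List[float] = [0.0, 0.0]  # fallback vector for unknown words

# Hypothetical domain name list (lower-cased, tokenized titles) used by a
# cheap dictionary-match stand-in for the pre-entity-linking step.
DOMAIN_TITLES: Set[Tuple[str, ...]] = {("friends",), ("the", "matrix")}


def pre_link_flags(tokens: List[str], titles: Set[Tuple[str, ...]],
                   max_len: int = 3) -> List[float]:
    """Return 1.0 for every token covered by a title match, else 0.0."""
    lowered = [t.lower() for t in tokens]
    flags = [0.0] * len(tokens)
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            if tuple(lowered[start:end]) in titles:
                for i in range(start, end):
                    flags[i] = 1.0
    return flags


def dual_representation(tokens: List[str]) -> List[List[float]]:
    """Concatenate the general embedding with the domain flag (assumption)."""
    flags = pre_link_flags(tokens, DOMAIN_TITLES)
    return [GENERAL_EMB.get(t.lower(), UNK) + [f]
            for t, f in zip(tokens, flags)]


tokens = "I watched Friends with my friends".split()
flags = pre_link_flags(tokens, DOMAIN_TITLES)
reps = dual_representation(tokens)
```

In this toy sentence the dictionary match flags both occurrences of "friends", although only the first is plausibly a title. That is the common-phrase ambiguity the abstract describes: the match alone cannot settle it, so a sequence model would need context from both parts of the representation.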

Notes

1. https://www.netflix.com/jp-en/title/70178217.
2. https://catalog.ldc.upenn.edu/ldc2011t07.
3. http://www.imdb.com.
4. https://baike.baidu.com/.
5. https://keras.io.
6. https://github.com/naxier/MovieEL.
7. Using Embedding Layer in Keras.
8. https://taku910.github.io/crfpp/.
9. http://nlp.cs.rpi.edu/kbp/.

Acknowledgments

The work is supported by major national research and development projects (2017YFB1002101), NSFC key project (U1736204, 61661146007), Fund of Online Education Research Center, Ministry of Education (No. 2016ZD102), and THU-NUS NExT Co-Lab.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Jiangtao Zhang, Juanzi Li, Yixin Cao, Lei Hou & Shuai Wang
Institute for Infocomm Research, A*STAR, Singapore, 138632, Singapore
Xiao-Li Li

Corresponding author

Correspondence to Jiangtao Zhang.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

Cite this paper

Zhang, J., Li, J., Li, X.-L., Cao, Y., Hou, L., Wang, S. (2018). Is a Common Phrase an Entity Mention or Not? Dual Representations for Domain-Specific Named Entity Recognition. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_53

DOI: https://doi.org/10.1007/978-3-319-91452-7_53
Published: 13 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)
