Abstract
We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer that models second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. Experimental results on CoNLL04, the benchmark dataset for "relation and entity recognition", show that our model outperforms previous models, establishing a new state of the art.
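For intuition, a (deep) biaffine attention layer in the style of Dozat and Manning scores a head/tail pair of latent representations with a bilinear term plus a linear term. The following is a minimal NumPy sketch under stated assumptions: names and shapes are illustrative, not the authors' DyNet implementation, and a real relation classifier would hold one set of parameters per relation label.

```python
import numpy as np

def biaffine_score(h_head, h_tail, U, W, b):
    """Deep biaffine score: h_head^T U h_tail + W [h_head; h_tail] + b.

    The bilinear term (U) captures second-order interactions between the
    head and tail features; the linear term (W) captures each argument's
    individual contribution, letting the model attend to the role an
    entity plays in a directional relationship.
    """
    bilinear = h_head @ U @ h_tail
    linear = W @ np.concatenate([h_head, h_tail])
    return bilinear + linear + b

rng = np.random.default_rng(0)
d = 100  # illustrative size of the FFNN_head / FFNN_tail output layers
h_head, h_tail = rng.normal(size=d), rng.normal(size=d)
U, W = rng.normal(size=(d, d)), rng.normal(size=2 * d)
score = biaffine_score(h_head, h_tail, U, W, 0.0)  # one scalar per relation label
```

Stacking one `(U, W, b)` triple per relation label and taking a softmax over the resulting scores would turn this into a relation classifier.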
References
Adel, H., Schütze, H.: Global normalization of convolutional neural networks for joint entity and relation classification. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1723–1729 (2017)
Bach, N., Badaskar, S.: A review of relation extraction. Carnegie Mellon University, Technical Report (2007)
Ballesteros, M., Dyer, C., Smith, N.A.: Improved transition-based parsing by modeling characters instead of words with LSTMs. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 349–359 (2015)
Bekoulis, G., Deleu, J., Demeester, T., Develder, C.: Adversarial training for multi-context joint entity and relation extraction. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2830–2836 (2018)
Bekoulis, G., Deleu, J., Demeester, T., Develder, C.: Joint entity recognition and relation extraction as a multi-head selection problem. Expert Syst. Appl. 114, 34–45 (2018)
Blanco, R., Cambazoglu, B.B., Mika, P., Torzec, N.: Entity recommendations in web search. In: Alani, H., et al. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 33–48. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_3
Dozat, T., Manning, C.D.: Deep Biaffine attention for neural dependency parsing. In: Proceedings of the 5th International Conference on Learning Representations (2017)
Dozat, T., Qi, P., Manning, C.D.: Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 20–30 (2017)
Gupta, P., Schütze, H., Andrassy, B.: Table filling multi-task recurrent neural network for joint entity and relation extraction. In: Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2537–2547 (2016)
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv preprint arXiv:1508.01991 (2015)
Jiang, J.: Information extraction from text. In: Aggarwal, C.C., Zhai, C. (eds.) Mining Text Data, pp. 11–41. Springer, New York (2012). https://doi.org/10.1007/978-1-4614-3223-4
Kate, R.J., Mooney, R.J.: Joint entity and relation extraction using card-pyramid parsing. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 203–212 (2010)
Katiyar, A., Cardie, C.: Going out on a limb: joint extraction of entity mentions and relations without dependency trees. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 917–928 (2017)
Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014)
Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. Trans. ACL 4, 313–327 (2016)
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289 (2001)
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 260–270 (2016)
Li, Q., Ji, H.: Incremental joint extraction of entity mentions and relations. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 402–412 (2014)
Miwa, M., Bansal, M.: End-to-end relation extraction using LSTMs on sequences and tree structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1105–1116 (2016)
Miwa, M., Sasaki, Y.: Modeling joint entity and relation extraction with table representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1858–1869 (2014)
Neubig, G., et al.: DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980 (2017)
Nguyen, T.H., Grishman, R.: Combining neural networks and log-linear models to improve relation extraction. In: Proceedings of IJCAI Workshop on Deep Learning for Artificial Intelligence (2016)
Pawar, S., Bhattacharyya, P., Palshikar, G.: End-to-end relation extraction using neural networks and Markov logic networks. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 818–827 (2017)
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543 (2014)
Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 147–155 (2009)
Roth, D., Yih, W.T.: Global inference for entity and relation identification via a linear programming formulation. In: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
Roth, D., Yih, W.T.: A linear programming formulation for global inference in natural language tasks. In: Proceedings of the 8th Conference on Computational Natural Language Learning, pp. 1–8 (2004)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Thomas, P., Starlinger, J., Vowinkel, A., Arzt, S., Leser, U.: GeneView: a comprehensive semantic search engine for PubMed. Nucleic Acids Res. 40(W1), W585–W591 (2012)
Wang, S., Zhang, Y., Che, W., Liu, T.: Joint extraction of entities and relations based on a novel graph scheme. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 4461–4467 (2018)
Zhang, M., Zhang, Y., Fu, G.: End-to-end neural relation extraction with global optimization. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1730–1740 (2017)
Zheng, S., et al.: Joint entity and relation extraction based on a hybrid neural network. Neurocomputing 257, 59–66 (2017)
Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., Xu, B.: Joint extraction of entities and relations based on a novel tagging scheme. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1227–1236 (2017)
Acknowledgments
This work was supported by the ARC projects DP150101550 and LP160101469.
Appendix
Implementation Details: We apply dropout [28] with a 67% keep probability to the inputs of BiLSTMs and FFNNs. Following [15], we also use word dropout to learn an embedding for unknown words: we replace each word token w appearing \(\#(w)\) times in the training set with a special “unk” symbol with probability \(\mathsf {p}_{unk}(w) = \frac{0.25}{0.25 + \#(w)}\).
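The word dropout rule above can be sketched in a few lines; the function name and the `Counter`-based counts are illustrative:

```python
from collections import Counter

def unk_probability(count, alpha=0.25):
    # p_unk(w) = alpha / (alpha + #(w)): the rarer a word is in the
    # training set, the more likely its tokens are replaced by the
    # special "unk" symbol, so an embedding for unknown words is learned.
    return alpha / (alpha + count)

counts = Counter(["the", "the", "the", "biaffine"])
p_frequent = unk_probability(counts["the"])       # 0.25 / 3.25 ~ 0.077
p_rare = unk_probability(counts["biaffine"])      # 0.25 / 1.25 = 0.2
```

Frequent words are thus almost never dropped, while singletons are replaced a fifth of the time.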
Word embeddings are initialized with the 100-dimensional pre-trained GloVe word vectors [24], while character and NER label embeddings are initialized randomly. All these embeddings are then updated during training. For learning character-level word embeddings, we set the size of the LSTM hidden states in \(\mathrm {BiLSTM}_{\text {char}}\) equal to the size of character embeddings. We perform a minimal grid search of hyper-parameters for Setup 1, resulting in: an Adam initial learning rate of 0.0005, a character embedding size of 25, a NER label embedding size of 100, an output layer size of 100 for both \(\mathrm {FFNN}_{\text {head}}\) and \(\mathrm {FFNN}_{\text {tail}}\), 2 layers for each of \(\mathrm {BiLSTM}_{\text {NER}}\) and \(\mathrm {BiLSTM}_{\text {RC}}\), and LSTM hidden states of size 100 in each layer. These optimal hyper-parameters for Setup 1 are then reused for Setup 2, where we additionally use a boundary tag embedding size of 100.
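For reference, the hyper-parameters above can be collected into a single configuration. The key names below are illustrative, not taken from the authors' code; the values are those reported in the text:

```python
# Hyper-parameters reported for Setup 1 (key names are illustrative).
setup1 = {
    "optimizer": "Adam",
    "learning_rate": 0.0005,
    "char_embedding_size": 25,
    "ner_label_embedding_size": 100,
    "ffnn_output_size": 100,   # FFNN_head and FFNN_tail
    "bilstm_layers": 2,        # BiLSTM_NER and BiLSTM_RC
    "lstm_hidden_size": 100,
    "input_keep_prob": 0.67,   # dropout on BiLSTM and FFNN inputs
}

# Setup 2 reuses Setup 1 and adds a boundary tag embedding.
setup2 = {**setup1, "boundary_tag_embedding_size": 100}
```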
Metric: Following previous work, when computing the macro-averaged F1 scores, we omit the entity label “Other” and the negative relation “NEG”. For NER, an entity is predicted correctly only if both its boundaries and its type are correct, while for EC a multi-token entity is considered correct if at least one of its constituent tokens is predicted correctly. In all cases, a relation is scored as correct only if both the argument entities and the relation type are correct.
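The three scoring criteria above can be sketched as simple predicates. This is a sketch only: the tuple encoding of entities and relations, and the per-token label lists for EC, are assumptions for illustration:

```python
def ner_correct(pred, gold):
    # NER: exact match required on both boundaries and entity type.
    # pred/gold are assumed to be (start, end, entity_type) tuples.
    return pred == gold

def ec_correct(pred_token_labels, gold_token_labels):
    # EC: a multi-token entity counts as correct if at least one of its
    # constituent tokens carries the right entity type.
    return any(p == g for p, g in zip(pred_token_labels, gold_token_labels))

def relation_correct(pred, gold):
    # A relation is correct only if both argument entities and the
    # relation type match; pred/gold: (head_entity, tail_entity, type).
    return pred == gold
```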
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Nguyen, D.Q., Verspoor, K. (2019). End-to-End Neural Relation Extraction Using Deep Biaffine Attention. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science(), vol 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_47
Print ISBN: 978-3-030-15711-1
Online ISBN: 978-3-030-15712-8