Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation

Li, Haoyang; Lin, Xueling; Chen, Lei

doi:10.1007/978-3-030-73194-6_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12681)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Fine-grained entity typing aims to assign one or more types to each entity mention in a corpus. Recently, distant supervision has been used to generate training data, but it has two drawbacks. First, the same labels are assigned to an entity mention regardless of its context, which introduces label noise. Some approaches alleviate this issue with hand-crafted features, but these require expert effort. Second, entity mentions that are not in the Knowledge Base (KB) are ignored and cannot be added to the training data, which reduces its size. Furthermore, existing entity typing systems neglect the types of other entity mentions in the same context, even though these provide evidence for inferring the types of the target mention. In this paper, we first propose graph-based and sampling-based approaches to reduce the label noise generated by distant supervision, and then augment the training data by finding potential entity mentions in the corpus and inferring their types. Moreover, we propose a hierarchical neural network that takes into account the types of other mentions in the context and satisfies type consistency when predicting the types. Experiments on two datasets show that our system outperforms state-of-the-art entity typing systems.
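The two drawbacks of distant supervision named in the abstract can be illustrated with a small sketch. It is not the authors' system: the toy knowledge base, the type names, the example sentences, and the helper functions `distant_labels` and `build_training_data` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy knowledge base (hypothetical): entity name -> every type recorded for it.
TOY_KB: Dict[str, Set[str]] = {
    "Arnold Schwarzenegger": {"person", "person/actor", "person/politician"},
    "California": {"location", "location/state"},
}


def distant_labels(mention: str, kb: Dict[str, Set[str]]) -> Optional[Set[str]]:
    """Return all KB types for a mention, ignoring its context; None if not in the KB."""
    types = kb.get(mention)
    return set(types) if types is not None else None


def build_training_data(
    examples: List[Tuple[str, str]], kb: Dict[str, Set[str]]
) -> Tuple[List[Tuple[str, str, Set[str]]], List[Tuple[str, str]]]:
    """Label (sentence, mention) pairs by KB lookup; collect out-of-KB mentions separately."""
    labeled, skipped = [], []
    for sentence, mention in examples:
        labels = distant_labels(mention, kb)
        if labels is None:
            # Drawback 2: mentions outside the KB never reach the training data.
            skipped.append((sentence, mention))
        else:
            # Drawback 1: the sentence is never consulted, so labels are context-agnostic.
            labeled.append((sentence, mention, labels))
    return labeled, skipped


examples = [
    ("Arnold Schwarzenegger starred in The Terminator.", "Arnold Schwarzenegger"),
    ("Arnold Schwarzenegger signed the bill in California.", "Arnold Schwarzenegger"),
    ("Zorblax Corp announced a new product.", "Zorblax Corp"),
]
labeled, skipped = build_training_data(examples, TOY_KB)
```

In this toy run both sentences about the same entity receive the identical label set, so the film sentence is also tagged `person/politician` (label noise), and the out-of-KB mention ends up in `skipped` rather than in the training data.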

Notes

1. https://www.ncbi.nlm.nih.gov/research/bionlp/Data/

Acknowledgment

This work is partially supported by the Hong Kong RGC GRF Project 16202218, CRF Projects C6030-18G, C1031-18G, and C5026-18G, AOE Project AoE/E-603/18, China NSFC No. 61729201, Guangdong Basic and Applied Basic Research Foundation 2019B151530001, Hong Kong ITC ITF grants ITS/044/18FX and ITS/470/18FX, Microsoft Research Asia Collaborative Research Grant, Didi-HKUST joint research lab project, and WeChat and WeBank Research Grants.

Author information

Authors and Affiliations

The Hong Kong University of Science and Technology, Hong Kong, China
Haoyang Li, Xueling Lin & Lei Chen

Corresponding author

Correspondence to Haoyang Li.

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Christian S. Jensen
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Academia Sinica, Taipei, Taiwan
De-Nian Yang
The Pennsylvania State University, University Park, PA, USA
Wang-Chien Lee
National Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Athens University of Economics and Business, Athens, Greece
Vana Kalogeraki
National Cheng Kung University, Tainan City, Taiwan
Jen-Wei Huang
National Tsing Hua University, Hsinchu, Taiwan
Chih-Ya Shen

Cite this paper

Li, H., Lin, X., Chen, L. (2021). Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_24

DOI: https://doi.org/10.1007/978-3-030-73194-6_24
Published: 06 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73193-9
Online ISBN: 978-3-030-73194-6
eBook Packages: Computer Science, Computer Science (R0)