Abstract
Fine-grained entity typing aims to assign one or more types to each entity mention in a corpus. Recent work uses distant supervision to generate training data, but this has two drawbacks. First, the same labels are assigned to every mention of an entity regardless of context, which introduces label noise. Some approaches alleviate this issue with hand-crafted features, but these require expert effort. Second, entity mentions that are absent from the knowledge base (KB) are ignored and cannot be added to the training data, which reduces its size. Furthermore, existing entity typing systems neglect the types of other entity mentions in the same context, which provide evidence for inferring the types of the target mention. In this paper, we first propose graph-based and sampling-based approaches to reduce the label noise generated by distant supervision, and then augment the training data by finding potential entity mentions in the corpus and inferring their types. Moreover, we propose a hierarchical neural network that incorporates the types of other mentions in the context and enforces type consistency when predicting types. Experiments on two datasets show that our system outperforms state-of-the-art entity typing systems.
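The two drawbacks of distant supervision described above can be illustrated with a minimal sketch. This is not the paper's method or data: the toy knowledge base, the entity names, the type labels, and the function `distant_labels` are all hypothetical and chosen only for illustration. The sketch shows that a context-agnostic lookup gives every mention of an entity the same label set, and that a mention missing from the KB yields no training example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base: entity name -> every type known for it.
TOY_KB: Dict[str, Set[str]] = {
    "Arnold Schwarzenegger": {"person", "person/actor", "person/politician"},
    "California": {"location", "location/state"},
}


def distant_labels(
    mentions: List[Tuple[str, str]],
) -> List[Tuple[str, str, Set[str]]]:
    """Label each (mention, sentence) pair with all KB types of the mention.

    The sentence is never inspected, so the labels are context-agnostic.
    Mentions that are not in the KB are skipped entirely.
    """
    labeled = []
    for mention, sentence in mentions:
        types = TOY_KB.get(mention)
        if types is None:
            continue  # out-of-KB mention: contributes no training example
        labeled.append((mention, sentence, set(types)))
    return labeled


examples = [
    ("Arnold Schwarzenegger", "Arnold Schwarzenegger starred in The Terminator."),
    ("Arnold Schwarzenegger", "Arnold Schwarzenegger signed the bill into law."),
    ("Jane Doe", "Jane Doe joined the lab last year."),
]

labeled = distant_labels(examples)
for mention, sentence, types in labeled:
    print(mention, "|", sentence, "|", sorted(types))
```

In this toy run, both sentences about the same entity receive the actor and politician labels even though each context supports only one of them, and the third mention is dropped because it has no KB entry.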
Acknowledgment
This work is partially supported by the Hong Kong RGC GRF Project 16202218, CRF Project C6030-18G, C1031-18G, C5026-18G, AOE Project AoE/E-603/18, China NSFC No. 61729201, Guangdong Basic and Applied Basic Research Foundation 2019B151530001, Hong Kong ITC ITF grants ITS/044/18FX and ITS/470/18FX, Microsoft Research Asia Collaborative Research Grant, Didi-HKUST joint research lab project, and Wechat and Webank Research Grants.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, H., Lin, X., Chen, L. (2021). Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_24
DOI: https://doi.org/10.1007/978-3-030-73194-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73193-9
Online ISBN: 978-3-030-73194-6
eBook Packages: Computer Science, Computer Science (R0)