Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12681)

Abstract

Fine-grained entity typing aims to assign one or more types to each entity mention in a corpus. Recently, distant supervision has been used to generate training data, but it has two drawbacks. First, every mention of an entity receives the same labels regardless of context, which introduces label noise. Some approaches alleviate this issue with hand-crafted features, but these require expert effort. Second, entity mentions that are absent from the knowledge base (KB) are ignored and cannot be added to the training data, which reduces its size. Furthermore, existing entity typing systems neglect the types of other entity mentions in the same context, which provide evidence for inferring the types of the target mention. In this paper, we first propose graph-based and sampling-based approaches to reduce the label noise generated by distant supervision, and then augment the training data by finding potential entity mentions in the corpus and inferring their types. Moreover, we propose a hierarchical neural network that incorporates the types of other mentions in the context and satisfies type consistency to predict the types. Experiments on two datasets show that our system outperforms state-of-the-art entity typing systems.
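
The two drawbacks of distant supervision described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the knowledge base, the mentions, the sentences, and the function name `distant_label` are all invented for illustration. It only shows how a context-agnostic KB lookup gives every mention of an entity the same labels and drops mentions that are not in the KB.

```python
# Toy knowledge base: mention string -> every type recorded for that entity.
# All entries are hypothetical and exist only for this illustration.
TOY_KB = {
    "Jane Doe": {"person", "actor", "politician"},
    "Springfield": {"location", "city"},
}


def distant_label(pairs):
    """Label (sentence, mention) pairs by KB lookup alone.

    The sentence is never inspected, so the labels are context-agnostic.
    Mentions missing from the KB are returned separately as skipped.
    """
    labeled, skipped = [], []
    for sentence, mention in pairs:
        types = TOY_KB.get(mention)
        if types is None:
            skipped.append((sentence, mention))  # out-of-KB: lost to training
        else:
            labeled.append((sentence, mention, sorted(types)))
    return labeled, skipped


corpus = [
    ("Jane Doe starred in the film.", "Jane Doe"),
    ("Jane Doe signed the bill in Springfield.", "Jane Doe"),
    ("Zorblax Labs released a new model.", "Zorblax Labs"),
]

labeled, skipped = distant_label(corpus)
for sentence, mention, types in labeled:
    print(mention, types, "<-", sentence)
print("skipped:", skipped)
```

In the first sentence only "actor" is supported by the context and in the second only "politician", yet both mentions receive all three labels. The third mention is skipped entirely, which corresponds to the loss of training data that the abstract describes.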

Notes

  1. https://www.ncbi.nlm.nih.gov/research/bionlp/Data/

Acknowledgment

This work is partially supported by the Hong Kong RGC GRF Project 16202218, CRF Project C6030-18G, C1031-18G, C5026-18G, AOE Project AoE/E-603/18, China NSFC No. 61729201, Guangdong Basic and Applied Basic Research Foundation 2019B151530001, Hong Kong ITC ITF grants ITS/044/18FX and ITS/470/18FX, Microsoft Research Asia Collaborative Research Grant, Didi-HKUST joint research lab project, and Wechat and Webank Research Grants.

Author information

Corresponding author

Correspondence to Haoyang Li.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, H., Lin, X., Chen, L. (2021). Fine-Grained Entity Typing via Label Noise Reduction and Data Augmentation. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_24

  • DOI: https://doi.org/10.1007/978-3-030-73194-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73193-9

  • Online ISBN: 978-3-030-73194-6

  • eBook Packages: Computer Science, Computer Science (R0)
