Evaluation of a New Representation for Noise Reduction in Distant Supervision

García-Mendoza, Juan-Luis; Villaseñor-Pineda, Luis; Buscaldi, Davide; Bustio-Martínez, Lázaro; Orihuela-Espina, Felipe

doi:10.1007/978-3-031-19496-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13613)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Distant Supervision is a relation extraction approach that allows automatic labeling of a dataset. However, this labeling introduces noise in the labels (e.g., when two entities in a sentence are automatically labeled with an invalid relation). Label noise makes relation extraction more difficult and is one of the main challenges of the task. Until now, methods that incorporate a prior noise reduction step have not evaluated the performance of that step. This paper evaluates noise reduction using a new representation obtained with autoencoders. In addition, more information was incorporated into the input of the autoencoder proposed in the state of the art, to improve the representation over which the noise is reduced. Three methods were also proposed to select the instances considered real. As a result, the highest values of the area under the ROC curve were obtained using the improved input combined with state-of-the-art anomaly detection methods. Moreover, the three proposed selection methods significantly improve on the existing method in the literature.
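As an illustration of the kind of evaluation the abstract describes, the following minimal sketch scores instances with an anomaly detector and measures how well those scores separate noisy from real instances using the area under the ROC curve. Everything in it is an assumption made for illustration: the data are synthetic vectors rather than autoencoder representations, the distance-to-centroid detector is a simple stand-in for the anomaly detection methods used in the paper, and the function names (`roc_auc`, `centroid_distance_scores`, `make_toy_data`) are hypothetical.

```python
import math
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for noisy (anomalous) instances, 0 for real ones.
    scores: higher means "more likely noisy". Ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid_distance_scores(vectors):
    """Stand-in anomaly detector (assumption): distance to the mean vector."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [math.dist(v, centroid) for v in vectors]


def make_toy_data(n_real=200, n_noisy=20, dim=8, seed=0):
    """Synthetic 'representations' (assumption): real instances cluster near
    the origin, noisy ones come from a shifted, wider distribution."""
    rng = random.Random(seed)
    real = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_real)]
    noisy = [[rng.gauss(4.0, 2.0) for _ in range(dim)] for _ in range(n_noisy)]
    return real + noisy, [0] * n_real + [1] * n_noisy


vectors, labels = make_toy_data()
scores = centroid_distance_scores(vectors)
auc = roc_auc(labels, scores)
print(f"ROC AUC of the toy detector: {auc:.3f}")
```

An AUC near 1 means the detector ranks noisy instances above real ones almost everywhere, while 0.5 corresponds to random ranking. Evaluating the noise reduction step this way requires instance-level labels that mark which automatically assigned relations are actually invalid.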

Notes

1. https://developers.google.com/freebase/
2. Available at http://iesl.cs.umass.edu/riedel/ecml/

Acknowledgements

The present work was supported by CONACyT/México (scholarship 937210 and grant CB-2015-01-257383) and Labex EFL through EFL mobility grants. Additionally, the authors thank CONACYT for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies.

Author information

Authors and Affiliations

Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico
Juan-Luis García-Mendoza, Luis Villaseñor-Pineda & Felipe Orihuela-Espina
Université Sorbonne Paris Nord, LIPN, Villetaneuse, France
Davide Buscaldi
Universidad Iberoamericana, DEII, CDMX, Mexico
Lázaro Bustio-Martínez
University of Birmingham, Birmingham, UK
Felipe Orihuela-Espina

Corresponding author

Correspondence to Juan-Luis García-Mendoza.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico, Mexico
Obdulia Pichardo Lagunas
Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, Mexico
Juan Martínez-Miranda
Instituto Politécnico Nacional, Mexico, Mexico
Bella Martínez Seis

About this paper

Cite this paper

García-Mendoza, JL., Villaseñor-Pineda, L., Buscaldi, D., Bustio-Martínez, L., Orihuela-Espina, F. (2022). Evaluation of a New Representation for Noise Reduction in Distant Supervision. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_8

DOI: https://doi.org/10.1007/978-3-031-19496-2_8
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19495-5
Online ISBN: 978-3-031-19496-2
eBook Packages: Computer Science, Computer Science (R0)
