Sequential Short-Text Classification from Multiple Textual Representations with Weak Supervision

Filho, Ivan J. Reis; Martins, Luiz H. D.; Parmezan, Antonio R. S.; Marcacini, Ricardo M.; Rezende, Solange O.

doi:10.1007/978-3-031-21686-2_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13653)

Included in the following conference series:

Brazilian Conference on Intelligent Systems

Abstract

The amount of news generated on the internet has increased significantly in recent years. As a trend, text data has gained attention from industry, government, academia, and the financial market. This information is potentially valuable to assist domain experts in decision making. Therefore, related applications based on machine learning have become widely available in several areas of knowledge. However, for supervised learning tasks, the availability of annotated texts in sufficient quantity and quality is a recurring problem. This work proposes a time-series-driven approach to labeling chronologically arranged documents. Our proposal categorizes short texts for a particular domain according to the level and trend patterns of a given time series. We use the obtained weak labels with the understanding that they are imperfect but still useful for building predictive text models. Documents and agribusiness commodity price series were employed to assess performance in four classification scenarios. The experimental evaluation considered nine textual representations and different learning paradigms. Neural language-based models demonstrated better classification performance than traditional ones. The results indicate that the proposed approach can be an alternative for automatically labeling large volumes of news.
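To make the labeling idea in the abstract concrete, the following is a minimal sketch of assigning weak labels to dated short texts from a price series. It is not the authors' procedure, which the abstract does not specify. The function name `weak_labels`, the `window` parameter, the "high"/"low" and "up"/"down" label names, the rule for each label, and the toy headlines and prices are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date
from statistics import mean


def weak_labels(docs, prices, window=3):
    """Attach (level, trend) weak labels to dated short texts.

    docs   : list of (date, text) pairs
    prices : list of (date, price) pairs, sorted by date
    window : number of past observations in the level baseline (assumed)
    """
    days = [d for d, _ in prices]
    values = [p for _, p in prices]
    labeled = []
    for day, text in docs:
        # Index of the last price observed on or before the document date.
        i = bisect_right(days, day) - 1
        if i < 1 or i + 1 >= len(values):
            continue  # no earlier price for a baseline, or no next price
        # Assumed level rule: compare the current price with the mean of
        # up to `window` preceding observations.
        baseline = mean(values[max(0, i - window):i])
        level = "high" if values[i] >= baseline else "low"
        # Assumed trend rule: direction of the next observation; a price
        # that does not rise is labeled "down".
        trend = "up" if values[i + 1] > values[i] else "down"
        labeled.append((text, level, trend))
    return labeled


# Toy data, invented for this example only.
prices = [
    (date(2022, 1, 3), 100.0),
    (date(2022, 1, 4), 102.0),
    (date(2022, 1, 5), 101.0),
    (date(2022, 1, 6), 105.0),
    (date(2022, 1, 7), 104.0),
]
docs = [
    (date(2022, 1, 4), "Soybean exports rise"),
    (date(2022, 1, 5), "Rain delays harvest"),
    (date(2022, 1, 7), "Corn demand steady"),
]
labels = weak_labels(docs, prices)
```

Labels produced this way are noisy by construction, since a headline need not relate to the price movement around its date. This matches the abstract's framing of weak labels as imperfect but still useful for training.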

Notes

1. https://www.noticiasagricolas.com.br/

Acknowledgements

This work was carried out at the Center for Artificial Intelligence (C4AI-USP) and partially supported by the São Paulo Research Foundation (FAPESP) (grant #2019/07665-4) and the IBM Corporation. The authors of this paper thank FAPESP (process 2019/25010-5) and the National Center for Scientific and Technological Development (CNPq) (process 309575/2021-4). The corresponding author thanks the Minas Gerais State Research Support Foundation (FAPEMIG) (process PCRH BPG-00054-210).

Author information

Authors and Affiliations

Minas Gerais State University, Frutal, Brazil
Ivan J. Reis Filho & Luiz H. D. Martins
University of São Paulo, São Carlos, Brazil
Ivan J. Reis Filho, Antonio R. S. Parmezan, Ricardo M. Marcacini & Solange O. Rezende

Corresponding author

Correspondence to Ivan J. Reis Filho.

Editor information

Editors and Affiliations

Federal University of Rio Grande do Norte, Natal, Brazil
João Carlos Xavier-Junior
Federal University of Bahia, Salvador, Brazil
Ricardo Araújo Rios

About this paper

Cite this paper

Filho, I.J.R., Martins, L.H.D., Parmezan, A.R.S., Marcacini, R.M., Rezende, S.O. (2022). Sequential Short-Text Classification from Multiple Textual Representations with Weak Supervision. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_12

DOI: https://doi.org/10.1007/978-3-031-21686-2_12
Published: 19 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21685-5
Online ISBN: 978-3-031-21686-2
eBook Packages: Computer Science, Computer Science (R0)
