Sequential Short-Text Classification from Multiple Textual Representations with Weak Supervision

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

The volume of news published on the internet has increased significantly in recent years. Consequently, text data has gained attention from industry, government, academia, and the financial market. This information is potentially valuable for assisting domain experts in decision making, and machine learning applications that exploit it have become widely available in several areas of knowledge. For supervised learning tasks, however, obtaining annotated texts in sufficient quantity and quality is a recurring problem. This work proposes a time-series-driven approach to labeling chronologically arranged documents. Our proposal categorizes short texts from a particular domain according to the level and trend patterns of a given time series. We use the resulting weak labels with the understanding that they are imperfect but still useful for building predictive text models. Documents and agribusiness commodity price series were used to assess performance in four classification scenarios. The experimental evaluation considered nine textual representations and different learning paradigms. Neural language-based models achieved better classification performance than traditional ones. The results indicate that the proposed approach can be an alternative for automatically labeling a large volume of news.
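The idea of deriving weak labels from a time series can be illustrated with a minimal sketch. It is not the authors' method: the labeling rule (relative change from the previous observation), the `threshold` parameter, the label names, the function names, and the sample prices and headlines are all assumptions made for illustration. Only trend is covered here; level-based labels could be derived in a similar way.

```python
from datetime import date


def trend_labels(prices, threshold=0.005):
    """Label each date by the relative change from the previous observation.

    Hypothetical rule: 'up' if the change exceeds +threshold, 'down' if it
    is below -threshold, otherwise 'flat'. The first date gets no label.
    """
    days = sorted(prices)
    labels = {}
    for prev, cur in zip(days, days[1:]):
        change = (prices[cur] - prices[prev]) / prices[prev]
        if change > threshold:
            labels[cur] = "up"
        elif change < -threshold:
            labels[cur] = "down"
        else:
            labels[cur] = "flat"
    return labels


def weak_label(docs, labels):
    """Attach the label of the publication date to each (date, text) pair.

    Documents published on a date without a label are skipped.
    """
    return [(text, labels[day]) for day, text in docs if day in labels]


# Made-up commodity prices and headlines, for illustration only.
prices = {
    date(2022, 1, 3): 100.0,
    date(2022, 1, 4): 102.0,
    date(2022, 1, 5): 101.0,
    date(2022, 1, 6): 101.0,
}
docs = [
    (date(2022, 1, 3), "Harvest forecast released"),
    (date(2022, 1, 4), "Export demand rises"),
    (date(2022, 1, 5), "Rain improves crop outlook"),
    (date(2022, 1, 6), "Market closes unchanged"),
]

labeled = weak_label(docs, trend_labels(prices))
for text, label in labeled:
    print(f"{label}\t{text}")
```

The resulting pairs are noisy by construction, since a headline does not necessarily explain the price movement of its day; they are meant as training data for a text classifier, not as ground truth.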

Notes

  1. https://www.noticiasagricolas.com.br/

Acknowledgements

This work was carried out at the Center for Artificial Intelligence (C4AI-USP) and partially supported by the São Paulo Research Foundation (FAPESP) (grant #2019/07665-4) and the IBM Corporation. The authors thank FAPESP (process 2019/25010-5) and the National Council for Scientific and Technological Development (CNPq) (process 309575/2021-4). The corresponding author thanks the Minas Gerais State Research Support Foundation (FAPEMIG) (process PCRH BPG-00054-210).

Author information

Corresponding author

Correspondence to Ivan J. Reis Filho.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Filho, I.J.R., Martins, L.H.D., Parmezan, A.R.S., Marcacini, R.M., Rezende, S.O. (2022). Sequential Short-Text Classification from Multiple Textual Representations with Weak Supervision. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_12

  • DOI: https://doi.org/10.1007/978-3-031-21686-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21685-5

  • Online ISBN: 978-3-031-21686-2

  • eBook Packages: Computer Science, Computer Science (R0)
