LexiFusedNet: A Unified Approach for Imbalanced Short-Text Classification Using Lexicon-Based Feature Extraction, Transfer Learning and One Class Classifiers

  • Conference paper
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14317)

Abstract

The incorporation of lexicon-based feature extraction and the utilization of a one-class classification loss within a transfer learning-based deep neural network offer significant advantages. In lexicon-based feature extraction, weights are assigned to relevant words in short texts, and fine-tuning with these weighted features enables the capture of critical information. Augmenting contextualized word embeddings with lexicon-derived weights highlights the significance of specific words, thereby enriching text comprehension. One-class classification loss methods such as OCSVM or SVDD identify anomalous instances based on word relevance. A comprehensive evaluation on four benchmark datasets has confirmed an improvement in short-text classification performance, effectively addressing issues related to data imbalances, contextual limitations, and noise.
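
The two ideas named in the abstract, lexicon-derived word weights applied to token embeddings and a one-class (SVDD-style) loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, the weighted-mean pooling, and the random vectors standing in for contextualized embeddings are all assumptions made for illustration, and the loss shown is only the simplest center-distance form of an SVDD-style objective.

```python
import numpy as np

def lexicon_weights(tokens, lexicon, default=1.0):
    """Return one weight per token; tokens found in the lexicon get its score."""
    return np.array([lexicon.get(t.lower(), default) for t in tokens], dtype=float)

def weighted_pool(embeddings, weights):
    """Scale each token embedding by its weight and return the weighted mean."""
    return (embeddings * weights[:, None]).sum(axis=0) / weights.sum()

def svdd_loss(points, center):
    """Mean squared distance of points to a center (simplified SVDD-style loss)."""
    return float(np.mean(np.sum((points - center) ** 2, axis=1)))

if __name__ == "__main__":
    # Hypothetical lexicon: higher score means more relevant to the target class.
    lexicon = {"awful": 3.0}
    texts = [["you", "are", "awful"], ["see", "you", "soon"]]

    # Random vectors stand in for contextualized embeddings (assumption).
    rng = np.random.default_rng(0)
    pooled = np.stack([
        weighted_pool(rng.normal(size=(len(t), 4)), lexicon_weights(t, lexicon))
        for t in texts
    ])

    # Use the mean representation as the hypersphere center (assumption).
    center = pooled.mean(axis=0)
    print("one-class loss:", svdd_loss(pooled, center))
```

In a trained model the center would typically be fixed from representations of the target class and the loss minimized during fine-tuning, so that texts far from the center can be flagged as anomalous.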

Author information

Corresponding author

Correspondence to Saugata Bose.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bose, S., Su, G. (2023). LexiFusedNet: A Unified Approach for Imbalanced Short-Text Classification Using Lexicon-Based Feature Extraction, Transfer Learning and One Class Classifiers. In: Wu, S., Yang, W., Amin, M.B., Kang, BH., Xu, G. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2023. Lecture Notes in Computer Science, vol 14317. Springer, Singapore. https://doi.org/10.1007/978-981-99-7855-7_6

  • DOI: https://doi.org/10.1007/978-981-99-7855-7_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7854-0

  • Online ISBN: 978-981-99-7855-7

  • eBook Packages: Computer Science, Computer Science (R0)
