LexiFusedNet: A Unified Approach for Imbalanced Short-Text Classification Using Lexicon-Based Feature Extraction, Transfer Learning and One Class Classifiers

Bose, Saugata; Su, Guoxin

doi:10.1007/978-981-99-7855-7_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14317)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop

Abstract

Combining lexicon-based feature extraction with a one-class classification loss inside a transfer-learning-based deep neural network improves imbalanced short-text classification. Lexicon-based feature extraction assigns weights to relevant words in short texts, and fine-tuning with these weighted features captures critical information. Augmenting contextualized word embeddings with lexicon-derived weights highlights the significance of specific words, thereby enriching text comprehension. One-class classification losses, such as those of the one-class SVM (OCSVM) or support vector data description (SVDD), identify anomalous instances based on word relevance. A comprehensive evaluation on four benchmark datasets confirms an improvement in short-text classification performance and addresses data imbalance, limited context, and noise.
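
The two ideas in the abstract, lexicon-derived word weights applied to contextual embeddings and a one-class loss that scores instances by distance from a learned center, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are combined with the embeddings, so weighted mean pooling is an assumption, the `LEXICON` entries and their values are hypothetical, and `svdd_loss` is a simplified hard-boundary SVDD-style objective (mean squared distance to a fixed center) with no regularization term. The token vectors stand in for the output of a pretrained encoder.

```python
from typing import Dict, List, Sequence

# Hypothetical lexicon: term -> relevance weight (illustrative values only).
LEXICON: Dict[str, float] = {"hate": 2.0, "stupid": 1.5}


def lexicon_weights(tokens: Sequence[str],
                    lexicon: Dict[str, float] = LEXICON,
                    default: float = 1.0) -> List[float]:
    """Return one weight per token; tokens absent from the lexicon get `default`."""
    return [lexicon.get(tok.lower(), default) for tok in tokens]


def weighted_text_embedding(token_embeddings: Sequence[Sequence[float]],
                            weights: Sequence[float]) -> List[float]:
    """Pool token vectors into one text vector using a weighted mean (assumed)."""
    total = sum(weights)
    dim = len(token_embeddings[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, token_embeddings)) / total
        for d in range(dim)
    ]


def squared_distance(vec: Sequence[float], center: Sequence[float]) -> float:
    """Squared Euclidean distance, used here as the anomaly score."""
    return sum((v - c) ** 2 for v, c in zip(vec, center))


def svdd_loss(embeddings: Sequence[Sequence[float]],
              center: Sequence[float]) -> float:
    """Simplified SVDD-style loss: mean squared distance of embeddings to center."""
    return sum(squared_distance(e, center) for e in embeddings) / len(embeddings)


if __name__ == "__main__":
    tokens = ["you", "are", "stupid"]
    # Stand-in 2-dimensional token vectors; a real system would use encoder output.
    token_vectors = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8]]
    text_vec = weighted_text_embedding(token_vectors, lexicon_weights(tokens))
    print("text vector:", text_vec)
    print("anomaly score:", squared_distance(text_vec, [0.0, 0.0]))
```

In this sketch a lexicon term pulls the pooled text vector toward its own embedding, so texts containing such terms move further from (or closer to) the one-class center; a larger squared distance would be read as a more anomalous instance.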

Author information

Authors and Affiliations

University of Wollongong, Northfields Ave, Wollongong, Australia
Saugata Bose & Guoxin Su

Authors

Saugata Bose
Guoxin Su

Corresponding author

Correspondence to Saugata Bose.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Shiqing Wu
University of Tasmania, Hobart, TAS, Australia
Wenli Yang
University of Tasmania, Hobart, TAS, Australia
Muhammad Bilal Amin
University of Tasmania, Hobart, TAS, Australia
Byeong-Ho Kang
University of Technology Sydney, Sydney, NSW, Australia
Guandong Xu

Cite this paper

Bose, S., Su, G. (2023). LexiFusedNet: A Unified Approach for Imbalanced Short-Text Classification Using Lexicon-Based Feature Extraction, Transfer Learning and One Class Classifiers. In: Wu, S., Yang, W., Amin, M.B., Kang, BH., Xu, G. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2023. Lecture Notes in Computer Science, vol 14317. Springer, Singapore. https://doi.org/10.1007/978-981-99-7855-7_6

DOI: https://doi.org/10.1007/978-981-99-7855-7_6
Published: 03 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7854-0
Online ISBN: 978-981-99-7855-7
eBook Packages: Computer Science, Computer Science (R0)
