Abstract
Incorporating lexicon-based feature extraction and a one-class classification loss into a transfer-learning-based deep neural network offers significant advantages for short-text classification. Lexicon-based feature extraction assigns weights to relevant words in short texts, and fine-tuning with these weighted features lets the model capture critical information. Augmenting contextualized word embeddings with lexicon-derived weights highlights the significance of specific words, thereby enriching text comprehension. One-class classification losses, such as those of OCSVM or SVDD, identify anomalous instances based on word relevance. An evaluation on four benchmark datasets confirms an improvement in short-text classification performance, addressing data imbalance, limited context, and noise.
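The two ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a hypothetical lexicon (`LEXICON`) with made-up weights, toy 2-dimensional vectors standing in for contextualized embeddings, and a simplified SVDD-style objective (mean squared distance to a fixed center) in place of the full OCSVM or SVDD formulations. All function names are illustrative.

```python
import math

# Hypothetical lexicon: term -> weight. Terms not listed get the default weight.
LEXICON = {"hate": 3.0, "awful": 2.0}


def lexicon_weights(tokens, lexicon, default=1.0):
    """Return one weight per token, boosted for terms found in the lexicon."""
    return [lexicon.get(tok.lower(), default) for tok in tokens]


def weighted_pool(embeddings, weights):
    """Weighted average of per-token embedding vectors (one vector per text)."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, embeddings)) / total
        for i in range(dim)
    ]


def squared_distance(vector, center):
    """Squared Euclidean distance between a text vector and the center."""
    return sum((v - c) ** 2 for v, c in zip(vector, center))


def svdd_style_loss(vectors, center):
    """Mean squared distance to the center: a simplified one-class objective."""
    return sum(squared_distance(vec, center) for vec in vectors) / len(vectors)


def anomaly_score(vector, center):
    """Distance to the center; larger values suggest a more anomalous text."""
    return math.sqrt(squared_distance(vector, center))


# Toy usage: three tokens with stand-in 2-d "embeddings" (assumed values).
tokens = ["I", "hate", "this"]
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = lexicon_weights(tokens, LEXICON)          # lexicon term gets weight 3.0
pooled = weighted_pool(token_embeddings, weights)   # text-level representation
loss = svdd_style_loss([pooled], center=[0.0, 0.0])
score = anomaly_score(pooled, center=[0.0, 0.0])
```

In this sketch the lexicon term pulls the pooled vector toward its own embedding, which is the sense in which lexicon weights "highlight" specific words. In a real model the embeddings would come from a pretrained encoder and the center would be estimated from the training data.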
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bose, S., Su, G. (2023). LexiFusedNet: A Unified Approach for Imbalanced Short-Text Classification Using Lexicon-Based Feature Extraction, Transfer Learning and One Class Classifiers. In: Wu, S., Yang, W., Amin, M.B., Kang, BH., Xu, G. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2023. Lecture Notes in Computer Science, vol 14317. Springer, Singapore. https://doi.org/10.1007/978-981-99-7855-7_6
DOI: https://doi.org/10.1007/978-981-99-7855-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7854-0
Online ISBN: 978-981-99-7855-7
eBook Packages: Computer Science, Computer Science (R0)