Abstract
Short texts have become a prevalent source of information, and classifying them in their various forms is valuable to many applications. However, most existing short text classification approaches circumvent the sparsity problem by extending the short texts (or their feature representations) or by exploiting additional information so that traditional text classification approaches can be applied. In this paper, we tackle the sparsity problem from a different direction: adapting the classifier to short texts. We propose a sparse representation method for short text classification based on an entropy weighted constraint. The key idea behind this study is that short texts are similar within a potentially specific subspace. Specifically, we first use word embeddings to build the initial sparse representation dictionary, and then filter the dictionary with a fast feature subset selection algorithm. Next, we design a sparse representation objective function with an entropy weighted constraint, whose optimum is obtained by the Lagrange multiplier method. Finally, the distance between the short text to be classified and the short texts in each class is computed in the subspace, and the text is classified according to three classification rules. Experiments on five datasets show that the proposed approach effectively alleviates the feature sparsity of short texts and is more efficient and effective than existing short text classification methods.
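The pipeline outlined above (embed each short text, build a per-class dictionary, solve a weighted sparse coding problem, classify by reconstruction distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact method: the entropy weighted objective, the dictionary filtering step, and the three decision rules are simplified here to a softmax-weighted lasso solved by ISTA and a single minimum-residual rule; all function names and parameters are our own.

```python
import numpy as np

def softmax_weights(D, x, gamma=1.0):
    """Hypothetical entropy-style weighting: dictionary atoms more similar
    to the query receive a smaller l1 penalty. The softmax form arises from
    a maximum-entropy problem solved with Lagrange multipliers (our
    assumption, not necessarily the paper's derivation)."""
    sims = D.T @ x / (np.linalg.norm(D, axis=0) * np.linalg.norm(x) + 1e-12)
    p = np.exp(sims / gamma)
    p /= p.sum()
    return 1.0 - p  # similar atoms -> lower weight -> less shrinkage

def ista_weighted_lasso(D, x, w, lam=0.1, n_iter=300):
    """Solve min_a 0.5 * ||x - D a||^2 + lam * sum_i w_i |a_i|
    by proximal gradient descent (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    step = 1.0 / L
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the smooth term
        z = a - step * grad
        thresh = step * lam * w            # per-atom soft-threshold levels
        a = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return a

def classify(x, class_dicts, lam=0.1):
    """Assign x to the class whose dictionary reconstructs it with the
    smallest residual (one plausible decision rule of the three)."""
    best, best_r = None, np.inf
    for label, D in class_dicts.items():
        a = ista_weighted_lasso(D, x, softmax_weights(D, x), lam)
        r = np.linalg.norm(x - D @ a)
        if r < best_r:
            best, best_r = label, r
    return best

# Toy usage: two classes of "averaged word embedding" vectors.
rng = np.random.default_rng(0)
dim = 50
base0, base1 = rng.normal(size=dim), rng.normal(size=dim)
D0 = np.stack([base0 + 0.1 * rng.normal(size=dim) for _ in range(5)], axis=1)
D1 = np.stack([base1 + 0.1 * rng.normal(size=dim) for _ in range(5)], axis=1)
query = base0 + 0.05 * rng.normal(size=dim)
print(classify(query, {0: D0, 1: D1}))  # prints 0
```

The minimum-residual rule works because a query lying near one class's subspace is reconstructed almost exactly by that class's atoms, while the other class's dictionary leaves most of the query's energy unexplained.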
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Nos. 61762078, 61363058, 61663004), the Guangxi Key Laboratory of Trusted Software (No. kx201910), and the Research Fund of the Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-08).
© 2019 Springer Nature Switzerland AG
Cite this paper
Tuo, T., Ma, H., Li, Z., Lin, X. (2019). Effectively Classify Short Texts with Sparse Representation Using Entropy Weighted Constraint. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol. 11776. Springer, Cham. https://doi.org/10.1007/978-3-030-29563-9_14
Print ISBN: 978-3-030-29562-2
Online ISBN: 978-3-030-29563-9