Abstract
Short texts have become a prevalent source of information, and classifying them in their various forms is valuable to many applications. However, most existing short text classification approaches circumvent the sparsity problem by extending the short texts (or their feature representations) or by exploiting additional information so that traditional text classification approaches can be applied. In this paper, we tackle the sparsity problem from a different direction: adapting the classifier to short texts. We propose a sparse representation method for short text classification based on an entropy weighted constraint. The key idea behind this study is that short texts are similar within a potentially specific subspace. Specifically, we first use word embeddings to build the initial sparse representation dictionary, and then filter the dictionary with a fast feature subset selection algorithm. Next, we design a sparse representation objective function with an entropy weighted constraint, whose optimum is obtained by the Lagrange multiplier method. Finally, the distance between the short text to be classified and the short texts in each class is computed in the subspace, and the text is classified according to three classification rules. Experiments on five datasets show that the proposed approach effectively alleviates the feature sparsity of short texts and is more efficient and effective than existing short text classification methods.
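The pipeline outlined above (embed each short text, build a per-class dictionary, solve a weighted sparse coding problem, classify by reconstruction distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact method: the entropy weighted objective, the dictionary filtering step, and the three decision rules are simplified here to a softmax-weighted lasso solved by ISTA and a single minimum-residual rule; all function names and parameters are our own.

```python
import numpy as np

def softmax_weights(D, x, gamma=1.0):
    """Hypothetical entropy-style weighting: dictionary atoms more similar
    to the query receive a smaller l1 penalty. The softmax form arises from
    a maximum-entropy problem solved with Lagrange multipliers (our
    assumption, not necessarily the paper's derivation)."""
    sims = D.T @ x / (np.linalg.norm(D, axis=0) * np.linalg.norm(x) + 1e-12)
    p = np.exp(sims / gamma)
    p /= p.sum()
    return 1.0 - p  # similar atoms -> lower weight -> less shrinkage

def ista_weighted_lasso(D, x, w, lam=0.1, n_iter=300):
    """Solve min_a 0.5 * ||x - D a||^2 + lam * sum_i w_i |a_i|
    by proximal gradient descent (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    step = 1.0 / L
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the smooth term
        z = a - step * grad
        thresh = step * lam * w            # per-atom soft-threshold levels
        a = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return a

def classify(x, class_dicts, lam=0.1):
    """Assign x to the class whose dictionary reconstructs it with the
    smallest residual (one plausible decision rule of the three)."""
    best, best_r = None, np.inf
    for label, D in class_dicts.items():
        a = ista_weighted_lasso(D, x, softmax_weights(D, x), lam)
        r = np.linalg.norm(x - D @ a)
        if r < best_r:
            best, best_r = label, r
    return best

# Toy usage: two classes of "averaged word embedding" vectors.
rng = np.random.default_rng(0)
dim = 50
base0, base1 = rng.normal(size=dim), rng.normal(size=dim)
D0 = np.stack([base0 + 0.1 * rng.normal(size=dim) for _ in range(5)], axis=1)
D1 = np.stack([base1 + 0.1 * rng.normal(size=dim) for _ in range(5)], axis=1)
query = base0 + 0.05 * rng.normal(size=dim)
print(classify(query, {0: D0, 1: D1}))  # prints 0
```

The minimum-residual rule works because a query lying near one class's subspace is reconstructed almost exactly by that class's atoms, while the other class's dictionary leaves most of the query's energy unexplained.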
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Nos. 61762078, 61363058, 61663004), the Guangxi Key Laboratory of Trusted Software (No. kx201910), and the Research Fund of the Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-08).
© 2019 Springer Nature Switzerland AG
Cite this paper
Tuo, T., Ma, H., Li, Z., Lin, X. (2019). Effectively Classify Short Texts with Sparse Representation Using Entropy Weighted Constraint. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol. 11776. Springer, Cham. https://doi.org/10.1007/978-3-030-29563-9_14
Print ISBN: 978-3-030-29562-2
Online ISBN: 978-3-030-29563-9