Effectively Classify Short Texts with Sparse Representation Using Entropy Weighted Constraint

  • Conference paper
Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11776))

Abstract

Short texts have become a prevalent source of information, and classifying short texts in their various forms is valuable to many applications. However, most existing short text classification approaches circumvent the sparsity problem by extending short texts (or their feature representations) or by exploiting additional information so that traditional text classification methods can be applied. In this paper, we tackle the sparsity problem of short text classification from a different direction: adapting the classifier to short texts. We propose a sparse representation method for short text classification based on an entropy weighted constraint. The main idea behind this study is that short texts of the same class are similar in a potentially class-specific subspace. Specifically, we first introduce word embeddings to build the initial sparse representation dictionary, and then apply a fast feature subset selection algorithm to filter the dictionary. Next, we design an objective function for sparse representation with an entropy weight constraint, whose optimum is obtained by the Lagrange multiplier method. Finally, the distance between the short text to be classified and the short texts of each class is computed in the subspace, and the text is classified according to three classification rules. Experiments on five datasets show that the proposed approach effectively alleviates the feature sparsity problem of short texts and is more efficient and effective than existing short text classification methods.
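The final step described above, comparing distances in an entropy-weighted subspace, can be illustrated with a minimal sketch. The weighting formula, function names, and the nearest-centroid decision rule below are simplifying assumptions for illustration only; they are not the paper's exact objective function, its Lagrangian solution, or its three classification rules:

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Illustrative entropy-based feature weighting: features whose
    values are concentrated (low entropy) across a class's texts get
    higher weight, approximating a class-specific subspace."""
    # Normalize each feature (column) to a probability distribution.
    col_sums = X.sum(axis=0) + eps
    P = X / col_sums
    # Shannon entropy per feature, normalized to [0, 1] by log(n).
    H = -(P * np.log(P + eps)).sum(axis=0) / np.log(X.shape[0] + eps)
    w = 1.0 - H                       # low entropy -> high weight
    return w / (w.sum() + eps)        # weights sum to 1

def classify(x, class_mats):
    """Assign x to the class whose entropy-weighted centroid is nearest."""
    best, best_d = None, np.inf
    for label, X in class_mats.items():
        w = entropy_weights(X)
        d = np.sqrt((w * (x - X.mean(axis=0)) ** 2).sum())
        if d < best_d:
            best, best_d = label, d
    return best
```

In this sketch each class learns its own weight vector, so the same feature can matter for one class and be ignored for another, which is the intuition behind judging similarity "in a potentially class-specific subspace" rather than in the full feature space.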


Notes

  1. http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html
  2. http://www.people.com.cn/
  3. http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
  4. http://www.esp.uem.es/jmgomez/smsspamcorpus/
  5. http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz


Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61762078, 61363058, 61663004), the Guangxi Key Laboratory of Trusted Software (No. kx201910), and the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-08).

Author information

Correspondence to Huifang Ma.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Tuo, T., Ma, H., Li, Z., Lin, X. (2019). Effectively Classify Short Texts with Sparse Representation Using Entropy Weighted Constraint. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds.) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science (LNAI), vol. 11776. Springer, Cham. https://doi.org/10.1007/978-3-030-29563-9_14

  • DOI: https://doi.org/10.1007/978-3-030-29563-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29562-2

  • Online ISBN: 978-3-030-29563-9

  • eBook Packages: Computer Science (R0)
