ICONIP 2014: Neural Information Processing, pp. 581–588

News Title Classification with Support from Auxiliary Long Texts

  • Yuanxin Ouyang
  • Yao Huangfu
  • Hao Sheng
  • Zhang Xiong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8835)

Abstract

The performance of short text classification is limited by the intrinsic shortness of the sentences, which makes the vector space model sparse. Traditional classifiers such as SVM are extremely sensitive to the feature space, which makes classification performance unsatisfactory in short-text applications. It is believed that using external information to better represent the input data can yield satisfactory results. In this paper, we target the problem of news title classification, an essential and typical member of the short text family, and propose an approach that employs external information from long texts to address the sparseness problem. A Restricted Boltzmann Machine is then used to select features, and classification is finally performed with a Support Vector Machine. An experimental study on the Reuters-21578 and Sogou Chinese news corpora demonstrates the effectiveness of the proposed method.
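The pipeline described in the abstract (enrich short titles with auxiliary text, compress the sparse bag-of-words features with an RBM, then classify with an SVM) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the long-text enrichment step is approximated by simply appending hypothetical auxiliary text to each title, and the toy data and all names are invented for the example.

```python
# Sketch of the title-classification pipeline: titles are vectorized into a
# sparse bag-of-words representation, a Restricted Boltzmann Machine learns a
# compact feature encoding, and a linear SVM performs the final classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data: short titles already concatenated with (hypothetical) auxiliary
# long text, standing in for the paper's external-information step.
titles = [
    "stocks rally on earnings market shares trading",
    "market index falls sharply stocks trading losses",
    "team wins championship final match goal players",
    "player scores winning goal match team fans",
]
labels = ["finance", "finance", "sports", "sports"]

pipeline = Pipeline([
    ("bow", CountVectorizer(binary=True)),  # sparse binary features for the RBM
    ("rbm", BernoulliRBM(n_components=16, n_iter=20, random_state=0)),
    ("svm", LinearSVC()),
])
pipeline.fit(titles, labels)
print(pipeline.predict(["index rally after earnings report"])[0])
```

In practice the RBM hidden-layer size, the enrichment source, and the vectorizer would all need tuning; the point of the sketch is only the three-stage structure.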

Keywords

Support Vector Machine · Feature Selection · Hidden Node · External Information · Short Text



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yuanxin Ouyang (1, 2)
  • Yao Huangfu (1)
  • Hao Sheng (1, 2)
  • Zhang Xiong (1, 2)
  1. School of Computer Science and Engineering, Beihang University, Beijing, China
  2. Research Institute of Beihang University in Shenzhen, Shenzhen, China
