Research on Hot Micro-blog Forecast Based on XGBOOST and Random Forest

  • Jianrong Wang
  • Chao Lou
  • Ruiguo Yu
  • Jie Gao
  • Tianyi Xu
  • Mei Yu
  • Haibo Di
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11062)

Abstract

As Sina micro-blog has become an important place for more and more people to exchange information, micro-blog research has attracted great enthusiasm in recent years. However, most of this research centers on hot micro-blog topics, and little of it addresses hot micro-blogs, that is, micro-blogs with a high total number of forwards and comments over a period of time. To address this gap, a feature discretization method based on Extreme Gradient Boosting (XGBOOST) is proposed, which discretizes the features according to the prediction paths of the trees, so as to improve the running speed and prediction accuracy of the model. In addition, a constraint-based random forest classification algorithm is proposed to solve the imbalance caused by the random selection of features in the traditional random forest (RF) algorithm. By combining feature extraction, discretization, and classification, we realize hot micro-blog forecasting in this paper. The experimental results show that the proposed classification algorithm substantially improves classification accuracy.
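
The idea of discretizing features by the prediction path of a tree can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the two hand-built trees, the feature meanings, and the thresholds are all hypothetical, and in practice the trees would come from a trained boosting model (XGBoost, for instance, can return per-tree leaf indices through `Booster.predict(..., pred_leaf=True)`). Each sample is routed down every tree, and the index of the leaf it reaches becomes a categorical feature, which is then one-hot encoded for a downstream classifier such as a random forest.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical, hand-built here)."""
    feature: Optional[int] = None   # index of the feature tested; None on a leaf
    threshold: float = 0.0          # go left when x[feature] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None   # set only on leaves


def leaf_index(tree: Node, x: Sequence[float]) -> int:
    """Follow the prediction path of x and return the id of the leaf it reaches."""
    node = tree
    while node.leaf_id is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.leaf_id


def discretize(trees: List[Node], x: Sequence[float]) -> List[int]:
    """Replace the continuous vector x by one leaf id per tree."""
    return [leaf_index(tree, x) for tree in trees]


def one_hot(leaf_ids: List[int], leaves_per_tree: List[int]) -> List[int]:
    """Concatenate one one-hot block per tree."""
    encoded: List[int] = []
    for leaf, n_leaves in zip(leaf_ids, leaves_per_tree):
        block = [0] * n_leaves
        block[leaf] = 1
        encoded.extend(block)
    return encoded


# Two illustrative trees over made-up features (assumptions, not from the paper):
# x[0] = forwards in the first hour, x[1] = comments in the first hour.
tree_a = Node(feature=0, threshold=100.0,
              left=Node(leaf_id=0),
              right=Node(feature=1, threshold=50.0,
                         left=Node(leaf_id=1), right=Node(leaf_id=2)))
tree_b = Node(feature=1, threshold=20.0,
              left=Node(leaf_id=0), right=Node(leaf_id=1))
trees = [tree_a, tree_b]

sample = [250.0, 30.0]
leaf_ids = discretize(trees, sample)                    # one leaf id per tree
features = one_hot(leaf_ids, leaves_per_tree=[3, 2])    # discrete feature vector
print(leaf_ids, features)
```

With this encoding the downstream classifier sees only binary indicators, so differences in the scale of the raw forwarding and comment counts no longer matter; how much this helps on real micro-blog data is an empirical question that the sketch does not answer.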

Keywords

Micro-Blog, Discretization, XGBOOST, Random forest

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jianrong Wang (1, 2, 3)
  • Chao Lou (1, 2, 3)
  • Ruiguo Yu (1, 2, 3)
  • Jie Gao (1, 2, 3)
  • Tianyi Xu (1, 2, 3)
  • Mei Yu (1, 2, 3)
  • Haibo Di (1, 2, 3)

  1. School of Computer Science and Technology, Tianjin University, Tianjin, China
  2. Tianjin Key Laboratory of Advanced Networking (TANK Lab), Tianjin, China
  3. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
