Research on Hot Micro-blog Forecast Based on XGBOOST and Random Forest

  • Conference paper
  • First Online:
Knowledge Science, Engineering and Management (KSEM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11062)

Abstract

As Sina Micro-blog has become an important place for more and more people to exchange information, micro-blog research has attracted great interest in recent years. However, most of this research focuses on hot micro-blog topics, and little of it addresses hot micro-blogs, that is, posts with a high total number of forwards and comments over a period of time. To address this gap, a feature discretization method based on Extreme Gradient Boosting (XGBOOST) is proposed, which discretizes the features according to the prediction paths of the trees, so as to improve the running speed and prediction accuracy of the model. Meanwhile, a constraint-based random forest classification algorithm is proposed to address the imbalance caused by the random selection of features in traditional random forest (RF) algorithms. By combining feature extraction, discretization, and classification, we realize hot micro-blog forecasting in this paper. Finally, the experimental results show that the proposed classification algorithm substantially improves classification accuracy.
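
The idea of discretizing features by the prediction path of a tree can be illustrated with a minimal sketch. It is not the authors' implementation: the two trees below are hand-written stand-ins for a fitted boosted ensemble, and the feature names (`forwards`, `comments`, `followers`), thresholds, and helper functions are all hypothetical. Each sample is routed down every tree, and the leaf it lands in is one-hot encoded, so continuous features become a short binary vector.

```python
# Hypothetical hand-written trees standing in for a fitted boosted ensemble.
# Internal node: (feature_name, threshold, left_subtree, right_subtree).
# Leaf: an integer leaf id, unique within its tree.
TREES = [
    ("forwards", 50.0, 0, ("comments", 20.0, 1, 2)),
    ("followers", 1000.0, ("forwards", 10.0, 0, 1), 2),
]
LEAVES_PER_TREE = [3, 3]


def leaf_index(tree, sample):
    """Follow the prediction path of one tree and return the leaf id reached."""
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        # Go left when the feature value is below the split threshold.
        node = left if sample[feature] < threshold else right
    return node


def discretize(sample):
    """One-hot encode the leaf reached in every tree and concatenate the codes."""
    encoded = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, sample)] = 1
        encoded.extend(one_hot)
    return encoded


# An assumed micro-blog post described by three raw numeric features.
post = {"forwards": 120.0, "comments": 5.0, "followers": 300.0}
print(discretize(post))  # [0, 1, 0, 0, 1, 0]
```

With the xgboost library, the leaf indices of a trained model can be obtained with `Booster.predict(data, pred_leaf=True)` instead of the hand-written traversal; the resulting binary vectors could then be passed to a downstream classifier such as a random forest.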

Author information

Corresponding author

Correspondence to Mei Yu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, J. et al. (2018). Research on Hot Micro-blog Forecast Based on XGBOOST and Random Forest. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science, vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_31

  • DOI: https://doi.org/10.1007/978-3-319-99247-1_31

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99246-4

  • Online ISBN: 978-3-319-99247-1

  • eBook Packages: Computer Science (R0)
