Prediction of User Retweets Based on Social Neighborhood Information and Topic Modelling

  • Pablo Gabriel Celayes
  • Martín Ariel Domínguez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10633)


Twitter and other social networks have become a fundamental source of information and a powerful tool for spreading ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion on Twitter is to study how a user's social neighborhood influences her retweeting preferences, and in particular to what extent a user's preferences can be predicted from those of her neighborhood.

We build our own sample graph of Twitter users and study the problem of predicting retweets by a given user based on the retweeting behavior in her second-degree social neighborhood (the users she follows and the users followed by those). We train and evaluate user-centered binary classification models that predict retweets with an average F1 score of \(87.6\%\) based purely on social information, that is, without analyzing the content of the tweets.
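The setup described above can be illustrated with a minimal sketch. It assumes the follow graph is a plain dictionary and that each tweet is represented by one binary feature per neighbor (1 if that neighbor retweeted it); the abstract does not specify the feature encoding, so this representation, the helper names, and the toy data are all hypothetical. The F1 helper is the standard harmonic mean of precision and recall.

```python
def second_degree_neighborhood(follows, user):
    """Users followed by `user`, plus users followed by those (excluding `user`)."""
    first = set(follows.get(user, ()))
    second = set()
    for followed in first:
        second.update(follows.get(followed, ()))
    return (first | second) - {user}


def social_features(neighbors, retweeters):
    """Assumed encoding: one binary feature per neighbor, 1 if they retweeted."""
    return [1 if n in retweeters else 0 for n in neighbors]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: who follows whom, and who retweeted each tweet.
follows = {"ana": ["bob", "cat"], "bob": ["dan"], "cat": ["dan", "eve", "ana"]}
retweets = {"t1": {"bob", "dan"}, "t2": {"eve"}}

neighbors = sorted(second_degree_neighborhood(follows, "ana"))
X = [social_features(neighbors, retweets[t]) for t in ("t1", "t2")]
```

Each row of `X` would be one training example for the user-centered classifier of "ana", with the label being whether "ana" herself retweeted that tweet. The classifier itself is omitted here.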

For users whose social models score poorly on a tuning dataset, we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline that includes a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
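A rough sketch of this second stage follows, under several assumptions: the score threshold, the tweet normalization rules (dropping URLs and mentions, keeping hashtags), and all function names are hypothetical and not taken from the paper. The topic vector is assumed to come from an already trained topic model such as LDA, which is not implemented here.

```python
import re


def normalize_tweet(text):
    """Assumed preprocessing: lowercase, drop URLs and @mentions, keep hashtags."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"#?\w+", cleaned)


def low_scoring_users(f1_by_user, threshold):
    """Users whose social-only model scored below `threshold` on the tuning set."""
    return sorted(u for u, score in f1_by_user.items() if score < threshold)


def combined_features(social_vec, topic_vec):
    """Concatenate social features with a per-tweet topic distribution."""
    return list(social_vec) + list(topic_vec)


# Hypothetical tuning-set scores and a hypothetical 3-topic distribution.
f1_by_user = {"ana": 0.92, "bob": 0.61, "cat": 0.75}
needs_content = low_scoring_users(f1_by_user, threshold=0.75)

tokens = normalize_tweet("RT @bob: Great #NLP talk https://t.co/x !")
features = combined_features([1, 0, 1], [0.7, 0.2, 0.1])
```

Only the users in `needs_content` would get the extended feature vectors. Users already well served by the social-only model keep it unchanged.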


Keywords: Retweet prediction · Social model · Social network analysis · Machine learning · LDA · SVM



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Pablo Gabriel Celayes (1)
  • Martín Ariel Domínguez (1)
  1. Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, Córdoba, Argentina
