Abstract
Twitter and other social networks have become a fundamental source of information and a powerful tool for spreading ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion on Twitter is to study how a user's social neighborhood influences her retweeting preferences. In particular, we ask to what extent a user's preferences can be predicted from the preferences of her neighborhood.
We build our own sample graph of Twitter users and study the problem of predicting a given user's retweets from the retweeting behavior in her second-degree social neighborhood (followed and followed-by-followed). We train and evaluate user-centered binary classification models that predict retweets with an average F1 score of \(87.6\%\), based purely on social information, that is, without analyzing the content of the tweets.
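As a minimal sketch of the setup described above, the following Python shows one way to collect a user's second-degree neighborhood, encode a tweet as a binary vector of which neighbors retweeted it, and compute the F1 score of a set of predictions. The function names, the dictionary-based follow graph, and the one-feature-per-neighbor encoding are illustrative assumptions; the abstract does not specify the authors' actual data structures or feature layout.

```python
from typing import Dict, List, Sequence, Set


def second_degree_neighborhood(follows: Dict[str, Set[str]], user: str) -> List[str]:
    """Return accounts followed by `user` plus accounts followed by those accounts.

    `follows` maps each user to the set of users they follow (assumed layout).
    """
    followed = follows.get(user, set())
    followed_by_followed: Set[str] = set()
    for account in followed:
        followed_by_followed |= follows.get(account, set())
    # Exclude the central user, who may be followed back by a neighbor.
    return sorted((followed | followed_by_followed) - {user})


def social_features(neighborhood: List[str], retweeters: Set[str]) -> List[int]:
    """Encode one tweet: 1 for each neighbor who retweeted it, else 0."""
    return [1 if neighbor in retweeters else 0 for neighbor in neighborhood]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1: the central user retweeted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy follow graph (hypothetical data).
follows = {"u": {"a", "b"}, "a": {"c", "u"}, "b": {"c", "d"}}
neighborhood = second_degree_neighborhood(follows, "u")
features = social_features(neighborhood, retweeters={"a", "d"})
```

Each feature vector would then be paired with a binary label (whether the central user retweeted the tweet) and passed to a per-user classifier.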
For users who get low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline that includes a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
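The two-stage idea above can be sketched as follows: select the users whose social-only model scores poorly on the tuning set, then extend each tweet's social feature vector with its topic distribution. The threshold value, the function names, and the plain concatenation are assumptions made for illustration; the abstract does not state the cutoff or how the features are merged, and the topic distribution here is a hand-written placeholder rather than the output of an actual LDA model.

```python
from typing import Dict, List


def users_needing_content(tuning_f1: Dict[str, float], threshold: float) -> List[str]:
    """Return users whose social-only F1 on the tuning set is below `threshold`."""
    return sorted(user for user, score in tuning_f1.items() if score < threshold)


def combine_features(social: List[int], topic_distribution: List[float]) -> List[float]:
    """Concatenate binary social features with a tweet's topic proportions."""
    return [float(value) for value in social] + list(topic_distribution)


# Hypothetical tuning scores and a hypothetical cutoff of 0.75.
tuning_f1 = {"u1": 0.9, "u2": 0.6, "u3": 0.75}
low_scoring = users_needing_content(tuning_f1, threshold=0.75)

# Placeholder topic proportions; a fitted topic model would supply these.
combined = combine_features([1, 0], [0.25, 0.75])
```

With 10 topics, each tweet's vector would grow by 10 entries, which corresponds to the kind of combined model the notes label social+lda10.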
Notes
- 1.
- 2. Likes are represented by a small heart and are used to show appreciation for a tweet. The number of “likes” is the number of users who have expressed it for a given tweet.
- 3. Support Vector Classifier (SVC) is the name scikit-learn uses for classical Support Vector Machines (SVM).
- 4.
- 5. We denote by social+lda10 the models that combine social features and classical LDA features with 10 topics. Similar notation applies to 20 topics and to the TwitterLDA variation.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Celayes, P.G., Domínguez, M.A. (2018). Prediction of User Retweets Based on Social Neighborhood Information and Topic Modelling. In: Castro, F., Miranda-Jiménez, S., González-Mendoza, M. (eds) Advances in Computational Intelligence. MICAI 2017. Lecture Notes in Computer Science, vol 10633. Springer, Cham. https://doi.org/10.1007/978-3-030-02840-4_12
DOI: https://doi.org/10.1007/978-3-030-02840-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02839-8
Online ISBN: 978-3-030-02840-4
eBook Packages: Computer Science (R0)