Incorporating Neighborhood Information and Sentence Embedding Similarity into a Repost Prediction Model in Social Media Networks

  • Conference paper
Computational Data and Social Networks (CSoNet 2022)

Abstract

Predicting repost behavior in social media networks plays an important role in analyzing human activity and in decision making for influence maximization. Traditional repost prediction methods fall into two categories: stochastic diffusion models, and machine learning models built on user profile or content features. In this paper, we propose a new framework that combines user profile, content similarity, and the neighborhood information around each target link as input features for the prediction. Here, neighborhood information is a combination of the neighbors' user profiles, and we introduce two graph-based models for forming that combination. Given these input features, we apply state-of-the-art machine learning methods, e.g., Logistic Regression, K-Nearest Neighbors, Gaussian Naive Bayes, Deep Neural Network, Random Forest, XGBoost, and a Stacking Model, to predict the repost probability. We evaluate the framework on a real-world Weibo dataset and compare performance across feature sets and machine learning methods.
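To make the feature construction concrete, the following is a minimal sketch of how the three kinds of input features might be assembled for one target link. It is not the paper's implementation: the abstract does not specify the two graph-based combination models, so a plain mean over neighbor profiles is assumed here, and the function names, the toy graph, and the embedding vectors are all hypothetical. In practice the embeddings would come from a sentence embedding model rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_neighbor_profile(
    node: str,
    graph: Dict[str, List[str]],
    profiles: Dict[str, List[float]],
) -> List[float]:
    """Average the profile vectors of a node's neighbors.

    ASSUMPTION: mean aggregation stands in for the paper's
    graph-based combination models, which the abstract does not detail.
    """
    neighbors = graph.get(node, [])
    dim = len(profiles[node])
    if not neighbors:
        return [0.0] * dim
    return [
        sum(profiles[n][i] for n in neighbors) / len(neighbors)
        for i in range(dim)
    ]


def link_features(
    source: str,
    target: str,
    graph: Dict[str, List[str]],
    profiles: Dict[str, List[float]],
    post_embedding: Sequence[float],
    history_embedding: Sequence[float],
) -> List[float]:
    """Concatenate profile, neighborhood, and content-similarity features for one link."""
    similarity = cosine_similarity(post_embedding, history_embedding)
    return (
        list(profiles[source])
        + list(profiles[target])
        + mean_neighbor_profile(source, graph, profiles)
        + mean_neighbor_profile(target, graph, profiles)
        + [similarity]
    )


# Toy example (hypothetical data): user "a" posts, and we ask whether "b" reposts.
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
toy_profiles = {"a": [1.0, 0.0], "b": [2.0, 4.0], "c": [4.0, 2.0]}
toy_features = link_features(
    "a", "b", toy_graph, toy_profiles, [1.0, 0.0], [1.0, 0.0]
)
```

The resulting vector can be passed to any of the classifiers listed in the abstract. Concatenation keeps the contribution of each feature group easy to ablate, which suits a comparison of performance across feature sets.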

Author information

Corresponding author

Correspondence to Zhecheng Qiang.

Appendix

A Prediction Performance Measures

| Measure | Definition | Formula |
| --- | --- | --- |
| Accuracy | The ratio of correctly predicted observations to the total observations | \(\frac{TP+TN}{TP+FP+FN+TN}\) |
| Precision | The ratio of correctly predicted positive observations to the total predicted positive observations | \(\frac{TP}{TP+FP}\) |
| Recall | The ratio of correctly predicted positive observations to all observations in the actual positive class | \(\frac{TP}{TP+FN}\) |
| F1 Score | The harmonic mean of Precision and Recall | \(\frac{2 \cdot Precision\cdot Recall}{Precision+Recall}\) |
| ROCAUC | The area under the receiver operating characteristic (ROC) curve, i.e., the curve of True Positive Rate against False Positive Rate, computed from prediction scores | |

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
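As an illustration of the measures above, the sketch below computes them from binary labels using only the Python standard library. The function names and the zero-denominator convention (returning 0.0) are assumptions made for this example, not something the paper specifies. ROCAUC is computed through its standard pairwise interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_measures(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Accuracy, Precision, Recall, and F1 Score as defined in the table."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of samples, so it is meant for checking small examples by hand; a rank-based computation or a library routine would be the usual choice on a full dataset.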

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiang, Z., Pasiliao, E.L., Semenov, A., Zheng, Q.P. (2023). Incorporating Neighborhood Information and Sentence Embedding Similarity into a Repost Prediction Model in Social Media Networks. In: Dinh, T.N., Li, M. (eds) Computational Data and Social Networks. CSoNet 2022. Lecture Notes in Computer Science, vol 13831. Springer, Cham. https://doi.org/10.1007/978-3-031-26303-3_1

  • DOI: https://doi.org/10.1007/978-3-031-26303-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26302-6

  • Online ISBN: 978-3-031-26303-3

  • eBook Packages: Computer Science, Computer Science (R0)
