TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Abstract

Online social networks play a crucial role in spreading information at a very large scale, and modeling information propagation on these networks has attracted considerable attention from researchers. However, none of the data sets used in past work have been made available to the research community, although they would be very useful for comparative studies. In this paper, we describe and release a collection of five data sets totaling 18 million tweets, designed for evaluating methods that model information spread, both for general information and for brand marketing information. In addition to the tweet IDs and a script that retrieves each full tweet in JSON from the Twitter API, we release the values of 29 features extracted for these data sets. The features are user-based, content-based, and temporal. Finally, we provide the results of information diffusion prediction models (80% accuracy), which could serve as strong baselines for this research topic.
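Because the collection is distributed as tweet IDs rather than full tweets, anyone reusing it has to look the IDs up again through the API. The released script is not reproduced here; the following is only a minimal sketch of the batching step such a lookup typically needs. The function names `batch_ids` and `ids_parameter` are hypothetical, and the default batch size of 100 is an assumption that should be checked against the current API documentation.

```python
from typing import Iterable, Iterator, List


def batch_ids(tweet_ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` tweet IDs.

    The default of 100 is an assumed per-request limit, not a value
    taken from the paper.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for tweet_id in tweet_ids:
        tweet_id = tweet_id.strip()
        if not tweet_id:
            continue  # skip blank lines in an ID file
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def ids_parameter(batch: List[str]) -> str:
    """Join one batch into a comma-separated string for a lookup request."""
    return ",".join(batch)
```

Each string produced by `ids_parameter` could then be passed to whichever lookup client is used. Tweets that have been deleted or made private since collection will generally not be returned, so the retrieved set may be smaller than the ID list.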

Notes

  1. https://developer.twitter.com/en/developer-terms/agreement-and-policyid34.

  2. Institut de Recherche en Informatique de Toulouse, UMR5505 CNRS, France.

  3. http://weka.sourceforge.net.

  4. This algorithm creates synthetic observations based upon the existing minority observations.

Author information

Corresponding author

Correspondence to Thi Bich Ngoc Hoang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hoang, T.B.N., Mothe, J., Baillon, M. (2019). TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_5

  • DOI: https://doi.org/10.1007/978-3-030-28577-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28576-0

  • Online ISBN: 978-3-030-28577-7

  • eBook Packages: Computer Science, Computer Science (R0)
