TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks

Hoang, Thi Bich Ngoc; Mothe, Josiane; Baillon, Manon

doi:10.1007/978-3-030-28577-7_5

Thi Bich Ngoc Hoang, Josiane Mothe & Manon Baillon

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Online social networks play a crucial role in spreading information at a very large scale, and modeling information propagation on these networks has attracted considerable attention from researchers. However, none of the data sets used in past work has been made available to the research community, although they would be very useful for comparative studies. In this paper, we describe and release a collection of five data sets totaling 18 million tweets, designed to evaluate methods for modeling information spread, for both general information and brand marketing information. In addition to the tweet IDs and a script to retrieve each full tweet in JSON from the Twitter API, we release the values of 29 features extracted for these data sets. The features are user-based, content-based, and temporal. Finally, we provide the results of information diffusion prediction models (80% accuracy), which could serve as strong baselines for this research topic.

Notes

1. https://developer.twitter.com/en/developer-terms/agreement-and-policyid34.
2. Institut de Recherche en Informatique de Toulouse, UMR5505 CNRS, France.
3. http://weka.sourceforge.net.
4. This algorithm creates synthetic observations based upon the existing minority observations.
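
Note 4 does not name the algorithm. Its description matches SMOTE-style oversampling, so the sketch below assumes that technique: each synthetic observation is a random point on the segment between a minority observation and one of its nearest minority neighbours. The function name, parameters, and sample points are hypothetical and are not taken from the paper.

```python
import random

def synthesize_minority(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points built from existing minority points.

    Assumed SMOTE-style scheme: pick a minority point, pick one of its k
    nearest minority neighbours, and interpolate between the two.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority class (illustrative values only).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = synthesize_minority(minority, n_new=3)
```

Because every synthetic point lies between two real minority observations, the new points stay inside the region the minority class already occupies, instead of being exact copies of existing observations.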

Author information

Authors and Affiliations

UPS, Université de Toulouse, IRIT, UMR5505 CNRS, Toulouse, France
Thi Bich Ngoc Hoang
University of Economics, The University of Danang, Da Nang, Vietnam
Thi Bich Ngoc Hoang
ESPE, UT2J, Université de Toulouse, IRIT, UMR5505 CNRS, Toulouse, France
Josiane Mothe
Université Capitole, Toulouse, France
Manon Baillon

Corresponding author

Correspondence to Thi Bich Ngoc Hoang.

Editor information

Editors and Affiliations

Universita della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
Zurich University of Applied Sciences, Winterthur, Switzerland
Martin Braschler
University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy
Technische Universität Wien, Vienna, Austria
Andreas Rauber
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Swiss Alliance for Data-Intensive Services, Thun, Switzerland
Gundula Heinatz Bürki
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Hoang, T.B.N., Mothe, J., Baillon, M. (2019). TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_5

DOI: https://doi.org/10.1007/978-3-030-28577-7_5
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)