Social Media Data in Research: Provenance Challenges

  • David Corsar
  • Milan Markovic
  • Peter Edwards
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9672)


In this paper we argue that understanding the provenance of social media datasets and their analysis is critical to addressing the challenges the social science research community faces regarding the reliability and reproducibility of research that uses such data. Based on an analysis of existing projects that use social media data, we present a number of research questions for the provenance community which, if addressed, would help increase the transparency of the research process, aid reproducibility, and facilitate data reuse in the social sciences.


Provenance · Social media · Research process



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Computing Science, University of Aberdeen, Aberdeen, UK
