Network Analysis on Provenance Graphs from a Crowdsourcing Application

  • Mark Ebden
  • Trung Dong Huynh
  • Luc Moreau
  • Sarvapali Ramchurn
  • Stephen Roberts
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7525)


Crowdsourcing has become a popular means of completing large numbers of tasks quickly. CollabMap is an online mapping application in which we crowdsource the identification of evacuation routes in residential areas, to be used for planning large-scale evacuations. So far, approximately 38,000 micro-tasks have been completed by over 100 contributors. To assist with data verification, we introduced provenance tracking into the application, and approximately 5,000 provenance graphs have been generated. These graphs have given us various insights into the typical characteristics of provenance graphs in the crowdsourcing context. In particular, we have estimated probability distribution functions over three selected characteristics of these provenance graphs: the node degree, the graph diameter, and the densification exponent. We describe methods to define these three characteristics across specific combinations of node types and edge types, and present our findings in this paper. Applications of our methods include rapid comparison of one provenance graph with another, or of one style of provenance database with another. Our results also indicate that provenance graphs are a suitable application area for existing network analysis tools concerned with modelling, prediction, and the inference of missing nodes and edges.
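As a rough illustration of the three characteristics named in the abstract, the following Python sketch computes them for a toy graph. It is not the authors' method: it ignores node types, edge types, and edge direction, and it assumes the definition of the densification exponent that is common in the network-analysis literature, namely the exponent a in e ∝ n^a, where n and e are the node and edge counts of a graph. The function names, the toy edge list, and the (nodes, edges) size pairs are all hypothetical.

```python
import math
from collections import deque


def degrees(edges):
    """Total degree of each node, ignoring edge direction."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def diameter(edges):
    """Longest shortest path, in hops, treating edges as undirected.

    Pairs of nodes with no connecting path are ignored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best = 0
    for source in adj:
        # Breadth-first search gives hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def densification_exponent(sizes):
    """Least-squares slope of log(edges) against log(nodes).

    `sizes` is a list of (node_count, edge_count) pairs, one per graph.
    """
    xs = [math.log(n) for n, _ in sizes]
    ys = [math.log(e) for _, e in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy graph: a short chain a-b-c-d with a branch b-e.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
toy_degrees = degrees(toy_edges)
toy_diameter = diameter(toy_edges)

# Hypothetical (nodes, edges) pairs constructed so that e = n ** 1.5.
toy_sizes = [(4, 8), (16, 64), (64, 512)]
toy_exponent = densification_exponent(toy_sizes)
```

Restricting any of these measures to particular node or edge types, as the abstract describes, would amount to filtering the edge list before calling the functions; how the paper actually performs that restriction is not stated here.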


Keywords (added by machine, not by the authors): Degree Distribution Node Degree Community Detection Process Network Analysis Nonnegative Matrix Factorization



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Mark Ebden (1)
  • Trung Dong Huynh (2)
  • Luc Moreau (2)
  • Sarvapali Ramchurn (2)
  • Stephen Roberts (1)

  1. Department of Engineering Science, University of Oxford, Oxford, United Kingdom
  2. Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
