Current Flow Betweenness Centrality with Apache Spark

  • Massimiliano Bertolucci
  • Alessandro Lulli
  • Laura Ricci
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)

Abstract

Identifying the most central nodes of a graph is a fundamental task of data analysis. The current flow betweenness is a centrality index that considers how information flows along all the paths of a graph, not only along the shortest ones. Computing the exact value of the current flow betweenness is computationally expensive for large graphs, so algorithms that return an approximation of this measure are needed. In this paper we propose a solution, based on the Gather Apply Scatter model, that estimates the current flow betweenness in a distributed setting using the Apache Spark framework. The experimental evaluation shows that the algorithm achieves high correlation with the exact value of the index and outperforms other algorithms.
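
To make the index concrete, the following is a minimal single-machine sketch of exact current flow betweenness, written in pure Python under the usual electrical-network definition (unit current injected at a source and extracted at a sink, with every edge acting as a unit resistor). It is not the distributed Gather Apply Scatter algorithm proposed in the paper. The function names `solve` and `current_flow_betweenness` are hypothetical, and the sketch assumes a connected, undirected, unweighted graph without self-loops and with at least three nodes labelled 0..n-1.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def current_flow_betweenness(n, edges):
    """Exact current flow betweenness (illustrative; assumes a connected graph, n >= 3)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Reduced Laplacian: node n-1 is grounded (potential fixed to 0),
    # which makes the system non-singular for a connected graph.
    lap = [[0.0] * (n - 1) for _ in range(n - 1)]
    for u in range(n - 1):
        lap[u][u] = float(len(adj[u]))
        for v in adj[u]:
            if v != n - 1:
                lap[u][v] -= 1.0

    score = [0.0] * n
    for s, t in combinations(range(n), 2):  # s < t, so s is never the grounded node
        rhs = [0.0] * (n - 1)
        rhs[s] = 1.0           # unit current enters at s
        if t != n - 1:
            rhs[t] = -1.0      # and leaves at t
        p = solve(lap, rhs) + [0.0]  # node potentials
        for v in range(n):
            if v in (s, t):
                continue
            # Throughput of v: half the total absolute current on its edges.
            score[v] += 0.5 * sum(abs(p[v] - p[w]) for w in adj[v])

    norm = (n - 1) * (n - 2) / 2  # number of source-sink pairs not involving v
    return [x / norm for x in score]
```

On a triangle, every node receives a score of 1/3 because a third of the current between any two nodes detours through the remaining one, whereas shortest-path betweenness would assign 0 to all three. Each source-sink pair requires solving a linear system, which illustrates why the exact computation becomes expensive on large graphs.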

Keywords

Centrality measure; Thinking like a vertex; Apache Spark

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Massimiliano Bertolucci (1)
  • Alessandro Lulli (1, 2)
  • Laura Ricci (1, 2)

  1. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
  2. Istituto di Scienza e Tecnologie dell’Informazione (ISTI, CNR), Pisa, Italy
