Large-Scale Social Network Analysis

  • Mattia Lambertini
  • Matteo Magnani
  • Moreno Marzolla
  • Danilo Montesi
  • Carmine Paolino

Abstract

Social Network Analysis (SNA) is an established discipline for the study of groups of individuals, with applications in several areas such as economics, information science, organizational studies and psychology. In the last fifteen years, the exponential growth of online Social Network Sites (SNSs) such as Facebook, QQ and Twitter has provided a new and challenging application context for SNA methods. However, compared with traditional SNA application domains, these systems are characterized by very large volumes of data, which has recently led to the development of parallel network analysis algorithms and libraries. In this chapter we provide an overview of the state of the art in large-scale social network analysis; in particular, we focus on parallel algorithms and libraries for the computation of network centrality metrics.

Notes

Acknowledgements

This work has been partially funded by the PRIN project “Relazioni sociali e identità in rete: vissuti e narrazioni degli italiani nei siti di social network” and by the FIRB project “Information monitoring, propagation analysis and community detection in Social Network Sites”. This work was done while M. Magnani and C. Paolino were with the Department of Computer Science, University of Bologna.

The authors thank the CINECA supercomputing center for providing access to the IBM pSeries 575 used for part of the tests described in Sect. 6.6.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Mattia Lambertini (1)
  • Matteo Magnani (2)
  • Moreno Marzolla (1)
  • Danilo Montesi (1)
  • Carmine Paolino (3)

  1. Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
  2. Department of Information Technology, Uppsala University, Uppsala, Sweden
  3. Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands