
An Evaluation of Distributed Processing Models for Random Walk-Based Link Prediction Algorithms Over Social Big Data

  • Alejandro Corbellini
  • Cristian Mateos
  • Daniela Godoy
  • Alejandro Zunino
  • Silvia Schiaffino
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 444)

Abstract

The problem of inferring missing relationships between people in online social networks such as Facebook, Google+ and Twitter is currently receiving much attention because of its wide applicability. To this end, link prediction algorithms that operate on graph data have been considered. However, the relentless growth of such networks calls for distributed processing models able to cope with the associated large amounts of data. In this paper, we study the suitability of three models (Fork-Join, Pregel and DPM) for scaling up a common class of such algorithms, namely random walk-based ones. Broadly, Fork-Join and Pregel promote two rather different ways of creating and handling parallel sub-computations, while DPM is a model combining the best of both. Experiments performed with the Twitter graph and two classical random walk-based algorithms, HITS and SALSA, show that DPM outperforms Fork-Join and Pregel by 30–40% and 10–20%, respectively, in terms of recommendation time.
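
For readers unfamiliar with HITS, one of the two algorithms named above, the following is a minimal single-machine sketch of its iterative hub/authority update. It is an illustration only and not the distributed implementation evaluated in the paper: the function name `hits`, the edge-list input format, the fixed iteration count and the L2 normalisation are all assumptions made for this example.

```python
from collections import defaultdict


def hits(edges, iterations=20):
    """Return (authority, hub) score dicts for a directed edge list.

    Illustrative sketch: `edges` is an iterable of (source, target) pairs,
    e.g. (follower, followee). Names and defaults are hypothetical.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
        nodes.update((src, dst))

    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the nodes pointing to n.
        auth = {n: sum(hub[m] for m in in_links[n]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the nodes n points to.
        hub = {n: sum(auth[m] for m in out_links[n]) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub


if __name__ == "__main__":
    # Toy graph: "a" and "b" both follow "c"; "c" follows "d".
    authority, hub_score = hits([("a", "c"), ("b", "c"), ("c", "d")])
    ranked = sorted(authority, key=authority.get, reverse=True)
    print(ranked[0])  # the node with the highest authority score
```

In a link prediction setting, nodes with high authority scores that the target user does not yet follow would be candidate recommendations. Each iteration touches every edge once, which is the per-vertex work that models such as Fork-Join and Pregel distribute in different ways.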

Keywords

Online social networks, Big data, Link prediction, Fork-Join, Pregel, HITS, SALSA



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alejandro Corbellini (1, 2)
  • Cristian Mateos (1, 2)
  • Daniela Godoy (1, 2)
  • Alejandro Zunino (1, 2)
  • Silvia Schiaffino (1, 2)

  1. ISISTAN Research Institute, UNICEN University, Buenos Aires, Argentina
  2. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
