Advertisement

Task Scheduling for Processing Big Graphs in Heterogeneous Commodity Clusters

  • Alejandro Corbellini
  • Daniela Godoy
  • Cristian Mateos
  • Silvia Schiaffino
  • Alejandro Zunino
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 796)

Abstract

Large-scale graph processing is a challenging problem since vertices can be arbitrarily connected, reducing locality and easily expanding the solution space. As a result, in recent years, a new breed of distributed frameworks that handle graphs efficiently has emerged. In large clusters with many resources (RAM, CPUs, network connectivity), these frameworks focus on exploiting the available resources as efficiently as possible. However, on situations where the cluster hardware is unbalanced or low in computing resources, the framework must correctly allocate tasks in order to complete execution. In this work, we compare three frameworks, the generic Fork-Join framework adapted to graph processing, and the Pregel and DPM frameworks that were originally designed for computing graphs. A link-prediction algorithm was used as case study to analyze several scheduling strategies that allocate tasks to servers in a cluster of heterogeneous characteristics. The dataset used for the experiments is a snapshot from the Twitter graph, and specifically, a subset of its users that pushed the memory requirements of the algorithm.

References

  1. 1.
    Wang, P., Xu, B., Wu, Y., Zhou, X.: Link prediction in social networks: the state-of-the-art. Sci. China Inf. Sci. 58(1), 1–38 (2015). http://arxiv.org/abs/1411.5118
  2. 2.
    Mallek, S., Boukhris, I., Elouedi, Z.: Community detection for graph-based similarity: application to protein binding pockets classification. Pattern Recognit. Lett. 62, 49–54 (2015). http://www.sciencedirect.com/science/article/pii/S0167865515001488 CrossRefGoogle Scholar
  3. 3.
    Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)CrossRefGoogle Scholar
  4. 4.
    Lu, L., Zhou, T.: Link prediction in complex networks: a survey. Phys. A 390(6), 1150–1170 (2011)CrossRefGoogle Scholar
  5. 5.
    Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010). http://www.sciencedirect.com/science/article/pii/S0370157309002841 MathSciNetCrossRefGoogle Scholar
  6. 6.
    Rausch, K., Ntoutsi, E., Stefanidis, K., Kriegel, H.-P.: Exploring subspace clustering for recommendations. In: Proceedings of the 26th International Conference on Scientific and Statistical Database Management (SSDBM 2014), pp. 42:1–42:4, Aalborg, Denmark (2014)Google Scholar
  7. 7.
    Armentano, M., Godoy, D., Amandi, A.: Towards a followee recommender system for information seeking users in Twitter. In: Proceedings of the International Workshop on Semantic Adaptive Social Web (SASWeb 2011), ser. CEUR Workshop Proceedings, vol. 730, Girona, Spain (2011)Google Scholar
  8. 8.
    Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Proces. Lett. 17(1), 5–20 (2007)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Sui, X., Lee, T.-H., Whang, J.J., Savas, B., Jain, S., Pingali, K., Dhillon, I.: Parallel clustered low-rank approximation of graphs and its application to link prediction. In: Kasahara, H., Kimura, K. (eds.) LCPC 2012. LNCS, vol. 7760, pp. 76–95. Springer, Heidelberg (2013).  https://doi.org/10.1007/978-3-642-37658-0_6 CrossRefGoogle Scholar
  10. 10.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRefGoogle Scholar
  11. 11.
    Mateos, C., Zunino, A., Campo, M.: An approach for non-intrusively adding malleable fork/join parallelism into ordinary JavaBean compliant applications. Comput. Lang. Syst. Struct. 36(3), 288–315 (2010)Google Scholar
  12. 12.
    Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), New York, USA, pp. 2:1–2:6 (2013)Google Scholar
  13. 13.
    Cao, L., Cho, B., Kim, H.D., Li, Z., Tsai, M.-H., Gupta, I.: Delta-SimRank computing on MapReduce. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine 2012). ACM, Beijing, China, pp. 28–35 (2012). http://doi.acm.org/10.1145/2351316.2351321
  14. 14.
    Lu, H., Halappanavar, M., Kalyanaraman, A.: Parallel heuristics for scalable community detection. Parallel Comput. 47, 19–37 (2015)MathSciNetCrossRefGoogle Scholar
  15. 15.
    Buzun, N., Korshunov, A., Avanesov, V., Filonenko, I., Kozlov, I., Turdakov, D., Kim, H.: EgoLP: fast and distributed community detection in billion-node social networks. In: 2014 IEEE International Conference on Data Mining Workshop, pp. 533–540 (2014)Google Scholar
  16. 16.
    Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 International Conference on Management of Data (SIGMOD 2010), Indianapolis, USA, pp. 135–146 (2010)Google Scholar
  17. 17.
    Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endowment 5(8), 716–727 (2012)CrossRefGoogle Scholar
  18. 18.
    Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. Proc. VLDB Endowment 7(12), 1047–1058 (2014)CrossRefGoogle Scholar
  19. 19.
    Heitmann, B.: An open framework for multi-source, cross-domain personalisation with semantic interest graphs. In: Proceedings of the Sixth ACM Conference on Recommender Systems - RecSys 2012, p. 313 (2012)Google Scholar
  20. 20.
    Krepska, E., Kielmann, T., Fokkink, W., Bal, H.: HipG: parallel processing of large-scale graphs. ACM SIGOPS Oper. Syst. Rev. 45(2), 3–13 (2011)CrossRefGoogle Scholar
  21. 21.
    Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed graph computations. Parallel Object-Oriented Sci. Comput. (POOSC) (2005)Google Scholar
  22. 22.
    Chan, A., Dehne, F.: CGMgraph/CGMlib: implementing and testing CGM graph algorithms on PC clusters. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 117–125. Springer, Heidelberg (2003).  https://doi.org/10.1007/978-3-540-39924-7_20 CrossRefGoogle Scholar
  23. 23.
    Corbellini, A., Godoy, D., Mateos, C., Schiaffino, S., Zunino, A.: DPM: a novel distributed large-scale social graph processing framework for link prediction algorithms. Future Generation Computer Systems (2017). http://www.sciencedirect.com/science/article/pii/S0167739X17302352
  24. 24.
    Corbellini, A., Mateos, C., Godoy, D., Zunino, A., Schiaffino, S.: An architecture and platform for developing distributed recommendation algorithms on large-scale social networks. J. Inf. Sci. 41(5), 686–704 (2015). http://jis.sagepub.com/content/early/2015/06/06/0165551515588669.abstract CrossRefGoogle Scholar
  25. 25.
    Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002)CrossRefGoogle Scholar
  26. 26.
    Kim, J.-K., Shivle, S., Siegel, H.J., Maciejewski, A.A., Braun, T.D., Schneider, M., Tideman, S., Chitta, R., Dilmaghani, R.B., Joshi, R., Kaul, A., Sharma, A., Sripada, S., Vangari, P., Yellampalli, S.S.: Dynamic mapping in a heterogeneous environment with tasks having priorities and multiple deadlines. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (2003)Google Scholar
  27. 27.
    Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRefGoogle Scholar
  28. 28.
    Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, pp. 591–600 (2010)Google Scholar
  29. 29.
    Faralli, S., Stilo, G., Velardi, P.: Large scale homophily analysis in Twitter using a twixonomy. In: Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015). AAAI Press, Buenos Aires, Argentina, pp. 2334–2340 (2015)Google Scholar
  30. 30.
    Newman, M.E., Park, J.: Why social networks are different from other types of networks. Phys. Rev. E 68(3), 036122 (2003)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Alejandro Corbellini
    • 1
  • Daniela Godoy
    • 1
  • Cristian Mateos
    • 1
  • Silvia Schiaffino
    • 1
  • Alejandro Zunino
    • 1
  1. 1.ISISTAN-CONICET, UNICENTandilArgentina

Personalised recommendations