Skip to main content

Task Scheduling for Processing Big Graphs in Heterogeneous Commodity Clusters

  • Conference paper
  • First Online:
High Performance Computing (CARLA 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 796))

Included in the following conference series:

  • 816 Accesses

Abstract

Large-scale graph processing is a challenging problem since vertices can be arbitrarily connected, reducing locality and easily expanding the solution space. As a result, in recent years, a new breed of distributed frameworks that handle graphs efficiently has emerged. In large clusters with many resources (RAM, CPUs, network connectivity), these frameworks focus on exploiting the available resources as efficiently as possible. However, on situations where the cluster hardware is unbalanced or low in computing resources, the framework must correctly allocate tasks in order to complete execution. In this work, we compare three frameworks, the generic Fork-Join framework adapted to graph processing, and the Pregel and DPM frameworks that were originally designed for computing graphs. A link-prediction algorithm was used as case study to analyze several scheduling strategies that allocate tasks to servers in a cluster of heterogeneous characteristics. The dataset used for the experiments is a snapshot from the Twitter graph, and specifically, a subset of its users that pushed the memory requirements of the algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Twitter 2010 dataset, http://an.kaist.ac.kr/traces/WWW2010.html.

References

  1. Wang, P., Xu, B., Wu, Y., Zhou, X.: Link prediction in social networks: the state-of-the-art. Sci. China Inf. Sci. 58(1), 1–38 (2015). http://arxiv.org/abs/1411.5118

  2. Mallek, S., Boukhris, I., Elouedi, Z.: Community detection for graph-based similarity: application to protein binding pockets classification. Pattern Recognit. Lett. 62, 49–54 (2015). http://www.sciencedirect.com/science/article/pii/S0167865515001488

    Article  Google Scholar 

  3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)

    Article  Google Scholar 

  4. Lu, L., Zhou, T.: Link prediction in complex networks: a survey. Phys. A 390(6), 1150–1170 (2011)

    Article  Google Scholar 

  5. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010). http://www.sciencedirect.com/science/article/pii/S0370157309002841

    Article  MathSciNet  Google Scholar 

  6. Rausch, K., Ntoutsi, E., Stefanidis, K., Kriegel, H.-P.: Exploring subspace clustering for recommendations. In: Proceedings of the 26th International Conference on Scientific and Statistical Database Management (SSDBM 2014), pp. 42:1–42:4, Aalborg, Denmark (2014)

    Google Scholar 

  7. Armentano, M., Godoy, D., Amandi, A.: Towards a followee recommender system for information seeking users in Twitter. In: Proceedings of the International Workshop on Semantic Adaptive Social Web (SASWeb 2011), ser. CEUR Workshop Proceedings, vol. 730, Girona, Spain (2011)

    Google Scholar 

  8. Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Proces. Lett. 17(1), 5–20 (2007)

    Article  MathSciNet  Google Scholar 

  9. Sui, X., Lee, T.-H., Whang, J.J., Savas, B., Jain, S., Pingali, K., Dhillon, I.: Parallel clustered low-rank approximation of graphs and its application to link prediction. In: Kasahara, H., Kimura, K. (eds.) LCPC 2012. LNCS, vol. 7760, pp. 76–95. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37658-0_6

    Chapter  Google Scholar 

  10. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Article  Google Scholar 

  11. Mateos, C., Zunino, A., Campo, M.: An approach for non-intrusively adding malleable fork/join parallelism into ordinary JavaBean compliant applications. Comput. Lang. Syst. Struct. 36(3), 288–315 (2010)

    Google Scholar 

  12. Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), New York, USA, pp. 2:1–2:6 (2013)

    Google Scholar 

  13. Cao, L., Cho, B., Kim, H.D., Li, Z., Tsai, M.-H., Gupta, I.: Delta-SimRank computing on MapReduce. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine 2012). ACM, Beijing, China, pp. 28–35 (2012). http://doi.acm.org/10.1145/2351316.2351321

  14. Lu, H., Halappanavar, M., Kalyanaraman, A.: Parallel heuristics for scalable community detection. Parallel Comput. 47, 19–37 (2015)

    Article  MathSciNet  Google Scholar 

  15. Buzun, N., Korshunov, A., Avanesov, V., Filonenko, I., Kozlov, I., Turdakov, D., Kim, H.: EgoLP: fast and distributed community detection in billion-node social networks. In: 2014 IEEE International Conference on Data Mining Workshop, pp. 533–540 (2014)

    Google Scholar 

  16. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 International Conference on Management of Data (SIGMOD 2010), Indianapolis, USA, pp. 135–146 (2010)

    Google Scholar 

  17. Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endowment 5(8), 716–727 (2012)

    Article  Google Scholar 

  18. Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. Proc. VLDB Endowment 7(12), 1047–1058 (2014)

    Article  Google Scholar 

  19. Heitmann, B.: An open framework for multi-source, cross-domain personalisation with semantic interest graphs. In: Proceedings of the Sixth ACM Conference on Recommender Systems - RecSys 2012, p. 313 (2012)

    Google Scholar 

  20. Krepska, E., Kielmann, T., Fokkink, W., Bal, H.: HipG: parallel processing of large-scale graphs. ACM SIGOPS Oper. Syst. Rev. 45(2), 3–13 (2011)

    Article  Google Scholar 

  21. Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed graph computations. Parallel Object-Oriented Sci. Comput. (POOSC) (2005)

    Google Scholar 

  22. Chan, A., Dehne, F.: CGMgraph/CGMlib: implementing and testing CGM graph algorithms on PC clusters. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 117–125. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39924-7_20

    Chapter  Google Scholar 

  23. Corbellini, A., Godoy, D., Mateos, C., Schiaffino, S., Zunino, A.: DPM: a novel distributed large-scale social graph processing framework for link prediction algorithms. Future Generation Computer Systems (2017). http://www.sciencedirect.com/science/article/pii/S0167739X17302352

  24. Corbellini, A., Mateos, C., Godoy, D., Zunino, A., Schiaffino, S.: An architecture and platform for developing distributed recommendation algorithms on large-scale social networks. J. Inf. Sci. 41(5), 686–704 (2015). http://jis.sagepub.com/content/early/2015/06/06/0165551515588669.abstract

    Article  Google Scholar 

  25. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002)

    Article  Google Scholar 

  26. Kim, J.-K., Shivle, S., Siegel, H.J., Maciejewski, A.A., Braun, T.D., Schneider, M., Tideman, S., Chitta, R., Dilmaghani, R.B., Joshi, R., Kaul, A., Sharma, A., Sripada, S., Vangari, P., Yellampalli, S.S.: Dynamic mapping in a heterogeneous environment with tasks having priorities and multiple deadlines. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (2003)

    Google Scholar 

  27. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)

    Article  Google Scholar 

  28. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, pp. 591–600 (2010)

    Google Scholar 

  29. Faralli, S., Stilo, G., Velardi, P.: Large scale homophily analysis in Twitter using a twixonomy. In: Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015). AAAI Press, Buenos Aires, Argentina, pp. 2334–2340 (2015)

    Google Scholar 

  30. Newman, M.E., Park, J.: Why social networks are different from other types of networks. Phys. Rev. E 68(3), 036122 (2003)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alejandro Corbellini .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Corbellini, A., Godoy, D., Mateos, C., Schiaffino, S., Zunino, A. (2018). Task Scheduling for Processing Big Graphs in Heterogeneous Commodity Clusters. In: Mocskos, E., Nesmachnow, S. (eds) High Performance Computing. CARLA 2017. Communications in Computer and Information Science, vol 796. Springer, Cham. https://doi.org/10.1007/978-3-319-73353-1_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-73353-1_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73352-4

  • Online ISBN: 978-3-319-73353-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics