Abstract
Large-scale graph processing is a challenging problem since vertices can be arbitrarily connected, reducing locality and easily expanding the solution space. As a result, in recent years, a new breed of distributed frameworks that handle graphs efficiently has emerged. In large clusters with many resources (RAM, CPUs, network connectivity), these frameworks focus on exploiting the available resources as efficiently as possible. However, on situations where the cluster hardware is unbalanced or low in computing resources, the framework must correctly allocate tasks in order to complete execution. In this work, we compare three frameworks, the generic Fork-Join framework adapted to graph processing, and the Pregel and DPM frameworks that were originally designed for computing graphs. A link-prediction algorithm was used as case study to analyze several scheduling strategies that allocate tasks to servers in a cluster of heterogeneous characteristics. The dataset used for the experiments is a snapshot from the Twitter graph, and specifically, a subset of its users that pushed the memory requirements of the algorithm.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Twitter 2010 dataset, http://an.kaist.ac.kr/traces/WWW2010.html.
References
Wang, P., Xu, B., Wu, Y., Zhou, X.: Link prediction in social networks: the state-of-the-art. Sci. China Inf. Sci. 58(1), 1–38 (2015). http://arxiv.org/abs/1411.5118
Mallek, S., Boukhris, I., Elouedi, Z.: Community detection for graph-based similarity: application to protein binding pockets classification. Pattern Recognit. Lett. 62, 49–54 (2015). http://www.sciencedirect.com/science/article/pii/S0167865515001488
Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
Lu, L., Zhou, T.: Link prediction in complex networks: a survey. Phys. A 390(6), 1150–1170 (2011)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010). http://www.sciencedirect.com/science/article/pii/S0370157309002841
Rausch, K., Ntoutsi, E., Stefanidis, K., Kriegel, H.-P.: Exploring subspace clustering for recommendations. In: Proceedings of the 26th International Conference on Scientific and Statistical Database Management (SSDBM 2014), pp. 42:1–42:4, Aalborg, Denmark (2014)
Armentano, M., Godoy, D., Amandi, A.: Towards a followee recommender system for information seeking users in Twitter. In: Proceedings of the International Workshop on Semantic Adaptive Social Web (SASWeb 2011), ser. CEUR Workshop Proceedings, vol. 730, Girona, Spain (2011)
Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Proces. Lett. 17(1), 5–20 (2007)
Sui, X., Lee, T.-H., Whang, J.J., Savas, B., Jain, S., Pingali, K., Dhillon, I.: Parallel clustered low-rank approximation of graphs and its application to link prediction. In: Kasahara, H., Kimura, K. (eds.) LCPC 2012. LNCS, vol. 7760, pp. 76–95. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37658-0_6
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Mateos, C., Zunino, A., Campo, M.: An approach for non-intrusively adding malleable fork/join parallelism into ordinary JavaBean compliant applications. Comput. Lang. Syst. Struct. 36(3), 288–315 (2010)
Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), New York, USA, pp. 2:1–2:6 (2013)
Cao, L., Cho, B., Kim, H.D., Li, Z., Tsai, M.-H., Gupta, I.: Delta-SimRank computing on MapReduce. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine 2012). ACM, Beijing, China, pp. 28–35 (2012). http://doi.acm.org/10.1145/2351316.2351321
Lu, H., Halappanavar, M., Kalyanaraman, A.: Parallel heuristics for scalable community detection. Parallel Comput. 47, 19–37 (2015)
Buzun, N., Korshunov, A., Avanesov, V., Filonenko, I., Kozlov, I., Turdakov, D., Kim, H.: EgoLP: fast and distributed community detection in billion-node social networks. In: 2014 IEEE International Conference on Data Mining Workshop, pp. 533–540 (2014)
Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 International Conference on Management of Data (SIGMOD 2010), Indianapolis, USA, pp. 135–146 (2010)
Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endowment 5(8), 716–727 (2012)
Han, M., Daudjee, K., Ammar, K., Özsu, M.T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. Proc. VLDB Endowment 7(12), 1047–1058 (2014)
Heitmann, B.: An open framework for multi-source, cross-domain personalisation with semantic interest graphs. In: Proceedings of the Sixth ACM Conference on Recommender Systems - RecSys 2012, p. 313 (2012)
Krepska, E., Kielmann, T., Fokkink, W., Bal, H.: HipG: parallel processing of large-scale graphs. ACM SIGOPS Oper. Syst. Rev. 45(2), 3–13 (2011)
Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed graph computations. Parallel Object-Oriented Sci. Comput. (POOSC) (2005)
Chan, A., Dehne, F.: CGMgraph/CGMlib: implementing and testing CGM graph algorithms on PC clusters. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 117–125. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-39924-7_20
Corbellini, A., Godoy, D., Mateos, C., Schiaffino, S., Zunino, A.: DPM: a novel distributed large-scale social graph processing framework for link prediction algorithms. Future Generation Computer Systems (2017). http://www.sciencedirect.com/science/article/pii/S0167739X17302352
Corbellini, A., Mateos, C., Godoy, D., Zunino, A., Schiaffino, S.: An architecture and platform for developing distributed recommendation algorithms on large-scale social networks. J. Inf. Sci. 41(5), 686–704 (2015). http://jis.sagepub.com/content/early/2015/06/06/0165551515588669.abstract
Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002)
Kim, J.-K., Shivle, S., Siegel, H.J., Maciejewski, A.A., Braun, T.D., Schneider, M., Tideman, S., Chitta, R., Dilmaghani, R.B., Joshi, R., Kaul, A., Sharma, A., Sripada, S., Vangari, P., Yellampalli, S.S.: Dynamic mapping in a heterogeneous environment with tasks having priorities and multiple deadlines. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (2003)
Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, pp. 591–600 (2010)
Faralli, S., Stilo, G., Velardi, P.: Large scale homophily analysis in Twitter using a twixonomy. In: Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI 2015). AAAI Press, Buenos Aires, Argentina, pp. 2334–2340 (2015)
Newman, M.E., Park, J.: Why social networks are different from other types of networks. Phys. Rev. E 68(3), 036122 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Corbellini, A., Godoy, D., Mateos, C., Schiaffino, S., Zunino, A. (2018). Task Scheduling for Processing Big Graphs in Heterogeneous Commodity Clusters. In: Mocskos, E., Nesmachnow, S. (eds) High Performance Computing. CARLA 2017. Communications in Computer and Information Science, vol 796. Springer, Cham. https://doi.org/10.1007/978-3-319-73353-1_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-73353-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73352-4
Online ISBN: 978-3-319-73353-1
eBook Packages: Computer ScienceComputer Science (R0)