Availability and Network-Aware MapReduce Task Scheduling over the Internet

  • Bing Tang
  • Qi Xie
  • Haiwu He
  • Gilles Fedak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9528)


MapReduce offers an easy-to-use programming paradigm for processing large datasets. In our previous work, we designed a MapReduce framework called BitDew-MapReduce for desktop grid and volunteer computing environments, which allows non-expert users to run data-intensive MapReduce jobs on top of volunteer resources over the Internet. However, network distance and resource availability have a great impact on MapReduce applications running over the Internet. To address this, we propose an availability- and network-aware MapReduce framework over the Internet. Simulation results show that the MapReduce job response time can be decreased by 27.15%, thanks to Naive Bayes Classifier-based availability prediction and landmark-based network estimation.
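The two ingredients named above can be illustrated with a minimal sketch. It is not the framework's actual implementation: the feature encoding (time-of-day and weekday labels), the Euclidean comparison of landmark round-trip-time vectors, the scoring rule `p / (1 + d)`, and all names (`NaiveBayesAvailability`, `landmark_distance`, `pick_worker`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAvailability:
    """Categorical Naive Bayes with Laplace smoothing (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # label -> sample count
        self.feature_counts = defaultdict(Counter)  # (index, label) -> value counts
        self.feature_values = defaultdict(set)      # index -> distinct values seen

    def fit(self, samples, labels):
        # samples: tuples of categorical features; labels: True if the node was available
        for features, label in zip(samples, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.feature_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def prob_available(self, features):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            log_p = math.log(count / total)  # class prior
            for i, value in enumerate(features):
                seen = self.feature_counts[(i, label)][value]
                n_values = len(self.feature_values[i])
                log_p += math.log((seen + self.alpha) / (count + self.alpha * n_values))
            log_scores[label] = log_p
        # Normalize in a numerically stable way to get a posterior probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights.get(True, 0.0) / sum(weights.values())


def landmark_distance(rtts_a, rtts_b):
    """Assumed proxy for network distance: Euclidean distance between the
    round-trip-time vectors two nodes measured to the same landmarks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rtts_a, rtts_b)))


def pick_worker(workers, data_rtts, model, slot_features):
    """Choose the worker with the best availability/distance trade-off.
    The score p / (1 + d) is an illustrative assumption, not the paper's rule."""
    def score(name):
        p = model.prob_available(slot_features[name])
        d = landmark_distance(workers[name], data_rtts)
        return p / (1.0 + d)
    return max(workers, key=score)


# Toy availability history: (time of day, day type) -> was the node available?
history = [("night", "weekday"), ("night", "weekend"),
           ("day", "weekday"), ("day", "weekend")]
observed = [True, True, False, False]
model = NaiveBayesAvailability().fit(history, observed)

workers = {"near": [10.0, 20.0], "far": [100.0, 200.0]}  # RTTs to two landmarks (ms)
features = {"near": ("night", "weekday"), "far": ("night", "weekday")}
chosen = pick_worker(workers, [10.0, 20.0], model, features)
```

In this toy setting both workers have the same predicted availability, so the one whose landmark vector is closer to the data holder is chosen. A real scheduler would weigh the two signals according to measured transfer cost and failure cost.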


MapReduce · Volunteer computing · Availability prediction · Network distance prediction · Naive Bayes Classifier



This work is supported by the “100 Talents Project” of the Computer Network Information Center of the Chinese Academy of Sciences under grant no. 1101002001, the Natural Science Foundation of Hunan Province under grant no. 2015JJ3071, and the Scientific Research Fund of Hunan Provincial Education Department under grant nos. 12C0121, 11C0689 and 11C0535.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
  2. College of Computer Science and Technology, Southwest University for Nationalities, Chengdu, China
  3. Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  4. INRIA, LIP Laboratory, University of Lyon, Lyon Cedex 07, France
