The Scalability of Volunteer Computing for MapReduce Big Data Applications

Part of the Communications in Computer and Information Science book series (CCIS, volume 727)

Abstract

Volunteer Computing (VC) has been successfully applied to many compute-intensive scientific projects to solve embarrassingly parallel computing problems. Some efforts in the current literature apply VC to data-intensive (i.e. big data) applications, but none of them has confirmed the scalability of VC for such applications in opportunistic volunteer environments. This paper chooses MapReduce as a typical computing paradigm for big data processing in distributed environments and models it on a DHT (Distributed Hash Table) P2P overlay to bring this paradigm into VC environments. The modelling results in a distributed prototype implementation and a simulator. The experimental evaluation confirms the scalability of VC for MapReduce big data applications (up to 10 TB) in cases where the number of volunteers is fairly large (up to 10K), the volunteers exhibit high churn rates (up to 90%), and they have heterogeneous compute capacities (the fastest is 6 times as fast as the slowest) and bandwidths (the fastest is up to 75 times as fast as the slowest).
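The abstract does not say how MapReduce tasks are placed on the DHT overlay. As a rough illustration only, the sketch below assumes a Chord-style identifier ring in which each task key is owned by its successor node, and shows why such a placement tolerates churn: when a volunteer leaves, only the tasks it owned are reassigned. The names `Ring`, `ring_id`, `assign`, and the 32-bit identifier space are hypothetical and are not taken from the paper.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumed size of the identifier space for this sketch


def ring_id(name: str) -> int:
    """Hash a name onto the identifier ring (SHA-1, truncated to RING_BITS)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: RING_BITS // 8], "big")


class Ring:
    """A static view of a DHT ring: (identifier, volunteer) pairs sorted by identifier."""

    def __init__(self, volunteers):
        self.nodes = sorted((ring_id(v), v) for v in volunteers)

    def successor(self, key: str) -> str:
        """Return the first volunteer whose identifier is >= the key's identifier,
        wrapping around to the start of the ring if there is none."""
        ids = [node_id for node_id, _ in self.nodes]
        idx = bisect_left(ids, ring_id(key)) % len(self.nodes)
        return self.nodes[idx][1]

    def leave(self, volunteer: str) -> None:
        """Model churn: drop a volunteer from the ring."""
        self.nodes = [(i, v) for i, v in self.nodes if v != volunteer]


def assign(ring: Ring, tasks):
    """Map every task key to the volunteer currently responsible for it."""
    return {task: ring.successor(task) for task in tasks}
```

For example, building a ring from eight volunteers, assigning one hundred map task keys, and then calling `leave` on one volunteer changes the owner only of the tasks that volunteer held; every other task keeps its original owner.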

  • DOI: 10.1007/978-981-10-6385-5_14
  • Chapter length: 13 pages

Author information

Corresponding author

Correspondence to Wei Li.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, W., Guo, W. (2017). The Scalability of Volunteer Computing for MapReduce Big Data Applications. In: Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 727. Springer, Singapore. https://doi.org/10.1007/978-981-10-6385-5_14

  • DOI: https://doi.org/10.1007/978-981-10-6385-5_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6384-8

  • Online ISBN: 978-981-10-6385-5

  • eBook Packages: Computer Science (R0)