The Journal of Supercomputing, Volume 72, Issue 8, pp 3136–3155

Formal performance evaluation of the Map/Reduce framework within cloud computing

  • M. Carmen Ruiz (corresponding author)
  • Diego Cazorla
  • Diego Pérez
  • Javier Conejero
Article

Abstract

The recent appearance, evolution and massive expansion of social media-based technologies, in conjunction with what is currently known as the Internet of Things, has resulted in data being produced at a dizzying rate. One of the main contributions to addressing this matter has been the Hadoop framework (which implements the Map/Reduce paradigm), especially when used in conjunction with Cloud computing environments. In this paper, a comprehensive and rigorous study of the Map/Reduce framework using formal methods is presented. Specifically, the Timed Process Algebra BTC is used, and the resulting formal model is evaluated with a real Hadoop-based application that processes social media data. Moreover, the formal model is validated by carrying out several experiments on a real private Cloud environment. Finally, the formal model outcomes are harnessed to determine the best performance–cost trade-off in a real scenario. Results show that the proposed model makes it possible to determine in advance both the performance of a Hadoop-based application within Cloud environments and the best performance–cost trade-off.

Keywords

Big data, Performance analysis, Process algebra, Map/Reduce, Hadoop

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • M. Carmen Ruiz (1), corresponding author
  • Diego Cazorla (1)
  • Diego Pérez (2)
  • Javier Conejero (2)

  1. Universidad de Castilla-La Mancha, Albacete, Spain
  2. Instituto de Investigación en Informática de Albacete, Albacete, Spain
