Performance Analysis of Main Public Cloud Big Data Services Processing Brazilian Government Data

  • Conference paper
High Performance Computing (CARLA 2020)

Abstract

The growing amount of information generated by big data systems has driven the use of tools that facilitate its processing, such as Hadoop and its ecosystem. These tools can run on computational clouds, whose benefits include on-demand payment, self-service, and elasticity. This article evaluates three cloud services that deliver fully configured Hadoop ecosystems: AWS Elastic MapReduce (EMR), Google Dataproc, and Microsoft HDInsight. The evaluation measured their performance and computational resource consumption while running workloads on data from Bolsa Família, a social welfare program of the Brazilian Government. The results showed that HDInsight had better runtime performance. Variations were found in the consumption of resources related to memory, disk activity, cost, and processing, providing insight into each provider's strategy that can be useful in decision-making processes.

Notes

  1. JDK - Java Development Kit.

Author information

Corresponding author

Correspondence to Marcelo Augusto da Cruz Motta.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

de Carvalho, L.R., da Cruz Motta, M.A., de Araújo, A.P.F. (2021). Performance Analysis of Main Public Cloud Big Data Services Processing Brazilian Government Data. In: Nesmachnow, S., Castro, H., Tchernykh, A. (eds) High Performance Computing. CARLA 2020. Communications in Computer and Information Science, vol 1327. Springer, Cham. https://doi.org/10.1007/978-3-030-68035-0_4

  • DOI: https://doi.org/10.1007/978-3-030-68035-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68034-3

  • Online ISBN: 978-3-030-68035-0

  • eBook Packages: Computer Science, Computer Science (R0)
