Abstract
The growing amount of information generated by big data systems has driven the use of tools that facilitate its processing, such as Hadoop and its ecosystem. These tools can run on computational clouds, whose benefits include on-demand payment, self-service, and elasticity. This article evaluates three cloud services that deliver fully configured Hadoop ecosystems: AWS Elastic MapReduce (EMR), Google Dataproc, and Microsoft HDInsight. The evaluation measured their performance and computational resource consumption while running workloads on data from Bolsa Família, a social welfare program of the Brazilian government. The results showed that HDInsight had better runtime performance than the other services. The services also differed in memory consumption, disk activity, cost, and processing, which gives insight into each provider's strategy and can be useful in decision-making processes.
Notes
1. JDK: Java Development Kit.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de Carvalho, L.R., da Cruz Motta, M.A., de Araújo, A.P.F. (2021). Performance Analysis of Main Public Cloud Big Data Services Processing Brazilian Government Data. In: Nesmachnow, S., Castro, H., Tchernykh, A. (eds) High Performance Computing. CARLA 2020. Communications in Computer and Information Science, vol 1327. Springer, Cham. https://doi.org/10.1007/978-3-030-68035-0_4
DOI: https://doi.org/10.1007/978-3-030-68035-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68034-3
Online ISBN: 978-3-030-68035-0
eBook Packages: Computer Science, Computer Science (R0)