Abstract
Academia, industry, and government are all involved in big data projects, and research on big data applications and technologies is actively being conducted. This paper presents a literature review of recent research on key technologies and open issues for big data management via cloud computing. Its goal is to identify and evaluate the main technology components and their impact on cloud-based big data implementations. To this end, we review 40 publications from the four years 2014–2017. We classify the results by main technical aspect: frameworks, databases and data processing techniques, and programming languages. The paper also serves as a reference for researchers and developers seeking to determine the best emerging technologies for implementing big data projects.
References
Mallika, C., Selvamuthukumaran, S.: Hadoop framework: analyzes workload predicition of data from cloud computing. In: 2017 International Conference on IoT and Application (ICIOT), pp. 1–6. IEEE (2017)
Nodarakis, N., Sioutas, S., Tsakalidis, A., Tzima, G.: Using Hadoop for Large Scale Analysis on Twitter: A Technical Report. arXiv preprint arXiv:1602.01248 (2016)
Meng, S., Dou, W., Zhang, X., Chen, J.: KASR: a keyword-aware service recommendation method on MapReduce for big data applications. IEEE Trans. Parallel Distrib. Syst. 25(12), 3221–3231 (2014)
Bhimani, J., Yang, Z., Leeser, M., Mi, N.: Accelerating big data applications using lightweight virtualization framework on enterprise cloud. In: High Performance Extreme Computing Conference (HPEC), pp. 1–7. IEEE (2017)
Ortiz, J.L.R., Oneto, L., Anguita, D.: Big data analytics in the cloud: spark on hadoop vs MPI/OpenMP on Beowulf. Procedia Comput. Sci. 53, 121–130 (2015)
Zhao, J., Wang, L., Tao, J., Chen, J.: A security framework in G-Hadoop for big data computing across distributed Cloud data centres. J. Comput. Syst. Sci. 80(5), 994–1007 (2014)
Huang, T., Lan, L., Fang, X., An, P., Min, J., Wang, F.: Promises and challenges of big data computing in health sciences. Big Data Res. 2(1), 2–11 (2015)
Miller, J., Bowman, C., Harish, V., Quinn, S.: Open source big data analytics frameworks written in scala. In: 2016 IEEE International Congress on Big Data (BigData Congress), pp. 389–393 (2016)
Totoni, E., Anderson, T., Shpeisman, T.: HPAT: High Performance Analytics with Scripting Ease-of-Use. arXiv preprint arXiv:1611.04934 (2016)
Khan, Z., Anjum, A., Soomro, K., Tahir, M.A.: Towards cloud based big data analytics for smart future cities. J. Cloud Comput. 4(1), 2 (2015)
Xhafa, F., Naranjo, V., Caballé, S.: Processing and analytics of big data streams with Yahoo!S4. In: 2015 IEEE 29th International Conference on Advanced Information Networking and Applications (AINA), pp. 263–270 (2015)
Baek, J., Vu, Q., Liu, J., Huang, X., Xiang, Y.: A secure cloud computing based framework for big data information management of smart grid. IEEE Trans. Cloud Comput. 3(2), 233–244 (2015)
Chandarana, P., Vijayalakshmi, M.: Big data analytics frameworks. In: Proceedings of the International Conference on Circuits, pp. 430–434. IEEE (2014). ISBN: 978-1-4799-2494-3
Singh, D., Reddy, C.K.: A survey on platforms for big data analytics. J. Big Data 2(1), 8 (2015)
Koliopoulos, A., Yiapanis, P., Tekiner, F., Nenadic, G., Keane, J.: A parallel distributed weka framework for big data mining using spark. In: 2015 IEEE International Congress Big Data (BigData Congress), pp. 9–16 (2015)
Zicari, R., Rosselli, M., Korfiatis, N.: Setting up a big data project: challenges, opportunities, technologies and optimization. In: Studies in Big Data, vol. 18, pp. 17–47. Springer (2016)
Sharma, S., Tim, U.S., Wong, J., Gadia, S.: A brief review on leading big data models. Data Sci. J. 13, 138–157 (2014)
Matallah, H., Belalem, G.: Experimental comparative study of NoSQL databases: HBASE versus MongoDB by YCSB. Comput. Syst. Sci. Eng. 32(4), 307–317 (2017)
Dede, E., Sendir, B., Kuzlu, P., Weachock, J., Govindaraju, M., Ramakrishnan, L.: Processing Cassandra datasets with Hadoop-streaming based approaches. IEEE Trans. Serv. Comput. 9(1), 46–58 (2016)
Ptiček, M., Vrdoljak, B.: MapReduce research on warehousing of big data. In: Mipro 2017 (2017)
Zhang, H., Chen, G., Ooi, B.C., Tan, K.L.: In-memory big data management and processing: a survey. IEEE Trans. Knowl. Data Eng. 27(7), 1920–1948 (2015)
Oussous, A., Benjelloun, F.Z., Lahcen, A.A., Belfkih, S.: Big data technologies: a survey. J. King Saud Univ.-Comput. Inf. Sci. (2017)
Peng, S., Liu, R., Wang, F.: New Research on Key Technologies of Unstructured Data Cloud Storage. Francis Academic Press, UK (2017)
Dehdouh, K., Bentayeb, F., Boussaid, O., Kabachi, N.: Using the column oriented NoSQL model for implementing big data warehouses. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), The Steering Committee of the World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 469 (2015)
Sharma, S.: An extended classification and comparison of NoSQL big data models. arXiv preprint arXiv:1509.08035 (2015)
Chang, B.R., Tsai, H.F., Chen, C.Y., Huang, C.F., Hsu, H.T.: Implementation of secondary index on cloud computing NoSQL database in big data environment. Sci. Program. 19 (2015)
Sitalakshmi Venkatraman, K.F., Kaspi, S., Venkatraman, R.: SQL versus NoSQL Movement with Big Data Analytics (2016)
Santos, M.Y., Costa, C.: Data warehousing in big data: from multidimensional to tabular data models. In: Proceedings of the Ninth International C* Conference on Computer Science and Software Engineering, pp. 51–60. ACM (2016)
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1383–1394 (2015)
Siddiqui, T., Alkadri, M., Khan, N.A.: Review of programming languages and tools for big data analytics. Int. J. Adv. Res. Comput. Sci. 8(5) (2017)
Wu, D., Sakr, S., Zhu, L.: Big data programming models. In: Handbook of Big Data Technologies, pp. 31–63. Springer (2017)
Dobre, C., Xhafa, F.: Parallel programming paradigms and frameworks in big data era. Int. J. Parallel Prog. 42(5), 710–738 (2014)
Jackson, J.C., Vijayakumar, V., Quadir, M.A., Bharathi, C.: Survey on programming models and environments for cluster, cloud, and grid computing that defends big data. Procedia Comput. Sci. 50, 517–523 (2015)
Nystrom, N.: A scala framework for supercompilation. In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, pp. 18–28, October 2017
Edelman, A.: Julia: a fresh approach to parallel programming. In: 2015 IEEE International Conference on Parallel and Distributed Processing Symposium (IPDPS), p. 517 (2015)
Oancea, B., Dragoescu, R.M.: Integrating R and hadoop for big data analysis. arXiv preprint arXiv:1407.4908 (2014)
Maas, M., Asanović, K., Kubiatowicz, J.: Return of the runtimes: rethinking the language runtime system for the cloud 3.0 era. In: Proceedings of the 16th Workshop on Hot Topics in Operating Systems, pp. 138–143, May 2017
Cuzzocrea, A., Buyya, R., Passanisi, V., Pilato, G.: MapReduce-based algorithms for managing big RDF graphs: state-of-the-art analysis, paradigms, and future directions. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 898–905 (2017)
James Stephen, J., Savvides, S., Seidel, R., Eugster, P.: Program analysis for secure big data processing. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pp. 277–288 (2014)
Fernandez, R.C., Garefalakis, P., Pietzuch, P.: Java2SDG: stateful big data processing for the masses. In: 2016 IEEE 32nd International Conference Data Engineering (ICDE), pp. 1390–1393 (2016)
The 2017 Top Programming Languages, IEEE Spectrum ranking. https://spectrum.ieee.org/computing/software/the-2017-top-programming-languages. Accessed 27 Oct 2017
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Canaj, E., Xhuvani, A. (2018). Big Data in Cloud Computing: A Review of Key Technologies and Open Issues. In: Barolli, L., Xhafa, F., Javaid, N., Spaho, E., Kolici, V. (eds) Advances in Internet, Data & Web Technologies. EIDWT 2018. Lecture Notes on Data Engineering and Communications Technologies, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-75928-9_45
DOI: https://doi.org/10.1007/978-3-319-75928-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75927-2
Online ISBN: 978-3-319-75928-9
eBook Packages: Engineering, Engineering (R0)