Abstract
The current era is witnessing an explosion of data generated from a wide range of sources, including RFID (radio-frequency identification), sensors, web logs, social media, IoT (Internet of Things) devices, and many more. The pace at which data is routinely generated in everyday tasks has overwhelmed the capacity of existing infrastructure and analytical solutions. Data has become a driving force of the economy and is treated as an organizational asset. It contains facts that can be interpreted and manipulated to gain insight for knowledge discovery. To stay ahead of the competition, enterprises are scaling up their big data projects for knowledge discovery. These projects require scalable architectures for storage and data processing. Data-centric technologies that can be provisioned as a service to organizations are gaining momentum. Cloud computing, whose model supports provisioning resources as a service, is an effective and promising solution for sophisticated analytical applications. In this paper, we examine the requirements for provisioning Big Data Knowledge Discovery as a service. In addition, we explore the prevalent big data frameworks that are accessible and provisioned as a service via the cloud. We also review state-of-the-art progress in this area, along with open challenges and research prospects.
Data Availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Ethics declarations
Conflict of interest
The authors declare no competing financial interests.
About this article
Cite this article
Singh, N., Singh, D.P. & Pant, B. Big Data Knowledge Discovery as a Service: Recent Trends and Challenges. Wireless Pers Commun 123, 1789–1807 (2022). https://doi.org/10.1007/s11277-021-09213-5
DOI: https://doi.org/10.1007/s11277-021-09213-5