Abstract
The volume of data generated by modern information technology has grown from terabytes to petabytes. Extracting knowledge from data at this scale is challenging, which creates strong demand for cloud computing and its scalable storage and processing services. Motivated by this, the chapter introduces a novel framework, data mining in cloud computing (DMCC), that lets users apply classification, clustering, and association rule mining methods to very large datasets efficiently by combining data mining, cloud computing, and parallel computing technologies. The chapter discusses the main architectural components, interfaces, features, and advantages of the proposed DMCC framework. It also compares the running times of data mining algorithms executed serially and in parallel in a cloud environment through the DMCC framework. The experimental results show that DMCC greatly decreases the execution times of data mining algorithms.
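The serial-versus-parallel comparison mentioned in the abstract can be pictured with a small sketch. This is not the DMCC implementation, whose details are not given here; it is a minimal illustration, under stated assumptions, of the partition-then-merge pattern commonly used to parallelize the support-counting step of association rule mining. The function names, the toy transactions, and the use of a local thread pool as a stand-in for cloud worker nodes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Count how often each item pair co-occurs in one partition (the 'map' step)."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def partition(data, n_parts):
    """Split the data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def serial_pair_counts(data):
    """Baseline: count pairs over the whole dataset in one pass."""
    return count_pairs(data)


def parallel_pair_counts(data, n_workers=4):
    """Count pairs per partition concurrently, then merge (the 'reduce' step).

    Assumption: a local thread pool stands in for cloud worker nodes. It shows
    the structure of the computation, not a realistic speedup.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_pairs, partition(data, n_workers)):
            total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical toy transactions, not data from the chapter.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
serial_result = serial_pair_counts(transactions)
parallel_result = parallel_pair_counts(transactions, n_workers=2)
```

Both functions produce the same pair counts, because support counts are additive across partitions; that property is what allows the work to be distributed. Measuring an actual speedup would require CPU-bound work on separate processes or machines, which this sketch deliberately leaves out.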
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Birant, D., Yıldırım, P. (2016). A Framework for Data Mining and Knowledge Discovery in Cloud Computing. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_11
DOI: https://doi.org/10.1007/978-3-319-31861-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science (R0)