
A Framework for Data Mining and Knowledge Discovery in Cloud Computing

  • Chapter
Data Science and Big Data Computing

Abstract

The volume of data generated in today's information technology landscape has grown from terabytes to petabytes. Because extracting knowledge from such large-scale data is challenging, there is strong demand for cloud computing, which offers potential benefits such as scalable storage and processing services. Motivated by this, the chapter introduces a novel framework, data mining in cloud computing (DMCC), that allows users to efficiently apply classification, clustering, and association rule mining methods to huge amounts of data by combining data mining, cloud computing, and parallel computing technologies. The chapter discusses the main architectural components, interfaces, features, and advantages of the proposed DMCC framework. It also compares the running times of data mining algorithms executed serially and in parallel in a cloud environment through the DMCC framework. Experimental results show that DMCC greatly reduces the execution times of data mining algorithms.
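The abstract describes DMCC only at a high level, so the following is a minimal Python sketch under stated assumptions, not the DMCC implementation. It illustrates the general idea of running one data mining step serially versus in parallel: counting item-pair supports (the first step of association rule mining) on separate partitions of the data and then merging the partial counts, a count-distribution style of data parallelism. All function names, the synthetic workload, and the use of Python's `concurrent.futures` are hypothetical choices made for illustration.

```python
import random
import time
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Count, for each unordered item pair, how many transactions contain it."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always appears in the same (a, b) order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def split(data, n_parts):
    """Split a list into n_parts contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_count_pairs(transactions, n_parts, executor_cls=ProcessPoolExecutor):
    """Count pairs on each partition in parallel, then sum the partial counts."""
    total = Counter()
    with executor_cls(max_workers=n_parts) as pool:
        for partial in pool.map(count_pairs, split(transactions, n_parts)):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def frequent_pairs(counts, n_transactions, min_support):
    """Keep pairs whose relative support (count / n_transactions) is at least min_support."""
    return {pair: c for pair, c in counts.items() if c / n_transactions >= min_support}


def benchmark(n_transactions=100_000, n_parts=4, seed=0):
    """Time serial vs. partitioned counting on a synthetic (hypothetical) workload."""
    rng = random.Random(seed)
    items = [f"item{i}" for i in range(50)]
    data = [rng.sample(items, 8) for _ in range(n_transactions)]

    t0 = time.perf_counter()
    serial = count_pairs(data)
    t1 = time.perf_counter()
    parallel = parallel_count_pairs(data, n_parts)
    t2 = time.perf_counter()

    # Partitioning must not change the mining result, only the running time.
    assert serial == parallel
    return t1 - t0, t2 - t1
```

For example, `serial_s, parallel_s = benchmark(n_parts=4)` called from inside an `if __name__ == "__main__":` guard (needed because worker processes may re-import the module) returns both wall-clock timings. Whether the parallel run is faster depends on the number of cores, the data size, and the cost of shipping partitions to worker processes, so this sketch makes no claim about the speedups reported for DMCC.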



Author information


Corresponding author

Correspondence to Derya Birant.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Birant, D., Yıldırım, P. (2016). A Framework for Data Mining and Knowledge Discovery in Cloud Computing. In: Mahmood, Z. (ed.) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_11


  • DOI: https://doi.org/10.1007/978-3-319-31861-5_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31859-2

  • Online ISBN: 978-3-319-31861-5

  • eBook Packages: Computer Science, Computer Science (R0)
