A Hybrid Clustering Technique to Improve Big Data Accessibility Based on Machine Learning Approaches

  • Conference paper
Information Systems Design and Intelligent Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 433)

Abstract

Big data refers to data that is larger or more complex than traditional data and, in many cases, unstructured. Accessing a specific value in a huge data set that is neither sorted nor organized can be time-consuming and computationally expensive. As data grows, clustering becomes one of the most important unsupervised approaches for finding structure in data. In this paper, we demonstrate two approaches to clustering data with high accuracy; we then sort the data using the merge sort algorithm and, finally, use binary search to find a data value within a specific range of data. This research presents an efficient combined method for big data that uses a genetic algorithm together with k-means. After clustering with k-means, the total sum of Euclidean distances is 3.37233e+09 for 4 clusters; after applying the genetic algorithm, this number is reduced to 0.0300344 in the best fit. In the second and third stages, we show that after this implementation a particular data value can be accessed much faster and more accurately than with older methods.
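
To make the clustering stage concrete, the following is a minimal sketch of standard (Lloyd's) k-means in plain Python, together with the quantity the abstract reports: the total sum of Euclidean distances from each point to its nearest centroid. It is not the authors' implementation; the random initialisation, the iteration cap, and the helper names (`kmeans`, `assign`, `total_distance`) are assumptions made for illustration. Note that Lloyd's update step minimises the sum of squared distances, so the unsquared total computed here is a related but different measure.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points]


def total_distance(points: List[Point], centroids: List[Point]) -> float:
    """Sum of Euclidean distances from each point to its nearest centroid."""
    return sum(min(euclidean(p, c) for c in centroids) for p in points)


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's k-means (hypothetical helper): alternate assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: k distinct data points as seeds
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid = coordinate-wise mean of the cluster's members.
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids
```

For example, `kmeans([(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)], k=2)` separates the two visibly distinct groups, returning the centroids (1.25, 1.5) and (8.5, 8.25) in some order, and `total_distance` then measures how tightly each group sits around its centroid.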

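The genetic stage can be sketched as an evolutionary search over candidate centroid sets, using the same distance sum as the fitness to minimise. The abstract does not state the encoding or the operators, so everything below is an assumption rather than the authors' method: each chromosome is a list of k centroids, parents are chosen by binary tournament, children are formed by uniform crossover of whole centroids and Gaussian mutation of coordinates, and the two best chromosomes survive unchanged each generation (elitism). Passing an existing solution, such as k-means centroids, as the hypothetical `initial` parameter gives one plausible reading of the hybrid, in which the GA starts from k-means and, because of elitism, never returns a worse solution.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Chromosome = List[Point]  # a candidate solution: k centroids


def total_distance(points: Sequence[Point], centroids: Chromosome) -> float:
    """Fitness to minimise: sum of Euclidean distances to the nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def genetic_centroids(points: List[Point], k: int, pop_size: int = 20,
                      generations: int = 100, mutation_rate: float = 0.2,
                      sigma: float = 0.5, seed: int = 0,
                      initial: Optional[Chromosome] = None) -> Chromosome:
    """Evolve sets of k centroids (hypothetical operators and parameters)."""
    rng = random.Random(seed)

    def fitness(ch: Chromosome) -> float:
        return total_distance(points, ch)

    # Initial population: k randomly chosen data points per chromosome,
    # optionally seeded with an existing solution such as k-means centroids.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    if initial is not None:
        population[0] = list(initial)

    def tournament() -> Chromosome:
        # Binary tournament: the fitter of two random chromosomes wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each centroid is inherited from either parent.
            child = [c1 if rng.random() < 0.5 else c2 for c1, c2 in zip(p1, p2)]
            # Gaussian mutation: perturb each coordinate with prob. mutation_rate.
            child = [tuple(x + rng.gauss(0.0, sigma) if rng.random() < mutation_rate else x
                           for x in c)
                     for c in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)
```

Because the elite chromosomes are copied forward unchanged, the best fitness in the population can only stay the same or improve from one generation to the next.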

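The access stage combines two textbook algorithms named in the abstract: merge sort to order the data, followed by binary search to locate a value, or every value within a range. The sketch below works on a one-dimensional list of numbers; how the authors apply it to clustered data (for instance, sorting one attribute within each cluster before searching) is not stated in the abstract, so that step and the helper names `lower_bound`, `upper_bound`, `range_query`, and `contains` are illustrative assumptions.

```python
from typing import List, Sequence


def merge_sort(values: Sequence[float]) -> List[float]:
    """Stable O(n log n) merge sort that returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged: List[float] = []
    i = j = 0
    # Repeatedly take the smaller head element; "<=" keeps equal items stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])  # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def lower_bound(sorted_vals: Sequence[float], target: float) -> int:
    """Binary search: first index whose value is >= target."""
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_vals[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def upper_bound(sorted_vals: Sequence[float], target: float) -> int:
    """Binary search: first index whose value is > target."""
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_vals[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def range_query(sorted_vals: Sequence[float], low: float, high: float) -> List[float]:
    """All values v with low <= v <= high, found with two binary searches."""
    return list(sorted_vals[lower_bound(sorted_vals, low):upper_bound(sorted_vals, high)])


def contains(sorted_vals: Sequence[float], target: float) -> bool:
    """Exact-match lookup in O(log n) comparisons."""
    i = lower_bound(sorted_vals, target)
    return i < len(sorted_vals) and sorted_vals[i] == target
```

For example, `merge_sort([42, 7, 19, 3, 88, 19, 56])` returns `[3, 7, 19, 19, 42, 56, 88]`, and `range_query` on that list with bounds 10 and 50 returns `[19, 19, 42]`. Sorting costs O(n log n) once, after which each lookup needs only O(log n) comparisons instead of the O(n) scan required on unsorted data, which is the usual reason this combination speeds up repeated lookups.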

Author information

Corresponding author

Correspondence to E. Omid Mahdi Ebadati.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Ebadati, E.O.M., Tabrizi, M.M. (2016). A Hybrid Clustering Technique to Improve Big Data Accessibility Based on Machine Learning Approaches. In: Satapathy, S., Mandal, J., Udgata, S., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 433. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2755-7_43

  • DOI: https://doi.org/10.1007/978-81-322-2755-7_43

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2753-3

  • Online ISBN: 978-81-322-2755-7

  • eBook Packages: Engineering, Engineering (R0)
