Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data

  • Conference paper
Proceedings of the International Congress on Information and Communication Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 438)

Abstract

Industry and academia have recently been working to address the data-handling problems posed by big data, which has opened new areas of research in data mining and data analysis. One way to handle large data is to group similar items into clusters, but clustering is itself computationally complex at this scale. This paper proposes a data clustering algorithm that combines Intuitionistic Fuzzy C-Means (IFCM) with Hadoop to produce high-quality clusters and make clustering of very large data sets more efficient. The results of the proposed algorithm are demonstrated on UCI data sets. Accuracy and the SSW, SSB, DB, DD, and SC indices are used to compare the results with those of Parallel K-means (PKM) and modified Parallel K-means (MPKM).
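
For readers unfamiliar with IFCM, the following is a minimal single-machine sketch of one iteration, not the authors' Hadoop implementation, which this page does not show. It assumes a commonly used formulation in which the standard fuzzy C-means membership is augmented with a Yager-type hesitation degree; the function names, the parameter `alpha`, and the use of the exponent `m` in the centre update are illustrative assumptions. In a MapReduce setting the per-point membership computation would plausibly run in the map phase and the weighted sums in the reduce phase, but that split is likewise an assumption.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership u[i][k] of point k in cluster i."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, c) for c in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point coincides with one or more centres: split membership among them.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
    return u

def hesitation(mu, alpha=0.85):
    """Yager-type hesitation degree (assumed form): 1 - mu - (1 - mu^alpha)^(1/alpha).

    Requires 0 < alpha <= 1 so that the result is non-negative.
    """
    mu = min(1.0, max(0.0, mu))
    complement = max(0.0, 1.0 - mu ** alpha) ** (1.0 / alpha)
    return 1.0 - mu - complement

def ifcm_step(points, centers, m=2.0, alpha=0.85):
    """One IFCM iteration: returns (new_centers, intuitionistic memberships)."""
    u = fcm_memberships(points, centers, m)
    # Intuitionistic membership = fuzzy membership + hesitation degree.
    u_star = [[mu + hesitation(mu, alpha) for mu in row] for row in u]
    dim = len(points[0])
    new_centers = []
    for row, old in zip(u_star, centers):
        # Assumption: weights use exponent m as in standard FCM;
        # some formulations omit the exponent here.
        weights = [w ** m for w in row]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(tuple(old))  # no support: keep the old centre
            continue
        new_centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return new_centers, u_star

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    cts, memberships = ifcm_step(pts, [(0.0, 0.2), (10.0, 10.2)])
    print(cts)
```

Repeating `ifcm_step` until the centres stop moving gives the full clustering loop. Setting `alpha=1.0` makes the hesitation degree zero, so the step reduces to ordinary fuzzy C-means.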

Correspondence to B. K. Tripathy.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Tripathy, B.K., Mittal, D., Hudedagaddi, D.P. (2016). Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data. In: Satapathy, S., Bhatt, Y., Joshi, A., Mishra, D. (eds) Proceedings of the International Congress on Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 438. Springer, Singapore. https://doi.org/10.1007/978-981-10-0767-5_62

  • DOI: https://doi.org/10.1007/978-981-10-0767-5_62

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0766-8

  • Online ISBN: 978-981-10-0767-5

  • eBook Packages: Engineering, Engineering (R0)
