Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data

Tripathy, B. K.; Mittal, Dishant; Hudedagaddi, Deepthi P.

doi:10.1007/978-981-10-0767-5_62

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 438)

Abstract

Industry and academia have recently been working to address the data handling problems posed by big data. This has led to new directions in data mining and data analysis. One way to handle large data is to group similar data into clusters, but clustering at this scale is itself complex. This paper proposes a data clustering algorithm that combines Intuitionistic Fuzzy C-Means (IFCM) with Hadoop to produce high-quality clusters and thereby make clustering of very large data more efficient. The results of the proposed algorithm are demonstrated on UCI data sets. Accuracy and the SSW, SSB, DB, DD, and SC indices are used as performance metrics to compare the results with those of Parallel K-means (PKM) and modified Parallel K-means (MPKM).
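
The abstract does not state the update rules, so the following is only a minimal sketch of how one IFCM iteration could be split into map and reduce steps; it is not the authors' implementation. It assumes a commonly used intuitionistic formulation (the standard FCM membership plus a hesitation degree derived from a Yager-type complement) and uses plain NumPy in place of Hadoop. The names and parameters `ifcm_map`, `ifcm_reduce`, `ifcm`, `alpha`, `n_splits`, and `init` are hypothetical.

```python
import numpy as np


def ifcm_map(chunk, centers, m=2.0, alpha=0.85):
    """Mapper (assumed design): partial centroid sums for one data split."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    # Standard FCM membership: u_ik proportional to d_ik^(-2/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    u = np.clip(inv / inv.sum(axis=1, keepdims=True), 0.0, 1.0)
    # Hesitation degree from a Yager-type complement (assumption, 0 < alpha < 1).
    pi = 1.0 - u - (1.0 - u ** alpha) ** (1.0 / alpha)
    # Intuitionistic membership and its fuzzified weight.
    w = (u + pi) ** m
    # Emit per-cluster weighted sums and total weights.
    return w.T @ chunk, w.sum(axis=0)


def ifcm_reduce(partials):
    """Reducer (assumed design): combine partial sums into new centroids."""
    num = sum(p[0] for p in partials)
    den = sum(p[1] for p in partials)
    return num / den[:, None]


def ifcm(data, n_clusters, init=None, n_splits=4, n_iter=50, tol=1e-6, seed=0):
    """Iterate map and reduce until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = data[rng.choice(len(data), n_clusters, replace=False)]
    else:
        centers = np.asarray(init, dtype=float)
    splits = np.array_split(data, n_splits)  # stand-in for input splits
    for _ in range(n_iter):
        new_centers = ifcm_reduce([ifcm_map(s, centers) for s in splits])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers
```

Because the centroid update is a ratio of sums over data points, each split can compute its partial sums independently and the reducer only has to add them, which is what makes this kind of update a natural fit for a map and reduce decomposition.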

Author information

Authors and Affiliations

SCSE, VIT, Vellore, 632014, Tamil Nadu, India
B. K. Tripathy & Deepthi P. Hudedagaddi
Johnson Controls, Mumbai, 400042, Maharashtra, India
Dishant Mittal

Corresponding author

Correspondence to B. K. Tripathy.

Editor information

Editors and Affiliations

Dept of Comp Sci Engg, Anil Neerukonda Ins Tech & Sci, Visakhapatnam, Andhra Pradesh, India
Suresh Chandra Satapathy
College of Technology and Engineering, Maharana Pratap Univer. of Agri. & Tech., Udaipur, Rajasthan, India
Yogesh Chandra Bhatt
Sabar Institute of Technology, Sabarkantha, Gujarat, India
Amit Joshi
Sri Aurobindo Institute of Technology, Indore, Madhya Pradesh, India
Durgesh Kumar Mishra

Cite this paper

Tripathy, B.K., Mittal, D., Hudedagaddi, D.P. (2016). Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data. In: Satapathy, S., Bhatt, Y., Joshi, A., Mishra, D. (eds) Proceedings of the International Congress on Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 438. Springer, Singapore. https://doi.org/10.1007/978-981-10-0767-5_62

DOI: https://doi.org/10.1007/978-981-10-0767-5_62
Published: 05 June 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0766-8
Online ISBN: 978-981-10-0767-5
eBook Packages: Engineering, Engineering (R0)
