Abstract
In recent years, industry and academia have been working to address the data-handling problems posed by big data. This effort has opened new directions in data mining and data analysis. One way to handle large data is to group similar items into clusters, but this technique is itself complex. This paper proposes a data clustering algorithm in which Intuitionistic Fuzzy C-Means (IFCM) is used together with Hadoop to produce high-quality clusters and thereby make clustering of very large data more efficient. The proposed algorithm is evaluated on UCI data sets. Accuracy and the SSW, SSB, DB, DD, and SC indices are used to compare the obtained results with those of Parallel K-means (PKM) and modified Parallel K-means (MPKM).
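The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an IFCM iteration could be phrased as map and reduce steps, not the authors' implementation. It assumes the common IFCM formulation in which a hesitation degree derived from a Yager-type complement is added to the ordinary fuzzy C-means membership. The function names, the parameters m and alpha, and the in-memory grouping that stands in for Hadoop's shuffle phase are all hypothetical.

```python
from collections import defaultdict


def memberships(point, centers, m=2.0, alpha=0.85):
    """Intuitionistic fuzzy memberships of one point to every center.

    Assumed formulation: u* = u + pi, where u is the usual FCM membership
    and pi = 1 - u - (1 - u**alpha)**(1/alpha) is the hesitation degree
    (Yager-type complement, 0 < alpha <= 1).
    """
    # Squared Euclidean distance from the point to each center.
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in d2:
        hit = d2.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(centers))]
    exp = 1.0 / (m - 1.0)
    # Standard FCM membership, written in terms of squared distances.
    u = [1.0 / sum((di / dj) ** exp for dj in d2) for di in d2]
    # Hesitation degree for each membership value.
    pi = [1.0 - ui - (1.0 - ui ** alpha) ** (1.0 / alpha) for ui in u]
    return [ui + h for ui, h in zip(u, pi)]


def map_point(point, centers, m=2.0, alpha=0.85):
    """Map step: emit (cluster index, (weighted point, weight)) pairs."""
    for k, u in enumerate(memberships(point, centers, m, alpha)):
        w = u ** m
        yield k, ([w * x for x in point], w)


def reduce_center(values):
    """Reduce step: weighted mean of all points emitted for one cluster.

    Returns None when the total weight is zero so the caller can keep
    the previous center.
    """
    total, total_w = None, 0.0
    for weighted, w in values:
        if total is None:
            total = [0.0] * len(weighted)
        total = [t + x for t, x in zip(total, weighted)]
        total_w += w
    if total is None or total_w == 0.0:
        return None
    return [t / total_w for t in total]


def ifcm(points, centers, iterations=20, m=2.0, alpha=0.85):
    """Run a fixed number of map/reduce rounds and return the centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            for k, value in map_point(p, centers, m, alpha):
                grouped[k].append(value)
        updated = [reduce_center(grouped[k]) for k in range(len(centers))]
        centers = [new if new is not None else old
                   for new, old in zip(updated, centers)]
    return centers
```

Calling ifcm(points, initial_centers) with a list of numeric tuples returns the updated centers as lists. In an actual Hadoop job the map and reduce functions would run on separate nodes, and the current centers would have to be distributed to every mapper before each round; that plumbing is omitted here.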
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Tripathy, B.K., Dishant Mittal, Hudedagaddi, D.P. (2016). Hadoop with Intuitionistic Fuzzy C-Means for Clustering in Big Data. In: Satapathy, S., Bhatt, Y., Joshi, A., Mishra, D. (eds) Proceedings of the International Congress on Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 438. Springer, Singapore. https://doi.org/10.1007/978-981-10-0767-5_62
DOI: https://doi.org/10.1007/978-981-10-0767-5_62
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0766-8
Online ISBN: 978-981-10-0767-5
eBook Packages: Engineering, Engineering (R0)