Abstract
Big data refers to collections of data so large and complex that they are difficult to load into computer memory. The major challenges include searching, categorizing and analyzing such data. In this paper, a fuzzy-based supervised classifier is proposed to handle the searching, storage and categorization of big data. As part of this classifier, we propose a Random Sampling Iterative Optimization Fuzzy c-Means (RSIO-FCM) clustering algorithm, which partitions the big data into several subsets. These subsets adequately cover all the instances (the object space) of the big data. Clustering is then performed on these subsets, with the cluster centers obtained from one subset fed forward to cluster the remaining subsets. Further, a classifier based on Bayesian theory is used to assign labels to these clusters and to predict the labels of unknown instances. The proposed approach thus forms effective clusters and eliminates the problem of overlapping cluster centers faced by the Simple Random Sampling plus Extension FCM (rseFCM) algorithm discussed in [1]. The effectiveness of the proposed clustering algorithm relative to rseFCM is evaluated on two very large benchmark datasets in terms of the fuzzification parameter m, objective function value, computational time and accuracy. Experimental results demonstrate that the RSIO-FCM algorithm places cluster centers more appropriately and, as a result, achieves better classification accuracy than the rseFCM algorithm. It is thus observed that the location of cluster centers has a significant impact on classification results.
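The feed-forward idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' RSIO-FCM implementation: the abstract does not give the partitioning rule, the stopping criterion or the Bayesian labelling step, so the code below assumes an equal-size random partition, a standard fuzzy c-means update and a simple center-shift tolerance. The names `fcm`, `memberships` and `chained_fcm`, and all parameter defaults, are hypothetical.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy memberships u_ik of each point i in each cluster k."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    # u_ik is proportional to d_ik^(-2/(m-1)), i.e. d2^(-1/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, c, m=2.0, init_centers=None, max_iter=100, tol=1e-5, rng=None):
    """Standard fuzzy c-means on X; returns the final cluster centers."""
    rng = np.random.default_rng(rng)
    if init_centers is None:
        # Assumption: initialise from c distinct random data points.
        centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    else:
        centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        W = memberships(X, centers, m) ** m
        # Each center is the mean of the points weighted by u_ik^m.
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers

def chained_fcm(X, c, n_subsets=4, m=2.0, init_centers=None, seed=0):
    """Cluster random subsets in turn, feeding centers forward.

    Assumption: the data are split into n_subsets disjoint random
    subsets that together cover every instance.
    """
    rng = np.random.default_rng(seed)
    centers = init_centers
    for part in np.array_split(rng.permutation(len(X)), n_subsets):
        # The centers found on the previous subset seed the next one.
        centers = fcm(X[part], c, m=m, init_centers=centers, rng=rng)
    return centers
```

Calling `chained_fcm(X, c)` on an array `X` of shape `(n_samples, n_features)` returns a `(c, n_features)` array of centers, and `memberships(X, centers)` then gives the fuzzy assignment of every instance. Because only one subset is held in the update at a time, the working set stays at roughly `len(X) / n_subsets` rows, which is the property the abstract relies on for data that do not fit in memory.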
References
Havens, T.C., Bezdek, J.C., Leckie, C., Hall, L.O., Palaniswami, M.: Fuzzy c-Means Algorithms for Very Large Data. IEEE Trans. Fuzzy Syst. 20(6), 1130–1146 (2012)
Cai, W., Chen, S., Zhang, D.: A Multiobjective Simultaneous Learning Framework for Clustering and Classification. IEEE Trans. Neural Netw. 21(2), 185–200 (2010)
Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Blackwell, New York (2005)
Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. Inf. Syst. 26(1), 35–58 (2001)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proc. ACM Symp. Theory Comput., pp. 291–300 (2004)
Shankar, B.U., Pal, N.: FFCM: An efficient approach for large data sets. In: Proc. Int. Conf. Fuzzy Logic, Neural Nets, Soft Comput., Fukuoka, Japan, p. 332 (1994)
Cheng, T., Goldgof, D., Hall, L.: Fast clustering with application to fuzzy rule generation. In: Proc. Int. Conf. Fuzzy Syst., Tokyo, Japan, pp. 2289–2295 (1995)
Blake, C., Keogh, E., Merz, C.J.: UCI Repository of Machine Learning Databases. Dept. Inf. Comput. Sci., Univ. California Irvine, Irvine (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bharill, N., Tiwari, A. (2014). Handling Big Data with Fuzzy Based Classification Approach. In: Jamshidi, M., Kreinovich, V., Kacprzyk, J. (eds) Advance Trends in Soft Computing. Studies in Fuzziness and Soft Computing, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-03674-8_21
DOI: https://doi.org/10.1007/978-3-319-03674-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03673-1
Online ISBN: 978-3-319-03674-8
eBook Packages: Engineering, Engineering (R0)