Handling Big Data with Fuzzy Based Classification Approach

Bharill, Neha; Tiwari, Aruna

doi:10.1007/978-3-319-03674-8_21


Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 312)


Abstract

Big data is a collection of very large and complex data that is difficult to load into computer memory. The major challenges include searching, categorizing and analyzing such data. In this paper, a fuzzy-based supervised classifier is proposed to handle the searching, storage and categorization of big data. As part of this classifier, we propose a Random Sampling Iterative Optimization Fuzzy c-Means (RSIO-FCM) clustering algorithm, which partitions the big data into several subsets. These subsets adequately cover all the instances (the object space) of the big data. Clustering is then performed on these subsets by feeding forward the cluster centers of one clustered subset to cluster the remaining subsets. Further, a classifier based on Bayesian theory is used to assign labels to these clusters and to predict the labels of unknown instances. The proposed approach thus forms effective clusters and also eliminates the problem of overlapping cluster centers faced by the algorithm discussed in [1], Simple Random Sampling plus Extension FCM (rseFCM). The effectiveness of the proposed clustering algorithm over rseFCM is evaluated on two very large benchmark datasets in terms of the fuzzification parameter m, objective function, computational time and accuracy. Experimental results demonstrate that the RSIO-FCM algorithm generates more appropriate cluster center locations, owing to which it achieves better classification accuracy than the rseFCM algorithm. Thus, it is observed that the location of cluster centers has a significant impact on classification results.
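
The clustering idea summarized in the abstract (split the data into random subsets, cluster one subset, then feed its centers forward as the starting point for the next) can be illustrated with a small sketch. This is not the authors' RSIO-FCM implementation, whose details are not given in the abstract; it is a minimal illustration that assumes textbook fuzzy c-means updates, disjoint equal-sized random subsets, and initial centers drawn from the first subset. The function names `fcm` and `subset_feed_forward_fcm`, the parameter defaults, and the toy data are all hypothetical, and the Bayesian labeling step is not covered.

```python
import random


def fcm(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Textbook fuzzy c-means on a list of points, starting from given centers."""
    dim = len(points[0])
    for _ in range(max_iter):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for x in points:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(x, v)), 1e-12)
                  for v in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * x[j] for w, x in zip(weights, points)) / wsum
                for j in range(dim)))
        # Stop when no center moved by more than tol (squared distance).
        shift = max(sum((a - b) ** 2 for a, b in zip(v, nv))
                    for v, nv in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def subset_feed_forward_fcm(data, n_clusters, n_subsets, m=2.0, seed=0):
    """Cluster random subsets in turn, passing centers from one to the next."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Assumption: disjoint random subsets that together cover every instance.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Assumption: initial centers are random points of the first subset.
    centers = rng.sample(subsets[0], n_clusters)
    for subset in subsets:
        centers = fcm(subset, centers, m=m)
    return centers


# Toy data: two well-separated 2-D groups around (0, 0) and (10, 10).
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(200)]
centers = sorted(subset_feed_forward_fcm(data, n_clusters=2, n_subsets=4))
print(centers)
```

Only one subset is held by the clustering routine at a time, which is the property that makes this kind of scheme attractive when the full dataset does not fit in memory; in this toy version the whole list is still loaded up front for simplicity.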


References

1. Havens, T.C., Bezdek, J.C., Leckie, C., Hall, L.O., Palaniswami, M.: Fuzzy c-Means Algorithms for Very Large Data. IEEE Trans. Fuzzy Syst. 20(6), 1130–1146 (2012)
2. Cai, W., Chen, S., Zhang, D.: A Multiobjective Simultaneous Learning Framework for Clustering and Classification. IEEE Trans. Neural Netw. 21(2), 185–200 (2010)
3. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Blackwell, New York (2005)
4. Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. Inf. Syst. 26(1), 35–58 (2001)
5. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proc. ACM Symp. Theory Comput., pp. 291–300 (2004)
6. Shankar, B.U., Pal, N.: FFCM: An efficient approach for large data sets. In: Proc. Int. Conf. Fuzzy Logic, Neural Nets, Soft Comput., Fukuoka, Japan, p. 332 (1994)
7. Cheng, T., Goldgof, D., Hall, L.: Fast clustering with application to fuzzy rule generation. In: Proc. Int. Conf. Fuzzy Syst., Tokyo, Japan, pp. 2289–2295 (1995)
8. Blake, C., Keogh, E., Merz, C.J.: UCI Repository of Machine Learning Databases. Dept. Inf. Comput. Sci., Univ. California Irvine, Irvine (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html


Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
Neha Bharill & Aruna Tiwari


Corresponding author

Correspondence to Neha Bharill.

Editor information

Editors and Affiliations

The University of Texas, San Antonio, Texas, USA
Mo Jamshidi
Department of Computer Science, University of Texas at El Paso, El Paso, Texas, USA
Vladik Kreinovich
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk


Cite this paper

Bharill, N., Tiwari, A. (2014). Handling Big Data with Fuzzy Based Classification Approach. In: Jamshidi, M., Kreinovich, V., Kacprzyk, J. (eds) Advance Trends in Soft Computing. Studies in Fuzziness and Soft Computing, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-03674-8_21


DOI: https://doi.org/10.1007/978-3-319-03674-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03673-1
Online ISBN: 978-3-319-03674-8
eBook Packages: Engineering, Engineering (R0)
