Exploring Big Data with Scalable Soft Clustering

Hall, Lawrence O.

doi:10.1007/978-3-642-33042-1_2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 190)

Abstract

Sky surveys for astronomy are expected to generate 2.5 petabytes of data a year. Electronic medical records hold the promise of treatment comparisons and of grouping patients by outcomes, but they will reside in petabyte-scale data storage. We can store lots of data, and much of it won't have labels. How can we analyze or explore the data? Unsupervised clustering, whether fuzzy, possibilistic, or probabilistic, will allow us to group data. However, the algorithms scale poorly in computation time as the data gets large, and they are impractical without modification when the data exceeds the size of memory. We will explore distributed clustering, stream data clustering, and subsampling approaches to enable scalable clustering. Examples will show that one can build good models of the data without necessarily seeing all of it and that, if needed, modified algorithms can be applied to terabytes of data and more.
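
To make the idea of clustering data that does not fit in memory more concrete, here is a minimal sketch of chunk-wise weighted fuzzy c-means, loosely in the spirit of single-pass approaches. It is not taken from the chapter: the function names (`memberships`, `weighted_fcm`, `chunked_fcm`), the parameters (`chunk_size`, `iters`, `seed`), and the choice to run a fixed number of iterations are all illustrative assumptions. Each chunk is clustered together with the weighted centroids carried over from earlier chunks, so only one chunk is held in memory at a time.

```python
import random


def memberships(x, centers, power, eps=1e-12):
    """Fuzzy memberships of point x in each cluster (they sum to 1)."""
    inv = [
        max(sum((a - b) ** 2 for a, b in zip(x, v)), eps) ** (-power)
        for v in centers
    ]
    total = sum(inv)
    return [t / total for t in inv]


def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted fuzzy c-means with a fixed iteration count (assumption).

    Returns the final centers and, per cluster, the weighted membership
    mass, which serves as that centroid's weight in the next chunk.
    """
    c, dim = len(centers), len(points[0])
    power = 1.0 / (m - 1.0)  # applied to squared distances
    for _ in range(iters):
        u = [memberships(x, centers, power) for x in points]
        new_centers = []
        for i in range(c):
            coef = [w * (row[i] ** m) for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * x[j] for cf, x in zip(coef, points)) / total
                for j in range(dim)
            ))
        centers = new_centers
    u = [memberships(x, centers, power) for x in points]
    cluster_w = [
        sum(w * row[i] for w, row in zip(weights, u)) for i in range(c)
    ]
    return centers, cluster_w


def chunked_fcm(data, c, chunk_size, m=2.0, seed=0):
    """Cluster data one chunk at a time, carrying weighted centroids."""
    rng = random.Random(seed)
    centers = rng.sample(data[:chunk_size], c)  # initial guess (assumption)
    carried_pts, carried_w = [], []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        pts = carried_pts + chunk
        w = carried_w + [1.0] * len(chunk)  # each raw point has weight 1
        centers, cluster_w = weighted_fcm(pts, w, centers, m)
        carried_pts, carried_w = centers, cluster_w
    return centers, carried_w
```

Calling `chunked_fcm(data, c=2, chunk_size=100)` on a list of coordinate tuples returns the final centers and their accumulated weights. Because each point's memberships sum to one, the weights add up to the number of points processed, which is a convenient sanity check. In a real out-of-memory setting the chunks would be read from disk or a stream rather than sliced from a list.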

Author information

Authors and Affiliations

Department of Computer Science and Engineering, ENB118, University of South Florida, Tampa, FL, 33620-9951, USA
Lawrence O. Hall

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto-von-Guericke University of Magdeburg, Geb. 29, Raum 008, Universitätsplatz 2, Magdeburg, 39106, Germany
Rudolf Kruse
FB Informatik & Informationswissenschaft, University of Konstanz, Konstanz, 78457, Germany
Michael R. Berthold
Faculty of Computer Science, Otto-von-Guericke University of Magdeburg, Geb. 29, Universitätsplatz 2 008, Magdeburg, 39106, Germany
Christian Moewes
Department of Statistics and OR, University of Oviedo, C/ Calvo Sotelo, s/n, Oviedo, 33007, Spain
María Ángeles Gil
Systems Research Institute, Polish Academy of Sciences, Newelska 6, Warsaw, 01-447, Poland
Przemysław Grzegorzewski
Systems Research Institute, Polish Academy of Sciences, Newelska 6, Warsaw, 01-447, Poland
Olgierd Hryniewicz

Cite this paper

Hall, L.O. (2013). Exploring Big Data with Scalable Soft Clustering. In: Kruse, R., Berthold, M., Moewes, C., Gil, M., Grzegorzewski, P., Hryniewicz, O. (eds) Synergies of Soft Computing and Statistics for Intelligent Data Analysis. Advances in Intelligent Systems and Computing, vol 190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33042-1_2

DOI: https://doi.org/10.1007/978-3-642-33042-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33041-4
Online ISBN: 978-3-642-33042-1
eBook Packages: Engineering, Engineering (R0)
